Papers
Research papers from arXiv and related sources
Semantic Tube Prediction: Beating LLM Data Efficiency with JEPA
Large Language Models (LLMs) obey consistent scaling laws -- empirical power-law fits that predict how loss decreases with compute, data, and parameters. While predictive, these laws are descriptiv...
Hai Huang, Yann LeCun, Randall Balestriero
Spectrally Distilled Representations Aligned with Instruction-Augmented LLMs for Satellite Imagery
Vision-language foundation models (VLFMs) promise zero-shot and retrieval understanding for Earth observation. While operational satellite systems often lack full multi-spectral coverage, making RG...
Minh Kha Do, Wei Xiang, Kang Han, Di Wu, Khoa Phan, Yi-Ping Phoebe Chen, Gaowen Liu, Ramana Rao K...
CoLyricist: Enhancing Lyric Writing with AI through Workflow-Aligned Support
We propose CoLyricist, an AI-assisted lyric writing tool designed to support the typical workflows of experienced lyricists and enhance their creative efficiency. While lyricists have unique proces...
Masahiro Yoshida, Bingxuan Li, Songyan Zhao, Qinyi Zhou, Shiwei Hu, Xiang Anthony Chen, Nanyun Peng
SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning
Long-running agentic tasks, such as deep research, require multi-hop reasoning over information distributed across multiple webpages and documents. In such tasks, the LLM context is dominated by to...
Sanjay Kariyappa, G. Edward Suh
Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets
The reliability of multilingual Large Language Model (LLM) evaluation is currently compromised by the inconsistent quality of translated benchmarks. Existing resources often suffer from semantic dr...
Hanna Yukhymenko, Anton Alexandrov, Martin Vechev
Off-The-Shelf Image-to-Image Models Are All You Need To Defeat Image Protection Schemes
Advances in Generative AI (GenAI) have led to the development of various protection strategies to prevent the unauthorized use of images. These methods rely on adding imperceptible protective pertu...
Xavier Pleimling, Sifat Muhammad Abdullah, Gunjan Balde, Peng Gao, Mainack Mondal, Murtuza Jadliw...
Reimagining Data Work: Participatory Annotation Workshops as Feminist Practice
AI systems depend on the invisible and undervalued labor of data workers, who are often treated as interchangeable units rather than collaborators with meaningful expertise. Critical scholars and p...
Yujia Gao, Isadora Araujo Cruxên, Helena Suárez Val, Alessandra Jungs de Almeida, Catherine D'Ign...
Codesigning Ripplet: an LLM-Assisted Assessment Authoring System Grounded in a Conceptual Model of Teachers' Workflows
Assessments are critical in education, but creating them can be difficult. To address this challenge in a grounded way, we partnered with 13 teachers in a seven-month codesign process. We developed...
Yuan Cui, Annabel Goldman, Jovy Zhou, Xiaolin Liu, Clarissa Shieh, Joshua Yao, Mia Cheng, Matthew...
A Taxonomy of Human–MLLM Interaction in Early-Stage Sketch-Based Design Ideation
As multimodal large language models (MLLMs) are increasingly integrated into early-stage design tools, it is important to understand how designers collaborate with AI during ideation. In a user stu...
Weiayn Shi, Kenny Tsu Wei Choo
LLMTailor: A Layer-wise Tailoring Tool for Efficient Checkpointing of Large Language Models
Checkpointing is essential for fault tolerance in training large language models (LLMs). However, existing methods, regardless of their I/O strategies, periodically store the entire model and optim...
Minqiu Sun, Xin Huang, Luanzheng Guo, Nathan R. Tallent, Kento Sato, Dong Dai
Dynamic Personality Adaptation in Large Language Models via State Machines
The inability of Large Language Models (LLMs) to modulate their personality expression in response to evolving dialogue dynamics hinders their performance in complex, interactive contexts. We propo...
Leon Pielage, Ole Hätscher, Mitja Back, Bernhard Marschall, Benjamin Risse
Provable Last-Iterate Convergence for Multi-Objective Safe LLM Alignment via Optimistic Primal-Dual
Reinforcement Learning from Human Feedback (RLHF) plays a significant role in aligning Large Language Models (LLMs) with human preferences. While RLHF with expected reward constraints can be formul...
Yining Li, Peizhong Ju, Ness Shroff
When AI Writes, Whose Voice Remains? Quantifying Cultural Marker Erasure Across World English Varieties in Large Language Models
Large Language Models (LLMs) are increasingly used to "professionalize" workplace communication, often at the cost of linguistic identity. We introduce "Cultural Ghosting", the systematic erasure...
Satyam Kumar Navneet, Joydeep Chandra, Yong Zhang
WeaveTime: Stream from Earlier Frames into Emergent Memory in VideoLLMs
Recent advances in Multimodal Large Language Models have greatly improved visual understanding and reasoning, yet their quadratic attention and offline training protocols make them ill-suited for s...
Yulin Zhang, Cheng Shi, Sibei Yang
Secure Semantic Communications via AI Defenses: Fundamentals, Solutions, and Future Directions
Semantic communication (SemCom) redefines wireless communication from reproducing symbols to transmitting task-relevant semantics. However, this AI-native architecture also introduces new vulnerabi...
Lan Zhang, Chengsi Liang, Zeming Zhuang, Yao Sun, Fang Fang, Xiaoyong Yuan, Dusit Niyato
IndicIFEval: A Benchmark for Verifiable Instruction-Following Evaluation in 14 Indic Languages
Instruction-following benchmarks remain predominantly English-centric, leaving a critical evaluation gap for the hundreds of millions of Indic language speakers. We introduce IndicIFEval, a benchma...
Thanmay Jayakumar, Mohammed Safi Ur Rahman Khan, Raj Dabre, Ratish Puduppully, Anoop Kunchukuttan
Confidence-Driven Multi-Scale Model Selection for Cost-Efficient Inference
Large Language Models (LLMs) have revolutionized inference across diverse natural language tasks, with larger models performing better but at higher computational costs. We propose a confidence-dri...
Bo-Wei Chen, Chung-Chi Chen, An-Zi Yen
Transmission Delay Minimization for NOMA-Based F-RANs
A novel non-orthogonal multiple access (NOMA) based low-delay service framework is proposed for fog radio access networks (F-RANs). Fog access points (FAPs) leverage NOMA for local delivery of cach...
Yuan Ai, Xidong Mu, Pengbo Si, Yuanwei Liu
ViSTAR: Virtual Skill Training with Augmented Reality with 3D Avatars and LLM coaching agent
We present ViSTAR, a Virtual Skill Training system in AR that supports self-guided basketball skill practice, with feedback on balance, posture, and timing. From a formative study with basketball p...
Chunggi Lee, Hayato Saiki, Tica Lin, Eiji Ikeda, Kenji Suzuki, Chen Zhu-Tian, Hanspeter Pfister
Understanding Artificial Theory of Mind: Perturbed Tasks and Reasoning in Large Language Models
Theory of Mind (ToM) refers to an agent's ability to model the internal states of others. Contributing to the debate whether large language models (LLMs) exhibit genuine ToM capabilities, our study...
Christian Nickel, Laura Schrewe, Florian Mai, Lucie Flek