Papers
Research papers from arXiv and related sources
Practicing with Language Models Cultivates Human Empathic Communication
Empathy is central to human connection, yet people often struggle to express it effectively. In blinded evaluations, large language models (LLMs) generate responses that are often judged more empat...
Aakriti Kumar, Nalin Poungpeth, Diyi Yang, Bruce Lambert, Matthew Groh
A proof-of-concept for automated AI-driven stellarator coil optimization with in-the-loop finite-element calculations
Finding feasible coils for stellarator fusion devices is a critical challenge of realizing this concept for future power plants. Years of research work can be put into the design of even a single r...
Alan A. Kaptanoglu, Pedro F. Gil
Why the Valuable Capabilities of LLMs Are Precisely the Unexplainable Ones
This paper proposes and argues for a counterintuitive thesis: the truly valuable capabilities of large language models (LLMs) reside precisely in the part that cannot be fully captured by human-rea...
Quan Cheng
Multi-turn Physics-informed Vision-language Model for Physics-grounded Anomaly Detection
Vision-Language Models (VLMs) demonstrate strong general-purpose reasoning but remain limited in physics-grounded anomaly detection, where causal understanding of dynamics is essential. Existing VL...
Yao Gu, Xiaohao Xu, Yingna Wu
Bidirectional Chinese and English Passive Sentences Dataset for Machine Translation
Machine Translation (MT) evaluation has gone beyond metrics, towards more specific linguistic phenomena. Regarding English-Chinese language pairs, passive sentences are constructed and distributed ...
Xinyue Ma, Pol Pastells, Mireia Farrús, Mariona Taulé
SCAN: Sparse Circuit Anchor Interpretable Neuron for Lifelong Knowledge Editing
Large Language Models (LLMs) often suffer from catastrophic forgetting and collapse during sequential knowledge editing. This vulnerability stems from the prevailing dense editing paradigm, which t...
Yuhuan Liu, Haitian Zhong, Xinyuan Xia, Qiang Liu, Shu Wu, Liang Wang
LMetric: Simple is Better - Multiplication May Be All You Need for LLM Request Scheduling
High-quality LLM request scheduling requires achieving two key objectives: whether the routed instance has KV$ to accelerate the request execution and whether the workload is balanced across instan...
Dingyan Zhang, Jinbo Han, Kaixi Zhang, Xingda Wei, Sijie Shen, Chenguang Fang, Wenyuan Yu, Jingre...
The Hrunting of AI: Where and How to Improve English Dialectal Fairness
It is known that large language models (LLMs) underperform in English dialects, and that improving them is difficult due to data scarcity. In this work we investigate how quality and availability i...
Wei Li, Adrian de Wynter
CATFormer: When Continual Learning Meets Spiking Transformers With Dynamic Thresholds
Although deep neural networks perform extremely well in controlled environments, they fail in real-world scenarios where data isn't available all at once, and the model must adapt to a new data dis...
Vaishnavi Nagabhushana, Kartikay Agrawal, Ayon Borthakur
Token Coherence: Adapting MESI Cache Protocols to Minimize Synchronization Overhead in Multi-Agent LLM Systems
Multi-agent LLM orchestration incurs synchronization costs scaling as O(n x S x |D|) in agents, steps, and artifact size under naive broadcast -- a regime I term broadcast-induced triply-multiplica...
Vladyslav Parakhin
ForceVLA2: Unleashing Hybrid Force-Position Control with Force Awareness for Contact-Rich Manipulation
Embodied intelligence for contact-rich manipulation has predominantly relied on position control, while explicit awareness and regulation of interaction forces remain under-explored, limiting stabi...
Yang Li, Zhaxizhuoma, Hongru Jiang, Junjie Xia, Hongquan Zhang, Jinda Du, Yunsong Zhou, Jia Zeng...
HindSight: Evaluating Research Idea Generation via Future Impact
Evaluating AI-generated research ideas typically relies on LLM judges or human panels -- both subjective and disconnected from actual research impact. We introduce HindSight, a time-split evaluation fr...
Bo Jiang
To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation
Large Language Models (LLMs) have shown strong potential for code generation, yet they remain limited in private-library-oriented code generation, where the goal is to generate code using APIs from...
Yitong Zhang, Chengze Li, Ruize Chen, Guowei Yang, Xiaoran Jia, Yijie Ren, Jia Li
TextOVSR: Text-Guided Real-World Opera Video Super-Resolution
Many classic opera videos exhibit poor visual quality due to the limitations of early filming equipment and long-term degradation during storage. Although real-world video super-resolution (RWVSR) ...
Hua Chang, Xin Xu, Wei Liu, Jiayi Wu, Kui Jiang, Fei Ma, Qi Tian
Confusion-Aware In-Context-Learning for Vision-Language Models in Robotic Manipulation
Vision-language models (VLMs) have significantly improved the generalization capabilities of robotic manipulation. However, VLM-based systems often suffer from a lack of robustness, leading to unpr...
Yayun He, Zuheng Kang, Botao Zhao, Zhouyin Wu, Junqing Peng, Jianzong Wang
A Data-Driven Regional Model for Skillful Medium-Range Typhoon Prediction
Accurate prediction of tropical cyclones remains a major challenge for both numerical weather prediction and emerging artificial intelligence weather prediction systems. While recent global AI mode...
Zeyi Niu, Wei Huang, Sirong Huang, Zhuo Wang, Mu Mu, Mengqi Yang, Xinhai Han, Haofei Sun, Zhaoyan...
From Storage to Steering: Memory Control Flow Attacks on LLM Agents
Modern agentic systems allow Large Language Model (LLM) agents to tackle complex tasks through extensive tool usage, forming structured control flows of tool selection and execution. Existing secur...
Zhenlin Xu, Xiaogang Zhu, Yu Yao, Minhui Xue, Yiliao Song
Establishing Construct Validity in LLM Capability Benchmarks Requires Nomological Networks
Recent work in machine learning increasingly attributes human-like capabilities such as reasoning or theory of mind to large language models (LLMs) on the basis of benchmark performance. This paper...
Timo Freiesleben
Generation of Programming Exam Question and Answer Using ChatGPT Based on Prompt Engineering
In computer science, students are encouraged to learn various programming languages such as Python, C++, and Java, equipping them with a broad range of technical skills and problem-solving capabili...
Jongwook Si, Sungyoung Kim
Synergizing a Decentralized Framework with LLM-Assisted Skill and Willingness-Aware Task Assignment for Volunteer Crowdsourcing
Volunteer crowdsourcing (VCS) platforms increasingly support education, healthcare, disaster response, and smart city applications, yet assigning volunteers to complex tasks remains challenging du...
Riya Samanta, Rituparna Bhattyacharya