Papers
Research papers from arXiv and related sources
Evolving Medical Imaging Agents via Experience-driven Self-skill Discovery
Clinical image interpretation is inherently multi-step and tool-centric: clinicians iteratively combine visual evidence with patient context, quantify findings, and refine their decisions through a...
Lin Fan, Pengyu Dai, Zhipeng Deng, Haolin Wang, Xun Gong, Yefeng Zheng, Yafei Ou
How Well Do Current Speech Deepfake Detection Methods Generalize to the Real World?
Recent advances in speech synthesis and voice conversion have greatly improved the naturalness and authenticity of generated audio. Meanwhile, evolving encoding, compression, and transmission mecha...
Daixian Li, Jun Xue, Yanzhen Ren, Zhuolin Yi, Yihuan Huang, Guanxiang Feng, Yi Chai
The Values of Value in AI Adoption: Rethinking Efficiency in UX Designers' Workplaces
Although organizations increasingly position AI adoption as a pathway to competitiveness and innovation, organizations' perspectives on productivity and efficiency often clash with workers' perspec...
Inha Cha, Catherine Wieczorek, Richmond Y. Wong
Evaluating LLM Alignment With Human Trust Models
Trust plays a pivotal role in enabling effective cooperation, reducing uncertainty, and guiding decision-making in both human interactions and multi-agent systems. Although it is significant, there...
Anushka Debnath, Stephen Cranefield, Bastin Tony Roy Savarimuthu, Emiliano Lorini
Lexara: A User-Centered Toolkit for Evaluating Large Language Models for Conversational Visual Analytics
Large Language Models (LLMs) are transforming Conversational Visual Analytics (CVA) by enabling data analysis through natural language. However, evaluating LLMs for CVA remains a challenge: requiri...
Srishti Palani, Vidya Setlur
Knowledge-driven Reasoning for Mobile Agentic AI: Concepts, Approaches, and Directions
Mobile agentic AI is extending autonomous capabilities to resource-constrained platforms such as edge robots and unmanned aerial vehicles (UAVs), where strict size, weight, power, and cost (SWAP-C)...
Guangyuan Liu, Changyuan Zhao, Yinqiu Liu, Dusit Niyato, Biplab Sikdar
Test-Time Adaptation via Many-Shot Prompting: Benefits, Limits, and Pitfalls
Test-time adaptation enables large language models (LLMs) to modify their behavior at inference without updating model parameters. A common approach is many-shot prompting, where large numbers of i...
Shubhangi Upasani, Chen Wu, Jay Rainton, Bo Li, Changran Hu, Qizheng Zhang, Urmish Thakker
HART: Data-Driven Hallucination Attribution and Evidence-Based Tracing for Large Language Models
Large language models (LLMs) have demonstrated remarkable performance in text generation and knowledge-intensive question answering. Nevertheless, they are prone to producing hallucinated content, ...
Shize Liang, Hongzhi Wang
Self-Auditing Parameter-Efficient Fine-Tuning for Few-Shot 3D Medical Image Segmentation
Adapting foundation models to new clinical sites remains challenging in practice. Domain shift and scarce annotations must be handled by experts, yet many clinical groups do not have ready access t...
Son Thai Ly, Hien V. Nguyen
POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation
Efficient and stable training of large language models (LLMs) remains a core challenge in modern machine learning systems. To address this challenge, Reparameterized Orthogonal Equivalence Training...
Zeju Qiu, Lixin Liu, Adrian Weller, Han Shi, Weiyang Liu
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
Large language models sometimes produce false or misleading responses. Two approaches to this problem are honesty elicitation -- modifying prompts or weights so that the model answers truthfully --...
Helena Casademunt, Bartosz Cywiński, Khoi Tran, Arya Jakkli, Samuel Marks, Neel Nanda
cuRoboV2: Dynamics-Aware Motion Generation with Depth-Fused Distance Fields for High-DoF Robots
Effective robot autonomy requires motion generation that is safe, feasible, and reactive. Current methods are fragmented: fast planners output physically unexecutable trajectories, reactive control...
Balakumar Sundaralingam, Adithyavairavan Murali, Stan Birchfield
NL2GDS: LLM-aided interface for Open Source Chip Design
The growing complexity of hardware design and the widening gap between high-level specifications and register-transfer level (RTL) implementation hinder rapid prototyping and system design. We intr...
Max Eland, Jeyan Thiyagalingam, Dinesh Pamunuwa, Roshan Weerasekera
Observing and Controlling Features in Vision-Language-Action Models
Vision-Language-Action Models (VLAs) have shown remarkable progress towards embodied intelligence. While their architecture partially resembles that of Large Language Models (LLMs), VLAs exhibit hi...
Hugo Buurmeijer, Carmen Amo Alonso, Aiden Swann, Marco Pavone
Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation
As AI models progress beyond simple chatbots into more complex workflows, we draw ever closer to the event horizon beyond which AI systems will be utilized in autonomous, self-maintaining feedback ...
Benjamin Feuer, Lucas Rosenblatt, Oussama Elachqar
Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval
Trustworthiness is a core research challenge for agentic AI systems built on Large Language Models (LLMs). To enhance trust, natural language claims from diverse sources, including human-written te...
Artem Vazhentsev, Maria Marina, Daniil Moskovskiy, Sergey Pletenev, Mikhail Seleznyov, Mikhail Sa...
Kraus Constrained Sequence Learning For Quantum Trajectories from Continuous Measurement
Real-time reconstruction of conditional quantum states from continuous measurement records is a fundamental requirement for quantum feedback control, yet standard stochastic master equation (SME) s...
Priyanshi Singh, Krishna Bhatia
FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling
Attention, as a core layer of the ubiquitous Transformer architecture, is the bottleneck for large language models and long-context applications. While FlashAttention-3 optimized attention for Hopp...
Ted Zadouri, Markus Hoehnerbach, Jay Shah, Timmy Liu, Vijay Thakkar, Tri Dao
Distributed Partial Information Puzzles: Examining Common Ground Construction Under Epistemic Asymmetry
Establishing common ground, a shared set of beliefs and mutually recognized facts, is fundamental to collaboration, yet remains a challenge for current AI systems, especially in multimodal, multipa...
Yifan Zhu, Mariah Bradford, Kenneth Lai, Timothy Obiso, Videep Venkatesha, James Pustejovsky, Nik...
SAIL: Similarity-Aware Guidance and Inter-Caption Augmentation-based Learning for Weakly-Supervised Dense Video Captioning
Weakly-Supervised Dense Video Captioning aims to localize and describe events in videos trained only on caption annotations, without temporal boundaries. Prior work introduced an implicit supervisi...
Ye-Chan Kim, SeungJu Cha, Si-Woo Kim, Minju Jeon, Hyungee Kim, Dong-Jin Kim