Papers
Research papers from arXiv and related sources
Try, Check and Retry: A Divide-and-Conquer Framework for Boosting Long-context Tool-Calling Performance of LLMs
Tool-calling empowers Large Language Models (LLMs) to interact with external environments. However, current methods often struggle to handle massive and noisy candidate tools in long-context tool-c...
Kunfeng Chen, Qihuang Zhong, Juhua Liu, Bo Du, Dacheng Tao
PRMB: Benchmarking Reward Models in Long-Horizon CBT-based Counseling Dialogue
Large language models (LLMs) hold potential for mental healthcare applications, particularly in cognitive behavioral therapy (CBT)-based counseling, where reward models play a critical role in alig...
Yougen Zhou, Qin Chen, Ningning Zhou, Jie Zhou, Liang He
SPEGC: Continual Test-Time Adaptation via Semantic-Prompt-Enhanced Graph Clustering for Medical Image Segmentation
In medical image segmentation tasks, the domain gap caused by the difference in data collection between training and testing data seriously hinders the deployment of pre-trained models in clinical ...
Xiaogang Du, Jiawei Zhang, Tongfei Liu, Tao Lei, Yingbo Wang
AutoVeriFix+: High-Correctness RTL Generation via Trace-Aware Causal Fix and Semantic Redundancy Pruning
Large language models (LLMs) have demonstrated impressive capabilities in generating software code for high-level programming languages such as Python and C++. However, their application to hardwar...
Yan Tan, Xiangchen Meng, Zijun Jiang, Yangdi Lyu
Quantized Inference for OneRec-V2
Quantized inference has demonstrated substantial system-level benefits in large language models while preserving model quality. In contrast, reliably applying low-precision quantization to recommen...
Yi Su, Xinchen Luo, Hongtao Cheng, Ziteng Shu, Yunfeng Zhao, Fangyu Zhang, Jiaqiang Liu, Xiao Lia...
INFACT: A Diagnostic Benchmark for Induced Faithfulness and Factuality Hallucinations in Video-LLMs
Despite rapid progress, Video Large Language Models (Video-LLMs) remain unreliable due to hallucinations, which are outputs that contradict either video evidence (faithfulness) or verifiable world ...
Junqi Yang, Yuecong Min, Jie Zhang, Shiguang Shan, Xilin Chen
Grammar of the Wave: Towards Explainable Multivariate Time Series Event Detection via Neuro-Symbolic VLM Agents
Time Series Event Detection (TSED) has long been an important task with critical applications across many high-stakes domains. Unlike statistical anomalies, events are defined by semantics with com...
Sky Chenwei Wan, Tianjun Hou, Yifei Wang, Xiqing Chang, Aymeric Jan
Graph Generation Methods under Partial Information
We study the problem of generating graphs with prescribed degree sequences for bipartite, directed, and undirected networks. We first propose a sequential method for bipartite graph generation and ...
Tong Sun, Jianshu Hao, Michael C. Fu, Guangxin Jiang
Leveraging Phytolith Research using Artificial Intelligence
Phytolith analysis is a crucial tool for reconstructing past vegetation and human activities, but traditional methods are severely limited by labour-intensive, time-consuming manual microscopy. To ...
Andrés G. Mejía Ramón, Kate Dudgeon, Nina Witteveen, Dolores Piperno, Michael Kloster, Luigi Palo...
Deep Learning Network-Temporal Models For Traffic Prediction
Time series analysis is critical for emerging network intelligent control and management functions. However, existing statistical-based and shallow machine learning models have shown limited pred...
Yufeng Xin, Ethan Fan
Stochastic Optimization and Coupling
We study optimization problems in which a linear functional is maximized over probability measures that are dominated by a given measure according to an integral stochastic order in an arbitrary di...
Frank Yang, Kai Hao Yang
Verified Multi-Agent Orchestration: A Plan-Execute-Verify-Replan Framework for Complex Query Resolution
We present Verified Multi-Agent Orchestration (VMAO), a framework that coordinates specialized LLM-based agents through a verification-driven iterative loop. Given a complex query, our system decom...
Xing Zhang, Yanwei Cui, Guanghui Wang, Qucy Wei Qiu, Ziyuan Li, Fangwei Han, Yajing Huang, Hengzh...
NCCLbpf: Verified, Composable Policy Execution for GPU Collective Communication
NCCL is the de facto standard for collective GPU communication in large-scale distributed training, relying heavily on plugins to customize runtime behavior. However, these plugins execute as unver...
Yusheng Zheng
ZTab: Domain-based Zero-shot Annotation for Table Columns
This study addresses the challenge of automatically detecting semantic column types in relational tables, a key task in many real-world applications. Zero-shot modeling eliminates the need for user...
Ehsan Hoseinzade, Ke Wang
Grounding Robot Generalization in Training Data via Retrieval-Augmented VLMs
Recent work on robot manipulation has advanced policy generalization to novel scenarios. However, it is often difficult to characterize how different evaluation settings actually represent generali...
Jensen Gao, Dorsa Sadigh, Sandy Huang, Dhruv Shah
Zero-Shot Cross-City Generalization in End-to-End Autonomous Driving: Self-Supervised versus Supervised Representations
End-to-end autonomous driving models are typically trained on multi-city datasets using supervised ImageNet-pretrained backbones, yet their ability to generalize to unseen cities remains largely un...
Fatemeh Naeinian, Ali Hamza, Haoran Zhu, Anna Choromanska
Evaluation format, not model capability, drives triage failure in the assessment of consumer health AI
Ramaswamy et al. reported in Nature Medicine that ChatGPT Health under-triages 51.6% of emergencies, concluding that consumer-facing AI triage poses safety risks. However, their evaluatio...
David Fraile Navarro, Farah Magrabi, Enrico Coiera
Reproducible Synthetic Clinical Letters for Seizure Frequency Information Extraction
Seizure-frequency information is important for epilepsy research and clinical care, but it is usually recorded in variable free-text clinic letters that are hard to annotate and share. We developed...
Yujian Gan, Stephen H. Barlow, Ben Holgate, Joe Davies, James T. Teo, Joel S. Winston, Mark P. Ri...
Vision-Based Hand Shadowing for Robotic Manipulation via Inverse Kinematics
Teleoperation of low-cost robotic manipulators remains challenging due to the complexity of mapping human hand articulations to robot joint commands. We present an offline hand-shadowing and retarg...
Hendrik Chiche, Antoine Jamme, Trevor Rigoberto Martinez
Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol
Autonomous agents, especially delegated systems with memory, persistent context, and multi-step planning, pose a measurement problem not present in stateless models: an agent that preserves continu...
Christopher Altman