Research

Papers

Research papers from arXiv and related sources

Total: 4513 · AI/LLM: 2483 · Testing: 2030
AI LLM

Momentum Measurement of Charged Particles in FASER's Emulsion Detector at the LHC

We present a momentum measurement method based on multiple Coulomb scattering (MCS) in the FASERν emulsion detector. The measurement of charged-particle momenta is essential for studying neutrino...

FASER Collaboration, Roshan Mammen Abraham, Xiaocong Ai, Saul Alonso Monsalve, John Anders, Emma...

2602.17575 2026-02-19
AI LLM

A Hybrid Federated Learning Based Ensemble Approach for Lung Disease Diagnosis Leveraging Fusion of SWIN Transformer and CNN

The significant advancements in computational power create a vast opportunity for using Artificial Intelligence in different applications of healthcare and medical science. A Hybrid FL-Enabled ...

Asif Hasan Chowdhury, Md. Fahim Islam, M Ragib Anjum Riad, Faiyaz Bin Hashem, Md Tanzim Reza, Md....

2602.17566 2026-02-19
AI LLM

ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment

Activation steering, or representation engineering, offers a lightweight approach to align large language models (LLMs) by manipulating their internal activations at inference time. However, curren...

Hongjue Zhao, Haosen Sun, Jiangtao Kong, Xiaochang Li, Qineng Wang, Liwei Jiang, Qi Zhu, Tarek Ab...

2602.17560 2026-02-19
AI LLM

A Theoretical Framework for Modular Learning of Robust Generative Models

Training large-scale generative models is resource-intensive and relies heavily on heuristic dataset weighting. We address two fundamental questions: Can we train Large Language Models (LLMs) modul...

Corinna Cortes, Mehryar Mohri, Yutao Zhong

2602.17554 2026-02-19
AI LLM

MASPO: Unifying Gradient Utilization, Probability Mass, and Signal Reliability for Robust and Sample-Efficient LLM Reasoning

Existing Reinforcement Learning with Verifiable Rewards (RLVR) algorithms, such as GRPO, rely on rigid, uniform, and symmetric trust region mechanisms that are fundamentally misaligned with the com...

Xiaoliang Fu, Jiaye Lin, Yangyi Fang, Binbin Zheng, Chaowen Hu, Zekai Shao, Cong Qin, Lu Pan, Ke ...

2602.17550 2026-02-19
AI LLM

KLong: Training LLM Agent for Extremely Long-horizon Tasks

This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start the model via trajectory-splitting SFT, then scale it via p...

Yue Liu, Zhiyuan Hu, Flood Sung, Jiaheng Zhang, Bryan Hooi

2602.17547 2026-02-19
AI LLM

Evaluating Chain-of-Thought Reasoning through Reusability and Verifiability

In multi-agent IR pipelines for tasks such as search and ranking, LLM-based agents exchange intermediate reasoning in terms of Chain-of-Thought (CoT) with each other. Current CoT evaluation narrowl...

Shashank Aggarwal, Ram Vikas Mishra, Amit Awekar

2602.17544 2026-02-19
AI LLM

Using LLMs for Knowledge Component-level Correctness Labeling in Open-ended Coding Problems

Fine-grained skill representations, commonly referred to as knowledge components (KCs), are fundamental to many approaches in student modeling and learning analytics. However, KC-level correctness ...

Zhangqi Duan, Arnav Kankaria, Dhruv Kartik, Andrew Lan

2602.17542 2026-02-19
AI LLM

Toward a Fully Autonomous, AI-Native Particle Accelerator

This position paper presents a vision for self-driving particle accelerators that operate autonomously with minimal human intervention. We propose that future facilities be designed through artific...

Chris Tennant

2602.17536 2026-02-19
TESTING

LATA: Laplacian-Assisted Transductive Adaptation for Conformal Uncertainty in Medical VLMs

Medical vision-language models (VLMs) are strong zero-shot recognizers for medical imaging, but their reliability under domain shift hinges on calibrated uncertainty with guarantees. Split conforma...

Behzad Bozorgtabar, Dwarikanath Mahapatra, Sudipta Roy, Muzammal Naseer, Imran Razzak, Zongyuan Ge

2602.17535 2026-02-19
TESTING

Systematic Evaluation of Single-Cell Foundation Model Interpretability Reveals Attention Captures Co-Expression Rather Than Unique Regulatory Signal

We present a systematic evaluation framework (thirty-seven analyses, 153 statistical tests, four cell types, two perturbation modalities) for assessing mechanistic interpretability in single-cell...

Ihor Kendiukhov

2602.17532 2026-02-19
TESTING

Provably Explaining Neural Additive Models

Despite significant progress in post-hoc explanation methods for neural networks, many remain heuristic and lack provable guarantees. A key approach for obtaining explanations with provable guarant...

Shahaf Bassan, Yizhak Yisrael Elboher, Tobias Ladner, Volkan Şahin, Jan Kretinsky, Matthias Altho...

2602.17530 2026-02-19
AI LLM

Enhancing Large Language Models (LLMs) for Telecom using Dynamic Knowledge Graphs and Explainable Retrieval-Augmented Generation

Large language models (LLMs) have shown strong potential across a variety of tasks, but their application in the telecom field remains challenging due to domain complexity, evolving standards, and ...

Dun Yuan, Hao Zhou, Xue Liu, Hao Chen, Yan Xin, Jianzhong Zhang

2602.17529 2026-02-19
TESTING

The Anxiety of Influence: Bloom Filters in Transformer Attention Heads

Some transformer attention heads appear to function as membership testers, dedicating themselves to answering the question "has this token appeared before in the context?" We identify these heads a...

Peter Balogh

2602.17526 2026-02-19
TESTING

Inspiral tests of general relativity and waveform geometry

The phase evolution of gravitational waves encodes critical information about the orbital dynamics of binary systems. In this work, we test the robustness of parameterized tests against unmodeled d...

Brian C. Seymour, Jacob Golomb, Yanbei Chen

2602.17524 2026-02-19
AI LLM

When Models Ignore Definitions: Measuring Semantic Override Hallucinations in LLM Reasoning

Large language models (LLMs) demonstrate strong performance on standard digital logic and Boolean reasoning tasks, yet their reliability under locally redefined semantics remains poorly understood....

Yogeswar Reddy Thota, Setareh Rafatirad, Houman Homayoun, Tooraj Nikoubin

2602.17520 2026-02-19
TESTING

Dodging the Moose: Experimental Insights in Real-Life Automated Collision Avoidance

The sudden appearance of a static obstacle on the road, i.e. the moose test, is a well-known emergency scenario in collision avoidance for automated driving. Model Predictive Control (MPC) has long...

Leila Gharavi, Simone Baldi, Yuki Hosomi, Tona Sato, Bart De Schutter, Binh-Minh Nguyen, Hiroshi ...

2602.17512 2026-02-19
AI LLM

Pareto Optimal Benchmarking of AI Models on ARM Cortex Processors for Sustainable Embedded Systems

This work presents a practical benchmarking framework for optimizing artificial intelligence (AI) models on ARM Cortex processors (M0+, M4, M7), focusing on energy efficiency, accuracy, and resourc...

Pranay Jain, Maximilian Kasper, Göran Köber, Axel Plinge, Dominik Seuß

2602.17508 2026-02-19
TESTING

Proximal powered knee placement: a case study

Lower limb amputation affects millions worldwide, leading to impaired mobility, reduced walking speed, and limited participation in daily and social activities. Powered prosthetic knees can partial...

Kyle R. Embry, Lorenzo Vianello, Jim Lipsey, Frank Ursetta, Michael Stephens, Zhi Wang, Ann M. Si...

2602.17502 2026-02-19
AI LLM

Retrospective In-Context Learning for Temporal Credit Assignment with Large Language Models

Learning from self-sampled data and sparse environmental feedback remains a fundamental challenge in training self-evolving agents. Temporal credit assignment mitigates this issue by transforming s...

Wen-Tse Chen, Jiayu Chen, Fahim Tajwar, Hao Zhu, Xintong Duan, Ruslan Salakhutdinov, Jeff Schneider

2602.17497 2026-02-19