Papers

Research papers from arXiv and related sources

Total: 4513 (AI/LLM: 2483, Testing: 2030)
AI LLM

SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards

Large language models (LLMs) are increasingly deployed as multi-step decision-making agents, where effective reward design is essential for guiding learning. Although recent work explores various f...

Dengjia Zhang, Xiaoou Liu, Lu Cheng, Yaqing Wang, Kenton Murray, Hua Wei

2602.21158 2026-02-24
TESTING

Neural network optimization strategies and the topography of the loss landscape

Neural networks are trained by optimizing multi-dimensional sets of fitting parameters on non-convex loss landscapes. Low-loss regions of the landscapes correspond to the parameter sets that perfor...

Jianneng Yu, Alexandre V. Morozov

2602.21276 2026-02-24
TESTING

RAMSES-MCR: A consistent multi-group treatment of cosmic rays physics in momentum-space with the RAMSES code

Cosmic rays (CRs) are known to play a key role in many astrophysical environments: they can modify shock dynamics, influence the thermochemistry and the ionization of the interstellar medium, regul...

Nimatou-Seydi Diallo, Yohan Dubois, Alexandre Marcowith, Joki Rosdahl, Benoît Commerçon

2602.21147 2026-02-24
AI LLM

Scaling State-Space Models on Multiple GPUs with Tensor Parallelism

Selective state space models (SSMs) have rapidly become a compelling backbone for large language models, especially for long-context workloads. Yet in deployment, their inference performance is oft...

Anurag Dutt, Nimit Shah, Hazem Masarani, Anshul Gandhi

2602.21144 2026-02-24
AI LLM

A Benchmark for Deep Information Synthesis

Large language model (LLM)-based agents are increasingly used to solve complex tasks involving tool use, such as web browsing, code execution, and data analysis. However, current evaluation benchma...

Debjit Paul, Daniel Murphy, Milan Gritta, Ronald Cardenas, Victor Prokhorov, Lena Sophia Bolliger...

2602.21143 2026-02-24
AI LLM

ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments

As LLM deployments scale over more hardware, the probability of a single failure in a system increases significantly, and cloud operators must consider robust countermeasures to handle these inevit...

Haley Li, Xinglu Wang, Cong Feng, Chunxu Zuo, Yanan Wang, Hei Lo, Yufei Cui, Bingji Wang, Duo Cui...

2602.21140 2026-02-24
AI LLM

SparkMe: Adaptive Semi-Structured Interviewing for Qualitative Insight Discovery

Qualitative insights from user experiences are critical for informing product and policy decisions, but collecting such data at scale is constrained by the time and availability of experts to condu...

David Anugraha, Vishakh Padmakumar, Diyi Yang

2602.21136 2026-02-24
AI LLM

"Are You Sure?": An Empirical Study of Human Perception Vulnerability in LLM-Driven Agentic Systems

Large language model (LLM) agents are rapidly becoming trusted copilots in high-stakes domains like software development and healthcare. However, this deepening trust introduces a novel attack surf...

Xinfeng Li, Shenyu Dai, Kelong Zheng, Yue Xiao, Gelei Deng, Wei Dong, Xiaofeng Wang

2602.21127 2026-02-24
TESTING

Quantum Approximate Optimization for Decoding of Low-Density Parity-Check Codes

Decoding Low-Density Parity-Check (LDPC) codes is a fundamental problem in coding theory, and Belief Propagation (BP) is one of the most popular methods for LDPC code decoding. However, BP may enco...

Krishnakanta Barik, Goutam Paul

2602.21124 2026-02-24
TESTING

Scalar Lie point symmetries of the Standard Model with one or two real gauge singlets

We present a classification of all scalar Lie point symmetries of the Standard Model with one or two real gauge-singlet scalars (SM+S and SM+2S). By analyzing the associated field equations, we ide...

M. Aa. Solberg

2602.21122 2026-02-24
AI LLM

Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning s...

Sanket Badhe, Deep Shah

2602.21103 2026-02-24
AI LLM

Turning Semantics into Topology: LLM-Driven Attribute Augmentation for Collaborative Filtering

Large Language Models (LLMs) have shown great potential for enhancing recommender systems through their extensive world knowledge and reasoning capabilities. However, effectively translating these ...

Junjie Meng, Ranxu Zhang, Wei Wu, Rui Zhang, Chuan Qin, Qi Zhang, Qi Liu, Hui Xiong, Chao Wang

2602.21099 2026-02-24
TESTING

Rapid Primary Radiation Damage Resistance Assessment of Precipitation-Hardened Cu Alloys

This study establishes a direct correlation between in situ irradiation-induced property changes measured by transient grating spectroscopy (TGS) and the resulting microstructural damage in Cu-Cr-T...

Elena Botica-Artalejo, Gregory Wallace, Michael P. Short

2602.21093 2026-02-24
AI LLM

Can Interest-Bearing Positions Solve the Long-Horizon Problem in Prediction Markets?

Prediction markets suffer from reduced liquidity and price accuracy for long-horizon events due to the opportunity cost of committed capital. Recently, major platforms have introduced interest-bear...

Caleb Maresca

2602.21091 2026-02-24
AI LLM

Beyond the Star Rating: A Scalable Framework for Aspect-Based Sentiment Analysis Using LLMs and Text Classification

Customer-provided reviews have become an important source of information for business owners and other customers alike. However, effectively analyzing millions of unstructured reviews remains chall...

Vishal Patil, Shree Vaishnavi Bacha, Revanth Yamani, Yidan Sun, Mayank Kejriwal

2602.21082 2026-02-24
TESTING

Elementary local representation densities at all primes via lifting recursions

Let $p$ be a prime and let $L$ be a quadratic $\mathbb{Z}_p$-lattice with quadratic form $Q$. For $t\neq 0$ the local representation density $\alpha_p(t;L)$ is the stable normalised growth of the congru...

Samuel Griffiths

2602.21070 2026-02-24
TESTING

Detecting Where Effects Occur by Testing Hypotheses in Order

Experimental evaluations of public policies often randomize a new intervention within many sites or blocks. After a report of an overall result -- statistically significant or not -- the natural qu...

Jake Bowers, David Kim, Nuole Chen

2602.21068 2026-02-24
TESTING

The no-hair theorems at work in the tidal disruption event AT2020afhd

Recently, the coprecession of both the accretion disk and the jet formed following the tidal disruption event associated with the optical transient AT2020afhd, driven by a supermassive black hole o...

Lorenzo Iorio

2602.21065 2026-02-24
AI LLM

Tool Building as a Path to "Superintelligence"

The Diligent Learner framework suggests LLMs can achieve superintelligence via test-time search, provided a sufficient step-success probability $\gamma$. In this work, we design a benchmark to measure $...

David Koplow, Tomer Galanti, Tomaso Poggio

2602.21061 2026-02-24
AI LLM

An Expert Schema for Evaluating Large Language Model Errors in Scholarly Question-Answering Systems

Large Language Models (LLMs) are transforming scholarly tasks like search and summarization, but their reliability remains uncertain. Current evaluation metrics for testing LLM reliability are prim...

Anna Martin-Boyle, William Humphreys, Martha Brown, Cara Leckey, Harmanpreet Kaur

2602.21059 2026-02-24