Papers
Research papers from arXiv and related sources
SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards
Large language models (LLMs) are increasingly deployed as multi-step decision-making agents, where effective reward design is essential for guiding learning. Although recent work explores various f...
Dengjia Zhang, Xiaoou Liu, Lu Cheng, Yaqing Wang, Kenton Murray, Hua Wei
Neural network optimization strategies and the topography of the loss landscape
Neural networks are trained by optimizing multi-dimensional sets of fitting parameters on non-convex loss landscapes. Low-loss regions of the landscapes correspond to the parameter sets that perfor...
Jianneng Yu, Alexandre V. Morozov
RAMSES-MCR: A consistent multi-group treatment of cosmic rays physics in momentum-space with the RAMSES code
Cosmic rays (CRs) are known to play a key role in many astrophysical environments: they can modify shock dynamics, influence the thermochemistry and the ionization of the interstellar medium, regul...
Nimatou-Seydi Diallo, Yohan Dubois, Alexandre Marcowith, Joki Rosdahl, Benoît Commerçon
Scaling State-Space Models on Multiple GPUs with Tensor Parallelism
Selective state space models (SSMs) have rapidly become a compelling backbone for large language models, especially for long-context workloads. Yet in deployment, their inference performance is oft...
Anurag Dutt, Nimit Shah, Hazem Masarani, Anshul Gandhi
A Benchmark for Deep Information Synthesis
Large language model (LLM)-based agents are increasingly used to solve complex tasks involving tool use, such as web browsing, code execution, and data analysis. However, current evaluation benchma...
Debjit Paul, Daniel Murphy, Milan Gritta, Ronald Cardenas, Victor Prokhorov, Lena Sophia Bolliger...
ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments
As LLM deployments scale over more hardware, the probability of a single failure in a system increases significantly, and cloud operators must consider robust countermeasures to handle these inevit...
Haley Li, Xinglu Wang, Cong Feng, Chunxu Zuo, Yanan Wang, Hei Lo, Yufei Cui, Bingji Wang, Duo Cui...
SparkMe: Adaptive Semi-Structured Interviewing for Qualitative Insight Discovery
Qualitative insights from user experiences are critical for informing product and policy decisions, but collecting such data at scale is constrained by the time and availability of experts to condu...
David Anugraha, Vishakh Padmakumar, Diyi Yang
"Are You Sure?": An Empirical Study of Human Perception Vulnerability in LLM-Driven Agentic Systems
Large language model (LLM) agents are rapidly becoming trusted copilots in high-stakes domains like software development and healthcare. However, this deepening trust introduces a novel attack surf...
Xinfeng Li, Shenyu Dai, Kelong Zheng, Yue Xiao, Gelei Deng, Wei Dong, Xiaofeng Wang
Quantum Approximate Optimization for Decoding of Low-Density Parity-Check Codes
Decoding Low-Density Parity-Check (LDPC) codes is a fundamental problem in coding theory, and Belief Propagation (BP) is one of the most popular methods for LDPC code decoding. However, BP may enco...
Krishnakanta Barik, Goutam Paul
Scalar Lie point symmetries of the Standard Model with one or two real gauge singlets
We present a classification of all scalar Lie point symmetries of the Standard Model with one or two real gauge-singlet scalars (SM+S and SM+2S). By analyzing the associated field equations, we ide...
M. Aa. Solberg
Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning
Advanced reasoning typically requires Chain-of-Thought prompting, which is accurate but incurs prohibitive latency and substantial test-time inference costs. The standard alternative, fine-tuning s...
Sanket Badhe, Deep Shah
Turning Semantics into Topology: LLM-Driven Attribute Augmentation for Collaborative Filtering
Large Language Models (LLMs) have shown great potential for enhancing recommender systems through their extensive world knowledge and reasoning capabilities. However, effectively translating these ...
Junjie Meng, Ranxu Zhang, Wei Wu, Rui Zhang, Chuan Qin, Qi Zhang, Qi Liu, Hui Xiong, Chao Wang
Rapid Primary Radiation Damage Resistance Assessment of Precipitation-Hardened Cu Alloys
This study establishes a direct correlation between in situ irradiation-induced property changes measured by transient grating spectroscopy (TGS) and the resulting microstructural damage in Cu-Cr-T...
Elena Botica-Artalejo, Gregory Wallace, Michael P. Short
Can Interest-Bearing Positions Solve the Long-Horizon Problem in Prediction Markets?
Prediction markets suffer from reduced liquidity and price accuracy for long-horizon events due to the opportunity cost of committed capital. Recently, major platforms have introduced interest-bear...
Caleb Maresca
Beyond the Star Rating: A Scalable Framework for Aspect-Based Sentiment Analysis Using LLMs and Text Classification
Customer-provided reviews have become an important source of information for business owners and other customers alike. However, effectively analyzing millions of unstructured reviews remains chall...
Vishal Patil, Shree Vaishnavi Bacha, Revanth Yamani, Yidan Sun, Mayank Kejriwal
Elementary local representation densities at all primes via lifting recursions
Let $p$ be a prime and let $L$ be a quadratic $\mathbb{Z}_p$-lattice with quadratic form $Q$. For $t\neq 0$ the local representation density $\alpha_p(t;L)$ is the stable normalised growth of the congru...
Samuel Griffiths
Detecting Where Effects Occur by Testing Hypotheses in Order
Experimental evaluations of public policies often randomize a new intervention within many sites or blocks. After a report of an overall result -- statistically significant or not -- the natural qu...
Jake Bowers, David Kim, Nuole Chen
The no-hair theorems at work in the tidal disruption event AT2020afhd
Recently, the coprecession of both the accretion disk and the jet formed following the tidal disruption event associated with the optical transient AT2020afhd, driven by a supermassive black hole o...
Lorenzo Iorio
Tool Building as a Path to "Superintelligence"
The Diligent Learner framework suggests LLMs can achieve superintelligence via test-time search, provided a sufficient step-success probability $\gamma$. In this work, we design a benchmark to measure $...
David Koplow, Tomer Galanti, Tomaso Poggio
An Expert Schema for Evaluating Large Language Model Errors in Scholarly Question-Answering Systems
Large Language Models (LLMs) are transforming scholarly tasks like search and summarization, but their reliability remains uncertain. Current evaluation metrics for testing LLM reliability are prim...
Anna Martin-Boyle, William Humphreys, Martha Brown, Cara Leckey, Harmanpreet Kaur