Research

Papers

Research papers from arXiv and related sources

Total: 4513, AI/LLM: 2483, Testing: 2030
AI LLM

Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning

Chain-of-Thought (CoT) has substantially empowered Large Language Models (LLMs) to tackle complex reasoning tasks, yet the verbose nature of explicit reasoning steps incurs prohibitive inference la...

Qin-Wen Luo, Sheng Ren, Xiang Chen, Rui Liu, Jun Fang, Naiqiang Tan, Sheng-Jun Huang

2602.22642 2026-02-26
AI LLM

MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm for supporting everyday human mobility through natural language interaction and tool-mediated deci...

Zhiheng Song, Jingshuai Zhang, Chuan Qin, Chao Wang, Chao Chen, Longfei Xu, Kaikui Liu, Xiangxian...

2602.22638 2026-02-26
TESTING

Tackling Privacy Heterogeneity in Differentially Private Federated Learning

Differentially private federated learning (DP-FL) enables clients to collaboratively train machine learning models while preserving the privacy of their local data. However, most existing DP-FL app...

Ruichen Xu, Ying-Jun Angela Zhang, Jianwei Huang

2602.22633 2026-02-26
AI LLM

Fine-grained Semantics Integration for Large Language Model-based Recommendation

Recent advances in Large Language Models (LLMs) have shifted recommendation systems from the discriminative paradigm to the LLM-based generative paradigm, where the recommender autoregressively ...

Jiawen Feng, Xiaoyu Kong, Leheng Sheng, Bin Wu, Chao Yi, Feifang Yang, Xiang-Rong Sheng, Han Zhu,...

2602.22632 2026-02-26
TESTING

TorchLean: Formalizing Neural Networks in Lean

Neural networks are increasingly deployed in safety- and mission-critical pipelines, yet many verification and analysis results are produced outside the programming environment that defines and run...

Robert Joseph George, Jennifer Cruden, Xiangru Zhong, Huan Zhang, Anima Anandkumar

2602.22631 2026-02-26
AI LLM

HyperKKL: Enabling Non-Autonomous State Estimation through Dynamic Weight Conditioning

This paper proposes HyperKKL, a novel learning approach for designing Kazantzis-Kravaris/Luenberger (KKL) observers for non-autonomous nonlinear systems. While KKL observers offer a rigorous theore...

Yahia Salaheldin Shaaban, Salem Lahlou, Abdelrahman Sayed Sayed

2602.22630 2026-02-26
TESTING

ContextRL: Enhancing MLLM's Knowledge Discovery Efficiency with Context-Augmented RL

We propose ContextRL, a novel framework that leverages context augmentation to overcome these bottlenecks. Specifically, to enhance Identifiability, we provide the reward model with full reference ...

Xingyu Lu, Jinpeng Wang, YiFan Zhang, Shijie Ma, Xiao Hu, Tianke Zhang, Haonan Fan, Kaiyu Jiang, ...

2602.22623 2026-02-26
AI LLM

Semantic Tube Prediction: Beating LLM Data Efficiency with JEPA

Large Language Models (LLMs) obey consistent scaling laws -- empirical power-law fits that predict how loss decreases with compute, data, and parameters. While predictive, these laws are descriptiv...

Hai Huang, Yann LeCun, Randall Balestriero

2602.22617 2026-02-26
AI LLM

Spectrally Distilled Representations Aligned with Instruction-Augmented LLMs for Satellite Imagery

Vision-language foundation models (VLFMs) promise zero-shot and retrieval understanding for Earth observation. While operational satellite systems often lack full multi-spectral coverage, making RG...

Minh Kha Do, Wei Xiang, Kang Han, Di Wu, Khoa Phan, Yi-Ping Phoebe Chen, Gaowen Liu, Ramana Rao K...

2602.22613 2026-02-26
TESTING

Mitigating Membership Inference in Intermediate Representations via Layer-wise MIA-risk-aware DP-SGD

In Embedding-as-an-Interface (EaaI) settings, pre-trained models are queried for Intermediate Representations (IRs). The distributional properties of IRs can leak training-set membership signals, e...

Jiayang Meng, Tao Huang, Chen Hou, Guolong Zheng, Hong Chen

2602.22611 2026-02-26
TESTING

EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning

Progress in hardware model checking depends critically on high-quality benchmarks. However, the community faces a significant benchmark gap: existing suites are limited in number, often distributed...

Guangyu Hu, Xiaofeng Zhou, Wei Zhang, Hongce Zhang

2602.22609 2026-02-26
AI LLM

CoLyricist: Enhancing Lyric Writing with AI through Workflow-Aligned Support

We propose CoLyricist, an AI-assisted lyric writing tool designed to support the typical workflows of experienced lyricists and enhance their creative efficiency. While lyricists have unique proces...

Masahiro Yoshida, Bingxuan Li, Songyan Zhao, Qinyi Zhou, Shiwei Hu, Xiang Anthony Chen, Nanyun Peng

2602.22606 2026-02-26
AI LLM

SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning

Long-running agentic tasks, such as deep research, require multi-hop reasoning over information distributed across multiple webpages and documents. In such tasks, the LLM context is dominated by to...

Sanjay Kariyappa, G. Edward Suh

2602.22603 2026-02-26
TESTING

Beyond Vintage Rotation: Bias-Free Sparse Representation Learning with Oracle Inference

Learning low-dimensional latent representations is a central topic in statistics and machine learning, and rotation methods have long been used to obtain sparse and interpretable representations. D...

Chengyu Cui, Yunxiao Chen, Jing Ouyang, Gongjun Xu

2602.22590 2026-02-26
TESTING

Towards Faithful Industrial RAG: A Reinforced Co-adaptation Framework for Advertising QA

Industrial advertising question answering (QA) is a high-stakes task in which hallucinated content, particularly fabricated URLs, can lead to financial loss, compliance violations, and legal risk. ...

Wenwei Li, Ming Xu, Tianle Xia, Lingxiang Hu, Yiding Sun, Linfang Shang, Liqun Liu, Peng Shu, Hua...

2602.22584 2026-02-26
TESTING

Strategy Executability in Mathematical Reasoning: Leveraging Human-Model Differences for Effective Guidance

Example-based guidance is widely used to improve mathematical reasoning at inference time, yet its effectiveness is highly unstable across problems and models, even when the guidance is correct and ...

Weida Liang, Yiyou Sun, Shuyuan Nan, Chuang Li, Dawn Song, Kenji Kawaguchi

2602.22583 2026-02-26
TESTING

Metamorphic Testing of Vision-Language Action-Enabled Robots

Vision-Language-Action (VLA) models are multimodal robotic task controllers that, given an instruction and visual inputs, produce a sequence of low-level control actions (or motor commands) enablin...

Pablo Valle, Sergio Segura, Shaukat Ali, Aitor Arrieta

2602.22579 2026-02-26
TESTING

GIFSplat: Generative Prior-Guided Iterative Feed-Forward 3D Gaussian Splatting from Sparse Views

Feed-forward 3D reconstruction offers substantial runtime advantages over per-scene optimization, which remains slow at inference and often fragile under sparse views. However, existing feed-forwar...

Tianyu Chen, Wei Xiang, Kang Han, Yu Lu, Di Wu, Gaowen Liu, Ramana Rao Kompella

2602.22571 2026-02-26
TESTING

Operationalizing Fairness: Post-Hoc Threshold Optimization Under Hard Resource Limits

The deployment of machine learning in high-stakes domains requires a balance between predictive safety and algorithmic fairness. However, existing fairness interventions often assume unconstraine...

Moirangthem Tiken Singh, Amit Kalita, Sapam Jitu Singh

2602.22560 2026-02-26
TESTING

RepoMod-Bench: A Benchmark for Code Repository Modernization via Implementation-Agnostic Testing

The evolution of AI coding agents has shifted the frontier from simple snippet completion to autonomous repository-level engineering. However, evaluating these agents remains ill-posed in general c...

Xuefeng Li, Nir Ben-Israel, Yotam Raz, Belal Ahmed, Doron Serebro, Antoine Raux

2602.22518 2026-02-26