Research

Papers

Research papers from arXiv and related sources

Total: 4694 | AI/LLM: 2583 | Testing: 2111
AI LLM

A Blockchain-based Traceability System for AI-Driven Engine Blade Inspection

Aircraft engine blade maintenance relies on inspection records shared across manufacturers, airlines, maintenance organizations, and regulators. Yet current systems are fragmented, difficult to aud...

Mahmoud Hafez, Eman Ouda, Mohammed A. Mohammed Eltoum, Khaled Salah, Yusra Abdulrahman

2603.08288 2026-03-09
AI LLM

LAMUS: A Large-Scale Corpus for Legal Argument Mining from U.S. Caselaw using LLMs

Legal argument mining aims to identify and classify the functional components of judicial reasoning, such as facts, issues, rules, analysis, and conclusions. Progress in this area is limited by the...

Serene Wang, Lavanya Pobbathi, Haihua Chen

2603.08286 2026-03-09
AI LLM

Evaluating LLM-Based Grant Proposal Review via Structured Perturbations

As AI-assisted grant proposals outpace manual review capacity in a kind of "Malthusian trap" for the research ecosystem, this paper investigates the capabilities and limitations of LLM-based gran...

William Thorne, Joseph James, Yang Wang, Chenghua Lin, Diana Maynard

2603.08281 2026-03-09
AI LLM

AdaCultureSafe: Adaptive Cultural Safety Grounded by Cultural Knowledge in Large Language Models

With the widespread adoption of Large Language Models (LLMs), respecting indigenous cultures becomes essential for models' cultural safety and for responsible global applications. Existing studies se...

Hankun Kang, Di Lin, Zhirong Liao, Pengfei Bai, Xinyi Zeng, Jiawei Jiang, Yuanyuan Zhu, Tieyun Qian

2603.08275 2026-03-09
AI LLM

How Much Do LLMs Hallucinate in Document Q&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms

How much do large language models actually hallucinate when answering questions grounded in provided documents? Despite the critical importance of this question for enterprise AI deployments, relia...

JV Roig

2603.08274 2026-03-09
AI LLM

Towards a more efficient bias detection in financial language models

Bias in financial language models constitutes a major obstacle to their adoption in real-world applications. Detecting such bias is challenging, as it requires identifying inputs whose predictions ...

Firas Hadj Kacem, Ahmed Khanfir, Mike Papadakis

2603.08267 2026-03-09
AI LLM

FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use

The integration of Large Language Models (LLMs) into the financial domain is driving a paradigm shift from passive information retrieval to dynamic, agentic interaction. While general-purpose tool ...

Jiaxuan Lu, Kong Wang, Yemin Wang, Qingmei Tang, Hongwei Zeng, Xiang Chen, Jiahao Pi, Shujian Den...

2603.08262 2026-03-09
AI LLM

Seed2Scale: A Self-Evolving Data Engine for Embodied AI via Small to Large Model Synergy and Multimodal Evaluation

Existing data generation methods suffer from exploration limits, embodiment gaps, and low signal-to-noise ratios, leading to performance degradation during self-iteration. To address these challeng...

Cong Tai, Zhaoyu Zheng, Haixu Long, Hansheng Wu, Zhengbin Long, Haodong Xiang, Rong Shi, Zhuo Cui...

2603.08260 2026-03-09
AI LLM

NCL-UoR at SemEval-2026 Task 5: Embedding-Based Methods, Fine-Tuning, and LLMs for Word Sense Plausibility Rating

Word sense plausibility rating requires predicting the human-perceived plausibility of a given word sense on a 1–5 scale in the context of short narrative stories containing ambiguous homonyms. Th...

Tong Wu, Thanet Markchom, Huizhi Liang

2603.08256 2026-03-09
AI LLM

Not All Queries Need Deep Thought: CoFiCot for Adaptive Coarse-to-fine Stateful Refinement

Scaling test-time computation enhances LLM reasoning ability but faces a uniform computation paradox. Allocating identical resources leads to over-correction on simple tasks and insufficient refine...

Dongxu Zhang, Hongqiang Lin, Yiding Sun, Pengyu Wang, Qirui Wang, Ning Yang, Jihua Zhu

2603.08251 2026-03-09
AI LLM

Sensitivity of LLMs' Explanations to the Training Randomness: Context, Class & Task Dependencies

Transformer models are now a cornerstone in natural language processing. Yet, explaining their decisions remains a challenge. It was shown recently that the same model trained on the same data with...

Romain Loncour, Jérémie Bogaert, François-Xavier Standaert

2603.08241 2026-03-09
AI LLM

Fibration Policy Optimization

Large language models are increasingly trained as heterogeneous systems spanning multiple domains, expert partitions, and agentic pipelines, yet prevalent proximal objectives operate at a single sc...

Chang Li, Tshihao Tsu, Yaren Zhang, Chao Xue, Xiaodong He

2603.08239 2026-03-09
AI LLM

The Struggle Between Continuation and Refusal: A Mechanistic Analysis of the Continuation-Triggered Jailbreak in LLMs

With the rapid advancement of large language models (LLMs), the safety of LLMs has become a critical concern. Despite significant efforts in safety alignment, current LLMs remain vulnerable to jail...

Yonghong Deng, Zhen Yang, Ping Jian, Xinyue Zhang, Zhongbin Guo, Chengzhi Li

2603.08234 2026-03-09
AI LLM

Computationally Efficient Data-Driven Topology Design Independent from High-Infoentropy Initial Dataset

Topology optimization (TO) has been widely adopted in engineering design; however, it is prone to being trapped in local optima, particularly in strongly nonlinear problems. Sensitivity-free data-d...

Jun Yang, Ziliang Wang, Shintaro Yamasaki

2603.08233 2026-03-09
AI LLM

SplitAgent: A Privacy-Preserving Distributed Architecture for Enterprise-Cloud Agent Collaboration

Enterprise adoption of cloud-based AI agents faces a fundamental privacy dilemma: leveraging powerful cloud models requires sharing sensitive data, while local processing limits capability. Current...

Jianshu She

2603.08221 2026-03-09
AI LLM

DualTurn: Learning Turn-Taking from Dual-Channel Generative Speech Pretraining

Speech-to-speech models handle turn-taking naturally but offer limited support for tool-calling or complex reasoning, while production ASR-LLM-TTS voice pipelines offer these capabilities but rely ...

Shangeth Rajaa

2603.08216 2026-03-09
AI LLM

Human-AI Collaboration for Scaling Agile Regression Testing: An Agentic-AI Teammate from Manual to Automated Testing

Agile organizations increasingly rely on automated regression testing to sustain rapid, high-quality software delivery. However, as systems grow and requirements evolve, a persistent bottleneck ari...

Moustapha El Outmani, Manthan Venkataramana Shenoy, Ahmad Hatahet, Andreas Rausch, Tim Niklas Kni...

2603.08190 2026-03-09
AI LLM

SERQ: Saliency-Aware Low-Rank Error Reconstruction for LLM Quantization

Post-training quantization (PTQ) has emerged as a prevailing technique for deploying large language models (LLMs) efficiently in terms of both memory and computation, across edge devices and server...

Yeonsik Park, Hyeonseong Kim, Seungkyu Choi

2603.08185 2026-03-09
AI LLM

TildeOpen LLM: Leveraging Curriculum Learning to Achieve Equitable Language Representation

Large language models often underperform in many European languages due to the dominance of English and a few high-resource languages in training data. This paper presents TildeOpen LLM, a 30-billi...

Toms Bergmanis, Martins Kronis, Ingus Jānis Pretkalniņš, Dāvis Nicmanis, Jeļizaveta Jeļinska, Rob...

2603.08182 2026-03-09
AI LLM

AutoAdapt: An Automated Domain Adaptation Framework for LLMs

Large language models (LLMs) excel in open domains but struggle in specialized settings with limited data and evolving knowledge. Existing domain adaptation practices rely heavily on manual trial-a...

Sidharth Sinha, Anson Bastos, Xuchao Zhang, Akshay Nambi, Chetan Bansal, Saravan Rajmohan

2603.08181 2026-03-09