Research

Papers

Research papers from arXiv and related sources

Total: 4513, AI/LLM: 2483, Testing: 2030
AI LLM

Efficient and Interpretable Multi-Agent LLM Routing via Ant Colony Optimization

Large Language Model (LLM)-driven Multi-Agent Systems (MAS) have demonstrated strong capability in complex reasoning and tool use, and heterogeneous agent pools further broaden the quality–cost tr...

Xudong Wang, Chaoning Zhang, Jiaquan Zhang, Chenghao Li, Qigan Sun, Sung-Ho Bae, Peng Wang, Ning ...

2603.12933 2026-03-13
AI LLM

DS$^2$-Instruct: Domain-Specific Data Synthesis for Large Language Models Instruction Tuning

Adapting Large Language Models (LLMs) to specialized domains requires high-quality instruction tuning datasets, which are expensive to create through human annotation. Existing data synthesis metho...

Ruiyao Xu, Noelle I. Samia, Han Liu

2603.12932 2026-03-13
AI LLM

Teaching Agile Requirements Engineering: A Stakeholder Simulation with Generative AI

Context: The active involvement of users and customers in agile software development remains a persistent challenge in practice. For this reason, it is important that students in higher education b...

Eva-Maria Schön, Michael Neumann, Tiago Silva da Silva

2603.12925 2026-03-13
AI LLM

Human-Centered Evaluation of an LLM-Based Process Modeling Copilot: A Mixed-Methods Study with Domain Experts

Integrating Large Language Models (LLMs) into business process management tools promises to democratize Business Process Model and Notation (BPMN) modeling for non-experts. While automated framewor...

Chantale Lauer, Peter Pfeiffer, Nijat Mehdiyev

2603.12895 2026-03-13
AI LLM

Enhanced Drug-drug Interaction Prediction Using Adaptive Knowledge Integration

Drug-drug interaction event (DDIE) prediction is crucial for preventing adverse reactions and ensuring optimal therapeutic outcomes. However, existing methods often face challenges with imbalanced ...

Pengfei Liu, Jun Tao, Zhixiang Ren

2603.12885 2026-03-13
AI LLM

Explainable AI Using Inherently Interpretable Components for Wearable-based Health Monitoring

The use of wearables in medicine and wellness, enabled by AI-based models, offers tremendous potential for real-time monitoring and interpretable event detection. Explainable AI (XAI) is required t...

Maurice Kuschel, Solveig Vieluf, Claus Reinsberger, Tobias Loddenkemper, Tanuj Hasija

2603.12880 2026-03-13
AI LLM

Test-time RL alignment exposes task familiarity artifacts in LLM benchmarks

Direct evaluation of LLMs on benchmarks can be misleading because comparatively strong performance may reflect task familiarity rather than capability. The train-before-test approach controls for t...

Kun Wang, Reinhard Heckel

2603.12875 2026-03-13
AI LLM

CLARIN-PT-LDB: An Open LLM Leaderboard for Portuguese to assess Language, Culture and Civility

This paper reports on the development of a leaderboard of Open Large Language Models (LLM) for European Portuguese (PT-PT), and on its associated benchmarks. This leaderboard comes as a way to addr...

João Silva, Luís Gomes, António Branco

2603.12872 2026-03-13
AI LLM

Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking

Nowadays, service providers often deploy multiple types of LLM services within shared clusters. While the service colocation improves resource utilization, it introduces significant interference ri...

Zizhao Mo, Junlin Chen, Huanle Xu, Chengzhong Xu

2603.12831 2026-03-13
AI LLM

From AI Weather Prediction to Infrastructure Resilience: A Correction-Downscaling Framework for Tropical Cyclone Impacts

This paper addresses a missing capability in infrastructure resilience: turning fast, global AI weather forecasts into asset-scale, actionable risk. We introduce the AI-based Correction-Downscaling...

You Wu, Zhenguo Wang, Naiyu Wang

2603.12828 2026-03-13
AI LLM

Context is all you need: Towards autonomous model-based process design using agentic AI in flowsheet simulations

Agentic AI systems integrating large language models (LLMs) with reasoning and tool-use capabilities are transforming various domains, in particular software development. In contrast, their applic...

Pascal Schäfer, Lukas J. Krinke, Martin Wlotzka, Norbert Asprion

2603.12813 2026-03-13
AI LLM

Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation

A recent cutting-edge topic in multimodal modeling is to unify visual comprehension and generation within a single model. However, the two tasks demand mismatched decoding regimes and visual repres...

Yichen Zhang, Da Peng, Zonghao Guo, Zijian Zhang, Xuesong Yang, Tong Sun, Shichu Sun, Yidan Zhang...

2603.12793 2026-03-13
AI LLM

The RIGID Framework: Research-Integrated, Generative AI-Mediated Instructional Design

Instructional Design (ID) often faces challenges in incorporating research-based knowledge and pedagogical best practices. Although educational researchers and government agencies emphasize groundi...

Yerin Kwak, Zachary A. Pardos

2603.12781 2026-03-13
AI LLM

SectEval: Evaluating the Latent Sectarian Preferences of Large Language Models

As Large Language Models (LLMs) become a popular source for religious knowledge, it is important to know if they treat different groups fairly. This study is the first to measure how LLMs handle th...

Aditya Maheshwari, Amit Gajkeshwar, Kaushal Sharma, Vivek Patel

2603.12768 2026-03-13
AI LLM

AI Model Modulation with Logits Redistribution

Large-scale models are typically adapted to meet the diverse requirements of model owners and users. However, maintaining multiple specialized versions of the model is inefficient. In response, we ...

Zihan Wang, Zhongkui Ma, Xinguo Feng, Zhiyang Mei, Ethan Ma, Derui Wang, Minhui Xue, Guangdong Bai

2603.12755 2026-03-13
AI LLM

Taming the Long Tail: Efficient Item-wise Sharpness-Aware Minimization for LLM-based Recommender Systems

Large Language Model-based Recommender Systems (LRSs) have recently emerged as a new paradigm in sequential recommendation by directly adopting LLMs as backbones. While LRSs demonstrate strong know...

Jiaming Zhang, Yuyuan Li, Xiaohua Feng, Li Zhang, Longfei Li, Jun Zhou, Chaochao Chen

2603.12752 2026-03-13
AI LLM

TaoBench: Do Automated Theorem Prover LLMs Generalize Beyond MathLib?

Automated theorem proving (ATP) benchmarks largely consist of problems formalized in MathLib, so current ATP training and evaluation are heavily biased toward MathLib's definitional framework. Howe...

Alexander K Taylor, Junyi Zhang, Ethan Ji, Vigyan Sahai, Haikang Deng, Yuanzhou Chen, Yifan Yuan,...

2603.12744 2026-03-13
AI LLM

What You Prompt is What You Get: Increasing Transparency of Prompting Using Prompt Cards

The rapid advancement and impressive capabilities of large language models (LLMs) have given rise to the field of prompt engineering, the practice of crafting inputs to guide LLMs toward high-quali...

Amandine M. Caut, Beimnet Zenebe, Amy Rouillard, David J. T. Sumpter

2603.12741 2026-03-13
AI LLM

ToolTree: Efficient LLM Agent Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning

Large Language Model (LLM) agents are increasingly applied to complex, multi-step tasks that require interaction with diverse external tools across various domains. However, current LLM agent tool ...

Shuo Yang, Soyeon Caren Han, Yihao Ding, Shuhe Wang, Eduard Hoy

2603.12740 2026-03-13
AI LLM

Altered Thoughts, Altered Actions: Probing Chain-of-Thought Vulnerabilities in VLA Robotic Manipulation

Recent Vision-Language-Action (VLA) models increasingly adopt chain-of-thought (CoT) reasoning, generating a natural-language plan before decoding motor commands. This internal text channel between...

Tuan Duong Trinh, Naveed Akhtar, Basim Azam

2603.12717 2026-03-13