Papers
Research papers from arXiv and related sources
Efficient and Interpretable Multi-Agent LLM Routing via Ant Colony Optimization
Large Language Model (LLM)-driven Multi-Agent Systems (MAS) have demonstrated strong capability in complex reasoning and tool use, and heterogeneous agent pools further broaden the quality–cost tr...
Xudong Wang, Chaoning Zhang, Jiaquan Zhang, Chenghao Li, Qigan Sun, Sung-Ho Bae, Peng Wang, Ning ...
DS$^2$-Instruct: Domain-Specific Data Synthesis for Large Language Models Instruction Tuning
Adapting Large Language Models (LLMs) to specialized domains requires high-quality instruction tuning datasets, which are expensive to create through human annotation. Existing data synthesis metho...
Ruiyao Xu, Noelle I. Samia, Han Liu
Teaching Agile Requirements Engineering: A Stakeholder Simulation with Generative AI
Context: The active involvement of users and customers in agile software development remains a persistent challenge in practice. For this reason, it is important that students in higher education b...
Eva-Maria Schön, Michael Neumann, Tiago Silva da Silva
Human-Centered Evaluation of an LLM-Based Process Modeling Copilot: A Mixed-Methods Study with Domain Experts
Integrating Large Language Models (LLMs) into business process management tools promises to democratize Business Process Model and Notation (BPMN) modeling for non-experts. While automated framewor...
Chantale Lauer, Peter Pfeiffer, Nijat Mehdiyev
Enhanced Drug-drug Interaction Prediction Using Adaptive Knowledge Integration
Drug-drug interaction event (DDIE) prediction is crucial for preventing adverse reactions and ensuring optimal therapeutic outcomes. However, existing methods often face challenges with imbalanced ...
Pengfei Liu, Jun Tao, Zhixiang Ren
Explainable AI Using Inherently Interpretable Components for Wearable-based Health Monitoring
The use of wearables in medicine and wellness, enabled by AI-based models, offers tremendous potential for real-time monitoring and interpretable event detection. Explainable AI (XAI) is required t...
Maurice Kuschel, Solveig Vieluf, Claus Reinsberger, Tobias Loddenkemper, Tanuj Hasija
Test-time RL alignment exposes task familiarity artifacts in LLM benchmarks
Direct evaluation of LLMs on benchmarks can be misleading because comparatively strong performance may reflect task familiarity rather than capability. The train-before-test approach controls for t...
Kun Wang, Reinhard Heckel
CLARIN-PT-LDB: An Open LLM Leaderboard for Portuguese to assess Language, Culture and Civility
This paper reports on the development of a leaderboard of Open Large Language Models (LLM) for European Portuguese (PT-PT), and on its associated benchmarks. This leaderboard comes as a way to addr...
João Silva, Luís Gomes, António Branco
Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking
Nowadays, service providers often deploy multiple types of LLM services within shared clusters. While the service colocation improves resource utilization, it introduces significant interference ri...
Zizhao Mo, Junlin Chen, Huanle Xu, Chengzhong Xu
From AI Weather Prediction to Infrastructure Resilience: A Correction-Downscaling Framework for Tropical Cyclone Impacts
This paper addresses a missing capability in infrastructure resilience: turning fast, global AI weather forecasts into asset-scale, actionable risk. We introduce the AI-based Correction-Downscaling...
You Wu, Zhenguo Wang, Naiyu Wang
Context is all you need: Towards autonomous model-based process design using agentic AI in flowsheet simulations
Agentic AI systems integrating large language models (LLMs) with reasoning and tool-use capabilities are transforming various domains - in particular, software development. In contrast, their applic...
Pascal Schäfer, Lukas J. Krinke, Martin Wlotzka, Norbert Asprion
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
A recent cutting-edge topic in multimodal modeling is to unify visual comprehension and generation within a single model. However, the two tasks demand mismatched decoding regimes and visual repres...
Yichen Zhang, Da Peng, Zonghao Guo, Zijian Zhang, Xuesong Yang, Tong Sun, Shichu Sun, Yidan Zhang...
The RIGID Framework: Research-Integrated, Generative AI-Mediated Instructional Design
Instructional Design (ID) often faces challenges in incorporating research-based knowledge and pedagogical best practices. Although educational researchers and government agencies emphasize groundi...
Yerin Kwak, Zachary A. Pardos
SectEval: Evaluating the Latent Sectarian Preferences of Large Language Models
As Large Language Models (LLMs) become a popular source for religious knowledge, it is important to know whether they treat different groups fairly. This study is the first to measure how LLMs handle th...
Aditya Maheshwari, Amit Gajkeshwar, Kaushal Sharma, Vivek Patel
AI Model Modulation with Logits Redistribution
Large-scale models are typically adapted to meet the diverse requirements of model owners and users. However, maintaining multiple specialized versions of the model is inefficient. In response, we ...
Zihan Wang, Zhongkui Ma, Xinguo Feng, Zhiyang Mei, Ethan Ma, Derui Wang, Minhui Xue, Guangdong Bai
Taming the Long Tail: Efficient Item-wise Sharpness-Aware Minimization for LLM-based Recommender Systems
Large Language Model-based Recommender Systems (LRSs) have recently emerged as a new paradigm in sequential recommendation by directly adopting LLMs as backbones. While LRSs demonstrate strong know...
Jiaming Zhang, Yuyuan Li, Xiaohua Feng, Li Zhang, Longfei Li, Jun Zhou, Chaochao Chen
TaoBench: Do Automated Theorem Prover LLMs Generalize Beyond MathLib?
Automated theorem proving (ATP) benchmarks largely consist of problems formalized in MathLib, so current ATP training and evaluation are heavily biased toward MathLib's definitional framework. Howe...
Alexander K Taylor, Junyi Zhang, Ethan Ji, Vigyan Sahai, Haikang Deng, Yuanzhou Chen, Yifan Yuan,...
What You Prompt is What You Get: Increasing Transparency of Prompting Using Prompt Cards
The rapid advancement and impressive capabilities of large language models (LLMs) have given rise to the field of prompt engineering, the practice of crafting inputs to guide LLMs toward high-quali...
Amandine M. Caut, Beimnet Zenebe, Amy Rouillard, David J. T. Sumpter
ToolTree: Efficient LLM Agent Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning
Large Language Model (LLM) agents are increasingly applied to complex, multi-step tasks that require interaction with diverse external tools across various domains. However, current LLM agent tool ...
Shuo Yang, Soyeon Caren Han, Yihao Ding, Shuhe Wang, Eduard Hoy
Altered Thoughts, Altered Actions: Probing Chain-of-Thought Vulnerabilities in VLA Robotic Manipulation
Recent Vision-Language-Action (VLA) models increasingly adopt chain-of-thought (CoT) reasoning, generating a natural-language plan before decoding motor commands. This internal text channel between...
Tuan Duong Trinh, Naveed Akhtar, Basim Azam