Papers
Research papers from arXiv and related sources
Automatic End-to-End Data Integration using Large Language Models
Designing data integration pipelines typically requires substantial manual effort from data engineers to configure pipeline components and label training data. While LLMs have shown promise in hand...
Aaron Steiner, Christian Bizer

Prompting with the human-touch: evaluating model-sensitivity of foundation models for musculoskeletal CT segmentation
Promptable Foundation Models (FMs), initially introduced for natural image segmentation, have also revolutionized medical image segmentation. The increasing number of models, along with evaluations...
Caroline Magg, Maaike A. ter Wee, Johannes G. G. Dobbe, Geert J. Streekstra, Leendert Blankevoort...

Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning
Reinforcement learning significantly enhances LLM capabilities but suffers from a critical issue: length inflation, where models adopt verbosity or inefficient reasoning to maximize rewards. Prior ...
Zichao Li, Jie Lou, Fangchen Dong, Zhiyuan Fan, Mengjie Ren, Hongyu Lin, Xianpei Han, Debing Zhan...

AILS-NTUA at SemEval-2026 Task 8: Evaluating Multi-Turn RAG Conversations
We present the AILS-NTUA system for SemEval-2026 Task 8 (MTRAGEval), addressing all three subtasks of multi-turn retrieval-augmented generation: passage retrieval (A), reference-grounded response g...
Dimosthenis Athanasiou, Maria Lymperaiou, Giorgos Filandrianos, Athanasios Voulodimos, Giorgos St...

IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs
Instruction hierarchy (IH) defines how LLMs prioritize system, developer, user, and tool instructions under conflict, providing a concrete, trust-ordered policy for resolving instruction conflicts....
Chuan Guo, Juan Felipe Ceron Uribe, Sicheng Zhu, Christopher A. Choquette-Choo, Steph Lin, Nikhil...

Resource-constrained Amazons chess decision framework integrating large language models and graph attention
Artificial intelligence has advanced significantly through the development of intelligent game-playing systems, providing rigorous testbeds for decision-making, strategic planning, and adaptive lea...
Tianhao Qian, Zhuoxuan Li, Jinde Cao, Xinli Shi, Hanjie Liu, Leszek Rutkowski

Safe and Scalable Web Agent Learning via Recreated Websites
Training autonomous web agents is fundamentally limited by the environments they learn from: real-world websites are unsafe to explore, hard to reset, and rarely provide verifiable feedback. We pro...
Hyungjoo Chae, Jungsoo Park, Alan Ritter

Naïve Exposure of Generative AI Capabilities Undermines Deepfake Detection
Generative AI systems increasingly expose powerful reasoning and image refinement capabilities through user-facing chatbot interfaces. In this work, we show that the naïve exposure of such capabili...
Sunpill Kim, Chanwoo Hwang, Minsu Kim, Jae Hong Seo

Efficiency vs Demand in AI Electricity: Implications for Post-AGI Scaling
As AI capabilities and deployment accelerate toward a post-AGI era, concerns are growing about electricity demand and carbon emissions from AI computing, yet it is rarely represented explicitly in ...
Doyi Kim, Jiseok Ahn, Haewon McJeon, Changick Kim

VERI-DPO: Evidence-Aware Alignment for Clinical Summarization via Claim Verification and Direct Preference Optimization
Brief Hospital Course (BHC) narratives must be clinically useful yet faithful to fragmented EHR evidence. LLM-based clinical summarizers still introduce unsupported statements, and alignment can en...
Weixin Liu, Congning Ni, Qingyuan Song, Susannah L. Rose, Christopher Symons, Murat Kantarcioglu,...

Human-AI Co-reasoning for Clinical Diagnosis with Evidence-Integrated Language Agent
We present PULSE, a medical reasoning agent that combines a domain-tuned large language model with scientific literature retrieval to support diagnostic decision-making in complex real-world cases....
Zhongzhen Huang, Yan Ling, Hong Chen, Ye Feng, Li Wu, Linjie Mu, Shaoting Zhang, Xiaofan Zhang, K...

PEEM: Prompt Engineering Evaluation Metrics for Interpretable Joint Evaluation of Prompts and Responses
Prompt design is a primary control interface for large language models (LLMs), yet standard evaluations largely reduce performance to answer correctness, obscuring why a prompt succeeds or fails an...
Minki Hong, Eunsoo Lee, Sohyun Park, Jihie Kim

Learning to Negotiate: Multi-Agent Deliberation for Collective Value Alignment in LLMs
The alignment of large language models (LLMs) has progressed substantially in single-agent settings through paradigms such as RLHF and Constitutional AI, with recent work exploring scalable alterna...
Panatchakorn Anantaprayoon, Nataliia Babina, Nima Asgharbeygi, Jad Tarifi

Aligning Large Language Models with Searcher Preferences
The paradigm shift from item-centric ranking to answer-centric synthesis is redefining the role of search engines. While recent industrial progress has applied generative techniques to closed-set i...
Wei Wu, Peilun Zhou, Liyi Chen, Qimeng Wang, Chengqiang Lu, Yan Gao, Yi Wu, Yao Hu, Hui Xiong

G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition
We study timestamped speaker-attributed ASR for long-form, multi-party speech with overlap, where chunk-wise inference must preserve meeting-level speaker identity consistency while producing time-...
Jing Peng, Ziyi Chen, Haoyu Li, Yucheng Wang, Duo Ma, Mengtian Li, Yunfan Du, Dezhu Xu, Kai Yu, S...

Spatio-Temporal Forecasting of Retaining Wall Deformation: Mitigating Error Accumulation via Multi-Resolution ConvLSTM Stacking Ensemble
This study proposes a multi-resolution Convolutional Long Short-Term Memory (ConvLSTM) ensemble framework that leverages diverse temporal input resolutions to mitigate error accumulation and improv...
Jihoon Kim, Heejung Youn

Machinagogy: Experiments in Staging Teaching Dramas with LLMs
This paper describes an AI tutoring system built upon two psycho-social theoretic constructs: Hegelian recognition and Freudian psychodynamics. Two related interventions are proposed: recognition-e...
Liam Magee

Unlearning the Unpromptable: Prompt-free Instance Unlearning in Diffusion Models
Machine unlearning aims to remove specific outputs from trained models, often at the concept level, such as forgetting all occurrences of a particular celebrity or filtering content via text prompt...
Kyungryeol Lee, Kyeonghyun Lee, Seongmin Hong, Byung Hyun Lee, Se Young Chun

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training
Large language models trained on natural language exhibit pronounced anisotropy: a small number of directions concentrate disproportionate energy, while the remaining dimensions form a broad semant...
Hengjie Cao, Zhendong Huang, Mengyi Chen, Yifeng Yang, Fanqi Yu, Ruijun Huang, Fang Dong, Xin Zha...

World2Act: Latent Action Post-Training via Skill-Compositional World Models
World Models (WMs) have emerged as a promising approach for post-training Vision-Language-Action (VLA) policies to improve robustness and generalization under environmental changes. However, most W...
An Dinh Vuong, Tuan Van Vo, Abdullah Sohail, Haoran Ding, Liang Ma, Xiaodan Liang, Anqing Duan, I...