Research

Papers

Research papers from arXiv and related sources

Total: 4513, AI/LLM: 2483, Testing: 2030
AI LLM

Predicting Sentence Acceptability Judgments in Multimodal Contexts

Previous work has examined the capacity of deep neural networks (DNNs), particularly transformers, to predict human sentence acceptability judgments, both independently of context, and in document ...

Hyewon Jang, Nikolai Ilinykh, Sharid Loáiciga, Jey Han Lau, Shalom Lappin

2602.20918 2026-02-24
AI LLM

InterPilot: Exploring the Design Space of AI-assisted Job Interview Support for HR Professionals

Recruitment interviews are cognitively demanding interactions in which interviewers must simultaneously listen, evaluate candidates, take notes, and formulate follow-up questions. To better underst...

Zhengtao Xu, Zimo Xia, Zicheng Zhu, Nattapat Boonprakong, Yu-An Chen, Rabih Zbib, Casimiro Pio Ca...

2602.20891 2026-02-24
AI LLM

When LLMs Enter Everyday Feminism on Chinese Social Media: Opportunities and Risks for Women's Empowerment

Everyday digital feminism refers to the ordinary, often pragmatic ways women articulate lived experiences and cultivate solidarity in online spaces. In China, such practices flourish on RedNote thr...

Runhua Zhang, Ziqi Pan, Kangyu Yuan, Qiaoyi Chen, Yulin Tian, Huamin Qu, Xiaojuan Ma

2602.20876 2026-02-24
AI LLM

MUSE: Harnessing Precise and Diverse Semantics for Few-Shot Whole Slide Image Classification

In computational pathology, few-shot whole slide image classification is primarily driven by the extreme scarcity of expert-labeled slides. Recent vision-language methods incorporate textual semant...

Jiahao Xu, Sheng Huang, Xin Zhang, Zhixiong Nan, Jiajun Dong, Nankun Mu

2602.20873 2026-02-24
AI LLM

SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

Agentic systems increasingly rely on reusable procedural capabilities, a.k.a. agentic skills, to execute long-horizon workflows reliably. These capabilities are callable modules that pack...

Yanna Jiang, Delong Li, Haiyu Deng, Baihe Ma, Xu Wang, Qin Wang, Guangsheng Yu

2602.20867 2026-02-24
AI LLM

FinAnchor: Aligned Multi-Model Representations for Financial Prediction

Financial prediction from long documents involves significant challenges, as actionable signals are often sparse and obscured by noise, and the optimal LLM for generating embeddings varies across t...

Zirui He, Huopu Zhang, Yanguang Liu, Sirui Wu, Mengnan Du

2602.20859 2026-02-24
AI LLM

Training-Free Multi-Concept Image Editing

Editing images with diffusion models without training remains challenging. While recent optimisation-based methods achieve strong zero-shot edits from text, they struggle to preserve identity or ca...

Niki Foteinopoulou, Ignas Budvytis, Stephan Liwicki

2602.20839 2026-02-24
AI LLM

Pressure Reveals Character: Behavioural Alignment Evaluation at Depth

Evaluating alignment in language models requires testing how they behave under realistic pressure, not just what they claim they would do. While alignment failures increasingly cause real-world har...

Nora Petrova, John Burden

2602.20813 2026-02-24
AI LLM

Qwen-BIM: developing large language model for BIM-based design with domain-specific benchmark and dataset

As the construction industry advances toward digital transformation, BIM (Building Information Modeling)-based design has become a key driver supporting intelligent construction. Despite Large Lang...

Jia-Rui Lin, Yun-Hong Cai, Xiang-Rui Ni, Shaojie Zhou, Peng Pan

2602.20812 2026-02-24
AI LLM

Mitigating Preference Leakage via Strict Estimator Separation for Normative Generative Ranking

In Generative Information Retrieval (GenIR), the bottleneck has shifted from generation to the selection of candidates, particularly for normative criteria such as cultural relevance. Current LLM-a...

Dalia Nahhas, Xiaohao Cai, Imran Razzak, Shoaib Jameel

2602.20800 2026-02-24
AI LLM

Unseen-Codebases-Domain Data Synthesis and Training Based on Code Graphs

In the context of newly released software frameworks, large language models (LLMs) often exhibit poor performance and a high rate of hallucination, as they are not exposed to such environments durin...

Guangsheng Ou, Qiming Zhang, Sirong Chen, Anji Li, Dong Xu, Tiancheng Luo, Dekun Dai, Cuiyun Gao,...

2602.20799 2026-02-24
AI LLM

Federated Learning for Cross-Modality Medical Image Segmentation via Augmentation-Driven Generalization

Artificial intelligence has emerged as a transformative tool in medical image analysis, yet developing robust and generalizable segmentation models remains difficult due to fragmented, privacy-cons...

Sachin Dudda Nagaraju, Ashkan Moradi, Bendik Skarre Abrahamsen, Mattijs Elschot

2602.20773 2026-02-24
AI LLM

Pipeline for Verifying LLM-Generated Mathematical Solutions

With the growing popularity of Large Reasoning Models and their results in solving mathematical problems, it becomes crucial to measure their capabilities. We introduce a pipeline for both automati...

Varvara Sazonova, Dmitri Shmelkin, Stanislav Kikot, Vasily Motolygin

2602.20770 2026-02-24
AI LLM

Overton Pluralistic Reinforcement Learning for Large Language Models

Existing alignment paradigms remain limited in capturing the pluralistic nature of human values. Overton Pluralism addresses this gap by generating responses with diverse perspectives from a single...

Yu Fu, Seongho Son, Ilija Bogunovic

2602.20759 2026-02-24
AI LLM

SibylSense: Adaptive Rubric Learning via Memory Tuning and Adversarial Probing

Designing aligned and robust rewards for open-ended generation remains a key barrier to RL post-training. Rubrics provide structured, interpretable supervision, but scaling rubric construction is d...

Yifei Xu, Guilherme Potje, Shivam Shandilya, Tiancheng Yuan, Leonardo de Oliveira Nunes, Rakshand...

2602.20751 2026-02-24
AI LLM

Adaptive Text Anonymization: Learning Privacy-Utility Trade-offs via Prompt Optimization

Anonymizing textual documents is a highly context-sensitive problem: the appropriate balance between privacy protection and utility preservation varies with the data domain, privacy objectives, and...

Gabriel Loiseau, Damien Sileo, Damien Riquet, Maxime Meyer, Marc Tommasi

2602.20743 2026-02-24
AI LLM

RMIT-ADM+S at the MMU-RAG NeurIPS 2025 Competition

This paper presents the award-winning RMIT-ADM+S system for the Text-to-Text track of the NeurIPS 2025 MMU-RAG Competition. We introduce Routing-to-RAG (R2RAG), a research-focused retrieval-aug...

Kun Ran, Marwah Alaofi, Danula Hettiachchi, Chenglong Ma, Khoi Nguyen Dinh Anh, Khoi Vo Nguyen, S...

2602.20735 2026-02-24
AI LLM

CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference

Long-context LLMs demand accurate inference at low latency, yet decoding becomes primarily constrained by KV cache as context grows. Prior pruning methods are largely context-agnostic: their token ...

Chao Fei, Guozhong Li, Chenxi Liu, Panos Kalnis

2602.20732 2026-02-24
AI LLM

Balancing Multiple Objectives in Urban Traffic Control with Reinforcement Learning from AI Feedback

Reward design has been one of the central challenges for real-world reinforcement learning (RL) deployment, especially in settings with multiple objectives. Preference-based RL offers an appealing ...

Chenyang Zhao, Vinny Cahill, Ivana Dusparic

2602.20728 2026-02-24
AI LLM

ID-LoRA: Efficient Low-Rank Adaptation Inspired by Matrix Interpolative Decomposition

LoRA has become a universal Parameter-Efficient Fine-Tuning (PEFT) technique that equips Large Language Models (LLMs) to adapt quickly to new tasks. However, when these models are scaled up, even t...

Xindian Ma, Rundong Kong, Peng Zhang, Ruoxiang Huang, Yongyu Jiang

2602.20727 2026-02-24