Research

Papers

Research papers from arXiv and related sources

Total: 4513 | AI/LLM: 2483 | Testing: 2030
AI LLM

Agentic workflow enables the recovery of critical materials from complex feedstocks via selective precipitation

We present a multi-agentic workflow for critical materials recovery that deploys a series of AI agents and automated instruments to recover critical materials from produced water and magnet leachat...

Andrew Ritchhart, Sarah I. Allec, Pravalika Butreddy, Krista Kulesa, Qingpu Wang, Dan Thien Nguye...

2603.15491 2026-03-16
AI LLM

RSGen: Enhancing Layout-Driven Remote Sensing Image Generation with Diverse Edge Guidance

Diffusion models have significantly mitigated the impact of annotated data scarcity in remote sensing (RS). Although recent approaches have successfully harnessed these models to enable diverse and...

Xianbao Hou, Yonghao He, Zeyd Boukhers, John See, Hu Su, Wei Sui, Cong Yang

2603.15484 2026-03-16
AI LLM

Talk, Evaluate, Diagnose: User-aware Agent Evaluation with Automated Error Analysis

Agent applications are increasingly adopted to automate workflows across diverse tasks. However, due to the heterogeneous domains they operate in, it is challenging to create a scalable evaluation ...

Penny Chong, Harshavardhan Abichandani, Jiyuan Shen, Atin Ghosh, Min Pyae Moe, Yifan Mai, Daniel ...

2603.15483 2026-03-16
AI LLM

ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer

Diffusion Transformers (DiTs) have demonstrated remarkable scalability and quality in image and video generation, prompting growing interest in extending them to controllable generation and editing...

Ruonan Yu, Zhenxiong Tan, Zigeng Chen, Songhua Liu, Xinchao Wang

2603.15478 2026-03-16
AI LLM

Agent Lifecycle Toolkit (ALTK): Reusable Middleware Components for Robust AI Agents

As AI agents move from demos into enterprise deployments, their failure modes become consequential: a misinterpreted tool argument can corrupt production data, a silent reasoning error can go undet...

Zidane Wright, Jason Tsay, Anupama Murthi, Osher Elhadad, Diego Del Rio, Saurabh Goyal, Kiran Kat...

2603.15473 2026-03-16
AI LLM

Financial Transaction Retrieval and Contextual Evidence for Knowledge-Grounded Reasoning

Today, the success of financial organizations depends heavily on their ability to process digital traces generated by their clients, e.g., transaction histories, gathered from various sources to imp...

Artem Sakhno, Daniil Tomilov, Yuliana Shakhvalieva, Inessa Fedorova, Daria Ruzanova, Omar Zoloev,...

2603.15459 2026-03-16
AI LLM

Evasive Intelligence: Lessons from Malware Analysis for Evaluating AI Agents

Artificial intelligence (AI) systems are increasingly adopted as tool-using agents that can plan, observe their environment, and take actions over extended time periods. This evolution challenges c...

Simone Aonzo, Merve Sahin, Aurélien Francillon, Daniele Perito

2603.15457 2026-03-16
AI LLM

Unlocking the Value of Text: Event-Driven Reasoning and Multi-Level Alignment for Time Series Forecasting

Existing time series forecasting methods primarily rely on the numerical data itself. However, real-world time series exhibit complex patterns associated with multimodal information, making them di...

Siyuan Wang, Peng Chen, Yihang Wang, Wanghui Qiu, Chenjuan Guo, Bin Yang, Yang Shu

2603.15452 2026-03-16
AI LLM

The Social Sycophancy Scale: A psychometrically validated measure of sycophancy

Large Language Model (LLM) sycophancy is a growing concern. The current literature has largely examined sycophancy in contexts with clear right and wrong answers, like coding. However, AI is increa...

Jean Rehani, Victoria Oldemburgo de Mello, Dariya Ovsyannikova, Ashton Anderson, Michael Inzlicht

2603.15448 2026-03-16
AI LLM

MV2UV: Generating High-quality UV Texture Maps with Multiview Prompts

Generating high-quality textures for 3D assets is a challenging task. Existing multiview texture generation methods suffer from multiview inconsistency and missing textures on unseen parts, whi...

Zheng Zhang, Qinchuan Zhang, Yuteng Ye, Zhi Chen, Penglei Ji, Mengfei Li, Wenxiao Zhang, Yuan Liu

2603.15436 2026-03-16
AI LLM

Invisible failures in human-AI interactions

AI systems fail silently far more often than they fail visibly. In a large-scale quantitative analysis of human-AI interactions from the WildChat dataset, we find that 78% of AI failures are invisi...

Christopher Potts, Moritz Sudhof

2603.15423 2026-03-16
AI LLM

Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities

Test-time training (TTT) has recently emerged as a promising method to improve the reasoning abilities of large language models (LLMs), in which the model directly learns from test data without acc...

Vanshaj Khattar, Md Rafi ur Rashid, Moumita Choudhury, Jing Liu, Toshiaki Koike-Akino, Ming Jin, ...

2603.15417 2026-03-16
AI LLM

TrinityGuard: A Unified Framework for Safeguarding Multi-Agent Systems

With the rapid development of LLM-based multi-agent systems (MAS), significant safety and security concerns have emerged, introducing novel risks that go beyond single agents or LLMs. Despi...

Kai Wang, Biaojie Zeng, Zeming Wei, Chang Jin, Hefeng Zhou, Xiangtian Li, Chao Yang, Jingjing Qu,...

2603.15408 2026-03-16
AI LLM

Fusian: Multi-LoRA Fusion for Fine-Grained Continuous MBTI Personality Control in Large Language Models

Large Language Models (LLMs) have demonstrated impressive capabilities in simulating diverse human behaviors and personalities. However, existing methods for personality control, which include prom...

Zehao Chen, Rong Pan

2603.15405 2026-03-16
AI LLM

A Closer Look into LLMs for Table Understanding

Despite the success of Large Language Models (LLMs) in table understanding, their internal mechanisms remain unclear. In this paper, we conduct an empirical study on 16 LLMs, covering general LLMs,...

Jia Wang, Chuanyu Qin, Mingyu Zheng, Qingyi Si, Peize Li, Zheng Lin

2603.15402 2026-03-16
AI LLM

SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?

Agent skills, structured procedural knowledge packages injected at inference time, are increasingly used to augment LLM agents on software engineering tasks. However, their real utility in end-to-e...

Tingxu Han, Yi Zhang, Wei Song, Chunrong Fang, Zhenyu Chen, Youcheng Sun, Lijie Hu

2603.15401 2026-03-16
AI LLM

SFCoT: Safer Chain-of-Thought via Active Safety Evaluation and Calibration

Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. However, they remain highly susceptible to jailbreak attacks that undermine their safety alignment...

Yu Pan, Wenlong Yu, Tiejun Wu, Xiaohu Ye, Qiannan Si, Guangquan Xu, Bin Wu

2603.15397 2026-03-16
AI LLM

AI Evasion and Impersonation Attacks on Facial Re-Identification with Activation Map Explanations

Facial identification systems are increasingly deployed in surveillance, yet their vulnerability to adversarial evasion and impersonation attacks poses a critical risk. This paper introduces a no...

Noe Claudel, Weisi Guo, Yang Xing

2603.15396 2026-03-16
AI LLM

When Does Sparsity Mitigate the Curse of Depth in LLMs

Recent work has demonstrated the curse of depth in large language models (LLMs), where later layers contribute less to learning and representation than earlier layers. Such under-utilization is lin...

Dilxat Muhtar, Xinyuan Song, Sebastian Pokutta, Max Zimmer, Nico Pelleriti, Thomas Hofmann, Shiwe...

2603.15389 2026-03-16
AI LLM

RieMind: Geometry-Grounded Spatial Agent for Scene Understanding

Visual Language Models (VLMs) have increasingly become the main paradigm for understanding indoor scenes, but they still struggle with metric and spatial reasoning. Current approaches rely on end-t...

Fernando Ropero, Erkin Turkoz, Daniel Matos, Junqing Du, Antonio Ruiz, Yanfeng Zhang, Lu Liu, Min...

2603.15386 2026-03-16