Research

Papers

Research papers from arXiv and related sources

Total: 4694 AI/LLM: 2583 Testing: 2111
AI LLM

Highly Autonomous Cyber-Capable Agents: Anticipating Capabilities, Tactics, and Strategic Implications

This report introduces the concept of "Highly Autonomous Cyber-Capable Agents" (HACCAs), AI systems capable of autonomously conducting multi-stage cyber campaigns at a level comparable to today's t...

Jam Kraprayoon, Shaun Ee, Brianna Rosen, Yohan Matthew, Aditya Singh, Christopher Covino, Asher B...

2603.11528 2026-03-12
AI LLM

Managing Cognitive Bias in Human Labeling Operations for Rare-Event AI: Evidence from a Field Experiment

Many operational AI systems depend on large-scale human annotation to detect rare but consequential events (e.g., fraud, defects, and medical abnormalities). When positives are rare, the prevalence...

Gunnar P. Epping, Andrew Caplin, Erik Duhaime, William R. Holmes, Daniel Martin, Jennifer S. True...

2603.11511 2026-03-12
AI LLM

Tiny Aya: Bridging Scale and Multilingual Depth

Tiny Aya redefines what a small multilingual language model can achieve. Trained on 70 languages and refined through region-aware posttraining, it delivers state-of-the-art in translation quality, ...

Alejandro R. Salamanca, Diana Abagyan, Daniel D'souza, Ammar Khairi, David Mora, Saurabh Dash, Vi...

2603.11510 2026-03-12
AI LLM

KEPo: Knowledge Evolution Poison on Graph-based Retrieval-Augmented Generation

Graph-based Retrieval-Augmented Generation (GraphRAG) constructs the Knowledge Graph (KG) from external databases to enhance the timeliness and accuracy of Large Language Model (LLM) generations. Ho...

Qizhi Chen, Chao Qi, Yihong Huang, Muquan Li, Rongzheng Wang, Dongyang Zhang, Ke Qin, Shuang Liang

2603.11501 2026-03-12
AI LLM

Try, Check and Retry: A Divide-and-Conquer Framework for Boosting Long-context Tool-Calling Performance of LLMs

Tool-calling empowers Large Language Models (LLMs) to interact with external environments. However, current methods often struggle to handle massive and noisy candidate tools in long-context tool-c...

Kunfeng Chen, Qihuang Zhong, Juhua Liu, Bo Du, Dacheng Tao

2603.11495 2026-03-12
AI LLM

PRMB: Benchmarking Reward Models in Long-Horizon CBT-based Counseling Dialogue

Large language models (LLMs) hold potential for mental healthcare applications, particularly in cognitive behavioral therapy (CBT)-based counseling, where reward models play a critical role in alig...

Yougen Zhou, Qin Chen, Ningning Zhou, Jie Zhou, Liang He

2603.11494 2026-03-12
AI LLM

INFACT: A Diagnostic Benchmark for Induced Faithfulness and Factuality Hallucinations in Video-LLMs

Despite rapid progress, Video Large Language Models (Video-LLMs) remain unreliable due to hallucinations, which are outputs that contradict either video evidence (faithfulness) or verifiable world ...

Junqi Yang, Yuecong Min, Jie Zhang, Shiguang Shan, Xilin Chen

2603.11481 2026-03-12
AI LLM

Grammar of the Wave: Towards Explainable Multivariate Time Series Event Detection via Neuro-Symbolic VLM Agents

Time Series Event Detection (TSED) has long been an important task with critical applications across many high-stakes domains. Unlike statistical anomalies, events are defined by semantics with com...

Sky Chenwei Wan, Tianjun Hou, Yifei Wang, Xiqing Chang, Aymeric Jan

2603.11479 2026-03-12
AI LLM

Deep Learning Network-Temporal Models For Traffic Prediction

Time series analysis is critical for emerging network intelligent control and management functions. However, existing statistical-based and shallow machine learning models have shown limited pred...

Yufeng Xin, Ethan Fan

2603.11475 2026-03-12
AI LLM

COMIC: Agentic Sketch Comedy Generation

We propose a fully automated AI system that produces short comedic videos similar to sketch shows such as Saturday Night Live. Starting with character references, the system employs a population of...

Susung Hong, Brian Curless, Ira Kemelmacher-Shlizerman, Steve Seitz

2603.11048 2026-03-11
AI LLM

Chasing RATs: Tracing Reading for and as Creative Activity

Creativity research has privileged making over the interpretive labor that precedes and shapes it. We introduce Reading Activity Traces (RATs), a proposal that treats reading, broadly defined to ...

Sophia Liu, Shm Garanganao Almeda

2603.11031 2026-03-11
AI LLM

Beyond the Illusion of Consensus: From Surface Heuristics to Knowledge-Grounded Evaluation in LLM-as-a-Judge

The paradigm of LLM-as-a-judge relies on a critical assumption, namely that high inter-evaluator agreement indicates reliable and objective evaluation. We present two complementary findings that ch...

Mingyang Song, Mao Zheng, Chenning Xu

2603.11027 2026-03-11
AI LLM

LLMGreenRec: LLM-Based Multi-Agent Recommender System for Sustainable E-Commerce

Rising environmental awareness in e-commerce necessitates recommender systems that not only guide users to sustainable products but also minimize their own digital carbon footprints. Traditional se...

Hao N. Nguyen, Hieu M. Nguyen, Son Van Nguyen, Nguyen Thi Hanh

2603.11025 2026-03-11
AI LLM

Does AI See like Art Historians? Interpreting How Vision Language Models Recognize Artistic Style

VLMs have become increasingly proficient at a range of computer vision tasks, such as visual question answering and object detection. This includes increasingly strong capabilities in the domain of...

Marvin Limpijankit, Milad Alshomary, Yassin Oulad Daoud, Amith Ananthram, Tim Trombley, Elias Ste...

2603.11024 2026-03-11
AI LLM

Leech Lattice Vector Quantization for Efficient LLM Compression

Scalar quantization of large language models (LLMs) is fundamentally limited by information-theoretic bounds. While vector quantization (VQ) overcomes these limits by encoding blocks of parameters ...

Tycho F. A. van der Ouderaa, Mart van Baalen, Paul Whatmough, Markus Nagel

2603.11021 2026-03-11
AI LLM

Task-Aware Delegation Cues for LLM Agents

LLM agents increasingly present as conversational collaborators, yet human-agent teamwork remains brittle due to information asymmetry: users lack task-specific reliability cues, and agents rarely...

Xingrui Gu

2603.11011 2026-03-11
AI LLM

A Systematic Study of Pseudo-Relevance Feedback with LLMs

Pseudo-relevance feedback (PRF) methods built on large language models (LLMs) can be organized along two key design dimensions: the feedback source, which is where the feedback text is derived from...

Nour Jedidi, Jimmy Lin

2603.11008 2026-03-11
AI LLM

RCTs & Human Uplift Studies: Methodological Challenges and Practical Solutions for Frontier AI Evaluation

Human uplift studies - or studies that measure AI effects on human performance relative to a status quo, typically using randomized controlled trial (RCT) methodology - are increasingly used to inf...

Patricia Paskov, Kevin Wei, Shen Zhou Hong, Dan Bateyko, Xavier Roberts-Gaal, Carson Ezell, Gaili...

2603.11001 2026-03-11
AI LLM

Artificial Intelligence as a Catalyst for Innovation in Software Engineering

The rapid evolution and inherent complexity of modern software requirements demand highly flexible and responsive development methodologies. While Agile frameworks have become the industry standard...

Carlos Alberto Fernández-y-Fernández, Jorge R. Aguilar-Cisneros

2603.10994 2026-03-11
AI LLM

Too Vivid to Be Real? Benchmarking and Calibrating Generative Color Fidelity

Recent advances in text-to-image (T2I) generation have greatly improved visual quality, yet producing images that appear visually authentic to real-world photography remains challenging. This is pa...

Zhengyao Fang, Zexi Jia, Yijia Zhong, Pengcheng Luo, Jinchao Zhang, Guangming Lu, Jun Yu, Wenjie Pei

2603.10990 2026-03-11