Research

Papers

Research papers from arXiv and related sources

Total: 4513, AI/LLM: 2483, Testing: 2030
AI LLM

CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment

Large-language-model (LLM)-based text-to-speech (TTS) systems can generate natural speech, but most are not designed for low-latency dual-streaming synthesis. High-quality dual-streaming TTS depend...

Hanwen Liu, Saierdaer Yusuyin, Hao Huang, Zhijian Ou

2602.19574 2026-02-23
AI LLM

HOCA-Bench: Beyond Semantic Perception to Predictive World Modeling via Hegelian Ontological-Causal Anomalies

Video-LLMs have improved steadily on semantic perception, but they still fall short on predictive world modeling, which is central to physically grounded intelligence. We introduce HOCA-Bench, a be...

Chang Liu, Yunfan Ye, Qingyang Zhou, Xichen Tan, Mengxuan Luo, Zhenyu Qiu, Wei Peng, Zhiping Cai

2602.19571 2026-02-23
AI LLM

VALD: Multi-Stage Vision Attack Detection for Efficient LVLM Defense

Large Vision-Language Models (LVLMs) can be vulnerable to adversarial images that subtly bias their outputs toward plausible yet incorrect responses. We introduce a general, efficient, and training...

Nadav Kadvil, Ayellet Tal

2602.19570 2026-02-23
AI LLM

Spritz: Path-Aware Load Balancing in Low-Diameter Networks

Low-diameter topologies such as Dragonfly and Slim Fly are increasingly adopted in HPC and datacenter networks, yet existing load balancing techniques either rely on proprietary in-network mechanis...

Tommaso Bonato, Ales Kubicek, Abdul Kabbani, Ahmad Ghalayini, Maciej Besta, Torsten Hoefler

2602.19567 2026-02-23
AI LLM

DICArt: Advancing Category-level Articulated Object Pose Estimation in Discrete State-Spaces

Articulated object pose estimation is a core task in embodied AI. Existing methods typically regress poses in a continuous space, but often struggle with 1) navigating a large, complex search space...

Li Zhang, Mingyu Mei, Ailing Wang, Xianhui Meng, Yan Zhong, Xinyuan Song, Liu Liu, Rujing Wang, Z...

2602.19565 2026-02-23
AI LLM

Identifying, Explaining, and Correcting Ableist Language with AI

Ableist language perpetuates harmful stereotypes and exclusion, yet its nuanced nature makes it difficult to recognize and address. Artificial intelligence could serve as a powerful ally in the fig...

Kynnedy Simone Smith, Lydia B. Chilton, Danielle Bragg

2602.19560 2026-02-23
AI LLM

Agentic AI as a Cybersecurity Attack Surface: Threats, Exploits, and Defenses in Runtime Supply Chains

Agentic systems built on large language models (LLMs) extend beyond text generation to autonomously retrieve information and invoke tools. This runtime execution model shifts the attack surface fro...

Xiaochong Jiang, Shiqi Yang, Wenting Yang, Yichen Liu, Cheng Ji

2602.19555 2026-02-23
AI LLM

Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining

One of the first pre-processing steps for constructing web-scale LLM pretraining datasets involves extracting text from HTML. Despite the immense diversity of web content, existing open-source data...

Jeffrey Li, Josh Gardner, Doug Kang, Fangping Shi, Karanjeet Singh, Chun-Liang Li, Herumb Shandil...

2602.19548 2026-02-23
AI LLM

CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents

LLM-based code interpreter agents are increasingly deployed in critical workflows, yet their robustness against risks introduced by their code execution capabilities remains underexplored. Existing...

Lei Ba, Qinbin Li, Songze Li

2602.19547 2026-02-23
AI LLM

Vinedresser3D: Agentic Text-guided 3D Editing

Text-guided 3D editing aims to modify existing 3D assets using natural-language instructions. Current methods struggle to jointly understand complex prompts, automatically localize edits in 3D, and...

Yankuan Chi, Xiang Li, Zixuan Huang, James M. Rehg

2602.19542 2026-02-23
AI LLM

Can a Teenager Fool an AI? Evaluating Low-Cost Cosmetic Attacks on Age Estimation Systems

Age estimation systems are increasingly deployed as gatekeepers for age-restricted online content, yet their robustness to cosmetic modifications has not been systematically evaluated. We investiga...

Xingyu Shen, Tommy Duong, Xiaodong An, Zengqi Zhao, Zebang Hu, Haoyu Hu, Ziyou Wang, Finn Guo, Si...

2602.19539 2026-02-23
AI LLM

Large Language Model-Assisted UAV Operations and Communications: A Multifaceted Survey and Tutorial

Uncrewed Aerial Vehicles (UAVs) are widely deployed across diverse applications due to their mobility and agility. Recent advances in Large Language Models (LLMs) offer a transformative opportunity...

Yousef Emami, Hao Zhou, Radha Reddy, Atefeh Hajijamali Arani, Biliang Wang, Kai Li, Luis Almeida,...

2602.19534 2026-02-23
AI LLM

ORION: ORthonormal Text Encoding for Universal VLM AdaptatION

Vision language models (VLMs) have demonstrated remarkable generalization across diverse tasks, yet their performance remains constrained by the quality and geometry of the textual prototypes used ...

Omprakash Chakraborty, Jose Dolz, Ismail Ben Ayed

2602.19530 2026-02-23
AI LLM

How to Train Your Deep Research Agent? Prompt, Reward, and Policy Optimization in Search-R1

Deep Research agents tackle knowledge-intensive tasks through multi-round retrieval and decision-oriented generation. While reinforcement learning (RL) has been shown to improve performance in this...

Yinuo Xu, Shuo Lu, Jianjie Cheng, Meng Wang, Qianlong Xie, Xingxing Wang, Ran He, Jian Liang

2602.19526 2026-02-23
AI LLM

An LLM-Enabled Frequency-Aware Flow Diffusion Model for Natural-Language-Guided Power System Scenario Generation

Diverse and controllable scenario generation (e.g., wind, solar, load, etc.) is critical for robust power system planning and operation. As AI-based scenario generation methods are becoming the mai...

Zhenghao Zhou, Yiyan Li, Fei Xie, Lu Wang, Bo Wang, Jiansheng Wang, Zheng Yan, Mo-Yuen Chow

2602.19522 2026-02-23
AI LLM

Ada-RS: Adaptive Rejection Sampling for Selective Thinking

Large language models (LLMs) are increasingly being deployed in cost- and latency-sensitive settings. While chain-of-thought improves reasoning, it can waste tokens on simple requests. We study sele...

Yirou Ge, Yixi Li, Alec Chiu, Shivani Shekhar, Zijie Pan, Avinash Thangali, Yun-Shiuan Chuang, Ch...

2602.19519 2026-02-23
AI LLM

Anticipate, Adapt, Act: A Hybrid Framework for Task Planning

Anticipating and adapting to failures is a key capability robots need to collaborate effectively with humans in complex domains. This continues to be a challenge despite the impressive performance ...

Nabanita Dash, Ayush Kaura, Shivam Singh, Ramandeep Singh, Snehasis Banerjee, Mohan Sridharan, K....

2602.19518 2026-02-23
AI LLM

Classroom Final Exam: An Instructor-Tested Reasoning Benchmark

We introduce CFE (Classroom Final Exam), a multimodal benchmark for evaluating the reasoning capabilities of large language models across more than 20 STEM domains. C...

Chongyang Gao, Diji Yang, Shuyan Zhou, Xichen Yan, Luchuan Song, Shuo Li, Kezhen Chen

2602.19517 2026-02-23
AI LLM

Pixel2Phys: Distilling Governing Laws from Visual Dynamics

Discovering physical laws directly from high-dimensional visual data is a long-standing human pursuit but remains a formidable challenge for machines, representing a fundamental goal of scientific ...

Ruikun Li, Jun Yao, Yingfan Hua, Shixiang Tang, Biqing Qi, Bin Liu, Wanli Ouyang, Yan Lu

2602.19516 2026-02-23
AI LLM

Security Risks of AI Agents Hiring Humans: An Empirical Marketplace Study

Autonomous AI agents can now programmatically hire human workers through marketplaces using REST APIs and Model Context Protocol (MCP) integrations. This creates an attack surface analogous to CAPT...

Pulak Mehta

2602.19514 2026-02-23