Papers
Research papers from arXiv and related sources
UniMotion: A Unified Framework for Motion-Text-Vision Understanding and Generation
We present UniMotion, to our knowledge the first unified framework for simultaneous understanding and generation of human motion, natural language, and RGB images within a single architecture. Exis...
Ziyi Wang, Xinshun Wang, Shuang Chen, Yang Cong, Mengyuan Liu
3D-Layout-R1: Structured Reasoning for Language-Instructed Spatial Editing
Large Language Models (LLMs) and Vision Language Models (VLMs) have shown impressive reasoning abilities, yet they struggle with spatial understanding and layout consistency when performing fine-gr...
Haoyu Zhen, Xiaolong Li, Yilin Zhao, Han Zhang, Sifei Liu, Kaichun Mo, Chuang Gan, Subhashree Rad...
Greater accessibility can amplify discrimination in generative AI
Hundreds of millions of people rely on large language models (LLMs) for education, work, and even healthcare. Yet these models are known to reproduce and amplify social biases present in their trai...
Carolin Holtermann, Minh Duc Bui, Kaitlyn Zhou, Valentin Hofmann, Katharina von der Wense, Anne L...
EgoGroups: A Benchmark For Detecting Social Groups of People in the Wild
Social group detection, or the identification of humans involved in reciprocal interpersonal interactions (e.g., family members, friends, and customers and merchants), is a crucial component of soc...
Jeffri Murrugarra-Llerena, Pranav Chitale, Zicheng Liu, Kai Ao, Yujin Ham, Guha Balakrishnan, Pao...
SpatialReward: Verifiable Spatial Reward Modeling for Fine-Grained Spatial Consistency in Text-to-Image Generation
Recent advances in text-to-image (T2I) generation via reinforcement learning (RL) have benefited from reward models that assess semantic alignment and visual quality. However, most existing reward ...
Sashuai Zhou, Qiang Zhou, Junpeng Ma, Yue Cao, Ruofan Hu, Ziang Zhang, Xiaoda Yang, Zhibin Wang, ...
Dyadic: A Scalable Platform for Human-Human and Human-AI Conversation Research
Conversation is ubiquitous in social life, but the empirical study of this interactive process has been thwarted by tools that are insufficiently modular and unadaptive to researcher needs. To reli...
David M. Markowitz
Evaluating the Reliability and Fidelity of Automated Judgment Systems of Large Language Models
A Large Language Model (LLM) as judge evaluates the quality of victim Machine Learning (ML) models, specifically LLMs, by analyzing their outputs. An LLM as judge is the combination of one model an...
Tom Biskupski, Stephan Kleber
SPA: A Simple but Tough-to-Beat Baseline for Knowledge Injection
While large language models (LLMs) are pretrained on massive amounts of data, their knowledge coverage remains incomplete in specialized, data-scarce domains, motivating extensive efforts to study ...
Kexian Tang, Jiani Wang, Shaowen Wang, Kaifeng Lyu
Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models
Video--based world models have emerged along two dominant paradigms: video generation and 3D reconstruction. However, existing evaluation benchmarks either focus narrowly on visual fidelity and tex...
Meiqi Wu, Zhixin Cai, Fufangchen Zhao, Xiaokun Feng, Rujing Dang, Bingze Song, Ruitian Tian, Jias...
Chimera: Latency- and Performance-Aware Multi-agent Serving for Heterogeneous LLMs
Multi-agent applications often execute complex tasks as multi-stage workflows, where each stage is an LLM call whose output becomes part of context for subsequent steps. Existing LLM serving system...
Kangqi Ni, Wenyue Hua, Xiaoxiang Shi, Jiang Guo, Shiyu Chang, Tianlong Chen
CayleyPy-4: AI-Holography. Towards analogs of holographic string dualities for AI tasks
This is the fourth paper in the CayleyPy project, which applies AI methods to the exploration of large graphs. In this work, we suggest the existence of a new discrete version of holographic string...
A. Chervov, F. Levkovich-Maslyuk, A. Smolensky, F. Khafizov, I. Kiselev, D. Melnikov, I. Koltsov,...
PAM: A Pose-Appearance-Motion Engine for Sim-to-Real HOI Video Generation
Hand-object interaction (HOI) reconstruction and synthesis are becoming central to embodied AI and AR/VR. Yet, despite rapid progress, existing HOI generation research remains fragmented across thr...
Mingju Gao, Kaisen Yang, Huan-ang Gao, Bohan Li, Ao Ding, Wenyi Li, Yangcheng Yu, Jinkun Liu, Sha...
Enhancing Document-Level Machine Translation via Filtered Synthetic Corpora and Two-Stage LLM Adaptation
In Machine Translation, Large Language Models (LLMs) have generally underperformed compared to conventional encoder-decoder systems and thus see limited adoption. However, LLMs excel at modeling co...
Ireh Kim, Tesia Sker, Chanwoo Kim
Revisiting Quantum Code Generation: Where Should Domain Knowledge Live?
Recent advances in large language models (LLMs) have enabled the automation of an increasing number of programming tasks, including code generation for scientific and engineering domains. In rapidl...
Oscar Novo, Oscar Bastidas-Jossa, Alberto Calvo, Antonio Peris, Carlos Kuchkovsky
MARCUS: An agentic, multimodal vision-language model for cardiac diagnosis and management
Cardiovascular disease remains the leading cause of global mortality, with progress hindered by human interpretation of complex cardiac tests. Current AI vision-language models are limited to singl...
Jack W O'Sullivan, Mohammad Asadi, Lennart Elbe, Akshay Chaudhari, Tahoura Nedaee, Francois Hadda...
Causal Evidence that Language Models use Confidence to Drive Behavior
Metacognition -- the ability to assess one's own cognitive performance -- is documented across species, with internal confidence estimates serving as a key signal for adaptive behavior. While confi...
Dharshan Kumaran, Nathaniel Daw, Simon Osindero, Petar Velickovic, Viorica Patraucean
Multimodal Survival Analysis with Locally Deployable Large Language Models
We study multimodal survival analysis integrating clinical text, tabular covariates, and genomic profiles using locally deployable large language models (LLMs). As many institutions face tight comp...
Moritz Gögl, Christopher Yau
More Isn't Always Better: Balancing Decision Accuracy and Conformity Pressures in Multi-AI Advice
Just as people improve decision-making by consulting diverse human advisors, they can now also consult with multiple AI systems. Prior work on group decision-making shows that advice aggregation cr...
Yuta Tsuchiya, Yukino Baba
The Semantic Ladder: A Framework for Progressive Formalization of Natural Language Content for Knowledge Graphs and AI Systems
Semantic data and knowledge infrastructures must reconcile two fundamentally different forms of representation: natural language, in which most knowledge is created and communicated, and formal sem...
Lars Vogt
Navigational Thinking as an Emerging Paradigm of Computer Science in the Age of Generative AI
Generative AI systems produce meaning with a quality indistinguishable from - and occasionally surpassing - human performance, yet the epistemic mechanism through which this occurs remains poorly u...
Ilya Levin