Papers
Research papers from arXiv and related sources
Prompt-Driven Color Accessibility Evaluation in Diffusion-based Image Generation Models
Generative models are increasingly integrated into creative workflows. While text-to-image generation excels in visual quality and diversity, color accessibility for users with Color Vision Deficie...
Xinyao Zhuang, Jose Echevarria, Kaan Akşit
MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents
As embodied models become powerful, humans will collaborate with multiple embodied AI agents at their workplace or home in the future. To ensure better communication between human users and the mul...
Kangsan Kim, Yanlai Yang, Suji Kim, Woongyeong Yeo, Youngwan Lee, Mengye Ren, Sung Ju Hwang
One-Eval: An Agentic System for Automated and Traceable LLM Evaluation
Reliable evaluation is essential for developing and deploying large language models, yet in practice it often requires substantial manual effort: practitioners must identify appropriate benchmarks,...
Chengyu Shen, Yanheng Hou, Minghui Pan, Runming He, Zhen Hao Wong, Meiyi Qiang, Zhou Liu, Hao Lia...
EmoSURA: Towards Accurate Evaluation of Detailed and Long-Context Emotional Speech Captions
Recent advancements in speech captioning models have enabled the generation of rich, fine-grained captions for emotional speech. However, the evaluation of such captions remains a critical bottlene...
Xin Jing, Andreas Triantafyllopoulos, Jiadong Wang, Shahin Amiriparian, Jun Luo, Björn Schuller
A Swampland-modified Hod bound for charged black holes with exotic matter
In this paper, we study the quasinormal modes (QNMs) of a charged black hole in the presence of both quintessence and a cloud of strings using the Padé-averaged higher-order WKB approximation metho...
S. Saoud, M. A Rbah, R. Sammani, E. H. Saidi, R. Ahl Laamara
RA-SSU: Towards Fine-Grained Audio-Visual Learning with Region-Aware Sound Source Understanding
Audio-Visual Learning (AVL) is a fundamental task in multi-modal learning and embodied intelligence, playing a vital role in scene understanding and interaction. However, previous researc...
Muyi Sun, Yixuan Wang, Hong Wang, Chen Su, Man Zhang, Xingqun Qi, Qi Li, Zhenan Sun
A Hybrid Model-Assisted Approach for Path Loss Prediction in Suburban Scenarios
Accurate path loss prediction is crucial for wireless network planning and optimization in suburban environments with complex terrain variation and diverse land cover. This paper proposes a model a...
Chenlong Wang, Bo Ai, Ruiming Chen, Ruisi He, Mi Yang, Yuxin Zhang, Weirong Liu, Liu Liu
MITRA: An AI Assistant for Knowledge Retrieval in Physics Collaborations
Large-scale scientific collaborations, such as the Compact Muon Solenoid (CMS) at CERN, produce a vast and ever-growing corpus of internal documentation. Navigating this complex information landsca...
Abhishikth Mallampalli, Sridhara Dasu
Test-time Ego-Exo-centric Adaptation for Action Anticipation via Multi-Label Prototype Growing and Dual-Clue Consistency
Efficient adaptation between Egocentric (Ego) and Exocentric (Exo) views is crucial for applications such as human-robot cooperation. However, the success of most existing Ego-Exo adaptation method...
Zhaofeng Shi, Heqian Qiu, Lanxiao Wang, Qingbo Wu, Fanman Meng, Lili Pan, Hongliang Li
Exploiting Label-Aware Channel Scoring for Adaptive Channel Pruning in Split Learning
Split learning (SL) transfers most of the training workload to the server, which alleviates computational burden on client devices. However, the transmission of intermediate feature representations...
Jialei Tan, Zheng Lin, Xiangming Cai, Ruoxi Zhu, Zihan Fang, Pingping Chen, Wei Ni
Quantifying the Necessity of Chain of Thought through Opaque Serial Depth
Large language models (LLMs) tend to externalize their reasoning in their chain of thought, making the chain of thought a good target for monitoring. This is partially an inherent feature of the Tr...
Jonah Brown-Cohen, David Lindner, Rohin Shah
TIMID: Time-Dependent Mistake Detection in Videos of Robot Executions
As robotic systems execute increasingly difficult task sequences, the number of ways in which they can fail grows as well. Video Anomaly Detection (VAD) frameworks typically focus on singular, low-level ...
Nerea Gallego, Fernando Salanova, Claudio Mannarano, Cristian Mahulea, Eduardo Montijano
CLIOPATRA: Extracting Private Information from LLM Insights
As AI assistants become widely used, privacy-aware platforms like Anthropic's Clio have been introduced to generate insights from real-world AI use. Clio's privacy protections rely on layering mult...
Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro, Peter Kairouz
Ego: Embedding-Guided Personalization of Vision-Language Models
AI assistants that support humans in daily life are becoming increasingly feasible, driven by the rapid advancements in multimodal language models. A key challenge lies in overcoming the generic na...
Soroush Seifi, Simon Gardier, Vaggelis Dorovatas, Daniel Olmeda Reino, Rahaf Aljundi
LogoDiffuser: Training-Free Multilingual Logo Generation and Stylization via Letter-Aware Attention Control
Recent advances in text-to-image generation have been remarkable, but generating multilingual design logos that harmoniously integrate visual and textual elements remains a challenging task. Existi...
Mingyu Kang, Hyein Seo, Yuna Jeong, Junhyeong Park, Yong Suk Choi
Beyond Fine-Tuning: Robust Food Entity Linking under Ontology Drift with FoodOntoRAG
Standardizing food terms from product labels and menus into ontology concepts is a prerequisite for trustworthy dietary assessment and safety reporting. The dominant approach to Named Entity Linkin...
Jan Drole, Ana Gjorgjevikj, Barbara Koroušić Seljak, Tome Eftimov
Two-grid Penalty Approximation Scheme for Doubly Reflected BSDEs
We study penalization coupled with time discretization for decoupled Markovian doubly reflected BSDEs with obstacles \(p_b(t,X_t)\le Y_t\le p_w(t,X_t)\). The DRBSDE is approximated by a penalized B...
Wonjae Lee, Hyunbin Park
Epistemic Closure: Autonomous Mechanism Completion for Physically Consistent Simulation
The integration of Large Language Models (LLMs) into scientific discovery is currently hindered by the Implicit Context problem, where governing equations extracted from literature contain invisibl...
Yue Wu, Tianhao Su, Rui Hu, Mingchuan Zhao, Shunbo Hu, Deng Pan, Jizhong Huang
Shaken, not stirred: inefficient mixing of CM- and CI-like materials
A recent study suggests that CM chondrite-like planetesimals formed in the vicinity of Saturn, in a pressure bump outside the gap carved by proto-Jupiter. While a fraction of these objects was impl...
Sarah E. Anderson, Pierre Vernazza, Miroslav Brož
AI-driven Inverse Design of Complex Oxide Thin Films for Semiconductor Devices
Bridging generative foundation models with non-equilibrium thin-film synthesis remains a central challenge, limiting the practical impact of AI-driven materials discovery on semiconductor dielectri...
Bonwook Gu, Trinh Ngoc Le, Wonjoong Kim, Zunair Masroor, Han-Bo-Ram Lee