Papers
Research papers from arXiv and related sources
Question-guided Visual Compression with Memory Feedback for Long-Term Video Understanding
In the context of long-term video understanding with large multimodal models, many frameworks have been proposed. Although transformer-based visual compressors and memory-augmented approaches are o...
Sosuke Yamao, Natsuki Miyahara, Yuankai Qi, Shun Takeuchi
HindSight: Evaluating Research Idea Generation via Future Impact
Evaluating AI-generated research ideas typically relies on LLM judges or human panels, both subjective and disconnected from actual research impact. We introduce HindSight, a time-split evaluation fr...
Bo Jiang
To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation
Large Language Models (LLMs) have shown strong potential for code generation, yet they remain limited in private-library-oriented code generation, where the goal is to generate code using APIs from...
Yitong Zhang, Chengze Li, Ruize Chen, Guowei Yang, Xiaoran Jia, Yijie Ren, Jia Li
ANNA: a toolbox for Newtonian Noise Analysis
The Einstein Telescope (ET) is a third-generation underground gravitational wave observatory designed to achieve an unprecedented sensitivity down to 3 Hz. Waves propagating in the soil due to anth...
Pieter Reumers, Xhorxha Kucia, Stijn François, Geert Degrande
Storage and selection of multiple chaotic attractors in minimal reservoir computers
Modern predictive modeling increasingly calls for a single learned dynamical substrate to operate across multiple regimes. From a dynamical-systems viewpoint, this capability decomposes into the st...
Francesco Martinuzzi, Holger Kantz
Vision-Language Model Based Multi-Expert Fusion for CT Image Classification
Robust detection of COVID-19 from chest CT remains challenging in multi-institutional settings due to substantial source shift, source imbalance, and hidden test-source identities. In this work, we...
Jianfa Bai, Kejin Lu, Runtian Yuan, Qingqiu Li, Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng
TextOVSR: Text-Guided Real-World Opera Video Super-Resolution
Many classic opera videos exhibit poor visual quality due to the limitations of early filming equipment and long-term degradation during storage. Although real-world video super-resolution (RWVSR) ...
Hua Chang, Xin Xu, Wei Liu, Jiayi Wu, Kui Jiang, Fei Ma, Qi Tian
Confusion-Aware In-Context-Learning for Vision-Language Models in Robotic Manipulation
Vision-language models (VLMs) have significantly improved the generalization capabilities of robotic manipulation. However, VLM-based systems often suffer from a lack of robustness, leading to unpr...
Yayun He, Zuheng Kang, Botao Zhao, Zhouyin Wu, Junqing Peng, Jianzong Wang
Indirect Question Answering in English, German and Bavarian: A Challenging Task for High- and Low-Resource Languages Alike
Indirectness is a common feature of daily communication, yet it is underexplored in NLP research for both low-resource and high-resource languages. Indirect Question Answering (IQA) aims at cla...
Miriam Winkler, Verena Blaschke, Barbara Plank
Next-Frame Decoding for Ultra-Low-Bitrate Image Compression with Video Diffusion Priors
We present a novel paradigm for ultra-low-bitrate image compression (ULB-IC) that exploits the "temporal" evolution in generative image compression. Specifically, we define an explicit intermedia...
Yunuo Chen, Chuqin Zhou, Jiangchuan Li, Xiaoyue Ling, Bing He, Jincheng Dai, Li Song, Guo Lu
A Data-Driven Regional Model for Skillful Medium-Range Typhoon Prediction
Accurate prediction of tropical cyclones remains a major challenge for both numerical weather prediction and emerging artificial intelligence weather prediction systems. While recent global AI mode...
Zeyi Niu, Wei Huang, Sirong Huang, Zhuo Wang, Mu Mu, Mengqi Yang, Xinhai Han, Haofei Sun, Zhaoyan...
From Storage to Steering: Memory Control Flow Attacks on LLM Agents
Modern agentic systems allow Large Language Model (LLM) agents to tackle complex tasks through extensive tool usage, forming structured control flows of tool selection and execution. Existing secur...
Zhenlin Xu, Xiaogang Zhu, Yu Yao, Minhui Xue, Yiliao Song
Establishing Construct Validity in LLM Capability Benchmarks Requires Nomological Networks
Recent work in machine learning increasingly attributes human-like capabilities such as reasoning or theory of mind to large language models (LLMs) on the basis of benchmark performance. This paper...
Timo Freiesleben
How Attention Shapes Emotion: A Comparative Study of Attention Mechanisms for Speech Emotion Recognition
Speech Emotion Recognition (SER) plays a key role in advancing human-computer interaction. Attention mechanisms have become the dominant approach for modeling emotional speech due to their ability ...
Marc Casals-Salvador, Federico Costa, Rodolfo Zevallos, Javier Hernando
Generalized Tadmor Conditions and Structure-Preserving Numerical Fluxes for the Compressible Flow of Real Gases
We generalize Tadmor's algebraic numerical flux condition for entropy-conservative discretizations of conservation laws to a broader class of secondary structures, i.e. possibly non-convex secondar...
Robin Klein, Benjamin Sanderse, Pedro Costa, Rene Pecnik, Ruud Henkes
Sampling-guided exploration of active feature selection policies
Determining the most appropriate features for machine learning predictive models is challenging regarding performance and feature acquisition costs. In particular, global feature choice is limited ...
Gabriel Bernardino, Anders Jonsson, Patrick Clarysse, Nicolas Duchateau
Generation of Programming Exam Question and Answer Using ChatGPT Based on Prompt Engineering
In computer science, students are encouraged to learn various programming languages such as Python, C++, and Java, equipping them with a broad range of technical skills and problem-solving capabili...
Jongwook Si, Sungyoung Kim
Synergizing a Decentralized Framework with LLM-Assisted Skill and Willingness-Aware Task Assignment for Volunteer Crowdsourcing
Volunteer crowdsourcing or VCS platforms increasingly support education, healthcare, disaster response, and smart city applications, yet assigning volunteers to complex tasks remains challenging du...
Riya Samanta, Rituparna Bhattyacharya
Beam Prediction Based on Multimodal Large Language Models
Accurate beam prediction is a key enabler for next-generation wireless communication systems. In this paper, we propose a multimodal large language model (LLM)-based beam prediction framework that ...
Tianhao Mao, Le Liang, Jie Yang, Xiao Li, Shi Jin
Beyond Monolithic Models: Symbolic Seams for Composable Neuro-Symbolic Architectures
Current Artificial Intelligence (AI) systems are frequently built around monolithic models that entangle perception, reasoning, and decision-making, a design that often conflicts with established s...
Nicolas Schuler, Vincenzo Scotti, Raffaela Mirandola