Papers
Research papers from arXiv and related sources
Scaling Laws for Educational AI Agents
While scaling laws for Large Language Models (LLMs) have been extensively studied along dimensions of model parameters, training data, and compute, the scaling behavior of LLM-based educational age...
Mengsong Wu, Hao Hao, Shuzhen Bi, Keqian Li, Wentao Liu, Siyu Song, Hongbo Zhao, Aimin Zhou
Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks
Frontier Multimodal Large Language Models (MLLMs) exhibit remarkable capabilities in Visual-Language Comprehension (VLC) tasks. However, they are often deployed as zero-shot solutions to new tasks i...
Mei Chee Leong, Ying Gu, Hui Li Tan, Liyuan Li, Nancy Chen
SemBench: A Universal Semantic Framework for LLM Evaluation
Recent progress in Natural Language Processing (NLP) has been driven by the emergence of Large Language Models (LLMs), which exhibit remarkable generative and reasoning capabilities. However, despi...
Mikel Zubillaga, Naiara Perez, Oscar Sainz, German Rigau
LLMs can construct powerful representations and streamline sample-efficient supervised learning
As real-world datasets become increasingly complex and heterogeneous, supervised learning is often bottlenecked by input representation design. Modeling multimodal data for downstream tasks, such a...
Ilker Demirel, Larry Shi, Zeshan Hussain, David Sontag
From Control to Foresight: Simulation as a New Paradigm for Human-Agent Collaboration
Large Language Models (LLMs) are increasingly used to power autonomous agents for complex, multi-step tasks. However, human-agent interaction remains pointwise and reactive: users approve or correc...
Gaole He, Brian Y. Lim
Machine Learning-Based Analysis of Critical Process Parameters Influencing Product Quality Defects: A Real-World Case Study in Manufacturing
Quality control is an essential operation in manufacturing, ensuring products meet the necessary standards of quality, safety, and reliability. Traditional methods, such as visual inspections, meas...
Sukumaran Rajasekaran, Ebru Turanoglu Bekar, Kanika Gandhi, Sabino Francesco Roselli, Mohan Rajas...
Multi-Task Reinforcement Learning for Enhanced Multimodal LLM-as-a-Judge
Multimodal Large Language Models (MLLMs) have been widely adopted as MLLM-as-a-Judges due to their strong alignment with human judgment across various visual tasks. However, most existing judge mod...
Junjie Wu, Xuan Kan, Zihao He, Shunwen Tan, Bo Pan, Kaitai Zhang
Resonate: Reinforcing Text-to-Audio Generation via Online Feedback from Large Audio Language Models
Reinforcement Learning (RL) has become an effective paradigm for enhancing Large Language Models (LLMs) and visual generative models. However, its application in text-to-audio (TTA) generation rema...
Xiquan Li, Junxi Liu, Wenxi Chen, Haina Zhu, Ziyang Ma, Xie Chen
Tokenization Allows Multimodal Large Language Models to Understand, Generate and Edit Architectural Floor Plans
Architectural floor plan design demands joint reasoning over geometry, semantics, and spatial hierarchy, which remains a major challenge for current AI systems. Although recent diffusion and langua...
Sizhong Qin, Ramon Elias Weber, Xinzheng Lu
Learnable Template Matching Approach for Micro-Deformation Monitoring based on Integrated Sensing and Communication Platform
Existing integrated sensing and communication (ISAC) platforms fail to fully utilize the shared spectrum and aperture resources for sensing, resulting in poor sensing performance. Specifically, wea...
Zhuoyang Liu, Yixiang Luomei, Feng Xu
Double-twisted surface spectrum from hybridized Majorana Kramers pairs and wallpaper fermions
We theoretically investigate the superconducting surface states of wallpaper fermions, which are surface quasiparticles of topological nonsymmorphic crystalline insulators protected by a wallpaper ...
Kaito Yoda, Ai Yamakage
The Density of Cross-Persistence Diagrams and Its Applications
Topological Data Analysis (TDA) provides powerful tools to explore the shape and structure of data through topological features such as clusters, loops, and voids. Persistence diagrams are a corner...
Alexander Mironenko, Evgeny Burnaev, Serguei Barannikov
Sema: A High-performance System for LLM-based Semantic Query Processing
The integration of Large Language Models (LLMs) into data analytics has unlocked powerful capabilities for reasoning over bulk structured and unstructured data. However, existing systems typically ...
Kangkang Qi, Dongyang Xie, Wenbo Li, Hao Zhang, Yuanyuan Zhu, Jeffrey Xu Yu, Kangfei Zhao
LaMoGen: Language to Motion Generation Through LLM-Guided Symbolic Inference
Human motion is highly expressive and naturally aligned with language, yet prevailing methods relying heavily on joint text-motion embeddings struggle to synthesize temporally accurate, detailed mo...
Junkun Jiang, Ho Yin Au, Jingyu Xiang, Jie Chen
Performance Evaluation of Open-Source Large Language Models for Assisting Pathology Report Writing in Japanese
The performance of large language models (LLMs) for supporting pathology report writing in Japanese remains unexplored. We evaluated seven open-source LLMs from three perspectives: (A) generation a...
Masataka Kawai, Singo Sakashita, Shumpei Ishikawa, Shogo Watanabe, Anna Matsuoka, Mikio Sakurai, ...
Leveraging Large Language Models and Survival Analysis for Early Prediction of Chemotherapy Outcomes
Chemotherapy for cancer treatment is costly and accompanied by severe side effects, highlighting the critical need for early prediction of treatment outcomes to improve patient management and infor...
Muhammad Faisal Shahid, Asad Afzal, Abdullah Faiz, Muhammad Siddiqui, Arbaz Khan Shehzad, Fatima ...
UtilityMax Prompting: A Formal Framework for Multi-Objective Large Language Model Optimization
The success of a Large Language Model (LLM) task depends heavily on its prompt. Most use-cases specify prompts using natural language, which is inherently ambiguous when multiple objectives must be...
Ofir Marom
Modeling Sequential Design Actions as Designer Externalization on an Infinite Canvas
Infinite canvas platforms are becoming central to contemporary design practice, enabling designers to externalize cognition through the spatial arrangement of multimodal artifacts. As AI agents inc...
Yejin Yun, Seung Won Lee, Jiin Choi, Kyung Hoon Hyun
Where Matters More Than What: Decoding-aligned KV Cache Compression via Position-aware Pseudo Queries
The Key-Value (KV) cache is crucial for efficient Large Language Model (LLM) inference, but excessively long contexts drastically increase the KV cache memory footprint. Existing KV cache compression...
Zhenxu Tian, Yi Su, Juntao Li, Min Zhang
One Supervisor, Many Modalities: Adaptive Tool Orchestration for Autonomous Queries
We present an agentic AI framework for autonomous multimodal query processing that coordinates specialized tools across text, image, audio, video, and document modalities. A central Supervisor dyna...
Mayank Saini, Arit Kumar Bishwas