AI Hub

Research Papers

arXiv · cs.AI, cs.LG, cs.CL · 20 papers

ActionParty: Multi-Subject Action Binding in Generative Video Games

Alexander Pondaven, Ziyi Wu, Igor Gilitschenski +4

Recent advances in video diffusion have enabled the development of "world models" capable of simulating interactive environments. However, these models are largely restricted to single-agent settings, failing to control multiple agents simultaneously

20h ago
cs.CV, cs.AI, cs.LG

Steerable Visual Representations

Jona Ruthardt, Manu Gaur, Deva Ramanan +2

Pretrained Vision Transformers (ViTs) such as DINOv2 and MAE provide generic image features that can be applied to a variety of downstream tasks such as retrieval, classification, and segmentation. However, such representations tend to focus on the m

20h ago
cs.CV, cs.AI

Grounded Token Initialization for New Vocabulary in LMs for Generative Recommendation

Daiwei Chen, Zhoutong Fu, Chengming Jiang +12

Language models (LMs) are increasingly extended with new learnable vocabulary tokens for domain-specific tasks, such as Semantic-ID tokens in generative recommendation. The standard practice initializes these new tokens as the mean of existing vocabu

20h ago
cs.CL, cs.AI, cs.LG

Batched Contextual Reinforcement: A Task-Scaling Law for Efficient Reasoning

Bangji Yang, Hongbo Ma, Jiajun Fan +1

Large Language Models employing Chain-of-Thought reasoning achieve strong performance but suffer from excessive token consumption that inflates inference costs. Existing efficiency methods such as explicit length penalties, difficulty estimators, or

20h ago
cs.LG, cs.AI, cs.CL

No Single Best Model for Diversity: Learning a Router for Sample Diversity

Yuhan Liu, Fangyuan Xu, Vishakh Padmakumar +2

When posed with prompts that permit a large number of valid answers, comprehensively generating them is the first step towards satisfying a wide range of users. In this paper, we study methods to elicit a comprehensive set of valid responses. To eval

20h ago
cs.CL

Beyond the Assistant Turn: User Turn Generation as a Probe of Interaction Awareness in Language Models

Sarath Shekkizhar, Romain Cosentino, Adam Earle

Standard LLM benchmarks evaluate the assistant turn: the model generates a response to an input, a verifier scores correctness, and the analysis ends. This paradigm leaves unmeasured whether the LLM encodes any awareness of what follows the assistant

20h ago
cs.AI

Topological Effects in Neural Network Field Theory

Christian Ferko, James Halverson, Vishnu Jejjala +1

Neural network field theory formulates field theory as a statistical ensemble of fields defined by a network architecture and a density on its parameters. We extend the construction to topological settings via the inclusion of discrete parameters tha

20h ago
hep-th, cs.LG

go-$m$HC: Direct Parameterization of Manifold-Constrained Hyper-Connections via Generalized Orthostochastic Matrices

Torque Dandachi, Sophia Diggs-Galligan

Doubly stochastic matrices enable learned mixing across residual streams, but parameterizing the set of doubly stochastic matrices (the Birkhoff polytope) exactly and efficiently remains an open challenge. Existing exact methods scale factorially wit

20h ago
cs.LG, cs.CL

VOID: Video Object and Interaction Deletion

Saman Motamed, William Harvey, Benjamin Klein +3

Existing video object removal methods excel at inpainting content "behind" the object and correcting appearance-level artifacts such as shadows and reflections. However, when the removed object has more significant interactions, such as collisions wi

20h ago
cs.CV, cs.AI

Taming the Exponential: A Fast Softmax Surrogate for Integer-Native Edge Inference

Dimitrios Danopoulos, Enrico Lupi, Michael Kagan +1

Softmax can become a computational bottleneck in the Transformer model's Multi-Head Attention (MHA) block, particularly in small models under low-precision inference, where exponentiation and normalization incur significant overhead. As such, we sugg

20h ago
cs.LG, cs.AR

Omni123: Exploring 3D Native Foundation Models with Limited 3D Data by Unifying Text to 2D and 3D Generation

Chongjie Ye, Cheng Cao, Chuanyu Pan +4

Recent multimodal large language models have achieved strong performance in unified text and image understanding and generation, yet extending such native capability to 3D remains challenging due to limited data. Compared to abundant 2D imagery, high

20h ago
cs.CV, cs.AI

Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing

Gengsheng Li, Tianyu Yang, Junfeng Fang +6

Reinforcement learning with verifiable rewards (RLVR) has become a standard paradigm for post-training large language models. While Group Relative Policy Optimization (GRPO) is widely adopted, its coarse credit assignment uniformly penalizes failed r

20h ago
cs.LG, cs.AI

Novel Memory Forgetting Techniques for Autonomous AI Agents: Balancing Relevance and Efficiency

Payal Fofadiya, Sunil Tiwari

Long-horizon conversational agents require persistent memory for coherent reasoning, yet uncontrolled accumulation causes temporal decay and false memory propagation. Benchmarks such as LOCOMO and LOCCO report performance degradation from 0.455 to 0.

20h ago
cs.AI, cs.CV

The Self Driving Portfolio: Agentic Architecture for Institutional Asset Management

Andrew Ang, Nazym Azimbayev, Andrey Kim

Agentic AI shifts the investor's role from analytical execution to oversight. We present an agentic strategic asset allocation pipeline in which approximately 50 specialized agents produce capital market assumptions, construct portfolios using over 2

20h ago
cs.AI, cs.MA, q-fin.GN

De Jure: Iterative LLM Self-Refinement for Structured Extraction of Regulatory Rules

Keerat Guliani, Deepkamal Gill, David Landsman +3

Regulatory documents encode legally binding obligations that LLM-based systems must respect. Yet converting dense, hierarchically structured legal text into machine-readable rules remains a costly, expert-intensive process. We present De Jure, a full

20h ago
cs.AI, cs.CL, cs.LG

Crystalite: A Lightweight Transformer for Efficient Crystal Modeling

Tin Hadži Veljković, Joshua Rosenthal, Ivor Lončarić +1

Generative models for crystalline materials often rely on equivariant graph neural networks, which capture geometric structure well but are costly to train and slow to sample. We present Crystalite, a lightweight diffusion Transformer for crystal mod

20h ago
cs.LG, cs.AI

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

Zhengxi Lu, Zhiyuan Yao, Jinyang Wu +7

Agent skills, structured packages of procedural knowledge and executable resources that agents dynamically load at inference time, have become a reliable mechanism for augmenting LLM agents. Yet inference-time skill augmentation is fundamentally limi

20h ago
cs.LG

Model-Based Reinforcement Learning for Control under Time-Varying Dynamics

Klemens Iten, Bruce Lee, Chenhao Li +3

Learning-based control methods typically assume stationary system dynamics, an assumption often violated in real-world systems due to drift, wear, or changing operating conditions. We study reinforcement learning for control under time-varying dynami

21h ago
cs.LG, cs.RO

Retrieval-Augmented Question Answering over Scientific Literature for the Electron-Ion Collider

Tina. J. Jat, T. Ghosh, Karthik Suresh

To harness the power of Language Models in answering domain-specific, specialized technical questions, Retrieval Augmented Generation (RAG) has been used widely. In this work, we have developed a Q&A application inspired by the Retrieval Augmented

21h ago
hep-ex, cs.AI, physics.ins-det

Best-Arm Identification with Noisy Actuation

Merve Karakas, Osama Hanna, Lin F. Yang +1

In this paper, we consider a multi-armed bandit (MAB) instance and study how to identify the best arm when arm commands are conveyed from a central learner to a distributed agent over a discrete memoryless channel (DMC). Depending on the agent capabi

21h ago
cs.IT, cs.LG