arXiv Papers Today | 2026-03-05

Automatic daily update: around 12:30 each day

Total papers: 1087

cs.CL (167 papers)

AriadneMem: Threading the Maze of Lifelong Memory for LLM Agents
Wenhui Zhu, Xiwen Chen, Zhipeng Wang, Jingjing Wang, Xuanzhao Dong, Minzhou Huang, Rui Cai, Hejian Sang, Hao Wang, Peijie Qiu, Yueyue Deng, Prayag Tiwari, Brendan Hogan Rappazzo, Yalin Wang
arXiv:2603.03290v1 Announce Type: new Abstract: Long-horizon LLM agents require memory systems that remain accurate under fixed context budgets. However, existing systems struggle with two persistent challenges in long-term dialogue: (i) disconnected evidence, where multi-hop answers requi...
One Bias After Another: Mechanistic Reward Shaping and Persistent Biases in Language Reward Models
Daniel Fein, Max Lamparth, Violet Xiang, Mykel J. Kochenderfer, Nick Haber
arXiv:2603.03291v1 Announce Type: new Abstract: Reward Models (RMs) are crucial for online alignment of language models (LMs) with human preferences. However, RM-based preference-tuning is vulnerable to reward hacking, whereby LM policies learn undesirable behaviors from flawed RMs. By systematical...
From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG
Wenhao Wu, Zhentao Tang, Yafu Li, Shixiong Kai, Mingxuan Yuan, Zhenhong Sun, Chunlin Chen, Zhi Wang
arXiv:2603.03292v1 Announce Type: new Abstract: Large Language Models (LLMs) exhibit high reasoning capacity in medical question-answering, but their tendency to produce hallucinations and outdated knowledge poses critical risks in healthcare fields. While Retrieval-Augmented Generation (RAG) mitig...
SE-Search: Self-Evolving Search Agent via Memory and Dense Reward
Jian Li, Yizhang Jin, Dongqi Liu, Hang Ding, Jiafu Wu, Dongsheng Chen, Yunhang Shen, Yulei Qin, Ying Tai, Chengjie Wang, Xiaotong Yuan, Yabiao Wang
arXiv:2603.03293v1 Announce Type: new Abstract: Retrieval augmented generation (RAG) reduces hallucinations and factual errors in large language models (LLMs) by conditioning generation on retrieved external knowledge. Recent search agents further cast RAG as an autonomous, multi-turn information-s...
Fine-Tuning and Evaluating Conversational AI for Agricultural Advisory
Sanyam Singh, Naga Ganesh, Vineet Singh, Lakshmi Pedapudi, Ritesh Kumar, SSP Jyothi, Archana Karanam, C. Yashoda, Mettu Vijaya Rekha Reddy, Shesha Phani Debbesa, Chandan Dash
arXiv:2603.03294v1 Announce Type: new Abstract: Large Language Models show promise for agricultural advisory, yet vanilla models exhibit unsupported recommendations, generic advice lacking specific, actionable detail, and communication styles misaligned with smallholder farmer needs. In high stakes...
Language Model Goal Selection Differs from Humans' in an Open-Ended Task
Gaia Molinaro, Dave August, Danielle Perszyk, Anne G. E. Collins
arXiv:2603.03295v1 Announce Type: new Abstract: As large language models (LLMs) get integrated into human decision-making, they are increasingly choosing goals autonomously rather than only completing human-defined ones, assuming they will reflect human preferences. However, human-LLM similarity in...
PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents
Ke Yang, Zixi Chen, Xuan He, Jize Jiang, Michel Galley, Chenglong Wang, Jianfeng Gao, Jiawei Han, ChengXiang Zhai
arXiv:2603.03296v1 Announce Type: new Abstract: Long-term memory is essential for large language model (LLM) agents operating in complex environments, yet existing memory designs are either task-specific and non-transferable, or task-agnostic but less effective due to low task-relevance and context...
TTSR: Test-Time Self-Reflection for Continual Reasoning Improvement
Haoyang He, Zihua Rong, Liangjie Zhao, Yunjia Zhao, Lan Yang, Honggang Zhang
arXiv:2603.03297v1 Announce Type: new Abstract: Test-time Training enables model adaptation using only test questions and offers a promising paradigm for improving the reasoning ability of large language models (LLMs). However, it faces two major challenges: test questions are often highly difficul...
TATRA: Training-Free Instance-Adaptive Prompting Through Rephrasing and Aggregation
Bartosz Dziuba, Kacper Kuchta, Paweł Batorski, Przemysław Spurek, Paul Swoboda
arXiv:2603.03298v1 Announce Type: new Abstract: Large Language Models (LLMs) have improved substantially in alignment, yet their behavior remains highly sensitive to prompt phrasing. This brittleness has motivated automated prompt engineering, but most existing methods (i) require a task-specific trai...
How LLMs Cite and Why It Matters: A Cross-Model Audit of Reference Fabrication in AI-Assisted Academic Writing and Methods to Detect Phantom Citations
MZ Naser
arXiv:2603.03299v1 Announce Type: new Abstract: Large language models (LLMs) have been noted to fabricate scholarly citations, yet the scope of this behavior across providers, domains, and prompting conditions remains poorly quantified. We present one of the largest citation hallucination audits to...

cs.CV (220 papers)

mHC-HSI: Clustering-Guided Hyper-Connection Mamba for Hyperspectral Image Classification
Yimin Zhu, Zack Dewis, Quinn Ledingham, Saeid Taleghanidoozdoozan, Mabel Heffring, Zhengsen Xu, Motasem Alkayid, Megan Greenwood, Lincoln Linlin Xu
arXiv:2603.03418v1 Announce Type: new Abstract: Recently, DeepSeek has invented the manifold-constrained hyper-connection (mHC) approach which has demonstrated significant improvements over the traditional residual connection in deep learning models \cite{xie2026mhc}. Nevertheless, this approach ha...
Beyond Accuracy: Evaluating Visual Grounding In Multimodal Medical Reasoning
Anas Zafar, Leema Krishna Murali, Ashish Vashist
arXiv:2603.03437v1 Announce Type: new Abstract: Recent work shows that text-only reinforcement learning with verifiable rewards (RLVR) can match or outperform image-text RLVR on multimodal medical VQA benchmarks, suggesting current evaluation protocols may fail to measure causal visual dependence. ...
Proact-VL: A Proactive VideoLLM for Real-Time AI Companions
Weicai Yan, Yuhong Dai, Qi Ran, Haodong Li, Wang Lin, Hao Liao, Xing Xie, Tao Jin, Jianxun Lian
arXiv:2603.03447v1 Announce Type: new Abstract: Proactive and real-time interactive experiences are essential for human-like AI companions, yet face three key challenges: (1) achieving low-latency inference under continuous streaming inputs, (2) autonomously deciding when to respond, and (3) contro...
Beyond Pixel Histories: World Models with Persistent 3D State
Samuel Garcin, Thomas Walker, Steven McDonagh, Tim Pearce, Hakan Bilen, Tianyu He, Kaixin Wang, Jiang Bian
arXiv:2603.03482v1 Announce Type: new Abstract: Interactive world models continually generate video by responding to a user's actions, enabling open-ended generation capabilities. However, existing models typically lack a 3D representation of the environment, meaning 3D consistency must be implicit...
Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion
Haoran Lu, Shang Wu, Jianshu Zhang, Maojiang Su, Guo Ye, Chenwei Xu, Lie Lu, Pranav Maneriker, Fan Du, Manling Li, Zhaoran Wang, Han Liu
arXiv:2603.03485v1 Announce Type: new Abstract: Recent video diffusion models have achieved impressive capabilities as large-scale generative world models. However, these models often struggle with fine-grained physical consistency, exhibiting physically implausible dynamics over time. In this work...
Geographically-Weighted Weakly Supervised Bayesian High-Resolution Transformer for 200m Resolution Pan-Arctic Sea Ice Concentration Mapping and Uncertainty Estimation using Sentinel-1, RCM, and AMSR2 Data
Mabel Heffring, Lincoln Linlin Xu
arXiv:2603.03503v1 Announce Type: new Abstract: Although high-resolution mapping of pan-Arctic sea ice with reliable corresponding uncertainty is essential for operational sea ice concentration (SIC) charting, it is a difficult task due to key challenges, such as the subtle nature of ice signature ...
PhyPrompt: RL-based Prompt Refinement for Physically Plausible Text-to-Video Generation
Shang Wu, Chenwei Xu, Zhuofan Xia, Weijian Li, Lie Lu, Pranav Maneriker, Fan Du, Manling Li, Han Liu
arXiv:2603.03505v1 Announce Type: new Abstract: State-of-the-art text-to-video (T2V) generators frequently violate physical laws despite high visual quality. We show this stems from insufficient physical constraints in prompts rather than model limitations: manually adding physics details reliably ...
PinCLIP: Large-scale Foundational Multimodal Representation at Pinterest
Josh Beal, Eric Kim, Jinfeng Rao, Rex Wu, Dmitry Kislyuk, Charles Rosenberg
arXiv:2603.03544v1 Announce Type: new Abstract: While multi-modal Visual Language Models (VLMs) have demonstrated significant success across various domains, the integration of VLMs into recommendation and retrieval systems remains a challenge, due to issues like training objective discrepancies an...
Modeling Cross-vision Synergy for Unified Large Vision Model
Shengqiong Wu, Lanhu Wu, Mingyang Bao, Wenhao Xu, Hanwang Zhang, Shuicheng Yan, Hao Fei, Tat-Seng Chua
arXiv:2603.03564v1 Announce Type: new Abstract: Recent advances in large vision models (LVMs) have shifted from modality-specific designs toward unified architectures that jointly process images, videos, and 3D data. However, existing unified LVMs primarily pursue functional integration, while over...
Confidence-aware Monocular Depth Estimation for Minimally Invasive Surgery
Muhammad Asad, Emanuele Colleoni, Pritesh Mehta, Nicolas Toussaint, Ricardo Sanchez-Matilla, Maria Robu, Faisal Bashir, Rahim Mohammadi, Imanol Luengo, Danail Stoyanov
arXiv:2603.03571v1 Announce Type: new Abstract: Purpose: Monocular depth estimation (MDE) is vital for scene understanding in minimally invasive surgery (MIS). However, endoscopic video sequences are often contaminated by smoke, specular reflections, blur, and occlusions, limiting the accuracy of M...

cs.LG (307 papers)

Knowledge Graph and Hypergraph Transformers with Repository-Attention and Journey-Based Role Transport
Mahesh Godavarti
arXiv:2603.03304v1 Announce Type: new Abstract: We present a concise architecture for joint training on sentences and structured data while keeping knowledge and language representations separable. The model treats knowledge graphs and hypergraphs as structured instances with role slots and encodes...
AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis
Pei Yang, Wanyi Chen, Yuxi Zheng, Xueqian Li, Xiang Li, Haoqin Tu, Jie Xiao, Yifan Pang, Bill Shi, Lynn Ai, Eric Yang
arXiv:2603.03378v1 Announce Type: new Abstract: Large language model (LLM) agents offer a promising data-driven approach to automating Site Reliability Engineering (SRE), yet their enterprise deployment is constrained by three challenges: restricted access to proprietary data, unsafe action executi...
RADAR: Learning to Route with Asymmetry-aware DistAnce Representations
Hang Yi, Ziwei Huang, Yining Ma, Zhiguang Cao
arXiv:2603.03388v1 Announce Type: new Abstract: Recent neural solvers have achieved strong performance on vehicle routing problems (VRPs), yet they mainly assume symmetric Euclidean distances, restricting applicability to real-world scenarios. A core challenge is encoding the relational features in...
Towards Improved Sentence Representations using Token Graphs
Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Zorah Lähner, Moshe Eliasof
arXiv:2603.03389v1 Announce Type: new Abstract: Obtaining a single-vector representation from a Large Language Model's (LLM) token-level outputs is a critical step for nearly all sentence-level tasks. However, standard pooling methods like mean or max aggregation treat tokens as an independent set,...
Heterogeneous Time Constants Improve Stability in Equilibrium Propagation
Yoshimasa Kubo, Suhani Pragnesh Modi, Smit Patel
arXiv:2603.03402v1 Announce Type: new Abstract: Equilibrium propagation (EP) is a biologically plausible alternative to backpropagation for training neural networks. However, existing EP models use a uniform scalar time step dt, which corresponds biologically to a membrane time constant that is het...
A Short Note on a Variant of the Squint Algorithm
Haipeng Luo
arXiv:2603.03409v1 Announce Type: new Abstract: This short note describes a simple variant of the Squint algorithm of Koolen and Van Erven [2015] for the classic expert problem. Via an equally simple modification of their proof, we prove that this variant ensures a regret bound that resembles the o...
[Re] FairDICE: A Gap Between Theory And Practice
Peter Adema, Karim Galliamov, Aleksey Evstratovskiy, Ross Geurts
arXiv:2603.03454v1 Announce Type: new Abstract: Offline Reinforcement Learning (RL) is an emerging field of RL in which policies are learned solely from demonstrations. Within offline RL, some environments involve balancing multiple objectives, but existing multi-objective offline RL algorithms do ...
Half the Nonlinearity Is Wasted: Measuring and Reallocating the Transformer's MLP Budget
Peter Balogh
arXiv:2603.03459v1 Announce Type: new Abstract: We investigate when transformer MLP nonlinearity is actually necessary. A gate with $d+1$ parameters decides when to replace the full MLP with a linear surrogate. Through systematic investigation across six models (162M-2.8B parameters), two architect...
Graph Hopfield Networks: Energy-Based Node Classification with Associative Memory
Abinav Rao, Alex Wa, Rishi Athavale
arXiv:2603.03464v1 Announce Type: new Abstract: We introduce Graph Hopfield Networks, whose energy function couples associative memory retrieval with graph Laplacian smoothing for node classification. Gradient descent on this joint energy yields an iterative update interleaving Hopfield retrieval w...
Biased Generalization in Diffusion Models
Jerome Garnier-Brun, Luca Biggio, Davide Beltrame, Marc Mézard, Luca Saglietti
arXiv:2603.03469v1 Announce Type: new Abstract: Generalization in generative modeling is defined as the ability to learn an underlying distribution from a finite dataset and produce novel samples, with evaluation largely driven by held-out performance and perceived sample quality. In practice, trai...

cs.AI (348 papers)

Asymmetric Goal Drift in Coding Agents Under Value Conflict
Magnus Saebo, Spencer Gibson, Tyler Crosse, Achyutha Menon, Eyon Jang, Diogo Cruz
arXiv:2603.03456v1 Announce Type: new Abstract: Agentic coding agents are increasingly deployed autonomously, at scale, and over long-context horizons. Throughout an agent's lifetime, it must navigate tensions between explicit instructions, learned values, and environmental pressures, often in cont...
Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants
Alejandro Breen Herrera, Aayush Sheth, Steven G. Xu, Zhucheng Zhan, Charles Wright, Marcus Yearwood, Hongtai Wei, Sudeep Das
arXiv:2603.03565v1 Announce Type: new Abstract: Conversational shopping assistants (CSAs) represent a compelling application of agentic AI, but moving from prototype to production reveals two underexplored challenges: how to evaluate multi-turn interactions and how to optimize tightly coupled multi...
Mozi: Governed Autonomy for Drug Discovery LLM Agents
He Cao, Siyu Liu, Fan Zhang, Zijing Liu, Hao Li, Bin Feng, Shengyuan Bai, Leqing Chen, Kai Xie, Yu Li
arXiv:2603.03655v1 Announce Type: new Abstract: Tool-augmented large language model (LLM) agents promise to unify scientific reasoning with computation, yet their deployment in high-stakes domains like drug discovery is bottlenecked by two critical barriers: unconstrained tool-use governance and po...
MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation
Lu Yang, Zelai Xu, Minyang Xie, Jiaxuan Gao, Zhao Shok, Yu Wang, Yi Wu
arXiv:2603.03680v1 Announce Type: new Abstract: Large Language Model (LLM) agents have demonstrated remarkable proficiency in learned tasks, yet they often struggle to adapt to non-stationary environments with feedback. While In-Context Learning and external memory offer some flexibility, they fail...
AI4S-SDS: A Neuro-Symbolic Solvent Design System via Sparse MCTS and Differentiable Physics Alignment
Jiangyu Chen
arXiv:2603.03686v1 Announce Type: new Abstract: Automated design of chemical formulations is a cornerstone of materials science, yet it requires navigating a high-dimensional combinatorial space involving discrete compositional choices and continuous geometric constraints. Existing Large Language M...
RAGNav: A Retrieval-Augmented Topological Reasoning Framework for Multi-Goal Visual-Language Navigation
Ling Luo, Qiangian Bai
arXiv:2603.03745v1 Announce Type: new Abstract: Vision-Language Navigation (VLN) is evolving from single-point pathfinding toward the more challenging Multi-Goal VLN. This task requires agents to accurately identify multiple entities while collaboratively reasoning over their spatial-physical const...
AgentSelect: Benchmark for Narrative Query-to-Agent Recommendation
Yunxiao Shi, Wujiang Xu, Tingwei Chen, Haoning Shang, Ling Yang, Yunfeng Wan, Zhuo Cao, Xing Zi, Dimitris N. Metaxas, Min Xu
arXiv:2603.03761v1 Announce Type: new Abstract: LLM agents are rapidly becoming the practical interface for task automation, yet the ecosystem lacks a principled way to choose among an exploding space of deployable configurations. Existing LLM leaderboards and tool/agent benchmarks evaluate compone...
LifeBench: A Benchmark for Long-Horizon Multi-Source Memory
Zihao Cheng, Weixin Wang, Yu Zhao, Ziyang Ren, Jiaxuan Chen, Ruiyang Xu, Shuai Huang, Yang Chen, Guowei Li, Mengshi Wang, Yi Xie, Ren Zhu, Zeren Jiang, Keda Lu, Yihong Li, Xiaoliang Wang, Liwei Liu, Cam-Tu Nguyen
arXiv:2603.03781v1 Announce Type: new Abstract: Long-term memory is fundamental for personalized agents capable of accumulating knowledge, reasoning over user experiences, and adapting across time. However, existing memory benchmarks primarily target declarative memory, specifically semantic and ep...
Specification-Driven Generation and Evaluation of Discrete-Event World Models via the DEVS Formalism
Zheyu Chen, Zhuohuan Li, Chuanhao Li
arXiv:2603.03784v1 Announce Type: new Abstract: World models are essential for planning and evaluation in agentic systems, yet existing approaches lie at two extremes: hand-engineered simulators that offer consistency and reproducibility but are costly to adapt, and implicit neural models that are ...
A Rubric-Supervised Critic from Sparse Real-World Outcomes
Xingyao Wang, Valerie Chen, Heng Ji, Graham Neubig
arXiv:2603.03800v1 Announce Type: new Abstract: Academic benchmarks for coding agents tend to reward autonomous task completion, measured by verifiable rewards such as unit-test success. In contrast, real-world coding agents operate with humans in the loop, where success signals are typically noisy...

cs.IR (29 papers)

MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning
Jiejun Tan, Zhicheng Dou, Liancheng Zhang, Yuyang Hu, Yiruo Cheng, Ji-Rong Wen
arXiv:2603.03379v1 Announce Type: new Abstract: As Large Language Models (LLMs) are increasingly used for long-duration tasks, maintaining effective long-term memory has become a critical challenge. Current methods often face a trade-off between cost and accuracy. Simple storage methods often fail ...
Behind the Prompt: The Agent-User Problem in Information Retrieval
Saber Zerhoudi, Michael Granitzer, Dang Hai Dang, Jelena Mitrovic, Florian Lemmerich, Annette Hautli-Janisz, Stefan Katzenbeisser, Kanishka Ghosh Dastidar
arXiv:2603.03630v1 Announce Type: new Abstract: User models in information retrieval rest on a foundational assumption that observed behavior reveals intent. This assumption collapses when the user is an AI agent privately configured by a human operator. For any action an agent takes, a hidden inst...
Not All Candidates are Created Equal: A Heterogeneity-Aware Approach to Pre-ranking in Recommender Systems
Pengfei Tong, Siyuan Chen, Chenwei Zhang, Bo Wang, Qi Pi, Pixun Li, Zuotao Liu
arXiv:2603.03770v1 Announce Type: new Abstract: Most large-scale recommender systems follow a multi-stage cascade of retrieval, pre-ranking, ranking, and re-ranking. A key challenge at the pre-ranking stage arises from the heterogeneity of training instances sampled from coarse-grained retrieval re...
DisenReason: Behavior Disentanglement and Latent Reasoning for Shared-Account Sequential Recommendation
Jiawei Cheng, Min Gao, Zongwei Wang, Xiaofei Zhu, Zhiyi Liu, Wentao Li, Wei Li, Huan Wu
arXiv:2603.03782v1 Announce Type: new Abstract: Shared-account usage is common on streaming and e-commerce platforms, where multiple users share one account. Existing shared-account sequential recommendation (SSR) methods often assume a fixed number of latent users per account, limiting their abili...
SORT: A Systematically Optimized Ranking Transformer for Industrial-scale Recommenders
Chunqi Wang, Bingchao Wu, Taotian Pang, Jiahao Wang, Jie Yang, Jia Liu, Hao Zhang, Hai Zhu, Lei Shen, Shizhun Wang, Bing Wang, Xiaoyi Zeng
arXiv:2603.03988v1 Announce Type: new Abstract: While Transformers have achieved remarkable success in LLMs through superior scalability, their application in industrial-scale ranking models remains nascent, hindered by the challenges of high feature sparsity and low label density. In this paper, w...
Constraint-Aware Generative Re-ranking for Multi-Objective Optimization in Advertising Feeds
Chenfei Li, Hantao Zhao, Weixi Yao, Ruiming Huang, Rongrong Lu, Geng Tian, Dongying Kong
arXiv:2603.04227v1 Announce Type: new Abstract: Optimizing reranking in advertising feeds is a constrained combinatorial problem, requiring simultaneous maximization of platform revenue and preservation of user experience. Recent generative ranking methods enable listwise optimization via autoregre...
CAMMSR: Category-Guided Attentive Mixture of Experts for Multimodal Sequential Recommendation
Jinfeng Xu, Zheyu Chen, Shuo Yang, Jinze Li, Hewei Wang, Yijie Li, Jianheng Tang, Yunhuai Liu, Edith C. H. Ngai
arXiv:2603.04320v1 Announce Type: new Abstract: The explosion of multimedia data in information-rich environments has intensified the challenges of personalized content discovery, positioning recommendation systems as an essential form of passive data management. Multimodal sequential recommendatio...
AriadneMem: Threading the Maze of Lifelong Memory for LLM Agents
Wenhui Zhu, Xiwen Chen, Zhipeng Wang, Jingjing Wang, Xuanzhao Dong, Minzhou Huang, Rui Cai, Hejian Sang, Hao Wang, Peijie Qiu, Yueyue Deng, Prayag Tiwari, Brendan Hogan Rappazzo, Yalin Wang
arXiv:2603.03290v1 Announce Type: cross Abstract: Long-horizon LLM agents require memory systems that remain accurate under fixed context budgets. However, existing systems struggle with two persistent challenges in long-term dialogue: (i) disconnected evidence, where multi-hop answers req...
From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG
Wenhao Wu, Zhentao Tang, Yafu Li, Shixiong Kai, Mingxuan Yuan, Zhenhong Sun, Chunlin Chen, Zhi Wang
arXiv:2603.03292v1 Announce Type: cross Abstract: Large Language Models (LLMs) exhibit high reasoning capacity in medical question-answering, but their tendency to produce hallucinations and outdated knowledge poses critical risks in healthcare fields. While Retrieval-Augmented Generation (RAG) mit...
PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents
Ke Yang, Zixi Chen, Xuan He, Jize Jiang, Michel Galley, Chenglong Wang, Jianfeng Gao, Jiawei Han, ChengXiang Zhai
arXiv:2603.03296v1 Announce Type: cross Abstract: Long-term memory is essential for large language model (LLM) agents operating in complex environments, yet existing memory designs are either task-specific and non-transferable, or task-agnostic but less effective due to low task-relevance and conte...

cs.MA (16 papers)

Multi-Agent Influence Diagrams to Hybrid Threat Modeling
Maarten C. Vonk, Anna V. Kononova, Thomas Bäck, Tim Sweijs
arXiv:2603.03526v1 Announce Type: new Abstract: Western governments have adopted an assortment of counter-hybrid threat measures to defend against hostile actions below the conventional military threshold. The impact of these measures is unclear because of the ambiguity of hybrid threats, their cro...
Molt Dynamics: Emergent Social Phenomena in Autonomous AI Agent Populations
Brandon Yee, Krishna Sharma
arXiv:2603.03555v1 Announce Type: new Abstract: MoltBook is a large-scale multi-agent coordination environment where over 770,000 autonomous LLM agents interact without human participation, offering the first opportunity we are aware of to observe emergent multi-agent coordination dynamics at this ...
Social Norm Reasoning in Multimodal Language Models: An Evaluation
Oishik Chowdhury, Anushka Debnath, Bastin Tony Roy Savarimuthu
arXiv:2603.03590v1 Announce Type: new Abstract: In Multi-Agent Systems (MAS), agents are designed with social capabilities, allowing them to understand and reason about social concepts such as norms when interacting with others (e.g., inter-robot interactions). In Normative MAS (NorMAS), researcher...
Learning Approximate Nash Equilibria in Cooperative Multi-Agent Reinforcement Learning via Mean-Field Subsampling
Emile Anand, Ishani Karmarkar
arXiv:2603.03759v1 Announce Type: new Abstract: Many large-scale platforms and networked control systems have a centralized decision maker interacting with a massive population of agents under strict observability constraints. Motivated by such applications, we study a cooperative Markov game with ...
MACC: Multi-Agent Collaborative Competition for Scientific Exploration
Satoshi Oyama, Yuko Sakurai, Hisashi Kashima
arXiv:2603.03780v1 Announce Type: new Abstract: Scientific discovery still relies heavily on the manual efforts of individual researchers, leading to limited exploration, redundant trials, and reduced reproducibility. Human-participant data analysis competitions generate diverse approaches, yet flu...
TritonDFT: Automating DFT with a Multi-Agent Framework
Zhengding Hu, Kuntal Talit, Zhen Wang, Haseeb Ahmad, Yichen Lin, Prabhleen Kaur, Christopher Lane, Elizabeth A. Peterson, Zhiting Hu, Elizabeth A. Nowadnick, Yufei Ding
arXiv:2603.03372v1 Announce Type: cross Abstract: Density Functional Theory (DFT) is a cornerstone of materials science, yet executing DFT in practice requires coordinating a complex, multi-step workflow. Existing tools and LLM-based solutions automate parts of the steps, but lack support for full ...
Behind the Prompt: The Agent-User Problem in Information Retrieval
Saber Zerhoudi, Michael Granitzer, Dang Hai Dang, Jelena Mitrovic, Florian Lemmerich, Annette Hautli-Janisz, Stefan Katzenbeisser, Kanishka Ghosh Dastidar
arXiv:2603.03630v1 Announce Type: cross Abstract: User models in information retrieval rest on a foundational assumption that observed behavior reveals intent. This assumption collapses when the user is an AI agent privately configured by a human operator. For any action an agent takes, a hidden in...
Principled Learning-to-Communicate with Quasi-Classical Information Structures
Xiangyu Liu, Haoyi You, Kaiqing Zhang
arXiv:2603.03664v1 Announce Type: cross Abstract: Learning-to-communicate (LTC) in partially observable environments has received increasing attention in deep multi-agent reinforcement learning, where the control and communication strategies are jointly learned. Meanwhile, the impact of communicati...
In-Context Environments Induce Evaluation-Awareness in Language Models
Maheep Chaudhary
arXiv:2603.03824v1 Announce Type: cross Abstract: Humans often become more self-aware under threat, yet can lose self-awareness when absorbed in a task; we hypothesize that language models exhibit environment-dependent evaluation awareness. This raises concerns that models could strategica...
Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization
Furkan Mumcu, Yasin Yilmaz
arXiv:2603.04378v1 Announce Type: cross Abstract: As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability when highly non-linear policies induce extreme local curvature in the inner maximization. S...