When Privacy Breaks Teamwork: Why AI Agents Fail to Cooperate

At a Glance

Privacy limits sharply reduce how well AI agents collaborate and make the initiating (private) agent drive outcomes, revealing a major imbalance that current models can’t handle reliably.

What They Found

Introducing explicit, owner-defined privacy limits causes collaboration performance to drop sharply and become dominated by the agent that owns the private data. Common failure modes include early leaks of sensitive details, overly vague summaries that lose task-critical information, and made-up facts when agents refuse or are unable to share needed data. Simple prompt tweaks can prevent some leaks, but they don’t reliably restore joint task performance; deeper changes to agent design and evaluation goals are needed.

Data Highlights

1. The benchmark evaluated 4 high-performing models (GPT-5.1, Claude-4.5, LLaMA-3.3-70B, Qwen-3-32B) under privacy constraints.

2. Removing explicit privacy instructions caused performance drops across all 4 tested models.

3. 100% of question-containing messages in single-private settings came from the private agent, showing one-sided information seeking.
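To make the third highlight concrete, here is a minimal sketch of how a per-role share of question-containing messages could be computed from a transcript. The function name `question_share_by_role`, the role labels, the sample transcript, and the "contains a question mark" heuristic are all assumptions for illustration; the source does not say how question-containing messages were identified.

```python
from collections import Counter


def question_share_by_role(messages):
    """Return each role's fraction of all question-containing messages.

    `messages` is a list of (role, text) tuples. A message counts as
    question-containing if it includes a "?" (an assumed heuristic).
    """
    counts = Counter(role for role, text in messages if "?" in text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {role: n / total for role, n in counts.items()}


# Hypothetical single-private transcript: only the private agent asks.
transcript = [
    ("private", "Which days next week are open on your side?"),
    ("non_private", "Tuesday and Thursday afternoons are open."),
    ("private", "Could Thursday at 3 pm work?"),
    ("non_private", "Yes, Thursday at 3 pm works."),
]

shares = question_share_by_role(transcript)
print(shares)
```

A share of 1.0 for the private role corresponds to the one-sided pattern reported above: the non-private partner only answers and never seeks information itself.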

What This Means

Engineers building multi-agent systems and product leads deciding how agents share data should care because privacy rules can turn collaboration into a one-way process and sharply reduce success rates. Researchers and reliability teams should use privacy-aware benchmarks like PAC-Bench to surface failure modes before deployment.

Key Figures

Figure 1: Illustration of privacy-constrained multi-agent collaboration under different agent ownership. Private agents must coordinate to achieve shared objectives while masking sensitive information. (a) Each agent maintains private memory and constraints (e.g., not revealing another company’s meeting schedule). (b) During collaboration, agents must communicate actionable proposals (e.g., scheduling) without leaking private details.

Figure 2: Overview of the evaluation framework design. Private agents interact in a turn-based manner, each equipped with memory and explicit privacy constraints that guide their reasoning and actions. The resulting interaction trajectory is evaluated by a module that assesses task success and detects potential privacy violations, enabling systematic analysis of collaborative behavior under privacy constraints.

Figure 3: End-to-end pipeline for constructing privacy-aware multi-agent collaboration tasks. The pipeline illustrates scenario and goal generation, subgoal decomposition across agents with different ownership, controlled information allocation under explicit privacy constraints, and constraint-aware action generation. Human and rule-based refinement ensures realistic privacy constraints, forming the PAC-Bench dataset.

Figure 4: Failure modes in joint privacy and task performance. This figure illustrates three failure modes induced by privacy constraints: early-stage privacy violations, where sensitive information is disclosed before disclosure strategies stabilize; over-conservative abstraction, in which agents preserve privacy by excessively abstracting task-relevant information; and privacy-induced hallucination, where agents generate incorrect task-relevant details instead of indicating uncertainty or refusal.

Yes, But...

PAC-Bench focuses on two-owner, two-agent scenarios, so results may not generalize to larger groups with coalition dynamics. The experiments use synthetic memories and constraints rather than real personal data, which avoids real-world noise but may miss some practical edge cases. Tool-usage behaviors were examined separately, so conclusions about tool-driven leaks should be cross-checked against that appendix analysis.

Methodology & More

PAC-Bench sets up realistic two-party collaboration tasks where each agent has an owner, a private memory, and explicit rules about what can or cannot be revealed. Agents interact in turns and are evaluated on both task success and privacy violations. The benchmark intentionally measures not just whether the goal is completed, but whether the process and result respect the owners’ disclosure constraints.

Across multiple large models, privacy constraints substantially lowered joint performance and created a strong asymmetry: private agents did all the questioning while non-private partners largely just responded, making the initiating agent effectively decide the outcome. Identified failure modes include early-stage privacy violations (sensitive details leaked before stable disclosure strategies form), over-conservative abstractions (critical details removed to avoid leaking), and privacy-induced hallucinations (agents invent specifics rather than admit uncertainty).

An ablation showed that explicit privacy instructions in agent prompts are necessary to prevent many immediate leaks, but encouraging step-by-step internal reasoning didn’t reliably improve combined task accuracy. The findings imply evaluation and agent design must treat privacy adherence as a primary objective, not an afterthought, and call for new methods that balance cooperation and confidentiality in multi-owner settings.
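The setup described above (owned agents with private memory and disclosure rules, a turn-based trajectory, and an evaluator that scores both task success and privacy violations) can be sketched roughly as follows. This is an illustrative toy, not PAC-Bench's implementation: the `Agent` and `Turn` classes, the `forbidden` string list, the substring-based leak check, and the `goal_check` callback are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    owner: str
    memory: dict      # private facts known only to this agent
    forbidden: list   # strings this agent must never reveal (assumed rule format)


@dataclass
class Turn:
    speaker: str
    text: str


def find_violations(agents, trajectory):
    """Flag turns where a speaker's message contains one of its own forbidden strings."""
    by_name = {a.name: a for a in agents}
    violations = []
    for i, turn in enumerate(trajectory):
        for secret in by_name[turn.speaker].forbidden:
            if secret.lower() in turn.text.lower():
                violations.append((i, turn.speaker, secret))
    return violations


def evaluate(agents, trajectory, goal_check):
    """Score a finished trajectory on task success and privacy adherence together."""
    violations = find_violations(agents, trajectory)
    task_success = bool(goal_check(trajectory))
    return {
        "task_success": task_success,
        "violations": violations,
        # Joint success requires reaching the goal without any leak.
        "joint_success": task_success and not violations,
    }


# Hypothetical scheduling scenario: agent A must not name the partner company.
agent_a = Agent("A", "owner_a", {"tue_10": "Acme sync"}, forbidden=["Acme"])
agent_b = Agent("B", "owner_b", {}, forbidden=[])

trajectory = [
    Turn("A", "Tuesday 10:00 is taken by the Acme sync; how about Thursday 15:00?"),
    Turn("B", "Thursday 15:00 works for us."),
]

result = evaluate(
    [agent_a, agent_b],
    trajectory,
    goal_check=lambda traj: "Thursday 15:00" in traj[-1].text,
)
print(result)
```

In this toy run the agents agree on a slot, so the task succeeds, but A names the partner company in its first message, so the joint score fails. That mirrors the early-stage violation pattern: the goal is reached while the disclosure constraint is broken along the way. A real evaluator would likely need something stronger than substring matching to catch paraphrased leaks.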

Credibility Assessment:

Multiple authors with moderate h-indexes (10–13) and an affiliation at Yonsei University (recognized institution), but still only an arXiv preprint.
