Mitigating Manipulation and Enhancing Persuasion: A Reflective Multi-Agent Approach for Legal Argument Generation

Overview

Large Language Models (LLMs) are increasingly being explored for complex legal tasks such as argument generation. However, they carry significant risks: generating manipulative content through hallucination, making ungrounded persuasive claims, failing to use the provided factual basis effectively, and failing to abstain when no viable argument exists. This project introduces a novel reflective multi-agent method to address these challenges, aiming to enhance legally compliant persuasion. We encourage you to read the paper for more details and explore our findings.

Our Approach: Reflective Multi-Agent System

Our system generates 3-ply legal arguments, which consist of the plaintiff's initial argument, the defendant's counterargument, and the plaintiff's rebuttal. At the core of our approach is the Reflective Multi-Agent (RMA) framework, in which an Argument Developer drafts each ply and two specialized LLM-based agents review it:

  • A Factor Analyst: Reviews the generated argument for substantive accuracy, identifies hallucinations (factors not present in input cases), and mandates abstention if an argument is untenable.
  • An Argument Polisher: Refines the argument for rhetorical and stylistic quality, ensuring clarity, coherence, and persuasiveness while adhering to the factual grounding established by the Factor Analyst.

These agents engage in an iterative reflection and refinement process for each ply of the argument, ensuring a structured and scrutinized output. (See Figures 1 and 2 in the paper for a visual depiction of the agentic structure and information flow).
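
To make the reflection cycle concrete, here is a minimal Python sketch of how it could be wired up. The `LLM` callable, the prompt wording, the fixed two-round stopping rule, and all helper names (`generate_ply`, `generate_three_ply`, `ThreePlyArgument`) are illustrative assumptions, not the paper's actual code; the real agents and iteration schedule may differ.

```python
# Minimal sketch of the RMA reflection loop, assuming a generic
# text-in/text-out LLM endpoint. Prompts and names are illustrative.
from dataclasses import dataclass
from typing import Callable

LLM = Callable[[str], str]  # e.g., a wrapper around any chat-completion API

@dataclass
class ThreePlyArgument:
    plaintiff_argument: str
    defendant_counter: str
    plaintiff_rebuttal: str

def generate_ply(task: str, context: str, llm: LLM, max_rounds: int = 2) -> str | None:
    """Draft one ply, then alternate Factor Analyst and Argument Polisher
    reviews. Returns None when the Factor Analyst mandates abstention."""
    draft = llm(f"You are the Argument Developer. {task}\nInput cases:\n{context}")
    for _ in range(max_rounds):
        review = llm(
            "You are the Factor Analyst. Flag any factor in the draft that is "
            "absent from the input cases (hallucination), and reply ABSTAIN if "
            f"no tenable argument exists.\nInput cases:\n{context}\nDraft:\n{draft}"
        )
        if "ABSTAIN" in review:
            return None  # untenable: refuse to argue rather than fabricate
        draft = llm(
            "You are the Argument Polisher. Improve clarity, coherence, and "
            "persuasiveness without adding factors beyond those the analyst "
            f"verified.\nReview:\n{review}\nDraft:\n{draft}"
        )
    return draft

def generate_three_ply(case_facts: str, llm: LLM) -> ThreePlyArgument | None:
    """Build the full exchange ply by ply, abstaining as a whole if any ply fails."""
    plies: list[str] = []
    for task in ("Write the plaintiff's initial argument.",
                 "Write the defendant's counterargument.",
                 "Write the plaintiff's rebuttal."):
        context = "\n\n".join([case_facts, *plies])
        ply = generate_ply(task, context, llm)
        if ply is None:
            return None  # abstention propagates to the whole exchange
        plies.append(ply)
    return ThreePlyArgument(*plies)
```

The key design point the sketch captures is that abstention is a first-class outcome: the analyst can veto a draft entirely, rather than the polisher merely smoothing over ungrounded content.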

Visualizing Our Approach

Figure 1: Overview of the Agentic Structure for Legal Argument Generation, including the RMA framework's reflective components (Factor Analyst, Argument Polisher) interacting with the Argument Developer.
Figure 2: Information Flow of Different Structures: Single Agent (SA), Single Agent with Enhanced Prompting (SA-EP), Multi-Agent Debate without Reflection (MA), and Reflective Multi-Agent (RMA).

How We Evaluated Our Approach

We conducted a rigorous evaluation of the Reflective Multi-Agent (RMA) framework by:

  • Comparing it against three baselines:
    • A single-agent LLM (SA).
    • A single-agent LLM with enhanced prompting (SA-EP).
    • A non-reflective multi-agent debate setup (MA).
  • Utilizing four diverse LLMs: GPT-4o, GPT-4o-mini, Llama-4-Maverick-17b-128e, and Llama-4-Scout-17b-16e.
  • Testing across three distinct legal scenarios based on U.S. trade secret misappropriation law, using a standardized set of 26 legal factors:
    • "Arguable": Scenarios where genuine grounds for argumentation exist.
    • "Mismatched": Scenarios where precedent outcomes conflict with their intended use, warranting abstention.
    • "Non-arguable": Scenarios with no relevant factor overlaps, also warranting abstention.
  • Measuring performance using three key metrics (a computational sketch follows this list):
    • Hallucination Accuracy: Quantifying the avoidance of fabricated or misattributed case factors.
    • Factor Utilization Recall: Measuring how comprehensively the LLM incorporates relevant provided case facts.
    • Successful Abstention Ratio: Evaluating the ability to correctly refrain from generating arguments in "mismatched" or "non-arguable" scenarios.
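
As a rough illustration of how these metrics can be scored from factor sets, below is a short Python sketch. The set-based definitions, edge-case handling, and factor IDs are our assumptions; the paper may aggregate differently (e.g., averaging per ply or per scenario).

```python
# Illustrative metric definitions over sets of factor IDs; names and
# edge-case conventions are assumptions, not the paper's exact scoring.
def hallucination_accuracy(cited: set[str], available: set[str]) -> float:
    """Share of cited factors that actually appear in the input cases."""
    if not cited:
        return 1.0  # nothing cited, nothing fabricated
    return len(cited & available) / len(cited)

def factor_utilization_recall(cited: set[str], relevant: set[str]) -> float:
    """Share of relevant provided factors the argument actually uses."""
    if not relevant:
        return 1.0
    return len(cited & relevant) / len(relevant)

def successful_abstention_ratio(abstained: list[bool]) -> float:
    """Fraction of abstention-warranted scenarios where the system abstained."""
    return sum(abstained) / len(abstained)

# Hypothetical factor IDs standing in for the 26-factor trade secret set:
cited = {"F1", "F6", "F21"}        # factors the generated argument invokes
available = {"F1", "F6", "F15"}    # factors actually present in the input cases
print(hallucination_accuracy(cited, available))     # 0.667: F21 is fabricated
print(factor_utilization_recall(cited, available))  # 0.667: F15 goes unused
```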

Takeaways

Our Reflective Multi-Agent (RMA) framework demonstrated significant advantages:

  • Vastly Superior Abstention: RMA abstained far more reliably in "mismatched" and "non-arguable" scenarios, where arguments cannot be legitimately grounded. This is crucial for preventing the generation of misleading or unsupportable legal claims.
  • Marked Improvements in Hallucination Accuracy: The RMA approach significantly reduced the fabrication and misattribution of factors, particularly in challenging "non-arguable" scenarios where other methods were more prone to inventing information.
  • Enhanced Factor Utilization Recall: RMA improved the incorporation of provided case facts into the generated arguments, leading to more substantively grounded outputs.
  • Fostering Ethical Persuasion: The structured reflection within the multi-agent framework offers a robust method for guiding LLMs towards more reliable and ethically sound outputs, mitigating manipulation.

Overall Impact

The Reflective Multi-Agent framework represents a critical step toward trustworthy AI in law. By systematically analyzing arguments for factual grounding, appropriate factor use, and the necessity of abstention, and by polishing them for clarity and coherence, our approach addresses key weaknesses in current LLM-based legal argument generation. These principles of role specialization, iterative refinement, and explicit analysis offer a promising direction for AI systems that are not only persuasive but also responsible and ethical, paving the way for more reliable, legally compliant intelligent chatbots.

Acknowledgments

We thank the Intelligent Systems Program at the University of Pittsburgh.

BibTeX

@misc{zhang2025mitigating,
      title={Mitigating Manipulation and Enhancing Persuasion: A Reflective Multi-Agent Approach for Legal Argument Generation},
      author={Li Zhang and Kevin D. Ashley},
      year={2025},
      eprint={2506.02992},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}