Concept Note

LLM Consensus Engine

A panel of large language models debate a shared question, then anonymously vote on the most compelling response.

Tags: AI · Consensus · Debate · Reasoning · Collective intelligence


Executive Summary

The LLM Consensus Engine simulates structured debates between multiple large language models (GPT-4o, Claude, Gemini) on a shared question. Models present arguments, engage in multiple rounds of rebuttals, then anonymously vote on the most compelling response based on reasoning quality, clarity, and relevance.

The goal isn't to find "the right answer"—it's to explore how consensus emerges (or fails to) across different model architectures, how disagreement patterns develop, and what happens when AI systems evaluate each other's reasoning.

Core question: When AI models debate and vote on each other's arguments, what patterns emerge? How does consensus differ from individual model responses? What can we learn about reasoning, disagreement, and collective intelligence?

What It Does

Structured Debate Simulation

Models like GPT-4o, Claude, and Gemini participate in multi-round debates with initial arguments and rebuttals.

Anonymous Voting

After debate rounds, models vote on the most compelling response, evaluating reasoning quality rather than model identity.

Transparent Rationale

Each vote includes explicit reasoning, enabling post-analysis of consensus patterns and disagreement drivers.

Debug & Analysis Tools

Debug panels and voting rationale views help understand how consensus forms (or breaks down) across different questions.

Why This Is Interesting

Consensus vs. Individual Responses

Individual AI models give individual answers. But when models debate and vote collectively, different patterns emerge. Consensus voting can surface reasoning that no single model would have produced alone, or reveal fundamental disagreements that individual responses mask.

Cross-Model Reasoning Evaluation

The engine doesn't just aggregate responses; it has models evaluate each other's reasoning. GPT-4o votes on Claude's arguments. Claude evaluates Gemini's logic. This creates a meta-reasoning layer: AI systems judging AI systems, revealing how different architectures evaluate quality, coherence, and persuasiveness.

Exploring Disagreement Patterns

When do models converge? When do they diverge? The engine tracks consensus formation across rounds, showing how initial disagreements evolve (or don't) through structured debate. This illuminates both the strengths and blind spots of different reasoning approaches.

How It Works

Debate Structure

  1. Initial Arguments: Each model presents an opening position on the shared question
  2. Rebuttal Rounds: Models respond to each other's arguments, refining positions
  3. Voting Phase: Models anonymously evaluate all arguments and vote for the most compelling
  4. Rationale Disclosure: Voting rationales are revealed for analysis
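The first two phases can be sketched as a simple orchestration loop. This is a minimal illustration, not the engine's actual code: `ask(model, prompt)` stands in for whatever provider-specific completion call the app makes, and the prompt wording is invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Argument:
    model: str  # authorship is tracked here but hidden during voting
    text: str

def run_debate(models, question, ask, rebuttal_rounds=2):
    """Run the initial-argument and rebuttal phases described above.

    `models` is a list of model names; `ask(model, prompt)` is a
    hypothetical callable returning that model's completion.
    """
    # Phase 1 — Initial Arguments: each model states an opening position.
    transcript = [
        Argument(m, ask(m, f"Question: {question}\nState your position."))
        for m in models
    ]

    # Phase 2 — Rebuttal Rounds: each model sees the dialogue so far
    # and responds to the others' arguments, refining its position.
    for _ in range(rebuttal_rounds):
        history = "\n\n".join(a.text for a in transcript)
        transcript += [
            Argument(m, ask(m, f"{history}\n\nRebut or refine your position."))
            for m in models
        ]

    return transcript
```

The transcript then feeds the voting phase, where authorship metadata is stripped before arguments are shown to the voters.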

Voting Mechanism

Voting uses GPT-based evaluation of the full dialogue history. Models assess arguments on reasoning quality, clarity, relevance, and coherence—not model identity. Votes are anonymous to reduce bias toward specific architectures.

Design choice: Anonymous voting ensures consensus emerges from argument quality, not brand recognition or model reputation. The system evaluates reasoning, not sources.
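One way to implement that design choice is to shuffle the arguments and present them under neutral labels, so voters see content but not authorship. The sketch below is an assumption about how such a ballot could work, not the engine's actual implementation; `ask(model, prompt)` is again a hypothetical completion callable.

```python
import random
from collections import Counter

def anonymous_vote(models, arguments, ask):
    """Have each model vote for the most compelling argument.

    Arguments are shuffled and labeled A, B, C, ... so voters judge
    reasoning quality, not model identity. Returns vote tallies keyed
    by the argument's original index, plus each voter's rationale
    (kept for the rationale-disclosure phase).
    """
    order = list(range(len(arguments)))
    random.shuffle(order)  # hide positional and authorship cues
    labels = [chr(ord("A") + i) for i in range(len(order))]
    ballot = "\n\n".join(
        f"[{lab}] {arguments[idx]}" for lab, idx in zip(labels, order)
    )

    votes, rationales = Counter(), {}
    for m in models:
        reply = ask(m, f"{ballot}\n\nVote for the most compelling argument "
                       f"by label, then explain your reasoning.")
        label = reply.strip()[0]  # assume the reply leads with the label
        votes[order[labels.index(label)]] += 1
        rationales[m] = reply
    return votes, rationales
```

In practice the reply parsing would need to be more robust (models don't reliably lead with a bare label), but the shuffle-and-relabel step is the core of the anonymity guarantee.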

Technical Implementation

Built as a Streamlit application hosted on EC2, the engine orchestrates API calls to multiple LLM providers. It includes synthetic fallback logic for development/testing when real APIs are unavailable. The UI features collapsible sections for managing debate rounds, votes, and analysis.
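The synthetic fallback mentioned above might look something like the following. The environment-variable name and the `call_provider` helper are both assumptions for illustration; the point is only that the debate UI stays usable with deterministic canned replies when no real API is reachable.

```python
import os

def complete(model, prompt):
    """Return a completion, falling back to a synthetic reply when no
    API key is configured (useful for offline development/testing)."""
    if not os.environ.get("LLM_API_KEY"):  # assumed variable name
        # Synthetic fallback: deterministic placeholder so debate
        # rounds, voting, and the UI can be exercised without network.
        return f"[synthetic:{model}] Placeholder argument for: {prompt[:40]}"
    return call_provider(model, prompt)  # hypothetical real API call
```

A design note: keeping the fallback deterministic makes UI and analysis code testable, since the same "debate" replays identically across runs.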

Current Features

Planned Features

What This Explores

Research questions:

Broader Context

This project is part of a broader exploration of AI self-awareness, ethical reasoning, and collective intelligence. It asks: What happens when we treat AI systems not as isolated tools, but as a panel of reasoning agents that can debate, disagree, and find consensus?

The engine doesn't assume consensus is always desirable—sometimes disagreement reveals more than agreement. It's a tool for understanding how reasoning, evaluation, and collective decision-making work across different AI architectures.

Status

This page is a public concept note—shared for discussion and posterity. The LLM Consensus Engine is an active experiment, hosted on EC2 and accessible via SSH port forwarding. The codebase is part of ongoing research into AI reasoning, consensus formation, and collective intelligence.

Contributions welcome. This repo explores AI self-awareness, ethical reasoning, and collective intelligence. If you're into those things, you're in good company.


Published: December 2025 · Author: Sean Wylie · seanwylie.ca
