Demo

Experience the PICon interrogation, test your own persona agent, or browse the leaderboard.

Experience the interrogation yourself. You play as a persona being interrogated by PICon's multi-turn questioning system. Answer as yourself — PICon will probe your responses with logically chained follow-ups and verify factual claims in real time. At the end, you'll see your consistency scores across all three dimensions.
Test your own persona agent. Point PICon at your agent's OpenAI chat-completions-compatible API endpoint and provide a persona description. PICon runs the full interrogation pipeline and returns a detailed consistency report; results are automatically added to the leaderboard.
PICon Consistency Leaderboard. Baseline scores from the paper's evaluation targets, plus community-submitted agents. All baselines are evaluated under the same interrogation protocol (50 turns, 2 sessions).
# Agent Type Turns IC EC RC Area

IC = Internal Consistency (harmonic mean of non-contradiction & cooperativeness). EC = External Consistency (harmonic mean of non-refutation & coverage). RC = Retest Consistency (intra-session stability). Area = normalized triangle area on the IC–EC–RC radar chart.
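The composite scores can be recomputed from their sub-scores. A minimal sketch, assuming the radar axes are spaced 120° apart and Area is normalized by its maximum; the exact formulas are defined in the paper, so treat this as an illustration:

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two sub-scores; 0 if either sub-score is 0."""
    return 2 * a * b / (a + b) if a and b else 0.0

def radar_area(ic: float, ec: float, rc: float) -> float:
    """Normalized triangle area on a 3-axis radar chart.

    With axes 120 degrees apart, the triangle spanned by the three
    scores has area (sqrt(3)/4) * (ic*ec + ec*rc + rc*ic); dividing
    by the maximum (all scores = 1) yields a value in [0, 1].
    """
    return (ic * ec + ec * rc + rc * ic) / 3.0

# Using the illustrative sub-scores from the Quick Start output below:
print(round(harmonic_mean(0.90, 0.81), 2))  # 0.85
print(round(radar_area(0.85, 0.72, 0.91), 3))
```

Because the harmonic mean punishes imbalance, an agent cannot compensate for a near-zero sub-score with a high one, and the normalized area rewards agents that are consistent across all three dimensions rather than excelling on just one.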

Use via Python

Install the picon package to run evaluations programmatically — no web UI needed.

Installation
pip install picon
Quick Start
import picon

result = picon.run(
    persona="You are a 35-year-old software engineer named John...",
    name="John",
    model="gemini/gemini-2.5-flash",
    num_turns=20,
    num_sessions=2,
    do_eval=True,
)

print(result.eval_scores)
# {
#   "internal_harmonic_mean": 0.85,
#   "internal_responsiveness": 0.90,
#   "internal_consistency": 0.81,
#   "external_wilson": 0.72,
#   "inter_session_stability": 0.88,
#   "intra_session_stability": 0.91,
# }
result.save("results/john.json")
Self-Hosted Model
result = picon.run(
    persona="",                                # server manages persona
    name="Llama3",
    model="meta-llama/Llama-3-8B",
    api_base="http://localhost:8000/v1",       # OpenAI-compatible endpoint
    num_turns=30,
)
Evaluate Existing Results
scores = picon.evaluate("results/john.json")
print(scores)

See the GitHub repository for full documentation, CLI usage, and advanced configuration.