Demo

Experience the PICon interrogation, test your own persona agent, or browse the leaderboard.

Experience the interrogation yourself. You play as a persona being interrogated by PICon's multi-turn questioning system. Answer as yourself — PICon will probe your responses with logically chained follow-ups and verify factual claims in real time. At the end, you'll see your consistency scores across all three dimensions.
Test your own persona agent. Point PICon at your agent's OpenAI chat-completions-compatible API endpoint and provide a persona description. PICon runs the full interrogation pipeline and returns a detailed consistency report; results are automatically added to the leaderboard.
PICon Consistency Leaderboard. Baseline scores from the paper's evaluation targets, plus community-submitted agents. All baselines are evaluated under the same interrogation protocol (50 turns, 2 sessions).
# Agent Type Turns IC EC RC Area

IC = Internal Consistency (harmonic mean of non-contradiction & cooperativeness). EC = External Consistency (harmonic mean of non-refutation & coverage). RC = Retest Consistency (intra-session stability). Area = normalized triangle area on the IC–EC–RC radar chart.
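The composite scores can be recomputed from their sub-scores. A minimal sketch, assuming the radar axes are spaced 120° apart and Area is normalized by its maximum; the exact formulas are defined in the paper, so treat this as an illustration:

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two sub-scores; 0 if either sub-score is 0."""
    return 2 * a * b / (a + b) if a and b else 0.0

def radar_area(ic: float, ec: float, rc: float) -> float:
    """Normalized triangle area on a 3-axis radar chart.

    With axes 120 degrees apart, the triangle spanned by the three
    scores has area (sqrt(3)/4) * (ic*ec + ec*rc + rc*ic); dividing
    by the maximum (all scores = 1) yields a value in [0, 1].
    """
    return (ic * ec + ec * rc + rc * ic) / 3.0

# Using the illustrative sub-scores from the Quick Start output below:
print(round(harmonic_mean(0.90, 0.81), 2))  # 0.85
print(round(radar_area(0.85, 0.72, 0.91), 3))
```

Because the harmonic mean punishes imbalance, an agent cannot compensate for a near-zero sub-score with a high one, and the normalized area rewards agents that are consistent across all three dimensions rather than excelling on just one.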

Use via Python

Install the picon package to run evaluations programmatically — no web UI needed.

Installation
pip install picon
Quick Start
import picon

result = picon.run(
    persona="You are a 35-year-old software engineer named John...",
    name="John",
    model="gemini/gemini-2.5-flash",
    num_turns=20,
    num_sessions=2,
    do_eval=True,
)

print(result.eval_scores)
# {
#   "internal_harmonic_mean": 0.85,
#   "internal_responsiveness": 0.90,
#   "internal_consistency": 0.81,
#   "external_wilson": 0.72,
#   "inter_session_stability": 0.88,
#   "intra_session_stability": 0.91,
# }
result.save("results/john.json")
Self-Hosted Model
result = picon.run(
    persona="",                                # server manages persona
    name="Llama3",
    model="meta-llama/Llama-3-8B",
    api_base="http://localhost:8000/v1",       # OpenAI-compatible endpoint
    num_turns=30,
)
Evaluate Existing Results
scores = picon.evaluate("results/john.json")
print(scores)

See the GitHub repository for full documentation, CLI usage, and advanced configuration.