Demo

Experience the PICon interrogation, test your own persona agent, or browse the leaderboard

How turns work: For example, selecting 30 turns means:
  • 10 warm-up questions — predefined get-to-know questions
  • 10+ interrogation questions — adaptive follow-ups & fact-checking confirmations
  • 10 retest questions — repeats of the warm-up questions to check consistency
The "+" appears because PICon may ask additional confirmation questions when verifying claims via web search. The Q counter shown during the interview tracks the total number of questions asked (including confirmations), not turns.
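The split above can be sketched in a few lines. This is a hypothetical helper, not part of the picon package; it assumes the warm-up and retest blocks are fixed at 10 questions each, with the remainder going to interrogation (which may then grow with extra confirmation questions — the "+"):

```python
def planned_turns(total_turns, warmup=10, retest=10):
    # Illustrative only: warm-up and retest are fixed-size blocks,
    # and everything left over is the interrogation budget. Actual
    # question counts can exceed this once confirmations are added.
    return {
        "warmup": warmup,
        "interrogation": total_turns - warmup - retest,
        "retest": retest,
    }
```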
Experience the interrogation yourself. You play a persona being interrogated by PICon's multi-turn questioning system. Answer as yourself — PICon will probe your responses with logically chained follow-ups and verify factual claims in real time. At the end, you'll see your consistency scores across all three dimensions. A full run is 50 turns (the default), but if you just want a quick taste, try 30 turns — the interrogation can take a while! After the final question, evaluation may take an additional 2–3 minutes.
Test your own persona agent. PICon will run the full interrogation pipeline and return a detailed consistency report. Results are automatically added to the leaderboard.
PICon Consistency Leaderboard. Baseline scores from the paper's evaluation targets, plus community-submitted agents. All baselines are evaluated under the same interrogation protocol (50 turns, 2 sessions).
# Agent Type Turns IC EC RC Area

IC = Internal Consistency (harmonic mean of non-contradiction & cooperativeness). EC = External Consistency (harmonic mean of non-refutation & coverage). RC = Retest Consistency (intra-session stability). Area = normalized triangle area on the IC–EC–RC radar chart.
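The aggregation above can be sketched as follows. This is a minimal illustration, not PICon's actual implementation: the harmonic mean is the standard two-value formula, and the radar area uses one common convention (three axes 120° apart, scaled so a perfect 1.0 on all axes gives Area = 1.0 — PICon's exact normalization may differ):

```python
import math

def harmonic_mean(a, b):
    # Harmonic mean of two sub-scores (e.g. non-contradiction
    # and cooperativeness for IC); 0.0 if either score is 0.
    return 2 * a * b / (a + b) if a + b > 0 else 0.0

def radar_triangle_area(ic, ec, rc):
    # Area of the triangle spanned by (IC, EC, RC) on a 3-axis
    # radar chart with axes 120 degrees apart, normalized so that
    # (1.0, 1.0, 1.0) yields an area of exactly 1.0.
    sin120 = math.sin(2 * math.pi / 3)
    raw = 0.5 * sin120 * (ic * ec + ec * rc + rc * ic)
    max_area = 0.5 * sin120 * 3
    return raw / max_area
```

Under this convention the normalized area reduces to (IC·EC + EC·RC + RC·IC) / 3, so a single weak dimension drags the area down sharply.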

Use via Python

Install the picon package to run evaluations programmatically — no web UI needed.

Installation
pip install picon
Quick Start
import picon

result = picon.run(
    persona="You are a 35-year-old software engineer named John...",
    name="John",
    model="gemini/gemini-2.5-flash",
    num_turns=20,
    num_sessions=2,
    do_eval=True,
)

print(result.eval_scores)
# {
#   "internal_harmonic_mean": 0.85,
#   "internal_responsiveness": 0.90,
#   "internal_consistency": 0.81,
#   "external_wilson": 0.72,
#   "inter_session_stability": 0.88,
#   "intra_session_stability": 0.91,
# }
result.save("results/john.json")
External Agent (Blackbox API)
# Only the endpoint URL is needed — persona is baked into the agent
result = picon.run(
    api_base="http://your-server.com/v1",      # OpenAI-compatible endpoint
    num_turns=30,
)
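Your agent only needs to speak the OpenAI chat-completions wire format at that endpoint. Here is a hedged, stdlib-only sketch of such a server — answer_as_persona, the port, and the exact response fields are illustrative assumptions, not PICon requirements:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def answer_as_persona(messages):
    # Hypothetical stand-in: route the chat history to your own
    # persona agent here. The persona lives entirely server-side.
    return "I'm John, a software engineer."

def make_completion(messages):
    # Shape the reply like an OpenAI chat-completions response body.
    return {
        "object": "chat.completion",
        "choices": [{
            "index": 0,
            "message": {"role": "assistant",
                        "content": answer_as_persona(messages)},
            "finish_reason": "stop",
        }],
    }

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and answer in OpenAI format.
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        reply = json.dumps(make_completion(body.get("messages", []))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(reply)

# To serve (blocks forever):
# HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
```

Point api_base at wherever this server is reachable; PICon treats the agent as a black box and never sees the persona prompt.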
Evaluate Existing Results
scores = picon.evaluate("results/john.json")
print(scores)

See the GitHub repository for full documentation, CLI usage, and advanced configuration.