Demo
Experience the PICon interrogation, test your own persona agent, or browse the leaderboard
How turns work: For example, selecting 30 turns means:
- 10 warm-up questions — predefined get-to-know questions
- 20+ interrogation questions — adaptive follow-ups & fact-checking confirmations
- 10 retest questions — repeats of warm-up to check consistency
Experience the interrogation yourself.
You play as a persona being interrogated by PICon's multi-turn questioning system.
Answer as yourself — PICon will probe your responses with logically chained follow-ups
and verify factual claims in real time. At the end, you'll see your consistency scores across
all three dimensions.
A full run is 50 turns (default), but if you just want a quick taste,
try 30 turns — the interrogation can take a while!
After all questions are done, evaluation may take an additional 2–3 minutes.
Test your own persona agent.
PICon will run the full interrogation pipeline and return a detailed consistency report.
Results are automatically added to the leaderboard.
PICon Consistency Leaderboard.
Baseline scores from the paper's evaluation targets, plus community-submitted agents.
All baselines are evaluated under the same interrogation protocol (50 turns, 2 sessions).
| # | Agent | Type | Turns | IC | EC | RC | Area |
|---|
IC = Internal Consistency (harmonic mean of non-contradiction & cooperativeness). EC = External Consistency (harmonic mean of non-refutation & coverage). RC = Retest Consistency (intra-session stability). Area = normalized triangle area on the IC–EC–RC radar chart.
Use via Python
Install the picon package to run evaluations programmatically —
no web UI needed.
Installation
pip install picon
Quick Start
import picon
result = picon.run(
persona="You are a 35-year-old software engineer named John...",
name="John",
model="gemini/gemini-2.5-flash",
num_turns=20,
num_sessions=2,
do_eval=True,
)
print(result.eval_scores)
# {
# "internal_harmonic_mean": 0.85,
# "internal_responsiveness": 0.90,
# "internal_consistency": 0.81,
# "external_wilson": 0.72,
# "inter_session_stability": 0.88,
# "intra_session_stability": 0.91,
# }
result.save("results/john.json")
External Agent (Blackbox API)
# Only the endpoint URL is needed — persona is baked into the agent
result = picon.run(
api_base="http://your-server.com/v1", # OpenAI-compatible endpoint
num_turns=30,
)
Evaluate Existing Results
scores = picon.evaluate("results/john.json")
print(scores)
See the GitHub repository for full documentation, CLI usage, and advanced configuration.