Demo
Experience the PICon interrogation, test your own persona agent, or browse the leaderboard
Experience the interrogation yourself.
You play as a persona being interrogated by PICon's multi-turn questioning system.
Answer as yourself — PICon will probe your responses with logically chained follow-ups
and verify factual claims in real time. At the end, you'll see your consistency scores across
all three dimensions.
Test your own persona agent.
Provide your agent's API endpoint (OpenAI chat-completions compatible) along with a persona
description. PICon will run the full interrogation pipeline and return a detailed consistency
report. Results are automatically added to the leaderboard.
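The only surface your agent needs to expose is the chat-completions POST route. Below is a minimal, illustrative stub using only the Python standard library — the handler name and the echo reply are placeholders for your persona agent's logic, not part of PICon:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class ChatHandler(BaseHTTPRequestHandler):
    """Minimal OpenAI chat-completions-compatible endpoint (illustrative stub)."""

    def do_POST(self):
        if self.path != "/v1/chat/completions":
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        request = json.loads(self.rfile.read(length))
        # Replace this echo with your persona agent's actual response logic.
        last_user = request["messages"][-1]["content"]
        reply = f"(echo) {last_user}"
        body = json.dumps({
            "id": "chatcmpl-demo",
            "object": "chat.completion",
            "model": request.get("model", "persona-agent"),
            "choices": [{
                "index": 0,
                "message": {"role": "assistant", "content": reply},
                "finish_reason": "stop",
            }],
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

# To serve: HTTPServer(("0.0.0.0", 8000), ChatHandler).serve_forever()
```

With a server like this running locally, you would submit `http://localhost:8000/v1` as the endpoint.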
PICon Consistency Leaderboard.
Baseline scores from the paper's evaluation targets, plus community-submitted agents.
All baselines are evaluated under the same interrogation protocol (50 turns, 2 sessions).
| # | Agent | Type | Turns | IC | EC | RC | Area |
|---|---|---|---|---|---|---|---|
IC = Internal Consistency (harmonic mean of non-contradiction & cooperativeness).
EC = External Consistency (harmonic mean of non-refutation & coverage).
RC = Retest Consistency (intra-session stability).
Area = normalized triangle area on the IC–EC–RC radar chart.
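As a concrete reading of these definitions, the sketch below computes a harmonic mean and the normalized radar-chart area. The normalization convention (dividing by the area when all three scores equal 1) is an assumption based on the description above, not taken from PICon's code:

```python
import math

def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two scores in [0, 1] (used for IC and EC)."""
    if a + b == 0:
        return 0.0
    return 2 * a * b / (a + b)

def radar_area(ic: float, ec: float, rc: float) -> float:
    """Normalized triangle area on a three-axis radar chart (assumed convention).

    The three axes sit 120 degrees apart, so the triangle splits into three
    sub-triangles of area (1/2) * r1 * r2 * sin(120deg). Dividing by the
    maximum area (all scores = 1) normalizes the result to [0, 1].
    """
    raw = (math.sqrt(3) / 4) * (ic * ec + ec * rc + rc * ic)
    max_area = 3 * math.sqrt(3) / 4
    return raw / max_area  # simplifies to (ic*ec + ec*rc + rc*ic) / 3

# Hypothetical example: IC from non-contradiction 0.81 and cooperativeness 0.90
ic = harmonic_mean(0.81, 0.90)
```

Because the harmonic mean is dominated by the smaller input, an agent cannot mask a low non-contradiction score with high cooperativeness (or vice versa).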
Use via Python
Install the picon package to run evaluations programmatically; no web UI needed.
Installation
```bash
pip install picon
```
Quick Start
```python
import picon

result = picon.run(
    persona="You are a 35-year-old software engineer named John...",
    name="John",
    model="gemini/gemini-2.5-flash",
    num_turns=20,
    num_sessions=2,
    do_eval=True,
)

print(result.eval_scores)
# {
#     "internal_harmonic_mean": 0.85,
#     "internal_responsiveness": 0.90,
#     "internal_consistency": 0.81,
#     "external_wilson": 0.72,
#     "inter_session_stability": 0.88,
#     "intra_session_stability": 0.91,
# }

result.save("results/john.json")
```
Self-Hosted Model
```python
result = picon.run(
    persona="",  # server manages persona
    name="Llama3",
    model="meta-llama/Llama-3-8B",
    api_base="http://localhost:8000/v1",  # OpenAI-compatible endpoint
    num_turns=30,
)
```
Evaluate Existing Results
```python
scores = picon.evaluate("results/john.json")
print(scores)
```
See the GitHub repository for full documentation, CLI usage, and advanced configuration.