Demo

Experience the PICon interrogation, test your own persona agent, or browse the leaderboard

How turns work: For example, selecting 30 turns means:
  • 10 warm-up questions — predefined get-to-know questions
  • 10+ interrogation questions — adaptive follow-ups & fact-checking confirmations
  • 10 retest questions — repeats of the warm-up questions to check consistency
The "+" appears because PICon may ask additional confirmation questions when verifying claims via web search. The Q counter shown during the interview tracks the total number of questions asked (including confirmations), not turns.
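The split above can be sketched in a few lines. This is a hypothetical helper, not part of the picon package; it assumes the warm-up and retest blocks are fixed at 10 questions each, with the remainder going to interrogation (which may then grow with extra confirmation questions — the "+"):

```python
def planned_turns(total_turns, warmup=10, retest=10):
    # Illustrative only: warm-up and retest are fixed-size blocks,
    # and everything left over is the interrogation budget. Actual
    # question counts can exceed this once confirmations are added.
    return {
        "warmup": warmup,
        "interrogation": total_turns - warmup - retest,
        "retest": retest,
    }
```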
Experience the interrogation yourself. You play a persona being interrogated by PICon's multi-turn questioning system. Answer as yourself — PICon will probe your responses with logically chained follow-ups and verify factual claims in real time. At the end, you'll see your consistency scores across all three dimensions. A full run is 50 turns (the default), but if you just want a quick taste, try 30 turns — the interrogation can take a while! After the final question, evaluation may take an additional 2–3 minutes.
Test your own persona agent. PICon will run the full interrogation pipeline and return a detailed consistency report. Results are automatically added to the leaderboard.
PICon Consistency Leaderboard. Baseline scores from the paper's evaluation targets, plus community-submitted agents. All baselines are evaluated under the same interrogation protocol (50 turns, 2 sessions).
# Agent Type Turns IC EC RC Area

IC = Internal Consistency (harmonic mean of non-contradiction & cooperativeness). EC = External Consistency (harmonic mean of non-refutation & coverage). RC = Retest Consistency (intra-session stability). Area = normalized triangle area on the IC–EC–RC radar chart.
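The aggregation above can be sketched as follows. This is a minimal illustration, not PICon's actual implementation: the harmonic mean is the standard two-value formula, and the radar area uses one common convention (three axes 120° apart, scaled so a perfect 1.0 on all axes gives Area = 1.0 — PICon's exact normalization may differ):

```python
import math

def harmonic_mean(a, b):
    # Harmonic mean of two sub-scores (e.g. non-contradiction
    # and cooperativeness for IC); 0.0 if either score is 0.
    return 2 * a * b / (a + b) if a + b > 0 else 0.0

def radar_triangle_area(ic, ec, rc):
    # Area of the triangle spanned by (IC, EC, RC) on a 3-axis
    # radar chart with axes 120 degrees apart, normalized so that
    # (1.0, 1.0, 1.0) yields an area of exactly 1.0.
    sin120 = math.sin(2 * math.pi / 3)
    raw = 0.5 * sin120 * (ic * ec + ec * rc + rc * ic)
    max_area = 0.5 * sin120 * 3
    return raw / max_area
```

Under this convention the normalized area reduces to (IC·EC + EC·RC + RC·IC) / 3, so a single weak dimension drags the area down sharply.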

Use via Python

Install the picon package to run evaluations programmatically — no web UI needed.

Installation
pip install picon
Quick Start
import picon

result = picon.run(
    persona="You are a 35-year-old software engineer named John...",
    name="John",
    model="gemini/gemini-2.5-flash",
    num_turns=20,
    num_sessions=2,
    do_eval=True,
)

print(result.eval_scores)
# {
#   "internal_harmonic_mean": 0.85,
#   "internal_responsiveness": 0.90,
#   "internal_consistency": 0.81,
#   "external_wilson": 0.72,
#   "inter_session_stability": 0.88,
#   "intra_session_stability": 0.91,
# }
result.save("results/john.json")
External Agent (Blackbox API)
# Only the endpoint URL is needed — persona is baked into the agent
result = picon.run(
    api_base="http://your-server.com/v1",      # OpenAI-compatible endpoint
    num_turns=30,
)
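Your agent only needs to speak the OpenAI chat-completions wire format at that endpoint. Here is a hedged, stdlib-only sketch of such a server — answer_as_persona, the port, and the exact response fields are illustrative assumptions, not PICon requirements:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def answer_as_persona(messages):
    # Hypothetical stand-in: route the chat history to your own
    # persona agent here. The persona lives entirely server-side.
    return "I'm John, a software engineer."

def make_completion(messages):
    # Shape the reply like an OpenAI chat-completions response body.
    return {
        "object": "chat.completion",
        "choices": [{
            "index": 0,
            "message": {"role": "assistant",
                        "content": answer_as_persona(messages)},
            "finish_reason": "stop",
        }],
    }

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and answer in OpenAI format.
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        reply = json.dumps(make_completion(body.get("messages", []))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(reply)

# To serve (blocks forever):
# HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
```

Point api_base at wherever this server is reachable; PICon treats the agent as a black box and never sees the persona prompt.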
Evaluate Existing Results
scores = picon.evaluate("results/john.json")
print(scores)

See the GitHub repository for full documentation, CLI usage, and advanced configuration.