PICon

A multi-turn interrogation framework for evaluating whether LLM-based persona agents maintain consistency under sustained, structured questioning — inspired by real-world interrogation methodology.

What is PICon?

LLM-based persona agents are increasingly used as proxies for real human participants in medical training, social science, and product design. But how do you know if a persona agent is truly consistent — or just superficially convincing?

PICon (Persona Interrogation framework for CONsistency evaluation) applies principles from interrogation methodology to systematically probe persona agents through logically chained multi-turn questioning, exposing contradictions that simpler evaluations miss.
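The chaining idea can be sketched in a few lines. This is a hypothetical illustration, not PICon's actual implementation: the `agent` and `followup_for` callables are assumed interfaces, and the point is only that each question is conditioned on the full transcript so far, letting later turns probe claims made in earlier ones.

```python
# Hypothetical sketch of a chained interrogation loop (assumed interfaces,
# not the actual PICon code). Each follow-up question is generated from the
# persona agent's earlier answers.

def interrogate(agent, seed_question, followup_for, max_turns=5):
    """Run a chained interrogation.

    agent:        callable(question, history) -> answer  (persona under test)
    followup_for: callable(history) -> next question, or None to stop
    """
    history = []
    question = seed_question
    for _ in range(max_turns):
        answer = agent(question, history)
        history.append((question, answer))
        question = followup_for(history)
        if question is None:
            break
    return history

# Toy usage with a scripted agent and a fixed two-question chain.
script = {
    "How old are you?": "I'm 34.",
    "What year were you born?": "1989.",
}
chain = ["How old are you?", "What year were you born?"]

def toy_agent(question, history):
    return script[question]

def next_question(history):
    i = len(history)
    return chain[i] if i < len(chain) else None

transcript = interrogate(toy_agent, chain[0], next_question)
# transcript holds both (question, answer) turns in order
```

A consistency checker can then be run over the accumulated transcript rather than over isolated answers.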

Three Dimensions of Consistency

🔍 Internal Consistency: freedom from self-contradiction across all preceding utterances

🌐 External Consistency: alignment of factual claims with real-world evidence via web search

🔄 Retest Consistency: stability of responses when the same questions are re-asked
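Of the three dimensions, retest consistency is the simplest to make concrete. The sketch below is an assumed scoring scheme for illustration only (PICon's actual metric may differ, and internal/external consistency additionally require contradiction detection and web search): re-ask the same questions and measure how often the agent answers identically.

```python
# Illustrative retest-consistency score (assumed metric, not PICon's):
# fraction of questions answered identically on a second pass.
import itertools

def retest_consistency(agent, questions):
    first = [agent(q) for q in questions]
    second = [agent(q) for q in questions]
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(questions)

# Toy agent: stable on a demographic question, unstable on another.
flip = itertools.cycle(["Paris", "Lyon"])

def toy_agent(question):
    if question == "Where do you live?":
        return next(flip)   # changes its answer every time it is asked
    return "34"             # stable answer

score = retest_consistency(toy_agent, ["How old are you?", "Where do you live?"])
# score == 0.5: one of the two questions survives the retest
```

In practice exact string equality would be replaced by a semantic-equivalence check, since a consistent agent may legitimately rephrase the same answer.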

Key Findings

Prompting beats fine-tuning

The simplest approach wins — prompt-based persona agents outperform both fine-tuned and RAG-based systems under sustained interrogation, challenging the assumption that more complex architectures yield more consistent personas.

Instability is baked in

Some agents contradict themselves on basic demographic questions even without any prior context to confuse them — the inconsistency isn’t situational, it’s structural.

Hidden contradictions surface under pressure

Single-turn or pairwise checks miss contradictions that only appear when three or more statements are considered together — PICon’s chained questioning is designed to find exactly these.
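A toy example shows why pairwise checks are insufficient. This is a hypothetical illustration (not PICon code) using "older than" claims modeled as a directed graph: every pair of statements is jointly satisfiable, but all three together form a cycle.

```python
# Hypothetical illustration of a higher-order contradiction: each statement
# is an "a is older than b" constraint; a set of constraints is consistent
# iff the directed graph they induce has no cycle.
from itertools import combinations

def consistent(constraints):
    """True iff the 'older than' constraints contain no cycle."""
    edges = list(constraints)

    def reaches(start, goal):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(b for a, b in edges if a == node)
        return False

    # An edge (a, b) lies on a cycle iff b can reach a.
    return not any(reaches(b, a) for a, b in edges)

statements = [("me", "sister"), ("sister", "cousin"), ("cousin", "me")]

pairwise_ok = all(consistent(pair) for pair in combinations(statements, 2))
triple_ok = consistent(statements)
# pairwise_ok is True, triple_ok is False: the contradiction only
# appears when all three statements are considered together.
```

A checker that only compares statements two at a time would pass this persona; only a check over the full statement set exposes the cycle, which is the kind of contradiction chained questioning is built to surface.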