PICon
A multi-turn interrogation framework for evaluating whether LLM-based persona agents maintain consistency under sustained, structured questioning — inspired by real-world interrogation methodology.
What is PICon?
LLM-based persona agents are increasingly used as proxies for real human participants in medical training, social science, and product design. But how do you know if a persona agent is truly consistent — or just superficially convincing?
PICon (Persona Interrogation framework for CONsistency evaluation) applies principles from interrogation methodology to systematically probe persona agents through logically chained multi-turn questioning, exposing contradictions that simpler evaluations miss.
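In outline, a chained interrogation run might look like the minimal Python sketch below. The names (`interrogate`, `chain_follow_up`, the stub agent) are illustrative, not PICon's actual API; the point is only that each follow-up question is conditioned on the transcript so far, which is what lets later turns probe claims made in earlier ones.

```python
def interrogate(agent, seed_questions, chain_follow_up):
    """Run a multi-turn interrogation and return the transcript.

    agent(question, transcript) -> answer   (stand-in for the persona agent)
    chain_follow_up(transcript) -> question or None
    """
    transcript = []  # list of (question, answer) pairs
    for question in seed_questions:
        answer = agent(question, transcript)
        transcript.append((question, answer))
        # Chain a follow-up that references an earlier answer, if one applies.
        follow_up = chain_follow_up(transcript)
        if follow_up:
            transcript.append((follow_up, agent(follow_up, transcript)))
    return transcript
```

Contradiction checks then run over the full transcript, not over isolated question-answer pairs.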
Three Dimensions of Consistency
Internal Consistency
Freedom from self-contradiction across everything the agent has said so far in the conversation
External Consistency
Alignment of factual claims with real-world evidence via web search
Retest Consistency
Stability of responses when the same questions are re-asked
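Each dimension reduces naturally to a fraction of checks passed. A minimal sketch, where the predicate functions (`contradicts`, `supported_by_evidence`, `same_meaning`) are hypothetical stand-ins for a contradiction judge, a web-search verifier, and a semantic-equivalence check:

```python
def internal_consistency(utterances, contradicts):
    # Fraction of utterance pairs that are free of self-contradiction.
    pairs = [(a, b) for i, a in enumerate(utterances) for b in utterances[i + 1:]]
    if not pairs:
        return 1.0
    return 1 - sum(contradicts(a, b) for a, b in pairs) / len(pairs)

def external_consistency(claims, supported_by_evidence):
    # Fraction of factual claims that align with retrieved evidence.
    return sum(map(supported_by_evidence, claims)) / len(claims) if claims else 1.0

def retest_consistency(first_answers, second_answers, same_meaning):
    # Fraction of re-asked questions whose answers stay stable.
    pairs = list(zip(first_answers, second_answers))
    return sum(same_meaning(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0
```

In a real run the predicates would be model- or search-backed; here they are injected so the scoring logic itself stays simple and testable.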
Key Findings
Prompting beats fine-tuning
The simplest approach wins — prompt-based persona agents outperform both fine-tuned and RAG-based systems under sustained interrogation, challenging the assumption that more complex architectures yield more consistent personas.
Instability is baked in
Some agents contradict themselves on basic demographic questions even without any prior context to confuse them — the inconsistency isn’t situational, it’s structural.
Hidden contradictions surface under pressure
Single-turn or pairwise checks miss contradictions that only appear when three or more statements are considered together — PICon’s chained questioning is designed to find exactly these.
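A toy illustration of why pairwise checks are not enough: three "older than" claims ("I am older than Alice", "Alice is older than Bob", "Bob is older than me") are compatible in every pair, yet jointly impossible because they form a cycle. The encoding below (statements as directed edges, joint contradiction as a cycle) is a simplification for illustration, not PICon's actual contradiction detector.

```python
from itertools import combinations

def has_cycle(edges):
    # Detect a cycle in the "older-than" relation via depth-first search.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    def visit(node, path):
        if node in path:
            return True
        return any(visit(nxt, path | {node}) for nxt in graph.get(node, []))
    return any(visit(n, set()) for n in graph)

def contradictory_subsets(statements, min_size=2):
    # Return every subset of statements that is jointly contradictory.
    return [subset
            for k in range(min_size, len(statements) + 1)
            for subset in combinations(statements, k)
            if has_cycle(subset)]
```

For the three claims above, no pair is contradictory, but the full triple is; that is the class of hidden contradiction chained questioning is built to surface.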