Our Methodology
GATOS (Generative AI-enabled Theme Organization and Structuring) is a peer-reviewed methodology for AI-assisted thematic analysis with full traceability.
Every insight traces back to source data. No hallucination. No black boxes.
The Problem
Most AI tools generate "insights" without showing how they reached them. You can't verify whether conclusions are grounded in your data or invented by the model.
Large language models can confidently generate plausible-sounding insights that aren't actually in your data. Without traceability, you can't tell the difference.
When an AI tool tells you "67% of customers mentioned X," can you verify that number? Can you see the actual customer statements that contributed to it?
Summarization-based approaches lose the nuance of individual voices. Themes become abstractions disconnected from the people who expressed them.
The Solution
GATOS maintains a chain of custody from source data to final themes. Every insight can be traced back to specific participant utterances.
Utterances: Original participant feedback
Extracts: Discrete summary points
Clusters: Semantic groupings
Codes: Grounded codes
Themes: Traceable insights
The Key Innovation
Every theme can be traced back through codes → clusters → extracts → original utterances. You can verify exactly which participant voices contributed to each insight.
Key: Every theme traces back through codes, clusters, and extracts to specific participant utterances, so each one can be verified against the source data.
Deep Dive
Raw participant utterances are distilled into discrete summary points—each capturing a single idea in the participant's own framing.
// Raw utterance:
"I waited forever in the ER and nobody told me what was happening. The nurse was nice though."
// Extracts:
→ "Long wait time in emergency department"
→ "Lack of communication during wait"
→ "Positive interaction with nursing staff"
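One way to keep each extract tied to its source is to store the utterance ID alongside the extract text. The sketch below is illustrative only: the `Utterance` and `Extract` classes, their field names, and the IDs are assumptions, not the GATOS implementation, and the extracts are hard-coded rather than produced by a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    utterance_id: str  # hypothetical identifier scheme
    participant: str
    text: str


@dataclass(frozen=True)
class Extract:
    text: str
    utterance_id: str  # provenance link back to the source utterance


utterance = Utterance(
    utterance_id="u-001",
    participant="Patient A",
    text=(
        "I waited forever in the ER and nobody told me what was happening. "
        "The nurse was nice though."
    ),
)

# Hard-coded for illustration; in practice a model would propose these.
extracts = [
    Extract("Long wait time in emergency department", utterance.utterance_id),
    Extract("Lack of communication during wait", utterance.utterance_id),
    Extract("Positive interaction with nursing staff", utterance.utterance_id),
]

# Index utterances by ID so any extract can be resolved to its source text.
utterances_by_id = {utterance.utterance_id: utterance}

for extract in extracts:
    source = utterances_by_id[extract.utterance_id]
    print(f"{extract.text!r} <- {source.participant}")
```

Because the link is stored on the extract itself, every later step can carry it forward without re-deriving where a statement came from.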
Extracts are embedded into a vector space, reduced in dimensionality with PCA and UMAP, and then grouped with agglomerative clustering. Similar ideas from different participants converge naturally.
This step reveals natural groupings in the data without imposing predefined categories. Patterns emerge from the participants' own language.
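The grouping step can be sketched in a few lines. This is a simplified stand-in, not the GATOS pipeline: it skips the embedding model, PCA, and UMAP entirely, uses made-up 2-D vectors in place of reduced embeddings, and implements a small average-linkage agglomerative clustering by hand. The extract texts other than those in the example above, the vector values, and the distance threshold are all assumptions.

```python
import math
from itertools import combinations

# Toy 2-D vectors standing in for reduced embeddings (made-up values).
points = {
    "Long wait time in emergency department": (0.9, 0.1),
    "Waited hours before being seen": (0.85, 0.2),
    "No updates during wait": (0.2, 0.9),
    "Lack of communication during wait": (0.1, 0.8),
}


def average_linkage(a, b):
    """Mean Euclidean distance between all cross-cluster pairs."""
    total = sum(math.dist(points[x], points[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def agglomerate(labels, distance_threshold):
    """Merge the two closest clusters until none are within the threshold."""
    clusters = [[label] for label in labels]
    while len(clusters) > 1:
        (i, j), best = min(
            (
                ((i, j), average_linkage(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best > distance_threshold:
            break
        # i < j is guaranteed by combinations, so deleting j is safe.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


clusters = agglomerate(list(points), distance_threshold=0.5)
for members in clusters:
    print(members)
```

No category names are supplied anywhere: the groups fall out of the distances alone. With real data, library implementations of PCA, UMAP, and agglomerative clustering would replace the hand-written loop.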
Codes are generated through nearest-neighbor retrieval, ensuring new codes are grounded in existing patterns. The model cannot invent categories not supported by the data.
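The page does not spell out how retrieval is used, so the following is one plausible reading rather than the documented method: before naming a cluster, look up the most similar existing codes and reuse the closest one when it is similar enough. The code names, centroid vectors, `nearest_codes`, `assign_code`, and the `min_similarity` threshold are all hypothetical.

```python
import math

# Hypothetical existing codes, each with a made-up centroid vector.
existing_codes = {
    "Uncertainty about wait status": (0.15, 0.85),
    "Long emergency department waits": (0.9, 0.15),
    "Kind nursing staff": (0.5, -0.8),
}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest_codes(cluster_centroid, k=2):
    """Return the k existing codes most similar to the cluster centroid."""
    scored = [
        (cosine_similarity(cluster_centroid, vec), name)
        for name, vec in existing_codes.items()
    ]
    return sorted(scored, reverse=True)[:k]


def assign_code(cluster_centroid, min_similarity=0.9):
    """Reuse the closest code if similar enough; otherwise flag for review."""
    neighbors = nearest_codes(cluster_centroid)
    best_score, best_name = neighbors[0]
    if best_score >= min_similarity:
        return best_name, neighbors
    # No close match: the neighbors would serve as context for a new code.
    return None, neighbors


code, neighbors = assign_code((0.2, 0.9))
print(code, neighbors)
```

Returning the neighbors alongside the decision keeps a record of which existing patterns informed each code, which is what makes the assignment auditable.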
Codes are organized into themes, with every connection preserved. Ask about any theme and trace it back to the specific participant utterances that contributed to it.
Theme: "Communication Gaps During Care"
↓
Code: "Uncertainty about wait status"
↓
Cluster: 847 extracts about waiting + communication
↓
Sample extracts: "No updates during wait", "Didn't know if forgotten"...
↓
Source utterances: [Patient 142, Patient 891, ...]
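The chain above maps naturally onto nested records. The sketch below is a minimal illustration under stated assumptions: the class names, fields, and `trace` function are hypothetical, and the pairing of each sample extract with a particular patient is invented for the example, since the source lists them separately.

```python
from dataclasses import dataclass, field


@dataclass
class Extract:
    text: str
    utterance_id: str  # link to the source utterance


@dataclass
class Cluster:
    extracts: list = field(default_factory=list)


@dataclass
class Code:
    label: str
    clusters: list = field(default_factory=list)


@dataclass
class Theme:
    label: str
    codes: list = field(default_factory=list)


def trace(theme):
    """Yield (code label, extract text, utterance ID) for every path under a theme."""
    for code in theme.codes:
        for cluster in code.clusters:
            for extract in cluster.extracts:
                yield code.label, extract.text, extract.utterance_id


theme = Theme(
    "Communication Gaps During Care",
    codes=[
        Code(
            "Uncertainty about wait status",
            clusters=[
                Cluster([
                    # Extract-to-patient pairing is assumed for illustration.
                    Extract("No updates during wait", "Patient 142"),
                    Extract("Didn't know if forgotten", "Patient 891"),
                ])
            ],
        )
    ],
)

for code_label, extract_text, source in trace(theme):
    print(f"{theme.label} <- {code_label} <- {extract_text!r} <- {source}")

# Distinct sources behind the theme; a reported count should match this set.
sources = sorted({source for _, _, source in trace(theme)})
print(sources)
```

With this structure, a figure such as the share of participants behind a theme can be recomputed from `sources` instead of being taken on trust.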
Published Research
The GATOS methodology is documented in peer-reviewed research.
See It In Action
Explore our case studies to see how GATOS delivers trustworthy insights across industries.