What Is a Virtual Human? Behavior Twins for Clinician Support

The phrase "virtual human" triggers reasonable skepticism. In most contexts, it means a 3D avatar, a chatbot with a face, or a simulation character in a training environment. MindCODE means something different — and the distinction matters for understanding what is actually useful in clinical mental health care.

A MindCODE Virtual Human is a computational representation of an individual patient's behavioral state and trajectory, built from their Canonical Longitudinal Record and designed to help clinicians reason about complex cases. It does not talk. It does not have a face. It synthesizes data that no human clinician could hold in working memory simultaneously, and it presents that synthesis with explicit confidence bounds and evidence traces.

What Problem Does This Solve?

A psychiatrist treating a patient with treatment-resistant depression might have access to:

  • 18 months of PHQ-9 scores (78 data points)
  • Two failed medication trials with detailed side effect logs
  • One course of cognitive behavioral therapy (16 sessions)
  • Sleep data from a wearable covering 540 nights
  • Two fMRI scans showing changes in default mode network connectivity
  • Pharmacogenomic results indicating CYP2D6 intermediate metabolizer status
  • Inflammatory markers from quarterly blood work

No clinician can integrate all of this in a 30-minute appointment. Most will focus on the recent PHQ-9 trend and the medication history, because those are the data points that fit in working memory. The sleep trajectory, the neuroimaging changes, the pharmacogenomic implications for dosing — these inform the picture but are practically impossible to synthesize in real time.

The Virtual Human holds all of it, computes relationships between domains, and presents a synthesized view that a clinician can interrogate.

Three Facets of the Virtual Human

MindCODE's Virtual Human architecture has three distinct functional layers, each serving a different clinical need:

Facet 1: Descriptive Twin

The descriptive twin answers the question: "What is this patient's current state, in context?"

It is not a simple dashboard. It computes and presents:

  • Trajectory summaries: PHQ-9 has decreased from 22 to 14 over 12 weeks, with a plateau at weeks 6-8 before resuming decline. Mean rate of change over the period: -0.67 points/week.
  • Cross-domain correlations: Sleep efficiency improvements (62% to 73%) preceded PHQ-9 improvements by approximately 2 weeks throughout treatment. This temporal lead suggests sleep may be a leading indicator for this patient.
  • Anomaly detection: Step count dropped 40% in the last 5 days relative to the 30-day moving average. This pattern has preceded PHQ-9 increases in this patient's history (observed 3 times previously, with a mean PHQ-9 increase of 3.2 points within 2 weeks).
  • Cohort context: This patient's 12-week response trajectory places them at the 38th percentile compared to similar patients (matched on baseline severity, medication class, age range, and sex). The median time to 50% PHQ-9 reduction in this cohort is 14 weeks.
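
As a rough illustration of the trajectory and anomaly bullets above, here is a minimal Python sketch. The function names, the choice to compare the last 5 days against the 30 days before them, and the step counts are all assumptions made for this example; they are not MindCODE's implementation.

```python
from statistics import mean


def mean_weekly_change(first_score: float, last_score: float, weeks: int) -> float:
    """Average PHQ-9 change per week between two assessments."""
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    return (last_score - first_score) / weeks


def activity_drop(daily_steps: list[int], recent_days: int = 5,
                  baseline_days: int = 30) -> float:
    """Fractional drop of the recent mean step count below the baseline mean.

    Assumption: the baseline is the mean of the `baseline_days` days that come
    immediately before the recent window, so the recent days do not dilute
    their own baseline.
    """
    if len(daily_steps) < recent_days + baseline_days:
        raise ValueError("not enough history")
    recent = daily_steps[-recent_days:]
    baseline = daily_steps[-(recent_days + baseline_days):-recent_days]
    return 1.0 - mean(recent) / mean(baseline)


# PHQ-9 figures from the trajectory bullet: 22 -> 14 over 12 weeks.
rate = mean_weekly_change(22, 14, 12)
print(f"{rate:.2f} points/week")

# Synthetic step counts (illustrative only): 30 baseline days, then 5 lower days.
steps = [8000] * 30 + [4800] * 5
drop = activity_drop(steps)
print(f"recent activity is {drop:.0%} below baseline")
```

A production system would also need to handle missing days and non-wear time before computing either figure.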

Every number includes a confidence interval and a provenance link to the underlying data.

Facet 2: Predictive Twin

The predictive twin answers the question: "What is likely to happen next?"

This is where the system must be most careful about communicating uncertainty. The predictive twin generates:

  • Trajectory forecasts: Based on the current improvement rate and the patient's response pattern, the model estimates a 68% probability (95% CI: 52%-81%) of achieving PHQ-9 < 10 (remission) within 8 additional weeks of current treatment.
  • Risk signals: The recent activity drop, combined with the patient's historical pattern, suggests a 23% probability of a clinically significant PHQ-9 increase (>5 points) in the next 2 weeks. This is elevated relative to the cohort baseline of 8%.
  • Treatment response probability: For augmenting the current SSRI with a second-generation antipsychotic (the clinician's proposed next step), patients in the matched cohort showed a 45% response rate, versus 32% for an SSRI dose increase alone. However, this patient's CYP2D6 intermediate metabolizer status may affect specific agent selection.

Every prediction includes:

  • The model version and training data summary
  • The features that most influenced the prediction (SHAP values or equivalent)
  • The size and composition of the reference cohort
  • Explicit caveats about the prediction's applicability
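
One way to make sure a prediction never travels without this context is to bundle the four items above into a single record. The sketch below assumes a hypothetical `Prediction` class; the field names, the model identifier, the cohort size, and the zeroed feature attributions are placeholders for illustration, not MindCODE's schema or real values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One predictive-twin output plus the context needed to interpret it."""
    statement: str
    probability: float
    ci_low: float
    ci_high: float
    model_version: str                 # hypothetical identifier
    training_summary: str
    top_features: dict[str, float]     # feature name -> attribution (e.g. SHAP value)
    cohort_size: int
    cohort_criteria: tuple[str, ...]
    caveats: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Refuse records whose interval does not bracket the point estimate.
        if not (0.0 <= self.ci_low <= self.probability <= self.ci_high <= 1.0):
            raise ValueError("expected 0 <= ci_low <= probability <= ci_high <= 1")
        if self.cohort_size <= 0:
            raise ValueError("cohort_size must be positive")

    def summary(self) -> str:
        return (f"{self.statement}: {self.probability:.0%} "
                f"(95% CI: {self.ci_low:.0%}-{self.ci_high:.0%}), "
                f"model {self.model_version}, cohort n={self.cohort_size}")


p = Prediction(
    statement="Remission (PHQ-9 < 10) within 8 weeks",
    probability=0.68, ci_low=0.52, ci_high=0.81,    # figures from the forecast bullet
    model_version="example-model-0.0",              # placeholder
    training_summary="placeholder training-data summary",
    top_features={"sleep_efficiency_trend": 0.0,    # placeholder attributions
                  "recent_activity_drop": 0.0},
    cohort_size=100,                                # placeholder
    cohort_criteria=("baseline severity", "medication class", "age range", "sex"),
    caveats=("Cohort outcomes may not transfer to this patient.",),
)
print(p.summary())
```

Validating the interval at construction time means a malformed prediction fails loudly before it can reach a clinician-facing view.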

Facet 3: Interventional Support Twin

The interventional support twin answers the question: "What options should the clinician consider, and what does the evidence say about each?"

This facet does not make treatment decisions. It structures the decision space:

  • Option enumeration: Based on current clinical guidelines (APA Practice Guidelines, CANMAT 2023) and the patient's treatment history, the relevant next-step options are: (a) SSRI dose optimization, (b) SSRI augmentation with atypical antipsychotic, (c) switch to SNRI, (d) add psychotherapy modality, (e) consider neuromodulation (rTMS).
  • Evidence mapping: For each option, the twin links to relevant guideline sections, meta-analysis results, and cohort outcomes from MindCODE's data. For example: "rTMS has shown a 40-55% response rate in treatment-resistant depression (Berlim et al., 2014; Fitzgerald et al., 2022). In MindCODE's cohort, patients with a similar profile (2 failed medication trials, PHQ-9 14-18, age 30-45) who received rTMS showed a 48% response rate (n=127, 95% CI: 39%-57%)."
  • Pharmacogenomic considerations: The patient's CYP2D6 intermediate metabolizer status is flagged for agents metabolized by this pathway. Specific dosing adjustments are referenced from CPIC guidelines.
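
To show where an interval like the one quoted for the rTMS cohort can come from, the sketch below applies the normal-approximation (Wald) interval for a proportion to the stated 48% response rate with n=127. The post does not say which interval method MindCODE uses; Wald is an assumption chosen here because it is simple, and other methods (Wilson, bootstrap) would give slightly different bounds.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p_hat: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion."""
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2)       # about 1.96 for a 95% interval
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


low, high = wald_interval(0.48, 127)
print(f"{low:.0%}-{high:.0%}")   # prints 39%-57%
```

The half-width shrinks with the square root of the cohort size, so a narrowly matched cohort buys relevance at the cost of a wider interval.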

The LLM as Reasoning Core

The Virtual Human uses a large language model as its reasoning and synthesis layer — but not in the way most people expect. The LLM does not generate the predictions or compute the statistics. Those come from specialized models (survival models for trajectory forecasting, random forests for risk scoring, Bayesian networks for treatment response estimation).

The LLM serves three functions:

  1. Synthesis narration: It takes the outputs of multiple specialized models and composes them into coherent clinical narratives. Instead of presenting a clinician with six separate model outputs, it produces a unified summary that highlights agreements and conflicts between models.

  2. Query interpretation: When a clinician asks "Why did the model flag this patient's activity drop?", the LLM translates this natural language query into the appropriate technical lookup (retrieve SHAP values for the activity feature in the risk model, retrieve the patient's historical activity-to-PHQ-9 correlation) and composes the answer.

  3. Evidence retrieval and contextualization: The LLM retrieves relevant clinical literature and guideline sections, matches them to the patient's specific situation, and presents them in context rather than as raw citations.

The LLM is not a source of clinical data or statistics. It operates only over verified outputs from the computational layer, with retrieval-augmented generation (RAG) grounded in curated clinical knowledge bases. Because generated text can still drift from its inputs, hallucination detection runs on every output, comparing generated claims against the source data and flagging any assertion that cannot be traced to a specific data point or reference.
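
A deliberately simplified stand-in for that traceability check: extract every number from the generated narrative and flag any that is absent from the verified values it was built from. The function name, the regular expression, and the example sentence are assumptions for illustration. A real detector would need to check claims semantically, not only numerically.

```python
import re

# Numbers not glued to letters, so "PHQ-9", "IL-6" and "CYP2D6" are not
# mistaken for numeric claims. Signs are ignored: "-0.67" is read as 0.67.
NUMBER = re.compile(r"(?<![A-Za-z\d.])(?<![A-Za-z]-)\d+(?:\.\d+)?")


def untraceable_numbers(generated: str, verified: set[float]) -> list[str]:
    """Return numeric tokens in `generated` that match no verified value."""
    return [tok for tok in NUMBER.findall(generated) if float(tok) not in verified]


# Values the computational layer actually produced (from the descriptive twin).
verified_values = {22, 14, 12, 0.67}

narrative = ("PHQ-9 fell from 22 to 14 over 12 weeks (-0.67 points/week), "
             "a 40% improvement.")

flags = untraceable_numbers(narrative, verified_values)
print(flags)
```

Here the "40%" figure is flagged because nothing in the verified set supports it (the actual reduction from 22 to 14 is about 36%), while the instrument name "PHQ-9" is not treated as a claim.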

Concrete Example: Treatment-Resistant Depression

Patient M, 38-year-old female, 18 months into treatment:

Descriptive twin output:

PHQ-9 trajectory shows partial response to two SSRI trials (sertraline 200mg x 8 weeks, escitalopram 20mg x 10 weeks), with best score of 14 (from baseline 22). Sleep data shows persistent early morning awakening pattern (mean final awakening 4:12 AM, 1.8 hours before target). fMRI comparison (baseline vs. 12 months) shows persistent hyperconnectivity in the default mode network (DMN-to-salience network coupling z-score: 0.82, vs. cohort healthy control mean of 0.31). CYP2D6 genotype: *4/*41 (intermediate metabolizer). IL-6 elevated at 5.1 pg/mL (reference: <3.0).

Predictive twin output:

Probability of remission (PHQ-9 < 10) with continued current treatment: 18% (95% CI: 9%-29%) over 12 weeks. Probability of remission with rTMS augmentation: 44% (95% CI: 33%-56%) based on matched cohort (n=89, matching criteria: 2+ failed SSRI trials, baseline PHQ-9 18-24, age 30-45, elevated inflammatory markers).

Interventional support twin output:

Three guideline-concordant options with cohort evidence: (1) Switch to SNRI (venlafaxine) — cohort response rate 31%, note CYP2D6 IM status may require dose adjustment per CPIC guidelines. (2) Augment with aripiprazole — cohort response rate 38%, standard dosing appropriate for CYP2D6 IM. (3) rTMS targeting L-DLPFC — cohort response rate 48%, elevated inflammatory markers have been associated with better rTMS response in two recent studies (Lisanby et al., 2024; Chen et al., 2025).

The clinician reviews this synthesis, asks follow-up questions, and makes the treatment decision. The Virtual Human informed the decision; the clinician made it.

Clinical Boundaries

We are explicit about what the Virtual Human does not do:

  • It does not diagnose. Diagnosis is a clinical act that requires the full context of the patient encounter, including information the system does not have access to.
  • It does not prescribe. Treatment decisions are made by licensed clinicians. The system presents options and evidence.
  • It does not have autonomous patient contact. The Virtual Human has no patient-facing interface. It is a clinician tool.
  • It does not override clinical judgment. When a clinician disagrees with the system's assessment, the clinician's judgment prevails. The disagreement is logged for quality improvement analysis.

Comparison to Digital Twins in Engineering

The "digital twin" concept originated in manufacturing and aerospace — a virtual replica of a physical system (a jet engine, a factory floor) that simulates behavior under different conditions. MindCODE's Virtual Human borrows the core idea but differs in important ways:

| Aspect | Engineering Digital Twin | MindCODE Virtual Human |
|--------|--------------------------|------------------------|
| System being modeled | Physical machinery with known physics | Human behavior with incomplete mechanistic understanding |
| Model fidelity | Can approach near-perfect simulation | Fundamentally probabilistic; behavior is not fully deterministic |
| Update frequency | Real-time sensor data (milliseconds) | Mixed: wearable data (seconds), clinical data (weeks) |
| Decision authority | Can autonomously control the physical system | Advisory only; clinician retains all decision authority |
| Validation | Compare simulation to physical measurement | Compare predictions to longitudinal outcomes; longer feedback loops |

The honest framing is that a Virtual Human is a weaker twin than an engineering digital twin — it operates with more uncertainty, less complete models, and no autonomous control. We consider this appropriate humility for a system that reasons about human beings.

What This Enables

The Virtual Human is the layer where MindCODE's Data Cloud meets clinical practice. It transforms a rich but overwhelming longitudinal record into actionable clinical intelligence — bounded by explicit uncertainty, grounded in evidence, and always subordinate to the clinician's expertise.

In the next post, we will explore a specific application: how MindCODE supports data-driven rTMS protocol planning, where the Virtual Human's predictive capabilities meet concrete treatment parameter decisions.