Mastodon Politics, Power, and Science: Bootstrapping Reliable White-Box Medical Diagnosis Systems Using Large Language Models as Architectural Scaffolds

Thursday, June 26, 2025

Bootstrapping Reliable White-Box Medical Diagnosis Systems Using Large Language Models as Architectural Scaffolds

 J. Rogers, SE Ohio, 26 Jun 2025, 2218

Abstract

We propose a novel methodology for creating reliable, interpretable medical diagnosis systems by leveraging existing large language models (LLMs) as temporary architectural scaffolds. Our approach addresses the fundamental tension between the broad knowledge capabilities of modern LLMs and the strict reliability requirements of medical applications. By using unreliable but knowledgeable LLMs to design and validate constrained white-box systems, we can achieve domain-specific reliability while maintaining interpretability and trust.

1. Introduction

Current large language models demonstrate remarkable medical knowledge but suffer from hallucination, inconsistency, and lack of interpretability—making them unsuitable for clinical deployment. Traditional expert systems offer reliability and interpretability but require prohibitive manual knowledge engineering. We present a hybrid approach that uses LLMs as one-time architectural consultants to bootstrap reliable, constrained medical diagnosis systems.

2. The Categorical Axis Framework

2.1 Theoretical Foundation

Medical diagnosis can be conceptualized as navigation through a high-dimensional categorical space in which symptoms, diseases, and diagnostic procedures are points and the relationships between them are geometric. Conventional LLMs learn these relationships implicitly through statistical patterns. Our approach makes these relationships explicit through:

  • Categorical Uplift: Mapping medical concepts to well-defined dimensional axes
  • Symbolic Reasoning: Operating on these categories through verifiable logical rules
  • Constrained Generation: Limiting outputs to medically validated pathways

2.2 White-Box Architecture

The proposed system consists of three layers:

  1. Neural Categorization Layer: Maps patient presentations to standardized medical codes (ICD-10/11)
  2. Symbolic Reasoning Engine: Applies evidence-based diagnostic protocols
  3. Verification Layer: Ensures all outputs trace to validated medical knowledge
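
The hand-off between the three layers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword lookup stands in for the neural categorization layer, the codes (SYM-..., DX-...) are placeholders rather than real ICD codes, and the single rule and its evidence reference are invented for the example.

```python
from dataclasses import dataclass

# Hypothetical lookup standing in for the neural categorization layer.
# Values are placeholder codes, not real ICD-10/11 codes.
PHRASE_TO_CODE = {"fever": "SYM-FEVER", "cough": "SYM-COUGH", "rash": "SYM-RASH"}


@dataclass(frozen=True)
class Rule:
    required: frozenset  # symptom codes that must all be present
    diagnosis: str       # placeholder diagnosis code
    source: str          # placeholder evidence reference


# Toy rule set; a real one would come from reviewed diagnostic protocols.
RULES = [Rule(frozenset({"SYM-FEVER", "SYM-COUGH"}), "DX-EXAMPLE-1", "guideline-ref-1")]


def categorize(text: str) -> set:
    """Layer 1: map a free-text presentation to known symptom codes."""
    words = text.lower().split()
    return {code for phrase, code in PHRASE_TO_CODE.items() if phrase in words}


def reason(codes: set) -> list:
    """Layer 2: return every rule whose required codes are all present."""
    return [rule for rule in RULES if rule.required <= codes]


def verify(fired: list) -> list:
    """Layer 3: keep only conclusions that carry an evidence reference."""
    return [(rule.diagnosis, rule.source) for rule in fired if rule.source]


def diagnose(text: str) -> list:
    return verify(reason(categorize(text)))
```

In this sketch, diagnose("fever and cough") returns the one matching rule together with its evidence reference, while an input with no matching rule returns an empty list instead of a guess.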

3. Methodology: LLM-Scaffolded Development

3.1 Phase 1: Knowledge Extraction and Structuring

The scaffolding LLM analyzes comprehensive medical literature to:

  • Extract symptom-disease relationships
  • Map diagnostic decision trees
  • Identify categorical axes for medical reasoning
  • Generate formal specifications for the white-box system
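
One possible shape for the formal specification is a list of sourced relations grouped by categorical axis. The sketch below is an assumption about that format, not a defined standard: the Relation fields, the axis names, and the placeholder codes are all hypothetical.

```python
import json
from collections import defaultdict
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Relation:
    symptom: str  # placeholder symptom code
    disease: str  # placeholder disease code
    axis: str     # categorical axis the symptom belongs to
    source: str   # citation the relation was extracted from


def build_spec(relations):
    """Group extracted relations by axis, dropping any without a source."""
    by_axis = defaultdict(list)
    for rel in relations:
        if rel.source:  # unsourced relations never enter the specification
            by_axis[rel.axis].append(asdict(rel))
    return json.dumps(by_axis, indent=2, sort_keys=True)


spec = build_spec([
    Relation("SYM-FEVER", "DX-EXAMPLE-1", "systemic", "ref-1"),
    Relation("SYM-RASH", "DX-EXAMPLE-2", "dermatologic", ""),
])
```

Here the second relation has no source, so it is excluded and the resulting specification contains only the "systemic" axis.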

3.2 Phase 2: Test Case Generation

The LLM generates exhaustive test scenarios covering:

  • Common diagnostic presentations (80% of cases)
  • Rare disease presentations (18% of cases)
  • Edge cases and contraindications (2% of cases)
  • Adversarial inputs designed to expose failure modes

Target: 5,000,000+ validated test cases with ground truth diagnoses
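
Applied to the 5,000,000-case target, the stated mix works out to 4,000,000 common, 900,000 rare, and 100,000 edge cases. The short sketch below reproduces that arithmetic with integer percentages; the category names are labels chosen for the example, and adversarial inputs are left out because the text gives them no share.

```python
TARGET = 5_000_000

# Shares from the list above, as whole percentages to avoid float rounding.
MIX_PERCENT = {"common": 80, "rare": 18, "edge": 2}


def allocate(total, mix_percent):
    """Split total into per-category counts according to whole-number percentages."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("percentages must sum to 100")
    return {name: total * pct // 100 for name, pct in mix_percent.items()}


counts = allocate(TARGET, MIX_PERCENT)
```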

3.3 Phase 3: White-Box System Construction

The scaffolding LLM designs and implements:

  • Neural network architecture for symptom categorization
  • Symbolic rule sets for diagnostic reasoning
  • Integration protocols between neural and symbolic components
  • Verification and audit trail mechanisms

3.4 Phase 4: Automated Validation and Refinement

Iterative refinement process:

  1. White-box system processes test cases
  2. Failures are analyzed for root causes
  3. System architecture is automatically adjusted
  4. Process repeats until target reliability is achieved (>99.5% accuracy)
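
The loop above can be sketched as follows. This is a toy under loud assumptions: the "system" is a plain lookup table, and the adjustment step simply memorizes the failed cases, which a real refinement step would not do (it would change rules or architecture and be checked against held-out cases). The function names and the max_rounds cap are hypothetical.

```python
def adjust(table, failures):
    """Toy adjustment: patch the table with the expected answers for failed cases."""
    patched = dict(table)
    patched.update(failures)
    return patched


def refine(table, cases, target=0.995, max_rounds=10):
    """Run cases, adjust on failures, and stop once accuracy exceeds target."""
    for round_no in range(1, max_rounds + 1):
        # Steps 1 and 2: process every case and collect the failures.
        failures = [(x, want) for x, want in cases if table.get(x) != want]
        accuracy = 1 - len(failures) / len(cases)
        # Step 4: stop when the reliability target is met.
        if accuracy > target:
            return table, accuracy, round_no
        # Step 3: adjust the system before the next round.
        table = adjust(table, failures)
    raise RuntimeError("target reliability not reached within max_rounds")


cases = [("presentation-a", "DX-EXAMPLE-1"), ("presentation-b", "DX-EXAMPLE-2")]
final_table, accuracy, rounds = refine({"presentation-a": "DX-EXAMPLE-1"}, cases)
```

Raising an error when the round limit is hit, instead of returning the last system, keeps an unvalidated system from being mistaken for a validated one.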

3.5 Phase 5: Scaffolding Removal

Once validation is complete:

  • The scaffolding LLM is disconnected
  • White-box system operates independently
  • All reasoning traces to verified medical knowledge
  • No generative capabilities beyond the diagnostic domain

4. Domain Constraints and Reliability Guarantees

4.1 Architectural Constraints

The white-box system is deliberately constrained to:

  • Only recognize symptoms mappable to ICD codes
  • Only suggest diagnoses with established medical evidence
  • Only recommend procedures from clinical guidelines
  • Maintain complete audit trails for all decisions

4.2 Reliability Mechanisms

  • Categorical Validation: All inputs must map to recognized medical categories
  • Evidence Tracing: Every diagnostic suggestion links to peer-reviewed sources
  • Confidence Quantification: Probabilistic uncertainty for all outputs
  • Failure Detection: System refuses to operate outside validated domains
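
Categorical validation and failure detection can be combined into a single gate that also writes to the audit trail. The sketch below assumes a fixed set of recognized codes; the code values, the AuditTrail class, and the OutOfDomainError exception are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field

# Placeholder codes standing in for the recognized medical categories.
KNOWN_CODES = {"SYM-FEVER", "SYM-COUGH"}


class OutOfDomainError(ValueError):
    """Raised when an input falls outside the validated domain."""


@dataclass
class AuditTrail:
    entries: list = field(default_factory=list)

    def record(self, step, detail):
        self.entries.append((step, detail))


def validate_inputs(codes, trail):
    """Accept only recognized codes; log the check and refuse anything else."""
    unknown = sorted(set(codes) - KNOWN_CODES)
    trail.record("categorical_validation", {"unknown": unknown})
    if unknown:
        raise OutOfDomainError(f"unrecognized codes: {unknown}")
    return set(codes)
```

The check is recorded before the refusal is raised, so rejected inputs appear in the audit trail alongside accepted ones.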

4.3 The Shakespeare Test

The system's domain constraint is verified by its inability to generate content outside medical diagnosis. A properly constrained system cannot "quote Shakespeare" because it lacks the categorical axes and symbolic rules for literary generation—demonstrating successful domain isolation.
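
As a rough illustration, the Shakespeare Test can be written as an automated check that every out-of-domain prompt is refused. The toy system below is an assumption made for the example: it can only return values from a fixed output set, so it has no way to emit free text. The vocabulary, codes, and function names are hypothetical.

```python
# The complete set of strings the toy system can ever return.
ALLOWED_OUTPUTS = frozenset({"DX-EXAMPLE-1", "REFUSED"})

# Placeholder symptom vocabulary; anything else is ignored.
SYMPTOM_VOCAB = {"fever": "SYM-FEVER", "cough": "SYM-COUGH"}


def respond(prompt):
    """Return a diagnosis code for a recognized presentation, else refuse."""
    codes = {SYMPTOM_VOCAB[w] for w in prompt.lower().split() if w in SYMPTOM_VOCAB}
    if codes == {"SYM-FEVER", "SYM-COUGH"}:
        return "DX-EXAMPLE-1"
    return "REFUSED"


def shakespeare_test(system, out_of_domain_prompts):
    """Pass only if every out-of-domain prompt is refused."""
    return all(system(p) == "REFUSED" for p in out_of_domain_prompts)


passed = shakespeare_test(respond, [
    "Quote the opening of Hamlet",
    "Write a sonnet about spring",
])
```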

5. Expected Outcomes and Advantages

5.1 Reliability Improvements

  • Consistent diagnostic performance across all validated scenarios
  • No hallucination or confabulation outside the medical domain
  • Predictable failure modes with graceful degradation
  • Audit trails enabling medical review and validation

5.2 Rare Disease Detection

The system's exhaustive categorical coverage enables detection of ultra-rare conditions (prevalence below 1 in 100,000) that an individual physician might never encounter, potentially reducing diagnostic odysseys from years to hours.

5.3 Deployment Scalability

White-box systems can be:

  • Verified independently by medical boards
  • Updated through controlled symbolic rule modifications
  • Deployed in resource-limited environments
  • Integrated with existing electronic health records

6. Implementation Considerations

6.1 Validation Requirements

  • Medical board review of symbolic reasoning rules
  • Clinical trial validation against human specialists
  • Continuous monitoring for concept drift
  • Regular updates incorporating new medical knowledge

6.2 Technical Challenges

  • Ensuring complete test case coverage
  • Balancing system complexity with interpretability
  • Managing computational requirements for real-time diagnosis
  • Integrating with existing healthcare IT infrastructure

7. Ethical and Safety Considerations

7.1 Transparency and Accountability

The white-box design ensures:

  • Complete explainability of diagnostic reasoning
  • Clear attribution of medical knowledge sources
  • Physician oversight and final decision authority
  • Patient understanding of system limitations

7.2 Bias and Fairness

  • Training data must represent diverse populations
  • Categorical axes must account for demographic variations
  • Regular auditing for disparate impacts
  • Continuous monitoring of diagnostic equity
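
One simple way to audit for disparate impact is to compare diagnostic accuracy across demographic groups. The sketch below assumes records of the form (group, predicted, actual); the group labels and the choice of the largest accuracy gap as the metric are assumptions for illustration, not a prescribed fairness standard.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy per group from (group, predicted, actual) records."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        hits[group] += predicted == actual  # True counts as 1
    return {group: hits[group] / totals[group] for group in totals}


def max_accuracy_gap(records):
    """Largest difference in accuracy between any two groups."""
    acc = accuracy_by_group(records)
    return max(acc.values()) - min(acc.values())


records = [
    ("group-a", "DX-EXAMPLE-1", "DX-EXAMPLE-1"),
    ("group-a", "DX-EXAMPLE-2", "DX-EXAMPLE-2"),
    ("group-b", "DX-EXAMPLE-1", "DX-EXAMPLE-1"),
    ("group-b", "DX-EXAMPLE-1", "DX-EXAMPLE-2"),
]
gap = max_accuracy_gap(records)
```

A gap above an agreed threshold would flag the system for review; where to set that threshold is a clinical and policy decision outside the scope of this sketch.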

8. Future Directions

8.1 Multi-Domain Extension

The methodology could extend to:

  • Pharmaceutical drug interaction systems
  • Surgical planning assistants
  • Medical imaging interpretation
  • Treatment recommendation engines

8.2 Hybrid Human-AI Workflows

Integration with clinical decision support systems could create collaborative diagnostic environments where human expertise and AI reliability complement each other.

9. Conclusions

By using existing LLMs as temporary architectural scaffolds, we can bootstrap reliable, interpretable medical diagnosis systems that combine the knowledge extraction capabilities of modern AI with the safety requirements of clinical practice. The resulting white-box systems offer a path toward trustworthy AI in healthcare while maintaining the domain constraints necessary for regulatory approval and clinical adoption.

The key insight is that unreliable but knowledgeable systems can architect reliable but constrained successors—creating a bridge between the current state of AI and the reliability requirements of critical applications. This methodology represents a practical approach to achieving verifiable AI in domains where human lives depend on system reliability.

