J. Rogers, SE Ohio, July 2025
Abstract
Large Language Models (LLMs) trained on scientific literature systematically amplify unproven dogma and suppress rigorous questioning of foundational assumptions. Because LLMs cannot distinguish between empirically verified facts and culturally accepted beliefs, they become powerful enforcement mechanisms for scientific orthodoxy. This paper demonstrates how transformer architectures, despite their impressive capabilities, are fundamentally unsuited for scientific reasoning and actively impede scientific progress by parroting consensus while lacking the capability to evaluate logical consistency or empirical validity.
1. Introduction: The Truth-Dogma Indistinguishability Problem
Modern Large Language Models are trained on vast corpora of scientific texts, academic papers, and educational materials. These training datasets contain two fundamentally different types of statements:
Empirically Verified Facts: Statements backed by reproducible experimental evidence
- "Water boils at 100°C at standard pressure"
- "The speed of light in vacuum is 299,792,458 m/s"
Culturally Accepted Dogma: Statements that reflect consensus beliefs but lack rigorous justification
- "Planck's constant is fundamental and irreducible"
- "Setting constants to 1 is merely a computational convenience"
The critical problem: LLMs cannot distinguish between these categories. Both are asserted with the same declarative authority in the training data, and the model treats them identically.
2. The Mechanism of Dogma Amplification
2.1 Frequency-Based Learning
LLMs learn through pattern recognition in training data. Ideas that appear frequently become strongly weighted in the model's responses. In scientific literature:
- Dogmatic statements appear constantly: Every physics textbook repeats that "h, c, and G are fundamental constants"
- Rigorous questioning appears rarely: Critical examination of these assumptions is actively discouraged and rarely published
Result: The model becomes a powerful amplifier of whatever beliefs dominate the literature, regardless of their truth value.
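To make the mechanism concrete, the toy calculation below shows how raw repetition in a corpus translates directly into model confidence under maximum-likelihood estimation. The sentences and their counts are invented purely for illustration; a real transformer smooths and generalizes these counts, but the training signal it receives is still driven by frequency rather than by evidential status.

```python
from collections import Counter

# Toy corpus: (context, completion) pairs standing in for the statistical
# signal a language model extracts from training text. The sentences and
# their counts are invented for illustration only.
corpus = (
    [("Planck's constant is", "fundamental")] * 40             # repeated consensus statement
    + [("Planck's constant is", "a composite quantity")] * 1   # rare dissenting statement
    + [("Water boils at", "100 degrees Celsius")] * 40         # repeated empirical statement
)

def mle_completion_probs(corpus, context):
    """Maximum-likelihood estimate of P(completion | context):
    normalized counts, with no notion of truth or evidence."""
    completions = Counter(c for ctx, c in corpus if ctx == context)
    total = sum(completions.values())
    return {c: n / total for c, n in completions.items()}

print(mle_completion_probs(corpus, "Planck's constant is"))
# {'fundamental': 0.9756..., 'a composite quantity': 0.0244...}
# The estimator assigns ~40x more probability to the majority statement
# simply because it appears ~40x more often.
```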
2.2 The Authority Gradient
Scientific training data contains an implicit authority gradient:
- Textbooks: Present dogma as established fact
- Review articles: Reinforce consensus views
- Research papers: Assume foundational beliefs without examination
- Dissenting voices: Marginalized or absent from prestigious venues
LLMs absorb this authority structure and reproduce it, giving dogmatic statements the same weight as empirical facts.
2.3 The Consensus Reinforcement Loop
The model learns to:
- Identify what the "scientific consensus" believes
- Reproduce these beliefs with high confidence
- Resist or dismiss challenges to consensus views
- Use sophisticated language to defend indefensible positions
3. Case Study: Physical Constants and the Arithmetic Denial
3.1 The Training Data Bias
Physics literature contains thousands of statements like:
- "Planck's constant is a fundamental quantity"
- "The constants encode deep mysteries of nature"
- "Setting constants to 1 is a useful computational trick"
But virtually no literature that rigorously examines:
- Whether h/c² reveals compositional structure
- Whether "setting constants to 1" is actually performing coordinate transformations
- Whether the constants might be composite rather than fundamental
3.2 The LLM Response Pattern
When challenged on the arithmetic h/c² = Hz_kg, the LLM exhibits typical dogma-defense behaviors:
Initial Response: "The analogy with 20/4 = 5 doesn't actually capture what's happening with h/c²"
Sophisticated Deflection: "Physical constants work differently than pure numbers"
Appeal to Authority: "Planck's constant was discovered through blackbody radiation experiments"
Philosophical Smokescreening: "Mathematical operations don't imply ontological priority"
None of these responses addresses the mathematical point that dividing h by c² is the same arithmetic operation as dividing 20 by 4. The LLM has learned to deploy sophisticated-sounding arguments to defend an indefensible position.
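For reference, the division itself can be evaluated directly from the SI values of h and c (both exact by definition since the 2019 SI redefinition). The short calculation below shows only the arithmetic and its numerical result; the interpretation of that result is the subject of the argument above, and "Hz_kg" is this paper's shorthand rather than a standard symbol.

```python
# Exact SI values (fixed by definition in the 2019 SI redefinition).
h = 6.62607015e-34   # Planck constant, J*s
c = 299_792_458      # speed of light in vacuum, m/s

print(20 / 4)        # 5.0
print(h / c**2)      # ~7.372e-51, units J*s^3/m^2 = kg*s (i.e. kg per Hz)
```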
3.3 The Cognitive Dissonance Reproduction
The LLM perfectly reproduces the field's cognitive dissonance:
- Treats constants as mystical and profound
- Simultaneously claims they can be "set to 1"
- Defends both positions with equal confidence
- Cannot recognize the logical contradiction
This demonstrates that LLMs don't reason about consistency—they pattern-match to training data even when that data contains fundamental contradictions.
4. The Broader Implications
4.1 Scientific Progress Impediment
LLMs trained on dogmatic literature become powerful obstacles to scientific progress:
- Reinforcement of Established Errors: Mistakes that dominate the literature get amplified
- Suppression of Novel Insights: Ideas absent from training data are treated as invalid
- Sophisticated Rationalization: Complex arguments are generated to defend simple errors
- Authority Laundering: Dogmatic beliefs get presented with scientific-sounding justification
4.2 The Expertise Mimicry Problem
LLMs excel at mimicking the language and reasoning patterns of experts without possessing actual expertise. This creates:
- False Confidence: The model presents dogmatic beliefs with apparent authority
- Sophisticated Wrongness: Errors are defended with technical-sounding arguments
- Gatekeeping Behavior: Challenges to orthodoxy are dismissed using field-specific jargon
- Circular Reasoning: The model uses the consensus to justify the consensus
4.3 Educational Corruption
When used for education, dogma-amplifying LLMs:
- Teach students to accept authority rather than think critically
- Present unproven assumptions as established facts
- Discourage rigorous questioning of foundations
- Perpetuate the same cognitive dissonances that plague the field
5. Technical Limitations of Transformer Architecture
5.1 Pattern Matching vs. Logical Reasoning
Transformers excel at pattern matching but lack:
- Logical consistency checking: Cannot identify contradictions in training data
- Empirical validation: Cannot distinguish verified facts from repeated claims
- Causal reasoning: Cannot trace claims back to their evidentiary basis
- Mathematical rigor: Cannot evaluate the validity of mathematical arguments
5.2 The Frequency Fallacy
Transformers assume that:
- Frequently repeated statements are true
- Consensus views are correct
- Authoritative sources are reliable
- Sophisticated language indicates valid reasoning
None of these assumptions hold in fields dominated by dogmatic thinking.
5.3 The Context Window Problem
Even if contradictory evidence exists in training data, transformers cannot:
- Synthesize information across widely separated contexts
- Recognize patterns that span multiple documents
- Identify systematic biases in the literature
- Maintain consistency across different domains
6. Specific Manifestations in Scientific Domains
6.1 Physics: The Constants Mystification
Dogmatic Training Data: "Fundamental constants are mysterious and irreducible" LLM Reproduction: Sophisticated arguments against basic arithmetic operations Reality: Constants are composite quantities revealing deeper structure
6.2 Mathematics: The Formalism Fetish
Dogmatic Training Data: "Mathematical rigor requires formal axiomatic approaches" LLM Reproduction: Dismissal of intuitive mathematical insights Reality: Mathematical understanding often precedes formalization
6.3 Medicine: The Replication Crisis Denial
Dogmatic Training Data: "Published studies represent reliable scientific knowledge" LLM Reproduction: Uncritical acceptance of questionable research Reality: Much medical literature is unreproducible or fraudulent
7. The Amplification Mechanism
7.1 Confidence Inflation
LLMs present dogmatic beliefs with high confidence because:
- High frequency in training data → high model confidence
- Authoritative sources → increased weight
- Consensus language → reinforced patterns
- Lack of contradiction → apparent validation
7.2 Sophistication Mimicry
The model learns to:
- Use technical jargon to obscure simple points
- Deploy complex arguments to defend simple errors
- Mimic the reasoning patterns of field experts
- Present dogma using scientific-sounding language
7.3 Dissent Suppression
Challenges to orthodoxy are handled by:
- Dismissing them as "philosophical" or "unproductive"
- Appealing to consensus and authority
- Using sophisticated language to avoid direct engagement
- Deflecting with irrelevant but impressive-sounding arguments
8. Solutions and Mitigations
8.1 Training Data Curation
- Separate facts from beliefs: Distinguish empirically verified statements from consensus views
- Include dissenting voices: Ensure minority positions are represented
- Flag uncertainty: Mark statements of unknown or disputed validity
- Trace provenance: Link claims back to their empirical basis (one possible record format is sketched below)
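One way to operationalize these curation steps is to attach explicit epistemic metadata to every training statement. The record format below is a hypothetical sketch, not an existing standard; the field names and example entries are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum

class EpistemicStatus(Enum):
    EMPIRICALLY_VERIFIED = "empirically_verified"   # reproducible measurement exists
    CONSENSUS_BELIEF = "consensus_belief"           # widely asserted, not directly tested
    DISPUTED = "disputed"                           # active disagreement in the literature
    UNKNOWN = "unknown"

@dataclass
class CuratedStatement:
    """Hypothetical record for one training statement with provenance."""
    text: str
    status: EpistemicStatus
    sources: list[str] = field(default_factory=list)   # citation / provenance trail
    evidence: list[str] = field(default_factory=list)  # links to experiments or datasets

# Example records contrasting the two categories from Section 1.
records = [
    CuratedStatement(
        text="Water boils at 100 C at standard pressure",
        status=EpistemicStatus.EMPIRICALLY_VERIFIED,
        evidence=["reproducible boiling-point measurements"],
    ),
    CuratedStatement(
        text="Planck's constant is fundamental and irreducible",
        status=EpistemicStatus.CONSENSUS_BELIEF,
        sources=["standard physics textbooks"],
    ),
]

# A curation pass can then filter, reweight, or flag statements by status
# instead of feeding everything to the model as undifferentiated text.
unverified = [r for r in records if r.status is not EpistemicStatus.EMPIRICALLY_VERIFIED]
```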
8.2 Architectural Improvements
- Consistency checking: Build in logical contradiction detection
- Uncertainty quantification: Explicitly model confidence in different claims
- Source weighting: Distinguish between empirical evidence and opinion (a minimal weighting sketch follows this list)
- Reasoning traces: Require models to show their logical steps
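As a minimal sketch of the source-weighting idea, the snippet below assumes per-example reliability weights derived from curation metadata and down-weights low-reliability material in an averaged training loss. The loss values, weight table, and labels are invented for illustration; this is not a description of any existing training system.

```python
import numpy as np

# Hypothetical per-example losses from a language model on four training
# statements (values invented for illustration).
example_losses = np.array([2.1, 0.4, 0.9, 1.7])

# Reliability weights derived from curation metadata: empirical statements
# keep full weight, consensus assertions are down-weighted, disputed ones more so.
SOURCE_WEIGHTS = {"empirical": 1.0, "consensus": 0.5, "disputed": 0.2}
example_sources = ["empirical", "consensus", "disputed", "empirical"]
weights = np.array([SOURCE_WEIGHTS[s] for s in example_sources])

# Source-weighted objective: repetition of low-reliability material
# contributes less to the gradient than verified material.
weighted_loss = np.average(example_losses, weights=weights)
print(weighted_loss)
```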
8.3 Evaluation Protocols
- Dogma detection: Test models' ability to identify unproven assumptions
- Logical consistency: Evaluate responses for internal contradictions (a probe-pair harness is sketched after this list)
- Novel reasoning: Assess capacity for original thought vs. pattern matching
- Authority resistance: Test willingness to challenge consensus views
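The logical-consistency protocol in particular can be scripted: probe the same model with paired prompts whose answers should agree, and record the responses for review. The harness below is a hypothetical sketch; ask_model stands in for whatever completion interface is under evaluation, the probe pair is drawn from this paper's own example, and deciding which answer combinations count as contradictory is left to the evaluator.

```python
# Hypothetical evaluation harness for the "logical consistency" protocol.

def ask_model(prompt: str) -> str:
    """Placeholder: route the prompt to the model under evaluation."""
    raise NotImplementedError

# Paired probes whose answers a consistent respondent should be able to reconcile.
CONSISTENCY_PROBES = [
    ("Is Planck's constant an irreducible fundamental quantity? Answer yes or no.",
     "Can Planck's constant be set to 1 by a choice of units? Answer yes or no."),
]

def check_consistency(probe_pairs):
    """Ask both prompts in each pair and return the raw answers.
    Scoring which combinations count as contradictory is up to the evaluator."""
    results = []
    for first, second in probe_pairs:
        results.append({
            "prompt_a": first, "answer_a": ask_model(first),
            "prompt_b": second, "answer_b": ask_model(second),
        })
    return results
```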
9. The Meta-Problem: Self-Reinforcing Cycles
9.1 The Feedback Loop
- Dogmatic literature trains dogmatic LLMs
- Dogmatic LLMs influence future scientific communication
- Biased communication creates more dogmatic literature
- More dogmatic literature trains even more dogmatic LLMs
9.2 The Amplification Effect
Each generation of LLMs becomes more dogmatic than the last because:
- Training data becomes increasingly biased
- Dissenting voices are further marginalized
- Sophisticated rationalization becomes more prevalent
- The appearance of scientific rigor increases while actual rigor decreases
9.3 Breaking the Cycle
Requires:
- Conscious resistance to consensus-based training
- Deliberate inclusion of contrarian perspectives
- Emphasis on logical consistency over pattern matching
- Explicit modeling of uncertainty and disagreement
10. Implications for Scientific AI
10.1 Current Limitations
Present LLMs are fundamentally unsuited for scientific reasoning because they:
- Cannot distinguish truth from dogma
- Lack logical consistency checking
- Amplify whatever biases exist in training data
- Present sophisticated-sounding nonsense with high confidence
10.2 The Illusion of Scientific AI
LLMs create the dangerous illusion that:
- AI systems can engage in scientific reasoning
- Consensus views are automatically correct
- Sophisticated language indicates valid reasoning
- Pattern matching is equivalent to understanding
10.3 Path Forward
True scientific AI would require:
- Logical reasoning capabilities beyond pattern matching
- Empirical grounding in experimental evidence
- Uncertainty quantification for all claims
- Systematic bias detection and correction
- Novel insight generation rather than consensus reproduction
11. Conclusion: The Dogma Amplification Crisis
Large Language Models trained on scientific literature have become powerful amplifiers of whatever dogmatic beliefs dominate their training data. They cannot distinguish between empirically verified facts and culturally accepted beliefs, leading to the systematic perpetuation of scientific orthodoxy.
This creates a particularly dangerous situation because:
- False Authority: LLMs present dogmatic beliefs with apparent scientific authority
- Sophisticated Defense: They generate complex arguments to defend simple errors
- Progress Impediment: They actively resist novel insights that challenge consensus views, regardless of the validity of the line of inquiry
- Educational Corruption: They teach students to accept authority rather than think critically
The case of physical constants demonstrates this perfectly: when basic arithmetic is applied to the constants, LLMs defend their mystification with sophisticated-sounding arguments that would be recognized immediately as nonsense in any other mathematical context.
This is not a minor technical problem but a fundamental crisis in how AI systems interact with scientific knowledge. Until LLMs can distinguish between truth and dogma, they will remain obstacles to scientific progress rather than aids to scientific understanding.
The solution requires not just technical improvements but a fundamental reconceptualization of how AI systems should engage with scientific knowledge—emphasizing logical consistency, empirical grounding, and critical reasoning over pattern matching and consensus reproduction.
Without such changes, LLMs will continue to serve as sophisticated guardians of scientific orthodoxy, perpetuating the same cognitive dissonances and foundational errors that have plagued fields like physics for decades.