Weekly Literature Review: Clinical Tests, Consensus, and the Role of AI in Diagnosis
Week of June 30–July 4, 2025
Curated insights for physical therapists, healthcare professionals, and clinical decision-makers.
Monday
Are We Making the Most of Diagnostic Test Accuracy in PT?
A 2024 review by Kaizik et al. examined 241 systematic reviews of diagnostic test accuracy (DTA) in physical therapy. The good news: the evidence base is growing. The concern? Nearly half of the reviews carried a high risk of bias.
Key insights:
Sensitivity and specificity are widely reported, but often without methodological rigor.
Risk of bias assessments were inconsistently applied.
Fewer than half of the reviews performed a meta-analysis.
As PTs continue to anchor care in evidence, this paper is a reminder that how we evaluate evidence matters just as much as the evidence itself. Stronger methods lead to stronger insights—and ultimately, better clinical decisions.
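For anyone who wants to revisit the arithmetic behind those headline metrics, here is a minimal sketch in Python. The 2x2 counts are made up for illustration, not taken from the review; the point is simply how sensitivity, specificity, and likelihood ratios fall out of a diagnostic contingency table.

# Hypothetical 2x2 table for a diagnostic test (illustrative counts only).
#                 Condition present   Condition absent
# Test positive        TP = 45            FP = 15
# Test negative        FN = 5             TN = 85
tp, fp, fn, tn = 45, 15, 5, 85

sensitivity = tp / (tp + fn)                   # proportion of true cases the test detects: 0.90
specificity = tn / (tn + fp)                   # proportion of non-cases the test clears: 0.85
lr_positive = sensitivity / (1 - specificity)  # LR+: how much a positive result raises suspicion: 6.0
lr_negative = (1 - sensitivity) / specificity  # LR-: how much a negative result lowers it: ~0.12

print(f"Sens {sensitivity:.2f}, Spec {specificity:.2f}, LR+ {lr_positive:.1f}, LR- {lr_negative:.2f}")

A review can report numbers like these for dozens of tests, but as Kaizik et al. show, the counts feeding them are only as trustworthy as the methods behind the primary studies.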
Full article: Kaizik et al., 2024
#PhysicalTherapy #Diagnostics #EvidenceBasedPractice
Tuesday
Meniscal Testing: What Actually Works?
A new study published in Acta Orthopaedica evaluated six commonly used meniscal tests against MRI and arthroscopy in over 250 patients. The verdict: no single test is accurate enough on its own.
Findings:
McMurray + Apley in combination produced the best diagnostic accuracy.
Coexisting conditions such as chondropathy or anterior knee pain significantly skewed results.
The specificity and sensitivity of each test varied widely, reinforcing the need for clinical pattern recognition over test-based certainty.
This study reinforces a familiar message: in musculoskeletal diagnosis, context is everything.
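Why would pairing McMurray and Apley help? A rough way to see it: when two tests are combined in series (both must be positive), specificity tends to rise while sensitivity falls, and the reverse holds for a parallel combination. The sketch below uses hypothetical accuracy values and a naive conditional-independence assumption, not figures from the Acta Orthopaedica paper, so treat it as a teaching illustration rather than a reanalysis.

# Hypothetical accuracy values for two meniscal tests (illustrative, not from the study).
sens_mcmurray, spec_mcmurray = 0.70, 0.77
sens_apley,    spec_apley    = 0.60, 0.70

# Series combination ("both tests must be positive"), assuming independence:
sens_series = sens_mcmurray * sens_apley                   # 0.42 -> sensitivity drops
spec_series = 1 - (1 - spec_mcmurray) * (1 - spec_apley)   # 0.93 -> specificity rises

# Parallel combination ("either test positive counts"), same assumption:
sens_parallel = 1 - (1 - sens_mcmurray) * (1 - sens_apley) # 0.88 -> sensitivity rises
spec_parallel = spec_mcmurray * spec_apley                 # 0.54 -> specificity drops

print(sens_series, spec_series, sens_parallel, spec_parallel)

In real knees the tests are not independent, which is exactly why pattern recognition and patient context carry so much weight.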
Full article: Acta Orthopaedica, 2025
#Orthopedics #MeniscalTear #ClinicalAssessment #SportsMedicine
Wednesday
Can We Predict Falls in Older Adults? Not With One Test
Omaña et al. (2021) reviewed the predictive accuracy of three common balance tests for identifying fall risk in older adults: the Functional Reach Test (FRT), the Single-Leg Stance Test (SLST), and the Tinetti POMA.
Highlights:
All three tests demonstrated low sensitivity and modest specificity, with AUC values generally between 0.50 and 0.70.
No single test reliably predicted future falls across populations or settings.
Tests remain useful to monitor balance over time but are insufficient for fall prediction in isolation.
Bottom line: These tools provide valuable snapshots but need to be integrated into multifactorial assessments to guide intervention strategies.
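To put those AUC numbers in perspective: an AUC of 0.50 is coin-flip discrimination, and 0.70 is only modest. The short sketch below computes an AUC from its rank-based definition using a tiny made-up sample (illustrative values only, not data from Omaña et al.).

# Made-up balance-test scores (lower = worse balance) for fallers vs non-fallers.
faller_scores     = [22, 25, 27, 30, 31]   # people who went on to fall
non_faller_scores = [24, 28, 29, 33, 35]   # people who did not

# Rank-based AUC: probability that a randomly chosen non-faller scores higher
# than a randomly chosen faller (ties count as half).
pairs = [(f, n) for f in faller_scores for n in non_faller_scores]
auc = sum(1.0 if n > f else 0.5 if n == f else 0.0 for f, n in pairs) / len(pairs)

print(f"AUC = {auc:.2f}")  # 0.68: better than chance, but far from reliable prediction

An AUC in that range means the score distributions of future fallers and non-fallers overlap heavily, which is why a single cut-off can never sort the two groups cleanly.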
Full article: Omaña et al., 2021
#FallPrevention #Geriatrics #BalanceTesting #MultifactorialAssessment
Thursday
Rotator Cuff Related Shoulder Pain: What Do the Experts Agree On?
Requejo-Salinas et al. (2022) conducted a global Delphi study with 15 PT experts to identify clinical descriptors for Rotator Cuff Related Shoulder Pain (RCRSP). The outcome: consensus on 18 descriptors across subjective history, physical exam, and function.
Highlights:
Subjective features (e.g., pain with overhead activity, deltoid region pain, sleep disturbance) were the most reliable descriptors.
Special tests failed to reach consensus, highlighting their limited diagnostic value.
Imaging is not routinely recommended unless red flags are present or conservative care has failed.
Functional and PROM tools (SPADI, DASH) support treatment tracking, not diagnosis.
This study signals a move away from special-test-based diagnosis toward a load-response, pattern-based approach informed by biopsychosocial context.
Full article: Requejo-Salinas et al., 2022
#RCRSP #ShoulderPain #DelphiStudy #ClinicalReasoning
Friday
GPT-4 and Diagnostic Reasoning: Promise, Limits, and Design Gaps
A new randomized clinical trial in JAMA Network Open explored whether GPT-4 could enhance physician diagnostic reasoning. Fifty U.S. physicians reviewed clinical cases either with or without GPT-4. The results offer a sobering dose of nuance.
Key findings:
GPT-4 alone outperformed clinicians, scoring 92% on diagnostic reasoning tasks.
Physicians with GPT-4 access showed no significant improvement over traditional tools (76% vs 74%, p = .60).
Efficiency gains were minimal, and no differences were observed based on specialty or LLM experience.
Implication: The tool works, but human improvement depends on how it’s integrated. Interface design, training, and structured collaboration matter more than simply adding AI to the workflow.
Full article: Goh et al., 2024
#AIinHealthcare #ClinicalReasoning #GPT4 #HumanAICollaboration
Closing Thought
Each study this week calls attention to a central truth: tools, whether clinical tests, consensus frameworks, or AI models, are only as effective as the systems and context in which they’re used. Progress in healthcare requires not just new evidence, but better judgment in how we apply it.