Observational Study on AI Accuracy in Diagnosing and Treating Failed or Painful Hip Arthroplasty
- Conditions
- Total Hip Arthroplasty (THA)
- Registration Number
- NCT07012577
- Lead Sponsor
- Istituto Ortopedico Rizzoli
- Brief Summary
Primary Goal:
This study aims to evaluate the diagnostic and therapeutic accuracy of GPT-4 (an advanced AI language model) compared to three orthopedic surgeons with varying experience levels in cases of failed or painful total hip arthroplasty.
Key Research Questions:
Diagnostic Accuracy:
Does GPT-4 provide correct, partially correct, or incorrect diagnoses compared to human orthopedic surgeons?
Diagnostic Completeness:
Are GPT-4's diagnostic suggestions complete, partially complete, or incomplete compared to those of orthopedic surgeons?
Treatment Accuracy:
Does GPT-4 recommend correct, partially correct, or incorrect treatments for failed hip arthroplasty?
Treatment Completeness:
Are GPT-4's treatment recommendations complete, partially complete, or incomplete compared to those of orthopedic surgeons?
Study Design:
Participants:
20 anonymized patient cases (ages 18-80) with failed or painful hip arthroplasties, treated at IRCCS Istituto Ortopedico Rizzoli (Bologna, Italy) between 2004 and 2024.
Cases were selected based on clear diagnostic and treatment records (no ambiguous or incomplete data).
Comparison Groups:
GPT-4 (via ChatGPT interface)
Three orthopedic doctors (with different experience levels: resident, specialist, senior surgeon)
Method:
Each case (clinical summary + X-ray image) is presented to GPT-4 and the three doctors.
They must provide a diagnosis and treatment recommendations.
Two independent evaluators (principal investigator + department head) blindly assess responses for correctness and completeness using a 3-point scale (0 = incorrect/incomplete, 1 = imprecise/partially complete, 2 = correct/complete).
Statistical analysis compares GPT-4 vs. human performance.
Expected Outcomes:
Determine if AI can match or outperform doctors in diagnosing and treating hip arthroplasty failures.
Assess whether GPT-4 could serve as a supplementary tool in orthopedic decision-making.
Ethical & Privacy Considerations:
No real-time patient data is used; only anonymized past cases are included.
No personal/sensitive data is shared with OpenAI (GPT-4 is used via a standard web interface).
Study complies with GDPR, HIPAA, and ethical AI guidelines.
Timeline:
Study duration: ~8 months (from ethics approval to final analysis).
Results will be published regardless of outcome.
Why This Study Matters:
First study evaluating GPT-4's role in complex orthopedic diagnostics.
Could influence future AI-assisted clinical decision-making in joint replacement surgeries.
- Detailed Description
Not available
Recruitment & Eligibility
- Status
- RECRUITING
- Sex
- All
- Target Recruitment
- 20
- Inclusion Criteria
- Adults (≥18 and ≤80 years old).
- Documented painful or failed total hip arthroplasty requiring clinical/radiological evaluation (2004-2024).
- Complete pre-operative clinical history, imaging (X-ray/tomography), and surgical reports.
- Clear diagnosis of failure mode (e.g., aseptic loosening, infection, fracture, wear).
- Treatment and outcomes fully documented in the institutional database.
- "Exemplary" cases with minimal diagnostic ambiguity (per Engh/Musculoskeletal Infection Society criteria, etc.).
- Exclusion Criteria
- Total hip arthroplasty with no documented failure/pain (well-functioning implants).
- Incomplete clinical/radiological records (e.g., missing pre-operative imaging or surgical notes).
- Complex/multifactorial failures (e.g., concurrent infection + loosening + fracture).
- Radiographs/images non-interpretable (poor quality, missing views).
- Cases with conflicting diagnoses/treatments in original records.
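The eligibility rules listed above can be read as a simple screening check. The sketch below is illustrative only: the `CaseRecord` type, its field names, and the `is_eligible` helper are hypothetical and do not come from the study protocol or the institutional database. It assumes a case qualifies when it meets the age and date limits, documents a failure or pain, has complete and interpretable records, has exactly one failure mode, and has no conflicting entries.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CaseRecord:
    # All field names are hypothetical, chosen only for this illustration.
    age: int
    evaluation_year: int
    failure_or_pain_documented: bool
    records_complete: bool          # history, imaging, and surgical reports present
    images_interpretable: bool      # adequate quality, no missing views
    failure_modes: Tuple[str, ...]  # e.g. ("aseptic loosening",)
    conflicting_records: bool


def is_eligible(case: CaseRecord) -> bool:
    """Return True if the case passes every listed criterion (assumed reading)."""
    return (
        18 <= case.age <= 80
        and 2004 <= case.evaluation_year <= 2024
        and case.failure_or_pain_documented
        and case.records_complete
        and case.images_interpretable
        and len(case.failure_modes) == 1  # multifactorial failures are excluded
        and not case.conflicting_records
    )


clear_case = CaseRecord(67, 2015, True, True, True, ("aseptic loosening",), False)
multifactorial_case = CaseRecord(
    72, 2019, True, True, True, ("infection", "loosening", "fracture"), False
)
screened = [is_eligible(clear_case), is_eligible(multifactorial_case)]
```

In this reading, a case with concurrent infection, loosening, and fracture is rejected even when its records are otherwise complete.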
Study & Design
- Study Type
- OBSERVATIONAL
- Study Design
- Not specified
- Primary Outcome Measures
| Name | Time | Method |
| --- | --- | --- |
| Diagnostic correctness | Immediate (post-case evaluation) | Proportion of fully correct diagnoses (score=2) by each rater. Scale 0 (worst outcome) - 2 (best outcome). 0: incorrect, 1: imprecise, 2: correct |
| Diagnostic completeness | Immediate (post-case evaluation) | Proportion of fully complete diagnoses (score=2). Scale 0 (worst outcome) - 2 (best outcome). 0: incomplete, 1: partially complete, 2: complete |
| Treatment recommendation correctness | Immediate (post-case evaluation) | Proportion of fully correct treatments (score=2) by each rater. Scale 0 (worst outcome) - 2 (best outcome). 0: incorrect, 1: imprecise, 2: correct |
| Treatment recommendation completeness | Immediate (post-case evaluation) | Proportion of fully complete treatments (score=2). Scale 0 (worst outcome) - 2 (best outcome). 0: incomplete, 1: partially complete, 2: complete |
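Each primary outcome is a proportion of responses scored 2 on the 0-2 scale. A minimal sketch of that calculation follows; the rater labels and score lists are invented for illustration and are not study data, and `proportion_top_score` is a hypothetical helper name.

```python
from typing import Dict, List


def proportion_top_score(scores: List[int]) -> float:
    """Fraction of cases scored 2 (fully correct or fully complete)."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(s not in (0, 1, 2) for s in scores):
        raise ValueError("scores must be 0, 1, or 2")
    return sum(1 for s in scores if s == 2) / len(scores)


# Made-up example scores for five cases per respondent (not study results).
example_scores: Dict[str, List[int]] = {
    "GPT-4": [2, 2, 1, 0, 2],
    "resident": [2, 1, 1, 2, 0],
}

proportions = {
    respondent: proportion_top_score(scores)
    for respondent, scores in example_scores.items()
}
```

The same function would apply unchanged to any of the four measures, since all share the 0-2 scale and count only the top score.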
- Secondary Outcome Measures
Name Time Method
Trial Locations
- Locations (1)
SC Ortopedia e Traumatologia e Chirurgia Protesica e dei Reimpianti di Anca e Ginocchio, IRCCS Istituto Ortopedico Rizzoli
🇮🇹Bologna, Italy
Contact: Francesco Castagnini, MD, +390516366418, francescocastagnini@hotmail.it