Neurosurgery residents appear to remain superior to ChatGPT in answering neurosurgery board examination-like questions, suggests a meta-analysis from the Philippines. However, the included studies showed high heterogeneity.
Researchers performed a literature search following the PRISMA guidelines, covering the period from ChatGPT’s inception (November 2022) until 25 October 2024. Two reviewers screened for eligible studies and selected those that used ChatGPT to answer neurosurgery board examination-like questions and compared its results with the scores of residents.
The research team assessed the risk of bias using the JBI critical appraisal tool and determined the overall effect sizes using a fixed-effects model.
Six studies met the eligibility criteria for the qualitative and quantitative analyses. ChatGPT’s accuracy ranged from 50.4 percent to 78.8 percent, compared with 58.3 percent to 73.7 percent among neurosurgery residents.
Four out of six studies had a low risk of bias, while the rest had a moderate risk. An overall trend favouring neurosurgery residents over ChatGPT (p<0.00001) was observed, with high heterogeneity (I²=96 percent).
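For readers less familiar with these statistics, the sketch below illustrates, with made-up counts rather than the data from the included studies, how a fixed-effects (inverse-variance) pooled effect and the I² heterogeneity statistic are typically computed. The effect measure (log odds ratio here) and the exact software the authors used are not stated in this summary, so this is only an illustrative assumption.

```python
import math

# Hypothetical per-study 2x2 counts: (ChatGPT correct, ChatGPT wrong,
# residents correct, residents wrong). Made-up numbers for illustration
# only; not the data from the meta-analysis.
studies = [
    (120, 80, 140, 60),
    (90, 60, 95, 55),
    (200, 120, 230, 90),
]

# Per-study log odds ratios and their variances (Woolf method).
log_ors, variances = [], []
for a, b, c, d in studies:
    log_ors.append(math.log((a * d) / (b * c)))
    variances.append(1 / a + 1 / b + 1 / c + 1 / d)

# Fixed-effects (inverse-variance) pooling: each study is weighted by the
# reciprocal of its variance, so larger, more precise studies dominate.
weights = [1 / v for v in variances]
pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)

# Cochran's Q and I² quantify between-study heterogeneity: I² is the share
# of total variation across studies not attributable to chance.
q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
df = len(studies) - 1
i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

print(f"Pooled log OR: {pooled:.3f} (OR = {math.exp(pooled):.2f})")
print(f"Cochran's Q: {q:.2f}, I²: {i_squared:.1f}%")
```

An I² near 96 percent, as reported here, is conventionally read as very high heterogeneity, meaning the individual study results differ far more than sampling error alone would explain.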
These findings persisted in a subgroup analysis of studies that used the Self-assessment in Neurosurgery examination questions. In a sensitivity analysis, however, ChatGPT appeared to perform better than the residents when the highest-weighted study was removed.
“Further improvement is necessary before it can become a useful and reliable supplementary tool in the delivery of neurosurgical education,” the researchers said.