A paper in the New England Journal of Medicine recently caught the eye of Prof Lilford.[1] The article focuses on ‘data-shift’. Data-shift arises when the data used to derive a risk score or category are ephemeral, changing from one time and place to another. The article starts with a character called Tim de Dombal. Prof Lilford first heard about de Dombal in 1973, while Lilford was still a medical student (top student!) in Johannesburg. de Dombal had come up with a Bayesian algorithm to diagnose the cause of abdominal pain based on clinical signs and symptoms. He had recently published a paper in the BMJ showing that the algorithm (run on a mainframe!) out-diagnosed specialist surgeons: it made the correct diagnosis 92% of the time.[2] Later, as it turned out, Lilford worked with de Dombal at St James’ Hospital, Leeds. In the meantime, de Dombal’s system had been tried out in Copenhagen with far less success: only 65% of diagnoses were correct.[3] What had happened? The data-set to which the algorithm was applied had different characteristics from the one used to develop it. Differences in age, referral patterns, gender mix and all sorts of other things resulted in a mis-specified model.
This is the problem with artificial intelligence. And it is the problem with risk scores. One way to mitigate this problem is to use additional data-sets structured in different ways. “Yet this strategy is expensive, laborious and incomplete.”[1] There will always be the problem of ‘transporting’, ‘generalising’ or ‘particularising’ results from one setting to another. Lilford has pointed to this problem in numerous previous News Blogs on risk scoring[4] and AI.[5-6] In a future News Blog we will discuss potential approaches to this problem.
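For readers who like to see the arithmetic, the kind of drop seen between Leeds and Copenhagen can be mimicked with a toy Bayesian classifier. The sketch below is only an illustration under loud assumptions: it is not de Dombal's system, it uses just two diagnoses and two yes/no findings treated as independent given the diagnosis, and every number and name in it (`LIKELIHOOD`, `referral_unit`, `unselected`) is invented. Only the case mix changes between the two settings; the link between findings and diagnosis stays the same.

```python
from itertools import product

# All values below are hypothetical and chosen purely for illustration.
DIAGNOSES = ["appendicitis", "non-specific pain"]

# P(finding present | diagnosis) for two yes/no findings, assumed
# independent given the diagnosis (the "naive" Bayes assumption).
LIKELIHOOD = {
    "appendicitis": [0.9, 0.7],
    "non-specific pain": [0.3, 0.2],
}


def p_findings(findings, diagnosis):
    """Probability of one pattern of findings given a diagnosis."""
    p = 1.0
    for present, p_present in zip(findings, LIKELIHOOD[diagnosis]):
        p *= p_present if present else 1.0 - p_present
    return p


def predict(findings, prior):
    """Pick the diagnosis with the highest posterior under `prior`."""
    return max(DIAGNOSES, key=lambda d: prior[d] * p_findings(findings, d))


def expected_accuracy(model_prior, true_prior):
    """Exact accuracy of a model built with `model_prior` when the
    patients actually arrive with case mix `true_prior`."""
    accuracy = 0.0
    for findings in product([False, True], repeat=2):
        guess = predict(findings, model_prior)
        # Probability that this pattern occurs AND the guess is right.
        accuracy += true_prior[guess] * p_findings(findings, guess)
    return accuracy


# Hypothetical case mixes: a selected referral population versus an
# unselected one in which appendicitis is much rarer.
referral_unit = {"appendicitis": 0.7, "non-specific pain": 0.3}
unselected = {"appendicitis": 0.2, "non-specific pain": 0.8}

print(f"{expected_accuracy(referral_unit, referral_unit):.3f}")  # 0.847
print(f"{expected_accuracy(referral_unit, unselected):.3f}")     # 0.642
print(f"{expected_accuracy(unselected, unselected):.3f}")        # 0.878
```

In this toy setting the model scores about 85% where it was developed and about 64% when transported, even though nothing about the disease itself has changed. Rebuilding it with the local case mix restores performance, but that requires exactly the kind of additional local data that is expensive to collect.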
Richard Lilford, ARC West Midlands Director
References:
1. Lea AS & Jones DS. Mind the Gap – Machine Learning, Dataset Shift, and History in the Age of Clinical Algorithms. New Engl J Med. 2024; 390(4): 293-5.
2. de Dombal FT, et al. Computer-aided Diagnosis of Acute Abdominal Pain. Br Med J. 1972; 2: 9.
3. Bjerregaard B, et al. Computer-aided diagnosis of the acute abdomen: a system from Leeds used on Copenhagen patients. In: de Dombal FT, Gremy F, eds. Decision making and medical care: can information science help? Amsterdam: North-Holland; 1976.
4. Lilford RJ. Limitations of Risk-Scoring Generally and AI in Particular in Clinical Practice. NIHR ARC West Midlands News Blog. 2022; 4(10): 2-3.
5. Lilford RJ. Generative, Artificial Intelligence and Diagnostic Accuracy: Expect More of These. NIHR ARC West Midlands News Blog. 2023; 5(7): 3.
6. Lilford RJ. Commercial Evidence of the Limitations of AI in Studying Medical Notes. NIHR ARC West Midlands News Blog. 2022; 4(8): 9.