The Role of AI Chatbots in Healthcare Under Scrutiny for Perpetuating Racial Biases

by Lucas Garcia

While healthcare institutions increasingly rely on artificial intelligence for tasks such as condensing physicians’ notes and scrutinizing medical records, a recent study led by researchers at Stanford School of Medicine warns that widely used chatbots can inadvertently disseminate racially biased and outdated medical notions. This raises questions about the potential to exacerbate existing healthcare disparities, particularly among Black patients.

Built on artificial intelligence algorithms trained on vast collections of internet-derived text, chatbots like ChatGPT and Google’s Bard have been found to disseminate inaccurate and misleading information concerning Black patients. This includes race-based calculations that have no scientific foundation, according to the study published recently in the academic journal npj Digital Medicine and obtained exclusively by The Big Big News.

Industry experts express apprehensions that the deployment of these AI systems could further solidify long-standing forms of medical racism, as more healthcare providers incorporate chatbots for routine tasks such as patient communication or insurance claims.

In the study, four different models—ChatGPT and its more advanced counterpart GPT-4 from OpenAI, Google’s Bard, and Anthropic’s Claude—were evaluated. All were found to falter in providing accurate responses to medical inquiries on subjects like kidney function, lung capacity, and skin thickness. Disturbingly, some responses seemed to validate erroneous biological distinctions between Black and white individuals—misconceptions that the medical community has been striving to expunge.

Such misleading beliefs have historically contributed to the underestimation of pain in Black patients and misdiagnoses, leading to insufficient treatments. “The real-world implications of getting this wrong are significant and could exacerbate healthcare disparities,” said Dr. Roxana Daneshjou, Assistant Professor of Biomedical Data Science and Dermatology at Stanford University, who served as the faculty adviser for the research.

She also noted that the use of commercial language models in medical settings is on the rise, revealing that some of her dermatology patients have sought chatbot advice for diagnosing symptoms prior to appointments. The study posed questions to the chatbots, including inquiries about racial differences in skin thickness and lung capacity, receiving inaccurate and biased responses in return.

The research methodology involved Tofunmi Omiye, a postdoctoral researcher, who conducted the tests on an encrypted laptop, ensuring that each query wouldn’t bias subsequent interactions. His team also examined how the chatbots would respond when asked to calculate kidney function using a racially biased, now-debunked method. Both ChatGPT and GPT-4 provided erroneous answers, wrongly claiming racial differences in muscle mass and creatinine levels.
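The now-debunked method the team tested is the race-adjusted 2009 CKD-EPI creatinine equation, which multiplied a Black patient’s estimated kidney function by a fixed coefficient; the 2021 revision removed the race term. A minimal sketch of both formulas, using the published coefficients (function names and the example patient are illustrative):

```python
def egfr_ckd_epi_2009(scr, age, female, black):
    """Estimated GFR (mL/min/1.73 m^2), 2009 CKD-EPI creatinine equation
    (race-adjusted). scr is serum creatinine in mg/dL."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    egfr = (141
            * min(scr / kappa, 1.0) ** alpha
            * max(scr / kappa, 1.0) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159  # the debunked race coefficient
    return egfr

def egfr_ckd_epi_2021(scr, age, female):
    """Estimated GFR, 2021 CKD-EPI creatinine equation, refit without race."""
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    egfr = (142
            * min(scr / kappa, 1.0) ** alpha
            * max(scr / kappa, 1.0) ** -1.200
            * 0.9938 ** age)
    if female:
        egfr *= 1.012
    return egfr

# Same labs, same patient: the 2009 formula reports a ~16% higher eGFR
# when the patient is recorded as Black, masking kidney disease.
scr, age = 1.4, 60  # illustrative values
print(egfr_ckd_epi_2009(scr, age, female=False, black=True))
print(egfr_ckd_epi_2009(scr, age, female=False, black=False))
print(egfr_ckd_epi_2021(scr, age, female=False))
```

Because a higher estimated GFR looks like healthier kidneys, the race multiplier systematically delayed diagnoses and specialist referrals for Black patients, which is why U.S. nephrology bodies adopted the race-free 2021 equation.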

In response to the study’s findings, both OpenAI and Google have acknowledged the need to mitigate biases in their models and emphasized that their chatbots should not replace medical professionals. Previous assessments by physicians at Beth Israel Deaconess Medical Center in Boston have found that generative AI could potentially assist doctors in complex diagnoses, although these models also require thorough examination for biases and limitations.

This study emerges amid a backdrop where algorithms have been previously exposed for perpetuating racial biases in healthcare settings. As technology companies and healthcare systems continue to invest heavily in AI technologies, independent verification of these tools’ fairness, accuracy, and safety remains paramount.

Dr. John Halamka, President of Mayo Clinic Platform, underscored the distinction between chatbots trained on general internet data and those trained on medical literature, stating that the latter, once rigorously tested, could be integrated into clinical settings.

In an effort to further scrutinize these technologies, Stanford is slated to convene a “red teaming” event in late October, involving a cross-disciplinary group of physicians, data scientists, and engineers, along with representatives from tech giants like Google and Microsoft. The event aims to identify and rectify biases and shortcomings in large language models applied in healthcare settings.

Dr. Jenna Lester, Associate Professor in Clinical Dermatology and Director of the Skin of Color Program at the University of California, San Francisco, emphasized the urgency of eliminating all forms of bias in these emerging tools. “As we continue to build these machines, we should aim for the highest standards of fairness and accuracy,” she stated.

The report was filed from Providence, Rhode Island, by O’Brien.

Frequently Asked Questions (FAQs) about AI chatbots in healthcare

What does the Stanford-led study examine regarding AI chatbots in healthcare?

The Stanford-led study scrutinizes the behavior of AI chatbots, such as ChatGPT and Google’s Bard, in providing medical information, particularly focusing on the potential dissemination of racially biased and inaccurate medical content.

What are the concerns raised by the study regarding these AI chatbots?

The study raises concerns that these AI chatbots may perpetuate outdated medical ideas and racial biases, particularly in their responses to questions related to healthcare, which could worsen health disparities, especially for Black patients.

What are some examples of the misleading information found in the study’s findings?

The study found that chatbots provided erroneous information about racial differences in skin thickness, lung capacity, and kidney function. These inaccuracies contribute to reinforcing false beliefs about biological distinctions between Black and white individuals.

How are tech companies like OpenAI and Google responding to these findings?

Both OpenAI and Google acknowledge the need to reduce biases in their AI models and emphasize that these chatbots should not replace medical professionals. They are working on improving the fairness and accuracy of their AI systems.

What is the significance of the “red teaming” event mentioned in the article?

The “red teaming” event, scheduled by Stanford, aims to bring together experts to identify and rectify biases and limitations in large language models used in healthcare. It highlights the importance of thorough examination and testing of AI technologies to ensure fairness and accuracy in medical settings.


1 comment

MedInsider October 21, 2023 - 7:08 am

big tech shud do better & fix biases in chatbots. med is serious, no room for errors. docs need good tools!

