Tech giant OpenAI claims its artificial intelligence-powered transcription tool, Whisper, comes close to “human-level robustness and accuracy”.
But Whisper has one major drawback: It has the potential to make up fragments of text or even entire sentences, according to interviews with more than a dozen software engineers, developers, and academic researchers. Those experts said some of the invented texts – known in the industry as hallucinations – may include racial remarks, violent rhetoric and even imaginary medical treatments.
Experts said such fabrications are problematic because Whisper is being used in many industries around the world to translate and transcribe interviews, generate text in popular consumer technologies, and create subtitles for videos.
More concerning, experts said, is the rush by medical centers to use Whisper-based tools to transcribe patients’ consultations with doctors, despite OpenAI’s warning that the tool should not be used in “high-risk domains”.
The full extent of the problem is difficult to discern, but researchers and engineers say they have frequently encountered Whisper’s hallucinations in their work. A University of Michigan researcher studying public meetings, for example, said he found hallucinations in eight out of every ten audio transcriptions he inspected, before he started trying to improve the model.
A machine learning engineer said he initially analyzed more than 100 hours of Whisper’s transcriptions, finding hallucinations in about half of them. A third developer said he found hallucinations in almost every one of the 26,000 transcripts he created with Whisper.
Problems persist even in well-recorded, short audio samples. A recent study by computer scientists uncovered 187 hallucinations in more than 13,000 clear audio snippets they examined.
The researchers said that rate of errors would lead to thousands of faulty transcriptions across millions of recordings.
Such mistakes can have “really serious consequences,” especially in a hospital setting, said Alondra Nelson, who led the White House Office of Science and Technology Policy for the Biden administration until last year.
“Nobody wants a misdiagnosis,” said Nelson, a professor at the Institute for Advanced Study in Princeton, New Jersey. “There should be a higher bar.”
Whisper is also used to create closed captioning for the deaf and hard of hearing – a population at particular risk for faulty transcription.
That is because deaf and hard of hearing people have no way of identifying fabrications “hidden among all this other text,” said Christian Vogler, who is deaf and directs Gallaudet University’s Technology Access Program.
OpenAI urged to solve the problem
The prevalence of such hallucinations has prompted experts, advocates, and former OpenAI employees to call on the federal government to consider AI regulations. At the very least, they said, OpenAI needs to address the flaw.
“It seems solvable if the company is willing to make it a priority,” said William Saunders, a San Francisco-based research engineer who left OpenAI in February over concerns about the company’s direction. “It’s problematic if you put it out there and people are overconfident about what it can do and integrate it into all these other systems.”
An OpenAI spokesperson said the company continues to study how to reduce hallucinations and applauded the researchers’ findings, adding that OpenAI incorporates the feedback into model updates.
While most developers assume that transcription tools misspell words or make other errors, engineers and researchers said they had never seen another AI-powered transcription tool hallucinate as much as Whisper.
Whisper hallucinations
The tool is integrated into some versions of OpenAI’s flagship chatbot ChatGPT, and is a built-in offering in Oracle and Microsoft’s cloud computing platforms, which serve thousands of companies worldwide. It is also used to transcribe and translate text into multiple languages.
In the past month alone, a recent version of Whisper was downloaded more than 4.2 million times from the open-source AI platform HuggingFace. Sanchit Gandhi, a machine-learning engineer there, said Whisper is the most popular open-source speech recognition model and is built into everything from call centers to voice assistants.
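For context on that footprint, the snippet below is a minimal sketch of how developers typically load an open-source Whisper checkpoint from HuggingFace with the transformers library; the model ID and audio file name here are illustrative assumptions, not details drawn from this article.

```python
# Minimal sketch (not from the article): running an open-source Whisper
# checkpoint downloaded from HuggingFace via the transformers library.
from transformers import pipeline

# Build an automatic-speech-recognition pipeline around a Whisper model.
# "openai/whisper-large-v3" is an assumed example checkpoint.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")

# Transcribe a local audio file (hypothetical file name).
result = asr("meeting_recording.wav")

# Researchers cited above report that such output can include hallucinated text.
print(result["text"])
```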
Professors Allison Koenecke of Cornell University and Mona Sloane of the University of Virginia examined thousands of short snippets obtained from TalkBank, a research repository hosted at Carnegie Mellon University. They determined that about 40% of the hallucinations were harmful or concerning because the speaker could be misinterpreted or misrepresented.
In one instance they uncovered, a speaker said, “He, that boy, was going to get an umbrella, I don’t know exactly.”
But the transcription software added: “He took a big piece of the cross, a small, tiny piece… I’m sure he didn’t have a terrorist knife so he killed a lot of people.”
In another recording a speaker described “two other girls and a woman”. Whisper invented additional commentary on race, including “two other girls and a woman, um, who were black.”
In a third transcription, Whisper invented a non-existent drug called “hyperactivated antibiotics”.
Researchers aren’t sure why Whisper and similar tools hallucinate, but software developers said the fabrications tend to occur amid pauses, background sounds or music.
In its online disclosure, OpenAI recommended against using Whisper “in decision-making contexts, where flaws in accuracy could lead to obvious flaws in the results.”
Transcription of Doctor’s Appointments
That caveat hasn’t stopped hospitals and medical centers from using speech-to-text models, including Whisper, to transcribe what’s said during doctors’ visits, helping medical providers spend less time taking notes or writing reports.
More than 30,000 physicians and 40 health systems, including the Mankato Clinic in Minnesota and Children’s Hospital Los Angeles, have started using a Whisper-based tool built by Nabla, which has offices in France and the United States.
Martin Raison, Nabla’s chief technology officer, said the tool was fine-tuned on medical language to transcribe and summarize patients’ conversations.
Company officials said they are aware that Whisper can cause hallucinations and are mitigating the problem.
It’s impossible to compare Nabla’s AI-generated transcripts to the original recordings because Nabla’s tool erases the original audio for “data security reasons,” Raison said.
Nabla said the tool has been used to document an estimated 7 million medical visits.
Saunders, the former OpenAI engineer, said erasing the original audio could be worrisome if the transcripts are not double-checked or physicians cannot access the recordings to verify they are correct.
“If you remove the ground truth, you can’t catch mistakes,” he said.
Nabla said no model is perfect, and that its tool currently requires medical providers to quickly edit and approve transcribed notes, but that could change.
Privacy concerns
Since patients’ meetings with their doctors are confidential, it’s hard to know how AI-generated transcripts are affecting them.
A California state lawmaker, Rebecca Bauer-Kahan, said she took one of her children to the doctor earlier this year and refused to sign a form the health network provided seeking permission to share the consultation audio with vendors that included Microsoft Azure, a cloud computing system operated by OpenAI’s largest investor. Bauer-Kahan didn’t want such intimate medical conversations shared with tech companies, she said.
“The release was very specific that for-profit companies would have the right to do this,” said Bauer-Kahan, a Democrat who represents part of the San Francisco suburbs in the state Assembly. “I was like, ‘absolutely not.’”
John Muir Health spokesman Ben Drew said the health system complies with state and federal privacy laws.