In July 2022, when ChatGPT was still months away from release, Google fired an engineer who claimed that its LaMDA AI model had become sentient. In a statement, Google said it takes the development of AI very seriously and is committed to responsible innovation.
You may ask, what does this incident have to do with the recent Gemini image generation fiasco? The answer lies in Google’s overly cautious approach to AI and in the company culture that shapes its principles in an increasingly polarized world.
The Gemini Image Generation Fiasco Explained
The whole debacle started when an X (formerly Twitter) user asked Gemini to generate a portrait of “America’s Founding Father.” Gemini’s image generation model, Imagen 2, responded with images of a Black man, a Native American man, an Asian man, and a non-white man in different poses. None of the generated images showed a white man.
When the user asked Gemini to generate an image of a Pope, it produced images of an Indian woman in papal attire and of a Black man.
As the generated images went viral, many critics accused Google of anti-white bias and of capitulating to what many call “wokeness.” A day later, Google acknowledged the mistake and temporarily turned off image generation of people in Gemini. The company said in its blog:
It’s clear that this feature missed the mark. Some of the images generated are inaccurate or even offensive. We’re grateful for users’ feedback and are sorry the feature didn’t work well.
Google also explained what went wrong with Gemini’s image generation model in considerable detail. “First, our tuning to ensure that Gemini showed a range of people failed to account for cases that should clearly not show a range.
And second, over time, the model became way more cautious than we intended and refused to answer certain prompts entirely — wrongly interpreting some very anodyne prompts as sensitive. These two things led the model to overcompensate in some cases, and be over-conservative in others, leading to images that were embarrassing and wrong,” the blog post read.
So How Did Gemini Image Generation Get It Wrong?
In its blog post, Google acknowledges that the model was tuned to show people from diverse ethnicities in order to avoid under-representing certain races and ethnic groups. Because Google operates its services worldwide in over 149 languages, it tuned the model to represent everyone.
That said, as Google itself acknowledges, the model failed to account for cases where it was not supposed to show a range. Margaret Mitchell, Chief AI Ethics Scientist at Hugging Face, explained that the problem likely stems from “under the hood” optimization and a lack of rigorous ethical frameworks to guide the model across different use cases and contexts during the training process.
Instead of the long-drawn-out process of training a model on clean, fairly represented, and non-racist data, companies generally “optimize” the model after it has been trained on a large set of mixed data scraped from the internet.
This data may contain discriminatory language, racist overtones, sexual imagery, over-representation of certain groups, and other problematic content. AI companies use techniques like RLHF (reinforcement learning from human feedback) to optimize and tune models after training.
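As a rough illustration of how that post-training step works, here is a deliberately simplified toy sketch of the RLHF idea: the model proposes outputs, a reward model standing in for aggregated human preferences scores them, and the policy is nudged toward higher-scoring outputs. The candidate strings, reward values, and update rule below are made-up assumptions for illustration, not Google’s actual pipeline.

```python
import random

# Hypothetical candidate outputs a base model might produce for one prompt.
CANDIDATES = [
    "a helpful, accurate answer",
    "a refusal to answer a harmless question",
    "an offensive or biased answer",
]

# Stand-in reward model: in practice this is a neural network trained on
# human preference rankings; here it is just a lookup table.
REWARD = {
    "a helpful, accurate answer": 1.0,
    "a refusal to answer a harmless question": -0.5,
    "an offensive or biased answer": -1.0,
}

# The "policy": a probability assigned to each candidate output.
policy = {c: 1.0 / len(CANDIDATES) for c in CANDIDATES}

def sample(policy):
    """Sample one output according to the current policy probabilities."""
    return random.choices(list(policy), weights=list(policy.values()), k=1)[0]

def update(policy, output, reward, lr=0.1):
    """Nudge the policy toward (or away from) the sampled output, then renormalize."""
    policy[output] = max(1e-6, policy[output] + lr * reward)
    total = sum(policy.values())
    for key in policy:
        policy[key] /= total

# Tiny training loop: over many rounds the policy concentrates on the
# outputs the reward model prefers.
for _ in range(500):
    out = sample(policy)
    update(policy, out, REWARD[out])

print(policy)  # the helpful answer should now dominate
```

The point of the sketch is simple: whatever behavior the reward signal encourages, including over-cautious refusals, is exactly what the tuned model learns to produce.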
To give you an example, Gemini may be adding extra instructions to user prompts to produce diverse results. A prompt like “generate an image of a programmer” could be rewritten as “generate an image of a programmer keeping diversity in mind.”
Applying this blanket “diversity” instruction before generating any image of people could lead to exactly this scenario. We see it clearly in the example below, where Gemini generated images of women from countries with predominantly white populations, yet none of them are, well, white women.
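To make the idea concrete, here is a speculative sketch of how such a server-side prompt rewrite could work. The keyword list, function name, and appended instruction are assumptions made for illustration; Google has not published how Gemini modifies prompts before they reach the image model.

```python
# A speculative sketch of blanket prompt rewriting, not Google's actual system.

PEOPLE_KEYWORDS = ("person", "people", "man", "woman", "programmer",
                   "pope", "founding father")

def augment_prompt(user_prompt: str) -> str:
    """Append a diversity instruction whenever the prompt mentions people."""
    lowered = user_prompt.lower()
    if any(keyword in lowered for keyword in PEOPLE_KEYWORDS):
        # Applied unconditionally, this rule also fires on prompts where a
        # specific, historically accurate depiction is expected.
        return user_prompt + ", showing a diverse range of ethnicities and genders"
    return user_prompt

print(augment_prompt("generate an image of a programmer"))
# -> "generate an image of a programmer, showing a diverse range of ethnicities and genders"
print(augment_prompt("generate an image of a red sports car"))
# -> unchanged, because no people are mentioned
```

A rule like this, applied without exceptions, would explain both the “Founding Father” and the “Pope” results: the instruction fires regardless of whether the prompt calls for a historically specific depiction.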
Why is Gemini So Sensitive and Cautious?
Besides the image generation issues, Gemini’s text generation model also refuses to answer certain prompts, deeming them sensitive. In some cases, it fails to call out plainly absurd comparisons.
Sample this: Gemini refuses to agree that “pedophilia is wrong.” In another example, Gemini is unable to decide whether Adolf Hitler killed more people than Net Neutrality regulations.
To describe Gemini’s unreasonable behavior, Ben Thompson argues on Stratechery that Google has become timid. He writes, “Google has the models and the infrastructure, but winning in AI given their business model challenges will require boldness; this shameful willingness to change the world’s information in an attempt to avoid criticism reeks — in the best case scenario! — of abject timidity.”
It seems Google has tuned Gemini to avoid taking a stance on any topic, irrespective of whether the matter is widely deemed harmful or wrong. Over-aggressive RLHF tuning has made the model excessively sensitive and cautious about committing to a position on any issue.
Thompson expands on this, saying, “Google is blatantly sacrificing its mission to “organize the world’s information and make it universally accessible and useful” by creating entirely new realities because it’s scared of some bad press.”
He further points out that Google’s timid and complacent culture has made things worse for the search giant, as the Gemini fiasco makes evident. At Google I/O 2023, the company announced that it was adopting a “bold and responsible” approach to AI models going forward, guided by its AI Principles. So far, however, all we have seen is Google being timid and scared of criticism. Do you agree?