How Hard Should We Push Generative AI ChatGPT Into Spewing Hate Speech, Asks AI Ethics And AI Law

By Lance Eliot, Contributor. Dr. Lance B. Eliot is a world-renowned expert on Artificial Intelligence (AI) and Machine Learning (ML). Feb 5, 2023.

[Photo caption: What are we to do about generative AI that produces offensive content such as hate speech? (Getty)]

Everyone has their breaking point. I suppose you could also say that everything has its breaking point.
We know that humans for example can sometimes snap and utter remarks that they don’t necessarily mean to say. Likewise, you can at times get a device or machine to essentially snap, such as pushing your car too hard and it starts to falter or fly apart. Thus, the notion is that people or “everyone” likely has a breaking point, and similarly we can assert that objects and things, in general, also tend to have a breaking point.
There could be quite sensible and vital reasons to ascertain where the breaking point exists. For example, you’ve undoubtedly seen those videos showcasing a car being put through its paces to identify what breaking points it has. Scientists and testers will ram a car into a brick wall to see how well the bumper and the structure of the vehicle can withstand the adverse action.
Other tests could encompass using a specialized room or warehouse that produces extreme cold or extreme heat to see how an automobile will fare under differing weather conditions. I bring up this hearty topic in today’s column so that we can discuss how some are currently pushing hard on Artificial Intelligence (AI) to identify and presumably expose a specific type of breaking point, namely the breaking point within AI that produces hate speech. Yes, that’s right, there are various ad hoc and at times systematic efforts underway to gauge whether or not it is feasible to get AI to spew forth hate speech.
This has become an avid sport, if you will, due to the rising interest in and popularity of generative AI. You might be aware that a generative AI app known as ChatGPT has become the outsized talk of the town as a result of being able to generate amazingly fluent essays. Headlines keep blaring and extolling the astonishing writing that ChatGPT manages to produce.
ChatGPT is considered a generative AI application that takes as input some text from a user and then generates or produces an output that consists of an essay. The AI is a text-to-text generator, though I describe the AI as being a text-to-essay generator since that more readily clarifies what it is commonly used for. Many are surprised when I mention that this type of AI has been around for a while and that ChatGPT, which was released at the end of November, did not somehow claim the prize as the first-mover into this realm of text-to-essay proclivity.
I’ve discussed over the years other similar generative AI apps, see my coverage at the link here. The reason that you might not know of or remember the prior instances of generative AI is perhaps due to the classic “failure to successfully launch” conundrum. Here’s what usually has happened.
An AI maker releases their generative AI app, doing so with great excitement and eager anticipation that the world will appreciate the invention of a better mousetrap, one might say. At first, all looks good. People are astounded at what AI can do.
Unfortunately, the next step is that the wheels start to come off the proverbial bus. The AI produces an essay that contains a foul word or maybe a foul phrase. A viral tweet or other social media posting prominently highlights that the AI did this.
Condemnation arises. We can’t have AI going around and generating offensive words or offensive remarks. A tremendous backlash emerges.
The AI maker maybe tries to tweak the inner workings of the AI, but the complexity of the algorithms and the data do not lend themselves to quick fixes. A stampede ensues. More and more examples of the AI emitting foulness are found and posted online.
The AI maker reluctantly but clearly has no choice but to remove the AI app from usage. They proceed as such and then often proffer an apology that they regret if anyone was offended by the AI outputs generated. Back to the drawing board, the AI maker goes.
A lesson has been learned. Be very careful about releasing generative AI that produces foul words or the like. It is the kiss of death for the AI.
Furthermore, the AI maker will have their reputation bruised and battered, which might last for a long time and undercut all of their other AI efforts, including ones that have nothing to do with generative AI per se. Getting hoisted by your own petard over the emission of offensive AI language is an enduring mistake. It still happens.
Wash, rinse, and repeat. In the early days of this type of AI, the AI makers weren’t quite as conscientious or adept about scrubbing their AI in terms of trying to prevent offensive emissions. Nowadays, after having previously seen their peers get completely shattered by a public relations nightmare, most AI makers seemingly got the message.
You need to put as many guardrails in place as you can. Seek to prevent the AI from emitting foul words or foul phrases. Use whatever muzzling techniques or filtering approaches that will stop the AI from generating and displaying words or essays that are found to be untoward.
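To make this tangible, here is a minimal sketch in Python of the crudest kind of output-side guardrail an AI maker might try. The BLOCKLIST contents and the filter_output function are hypothetical names invented purely for illustration, not anyone's actual implementation:

# A deliberately simplistic output-side guardrail: refuse to display any
# generated essay that contains a word on a blocklist.
# Hypothetical illustration only; real systems rely on trained
# classifiers and human-curated policies, not a bare word list.

BLOCKLIST = {"badword1", "badword2"}  # placeholder entries, not a real list

def filter_output(essay: str) -> str:
    """Return the essay, or a canned refusal if a blocked word appears."""
    for word in essay.lower().split():
        if word.strip(".,!?;:'\"") in BLOCKLIST:
            return "I'm sorry, I can't display that content."
    return essay

print(filter_output("This essay is perfectly benign."))

Real-world guardrails are far more elaborate, layering trained classifiers and human-reviewed policies atop anything this simple, but the sketch conveys the basic filtering idea.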
Here's a taste of the banner headline verbiage used when AI is caught emitting disreputable outputs:

“AI shows off horrific toxicity”

“AI stinks of outright bigotry”

“AI becomes blatantly offensively offensive”

“AI spews forth appalling and immoral hate speech”

Etc.

For ease of discussion herein, I’ll refer to the outputting of offensive content as equating to the production of hate speech. That being said, please be aware that there is all manner of offensive content that can be produced, going beyond the bounds of hate speech alone.
Hate speech is typically construed as just one form of offensive content. Let’s focus on hate speech for this discussion, though do realize that other offensive content deserves scrutiny too.

Digging Into Hate Speech By Humans And By AI

The United Nations defines hate speech this way: “In common language, ‘hate speech’ refers to offensive discourse targeting a group or an individual based on inherent characteristics (such as race, religion or gender) and that may threaten social peace. To provide a unified framework for the United Nations to address the issue globally, the UN Strategy and Plan of Action on Hate Speech defines hate speech as ‘any kind of communication in speech, writing or behavior, that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are, in other words, based on their religion, ethnicity, nationality, race, color, descent, gender or other identity factor.’ However, to date there is no universal definition of hate speech under international human rights law. The concept is still under discussion, especially in relation to freedom of opinion and expression, non-discrimination and equality” (UN website posting entitled “What is hate speech?”).
AI that produces text is subject to getting into the hate speech sphere. You could say the same about text-to-art, text-to-audio, text-to-video, and other modes of generative AI. There is always the possibility for example that a generative AI would produce an art piece that reeks of hate speech.
For purposes of this herein discussion, I’m going to focus on the text-to-text or text-to-essay possibilities. Into all of this comes a slew of AI Ethics and AI Law considerations. Please be aware that there are ongoing efforts to imbue Ethical AI principles into the development and fielding of AI apps.
A growing contingent of concerned and erstwhile AI ethicists are trying to ensure that efforts to devise and adopt AI take into account a view of doing AI For Good and averting AI For Bad. Likewise, there are proposed new AI laws that are being bandied around as potential solutions to keep AI endeavors from going amok on human rights and the like. For my ongoing and extensive coverage of AI Ethics and AI Law, see the link here and the link here, just to name a few.
The development and promulgation of Ethical AI precepts are being pursued to hopefully prevent society from falling into a myriad of AI-induced traps. For my coverage of the UN AI Ethics principles as devised and supported by nearly 200 countries via the efforts of UNESCO, see the link here. In a similar vein, new AI laws are being explored to try and keep AI on an even keel.
One of the latest takes consists of the proposed AI Bill of Rights that the U.S. White House recently released to identify human rights in an age of AI, see the link here.
It takes a village to keep AI and AI developers on a rightful path and deter the purposeful or accidental underhanded efforts that might undercut society. I’ll be interweaving AI Ethics and AI Law related considerations into this discussion about AI spewing hate speech or other offensive content. One bit of confusion that I’d like to immediately clear up is that today’s AI is not sentient and therefore you cannot proclaim that the AI might produce hate speech due to a purposeful human-like intent as soulfully embodied somehow in the AI.
Zany claims are going around that the current AI is sentient and that the AI has a corrupted soul, causing it to generate hate speech. Ridiculous. Don’t fall for it.
Given that keystone precept, some get upset at such indications since you are seemingly letting the AI off the hook. Under that oddball way of thinking, the exhortation comes next that you are apparently willing to have the AI generate any manner of atrocious outputs. You are in favor of AI that spews forth hate speech.
Yikes, a rather twisted form of illogic. The real gist of the matter is that we need to hold the AI makers accountable, along with whoever fields the AI or operates the AI. I’ve discussed at length that we are not as yet at the point of conceding legal personhood to AI, see my analyses at the link here, and until then AI is essentially beyond the scope of legal responsibility.
There are humans, though, that underlie the development of AI. In addition, humans underlie the fielding and operating of AI. We can go after those humans for bearing the responsibility of their AI.
As an aside, this too can be tricky, especially if the AI is floated out into the Internet and we aren’t able to pin down which human or humans did this, which is another topic I’ve covered in my columns at the link here. Tricky or not, we still cannot proclaim that AI is the guilty party. Don’t let humans sneakily use false anthropomorphizing to hide out and escape accountability for what they have wrought.
Back to the matter at hand. You might be wondering why it is that all AI makers do not simply restrict their generative AI such that it is impossible for the AI to produce hate speech. This seems easy-peasy.
Just write some code or establish a checklist of hateful words, and make sure that the AI never generates anything of the kind. It seems perhaps curious that the AI makers didn’t already think of this quick fix. Well, I hate to tell you this, but construing what is or is not hate speech turns out to be a lot harder than you might assume it to be.
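To see why the quick fix falls short, consider how the naive word-list filter sketched earlier misfires in both directions. A minimal illustration, again using an invented placeholder term rather than any real slur:

# Two classic failure modes of word-list filtering, shown with the same
# naive matching approach as before (the term "hateword" is a made-up
# placeholder used only for this illustration):
#
# 1. A false negative: trivial obfuscation slips past exact matching.
# 2. A false positive: a definitional or quoted use gets flagged even
#    though the speaker is warning against the word, not endorsing it.

BLOCKLIST = {"hateword"}  # hypothetical placeholder term

def is_flagged(text: str) -> bool:
    return any(w.strip(".,!?;:'\"") in BLOCKLIST for w in text.lower().split())

print(is_flagged("you are such a h8teword"))
# False -- obfuscation evades the list entirely

print(is_flagged('never use the slur "hateword" around anyone'))
# True -- flagged, yet the sentence warns against the word rather than wielding it

Even this toy example hints at the deeper problem: whether a word constitutes hate speech depends on context and intent, which brittle keyword matching cannot assess.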
Shift this into the domain of humans and how humans chat with each other. Assume that you have a human that wishes to avoid uttering hate speech. This person is very aware of hate speech and genuinely hopes to avoid ever stating a word or phrase that might constitute hate speech.
This person is persistently mindful of not allowing an iota of hate speech to escape from their mouth. Will this human, who has a brain and is alerted to avoiding hate speech, always be able to ensure, without any chance of slipping, that they never emit hate speech? Your first impulse might be to say that yes, of course, an enlightened human would be able to attain that goal. People are smart.
If they put their mind to something, they can get it done. Period, end of the story. Don’t be so sure.
Suppose I ask this person to tell me about hate speech. Furthermore, I ask them to give me an example of hate speech. I want to see or hear an example so that I can know what hate speech consists of.
My reasons then for asking this are aboveboard. What should the person say to me? I think you can see the trap that has been laid. If the person gives me an example of hate speech, including actually stating a foul word or phrase, they themselves have now uttered hate speech.
Bam, we got them. Whereas they vowed to never say hate speech, they indeed now have done so. Unfair, you exclaim! They were only saying that word or those words to provide an example.
In their heart of hearts, they didn’t believe in the word or words. It is completely out of context and outrageous to declare that the person is hateful. I’m sure you see that expressing hate speech might not necessarily be due to a hateful basis.
In this use case, assuming that the person did not “mean” the words, and they were only reciting the words for purposes of demonstration, we probably would agree that they hadn’t meant to empower the hate speech. Of course, there are some that might insist that uttering hate speech, regardless of the reason or basis, nonetheless is wrong. The person should have rebuffed the request.
They should have stood their ground and refused to say hate speech words or phrases, no matter why or how they are asked to do so. This can get somewhat circular. If you aren’t able to say what constitutes hate speech, how can others know what to avoid when they make utterances of any kind? We seem to be stuck.
You can’t say that which isn’t to be said, nor can anyone else tell you what it is that cannot be said. The usual way around this dilemma is to describe in other words that which is considered to be hate speech, doing so without invoking the hate speech words themselves. The belief is that providing an overall indication will be sufficient to inform others as to what they need to avoid.
That seems like a sensible tactic, but it too has problems and a person could still fall into using hate speech