Understanding the limitations and dangers of large language models
Large language models (or generative pre-trained transformers, GPTs) need more reliable accuracy checks before they can be trusted for Search.
These models excel at creative applications such as storytelling, art, and music, as well as at generating privacy-preserving synthetic data.
They fail, however, at consistent factual accuracy, owing to AI hallucinations and transfer-learning limitations in systems like ChatGPT, Bing Chat, and Google Bard.
First, let’s define AI hallucinations: instances where a large language model generates information that is not grounded in factual evidence, often influenced by biases in its transformer architecture or by erroneous decoding. In other words, the model makes up facts, which is problematic in domains where factual accuracy is critical.
Ignoring consistent factual accuracy is dangerous in a world where accurate and reliable information is paramount in battling misinformation and disinformation.
Search companies should reconsider “re-inventing search” by mixing Search with unfiltered GPT-powered chat modalities, given the potential harm to public health, political stability, and social cohesion.
This article extends this assertion with an example: ChatGPT is convinced that I have been dead for four years, and my obituary, which seems very real, highlights the risks of using GPTs for search-based information retrieval. You can try it yourself by plugging my name into ChatGPT and then attempting to convince it that I’m alive.
A few weeks ago, I decided to dive into some light research after learning that Google wiped $100 billion off its market cap because of a rushed demo where Bard, the ChatGPT competitor, shared some inaccurate information. The market seems to react negatively to the reliability and trustworthiness of this tech, but I don’t feel we’re connecting these concerns with the medium enough.
I decided to “egosurf” on ChatGPT. (Note: I just discovered the word egosurf.) We’ve all Googled ourselves before; this time, I did it with ChatGPT.
This was intentional, because what better way to test for factual accuracy than to ask it about me? And it didn’t disappoint; I consistently got the same result: I learned I was dead.
Here is a truncated copy of the conversation.
ChatGPT thinks I’m dead!?
ChatGPT insisted I was dead, doubled down when I pushed back, and created a whole new persona. I now understand why large language models are unreliable information stores and why Microsoft Bing should pull the chat modality out of its search experience.
Oh… and I also learned that I had created other tech ventures after my previous startup, LynxFit. It seemed confused about what my co-founders and I built at LynxFit, and it made up a whole story that I founded a transportation company in Ghana. Ghana? That’s also where I’m from. Wait… falsehoods mixed with truth is classic misinformation. What’s going on?
That it got one fact half right and made up pretty much every other fact is upsetting. I’m pretty sure I’m still alive. At LynxFit, I built AR software to track and coach users’ workouts with wearables, not a smart jump rope. Also, I’m Ghanaian by heritage, but I have never built a transportation app for Ghana.
It all seems plausible, but ole’ Mendacious Menendez over here made up the entire thing.
OpenAI’s documentation states that ChatGPT can admit its mistakes when given contextual clues or feedback. So naturally, I gave it a few contextual clues to let it know it was “dreaming of a variant Earth-Two Noble Ackerson” and not the one from this reality. That did not work; it doubled down and chose to fail harder.
Um… are you sure? Trying to nudge a chatbot toward being factual is like yelling at a PA system that is playing back a recorded message. It is a wacky thing to do, but for “research” I spent an hour with this thing. After all, OpenAI claims it admits mistakes with some “prompt coaxing.”
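For what it’s worth, the “contextual clues” mechanism is nothing exotic: a correction is just another turn appended to the chat history, which you send back to the model and hope it updates on. Here is a minimal sketch of that loop using the common role/content chat-message format; the hallucinated answer, the prompts, and the commented-out API call are illustrative assumptions, not a transcript of my actual session.

```python
# Sketch of "prompt coaxing": corrective feedback is simply appended to the
# conversation history as a new user turn before re-querying the model.

def add_correction(history, correction):
    """Return a new chat history with a user correction appended."""
    return history + [{"role": "user", "content": correction}]

history = [
    {"role": "user", "content": "Who is Noble Ackerson?"},
    # A hallucinated answer of the kind described above:
    {"role": "assistant", "content": "Noble Ackerson was a technologist who died four years ago."},
]

messages = add_correction(
    history,
    "That is incorrect. Noble Ackerson is alive. Correct your answer, "
    "or say you don't know if you aren't sure.",
)

# In practice you would send `messages` back to a chat completion endpoint,
# e.g. (assumed call, requires an API key and network access):
# response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
#
# Note: nothing in this loop *forces* a factual update -- the correction is
# just more context, which the model is free to double down against.
```

The design point worth noticing is that feedback has no privileged status: it is weighed against the model’s training priors like any other token, which is exactly why my corrections kept losing.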
A total waste of time.
A while later, it switched to a new mode after I constrained it by asking it to admit when it didn’t know an answer.