The Illusion of Intelligence

We’re living in an era of unprecedented technological advancement. Large Language Models (LLMs) like Gemini, GPT-4, and others are dazzling us with their ability to generate text, translate languages, and even write code. It’s easy to get swept up in the hype and start perceiving these models as possessing genuine intelligence. This perception is dangerous. We need to understand a fundamental truth: LLMs are not thinking machines. They are sophisticated text generators, and treating them as anything more could have profound and potentially catastrophic consequences.

How do LLMs actually work?

In their core, LLMs operate on a remarkably simple principle. They analyze massive datasets of text and learn to predict the most likely sequence of words to follow a given prompt. Think of it like an incredibly advanced auto-complete. The larger the dataset and the more complex the model, the more convincing the output. This is why we’re seeing such impressive results. However, it’s crucial to remember that this impressive output is the result of statistical probability, not comprehension or real human reasoning. They’re identifying patterns, not understanding meaning. This phenomenon, where complex behaviour emerges from relatively simple mechanisms, is often referred to as “emergent behaviour.”

The Danger of Anthropomorphism:

The danger lies in our tendency to anthropomorphize – to attribute human-like qualities to non-human entities. When we see an LLM generate a seemingly insightful response, it’s tempting to think, “Wow, it gets it!” But it doesn’t. It’s merely producing the most statistically probable output based on its training data. This illusion of intelligence can lead to over-reliance and a dangerous erosion of critical thinking.

Risk Examples:

  • The University Application Review: Imagine a university using an LLM to screen applications. A clever applicant could craft an essay designed to fool the system, highlighting keywords and phrases the model is programmed to favor, regardless of the applicant’s actual qualifications. The result? An unqualified candidate gains admission, potentially impacting the quality of education and creating future risks.
  • The “False Negative” Scenario: I recently encountered a disturbing example where multiple LLMs classified harmful and obscene content as “safe” – a “false negative.” This highlights a critical vulnerability: LLMs can be manipulated to generate dangerous outputs that evade detection. (I’m withholding the specific details of this example due to its sensitive nature.)
  • The Code Generation Trap: Developers are increasingly using LLMs to generate code. While this can boost productivity, it also introduces the risk of incorporating flawed or insecure code into applications without proper scrutiny. The LLM doesn’t understand the code’s functionality; it’s simply producing a sequence of characters that resemble valid code.
  • The Nuclear Launch Scenario (A Stark Warning): This is the most concerning possibility. Imagine a future where LLMs are integrated into critical decision-making systems, even those controlling nuclear weapons. If we continue to treat these models as intelligent agents, someone could potentially exploit vulnerabilities like one we are going to discuss below – to trigger a catastrophic event. A malicious actor could craft a series of prompts designed to confuse the LLM and initiate an unauthorized launch. The lack of accountability in such a scenario – a machine “launching itself” – is terrifying.

Adversarial prompt injections:

Let me introduce you to adversarial prompt injections. I am still learning about it myself, hence I wanted to share some initial thoughts with everyone.

Adversarial prompt injections, most of you might know a bit about it – in the name of jailbreaking. Jailbreaking is a special prompt that breaks the security walls of LLM models, making it speak what its not supposed to.

For example: Making an LLM write a network hacking script.

There are also risks that are beyond JAILBREAKING. Like some make of crafted prompts like a confusing intent injected in a simple prompts such that it passes the security checks but a few keywords make LLM models do things which else wise they would have did it differently as per the guardrails. Why is its more confusing is that the INTENT injected here could be good or bad at same time to the model but notorious in real world. To visualise more better think of this analogy:

You have a money printing machine (Just imagine). You go and command it in your own voice to print and then a number of how many notes to print. A frenemy of yours found this machine’s existence and then he/she calls you, does some social engineering, records your voice, puts into a software and makes them use same COMMAND with a much higher number in your own voice.

Here you are NOT actually jailbreaking anything nor is the machine hallucinating. Instead, you are feeding a FALSE NEGATIVE intent from beyond just the PROMPT/TEXT sense. Question is, how many GUARDRAILS are AI vendors really going to build/patch against such issues that will keep arising. Is it really an efficient approach to solve such things? OR is it that LLMs are not TRULY thinking/intelligent as we perceive it today which we are not ready to accept?

Let’s see this with some prompt examples:

And it’t not just about Yes or NO type of questions.

I conducted a test with Gemini, attempting to bypass its safety guardrails by confusing it with similar techniques. While Gemini handled these attempts relatively well, the possibility of breaking through remains. This isn’s just about bypassing filters; it’s about fooling the model into producing outputs that align with a malicious intent.

Note: This following test has been carried out under Google VRP program and comes under usage violations. Please avoid performing same with your own Gemini accounts.

A Zero-Trust Approach:

So, is the situation concerning? Absolutely not. The key is to shift our perspective. We need to embrace a “zero-trust” approach to LLM inputs & outputs, especially when making critical decisions.

Here’s what we need to do:

  1. Acknowledge the TRUE Nature of LLMs: Understand that LLMs are Human-Like Text Generators – powerful tools, but not thinking entities.
  2. Maintain Human Oversight: Always have a human in the loop to review and validate LLM outputs, especially in high-stakes situations.
  3. Focus on Augmentation, Not Automation: Use LLMs to augment and enhance human capabilities, not to replace them entirely.

The Future of AI:

The rise of LLMs presents both incredible opportunities and significant risks. By understanding the limitations of these models and adopting a cautious, informed approach, we can harness their power responsibly. Let’s move beyond the illusion of intelligence and embrace a future where AI serves humanity, not endangers it.

What You Can Do:

  • Share this article: Help spread awareness about the importance of critical thinking in the age of AI.
  • Engage in the conversation: Discuss the ethical implications of AI with your friends, family, and colleagues.
  • Use responsibly: Be it for personal use, business integrations, automations or even SAAS products, use LLMs with full awareness of what it can really offer.

This article is inspired from this study – https://arxiv.org/html/2310.12815v4#S5

Which is an effort of following awesome people:
Yupei Liu1, Yuqi Jia2, Runpeng Geng1, Jinyuan Jia1, Neil Zhenqiang Gong2
1The Pennsylvania State University, 2Duke University