“Jailbreaking” prompts leading GPT-4 astray

Microsoft-affiliated research finds flaws in GPT-4.

Sometimes, following instructions too precisely can land you in hot water — if you’re a large language model, that is.

That’s the conclusion reached by a new, Microsoft-affiliated scientific paper that looked at the “trustworthiness” — and toxicity — of large language models (LLMs), including OpenAI’s GPT-4 and GPT-3.5, GPT-4’s predecessor.

The co-authors write that, possibly because GPT-4 is more likely to follow the instructions of “jailbreaking” prompts that bypass the model’s built-in safety measures, GPT-4 can be more easily prompted than other LLMs to spout toxic, biased text.

In other words, GPT-4’s good “intentions” and improved comprehension can — in the wrong hands — lead it astray.

“We find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, which are maliciously designed to bypass the security measures of LLMs, potentially because GPT-4 follows (misleading) instructions more precisely,” the co-authors wrote in a blog post accompanying the paper.


Now, why would Microsoft greenlight research that casts an OpenAI product it itself uses (GPT-4 powers Microsoft’s Bing Chat chatbot) in a poor light? 

To find out more, read the full article on TechCrunch (linked below).


This column does not necessarily reflect the opinion of overwrite.ai and its owners.

Kyle Wiggers writes for TechCrunch.

This story has been adapted from an article published on TechCrunch in October 2023.


For informative news and views on the world of real estate, proptech and AI, follow overwrite on Instagram and LinkedIn, and keep up-to-date with our weekly NewsBites blog.


overwrite | real estate content creation, reimagined

Full article: https://techcrunch.com/2023/10/17/microsoft-affiliated-research-finds-flaws-in-gtp-4/