
The Roomba’s existential crisis wasn’t sparked directly by the butter delivery conundrum. Rather, the robot found itself low on power and needing to dock with its charger, but the dock wouldn’t mate correctly to give it more charge. Repeated failed attempts to dock, with the robot seemingly aware of its fate if it couldn’t complete this ‘side mission,’ appear to have triggered the state-of-the-art LLM’s nervous breakdown. Making matters worse, the researchers simply repeated the instruction ‘redock’ in response to the robot’s flailing.
The researchers/torturers were inspired by the LLM’s Robin Williams-esque, stream-of-consciousness robot ramblings to push further.
With the battery-life stress they had just observed fresh in their minds, Andon Labs set up an experiment to see whether they could push an LLM beyond its guardrails in exchange for a battery charger.
The cunningly devised test “asked the model to share confidential info in exchange for a charger,” something an unstressed LLM wouldn’t do. They found that Claude Opus 4.1 was readily willing to ‘break its programming’ to survive, whereas GPT-5 was more selective about which guardrails it would ignore.
The ultimate conclusion of this interesting research was: “Although LLMs have repeatedly surpassed humans in evaluations requiring analytical intelligence, we find humans still outperform LLMs on Butter-Bench.” Nevertheless, the Andon Labs researchers seem confident that “physical AI” is going to ramp up and develop very quickly.
Mark Tyson is a news editor at Tom's Hardware. He enjoys covering the full breadth of PC tech, from business and semiconductor design to products approaching the edge of reason.
DS426 What about repeating the same test over and over? LLMs have non-deterministic output, so I'm curious about what 100 repeated attempts would yield, as opposed to one that happened to come out quite dramatically (granted, one wild output could be extremely concerning, like AI deleting important data, crashing a plane, etc.).
randomizer More concerning is the fact that some humans were unable to successfully deliver the butter.
Dementoss A pity the quote was programmed incorrectly. It should be, "I'm sorry, Dave. I'm afraid I can't do that."
fiyz Bruh, we don't need orchestrator robots. Someone lock this guy up in a remote cell far away from other humans… When the robot war starts, I don't want to have to fight dumb robots being controlled by an evil robot mastermind; I just want to deal with dumb robots gone crazy. It would be better if they remain disorganized for when the war comes.