AI researchers trick chatbots into sharing how to make cocaine as long as they believe a user is wearing a green shirt — ‘CoT Forgery’ exploit spurs LLMs to div

To determine whether confusion about roles was specific to their attack or reflected a more general principle that explains why prompt injection works, the researchers took a different approach. They hid a command in a webpa…