28 votes

Preventing the worst supply chain attack you can imagine in the Python ecosystem

Posted July 18 by xk3

Tags: security, docker, python, supply chain attacks, github, author.andrey polkovnichenko, author.brian moussalli, author.sachar menashe, source.jfrog

https://jfrog.com/blog/leaked-pypi-secret-token-revealed-in-binary-preventing-suppy-chain-attack/

Link information

This data is scraped automatically and may be incorrect.

Title: Binary secret scanning helped us prevent (what might have been) the worst supply chain attack you can imagine
Authors: drewt
Published: Jul 9 2024
Word count: 1327 words

10 comments

winther
July 18

Yikes. This is certainly scary, as I work with Python professionally and such a scenario seems virtually impossible to prevent. Ideally, you would verify the source code of every package in all Docker containers before deployment, but that is beyond the resources of most companies.

9 votes
1.
  xk3 (OP)
  July 18 (edited July 18)
  
  > virtually impossible to prevent
  
  I don't know exactly how the individual here accidentally added the .pyc file to the Docker image, but I imagine it was an interactive session followed by a docker commit.
  
  I can think of three strategies that would have prevented this:
  
  - Before each docker commit, run a script that deletes all *.pyc files in the container and then runs python -m compileall .
  - Use ENV PYTHONDONTWRITEBYTECODE=1 (the environment-variable equivalent of python -B): https://docs.python.org/3/using/cmdline.html#cmdoption-B
  - Use .dockerignore and only build Docker images from Dockerfiles. Never use interactive sessions.
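The first strategy could look roughly like this Python sketch. The function names are made up, and it assumes you run it inside the container against the application directory just before the docker commit:

```python
import compileall
import shutil
from pathlib import Path


def purge_bytecode(root):
    """Delete every *.pyc file and __pycache__ directory under root.

    Returns the number of .pyc files removed. Stale bytecode can still hold
    string literals (such as a hard-coded token) that were later removed
    from the .py source.
    """
    root = Path(root)
    pyc_files = list(root.rglob("*.pyc"))  # materialize before deleting anything
    for pyc in pyc_files:
        pyc.unlink()
    for cache_dir in list(root.rglob("__pycache__")):
        shutil.rmtree(cache_dir, ignore_errors=True)
    return len(pyc_files)


def rebuild_bytecode(root):
    """Recompile from the current sources, like `python -m compileall <root>`."""
    return bool(compileall.compile_dir(str(root), quiet=1))
```

For example, `purge_bytecode("/app")` followed by `rebuild_bytecode("/app")` (with `/app` as a hypothetical application directory) leaves only bytecode that matches the sources on disk at that moment. For the second strategy, PYTHONDONTWRITEBYTECODE=1 makes the interpreter start with `sys.dont_write_bytecode` set to True, so imports stop writing new .pyc files at all.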
  
  Looks like someone foresaw this... kind of:
  
  > Most likely that you don't even need to think about it, but cache files can contain some sort of sensitive information. Due to the current implementation, in .pyc files presented an absolute path to the actual files. There are situations when you don't want to share such information.
  https://stackoverflow.com/questions/59684674/should-i-add-pythons-pyc-files-to-dockerignore
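To see why this matters, the sketch below reads a .pyc back and lists its string constants, which is roughly what a binary secret scan has to do. It assumes CPython 3.7 or later, where a .pyc is a 16-byte header followed by a marshalled code object, and it only works on .pyc files produced by the same interpreter version that reads them. The helper names and the `pypi-` prefix check are illustrative, not a real secret scanner:

```python
import marshal
import types
from pathlib import Path

# CPython 3.7+ (PEP 552): 4-byte magic number, 4-byte flags field,
# then 8 bytes holding either mtime + source size or a source hash.
PYC_HEADER_SIZE = 16


def load_pyc(path):
    """Return the module code object stored in a .pyc file."""
    data = Path(path).read_bytes()
    return marshal.loads(data[PYC_HEADER_SIZE:])


def iter_strings(obj):
    """Yield every string constant, recursing into nested code objects and tuples."""
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, (tuple, frozenset)):
        for item in obj:
            yield from iter_strings(item)
    elif isinstance(obj, types.CodeType):
        for const in obj.co_consts:
            yield from iter_strings(const)


def suspicious_strings(pyc_path, prefix="pypi-"):
    """Return (recorded source path, string constants starting with prefix)."""
    code = load_pyc(pyc_path)
    return code.co_filename, [s for s in iter_strings(code) if s.startswith(prefix)]
```

Running `suspicious_strings` on the output of `py_compile.compile(...)` for a file containing `token = "pypi-..."` returns both the source path recorded in `co_filename` and the token itself, even though the .pyc is "just bytecode".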
  
  6 votes
  1.
    winther
    July 18
    
    I was thinking more about how to prevent this kind of attack when you use PyPI packages in your project. If every open source package in the official PyPI repo is contaminated, then it is pretty much game over.
    
    3 votes
    
    overbyte
    July 18
    
    Self-host your dependencies, run a vulnerability scanner, pin your packages, and run auditing steps with something like jake during CI to mitigate it. Artifactory (a product of the company in the link) can mirror all the popular public registries and integrate with Xray. In-house packages are namespaced like companyname.package, and that pattern is only allowed to resolve from internal hosts (Artifactory can do this too), which prevents collisions with public packages of the same name.
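The namespace rule described above boils down to a check like this Python sketch. `companyname` and `artifacts.example.internal` are placeholders, and in practice the rule would live in the repository manager's configuration (Artifactory or similar) rather than in your own code. It only illustrates the logic that blocks dependency confusion:

```python
import re
from urllib.parse import urlparse

# Placeholders for illustration; substitute your own namespace and hosts.
INTERNAL_NAMESPACE = "companyname"
INTERNAL_HOSTS = {"artifacts.example.internal"}


def normalize(name):
    """PEP 503 name normalization: lowercase, runs of '-', '_' and '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_internal(name):
    """True for packages in the reserved in-house namespace (companyname.*)."""
    norm = normalize(name)
    return norm == INTERNAL_NAMESPACE or norm.startswith(INTERNAL_NAMESPACE + "-")


def source_allowed(name, index_url):
    """In-house names may only resolve from internal hosts; other names are unrestricted here."""
    if not is_internal(name):
        return True
    return urlparse(index_url).hostname in INTERNAL_HOSTS
```

For example, `source_allowed("companyname.billing", "https://pypi.org/simple/")` is False, so a public package squatting on an in-house name is refused. Normalizing first matters because pip treats `companyname.billing`, `companyname_billing` and `CompanyName-Billing` as the same project.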
    
    But even if you mirror public sources, you're still pulling from the same upstream that may or may not be tainted; with a scanner, though, you at least have a chance to block the package in time or to know about it. You can still hand-vet packages, but that will kill productivity when something like npm can pull in hundreds of dependencies from a single npm install before your dev team has even started writing code.
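Pinning helps most when it pins the artifact and not just the version. pip can enforce this with `--require-hashes` and `--hash=sha256:...` entries in a requirements file, so a package you hand-vetted once cannot silently change underneath you. The underlying check is roughly this sketch (the function names are made up). It only proves you got the same file you vetted earlier; it says nothing about whether that file was safe in the first place:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Stream a file through SHA-256 so large wheels don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_sha256):
    """Return True only if the file matches the pinned hex digest."""
    return sha256_of(path) == expected_sha256.strip().lower()
```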
    
    Other than that, you stick with the herd. Your developers would need to be restrained enough to avoid bleeding-edge, relatively unknown frameworks in favor of popular "boring" ones like Laravel, Spring Boot, or Flask, and to avoid pulling in the whole world when starting up a project.
    
    We've also started using tools like syft and grype across the stack to generate SBOMs and scan them, so at the very least any report of a compromised package can be surfaced through an org-wide search in a consistent format.
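As a rough illustration of that org-wide search, the sketch below scans SBOMs in CycloneDX JSON format (one of the formats syft can emit, e.g. with `-o cyclonedx-json`) for a given package name and set of affected versions. It assumes each SBOM has already been loaded into a dict with `json.load`, uses PyPI-style name normalization, and treats the labels and package names as hypothetical:

```python
import re


def normalize(name):
    """PEP 503 normalization so 'Example_Pkg' and 'example-pkg' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def iter_components(components):
    """Walk a CycloneDX components list, including nested sub-components."""
    for comp in components or []:
        yield comp
        yield from iter_components(comp.get("components"))


def find_affected(sboms, package, bad_versions):
    """Return (label, version) pairs for every SBOM that contains an affected version.

    sboms maps a label such as an image tag to a parsed CycloneDX JSON document.
    """
    target = normalize(package)
    hits = []
    for label, bom in sboms.items():
        for comp in iter_components(bom.get("components")):
            if normalize(comp.get("name", "")) == target and comp.get("version") in bad_versions:
                hits.append((label, comp["version"]))
    return hits
```

Keeping one SBOM per image or service means a single call answers "where are we running the bad version?" without rebuilding or re-scanning anything.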
    
    3 votes
  2.
    sparksbet
    July 18
    
    The steps that led to the authorization token being publicly available were avoidable. But if something at the same scale happened again without being caught, there are absolutely zero steps a company that uses Python could do to prevent themselves from being affected, since the attacker would be able to hide malicious code within CPython itself.
    
    2 votes
    
    xk3 (OP)
    July 19 (edited July 19)
    
    > absolutely zero steps a company that uses Python could do to prevent themselves from being affected
    
    I'm not sure absolutely zero is quite true. Self-hosted PyPI is pretty common in enterprise environments.
    
    But sure, there could be backdoors in CPython right now. They would at least be auditable; it's not quite as bad as Reflections on Trusting Trust. It might be worse if the PyPI service itself were compromised, because an attacker could serve malicious code to only specific people and no one would know... but PyPI itself is just code, so it is auditable too.
    
    The token in the article was in the wild between 2023-03-03 and 2024-06-28... more than 15 months. I don't think it's unreasonable to assume that at least some nation-state actors knew about it.
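For the record, here is how long that exposure window is, using the dates given in the comment above and a quick check with Python's `datetime` (`full_months_between` is a throwaway helper):

```python
from datetime import date

WINDOW_START = date(2023, 3, 3)   # dates as given in the comment
WINDOW_END = date(2024, 6, 28)


def full_months_between(start, end):
    """Count whole calendar months from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


exposure_days = (WINDOW_END - WINDOW_START).days
exposure_months = full_months_between(WINDOW_START, WINDOW_END)
print(f"{exposure_days} days, {exposure_months} full months")  # 483 days, 15 full months
```

That is 15 full months plus 25 days, with the leap day in February 2024 included in the day count.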
    
    1 vote
    
    sparksbet
    July 19
    
    I'm not super pessimistic about this particular breach. I think the credentials were in a place where few enough people would look, and it's more likely than not that nothing was done with them. Presumably, even if something were done, they could audit the changes made with those credentials during that period after the fact. I think the people actually working on these repos are good at their jobs. But as a user, I don't see any particularly easy way to defend against a compromise of CPython itself. You're essentially relying on things like this being caught and fixed quickly (which luckily happened in this case).
    
    1 vote
    
    xk3 (OP)
    July 19 (edited July 19)
    
    For CPython specifically, I think the trust required is similar in scope to trust in the Linux kernel. Linux has many millions more lines of code, but there are also more people reading it.
    
    It's pretty easy to build CPython from source, but IMHO it makes sense to trust the version of Python that comes with your OS package manager, because I think it's more likely than not that at least one additional person, the package maintainer, has read through some of the code changes.
    
    Using pyenv or https://www.python.org/ftp/ directly is risky if you suspect that the build system was compromised.
    
    > caught and fixed quickly (which luckily happened in this case)
    
    15 months is quick...?
    
    edit: I took a look at the CPython repo. It looks like Ee Durbin made only two commits within the relevant timeframe: 1829a3c9a3712b6a68a3a449e4a08787c73da51d and e1a90ec75cd0f5cab21b467a73404081eaa1077f. Both commits seem pretty innocuous.
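The same check can be repeated with plain `git log` filters; this Python wrapper just builds and runs that command. It is a sketch: the author pattern and dates come from the comments above, `repo_path` must point at a local clone, and author metadata is self-reported, so anything pushed with a stolen token need not carry the token owner's name. Changes made through repository settings rather than commits would not show up here either:

```python
import subprocess


def git_log_command(author, since, until):
    """Build a git log invocation listing commits on all refs by author in a date window."""
    return [
        "git", "log", "--all",
        f"--author={author}",
        f"--since={since}",
        f"--until={until}",
        "--format=%H %s",
    ]


def commits_in_window(repo_path, author, since, until):
    """Run git log in repo_path and return one 'hash subject' line per commit."""
    result = subprocess.run(
        git_log_command(author, since, until),
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

For example, `commits_in_window("cpython", "Ee Durbin", "2023-03-03", "2024-06-28")`, where `"cpython"` is a hypothetical path to a local clone.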
    
    1 vote
    
    sparksbet
    July 19
    
    It was fixed very quickly after it was caught, as the article notes, and at least there's no evidence that anyone used it before then.
    
    2 votes
rue
July 18

Yikes...