Are We PEP740 Yet?

(trailofbits.github.io)

39 points | by djoldman 3 hours ago

3 comments

  • simonw 2 hours ago
    I suggest reading this detailed article to understand why they built this: https://blog.trailofbits.com/2024/11/14/attestations-a-new-g...

    The implementation is interesting - it's a static page built using GitHub Actions, and the key part of the implementation is this Python function here: https://github.com/trailofbits/are-we-pep740-yet/blob/a87a88...

    If you read the code you can see that it's hitting pages like https://pypi.org/simple/pydantic/ - which return HTML - but sending this header instead:

        Accept: application/vnd.pypi.simple.v1+json
    
    Then scanning through the resulting JSON looking for files that have a provenance that isn't set to null.

    Here's an equivalent curl + jq incantation:

        curl -s \
          -H 'Accept: application/vnd.pypi.simple.v1+json' \
          https://pypi.org/simple/pydantic/ \
        | jq '.files | map(select(.provenance != null)) | length'
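
    For anyone who would rather script this than pipe through jq, here is a minimal Python sketch of the same check. It is not the code from the linked repository; the names `fetch_index` and `attested_files` are made up for illustration. It assumes only what the comment above describes: the `Accept: application/vnd.pypi.simple.v1+json` header, a top-level `files` array, and a `provenance` key on each file that is null when there is no attestation.

```python
import json
import urllib.request

# Media type that asks PyPI's Simple API for JSON instead of HTML.
SIMPLE_JSON = "application/vnd.pypi.simple.v1+json"


def fetch_index(project: str) -> dict:
    """Fetch a project's Simple API index as JSON (makes a network request).

    Hypothetical helper for illustration only.
    """
    req = urllib.request.Request(
        f"https://pypi.org/simple/{project}/",
        headers={"Accept": SIMPLE_JSON},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def attested_files(index: dict) -> list[str]:
    """Return filenames whose "provenance" value is present and not null.

    A missing key is treated like null, which matches the behaviour of
    jq's `.provenance != null` filter.
    """
    return [
        f["filename"]
        for f in index.get("files", [])
        if f.get("provenance") is not None
    ]
```

    Under those assumptions, `len(attested_files(fetch_index("pydantic")))` should give the same count as the jq pipeline. Keeping the filter separate from the network call means it can be tried against a saved JSON response.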
  • marky1991 2 hours ago
    Could someone explain why this is important? My uninformed feeling towards PEP 740 is 'who cares?'.
    • hadlock 2 hours ago
      I believe this is a system where a human/system builds a package, uploads it, and cryptographically signs it, verifying end to end that the code uploaded to GitHub for widget-package 3.2.1 is the code you're downloading to your laptop for widget-package 3.2.1, and that there's no chance it was modified/signed by an adversarial third party.
      • TZubiri 6 minutes ago
        1- Why not compile it? 2- Does pip install x not guarantee that?
      • marky1991 2 hours ago
        That's my understanding also, but I still feel like 'who cares' about that attack scenario. Am I just insufficiently paranoid? Is this kind of attack really likely? (How is it done, other than evil people at pypi?)
        • OutOfHere 1 hour ago
          Yes, it is likely. It is done by evil intermediaries on the hosts that are used to create and upload the package. It is possible, for example, if the package is built and uploaded from a developer's laptop that is compromised.

          ---

          From the docs:

          > PyPI's support for digital attestations defines a strong and verifiable association between a file on PyPI and the source repository, workflow, and even the commit hash that produced and uploaded the file.

          • abotsis 1 hour ago
            It still doesn’t protect against rogue commits to packages by bad actors. Which, IMO, is the larger threat (and one that’s been actively exploited). So while a step in the right direction, it certainly doesn’t completely solve the supply chain risk.
          • mikepurvis 33 minutes ago
            It’s honestly a bit nuts that in 2024 a system as foundational as PyPI just accepts totally arbitrary, locally built archives for its “packages”.

            I appreciate that it’s a community effort and compute isn’t free, but Launchpad did this correctly from the very beginning — dput your signed dsc and it will build and sign binary debs for you.

    • rty32 2 hours ago
      • marky1991 2 hours ago
        But that involved one of the developers of said package committing malicious code and it being accepted and then deployed. How would this prevent that from happening?

        I thought this was about ensuring the code that developers pushed is what you end up downloading.

        • rty32 1 hour ago
          No, part of the malicious code is in a test data file, and the modified m4 file is not in the git repo. The package signed and published by Jia Tan is not reproducible from the source, and that was done intentionally.

          You might want to revisit the script of xz backdoor.

          • epcoa 1 hour ago
            An absolutely irrelevant detail here. While there was an additional flourish of obfuscation of questionable prudence, the attack was not at all dependent on that. It’s a library that justifies all kinds of seemingly innocuous test data. There were plenty of creative ways to smuggle in selective backdoors to the build without resorting to a compromised tar file. The main backdoor mechanism resided in test data in the git repo, the entire compromise could have.
  • zahlman 2 hours ago
    >Using a Trusted Publisher is the easiest way to enable attestations, since they come baked in! See the PyPI user docs and official PyPA publishing action to get started.

    For many smaller packages in this top 360 list I could imagine this representing quite a bit of a learning curve.

    • amiga386 2 hours ago
      Or it could see Microsoft tightening its proprietary grip over free software by not only generously offering gratis hosting, but now also it's a Trusted Publisher and you're not - why read those tricky docs? Move all your hosting to Microsoft today, make yourself completely dependent on it, and you'll be rewarded with a green tick!
      • simonw 2 hours ago
        I think it's a little rude to imply that the people who worked on this are serving an ulterior motive.
        • akira2501 1 hour ago
          It's possible they're just naive.
      • zahlman 1 hour ago
        Thankfully, the PyPI side of the hosting is done by a smaller, unrelated company (Fastly).
    • simonw 2 hours ago
      I think it's pretty hard to get a Python package into the top 360 list while not picking up any maintainers who could climb that learning curve pretty quickly. I wrote my own notes on how to use Trusted Publishers here: https://til.simonwillison.net/pypi/pypi-releases-from-github

      The bigger problem is for projects that aren't hosting on GitHub and using GitHub Actions - I'm sure there are quite a few of those in the top 360.

      I expect that implementing attestations without using the PyPA GitHub Actions script has a much steeper learning curve, at least for the moment.

    • woodruffw 35 minutes ago
      I suspect that most of the packages in the top 360 list are already hosted on GitHub, so this shouldn’t be a leap for many of them. This is one of the reasons we saw Trusted Publishing adopted relatively quickly: it required less work and was trivial to adopt within existing CI workflows.