
Themes from PyCon US 2022

After two long years of lockdowns, virtual meetups, quarantines, and general chaos, the Python community gathered en masse in Salt Lake City for PyCon 2022. Two of our engineers attended the conference, and we are happy to report that the Python community is not only alive and well but also thriving, with multiple speakers showing off projects they worked on during the pandemic.

Here are some of the themes and highlights we enjoyed!

Supply chain security

How PyPI Survives the Coming Zombie Apocalypse

Supply chain attacks and bad actors are an active threat for the entire world. Installing a malicious package can have devastating impacts on organizations and everyday people. The Python Package Index (PyPI) maintainers are actively working to provide additional protections and security for the Python supply chain. Protections cover both the package maintainer with additional authentication security and the user downloading the package through new verification and trust techniques built directly into the platform.

Dustin Ingram detailed how PyPI is adopting new security measures that greatly improve the security and trust of the entire PyPI platform. Several of the enhancements within PyPI and related tooling were thanks in large part to our senior engineers Will Woodruff and Alex Cameron, who received a special thanks during the presentation.

  • pip-audit, a new tool that can identify vulnerable dependencies in Python applications.
  • A new sigstore client for Python, allowing ordinary users to sign Python packages and verify their signatures.
  • Two-factor authentication will be required for packages that are important to the community and PyPI itself, and other package authors will have the ability to opt in to 2FA in the near future.
  • With credential-less publication using OpenID Connect integration, you will soon be able to publish packages, without credentials, directly from GitHub Actions using a simple configuration setting.
  • Signed package metadata (PEP 458) and a PyPI disaster recovery plan (PEP 480) provide security and trust for both everyday users and recovery from catastrophic events.

Also, Ashish Bijlani presented a new tool called Packj that tests packages within PyPI to identify behaviors that may add risk to applications and checks whether a package's metadata and code may be malicious.

Recent Python successes

Since release 3.9, Python has switched to a yearly release schedule (PEP 602), which ensures that new features are regularly being added to the language to keep it modern.

Annotations: Documenting code with code

The first day’s keynote was from Łukasz Langa, who encouraged everyone to use annotations in new code, add them to existing code, and take advantage of the modern annotation syntactic sugar. Why? Because ugly type annotations hint at ugly code.

Annotations improve code readability and empower IDEs and static code analyzers to identify issues before they are encountered at runtime. With modern syntactic sugar, writing annotations is easier than ever, including:

  • Using built-in container types in annotations (PEP 585). No more typing.List[] or typing.Dict[], just use list[] and dict[].
  • Replacing the Union type (PEP 604) with the or operator, |. No more typing.Union[].
  • Creating distinct types to provide meaning, thus making code more readable. For example: If a function accepts a str argument, can it be any string? Can it be a blob of text or a single word? Type aliases and the NewType construct are zero-runtime overhead and can easily be added to convey meaning for users and improve static type-checking results.

Pattern matching

Python 3.10 gained a powerful pattern matching mechanism, and Brandt Bucher, an author of the original pattern matching PEP, gave a talk detailing the history, implementation, and future of the feature. He gave an overview of the PEP process, which involved four PEPs of differing scope and target audience:

  • PEP 622 – Initial discussion. This original PEP proved to be too all-encompassing, so it was broken down into three smaller PEPs.
  • PEP 634 – Specification, intended for the implementers
  • PEP 635 – Motivation and Rationale, intended for the steering council
  • PEP 636 – Tutorial, intended for the end-user

Pattern matching is not a switch statement! It provides optimizations at the bytecode and interpreter level over the traditional if/elif/else pattern.

The design and implementation were influenced by established standards in other languages such as Rust, Scala, and Haskell, with some Python syntax magic included. Pattern matching is a very exciting development with some future work coming that will make it even more powerful and performant.

Python in space!

The second day’s keynote by Sara Issaoun described how Python was a critical component of reconstructing the famous first image of a black hole. Sara went into the details of using the entire Earth as a massive satellite dish to capture petabytes of data and then developing optimized data analysis pipelines in Python to transform all the data into an image that is only a couple of kilobytes. This is perhaps the largest dataset ever processed, and it was done primarily in Python.

[Image: the first image of a black hole. Credit: Event Horizon Telescope Collaboration]

Many of us have seen this famous image, but knowing that Python played a central role in making it provides more perspective and appreciation for the language. Python is helping answer some of the most important questions in science. On May 12, the first images of the massive black hole at the center of our very own Milky Way galaxy were revealed. Python was again a major component of bringing the images of Sagittarius A* to the world.

Tooling ecosystem

There are so many tools within the Python ecosystem that make the process of development, testing, building, collaboration, and publishing easier or automated.

Binary extensions on easy mode

Building and packaging binary extensions can be cumbersome and may require developers to write extensions in C using the CPython API. There have been several improvements by the community that add more options to building and deploying binary extensions.

Henry Schreiner III provided a deep-dive into several packages and methods to make the process of building binary extensions easier.

  • pybind11 is a header-only API to write Python extensions in C++, which makes integrating C++ code and libraries easier.
  • scikit-build allows projects to build their binary extensions with CMake, rather than the setuptools or distutils method.
  • The cibuildwheel package makes building and testing wheels with binary extensions easy, whether in CI or locally.

Open-source maintenance on autopilot

For managing all of the un-fun parts of project maintenance, John Reese gave a talk on tips and tricks for automating all of the necessary but tedious tasks when leading an open-source project.

Time is the only resource we can’t buy. With this in mind, John offered multiple guidelines that can make the maintenance and contribution processes easier.

  • Use a pyproject.toml file to define well-formed metadata about the project.
  • Provide a well-defined list of dependencies. Versions should not be so specific that developers using the package encounter version conflicts with other dependencies. And the versions should not be so generic that incompatibilities can be introduced silently after upgrading dependencies.
  • Create reproducible and automated development workflows starting from initial setup to building, testing, and publishing. Performing a release should be as easy as possible to keep pace with the user’s needs.
  • Introduce automated code quality checks and code formatting to identify bugs and potential issues early and remove the guess-work around code style.
  • Write accessible documentation that includes expectations for contributors.

The future is bright

Python, now with browser OS support

Saturday’s keynote speaker, Peter Wang, introduced an alpha version of PyScript—Python running entirely in the browser with support for interacting with the DOM and JavaScript libraries. The web browser has silently won the OS wars and putting Python in the browser will make it even more approachable for new users.

Several demos were shown that exercised core Python functionality, such as a REPL session, and HTML capabilities such as manipulating the DOM and a simple ToDo application. More advanced demonstrations show how Python can now be used in conjunction with the popular data visualization library d3 and create interactive 3D animations with WebGL. All of this can be done with Python running within the browser, and in most cases, within a single HTML file.

To show the full power of PyScript, Peter played Super Mario in the browser, controlling Mario with hand gestures and computer vision, all in Python, which was a huge crowd pleaser. PyScript is pushing the envelope of Python and future-proofs the language for new platforms and architectures.

Python and the need for speed

Every Python developer, at some point, has been asked the question, “Isn’t Python slow?” With performance-optimized packages such as NumPy and pandas, Python has proven that it’s fast enough to solve some of the most complex problems (see “The counter-intuitive rise of Python in scientific computing”). But, as an interpreted language, there is still work to be done to decrease the interpretation overhead and improve overall performance.

The upcoming Python 3.11 release will have the first set of performance improvements from CPython maintainers, who are using Anaconda’s Pyston and Instagram’s Cinder as guides for improvement.

As Kevin Modzelewski detailed in his talk, there are patterns that developers can start adopting today that will take advantage of new optimizations as they become available in future releases. In the past, optimizations have been difficult to implement because of the dynamic nature of Python. For example, an attribute lookup had the same performance cost whether the attribute was set in __init__() or set dynamically via setattr(). As a developer, you got dynamic features at no extra cost.

However, these truly dynamic features are used much less frequently than traditional programming practices. So one of the approaches for speeding up CPython is to optimize for static use cases and allow the truly dynamic cases to be slower and have costs associated with them. Now, with this principle that prioritizes static code practices, static lookups can be cached and optimized while dynamic features can be slower since they occur much less frequently.

Here’s what developers can do to prepare for CPython’s new optimizations:

  • Do not reassign global variables. Set once and reference or mutate. This will take advantage of the new lookup cache.
  • Keep objects the same shape with the same attributes. Use flag attributes instead of conditionally setting attributes so that attribute lookups take advantage of the new lookup cache.
  • Call methods directly on objects rather than locally caching the method. With attribute lookups optimized, this replaces the traditional wisdom of caching a method prior to using it repeatedly within a loop.
  • Traditional advice is to move performance-critical code to C to see significant improvements. However, this may not be the case going forward. All of the optimizations so far can only be taken advantage of within Python code. So, C code will not have the same optimizations, at least for now.
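
The first three recommendations above can be sketched as follows. This is a hypothetical example written for illustration (DEFAULT_LIMITS, Job, and run_all are not from the talk), and it makes no claim about measured speedups, which will depend on the CPython version in use.

```python
# Set the global once; mutate the dict if needed rather than rebinding the name.
DEFAULT_LIMITS = {"retries": 3}


class Job:
    def __init__(self, name: str) -> None:
        # Every instance gets the same attributes in the same order,
        # so all Job objects keep the same shape.
        self.name = name
        self.done = False    # flag attribute instead of one added conditionally
        self.result = None   # always present, even before the job finishes

    def finish(self, result: str) -> None:
        # Only updates attributes that already exist; never adds new ones.
        self.done = True
        self.result = result


def run_all(jobs: list[Job]) -> list[str]:
    results = []
    for job in jobs:
        # Call methods directly on the objects inside the loop instead of
        # caching them in locals first (e.g. append = results.append).
        job.finish(f"{job.name}: ok after <= {DEFAULT_LIMITS['retries']} retries")
        results.append(job.result)
    return results
```

The version to avoid would assign job.result only inside finish(), so that finished and unfinished jobs carry different sets of attributes, and would rebind DEFAULT_LIMITS to a new dict whenever a limit changes.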

Closing thoughts

In addition to some amazing technical developments and discoveries discussed at PyCon 2022, there are several intangibles that made the conference enjoyable. Everyone was extremely kind, helpful, and courteous. Speakers used inclusive language, and the entire event felt welcoming to non-technical folks, beginners, and experts alike. The wide array of topics, booths, and events made sure there was something for everyone. And the Salt Lake City convention center was a great spot to host PyCon 2022, with plenty of room for talks and many great restaurants within a short walking distance.

PyCon 2022 really felt like both a return to normalcy for the community and a breakthrough moment for Python to not only remain one of the most popular programming platforms across a wide variety of industries but also grow its already massive community and range of use cases. As the closing keynote speaker Naomi Ceder so eloquently put it, the Python community, and the entire open source model, is built upon a culture of gift giving. The common saying is that Python is a language that comes with “batteries included,” which, upon reflection, is only true because so much of the community has given the gift of their time, their work, and their expertise. Thanks to everyone for a fantastic PyCon 2022!

*** This is a Security Bloggers Network syndicated blog from Trail of Bits Blog authored by Trail of Bits. Read the original post at: https://blog.trailofbits.com/2022/06/09/themes-from-pycon-us-2022/