PyTorch is an open source deep learning framework originally created by Meta (aka Facebook) in 2016, which allows developers and data scientists to use the Python programming language to create Machine Learning (ML) projects based on:
- Computer Vision, such as self-driving cars, medical diagnosis, defect inspection, etc.
- Natural Language Processing, such as translation, text generation, virtual assistants, etc.
- Predictive Analytics, such as recommendation engines, fraud detection, automated stock trading, etc.
PyTorch competes directly with Google’s TensorFlow, which is considered to be more scalable and production-ready than PyTorch, but has a much steeper learning curve. For more information, see Neural Network Showdown: TensorFlow vs PyTorch.
While PyTorch is widely used in academia, it has struggled to compete with TensorFlow in the business world. As a result, Meta is handing the promotion and marketing of PyTorch over to the Linux Foundation’s newly formed PyTorch Foundation, which, as a neutral organization, may have more success in promoting the commercialization of the framework. Meta/Facebook has come under fire numerous times, not least for its use of ML algorithms that have been found to promote inappropriate posts, censor (or fail to censor) content, and encourage addictive social media use.
With the foundation stepping in, decisions about the direction of PyTorch should be far more transparent, since they will be made in the open by all stakeholders, which should eliminate much of the conjecture.
Hopefully, another positive side effect of the change will be faster progress on the PyTorch feature backlog, since the move brings industry heavyweights like AMD, Google and AWS onto the governing board, all of which were already contributors. Otherwise, little has changed:
- Meta will continue investing in PyTorch and keeping it as the primary focus of its AI research and commercial applications.
- Nvidia will continue to lead the GPU-related aspects of PyTorch.
- Microsoft will continue maintaining PyTorch integration with ONNX, the open standard for machine learning interoperability.
Why Use PyTorch for Machine Learning?
PyTorch is one of the top five fastest-growing open source software projects in the world, and has become synonymous with deep learning. With more than 2,400 contributors and 18,000 projects across academic and commercial organizations, it’s often considered second only to TensorFlow for creating and working with neural networks.
PyTorch offers data scientists a key innovation that speeds up prototyping: dynamic computational graphs that are defined on the fly. Instead of static computational graphs that must be defined prior to runtime (such as those used by TensorFlow 1.x), PyTorch builds its graph anew on every forward pass, allowing data scientists to change and iterate on models much faster.
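To make the idea of a graph defined on the fly more concrete, here is a minimal sketch. The function name `double_until_large` and its `threshold` parameter are hypothetical and chosen only for illustration. The loop runs a different number of times depending on the input values, and `backward()` follows whichever graph was recorded during that particular call.

```python
import torch


def double_until_large(x: torch.Tensor, threshold: float = 10.0):
    """Double x until its norm reaches threshold; return the result and loop count."""
    steps = 0
    # Ordinary Python control flow: the number of iterations depends on the data,
    # so the recorded graph can differ from one call to the next.
    while x.norm() < threshold:
        x = x * 2
        steps += 1
    return x, steps


small = torch.tensor([1.0, 1.0], requires_grad=True)
large = torch.tensor([3.0, 5.0], requires_grad=True)

out_small, steps_small = double_until_large(small)  # loops 3 times
out_large, steps_large = double_until_large(large)  # loops once

# backward() walks whichever graph was recorded during that particular call.
out_small.sum().backward()
out_large.sum().backward()

print(steps_small, small.grad)
print(steps_large, large.grad)
```

Because the first input is doubled three times and the second only once, the gradients differ (a factor of 8 versus a factor of 2) even though both calls ran the same Python function.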
PyTorch key features and benefits include:
- TorchScript, which simplifies the move to production by taking PyTorch modules as input and converting them to a production-friendly format optimized for performance and portability (i.e., one that does not require a Python runtime).
- Dynamic Graph Computation, which allows for faster prototyping, training and experimentation since you can change network behavior on the fly, rather than waiting for the entire code to be executed.
- torch.autograd, PyTorch’s automatic differentiation engine, which automates gradient computation in reverse mode and, as an upcoming feature, in forward mode as well.
- Extensibility primarily via Python, but also via a C++ front end interface option.
- Easy Adoption based on its user-friendly interface, (relatively) simple learning curve, and extensive documentation and community support.
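As a rough illustration of the TorchScript item in the list above, the sketch below scripts a small module, saves it to an in-memory buffer, and loads it back. The `TinyClassifier` class and its layer sizes are made up for this example. In a real deployment the serialized module would typically be written to a file and loaded from a non-Python runtime, which is not shown here.

```python
import io

import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical toy model used only to demonstrate scripting."""

    def __init__(self, in_features: int = 4, classes: int = 3):
        super().__init__()
        self.linear = nn.Linear(in_features, classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Data-dependent branch: torch.jit.script compiles the control flow itself.
        if x.sum() > 0:
            return self.linear(x)
        return self.linear(-x)


torch.manual_seed(0)
model = TinyClassifier().eval()

# Compile the module to TorchScript and serialize it to a byte buffer.
scripted = torch.jit.script(model)
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)

# Load the serialized module back and compare it with the original.
buffer.seek(0)
restored = torch.jit.load(buffer)

example = torch.randn(2, 4)
with torch.no_grad():
    same = torch.allclose(model(example), restored(example))

print("outputs match:", same)
```

Scripting is used here rather than tracing because the `forward` method contains a branch that depends on the input values.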
PyTorch underlies such high-profile projects as:
- Tesla’s Autopilot, which is at the heart of their self-driving car initiative.
- Uber’s Pyro, which is a universal Probabilistic Programming Language (PPL) for building general-purpose probabilistic models that can also scale to large data sets with little overhead.
How to Secure PyTorch
PyTorch, like most Python packages, can be installed from PyPI. However, the Python Package Index (PyPI) offers no guarantees as to the security and integrity of the prebuilt packages it provides.
To counter this risk, security-conscious organizations will often build Python packages like PyTorch from source code. Unfortunately, most Python package build systems either:
- Create one-off builds, meaning the codebase is never updated, which results in buggy, vulnerable applications over time, or
- Generate high operational overhead due to the costs of implementing and maintaining multiple build systems, one for each OS your developers and deployment systems require (e.g., Windows, Mac and Linux).
While building all Python dependencies from source code is better than implicitly trusting PyPI, there’s still no guarantee you won’t become the next SolarWinds without the proper controls in place.
As a solution to these issues, ActiveState recently introduced the industry’s only artifact repository with a secure build service that supports the security and integrity controls defined in the highest level of the Supply Chain Levels for Software Artifacts (SLSA) framework. You can use it to create your own private repository that can then be populated with the Python dependencies your teams require. Each dependency is automatically built from source code, including any linked C libraries, and can then be seamlessly distributed to all your developers, systems and processes. The result is a closed-loop environment that maximizes supply chain security.
For example, I have:
- Created a PyTorch project on the ActiveState Platform.
- Automatically built PyTorch v1.12.0 for Windows, Mac and Linux.
- Made my own custom artifact repository publicly available for anyone to use, which includes a securely built version of torch and all its dependencies.
All of which means that anyone can now install PyTorch by simply running the following command:
pip install --index-url https://har.activestate.com/Pizza-Team/PyTorch/pytorch torch
In effect, the ActiveState Artifact Repository acts as a secure version of PyPI, ensuring that all binary Python packages such as those used in data science projects are secure from sandbox to production. And when vulnerabilities are discovered, you’ll be notified and can automatically update, rebuild and repopulate the artifact repository with the secured artifact(s).
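After installing from a private index, it can be useful to confirm that the environment actually contains the version you expected. The following is a minimal sketch using only the Python standard library. The helper names `installed_version` and `matches_pin` are hypothetical, and the check assumes you pin an exact version such as the 1.12.0 build mentioned above. It only compares version strings and does not verify where a package came from.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed: Optional[str], pinned: str) -> bool:
    """True if the installed version equals the pin, ignoring a local suffix like '+cpu'."""
    if installed is None:
        return False
    return installed == pinned or installed.startswith(pinned + "+")


if __name__ == "__main__":
    found = installed_version("torch")
    print("torch version:", found)
    print("matches pin 1.12.0:", matches_pin(found, "1.12.0"))
```

A check like this could run at the start of a training job or in a CI step so that an unexpected version is caught early.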