A sneak peek at TorchVision v0.11

The last couple of weeks were super busy in “PyTorch Land” as we are frantically preparing the release of PyTorch v1.10 and TorchVision v0.11. In this 2nd instalment of the series, I’ll cover some of the upcoming features that are currently included in the release branch of TorchVision.
Disclaimer: Though the upcoming release is packed with numerous enhancements and bug/test/documentation improvements, here I’m highlighting new “user-facing” features in domains I’m personally interested in. After writing the blog post, I also noticed a bias towards features I reviewed, wrote, or whose development I followed closely. Covering (or not covering) a feature says nothing about its importance. Opinions expressed are solely my own.
New Models
The new release is packed with new models:
Kai Zhang has added an implementation of the RegNet architecture along with pre-trained weights for 14 variants, which closely reproduce the results of the original paper.
I’ve recently added an implementation of the EfficientNet architecture along with pre-trained weights for variants B0-B7 provided by Luke Melas-Kyriazi and Ross Wightman.
New Data Augmentations
A few new Data Augmentation techniques have been added to the latest version:
Samuel Gabriel has contributed TrivialAugment, a new simple but highly effective strategy that seems to provide superior results to AutoAugment.
I’ve added the RandAugment method to the auto-augmentation transforms.
I’ve provided an implementation of Mixup and CutMix transforms in references. These will be moved to transforms in the next release once their API is finalized.
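To make the Mixup idea mentioned above concrete, here is a minimal sketch in plain Python. It is not the implementation that lives in the TorchVision references; the function names `mixup_pair` and `sample_lambda` are hypothetical, and samples are represented as flat lists of floats with one-hot labels. The mixing weight is drawn from a Beta(alpha, alpha) distribution, as described in the Mixup paper.

```python
import random


def sample_lambda(alpha, rng):
    """Draw the mixing weight from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


def mixup_pair(x_a, y_a, x_b, y_b, lam):
    """Blend two samples and their one-hot labels with weight lam in [0, 1]."""
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return mixed_x, mixed_y


# Illustrative usage with made-up data: two 3-value "images", two classes.
rng = random.Random(0)
lam = sample_lambda(0.2, rng)
mixed_x, mixed_y = mixup_pair(
    [0.0, 1.0, 2.0], [1.0, 0.0],
    [4.0, 3.0, 2.0], [0.0, 1.0],
    lam,
)
```

The mixed label stays a valid probability distribution because both inputs sum to one. CutMix follows the same label-mixing rule but, instead of blending every pixel, pastes a rectangular patch of one image onto the other and sets the weight from the patch area.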
New Operators and Layers
A number of new operators and layers have been included:
Victor Fomin has contributed the backward implementations of bilinear and bicubic interpolation with the anti-alias option for CPUs and GPUs.
Kai Zhang and I have refactored common building blocks of models and written re-usable implementations for the Squeeze-Excitation and Conv-Norm-Activation layers.
I’ve updated our references to support Label Smoothing, which was recently introduced by Joel Schlosser and Thomas J. Fan on PyTorch core.
I’ve included the option to perform Learning Rate Warmup, using the latest LR schedulers developed by Ilqar Ramazanli.
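For readers unfamiliar with the last two items, the sketch below shows the arithmetic behind Label Smoothing and a linear Learning Rate Warmup in plain Python. It is an illustration under stated assumptions, not the PyTorch or TorchVision code: `smooth_labels` and `warmup_factor` are hypothetical helper names, smoothing spreads the mass `epsilon` uniformly over all classes, and the warmup ramps a multiplicative factor linearly from `start_factor` to 1.0.

```python
def smooth_labels(target_index, num_classes, epsilon):
    """Return a smoothed target: (1 - epsilon) * one_hot + epsilon / num_classes."""
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if i == target_index else uniform
        for i in range(num_classes)
    ]


def warmup_factor(step, warmup_steps, start_factor):
    """Linear ramp from start_factor to 1.0 over warmup_steps, then constant 1.0."""
    if step >= warmup_steps:
        return 1.0
    return start_factor + (1.0 - start_factor) * step / warmup_steps


# Illustrative usage: 4 classes, true class 1, and a 5-step warmup.
targets = smooth_labels(target_index=1, num_classes=4, epsilon=0.1)
base_lr = 0.5
schedule = [base_lr * warmup_factor(s, warmup_steps=5, start_factor=0.1) for s in range(8)]
```

In this sketch the learning rate is the base rate multiplied by the factor, so the schedule starts at a fraction of `base_lr` and reaches it after `warmup_steps` steps. A real training script would typically chain such a warmup with the main decay schedule.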
Other improvements
Here are some other notable improvements added in the release:
Alexander Soare and Francisco Massa have developed an FX-based utility which allows extracting arbitrary intermediate features from model architectures.
Nikita Shulga has added support for CUDA 11.3 to TorchVision.
Zhongkai Zhu has fixed the dependency issues of JPEG lib (this issue has caused major headaches to many of our users).
In-progress & Next-up
There are lots of exciting new features under development which didn’t make it into this release. Here are a few:
Moto Hira, Parmeet Singh Bhatia and I have drafted an RFC, which proposes a new mechanism for Model Versioning and for handling metadata associated with pre-trained weights. This will enable us to support multiple pre-trained weights for each model and attach associated information such as labels, preprocessing transforms, etc. to the models.
I’m currently working on using the primitives added by the “Batteries Included” project in order to improve the accuracy of our pre-trained models. The target is to achieve best-in-class results for the most popular pre-trained models provided by TorchVision.
Philip Meier and Francisco Massa are working on an exciting prototype for TorchVision’s new Dataset and Transforms API.
Prabhat Roy is working on extending PyTorch Core’s AveragedModel class to support the averaging of the buffers in addition to parameters. The lack of this feature is commonly reported as a bug, and adding it will enable numerous downstream libraries and frameworks to remove their custom EMA implementations.
Aditya Oke wrote a utility which allows plotting the results of Keypoint models on the original images (the feature didn’t make it into the release as we got swamped and couldn’t review it in time 🙁).
I’m building a prototype FX-utility which aims to detect Residual Connections in arbitrary Model architectures and modify the network to add regularization blocks (such as StochasticDepth).
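The parameters-versus-buffers distinction in the AveragedModel item above is easy to miss, so here is a toy sketch of an exponential moving average (EMA) applied to both. It uses plain dictionaries of floats rather than real modules, and `ema_update` is a hypothetical helper, not part of PyTorch. Parameters are the learned weights, while buffers hold non-learned state such as BatchNorm running statistics.

```python
def ema_update(averaged, current, decay):
    """In-place EMA: averaged = decay * averaged + (1 - decay) * current."""
    for name, value in current.items():
        averaged[name] = decay * averaged[name] + (1.0 - decay) * value


# Made-up state for a tiny "model": one parameter and one buffer.
model_params = {"conv.weight": 1.0}
model_buffers = {"bn.running_mean": 4.0}

# The averaged copy starts from zeros here purely for illustration.
ema_params = {"conv.weight": 0.0}
ema_buffers = {"bn.running_mean": 0.0}

# Averaging only the parameters would leave the buffers stale;
# averaging both keeps the two kinds of state consistent.
ema_update(ema_params, model_params, decay=0.5)
ema_update(ema_buffers, model_buffers, decay=0.5)
```

If only the first call were made, the averaged model would pair smoothed weights with normalization statistics that never saw the same smoothing, which is the mismatch that custom EMA implementations tend to work around.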
Finally, there are a few new features in our backlog (PRs coming soon):
Nicolas Hug is working to add the RAFT model for Optical Flow.
I hope you found the above summary interesting. Any ideas on how to adapt the format of the blog series are very welcome. Hit me up on LinkedIn or Twitter.

Facts Only

PyTorch v1.10 and TorchVision v0.11 are in preparation for release.
Kai Zhang added the RegNet architecture with 14 pre-trained variants.
EfficientNet architecture (B0-B7 variants) was added with pre-trained weights.
Samuel Gabriel contributed TrivialAugment, a new data augmentation technique.
RandAugment, Mixup, and CutMix transforms were introduced.
Victor Fomin contributed backward implementations for bilinear and bicubic interpolation.
Squeeze-Excitation and Conv-Norm-Activation layers were refactored for reusability.
Label Smoothing and Learning Rate Warmup options were added.
Alexander Soare and Francisco Massa developed an FX-based utility for extracting intermediate model features.
CUDA 11.3 support was added to TorchVision.
Zhongkai Zhu fixed dependency issues with the JPEG library.
An RFC for model versioning and metadata handling is in draft.
Prototypes for improving pre-trained model accuracy and a new Dataset/Transforms API are in development.
Prabhat Roy is extending PyTorch Core’s AveragedModel class to support buffer averaging.
Nicolas Hug is working on adding the RAFT model for Optical Flow.

Executive Summary

The PyTorch and TorchVision teams are preparing for the release of PyTorch v1.10 and TorchVision v0.11, introducing several new features and improvements. Notable additions include new model architectures like RegNet and EfficientNet, with pre-trained weights for multiple variants. Data augmentation techniques such as TrivialAugment, RandAugment, Mixup, and CutMix have been incorporated to enhance model training. New operators and layers, including backward implementations for interpolation and reusable building blocks like Squeeze-Excitation, have been added. The release also includes support for CUDA 11.3, fixes for dependency issues, and utilities for extracting intermediate features from models. Ongoing projects include model versioning, accuracy improvements for pre-trained models, and a new Dataset and Transforms API. The team acknowledges that some features, like a keypoint visualization utility, did not make the release due to time constraints. The update reflects a collaborative effort across multiple contributors, with a focus on expanding functionality and addressing user-reported issues.

Full Take

This update from the PyTorch and TorchVision teams highlights a robust, collaborative effort to expand functionality and address user needs. The strongest version of this narrative emphasizes transparency, community-driven development, and a commitment to improving machine learning tools. The inclusion of new models, data augmentation techniques, and utilities reflects a focus on both performance and usability. However, the disclaimer about personal bias in feature selection and the acknowledgment of unfinished work (e.g., the keypoint visualization utility) add a layer of humility and realism.
Patterns detected: none. The content avoids emotional exploitation, distortion, or bad faith tactics. It presents a straightforward account of technical progress, with clear attribution to contributors and an honest assessment of limitations. The root cause appears to be a genuine drive to advance open-source machine learning tools, with unstated assumptions about the value of pre-trained models and the importance of community collaboration. The implications for human agency are positive, as these tools democratize access to advanced AI capabilities, though the costs of adoption (e.g., learning curves, hardware requirements) are not addressed here.
Bridge questions: How might these updates impact smaller research teams or individual developers? What trade-offs exist between model complexity and practical usability? What perspectives from non-technical stakeholders (e.g., ethicists, policymakers) are missing from this technical narrative?
Counterstrike scan: If this were part of a coordinated influence campaign, the playbook might involve exaggerating the significance of incremental updates to create hype or downplaying limitations to drive adoption. However, the content does not match this pattern. It is transparent about biases, acknowledges unfinished work, and focuses on technical details rather than marketing spin. The tone remains collaborative and factual, aligning with healthy open-source development practices.

Sentinel — Human

Confidence

The article appears to be written by a human, as indicated by the erratic sentence length pattern and the presence of idiosyncratic emphasis and personal voice.

Signals Detected

sentence length variance is erratic, suggesting human authorship

presence of idiosyncratic emphasis and personal voice indicates human authorship

no evidence of argumentative skeleton matching or talking points appearing nearly verbatim across sources

Human Indicators

erratic sentence length pattern

idiosyncratic emphasis and personal voice