fastervit

FasterViT: Fast Vision Transformers with Hierarchical Attention

  • 0.9.8
  • PyPI

FasterViT achieves a new SOTA Pareto front in terms of accuracy vs. image throughput, without extra training data!

Note: Please use the latest NVIDIA TensorRT release to enjoy the benefits of optimized FasterViT ops.

Quick Start

Pre-trained FasterViT models can be imported with a single line of code. First, install FasterViT:

pip install fastervit

A pretrained FasterViT model with default hyper-parameters can be created as follows:

>>> from fastervit import create_model

>>> # Define FasterViT-0 model with 224 x 224 resolution
>>> model = create_model('faster_vit_0_224',
...                      pretrained=True,
...                      model_path="/tmp/faster_vit_0.pth.tar")

model_path sets the local path to which the pretrained checkpoint is downloaded.

We can also simply test the model by passing a dummy input image. The output is the logits:

>>> import torch

>>> image = torch.rand(1, 3, 224, 224)
>>> output = model(image) # torch.Size([1, 1000])
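
Since the model returns raw logits, a common next step is to turn them into class probabilities and pick the most likely classes. The following is a minimal, dependency-free sketch of that step; the helper name top_k_probabilities and the toy logits are illustrative assumptions, not part of the fastervit package. With real model output you would more likely apply torch.softmax and torch.topk to the tensor directly.

```python
import math


def top_k_probabilities(logits, k=5):
    """Return the k most likely (class_index, probability) pairs.

    `logits` is a plain list of floats standing in for one row of model output.
    """
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Toy logits for a 4-class problem (illustrative values only).
example_logits = [0.5, 2.0, -1.0, 1.0]
top2 = top_k_probabilities(example_logits, k=2)
print(top2)
```

For a real prediction, one row of the logits can be converted with output[0].tolist() before calling the helper.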

We can also use the any-resolution FasterViT model to accommodate arbitrary image resolutions. In the following, we define an any-resolution FasterViT-0 model with an input resolution of 576 x 960, window sizes of 12 and 6 in the 3rd and 4th stages, a carrier token size of 2 and an embedding dimension of 64:

>>> from fastervit import create_model

>>> # Define any-resolution FasterViT-0 model with 576 x 960 resolution
>>> model = create_model('faster_vit_0_any_res',
...                      resolution=[576, 960],
...                      window_size=[7, 7, 12, 6],
...                      ct_size=2,
...                      dim=64,
...                      pretrained=True)

Note that the above model is initialized from the original ImageNet pre-trained FasterViT with an original resolution of 224 x 224. As a result, missing keys and mismatches can be expected, since we are adding new layers (e.g. additional carrier tokens).

We can simply test the model by passing a dummy input image. The output is the logits:

>>> import torch

>>> image = torch.rand(1, 3, 576, 960)
>>> output = model(image) # torch.Size([1, 1000])
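
The window sizes of 12 and 6 used for the 576 x 960 input can be sanity-checked with a little arithmetic. The sketch below assumes that the four stages operate on feature maps downsampled by 4, 8, 16 and 32 relative to the input; that layout is an assumption about the architecture, not something stated here. The helper name check_windows is hypothetical. Only stages 3 and 4 are checked, since those are the stages whose window sizes are discussed above.

```python
# Assumed cumulative downsampling factor of each of the four stages
# (a common hierarchical layout; not stated in this README).
ASSUMED_STAGE_STRIDES = [4, 8, 16, 32]


def check_windows(resolution, window_sizes,
                  strides=ASSUMED_STAGE_STRIDES, stages=(3, 4)):
    """Report, per stage, the feature-map size and whether the window size
    divides both of its dimensions evenly."""
    height, width = resolution
    report = {}
    for stage in stages:
        stride = strides[stage - 1]
        window = window_sizes[stage - 1]
        feat_h, feat_w = height // stride, width // stride
        report[stage] = {
            "feature_map": (feat_h, feat_w),
            "window": window,
            "fits": feat_h % window == 0 and feat_w % window == 0,
        }
    return report


report = check_windows([576, 960], [7, 7, 12, 6])
for stage, info in report.items():
    print(f"stage {stage}: feature map {info['feature_map']}, "
          f"window {info['window']}, fits evenly: {info['fits']}")
```

Under these assumptions both windows tile their feature maps exactly. Whether the library pads feature maps that are not evenly divisible is not covered by this sketch.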

Results + Pretrained Models

ImageNet-1K

FasterViT ImageNet-1K Pretrained Models

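| Name | Acc@1 (%) | Acc@5 (%) | Throughput (Img/Sec) | Resolution | #Params (M) | FLOPs (G) | Download |
|---|---|---|---|---|---|---|---|
| FasterViT-0 | 82.1 | 95.9 | 5802 | 224x224 | 31.4 | 3.3 | model |
| FasterViT-1 | 83.2 | 96.5 | 4188 | 224x224 | 53.4 | 5.3 | model |
| FasterViT-2 | 84.2 | 96.8 | 3161 | 224x224 | 75.9 | 8.7 | model |
| FasterViT-3 | 84.9 | 97.2 | 1780 | 224x224 | 159.5 | 18.2 | model |
| FasterViT-4 | 85.4 | 97.3 | 849 | 224x224 | 424.6 | 36.6 | model |
| FasterViT-5 | 85.6 | 97.4 | 449 | 224x224 | 975.5 | 113.0 | model |
| FasterViT-6 | 85.8 | 97.4 | 352 | 224x224 | 1360.0 | 142.0 | model |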
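
To make the accuracy vs. throughput trade-off in this table concrete, the sketch below computes which of the listed models are Pareto-optimal, meaning no other listed model is at least as good on both metrics and strictly better on one. The numbers are copied from the table; the helper name pareto_front is an illustrative assumption. The check only compares FasterViT variants with each other and says nothing about other architectures.

```python
# (name, top-1 accuracy in %, throughput in img/sec) from the ImageNet-1K table.
MODELS = [
    ("FasterViT-0", 82.1, 5802),
    ("FasterViT-1", 83.2, 4188),
    ("FasterViT-2", 84.2, 3161),
    ("FasterViT-3", 84.9, 1780),
    ("FasterViT-4", 85.4, 849),
    ("FasterViT-5", 85.6, 449),
    ("FasterViT-6", 85.8, 352),
]


def pareto_front(models):
    """Return the names of models that no other model dominates."""
    front = []
    for name, acc, thr in models:
        # A model is dominated if another one is at least as accurate and
        # at least as fast, and strictly better on one of the two metrics.
        dominated = any(
            o_acc >= acc and o_thr >= thr and (o_acc > acc or o_thr > thr)
            for _, o_acc, o_thr in models
        )
        if not dominated:
            front.append(name)
    return front


print(pareto_front(MODELS))
```

Because accuracy rises monotonically as throughput falls across the seven variants, each of them lies on the front of this small set.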

ImageNet-21K

FasterViT ImageNet-21K Pretrained Models (ImageNet-1K Fine-tuned)

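| Name | Acc@1 (%) | Acc@5 (%) | Resolution | #Params (M) | FLOPs (G) | Download |
|---|---|---|---|---|---|---|
| FasterViT-4-21K-224 | 86.6 | 97.8 | 224x224 | 271.9 | 40.8 | model |
| FasterViT-4-21K-384 | 87.6 | 98.3 | 384x384 | 271.9 | 120.1 | model |
| FasterViT-4-21K-512 | 87.8 | 98.4 | 512x512 | 271.9 | 213.5 | model |
| FasterViT-4-21K-768 | 87.9 | 98.5 | 768x768 | 271.9 | 480.4 | model |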
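
The FLOPs reported for the four resolutions grow roughly in proportion to the number of input pixels. The sketch below checks this by scaling the 224 x 224 figure by the pixel-count ratio and comparing against the reported values. The helper name predicted_flops is hypothetical, and the linear-in-pixels rule is only an observation about these four rows, not a documented property of the model.

```python
# Input side length in pixels -> reported FLOPs (G) for FasterViT-4-21K,
# copied from the table above.
FLOPS_BY_SIDE = {224: 40.8, 384: 120.1, 512: 213.5, 768: 480.4}


def predicted_flops(side, base_side=224, table=FLOPS_BY_SIDE):
    """Scale the base-resolution FLOPs by the ratio of pixel counts."""
    return table[base_side] * (side / base_side) ** 2


for side, reported in FLOPS_BY_SIDE.items():
    estimate = predicted_flops(side)
    print(f"{side}x{side}: reported {reported} G, "
          f"pixel-scaled estimate {estimate:.1f} G")
```

The estimates land within about one percent of the reported values, which gives a quick way to budget compute for an intermediate resolution.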

Robustness (ImageNet-A - ImageNet-R - ImageNet-V2)

All models use crop_pct=0.875. Results are obtained by running inference on ImageNet-1K pretrained models without finetuning.

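| Name | A-Acc@1 (%) | A-Acc@5 (%) | R-Acc@1 (%) | R-Acc@5 (%) | V2-Acc@1 (%) | V2-Acc@5 (%) |
|---|---|---|---|---|---|---|
| FasterViT-0 | 23.9 | 57.6 | 45.9 | 60.4 | 70.9 | 90.0 |
| FasterViT-1 | 31.2 | 63.3 | 47.5 | 61.9 | 72.6 | 91.0 |
| FasterViT-2 | 38.2 | 68.9 | 49.6 | 63.4 | 73.7 | 91.6 |
| FasterViT-3 | 44.2 | 73.0 | 51.9 | 65.6 | 75.0 | 92.2 |
| FasterViT-4 | 49.0 | 75.4 | 56.0 | 69.6 | 75.7 | 92.7 |
| FasterViT-5 | 52.7 | 77.6 | 56.9 | 70.0 | 76.0 | 93.0 |
| FasterViT-6 | 53.7 | 78.4 | 57.1 | 70.1 | 76.1 | 93.0 |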

A, R and V2 denote ImageNet-A, ImageNet-R and ImageNet-V2 respectively.

Citation

Please consider citing FasterViT if this repository is useful for your work.

@article{hatamizadeh2023fastervit,
  title={FasterViT: Fast Vision Transformers with Hierarchical Attention},
  author={Hatamizadeh, Ali and Heinrich, Greg and Yin, Hongxu and Tao, Andrew and Alvarez, Jose M and Kautz, Jan and Molchanov, Pavlo},
  journal={arXiv preprint arXiv:2306.06189},
  year={2023}
}

Licenses

Copyright © 2023, NVIDIA Corporation. All rights reserved.

This work is made available under the NVIDIA Source Code License-NC. Click here to view a copy of this license.

For license information regarding the timm repository, please refer to its repository.

For license information regarding the ImageNet dataset, please see the ImageNet official website.

Acknowledgement

This repository is built on top of the timm repository. We thank Ross Wightman for creating and maintaining this high-quality library.
