
Keras Vision Models (KVMM) is a collection of vision models with pretrained weights, built entirely with Keras 3. It supports a range of tasks, including segmentation, object detection, vision-language models (VLMs), and classification. KVMM includes custom layers and backbone support, providing flexibility and efficiency across various vision applications. For backbones, there are various weight variants, such as in1k, in21k, fb_dist_in1k, ms_in22k, fb_in22k_ft_in1k, ns_jft_in1k, aa_in1k, cvnets_in1k, augreg_in21k_ft_in1k, augreg_in21k, and many more.
From PyPI (recommended)
pip install -U kvmm
From Source
pip install -U git+https://github.com/IMvision12/keras-vision-models
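KVMM is built on Keras 3, so it runs on whichever backend Keras is configured to use (TensorFlow, JAX, or PyTorch). As a minimal sketch, assuming you want a non-default backend, select it before importing Keras or kvmm:
import os
# Select the Keras 3 backend before any keras/kvmm import
# (one of "tensorflow", "jax", or "torch")
os.environ["KERAS_BACKEND"] = "jax"
import kvmm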
kvmm.list_models() shows all available models, including backbones, segmentation models, object detection models, and vision-language models (VLMs). It also includes the names of the weights available for each specific model variant.
import kvmm
print(kvmm.list_models())
# Output:
"""
CaiTM36 : fb_dist_in1k_384
CaiTM48 : fb_dist_in1k_448
CaiTS24 : fb_dist_in1k_224, fb_dist_in1k_384
...
ConvMixer1024D20 : in1k
ConvMixer1536D20 : in1k
...
ConvNeXtAtto : d2_in1k
ConvNeXtBase : fb_in1k, fb_in22k, fb_in22k_ft_in1k, fb_in22k_ft_in1k_384
...
"""
import kvmm
print(kvmm.list_models("swin"))
# Output:
"""
SwinBaseP4W12 : ms_in1k, ms_in22k, ms_in22k_ft_in1k
SwinBaseP4W7 : ms_in1k, ms_in22k, ms_in22k_ft_in1k
SwinLargeP4W12 : ms_in22k, ms_in22k_ft_in1k
SwinLargeP4W7 : ms_in22k, ms_in22k_ft_in1k
SwinSmallP4W7 : ms_in1k, ms_in22k, ms_in22k_ft_in1k
SwinTinyP4W7 : ms_in1k, ms_in22k
"""
import kvmm
import numpy as np

# KVMM ships custom layers such as StochasticDepth and WindowPartition.
# Example 1: StochasticDepth randomly drops its input branch during training
input_tensor = np.random.rand(1, 197, 192).astype("float32")
layer = kvmm.layers.StochasticDepth(drop_path_rate=0.1)
output = layer(input_tensor, training=True)

# Example 2: WindowPartition splits a feature map into fixed-size windows
features = np.random.rand(1, 28 * 28, 96).astype("float32")  # assumed (batch, height*width, channels) layout
window_partition = kvmm.layers.WindowPartition(window_size=7)
windowed_features = window_partition(features, height=28, width=28)
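For context, StochasticDepth is typically applied to the residual branch of a block, so the entire branch is randomly skipped during training. A minimal sketch of that pattern (the Dense layer is an illustrative stand-in for a real block, not part of kvmm):
import keras
import kvmm

inputs = keras.Input(shape=(197, 192))
x = keras.layers.Dense(192, activation="gelu")(inputs)  # stand-in for a transformer block
x = kvmm.layers.StochasticDepth(drop_path_rate=0.1)(x)
outputs = keras.layers.Add()([inputs, x])  # residual connection; the branch may be dropped
model = keras.Model(inputs, outputs)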
import kvmm
import numpy as np
# default configuration
model = kvmm.models.vit.ViTTiny16()
# For Fine-Tuning (default weight)
model = kvmm.models.vit.ViTTiny16(include_top=False, input_shape=(224,224,3))
# Custom Weight
model = kvmm.models.vit.ViTTiny16(include_top=False, input_shape=(224,224,3), weights="augreg_in21k_224")
# Backbone Support
model = kvmm.models.vit.ViTTiny16(include_top=False, as_backbone=True, input_shape=(224,224,3), weights="augreg_in21k_224")
random_input = np.random.rand(1, 224, 224, 3).astype(np.float32)
features = model(random_input)
print(f"Number of feature maps: {len(features)}")
for i, feature in enumerate(features):
    print(f"Feature {i} shape: {feature.shape}")
"""
Output:
Number of feature maps: 13
Feature 0 shape: (1, 197, 192)
Feature 1 shape: (1, 197, 192)
Feature 2 shape: (1, 197, 192)
...
"""
from keras import ops
from keras.applications.imagenet_utils import decode_predictions
import kvmm
import numpy as np
from PIL import Image

model = kvmm.models.swin.SwinTinyP4W7(input_shape=[224, 224, 3])
image = Image.open("bird.png").resize((224, 224))
x = ops.convert_to_tensor(np.array(image, dtype="float32"))
x = ops.expand_dims(x, axis=0)

# Predict
preds = model.predict(x)
print("Predicted:", decode_predictions(preds, top=3)[0])

# Output:
# Predicted: [('n01537544', 'indigo_bunting', np.float32(0.9135666)), ('n01806143', 'peacock', np.float32(0.0003379386)), ('n02017213', 'European_gallinule', np.float32(0.00027174334))]
import kvmm

# Pre-trained weights: "cityscapes", "ade20k", or "mit" (in1k)
# ade20k and cityscapes weights can be used for fine-tuning by passing a custom `num_classes`
# If `num_classes` is not specified, it defaults to 150 for ade20k and to 19 for cityscapes
model = kvmm.models.segformer.SegFormerB0(weights="ade20k", input_shape=(512,512,3))
model = kvmm.models.segformer.SegFormerB0(weights="cityscapes", input_shape=(512,512,3))

# Fine-tune using the `MiT` backbone (this loads `in1k` weights)
model = kvmm.models.segformer.SegFormerB0(weights="mit", input_shape=(512,512,3))
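As noted in the comments above, the pretrained segmentation weights can be adapted to a new dataset by passing a custom num_classes. A minimal sketch, assuming a 10-class dataset with integer masks (the optimizer and loss are illustrative, and the loss assumes the model outputs raw logits):
import keras
import kvmm

# Reuse ade20k weights but rebuild the head for 10 classes
model = kvmm.models.segformer.SegFormerB0(weights="ade20k", num_classes=10, input_shape=(512, 512, 3))
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),  # assumed logits output
)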
import kvmm

# With no backbone weights
backbone = kvmm.models.resnet.ResNet50(as_backbone=True, weights=None, include_top=False, input_shape=(224,224,3))
segformer = kvmm.models.segformer.SegFormerB0(weights=None, backbone=backbone, num_classes=10, input_shape=(224,224,3))

# With backbone weights
backbone = kvmm.models.resnet.ResNet50(as_backbone=True, weights="tv_in1k", include_top=False, input_shape=(224,224,3))
segformer = kvmm.models.segformer.SegFormerB0(weights=None, backbone=backbone, num_classes=10, input_shape=(224,224,3))
import kvmm
from PIL import Image
import numpy as np
model = kvmm.models.segformer.SegFormerB0(weights="ade20k_512")
image = Image.open("ADE_train_00000586.jpg")
processed_img = kvmm.models.segformer.SegFormerImageProcessor(
    image=image,
    do_resize=True,
    size={"height": 512, "width": 512},
    do_rescale=True,
    do_normalize=True,
)
outs = model.predict(processed_img)
outs = np.argmax(outs[0], axis=-1)
visualize_segmentation(outs, image)
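Note that visualize_segmentation is not part of kvmm; it is a user-supplied helper. A minimal sketch of what it might look like, assuming matplotlib is available:
import matplotlib.pyplot as plt

def visualize_segmentation(mask, image):
    # Show the input image and the predicted class map side by side
    fig, axes = plt.subplots(1, 2, figsize=(10, 5))
    axes[0].imshow(image)
    axes[0].set_title("Input")
    axes[1].imshow(mask, cmap="tab20")
    axes[1].set_title("Predicted classes")
    for ax in axes:
        ax.axis("off")
    plt.show()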
import keras
import kvmm
processor = kvmm.models.clip.CLIPProcessor()
model = kvmm.models.clip.ClipVitBase16(
    weights="res_224px",
    input_shape=(224, 224, 3),
)
inputs = processor(text=["mountains", "tortoise", "cat"], image_paths="cat1.jpg")
output = model(
    {
        "images": inputs["images"],
        "token_ids": inputs["input_ids"],
        "padding_mask": inputs["attention_mask"],
    }
)
print("Raw Model Output:")
print(output)
preds = keras.ops.softmax(output["image_logits"]).numpy().squeeze()
result = dict(zip(["mountains", "tortoise", "cat"], preds))
print("\nPrediction probabilities:")
print(result)
# Output:
"""
{'image_logits': <tf.Tensor: shape=(1, 3), dtype=float32, numpy=array([[11.042501, 10.388493, 18.414747]], dtype=float32)>,
 'text_logits': <tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[11.042501],
       [10.388493],
       [18.414747]], dtype=float32)>}

Prediction probabilities:
{'mountains': np.float32(0.0006278555), 'tortoise': np.float32(0.000326458), 'cat': np.float32(0.99904567)}
"""
Backbones
Segmentation
Model Name | Reference Paper | Source of Weights
---|---|---
SegFormer | SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers | transformers
Vision-Language-Models (VLMs)
Model Name | Reference Paper | Source of Weights
---|---|---
CLIP | Learning Transferable Visual Models From Natural Language Supervision | transformers
This project leverages timm and transformers for converting pretrained weights from PyTorch to Keras. For licensing details, please refer to the respective repositories.
@misc{gc2025kvmm,
  author = {Gitesh Chawda},
  title = {Keras Vision Models},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/IMvision12/keras-vision-models}}
}