TransNet V2: Shot Boundary Detection Neural Network (PyTorch)
This repository contains a PyTorch implementation of TransNet V2: An effective deep network architecture for fast shot transition detection.
The reimplementation produces the same results as the original TensorFlow version. The code is for inference only.
Performance
Our reevaluation of other publicly available state-of-the-art shot boundary methods (F1 scores):
Installation
pip install transnetv2-pytorch
Or install from source:
git clone https://github.com/allenday/transnetv2_pytorch.git
cd transnetv2_pytorch
pip install -e .
Usage
Command Line Interface
The package provides both a direct command and Python module execution:
transnetv2_pytorch path/to/video.mp4
python -m transnetv2_pytorch path/to/video.mp4
CLI Arguments
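# Basic usage with default settings
transnetv2_pytorch path/to/video.mp4
# Write predictions to a specific file
transnetv2_pytorch path/to/video.mp4 --output predictions.txt
# Select the compute device
transnetv2_pytorch path/to/video.mp4 --device cuda
# Use a custom detection threshold
transnetv2_pytorch path/to/video.mp4 --threshold 0.3
# Show all available options
transnetv2_pytorch --help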
Note: See the Device Support section for details on device selection and MPS considerations.
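For batch jobs it can be convenient to drive the CLI from a script. The following is a minimal sketch that builds the argument list from the flags shown above (--output, --device, --threshold) and runs it with subprocess. The helper names build_command and run_cli are hypothetical and are not part of the package.

```python
import subprocess


def build_command(video, output=None, device=None, threshold=None):
    """Build the argument list for the transnetv2_pytorch CLI.

    Only flags documented in this README are used. The function name and
    keyword arguments are illustrative, not part of the package.
    """
    cmd = ["transnetv2_pytorch", str(video)]
    if output is not None:
        cmd += ["--output", str(output)]
    if device is not None:
        cmd += ["--device", str(device)]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    return cmd


def run_cli(video, **options):
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(video, **options), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without a video.
    print(" ".join(build_command("path/to/video.mp4", device="cpu", threshold=0.3)))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.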
Python API
High-Level Methods (Recommended)
import torch
from transnetv2_pytorch import TransNetV2
# Create the model and load the pretrained weights
model = TransNetV2(device='auto')
model.eval()
state_dict = torch.load("transnetv2-pytorch-weights.pth", map_location=model.device)
model.load_state_dict(state_dict)
with torch.no_grad():
    # Detect scenes in a video file
    scenes = model.detect_scenes("video.mp4")
    print(f"Found {len(scenes)} scenes")
    for scene in scenes[:3]:
        print(f"Scene {scene['shot_id']}: {scene['start_time']}s - {scene['end_time']}s")
    # Convenience helpers for counts and timestamps
    scene_count = model.get_scene_count("video.mp4")
    timestamps = model.get_scene_timestamps("video.mp4")
    # Detect scenes with a custom threshold
    scenes = model.detect_scenes("video.mp4", threshold=0.3)
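The example above reads shot_id, start_time, and end_time from each scene. As a hedged sketch of post-processing, the code below turns such dictionaries into readable timecodes with durations. It assumes the time values are seconds that float() can convert, which the README does not state explicitly. The helper names and the sample data are hypothetical.

```python
def format_timecode(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(float(seconds) * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def summarize_scenes(scenes):
    """Return one readable line per scene, including its duration."""
    lines = []
    for scene in scenes:
        start = float(scene["start_time"])
        end = float(scene["end_time"])
        lines.append(
            f"Scene {scene['shot_id']}: {format_timecode(start)} -> "
            f"{format_timecode(end)} ({end - start:.2f}s)"
        )
    return lines


# Hypothetical sample data in the shape used by the example above.
sample_scenes = [
    {"shot_id": 1, "start_time": 0.0, "end_time": 4.2},
    {"shot_id": 2, "start_time": 4.2, "end_time": 75.5},
]
for line in summarize_scenes(sample_scenes):
    print(line)
```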
Mid-Level Methods (Advanced Users)
# Full analysis: returns a dictionary with metadata, raw predictions, and scenes
results = model.analyze_video("video.mp4")
print(f"Video FPS: {results['fps']}")
print(f"Total scenes: {results['total_scenes']}")
raw_predictions = results['single_frame_predictions']
scenes = results['scenes']
# Alternatively, get the frames and the raw predictions directly
video_frames, single_frame_pred, all_frame_pred = model.predict_video("video.mp4")
Low-Level Methods (Expert Users)
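# load_frames_somehow() is a placeholder: supply your own frame-loading code
frames = load_frames_somehow()
single_frame_pred, all_frame_pred = model.predict_raw(frames)
# Move the predictions to the CPU and convert them to a NumPy array
predictions = single_frame_pred.cpu().detach().numpy()
# Convert per-frame predictions to scenes, optionally with timing data
scenes = model.predictions_to_scenes(predictions, threshold=0.5)
scenes_with_data = model.predictions_to_scenes_with_data(predictions, fps=25.0, threshold=0.5)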
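To make the thresholding step concrete, the sketch below shows one way per-frame transition probabilities can be grouped into scenes. It is plain Python modeled on the general approach of the helper in the original TensorFlow repository. It is an assumption about the logic, not this package's actual predictions_to_scenes code, and the name predictions_to_scenes_sketch is hypothetical.

```python
def predictions_to_scenes_sketch(predictions, threshold=0.5):
    """Group per-frame transition probabilities into (start, end) frame pairs.

    A frame counts as a transition when its probability exceeds `threshold`.
    A scene is closed at the first transition frame after a run of
    non-transition frames, and a new scene starts at the first
    non-transition frame after a transition.
    """
    flags = [1 if p > threshold else 0 for p in predictions]
    scenes = []
    prev, start = 0, 0
    for i, flag in enumerate(flags):
        if prev == 1 and flag == 0:
            start = i  # first frame after a transition: a new scene begins
        if prev == 0 and flag == 1 and i != 0:
            scenes.append((start, i))  # transition begins: close the scene
        prev = flag
    if flags and flags[-1] == 0:
        scenes.append((start, len(flags) - 1))  # close the trailing scene
    if not scenes:
        # No usable boundary found: treat the whole input as one scene.
        return [(0, max(len(flags) - 1, 0))]
    return scenes


# Hypothetical probabilities for an 8-frame clip.
probabilities = [0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.9, 0.1]
print(predictions_to_scenes_sketch(probabilities, threshold=0.5))
print(predictions_to_scenes_sketch(probabilities, threshold=0.85))
```

Raising the threshold makes the frame at index 5 (probability 0.8) stop counting as a transition, so the second scene ends one frame later. This is why the threshold value changes where scene boundaries fall.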
API Consistency
The CLI tool uses the same methods as the programmatic API:
- CLI:
transnetv2_pytorch video.mp4 --threshold 0.5
- API:
model.detect_scenes("video.mp4", threshold=0.5)
Both produce identical results.
Device Support
This implementation supports multiple compute devices with intelligent auto-detection:
Supported Devices
- CPU: Works on all systems (consistent, reliable)
- CUDA: For NVIDIA GPUs (fastest, consistent)
- MPS: For Apple Silicon Macs (fast but with consistency limitations)
Device Auto-Detection
By default (--device auto), the model uses this priority order:
Priority: CUDA > CPU > MPS
transnetv2_pytorch video.mp4 --device auto
When MPS is available but auto-detection chooses CPU instead, you'll see:
ℹ️ MPS device detected but not used due to numerical inconsistency issues.
Use --device mps to explicitly enable MPS (faster but inconsistent results).
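The priority order can be sketched as follows. This is an illustration of the documented rule, not the package's own selection code. The names pick_device and probe_and_pick are hypothetical, and the note text is paraphrased rather than the exact message printed by the tool.

```python
def pick_device(cuda_available, mps_available):
    """Apply the documented priority CUDA > CPU > MPS.

    Returns (device_name, note). Because CPU is always available and ranks
    above MPS, automatic selection never resolves to MPS in this sketch.
    """
    if cuda_available:
        return "cuda", None
    note = None
    if mps_available:
        note = "MPS detected but not used; request device 'mps' to opt in."
    return "cpu", note


def probe_and_pick():
    """Probe PyTorch for available backends, then pick a device."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed, nothing but CPU can be assumed.
        return pick_device(False, False)
    cuda = torch.cuda.is_available()
    # torch.backends.mps is only present in newer PyTorch releases.
    mps_backend = getattr(torch.backends, "mps", None)
    mps = bool(mps_backend and mps_backend.is_available())
    return pick_device(cuda, mps)


if __name__ == "__main__":
    print(probe_and_pick())
```

Keeping the decision in a function that takes plain booleans makes the rule easy to test on machines without a GPU.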
MPS Device Considerations
⚠️ Important: MPS has numerical inconsistency issues with this neural network architecture.
- The Problem: Some 3D convolution operations fall back to CPU inconsistently, causing different scene detection results compared to pure CPU execution
- Impact: Same video produces different scene counts (e.g., MPS: 66 scenes, CPU: 108 scenes)
- Performance: MPS is roughly 3x faster than CPU, but its results are less accurate and do not match CPU output
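One way to check how much two runs disagree on your own footage is to compare their scene start times. The sketch below assumes you have collected the start times in seconds from a CPU run and an MPS run, for example from the start_time field of each scene. The function name compare_boundaries, the tolerance value, and the sample numbers are hypothetical.

```python
def compare_boundaries(reference, candidate, tolerance=0.5):
    """Match scene start times from two runs within `tolerance` seconds.

    Returns (matched, missing, extra):
      matched - reference boundaries that have a nearby candidate boundary
      missing - reference boundaries with no candidate nearby
      extra   - candidate boundaries not matched to any reference boundary
    Matching is greedy, which is a simplification that works when
    boundaries are spaced further apart than the tolerance.
    """
    remaining = sorted(candidate)
    matched, missing = [], []
    for ref in sorted(reference):
        # Find the closest still-unmatched candidate boundary.
        best = min(remaining, key=lambda c: abs(c - ref), default=None)
        if best is not None and abs(best - ref) <= tolerance:
            matched.append(ref)
            remaining.remove(best)
        else:
            missing.append(ref)
    return matched, missing, remaining


# Hypothetical scene start times (seconds) from two runs on the same video.
cpu_starts = [0.0, 4.2, 9.8, 15.0]
mps_starts = [0.0, 4.3, 15.1, 20.0]
matched, missing, extra = compare_boundaries(cpu_starts, mps_starts)
print(f"matched={matched} missing={missing} extra={extra}")
```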
Using MPS Explicitly
If you prioritize speed over consistency, you can explicitly request MPS:
transnetv2_pytorch video.mp4 --device mps
When explicitly using MPS, you'll see this warning:
⚠️ WARNING: MPS device has numerical inconsistency issues!
This neural network architecture has operations that fall back to CPU
inconsistently, causing different scene detection results vs. pure CPU.
Device Selection Examples
transnetv2_pytorch video.mp4 --device auto
transnetv2_pytorch video.mp4 --device cpu
transnetv2_pytorch video.mp4 --device mps
transnetv2_pytorch video.mp4 --device cuda
Python API Device Selection
from transnetv2_pytorch import TransNetV2
model = TransNetV2(device='auto')
model = TransNetV2(device='cpu')
model = TransNetV2(device='mps')
model = TransNetV2(device='cuda')
Recommendations
- For Production/Research: Use --device auto or --device cpu for consistent, reproducible results
- For Quick Prototyping: Use --device mps on Apple Silicon if speed is more important than precision
- For GPU Systems: Use --device cuda for optimal performance and consistency
Memory Optimization
TransNetV2 includes transparent memory optimizations that work automatically without affecting the detection algorithm:
Automatic Memory Management
The model automatically:
- Performs periodic memory cleanup to prevent accumulation
- Uses efficient tensor management during processing
- Applies device-specific memory optimizations (MPS, CUDA, CPU)
model = TransNetV2(device='auto')
Handling Memory Issues
The memory optimizations are built-in and transparent. For persistent memory issues with very large videos:
- Reduce video resolution before processing
- Split longer videos into shorter segments
- Close other memory-intensive applications
All optimizations preserve the original algorithm parameters and accuracy!
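If you split a long video into segments, the scene times reported for each segment are relative to that segment. The sketch below plans the segment windows and shifts scene times back onto the full timeline. It assumes scenes are dictionaries with start_time and end_time in seconds, as in the Python API example. The helper names are hypothetical, and cutting the video file itself is left to an external tool.

```python
def segment_windows(duration, segment_length):
    """Split [0, duration) into consecutive (start, end) windows in seconds."""
    duration = float(duration)
    segment_length = float(segment_length)
    if duration <= 0 or segment_length <= 0:
        raise ValueError("duration and segment_length must be positive")
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + segment_length, duration)
        windows.append((start, end))
        start = end
    return windows


def shift_scenes(scenes, offset):
    """Map scene times from a segment back onto the full video's timeline."""
    return [
        {
            **scene,
            "start_time": float(scene["start_time"]) + offset,
            "end_time": float(scene["end_time"]) + offset,
        }
        for scene in scenes
    ]


# A 25-second video processed in 10-second segments.
print(segment_windows(25, 10))
# A scene found 1.0s into the second segment starts at 11.0s overall.
print(shift_scenes([{"shot_id": 1, "start_time": 1.0, "end_time": 2.5}], 10.0))
```

Note that a shot spanning a cut point will be reported as two scenes, one per segment, and that shot_id values restart in each segment, so merged results may need renumbering.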
Original Work & Training
This PyTorch implementation is based on the original TensorFlow version. Please visit the original repository, soCzech/TransNetV2, for:
- Training code and datasets
- TensorFlow implementation
- Weight conversion utilities
- Research replication
Credits
Original Work
This PyTorch implementation is based on the original TensorFlow TransNet V2 by Tomáš Souček and Jakub Lokoč.
If you find this project useful, please cite the original work:
@article{soucek2020transnetv2,
    title={TransNet V2: An effective deep network architecture for fast shot transition detection},
    author={Sou{\v{c}}ek, Tom{\'a}{\v{s}} and Loko{\v{c}}, Jakub},
    year={2020},
    journal={arXiv preprint arXiv:2008.04838},
}
PyTorch Implementation
This production-ready PyTorch package was developed by Allen Day, with significant improvements including:
- Complete PyTorch reimplementation for inference
- Cross-platform device support (CPU, CUDA, MPS)
- Command-line interface
- Package distribution and installation
- Comprehensive testing and error handling
License
MIT License
Original work Copyright (c) 2020 Tomáš Souček, Jakub Lokoč
PyTorch implementation Copyright (c) 2025 Allen Day
See the original TransNetV2 repository for the original license.