Launch Week Day 1: Socket for Jira Is Now Available.Learn More
Socket
Book a DemoSign in
Socket

rtx50-compat

Package Overview
Dependencies
Maintainers
1
Versions
6
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

rtx50-compat

RTX 50-series GPU compatibility layer for PyTorch and CUDA - enables sm_120 support

pipPyPI
Version
3.0.2
Maintainers
1

rtx50-compat: Enable RTX 50-series GPUs in PyTorch 🚀

PyPI Python License

Enable NVIDIA RTX 50-series GPU support (sm_120) in PyTorch and the entire Python AI ecosystem with a single import.

🎯 Why This Exists

The RTX 5090 features the new sm_120 compute capability, which isn't recognized by current PyTorch/CUDA libraries. This package provides a runtime patch that makes your RTX 5090 work seamlessly with existing AI frameworks.

🚀 Quick Start

Installation

# Recommended: use uv
uv pip install rtx50-compat

# Or with pip
pip install rtx50-compat

Basic Usage

import rtx50_compat  # Must be imported before PyTorch!
import torch

# Verify GPU is recognized
print(torch.cuda.get_device_name(0))  # NVIDIA GeForce RTX 5090
print(torch.cuda.is_available())      # True

# Now use PyTorch normally
model = torch.nn.Linear(1024, 1024).cuda()

📊 Realistic Benchmarks

Based on RTX 5090's 32GB GDDR7 VRAM and 70 TFLOPS compute:

Models that fit entirely in VRAM (fastest)

ModelRTX 5090i9-14900KSpeedup
Llama 3-8B180-250 tokens/s8-12 tokens/s~20x
Llama 3-13B120-180 tokens/s4-6 tokens/s~30x
Stable Diffusion XL40-60 img/min0.5 img/min~100x

Large models with partial offloading

ModelRTX 5090 (with offload)i9-14900KSpeedup
Llama 3-70B Q425-35 tokens/s1-3 tokens/s~15x
Mixtral 8x7B40-60 tokens/s2-4 tokens/s~20x

Note: 70B models require ~35GB for Q4 quantization, exceeding the RTX 5090's 32GB VRAM. Performance depends on offloading efficiency.

📖 Examples

Hello World - Verify Installation

import rtx50_compat
import torch

# Check if patch was applied
if torch.cuda.is_available():
    print(f"✅ GPU: {torch.cuda.get_device_name(0)}")
    print(f"✅ VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
    
    # Quick performance test
    x = torch.randn(10000, 10000, device='cuda')
    y = torch.matmul(x, x)
    print("✅ CUDA operations working!")
else:
    print("❌ CUDA not available")

Running Llama 3-8B (Fits in VRAM)

import rtx50_compat
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model - fits entirely in 32GB VRAM
model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="cuda"
)

# Generate at 180-250 tokens/s!
inputs = tokenizer("The future of AI is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_length=100, temperature=0.8)
print(tokenizer.decode(outputs[0]))
# First convert to GGUF format for efficient memory usage
# pip install llama-cpp-python

import rtx50_compat
from llama_cpp import Llama

# Load 70B model with automatic GPU/CPU splitting
llm = Llama(
    model_path="llama-3-70b-q4_k_m.gguf",
    n_gpu_layers=-1,  # Offload all layers that fit
    n_ctx=4096,
    verbose=False
)

# Generate at 25-35 tokens/s with partial offloading
response = llm("The meaning of life is", max_tokens=100)
print(response['choices'][0]['text'])

Stable Diffusion XL

import rtx50_compat
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16"
).to("cuda")

# Generate at 40-60 images per minute!
images = pipe(
    "A majestic mountain landscape at sunset, highly detailed, 8k",
    num_images_per_prompt=4,
    guidance_scale=7.5
).images

🔧 Technical Details

What the patch does:

  • Capability Masquerading: Makes sm_120 report as sm_90 (H100) for compatibility
  • CUDA Compilation: Adds sm_120 flags when compiling CUDA extensions
  • Memory Management: Optimizes for consumer GPU memory patterns
  • Library Fixes: Patches flash-attention, xformers, and other CUDA libraries

How it works:

# The patch intercepts CUDA capability queries
original_get_device_capability = torch.cuda.get_device_capability

def patched_get_device_capability(device=None):
    major, minor = original_get_device_capability(device)
    if major == 12 and minor == 0:  # sm_120 (RTX 50-series)
        return (9, 0)  # Masquerade as sm_90 (H100)
    return (major, minor)

🦇 Batman Mode

For subtle operations:

export RTX50_BATMAN_MODE=1
python your_script.py

Output:

🦇 I am Batman - at your local jujitsu establishment
RTX 5090 successfully disguised as H100
You didn't see anything... 🌙

📂 Repository Structure

rtx50-compat/
├── rtx50_compat.py      # Main compatibility layer
├── patches/             # PyTorch & vLLM patches (for reference)
│   ├── pytorch_rtx5090.patch
│   ├── vllm_rtx5090.patch
│   └── README.md        # Patch application guide
├── benchmarks/          # Benchmark scripts
│   ├── benchmark_8b.py  # Llama 3-8B benchmark
│   ├── benchmark_70b.py # Llama 3-70B with offloading
│   └── benchmark_sd.py  # Stable Diffusion benchmark
├── examples/            # Usage examples
│   ├── hello_world.py
│   ├── comfyui_integration.py
│   └── llama_cpp_example.py
├── tests/               # Unit tests
│   └── test_compatibility.py
├── LICENSE
├── README.md
└── setup.py

🐛 Troubleshooting

"No kernel image available" Error

Ensure rtx50_compat is imported before any other CUDA/PyTorch imports:

import rtx50_compat  # MUST be first
import torch  # Now this works

Memory Errors with Large Models

For models exceeding 32GB VRAM, use quantization and offloading:

# Use 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    device_map="auto"  # Automatic CPU/GPU splitting
)

Performance Lower Than Expected

  • Check if model fits entirely in VRAM: nvidia-smi
  • Use appropriate batch sizes (larger = more efficient)
  • Enable flash attention if available
  • Consider using specialized inference engines (vLLM, TGI, llama.cpp)

🤖 AI Assistant Integration

Using with Claude CLI

# Install the package and verify it works
claude "I have an RTX 5090. Help me set up rtx50-compat and run a Llama 3-8B model for maximum performance"

# Optimize for 70B models with offloading
claude "Show me how to run Llama 3-70B on my RTX 5090 using llama.cpp with optimal settings"

# Debug performance issues
claude "My RTX 5090 is only getting 10 tokens/s on Llama 3-13B. Help me diagnose and fix this"

# Integration with existing projects
claude "Add rtx50-compat support to my ComfyUI installation at ~/ComfyUI"

Using with Gemini CLI

# Setup and verification
gemini -p "I have an RTX 5090 with 32GB VRAM. Guide me through installing rtx50-compat and running a benchmark"

# Model recommendations
gemini -p "What's the largest LLM I can run entirely in VRAM on my RTX 5090? Include quantization options"

# Performance optimization
gemini -p "Analyze my RTX 5090 setup and suggest optimizations for running Mixtral 8x7B at maximum speed"

# Troubleshooting
gemini -p "Getting 'no kernel image' error with RTX 5090 in PyTorch. Show me how to fix with rtx50-compat"

Prompt Templates for Complex Tasks

Full Stack Setup

Help me set up a complete local AI workstation with RTX 5090:
1. Install rtx50-compat
2. Configure vLLM for serving
3. Set up Stable Diffusion XL
4. Create benchmarks for both text and image generation

Production Deployment

I need to deploy a Llama 3-70B model on RTX 5090 for production use:
- Optimize for throughput (multiple users)
- Set up proper memory management
- Configure monitoring and logging
- Handle model switching between 8B/13B/70B based on load

🤝 Contributing

PRs welcome! Areas needing help:

  • RTX 5080/5070 Ti testing
  • Additional framework patches (JAX, MXNet)
  • Performance optimizations
  • Documentation improvements

Upstream Integration

We're working on getting these patches merged upstream:

  • PyTorch: [PR #pending]
  • vLLM: [PR #pending]

📄 License

MIT License - see LICENSE

🙏 Acknowledgments

  • NVIDIA for the incredible RTX 5090 hardware
  • PyTorch team for the amazing framework
  • The local LLM community for inspiration and testing

Note: This is a community compatibility layer. Once PyTorch officially supports sm_120, this package will become obsolete. Until then, enjoy running large models locally at impressive speeds! 🚀

FAQs

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts