
unit-bpe is a BPE (byte-pair encoding) tokenizer that operates on integer sequences. It is implemented in Rust, and its Python bindings are built with PyO3 and Maturin.
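The core idea of BPE on integer sequences can be sketched in a few lines of pure Python. The loop counts adjacent pairs, merges the most frequent pair into a new ID, and repeats until the vocabulary reaches vocab_size. This is not the Rust code in core.rs. The helper names (count_pairs, merge_pair, fit_bpe) are hypothetical. The sketch assumes the base units are the contiguous IDs 0..n-1 and breaks ties by first occurrence, and either choice may differ from unit-bpe.

```python
from collections import Counter


def count_pairs(seqs):
    """Count adjacent (left, right) pairs across all sequences."""
    counts = Counter()
    for seq in seqs:
        counts.update(zip(seq, seq[1:]))
    return counts


def merge_pair(seq, pair, new_id):
    """Replace non-overlapping occurrences of `pair`, scanning left to right, with `new_id`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def fit_bpe(seqs, vocab_size):
    """Greedy BPE training (hypothetical helper, not unit-bpe's API)."""
    seqs = [list(s) for s in seqs]
    # Assumption: base units are 0..n-1, so the first new ID is max unit + 1.
    next_id = max((max(s) for s in seqs if s), default=-1) + 1
    merges = []
    while next_id < vocab_size:
        counts = count_pairs(seqs)
        if not counts:
            break
        pair, _ = counts.most_common(1)[0]  # ties: first pair encountered wins
        seqs = [merge_pair(s, pair, next_id) for s in seqs]
        merges.append((pair, next_id))
        next_id += 1
    return seqs, merges


units_list = [[0, 1, 0, 1, 2, 0, 1, 2, 3], [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5]]
encoded, merges = fit_bpe(units_list, vocab_size=10)
print(encoded)  # [[6, 7, 8], [9, 9, 5]]
print(merges)   # [((0, 1), 6), ((6, 2), 7), ((7, 3), 8), ((8, 4), 9)]
```

On this data the sketch yields the same encoded sequences and the same set of merges as fit_concurrent_py. It lists the merges in the order they were learned.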
pip install unit-bpe
from unit_bpe import fit_concurrent_py, encode_concurrent_py, decode_concurrent_py
units_list = [
[0, 1, 0, 1, 2, 0, 1, 2, 3],
[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5]
]
vocab_size = 10
# The training data contains 6 distinct units (0 to 5), so 10 - 6 = 4 merge operations are performed
encoded_units, merges = fit_concurrent_py(units_list, vocab_size)
print(encoded_units) # [[6, 7, 8], [9, 9, 5]]
print(merges) # [((0, 1), 6), ((8, 4), 9), ((7, 3), 8), ((6, 2), 7)]
units_list_to_encode = [[0, 1, 0, 1, 2, 3, 4, 5], [0, 1, 2, 0, 1, 2, 3]]
encoded = encode_concurrent_py(units_list_to_encode, merges)
print(encoded) # [[6, 9, 5], [7, 8]]
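Encoding new data amounts to replaying the learned merges. The following is a minimal sketch that rests on two assumptions. First, each merge is applied in the order it was learned. Second, new IDs are assigned in increasing order, so sorting the merges list by new ID recovers that order; the merges printed by fit_concurrent_py are not in creation order. The encode and merge_pair helpers are hypothetical and not part of unit-bpe's API.

```python
def merge_pair(seq, pair, new_id):
    """Replace non-overlapping occurrences of `pair`, scanning left to right, with `new_id`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def encode(seq, merges):
    """Apply merges in learned order (assumed: ascending new ID)."""
    out = list(seq)
    for pair, new_id in sorted(merges, key=lambda m: m[1]):
        out = merge_pair(out, pair, new_id)
    return out


# Merge list as printed in the usage example above.
merges = [((0, 1), 6), ((8, 4), 9), ((7, 3), 8), ((6, 2), 7)]
print([encode(s, merges) for s in [[0, 1, 0, 1, 2, 3, 4, 5], [0, 1, 2, 0, 1, 2, 3]]])
# [[6, 9, 5], [7, 8]]
```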
decoded = decode_concurrent_py(encoded, merges)
print(decoded) # [[0, 1, 0, 1, 2, 3, 4, 5], [0, 1, 2, 0, 1, 2, 3]]
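Decoding only needs the inverse of the merge table: each merged ID expands into its pair until only base units remain. The sketch below uses a hypothetical decode helper that is not unit-bpe's decode_concurrent_py. It keeps an explicit stack instead of recursing, so deeply nested merges cannot hit Python's recursion limit.

```python
def decode(seq, merges):
    """Expand merged IDs back into base units via the merge table."""
    table = {new_id: pair for pair, new_id in merges}
    out = []
    stack = list(reversed(seq))  # pop from the end = process left to right
    while stack:
        tok = stack.pop()
        if tok in table:
            left, right = table[tok]
            stack.append(right)  # push right first so left is expanded first
            stack.append(left)
        else:
            out.append(tok)  # base unit
    return out


merges = [((0, 1), 6), ((8, 4), 9), ((7, 3), 8), ((6, 2), 7)]
print([decode(s, merges) for s in [[6, 9, 5], [7, 8]]])
# [[0, 1, 0, 1, 2, 3, 4, 5], [0, 1, 2, 0, 1, 2, 3]]
```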
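Development requires a Rust environment and a Python environment.
Run uv sync to install dependencies.
Run the tests:
Rust
cargo test --lib
Python
uv run pytest
Build the extension module and install it into the current virtual environment:
maturin develop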
unit-bpe/
├── src
│   ├── lib.rs               # Rust library entry point
│   ├── core.rs              # Core logic of BPE
│   ├── concurrent.rs        # Extension of core.rs for concurrent processing
│   ├── python_bindings.rs   # Bindings to expose Rust functions to Python
│   └── test.rs              # Rust unit tests
├── tests
│   └── test_unit_bpe.py     # Python unit tests
├── .gitignore
├── Cargo.toml               # Rust dependency definitions
├── Cargo.lock               # Rust dependency lock file
├── README.md
├── pyproject.toml           # Python dependency definitions
└── uv.lock                  # Python dependency lock file
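concurrent.rs extends the core BPE logic for concurrent processing, but the README does not say which parts run in parallel. One common approach, shown here purely as an assumption, treats the most expensive step (counting adjacent pairs) as a map-reduce. Each worker counts pairs in its own sequence, and the partial counts are summed. The function names are hypothetical. In CPython, threads give no speedup for this pure-Python counting because of the GIL, so the sketch shows only the structure that a Rust implementation could spread across cores.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_pairs_one(seq):
    """Map step: count adjacent (left, right) pairs within one sequence."""
    return Counter(zip(seq, seq[1:]))


def count_pairs_parallel(seqs, max_workers=None):
    """Count pairs per sequence on a worker pool, then reduce by summing counters."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(count_pairs_one, seqs):
            total.update(partial)  # reduce step: add partial counts
    return total


units_list = [[0, 1, 0, 1, 2, 0, 1, 2, 3], [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5]]
counts = count_pairs_parallel(units_list)
print(counts.most_common(3))  # [((0, 1), 5), ((1, 2), 4), ((2, 3), 3)]
```

Because addition of counts is associative and commutative, the result does not depend on how sequences are split across workers.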
We found that unit-bpe demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.