
py-tlsh
TLSH (Trend Micro Locality Sensitive Hash) is a fuzzy matching library. Given a byte stream with a minimum length of 50 bytes, TLSH generates a hash value that can be used for similarity comparisons. Similar objects have similar hash values, so similar objects can be detected by comparing their hash values. Note that the byte stream must have sufficient complexity. For example, a byte stream of identical bytes will not generate a hash value.
This Python module supersedes the python-tlsh package on PyPI.
The improvements in 4.7.2 were:
The improvements in 4.5.0 were:
import tlsh
tlsh.hash(data)
Note that data must be bytes, not a string. This is because TLSH is designed for binary data, and binary data can contain a NULL (zero) byte.
In default mode, the data must contain at least 50 bytes and have a certain amount of randomness for a hash value to be generated. To get the hash value of a file, try
with open(file, 'rb') as f:
    h = tlsh.hash(f.read())
Note: the file is opened in binary mode ('rb'), and the with statement closes it once it has been read.
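Because TLSH works on raw bytes, it can help to reject text input early and to read files with pathlib, which always uses binary mode for read_bytes(). The following is a minimal sketch. The helpers require_bytes and read_file_bytes are hypothetical and not part of py-tlsh. Their result is what you would then pass to tlsh.hash.

```python
from pathlib import Path


def require_bytes(data):
    """Hypothetical guard: TLSH hashes raw bytes, so reject str and other types."""
    if isinstance(data, str):
        raise TypeError("TLSH input must be bytes; encode text first, e.g. s.encode('utf-8')")
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError(f"expected bytes, got {type(data).__name__}")
    return bytes(data)


def read_file_bytes(path):
    """Hypothetical helper: read a whole file in binary mode (NUL bytes are kept)."""
    # Path.read_bytes() opens the file in 'rb' mode and closes it afterwards.
    return require_bytes(Path(path).read_bytes())
```

Usage would then look like h = tlsh.hash(read_file_bytes("example.bin")), where the filename is a placeholder.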
import tlsh
h1 = tlsh.hash(data)
h2 = tlsh.hash(similar_data)
score = tlsh.diff(h1, h2)
h3 = tlsh.Tlsh()
with open('file', 'rb') as f:
    # feed the file to the hash object in 512-byte chunks
    for buf in iter(lambda: f.read(512), b''):
        h3.update(buf)
h3.final()
# this assertion is stating that the distance between a TLSH and itself must be zero
assert h3.diff(h3) == 0
score = h3.diff(h1)
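The streaming loop relies on Python's two-argument iter(callable, sentinel). It calls f.read(512) repeatedly and stops when the call returns the empty bytes object b''. The following sketch isolates that idiom so it can be checked without a real file. It assumes io.BytesIO as a stand-in for an open binary file. The helper name iter_chunks is hypothetical, and the chunk size of 512 just mirrors the example.

```python
import io


def iter_chunks(f, size=512):
    """Return an iterator over successive chunks of at most `size` bytes from f."""
    # iter() keeps calling the lambda until it returns the sentinel b'' (end of file).
    return iter(lambda: f.read(size), b"")


data = bytes(range(256)) * 5          # 1280 bytes of sample input
chunks = list(iter_chunks(io.BytesIO(data)))
print([len(c) for c in chunks])       # [512, 512, 256]
```

Joining the chunks gives back the original bytes. Each h3.update(buf) call therefore sees the same byte stream, in order, that a single read would have produced.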
The diffxlen function removes the file length component of the tlsh header from the comparison.
tlsh.diffxlen(h1, h2)
If a file containing a repeating pattern is compared with a file containing only a single instance of that pattern, including the file length in the comparison increases the difference. The diffxlen function removes the file length from consideration.
If you use the "conservative" option, the data must contain at least 256 bytes. For example,
import os
tlsh.conservativehash(os.urandom(256))
should generate a hash, but
tlsh.conservativehash(os.urandom(100))
will generate TNULL as it is less than 256 bytes.
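The two modes differ in their minimum input length: 50 bytes by default and 256 bytes in conservative mode. A cheap length check can therefore tell you in advance whether TNULL is certain. The following is a minimal sketch. The function long_enough and its constants are hypothetical names taken from the thresholds stated above. It checks only length, because the complexity requirement is decided inside TLSH itself.

```python
import os

DEFAULT_MIN_LEN = 50        # default-mode minimum, per the text above
CONSERVATIVE_MIN_LEN = 256  # conservative-mode minimum, per the text above


def long_enough(data: bytes, conservative: bool = False) -> bool:
    """Return True if data meets the minimum length for the chosen TLSH mode.

    Length is necessary but not sufficient: bytes(300) passes this check,
    yet a run of identical bytes still does not produce a hash value.
    """
    minimum = CONSERVATIVE_MIN_LEN if conservative else DEFAULT_MIN_LEN
    return len(data) >= minimum


sample = os.urandom(100)
print(long_enough(sample))                     # True
print(long_enough(sample, conservative=True))  # False
```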
If you need to generate old-style hashes (without the "T1" prefix), use
tlsh.oldhash(os.urandom(100))
The old and conservative options may be combined:
tlsh.oldconservativehash(os.urandom(500))
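To see the header fields, including the file-length component that diffxlen leaves out, you can split a hash string by position. The following sketch makes three assumptions. First, it assumes the default TLSH build (128 buckets, 1-byte checksum), whose digest is 70 hex characters after the optional "T1" prefix. Second, it assumes these fields appear in order: checksum, length, quartile ratios, then a 64-character body. Third, it assumes an old-style hash is the same digest without the prefix. The function split_tlsh and the TlshFields type are hypothetical, not part of py-tlsh. The header fields are left as raw hex and are not decoded.

```python
from typing import NamedTuple, Optional


class TlshFields(NamedTuple):
    version: str   # "T1" for new-style hashes, "" for old-style ones
    checksum: str  # 2 hex chars: 1-byte checksum
    length: str    # 2 hex chars: encoded data-length byte (the part diffxlen ignores)
    qratios: str   # 2 hex chars: two quartile ratios packed into one byte
    body: str      # 64 hex chars: 2 bits for each of 128 buckets


HEX_DIGITS = set("0123456789ABCDEFabcdef")


def split_tlsh(h: str) -> Optional[TlshFields]:
    """Split a default-layout TLSH digest into fields; return None for TNULL."""
    if h == "TNULL":
        return None
    version = ""
    if h.startswith("T1"):
        version, h = "T1", h[2:]
    if len(h) != 70 or not set(h) <= HEX_DIGITS:
        raise ValueError("not a default-layout TLSH digest")
    return TlshFields(version, h[0:2], h[2:4], h[4:6], h[6:])
```

For example, split_tlsh(tlsh.hash(data)).length would show the raw length byte. The other 34 bytes are what remain in the comparison when diffxlen is used.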
TLSH (C++ Python extension)