integv is a file integrity verifier based on the format of the file. It can check the integrity of many types of files without any additional information such as Content-Length or a checksum. Its main goal is to detect file corruption (mostly truncation) caused by network glitches during download, but it can be used for other purposes as well.
pip install integv
Sometimes when you download media files using requests, a network glitch happens and the downloaded file is corrupted. If there's a Content-Length header, you can compare it to the downloaded file size. Most of the time, however, media files are served using HTTP chunked transfer encoding, and there's no Content-Length header, so you can't tell whether the downloaded file is good or not. That's where integv helps: just feed the downloaded file to integv and it verifies the file's integrity without any other information such as Content-Length. All integv needs is the type of the file.
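The Content-Length comparison described above can be sketched as a small helper. This is only an illustration: the function name length_matches is hypothetical, and it assumes you already have the response headers and the raw body bytes as they were transferred. When the header is missing, as with chunked transfer encoding, it returns None, which is the case where a format-based check is needed.

```python
from typing import Mapping, Optional


def length_matches(headers: Mapping[str, str], body: bytes) -> Optional[bool]:
    """Compare the body size with Content-Length (hypothetical helper).

    Returns True or False when the header is present and numeric,
    and None when the size cannot be checked this way.
    """
    value = None
    for name, header_value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "content-length":
            value = header_value
    if value is None:
        return None
    try:
        expected = int(value)
    except ValueError:
        return None
    return expected == len(body)


# A chunked response has no Content-Length, so nothing can be concluded.
chunked = length_matches({"Transfer-Encoding": "chunked"}, b"partial data")
```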
integv has many advantages.
integv is light: it is written in pure Python with zero dependencies, which makes it portable and easy to integrate into your project.
integv is fast: it does not try to decode the file and only checks the key points in it, so it is much faster than solutions that decode the file.
Here's a comparison of verifying a 70 MB mp4 file using integv and FFmpeg: integv takes about 60 microseconds, while FFmpeg takes about 11 seconds.
python3 -m timeit "import integv;integv.FileIntegrityVerifier().verify('../test.mp4')"
5000 loops, best of 5: 61.4 usec per loop
python3 -m timeit "import subprocess;subprocess.run('ffmpeg -v error -i ../test.mp4 -f null -', shell=True)"
1 loop, best of 5: 11.2 sec per loop
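To give a sense of why checking key points is so much cheaper than decoding, here is a minimal sketch of a structural check for MP4-style files. It is not integv's actual implementation, and the names make_box and mp4_boxes_ok are hypothetical. It relies only on the ISO base media file format layout, where each top-level box starts with a 4-byte big-endian size and a 4-byte type, and checks that the box sizes add up to the file length.

```python
import struct


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit size field (helper for the demo only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def mp4_boxes_ok(data: bytes) -> bool:
    """Walk the top-level boxes and check that they exactly fill the data."""
    total = len(data)
    pos = 0
    while pos < total:
        if total - pos < 8:
            return False  # not enough bytes left for a box header
        size, _box_type = struct.unpack_from(">I4s", data, pos)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if total - pos < 16:
                return False
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header_len = 16
        elif size == 0:
            # A size of 0 means "extends to end of file"; a truncated
            # file of this kind cannot be detected by this check.
            size = total - pos
        if size < header_len or pos + size > total:
            return False
        pos += size
    return total > 0


sample = make_box(b"ftyp", b"isom\x00\x00\x00\x00") + make_box(b"mdat", b"x" * 100)
truncated = sample[:-1]            # one byte missing at the end
substituted = bytearray(sample)
substituted[30] ^= 0xFF            # one payload byte changed inside mdat
```

A check like this only reads a handful of box headers regardless of file size, which is why it can finish quickly. It also shows the trade-off: the truncated sample is rejected, but the sample with a substituted payload byte still passes.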
import integv
# load a test mp4 file
file_path = "./test/sample/video/sample.mp4"
with open(file_path, "rb") as f:
    file = f.read()
# verify using the file and file_type
# file_type can be a simple filename extension like "mp4" or "jpg"
# or you can provide a full MIME type like "video/mp4" or "image/jpeg"
integv.verify(file, file_type="mp4") # True
# a corrupted file (in this case, shortened by one byte) will not pass the verification
integv.verify(file[:-1], file_type="mp4") # False
# the file input for the verifier can be bytes or a binary file like object
integv.verify(open(file_path, "rb"), file_type="mp4") # True
# it can also be a string representing a file path
# if the file path contains a proper filename extension, the file_type is not needed.
integv.verify(file_path) # True
video/mp4
video/x-matroska
video/webm
video/vnd.avi
video/x-flv
* Not f4v. f4v is essentially mp4 under a different name, so use the mp4 integrity verifier for f4v files.
image/jpeg
image/png
image/gif
image/webp
audio/x-wav
audio/ogg
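Since file_type may be given as a filename extension or as one of the MIME types listed above, a caller-side normalization step might look like the sketch below. The mapping and the function normalize_file_type are hypothetical and are not part of integv; the short names are the ones used elsewhere in this document, with jpg and f4v treated as aliases.

```python
import os
from typing import Optional

# MIME types from the list above, mapped to short type names (assumed naming).
MIME_TO_TYPE = {
    "video/mp4": "mp4",
    "video/x-matroska": "mkv",
    "video/webm": "webm",
    "video/vnd.avi": "avi",
    "video/x-flv": "flv",
    "image/jpeg": "jpeg",
    "image/png": "png",
    "image/gif": "gif",
    "image/webp": "webp",
    "audio/x-wav": "wav",
    "audio/ogg": "ogg",
}
ALIASES = {"jpg": "jpeg", "f4v": "mp4"}
KNOWN_TYPES = set(MIME_TO_TYPE.values())


def normalize_file_type(file_type: Optional[str] = None,
                        path: Optional[str] = None) -> Optional[str]:
    """Return a short type name, or None if the type is not recognized."""
    if file_type is None and path is not None:
        # Fall back to the filename extension when no type is given.
        file_type = os.path.splitext(path)[1].lstrip(".")
    if not file_type:
        return None
    key = file_type.strip().lower()
    key = MIME_TO_TYPE.get(key, key)
    key = ALIASES.get(key, key)
    return key if key in KNOWN_TYPES else None
```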
An integv verifier checks a file only against the format information embedded in it, such as the file size in the header, chunk sizes in chunk headers, and end-of-file markers. It does not try to decode the file, which keeps integv fast and simple. But it also means false negatives are possible: some corrupted files go undetected. As a baseline, every integv verifier must be extremely sensitive to shortened files, which are very common among files downloaded from the network. Some file types, such as png, contain internal checksums, which makes their verification less error-prone. Do not use integv for any kind of security verification, because a bad file that passes verification can easily be forged.
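As an illustration of the embedded format information mentioned above (chunk sizes, checksums, and an end-of-file marker), here is a minimal sketch of a PNG structure check. It is an assumption about what such a check could look like, not integv's actual code; make_chunk and png_chunks_ok are hypothetical names. It uses the standard PNG layout: an 8-byte signature followed by chunks of length, type, data, and a CRC-32 over type and data, ending with an IEND chunk.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build a PNG chunk: length, type, data, CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def png_chunks_ok(data: bytes) -> bool:
    """Check signature, chunk lengths, chunk CRCs, and the IEND marker."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        length, chunk_type = struct.unpack_from(">I4s", data, pos)
        end = pos + 12 + length
        if end > len(data):
            return False  # chunk claims more bytes than the file has
        (stored_crc,) = struct.unpack_from(">I", data, pos + 8 + length)
        if zlib.crc32(data[pos + 4:pos + 8 + length]) != stored_crc:
            return False  # chunk content was altered
        if chunk_type == b"IEND":
            return end == len(data)  # nothing may follow the end marker
        pos = end
    return False  # ran out of data before reaching IEND


# A tiny synthetic file with the right structure (1x1 grayscale image).
sample_png = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    + make_chunk(b"IEND", b"")
)
```

Because every chunk carries its own checksum, both truncation and substitution inside a chunk are caught here, which matches the observation that png is less error-prone than formats that only record sizes.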
SDE (small deletion at the end): A few bytes of data were deleted at the end of the file. The length of the file is reduced.
Original file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
Corrupted file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXY
LDE (large deletion at the end): A large chunk of data was deleted at the end of the file. The length of the file is reduced.
Original file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
Corrupted file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNO
SSE (small substitution at the end): A few bytes of data were substituted at the end of the file. The length of the file remains the same.
Original file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
Corrupted file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYA
LSE (large substitution at the end): A large chunk of data was substituted at the end of the file. The length of the file remains the same.
Original file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
Corrupted file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNAAAAAAAAAAAA
SDR (small deletion at a random position): A few bytes of data were deleted at a random position in the file. The length of the file is reduced.
Original file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
Corrupted file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLNOPQRSTUVWXYZ
                                                      ^
LDR (large deletion at a random position): A large chunk of data was deleted at a random position in the file. The length of the file is reduced.
Original file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
Corrupted file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLYZ
                                                      ^
SSR (small substitution at a random position): A few bytes of data were substituted at a random position in the file. The length of the file remains the same.
Original file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
Corrupted file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLANOPQRSTUVWXYZ
                                                      ^
LSR (large substitution at a random position): A large chunk of data was substituted at a random position in the file. The length of the file remains the same.
Original file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
Corrupted file: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLAAAAAAAAAAWXYZ
                                                      ^
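The eight kinds of corruption illustrated above can be reproduced with two small helpers, which may be handy for testing a verifier against your own sample files. This is a sketch: the helper names and the labels used as dictionary keys are illustrative, and the positions and counts are chosen to reproduce the alphabet examples above.

```python
def delete_bytes(data: bytes, start: int, count: int) -> bytes:
    """Remove `count` bytes starting at `start`; the result is shorter."""
    return data[:start] + data[start + count:]


def substitute_bytes(data: bytes, start: int, count: int,
                     filler: bytes = b"A") -> bytes:
    """Overwrite `count` bytes starting at `start`; the length is unchanged."""
    return data[:start] + filler * count + data[start + count:]


original = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ" * 2  # 52 bytes, as in the examples

samples = {
    # Corruption at the end of the file.
    "small deletion, end": delete_bytes(original, len(original) - 1, 1),
    "large deletion, end": delete_bytes(original, len(original) - 11, 11),
    "small substitution, end": substitute_bytes(original, len(original) - 1, 1),
    "large substitution, end": substitute_bytes(original, len(original) - 12, 12),
    # Corruption in the middle of the file (position 38, where the caret points).
    "small deletion, middle": delete_bytes(original, 38, 1),
    "large deletion, middle": delete_bytes(original, 38, 12),
    "small substitution, middle": substitute_bytes(original, 38, 1),
    "large substitution, middle": substitute_bytes(original, 38, 10),
}

for name, corrupted in samples.items():
    print(f"{name:28} {corrupted.decode()}")
```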
In my personal experience, the most common types of corruption that happen while downloading files with requests or similar tools are SDE and LDE.
File type | SDE | LDE | SSE | LSE | SDR | LDR | SSR | LSR |
---|---|---|---|---|---|---|---|---|
mp4 | :smiley: | :smiley: | :frowning: | :smiley: | :smiley: | :smiley: | :frowning: | :smiley: |
mkv | :smiley: | :smiley: | :frowning: | :smiley: | :smiley: | :smiley: | :frowning: | :smiley: |
webm | :smiley: | :smiley: | :frowning: | :smiley: | :smiley: | :smiley: | :frowning: | :smiley: |
avi | :smiley: | :smiley: | :frowning: | :frowning: | :smiley: | :smiley: | :frowning: | :frowning: |
flv | :smiley: | :smiley: | :smiley: | :smiley: | :smiley: | :smiley: | :frowning: | :smiley: |
jpeg | :smiley: | :smiley: | :smiley: | :smiley: | :frowning: | :frowning: | :frowning: | :frowning: |
png | :smiley: | :smiley: | :smiley: | :smiley: | :smiley: | :smiley: | :smiley: | :smiley: |
gif | :smiley: | :smiley: | :smiley: | :smiley: | :frowning: | :frowning: | :frowning: | :frowning: |
webp | :smiley: | :smiley: | :frowning: | :frowning: | :smiley: | :smiley: | :frowning: | :frowning: |
wav | :smiley: | :smiley: | :frowning: | :frowning: | :smiley: | :smiley: | :frowning: | :frowning: |
ogg | :smiley: | :smiley: | :frowning: | :smiley: | :smiley: | :smiley: | :frowning: | :smiley: |
ogg(slow) | :smiley: | :smiley: | :smiley: | :smiley: | :smiley: | :smiley: | :smiley: | :smiley: |
You can use a FileIntegrityVerifier object to verify your file just like integv.verify.
from integv import FileIntegrityVerifier
verifier = FileIntegrityVerifier()
verifier.verify("./test/sample/video/sample.mp4") # True
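In a download pipeline, a verifier object like the one above could be plugged in as a callback that decides whether to retry. The sketch below is one possible arrangement, not something integv provides: fetch_until_valid is a hypothetical helper, and both the download step and the verification step are passed in as plain callables so the logic stays independent of any particular HTTP library.

```python
from typing import Callable, Optional


def fetch_until_valid(fetch: Callable[[], bytes],
                      verify: Callable[[bytes], bool],
                      max_attempts: int = 3) -> Optional[bytes]:
    """Call `fetch` until `verify` accepts the result or attempts run out.

    Returns the first payload that passes verification, or None if every
    attempt produced data that failed the check.
    """
    for _ in range(max_attempts):
        data = fetch()
        if verify(data):
            return data
    return None


# Possible wiring with integv (assumes the verifier accepts bytes plus a
# file_type keyword, as integv.verify does in the examples above):
#
#   verifier = FileIntegrityVerifier()
#   data = fetch_until_valid(
#       fetch=lambda: requests.get(url).content,
#       verify=lambda body: verifier.verify(body, file_type="mp4"),
#   )
```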
There are specialized file integrity verifiers for different types of files. You can find them in integv.video, integv.image and integv.audio. They are used exactly like FileIntegrityVerifier, except that file_type is not needed.
from integv.video import MP4IntegrityVerifier
verifier = MP4IntegrityVerifier()
verifier.verify("./test/sample/video/sample.mp4") # True
slow argument in verifier initialization
A boolean argument slow can be provided in verifier initialization. It enables more sophisticated verification to eliminate false negatives, at the cost of more time. The default value of slow is False. For now, only one verifier, OGGIntegrityVerifier, has a slow method of verification.
from integv import FileIntegrityVerifier
verifier = FileIntegrityVerifier()
slow_verifier = FileIntegrityVerifier(slow=True)
file_path = "./test/sample/audio/sample.ogg"
verifier.verify(file_path) # True
slow_verifier.verify(file_path) # also True, but slower
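The document does not say what the slow mode does internally. One plausible form of a more thorough check for Ogg, shown here purely as an assumption, is to recompute the CRC stored in every page header. The sketch below follows the Ogg page layout (27-byte header, segment table, payload) and the Ogg CRC-32 variant (polynomial 0x04C11DB7, initial value 0, no bit reflection, no final XOR); all function names are hypothetical.

```python
import struct


def ogg_crc(data: bytes) -> int:
    """CRC-32 variant used by Ogg pages (bitwise, unoptimized)."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def make_page(payload: bytes, sequence: int) -> bytes:
    """Build a single-segment Ogg page (helper for the demo only)."""
    if len(payload) >= 255:
        raise ValueError("demo helper supports payloads under 255 bytes")
    header = (b"OggS" + b"\x00\x00"                      # magic, version, flags
              + struct.pack("<qIII", 0, 1, sequence, 0)  # granule, serial, seq, CRC=0
              + bytes([1, len(payload)]))                # segment count, lacing value
    page = header + payload
    return page[:22] + struct.pack("<I", ogg_crc(page)) + page[26:]


def ogg_pages_ok(data: bytes) -> bool:
    """Walk all pages, checking the sizes and the CRC of each one."""
    pos = 0
    while pos < len(data):
        header = data[pos:pos + 27]
        if len(header) < 27 or header[:4] != b"OggS":
            return False
        n_segments = header[26]
        table = data[pos + 27:pos + 27 + n_segments]
        if len(table) < n_segments:
            return False
        page_len = 27 + n_segments + sum(table)
        page = data[pos:pos + page_len]
        if len(page) < page_len:
            return False  # page is cut short
        (stored,) = struct.unpack_from("<I", page, 22)
        # The CRC is computed with the CRC field itself set to zero.
        if ogg_crc(page[:22] + b"\x00" * 4 + page[26:]) != stored:
            return False
        pos += page_len
    return pos > 0


pages = make_page(b"hello", 0) + make_page(b"world", 1)
```

Recomputing a checksum over every byte is far more work than reading a few headers, which would explain why such a mode is optional and slower.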