
Product
Introducing Webhook Events for Alert Changes
Add real-time Socket webhook events to your workflows to automatically receive software supply chain alert changes in real time.
identify
Advanced tools
File identification library for Python.
Given a file (or some information about a file), return a set of standardized tags identifying what the file is.
pip install identify
If you have an actual file on disk, you can get the most information possible (a superset of all other methods):
>>> from identify import identify
>>> identify.tags_from_path('/path/to/file.py')
{'file', 'text', 'python', 'non-executable'}
>>> identify.tags_from_path('/path/to/file-with-shebang')
{'file', 'text', 'shell', 'bash', 'executable'}
>>> identify.tags_from_path('/bin/bash')
{'file', 'binary', 'executable'}
>>> identify.tags_from_path('/path/to/directory')
{'directory'}
>>> identify.tags_from_path('/path/to/symlink')
{'symlink'}
When using a file on disk, the checks performed are:
>>> identify.tags_from_filename('file.py')
{'text', 'python'}
>>> identify.tags_from_interpreter('python3.5')
{'python', 'python3'}
>>> identify.tags_from_interpreter('bash')
{'shell', 'bash'}
>>> identify.tags_from_interpreter('some-unrecognized-thing')
set()
$ identify-cli --help
usage: identify-cli [-h] [--filename-only] path
positional arguments:
path
optional arguments:
-h, --help show this help message and exit
--filename-only
$ identify-cli setup.py; echo $?
["file", "non-executable", "python", "text"]
0
$ identify-cli setup.py --filename-only; echo $?
["python", "text"]
0
$ identify-cli wat.wat; echo $?
wat.wat does not exist.
1
$ identify-cli wat.wat --filename-only; echo $?
1
identify also has an api for determining what type of license is contained
in a file. This routine is roughly based on the approaches used by
licensee (the ruby gem that github uses to figure out the license for a
repo).
The approach that identify uses is as follows:
To use the api, install via pip install identify[license]
>>> from identify import identify
>>> identify.license_id('LICENSE')
'MIT'
The return value of the license_id function is an SPDX id. Currently
licenses are sourced from choosealicense.com.
A call to tags_from_path does this:
By design, this means we don't need to partially read files where we recognize the file extension.
FAQs
File identification library for Python
We found that identify demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 2 open source maintainers collaborating on the project.
Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Product
Add real-time Socket webhook events to your workflows to automatically receive software supply chain alert changes in real time.

Security News
ENISA has become a CVE Program Root, giving the EU a central authority for coordinating vulnerability reporting, disclosure, and cross-border response.

Product
Socket now scans OpenVSX extensions, giving teams early detection of risky behaviors, hidden capabilities, and supply chain threats in developer tools.