
identify
File identification library for Python.
Given a file (or some information about a file), return a set of standardized tags identifying what the file is.
pip install identify
If you have an actual file on disk, you can get the most information possible (a superset of all other methods):
>>> from identify import identify
>>> identify.tags_from_path('/path/to/file.py')
{'file', 'text', 'python', 'non-executable'}
>>> identify.tags_from_path('/path/to/file-with-shebang')
{'file', 'text', 'shell', 'bash', 'executable'}
>>> identify.tags_from_path('/bin/bash')
{'file', 'binary', 'executable'}
>>> identify.tags_from_path('/path/to/directory')
{'directory'}
>>> identify.tags_from_path('/path/to/symlink')
{'symlink'}
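When using a file on disk, the checks performed are:
- File type (file, symlink, directory, socket)
- Mode (is it executable?)
- File name (mostly based on extension)
- If executable, the shebang is read and the interpreter interpreted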
If you only have the filename:
>>> identify.tags_from_filename('file.py')
{'text', 'python'}
If you only have the interpreter:
>>> identify.tags_from_interpreter('python3.5')
{'python', 'python3'}
>>> identify.tags_from_interpreter('bash')
{'shell', 'bash'}
>>> identify.tags_from_interpreter('some-unrecognized-thing')
set()
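The 'python3.5' example suggests a lookup that falls back to shorter names when a version suffix is present. A minimal sketch of one way to get that behavior, assuming a hypothetical INTERPRETER_TAGS table and a hypothetical helper name (this is an illustration, not necessarily how identify implements it):

```python
import os

# Hypothetical lookup table for illustration; the real library ships its own data.
INTERPRETER_TAGS = {
    "bash": {"shell", "bash"},
    "python": {"python"},
    "python3": {"python", "python3"},
}


def sketch_tags_from_interpreter(interpreter: str) -> set:
    """Resolve an interpreter name to tags, dropping trailing '.N' parts until a match."""
    # Only the final path component matters: '/usr/bin/python3.5' -> 'python3.5'.
    name = os.path.basename(interpreter)
    while name:
        if name in INTERPRETER_TAGS:
            return set(INTERPRETER_TAGS[name])
        # 'python3.5' -> 'python3'; a name without a dot becomes '' and ends the loop.
        name, _, _ = name.rpartition(".")
    return set()
```

With this table, 'python3.5' and 'python3' resolve to the same tags, and an unknown name yields an empty set, mirroring the transcript above.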
As a CLI:
$ identify-cli --help
usage: identify-cli [-h] [--filename-only] path

positional arguments:
  path

optional arguments:
  -h, --help       show this help message and exit
  --filename-only
$ identify-cli setup.py; echo $?
["file", "non-executable", "python", "text"]
0
$ identify-cli setup.py --filename-only; echo $?
["python", "text"]
0
$ identify-cli wat.wat; echo $?
wat.wat does not exist.
1
$ identify-cli wat.wat --filename-only; echo $?
1
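The transcripts above show identify-cli printing its tags as a JSON list and signalling failure with exit status 1. A short sketch of consuming that from Python follows. The helper names are hypothetical, and treating a non-zero status as "no tags" is a choice made for this example, not something the CLI prescribes:

```python
import json
import subprocess


def tags_from_cli_result(stdout: str, returncode: int) -> set:
    """Turn identify-cli's stdout and exit status into a set of tags."""
    # A non-zero status means no tags were printed (missing file or unknown name).
    if returncode != 0:
        return set()
    return set(json.loads(stdout))


def run_identify_cli(path: str, filename_only: bool = False) -> set:
    """Invoke identify-cli (assumed to be on PATH) and parse what it prints."""
    cmd = ["identify-cli", path]
    if filename_only:
        cmd.append("--filename-only")
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return tags_from_cli_result(proc.stdout, proc.returncode)
```

Keeping the parsing separate from the subprocess call makes it possible to exercise the parsing logic without the CLI installed.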
identify also has an API for determining what type of license is contained
in a file. This routine is roughly based on the approaches used by
licensee (the Ruby gem that GitHub uses to figure out the license for a
repo).
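The approach that identify uses is as follows:
1. Strip the copyright line
2. Normalize all whitespace
3. Return any exact matches
4. Return the closest by edit distance (where edit distance < 5%)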
To use the API, install via pip install identify[license]:
>>> from identify import identify
>>> identify.license_id('LICENSE')
'MIT'
The return value of the license_id function is an SPDX ID. Currently,
licenses are sourced from choosealicense.com.
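To make text-based license matching concrete, here is a rough sketch: normalize the candidate text, look for an exact match against known license texts, and otherwise accept the closest text if it is similar enough. Everything in it is an assumption for illustration: the helper names, the two-entry KNOWN table (with single sentences standing in for full license texts), the 0.95 threshold, and the use of difflib's similarity ratio in place of a true edit distance. It is not identify's implementation.

```python
import difflib
import re
from typing import Optional

# Hypothetical stand-ins for full license texts, keyed by SPDX ID.
KNOWN = {
    "MIT": "Permission is hereby granted, free of charge, to any person obtaining a copy",
    "ISC": "Permission to use, copy, modify, and/or distribute this software for any purpose",
}


def normalize(text: str) -> str:
    """Drop lines that start with 'Copyright' and collapse runs of whitespace."""
    lines = [
        line for line in text.splitlines()
        if not line.strip().lower().startswith("copyright")
    ]
    return re.sub(r"\s+", " ", " ".join(lines)).strip()


def sketch_license_id(text: str, threshold: float = 0.95) -> Optional[str]:
    """Return the SPDX ID of the best-matching known text, or None if nothing is close."""
    candidate = normalize(text)
    best_id, best_ratio = None, 0.0
    for spdx_id, known_text in KNOWN.items():
        reference = normalize(known_text)
        if candidate == reference:
            return spdx_id  # exact match after normalization
        # Similarity ratio in [0, 1]; used here instead of an edit distance.
        ratio = difflib.SequenceMatcher(None, candidate, reference).ratio()
        if ratio > best_ratio:
            best_id, best_ratio = spdx_id, ratio
    return best_id if best_ratio >= threshold else None
```

Normalizing before comparing means that differences in the copyright holder, line wrapping, or indentation do not prevent a match.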
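A call to tags_from_path does this:
1. What is the type: file, symlink, directory? If it's not a file, stop here.
2. Is it executable? Add the appropriate tag.
3. Do we recognize the file extension? If so, add the appropriate tags and stop here. These tags would include binary/text.
4. Peek at the first X bytes of the file. Use these to determine whether it is binary or text, and add the appropriate tag.
5. If identified as text above, try to read and interpret the shebang, and add appropriate tags.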
By design, this means we don't need to partially read files where we recognize the file extension.
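The point about skipping the partial read can be illustrated with a small sketch. It assumes a hypothetical EXTENSION_TAGS table, a simple "no NUL byte in the first 1024 bytes" heuristic for text detection, and a simplified shebang step that records only the interpreter's name for executable files. All three are stand-ins chosen for this example, not identify's actual data or logic:

```python
import os

# Hypothetical extension table for illustration only.
EXTENSION_TAGS = {"py": {"text", "python"}, "png": {"binary", "image", "png"}}


def sketch_tags_from_path(path: str) -> set:
    """Tag a path by type, mode, extension, and (only if needed) a peek at its bytes."""
    if not os.path.lexists(path):
        raise ValueError(f"{path} does not exist.")
    # Check for a symlink first: isdir/isfile would follow the link.
    if os.path.islink(path):
        return {"symlink"}
    if os.path.isdir(path):
        return {"directory"}
    if not os.path.isfile(path):
        raise ValueError(f"{path} is not a regular file, directory, or symlink.")

    tags = {"file"}
    executable = os.access(path, os.X_OK)
    tags.add("executable" if executable else "non-executable")

    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext in EXTENSION_TAGS:
        # Recognized extension: it already says text vs. binary, so nothing is read.
        return tags | EXTENSION_TAGS[ext]

    # Unrecognized extension: peek at the start of the file.
    with open(path, "rb") as f:
        head = f.read(1024)
    if b"\0" in head:
        return tags | {"binary"}
    tags.add("text")

    if executable and head.startswith(b"#!"):
        # '#!/usr/bin/env bash' -> ['/usr/bin/env', 'bash'] -> 'bash'
        parts = head[2:].split(b"\n", 1)[0].decode(errors="replace").split()
        if parts and os.path.basename(parts[0]) == "env":
            parts = parts[1:]
        if parts:
            tags.add(os.path.basename(parts[0]))
    return tags
```

In this sketch the file is opened only on the fallback path, which is the property the paragraph above describes.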