cy_ioc_extract is a Python library designed to extract various Indicators of Compromise (IOCs) from raw text using regular expressions (regex) and optional validation mechanisms. It supports extracting IP addresses, domains, URLs, hashes, emails, registry keys, autonomous system numbers, and more.
Features
- Extracts many IOC types, including IPs, domains, URLs, hashes, and CVEs.
- Option to validate extracted values (e.g., domains are validated against IANA TLDs and OpenNIC TLDs).
- Allows selective extraction using the extract_fields parameter.
- Handles false positives by filtering invalid data when validation is enabled.
Installation
pip install cy_ioc_extract
Usage
from cy_ioc_extract import IOCEXtract
txt = """
### IPv4 Addresses:
192.168.1.1
10.0.0.1
8.8.8.8
172.16.32.45
### Domains:
example.com
subdomain.test.org
my-site.net
web.co.uk
### Email Addresses:
john.doe@example.com
alice_smith@corporate.org
user123@test.net
contact@web.co.uk
"""
iocs = IOCEXtract(txt, extract_fields=("DOMAIN", "EMAIL")).extract_ioc()
print(iocs)
Output:
{
"DOMAIN": ["my-site.net", "subdomain.test.org", "web.co.uk", "example.com"],
"EMAIL": ["contact@web.co.uk", "john.doe@example.com", "alice_smith@corporate.org", "user123@test.net"]
}
Omitting extract_fields extracts every supported type. The output below comes from a longer sample text than the one shown above; it also contains CVEs, URLs, hashes, and other indicators.
iocs = IOCEXtract(txt).extract_ioc()
print(iocs)
Output:
{
"IP": ["172.16.32.45", "10.0.0.1", "192.168.1.1", "8.8.8.8"],
"DOMAIN": ["subdomain.test.org", "my-site.net", "example.com", "web.co.uk"],
"EMAIL": [
"contact@web.co.uk",
"user123@test.net",
"alice_smith@corporate.org",
"john.doe@example.com",
],
"FIND_EMAIL": [
"contact@web.co.uk",
"user123@test.net",
"alice_smith@corporate.org",
"john.doe@example.com",
],
"CVE": ["CVE-2022-7654321", "CVE-2021-98765", "CVE-2023-1234", "CVE-2019-45678"],
"URL": [("https://example.com", ""), ("ftp://192.168.1.1/resource", "")],
"SHA256": [
"d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2",
"d3b6f7c8e5a4d9b2c7e0f8b1a6d0c5e9a2b3d4f7e6c1a9b0f2d8c3a5e7b6d9c4",
"5d41402abc4b2a76b9719d911017c59216dcd8d1a3f32a5e3a0d867d8e448be5",
"d4eaa4b4e9c3e5d0b5a3c2a7f6b0e9c3d4f2e5a7b6c9f8e0d5c4e3a2b7f0d6c1",
"46e95f20ad2a7dcd491ee6b0d56e0b7fd4f5e0c19ff2eb6d6bfa6a4c7a5c7e9b",
"a9b0d6c3e5a4d9f7b2c1e8b3d0c7e6f9a5b4d8c2e3f0a7c6d1e9b0a2f8b5c7d6",
"e8a2b7d6c5f3e4d0a1c2b7e9f0d6c3a5e8b4f7d9c0a2e3b6c5f1d0a7e2c9b8f0",
"8a2c7d6b5f3e4d0a1c2b7e9f0d6c3a5e8b4f7d9c0a2e3b6c5f1d0a7e2c9b8f0d",
],
"MD5": [
"e99a18c428cb38d5f260853678922e03",
"6f4922f45568161a8cdf4ad2299f6d23",
"9e107d9d372bb6826bd81d3542a419d6",
"098f6bcd4621d373cade4e832627b4f6",
],
"SHA1": [
"da39a3ee5e6b4b0d3255bfef95601890afd80709",
"2fd4e1c67a2d28fced849ee1bb76e7391b93eb12",
"9c1185a5c5e9fc54612808977ee8f548b2258d31",
"a94a8fe5ccb19ba61c4c0873d391e987982fbbd3",
],
"UUID": [
"3f2504e0-4f89-11d3-9a0c-0305e82c3301",
"6ba7b810-9dad-11d1-80b4-00c04fd430c8",
"550e8400-e29b-41d4-a716-446655440000",
"123e4567-e89b-12d3-a456-426614174000",
],
"IPv4_CIDR": ["192.168.1.0/24", "10.0.0.0/8", "172.16.0.0/16", "8.8.8.0/24"],
"IPv6": [
"::1",
"2001:0db8:85a3:0000:0000:8a2e:0370:7334",
"2001:db8::ff00:42:8329",
"fe80::1ff:fe23:4567:890a",
],
"SHA224": [
"3a7bd3e2360a6c8a1e4c0e5a2b9f7d38c6f7c7c2e3d6a7f0c1b2a5e1",
"d14a028c2a3a2bc9476102bb288234c415a2b01f828ea62ac5b3e42f",
"758c57b4a7c8f3f3955b05bbd5e3c61a2cbf6d8fd98f48a263d7653b",
"c97c24a8b9ac4a0c6e78a9a31a4b6ff8a2e9d5fffe71c3d6629d1a7a",
],
"SHA384": [
"ca737f0d0c89f6d1d172875e9d10c7c3350c1096c4bdb49f003ee927b4e6db32b08690b279b6c5abf0dcbd4f9d786c0b",
"cf83e1357eefb8bd62ec7761d6d529b18b94ff7f3d8b3c1d5281fbbf6e6c077bbd7af5d15fa1c20b9a785e6cf0d630da",
],
"SHA512": [],
"SSDEEP": [],
"DIRECTORY": [
"C:\\Users\\Public\\Documents\\",
"/etc/systemd/system/",
"/home/user/docs/",
"/var/log/nginx/",
],
"FILE_PATH": [
"/home/user/.bashrc",
"C:\\Program Files\\MyApp\\config.ini",
"/var/www/html/index.php",
"D:\\Games\\Game.exe",
],
"AUTONOMOUS_SYSTEM": ["AS24680", "AS12345", "AS67890", "AS13579"],
"WINDOWS_REGISTRY_KEY": [
"HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\nHKLM\\SYSTEM\\CurrentControlSet\\Services\\Tcpip\\Parameters\nHKEY_USERS\\.DEFAULT\\Control Panel\\International\nHKCU\\SOFTWARE\\Policies\\Microsoft\\Windows\\System\n\n### Autonomous System Numbers"
],
"MAC_ADDRESS": [
"52:54:00:12:34:56",
"00:1A:2B:3C:4D:5E",
"A1:B2:C3:D4:E5:F6",
"08:00:27:00:55:AA",
],
"VALID_HOST": [],
"WHIRLPOOL": [],
"SHA3512": [],
"SHA3384": [],
"SHA3256": [],
"SHA3224": [],
}
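The two outputs above list the same domains in different orders, and several categories come back as empty lists. Assuming that list order is not guaranteed, a small post-processing step can make results easier to compare or store. The sketch below is not part of cy_ioc_extract; `normalize_iocs` is a hypothetical helper and `sample` is hand-written data in the same shape as the output above.

```python
def normalize_iocs(iocs):
    """Drop empty categories and sort each list for stable comparison."""
    return {
        field: sorted(values)
        for field, values in sorted(iocs.items())
        if values
    }


# Hypothetical sample shaped like the extract_ioc() output shown above
sample = {
    "IP": ["172.16.32.45", "10.0.0.1", "192.168.1.1", "8.8.8.8"],
    "URL": [("https://example.com", ""), ("ftp://192.168.1.1/resource", "")],
    "SHA512": [],
    "SSDEEP": [],
}

clean = normalize_iocs(sample)
print(clean)
```

Sorting here is plain string (or tuple) ordering, so IP addresses are not sorted numerically; it is only meant to give a deterministic order.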
Validation and Custom TLDs
- When validate_ioc=True, domains are validated against IANA TLDs and custom TLDs (.bbs, .chan, .cyb, etc.).
- The default value of validate_ioc is True.
- If validation is disabled, regex matches are returned without validation, which may include false positives.
iocs = IOCEXtract(txt, extract_fields=("DOMAIN",), validate_ioc=False).extract_ioc()
print(iocs)
Output (Includes False Positives in Domains)
{
"DOMAIN": ["192.168", "example.com", "subdomain.test.org", "my-site.net", "web.co.uk", "172.16.32.45"]
}
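To illustrate why TLD validation removes entries such as "192.168" and "172.16.32.45", here is a minimal sketch of a TLD check. It is not the library's actual validation code: `KNOWN_TLDS` is a tiny hypothetical set standing in for the full IANA and custom lists, and `has_known_tld` is an illustrative helper.

```python
# Hypothetical, tiny TLD set; a real check would load the full TLD lists
KNOWN_TLDS = {"com", "org", "net", "uk"}


def has_known_tld(candidate, tlds=KNOWN_TLDS):
    """Return True if the last dot-separated label is a known TLD."""
    label = candidate.rstrip(".").rsplit(".", 1)[-1].lower()
    return label in tlds


# Unvalidated regex matches, as in the output above
candidates = ["192.168", "example.com", "subdomain.test.org",
              "my-site.net", "web.co.uk", "172.16.32.45"]

domains = [c for c in candidates if has_known_tld(c)]
print(domains)
# ['example.com', 'subdomain.test.org', 'my-site.net', 'web.co.uk']
```

The numeric fragments are dropped because their last labels ("168" and "45") are not TLDs.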
Error Handling
If an invalid field is passed in extract_fields, an error is raised.
iocs = IOCEXtract(txt, extract_fields=("DOMAIN", "INVALID_FIELD")).extract_ioc()
Error:
cy_ioc_extract.exception.UnsupportedExtractFieldError: Invalid fields {'INVALID_FIELD'} given to extract. Supported types are {"AUTONOMOUS_SYSTEM", "IPv6", "URL", "SHA384", "DIRECTORY", "IPv4_CIDR", "SHA512", "DOMAIN", "VALID_HOST", "WINDOWS_REGISTRY_KEY", "SSDEEP", "EMAIL", "SHA256", "MAC_ADDRESS", "CVE", "FIND_EMAIL", "IP", "MD5", "SHA3512", "SHA3256", "UUID", "SHA3384", "SHA1", "SHA224", "FILE_PATH", "SHA3224", "WHIRLPOOL"}.
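When field names come from user input or configuration, it can help to check them before calling the extractor. The sketch below is a hypothetical pre-check, not part of cy_ioc_extract: `SUPPORTED_FIELDS` is copied by hand from the error message above and may differ between library versions, and `check_fields` raises a plain `ValueError` rather than the library's own exception.

```python
# Copied from the error message above; assumed to match the installed version
SUPPORTED_FIELDS = frozenset({
    "AUTONOMOUS_SYSTEM", "IPv6", "URL", "SHA384", "DIRECTORY", "IPv4_CIDR",
    "SHA512", "DOMAIN", "VALID_HOST", "WINDOWS_REGISTRY_KEY", "SSDEEP",
    "EMAIL", "SHA256", "MAC_ADDRESS", "CVE", "FIND_EMAIL", "IP", "MD5",
    "SHA3512", "SHA3256", "UUID", "SHA3384", "SHA1", "SHA224", "FILE_PATH",
    "SHA3224", "WHIRLPOOL",
})


def check_fields(fields):
    """Return fields as a tuple, or raise ValueError naming unknown ones."""
    unknown = set(fields) - SUPPORTED_FIELDS
    if unknown:
        raise ValueError(f"Unsupported extract fields: {sorted(unknown)}")
    return tuple(fields)


print(check_fields(("DOMAIN", "EMAIL")))

try:
    check_fields(("DOMAIN", "INVALID_FIELD"))
except ValueError as exc:
    print(exc)
```

Alternatively, the call can be wrapped in a try/except that catches the UnsupportedExtractFieldError named in the message above.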
Supported Fields
{'IPv4_CIDR', 'SHA256', 'MAC_ADDRESS', 'EMAIL', 'MD5', 'SHA3224', 'IPv6', 'SSDEEP', 'DOMAIN', 'DIRECTORY', 'AUTONOMOUS_SYSTEM', 'CVE', 'SHA384', 'VALID_HOST', 'SHA3512', 'SHA3384', 'SHA512', 'SHA1', 'WHIRLPOOL', 'SHA3256', 'URL', 'WINDOWS_REGISTRY_KEY', 'UUID', 'SHA224', 'FIND_EMAIL', 'IP', 'FILE_PATH'}
Field | Description |
IP | IPv4 addresses |
DOMAIN | Domains and subdomains |
EMAIL | Email addresses |
FIND_EMAIL | Extract emails from free text |
CVE | CVE identifiers (CVE-YYYY-NNNN, with four or more digits in the sequence number) |
URL | HTTP, HTTPS, FTP URLs |
SHA256 | SHA-256 Hashes |
MD5 | MD5 Hashes |
SHA1 | SHA-1 Hashes |
UUID | Universally Unique Identifiers |
IPv4_CIDR | IPv4 CIDR notation (e.g., 192.168.1.0/24) |
IPv6 | IPv6 addresses |
SHA224, SHA384, SHA512, SHA3224, SHA3256, SHA3384, SHA3512 | SHA-2 and SHA-3 hash variants |
SSDEEP | SSDEEP fuzzy hashes |
DIRECTORY | File system directory paths |
FILE_PATH | Specific file paths |
AUTONOMOUS_SYSTEM | Autonomous system numbers (e.g., AS12345) |
WINDOWS_REGISTRY_KEY | Windows registry keys |
MAC_ADDRESS | MAC addresses |
VALID_HOST | Valid hostnames |
WHIRLPOOL | Whirlpool hashes |
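For readers curious how regex-based extraction of these field types works in general, here is a minimal standard-library sketch covering two of them. The patterns and the `extract` function are illustrative assumptions, not the expressions or API that cy_ioc_extract uses internally.

```python
import re

# Illustrative patterns only; the library's real expressions may differ
PATTERNS = {
    "CVE": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "AUTONOMOUS_SYSTEM": re.compile(r"\bAS\d+\b"),
}


def extract(text, fields=None):
    """Return de-duplicated, sorted matches for each selected field."""
    selected = fields or PATTERNS.keys()
    return {
        name: sorted(set(PATTERNS[name].findall(text)))
        for name in selected
    }


text = "Seen CVE-2023-1234 and CVE-2021-98765 in traffic from AS12345."
result = extract(text)
print(result)
# {'CVE': ['CVE-2021-98765', 'CVE-2023-1234'], 'AUTONOMOUS_SYSTEM': ['AS12345']}
```

Passing `fields=("CVE",)` limits the result to that key, which mirrors the idea behind the extract_fields parameter.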
License
MIT License.
Contributing
Pull requests are welcome! If you find any issue or want to request a new feature, open an issue in the repository.
Author
Enjoy using cy_ioc_extract for threat intelligence extraction! 🚀