cy-ioc-extract
cy_ioc_extract is a Python library designed to extract various Indicators of Compromise (IOCs) from raw text using regular expressions (regex) and optional validation mechanisms. It supports extracting IP addresses, domains, URLs, hashes, emails, registry keys, autonomous system numbers, and more.

Version 0.1.1 (PyPI)
cy_ioc_extract

Features

  • Extracts many IOC types, including IPs, domains, URLs, hashes, and CVEs.
  • Optionally validates extracted values (e.g., domains are checked against IANA TLDs and OpenNIC TLDs).
  • Allows selective extraction through the extract_fields parameter.
  • Reduces false positives by filtering out invalid matches when validation is enabled.
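
The validation feature can be pictured as a two-step pass: a permissive regex collects candidates, then a TLD lookup discards the ones that cannot be real domains. The following is a minimal sketch of that idea using only the standard library. The regex, the tiny KNOWN_TLDS set, and the function name find_domains are illustrative assumptions, not cy_ioc_extract's internals.

```python
import re

# Illustrative subset only (assumption); the library is described as
# checking the full IANA and OpenNIC TLD lists.
KNOWN_TLDS = {"com", "org", "net", "uk", "bbs"}

# Deliberately permissive: dot-separated labels of letters, digits, hyphens.
CANDIDATE_RE = re.compile(r"\b(?:[A-Za-z0-9-]+\.)+[A-Za-z0-9-]+\b")


def find_domains(text, validate=True):
    """Return unique domain candidates in first-seen order."""
    seen = []
    for match in CANDIDATE_RE.findall(text):
        tld = match.rsplit(".", 1)[-1].lower()
        if validate and tld not in KNOWN_TLDS:
            continue  # e.g. "192.168.1.1" ends in "1", which is not a TLD
        if match not in seen:
            seen.append(match)
    return seen


sample = "Beacon to example.com from 192.168.1.1, see report.v2 for details"
print(find_domains(sample))                  # validated candidates only
print(find_domains(sample, validate=False))  # every regex match
```

With validation on, only example.com survives; with it off, the IP address and the "report.v2" token come back as well, which is the kind of false positive the feature list refers to.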

Installation

pip install cy_ioc_extract

Usage

1️⃣ Extracting Specific IOC Types

from cy_ioc_extract import IOCEXtract

txt = """
### IPv4 Addresses:
192.168.1.1
10.0.0.1
8.8.8.8
172.16.32.45

### Domains:
example.com
subdomain.test.org
my-site.net
web.co.uk

### Email Addresses:
john.doe@example.com
alice_smith@corporate.org
user123@test.net
contact@web.co.uk
"""

# Extract only "DOMAIN" and "EMAIL"
iocs = IOCEXtract(txt, extract_fields=("DOMAIN", "EMAIL")).extract_ioc()
print(iocs)

Output:

{
    "DOMAIN": ["my-site.net", "subdomain.test.org", "web.co.uk", "example.com"],
    "EMAIL": ["contact@web.co.uk", "john.doe@example.com", "alice_smith@corporate.org", "user123@test.net"]
}
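
The sample output lists values in a different order from the input, so it is probably safest not to rely on ordering. A small sketch of normalizing a result before comparing or storing it follows. The helper name normalize is hypothetical, and the dictionary is copied from the output above rather than produced by a library call.

```python
def normalize(iocs):
    """Return a copy with each field's values sorted, for stable comparison."""
    return {field: sorted(values) for field, values in iocs.items()}


# Literal copied from the sample output; no library call is made here.
iocs = {
    "DOMAIN": ["my-site.net", "subdomain.test.org", "web.co.uk", "example.com"],
    "EMAIL": [
        "contact@web.co.uk",
        "john.doe@example.com",
        "alice_smith@corporate.org",
        "user123@test.net",
    ],
}

stable = normalize(iocs)
print(stable["DOMAIN"])
print(stable["EMAIL"])
```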

2️⃣ Extracting All IOC Types

iocs = IOCEXtract(txt).extract_ioc()
print(iocs)

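Output (apparently produced from a longer sample text than the one shown above, since it also contains CVEs, URLs, hashes, paths, and other values):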

{
    "IP": ["172.16.32.45", "10.0.0.1", "192.168.1.1", "8.8.8.8"],
    "DOMAIN": ["subdomain.test.org", "my-site.net", "example.com", "web.co.uk"],
    "EMAIL": [
        "contact@web.co.uk",
        "user123@test.net",
        "alice_smith@corporate.org",
        "john.doe@example.com",
    ],
    "FIND_EMAIL": [
        "contact@web.co.uk",
        "user123@test.net",
        "alice_smith@corporate.org",
        "john.doe@example.com",
    ],
    "CVE": ["CVE-2022-7654321", "CVE-2021-98765", "CVE-2023-1234", "CVE-2019-45678"],
    "URL": [("https://example.com", ""), ("ftp://192.168.1.1/resource", "")],
    "SHA256": [
        "d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2",
        "d3b6f7c8e5a4d9b2c7e0f8b1a6d0c5e9a2b3d4f7e6c1a9b0f2d8c3a5e7b6d9c4",
        "5d41402abc4b2a76b9719d911017c59216dcd8d1a3f32a5e3a0d867d8e448be5",
        "d4eaa4b4e9c3e5d0b5a3c2a7f6b0e9c3d4f2e5a7b6c9f8e0d5c4e3a2b7f0d6c1",
        "46e95f20ad2a7dcd491ee6b0d56e0b7fd4f5e0c19ff2eb6d6bfa6a4c7a5c7e9b",
        "a9b0d6c3e5a4d9f7b2c1e8b3d0c7e6f9a5b4d8c2e3f0a7c6d1e9b0a2f8b5c7d6",
        "e8a2b7d6c5f3e4d0a1c2b7e9f0d6c3a5e8b4f7d9c0a2e3b6c5f1d0a7e2c9b8f0",
        "8a2c7d6b5f3e4d0a1c2b7e9f0d6c3a5e8b4f7d9c0a2e3b6c5f1d0a7e2c9b8f0d",
    ],
    "MD5": [
        "e99a18c428cb38d5f260853678922e03",
        "6f4922f45568161a8cdf4ad2299f6d23",
        "9e107d9d372bb6826bd81d3542a419d6",
        "098f6bcd4621d373cade4e832627b4f6",
    ],
    "SHA1": [
        "da39a3ee5e6b4b0d3255bfef95601890afd80709",
        "2fd4e1c67a2d28fced849ee1bb76e7391b93eb12",
        "9c1185a5c5e9fc54612808977ee8f548b2258d31",
        "a94a8fe5ccb19ba61c4c0873d391e987982fbbd3",
    ],
    "UUID": [
        "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
        "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
        "550e8400-e29b-41d4-a716-446655440000",
        "123e4567-e89b-12d3-a456-426614174000",
    ],
    "IPv4_CIDR": ["192.168.1.0/24", "10.0.0.0/8", "172.16.0.0/16", "8.8.8.0/24"],
    "IPv6": [
        "::1",
        "2001:0db8:85a3:0000:0000:8a2e:0370:7334",
        "2001:db8::ff00:42:8329",
        "fe80::1ff:fe23:4567:890a",
    ],
    "SHA224": [
        "3a7bd3e2360a6c8a1e4c0e5a2b9f7d38c6f7c7c2e3d6a7f0c1b2a5e1",
        "d14a028c2a3a2bc9476102bb288234c415a2b01f828ea62ac5b3e42f",
        "758c57b4a7c8f3f3955b05bbd5e3c61a2cbf6d8fd98f48a263d7653b",
        "c97c24a8b9ac4a0c6e78a9a31a4b6ff8a2e9d5fffe71c3d6629d1a7a",
    ],
    "SHA384": [
        "ca737f0d0c89f6d1d172875e9d10c7c3350c1096c4bdb49f003ee927b4e6db32b08690b279b6c5abf0dcbd4f9d786c0b",
        "cf83e1357eefb8bd62ec7761d6d529b18b94ff7f3d8b3c1d5281fbbf6e6c077bbd7af5d15fa1c20b9a785e6cf0d630da",
    ],
    "SHA512": [],
    "SSDEEP": [],
    "DIRECTORY": [
        "C:\\Users\\Public\\Documents\\",
        "/etc/systemd/system/",
        "/home/user/docs/",
        "/var/log/nginx/",
    ],
    "FILE_PATH": [
        "/home/user/.bashrc",
        "C:\\Program Files\\MyApp\\config.ini",
        "/var/www/html/index.php",
        "D:\\Games\\Game.exe",
    ],
    "AUTONOMOUS_SYSTEM": ["AS24680", "AS12345", "AS67890", "AS13579"],
    "WINDOWS_REGISTRY_KEY": [
        "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\nHKLM\\SYSTEM\\CurrentControlSet\\Services\\Tcpip\\Parameters\nHKEY_USERS\\.DEFAULT\\Control Panel\\International\nHKCU\\SOFTWARE\\Policies\\Microsoft\\Windows\\System\n\n### Autonomous System Numbers"
    ],
    "MAC_ADDRESS": [
        "52:54:00:12:34:56",
        "00:1A:2B:3C:4D:5E",
        "A1:B2:C3:D4:E5:F6",
        "08:00:27:00:55:AA",
    ],
    "VALID_HOST": [],
    "WHIRLPOOL": [],
    "SHA3512": [],
    "SHA3384": [],
    "SHA3256": [],
    "SHA3224": [],
}
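
The full result carries an entry for every field, including empty lists, and URL values appear as tuples whose second element is empty in this sample. A possible way to trim such a result for reporting is sketched below. The helper name compact is hypothetical, and because the meaning of the second tuple element is not documented here, keeping only the first element is an assumption.

```python
def compact(iocs):
    """Drop empty fields and reduce tuple values to their first element."""
    result = {}
    for field, values in iocs.items():
        if not values:
            continue  # skip fields with no matches, e.g. "SHA512": []
        # Assumption: the first tuple element is the URL itself.
        result[field] = [v[0] if isinstance(v, tuple) else v for v in values]
    return result


# Excerpt copied from the sample output; no library call is made here.
excerpt = {
    "CVE": ["CVE-2023-1234"],
    "URL": [("https://example.com", ""), ("ftp://192.168.1.1/resource", "")],
    "SHA512": [],
    "SSDEEP": [],
}

trimmed = compact(excerpt)
print(trimmed)
```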


Validation and Custom TLDs

  • When validate_ioc=True (the default), domains are validated against IANA TLDs and OpenNIC custom TLDs (.bbs, .chan, .cyb, etc.).
  • When validate_ioc=False, raw regex matches are returned without validation, so the results may include false positives.

# Extract without validation
iocs = IOCEXtract(txt, extract_fields=("DOMAIN",), validate_ioc=False).extract_ioc()
print(iocs)

Output (Includes False Positives in Domains)

{
    "DOMAIN": ["192.168", "example.com", "subdomain.test.org", "my-site.net", "web.co.uk", "172.16.32.45"]
}
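
If validation has to stay off, some of these false positives can be removed afterwards. The sketch below drops candidates whose last label is purely numeric, relying on the fact that no delegated TLD consists only of digits. The helper name drop_numeric_candidates is hypothetical and is not part of cy_ioc_extract; the input list is copied from the output above.

```python
def drop_numeric_candidates(domains):
    """Remove candidates whose last label is purely numeric (IP fragments)."""
    return [d for d in domains if not d.rsplit(".", 1)[-1].isdigit()]


# Copied from the unvalidated output above.
unvalidated = [
    "192.168",
    "example.com",
    "subdomain.test.org",
    "my-site.net",
    "web.co.uk",
    "172.16.32.45",
]

cleaned = drop_numeric_candidates(unvalidated)
print(cleaned)
```

This only catches IP-like fragments; a full TLD check, as done when validate_ioc=True, is still the stricter filter.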

Error Handling

Unsupported Extract Field Error

If extract_fields contains a field name that is not supported, UnsupportedExtractFieldError is raised.

# Will raise UnsupportedExtractFieldError
iocs = IOCEXtract(txt, extract_fields=("DOMAIN", "INVALID_FIELD")).extract_ioc()

Error:

cy_ioc_extract.exception.UnsupportedExtractFieldError: Invalid fields {'INVALID_FIELD'} given to extract. Supported types are {"AUTONOMOUS_SYSTEM", "IPv6", "URL", "SHA384", "DIRECTORY", "IPv4_CIDR", "SHA512", "DOMAIN", "VALID_HOST", "WINDOWS_REGISTRY_KEY", "SSDEEP", "EMAIL", "SHA256", "MAC_ADDRESS", "CVE", "FIND_EMAIL", "IP", "MD5", "SHA3512", "SHA3256", "UUID", "SHA3384", "SHA1", "SHA224", "FILE_PATH", "SHA3224", "WHIRLPOOL"}.
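
When field names come from user input or configuration, it may be convenient to check them before calling the extractor. The sketch below does that with plain set operations. SUPPORTED_FIELDS is copied from the error message above, and split_fields is a hypothetical helper, not a library function. Whether the library treats field names case-insensitively is not stated, so this check is case-sensitive.

```python
# Copied from the error message above (27 names).
SUPPORTED_FIELDS = {
    "AUTONOMOUS_SYSTEM", "IPv6", "URL", "SHA384", "DIRECTORY", "IPv4_CIDR",
    "SHA512", "DOMAIN", "VALID_HOST", "WINDOWS_REGISTRY_KEY", "SSDEEP",
    "EMAIL", "SHA256", "MAC_ADDRESS", "CVE", "FIND_EMAIL", "IP", "MD5",
    "SHA3512", "SHA3256", "UUID", "SHA3384", "SHA1", "SHA224", "FILE_PATH",
    "SHA3224", "WHIRLPOOL",
}


def split_fields(requested):
    """Return (supported, unsupported) subsets of the requested field names."""
    requested = set(requested)
    return requested & SUPPORTED_FIELDS, requested - SUPPORTED_FIELDS


valid, invalid = split_fields(("DOMAIN", "INVALID_FIELD"))
print(sorted(valid))    # safe to pass as extract_fields
print(sorted(invalid))  # would trigger the error shown above
```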

Supported Extract Fields

The full set of supported extract fields:

{'IPv4_CIDR', 'SHA256', 'MAC_ADDRESS', 'EMAIL', 'MD5', 'SHA3224', 'IPv6', 'SSDEEP', 'DOMAIN', 'DIRECTORY', 'AUTONOMOUS_SYSTEM', 'CVE', 'SHA384', 'VALID_HOST', 'SHA3512', 'SHA3384', 'SHA512', 'SHA1', 'WHIRLPOOL', 'SHA3256', 'URL', 'WINDOWS_REGISTRY_KEY', 'UUID', 'SHA224', 'FIND_EMAIL', 'IP', 'FILE_PATH'}

| Field Name | Description |
|---|---|
| IP | IPv4 addresses |
| DOMAIN | Domains and subdomains |
| EMAIL | Email addresses |
| FIND_EMAIL | Email addresses found in free text |
| CVE | CVE identifiers (CVE-YYYY-NNNN) |
| URL | HTTP, HTTPS, and FTP URLs |
| SHA256 | SHA-256 hashes |
| MD5 | MD5 hashes |
| SHA1 | SHA-1 hashes |
| UUID | Universally Unique Identifiers |
| IPv4_CIDR | IPv4 CIDR notation (e.g., 192.168.1.0/24) |
| IPv6 | IPv6 addresses |
| SHA224, SHA384, SHA512, SHA3512, SHA3384, SHA3256, SHA3224 | Various hash formats |
| SSDEEP | SSDEEP fuzzy hashes |
| DIRECTORY | File system directory paths |
| FILE_PATH | Specific file paths |
| AUTONOMOUS_SYSTEM | Autonomous system numbers (e.g., AS12345) |
| WINDOWS_REGISTRY_KEY | Windows registry keys |
| MAC_ADDRESS | MAC addresses |
| VALID_HOST | Valid hostnames |
| WHIRLPOOL | Whirlpool hashes |
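
Several of the hash fields differ mainly in digest length. The sketch below derives the expected hex length for some of them from hashlib and lists which fields a given hex string could belong to. The mapping from field names to algorithms and the helper candidate_fields are assumptions made for illustration. SHA-3 digests have the same lengths as their SHA-2 counterparts, so length alone cannot tell them apart; how the library separates them is not stated here.

```python
import hashlib
import string

# Assumed mapping from field names to hashlib algorithms.
FIELD_TO_ALGO = {
    "MD5": "md5",
    "SHA1": "sha1",
    "SHA224": "sha224",
    "SHA256": "sha256",
    "SHA384": "sha384",
    "SHA512": "sha512",
}

# Each digest byte is written as two hex characters.
HEX_LENGTHS = {
    field: getattr(hashlib, algo)().digest_size * 2
    for field, algo in FIELD_TO_ALGO.items()
}


def candidate_fields(value):
    """Return the fields whose hex length matches the given string."""
    if not value or any(c not in string.hexdigits for c in value):
        return []
    return [field for field, n in HEX_LENGTHS.items() if n == len(value)]


print(HEX_LENGTHS)
print(candidate_fields("e99a18c428cb38d5f260853678922e03"))
print(candidate_fields("da39a3ee5e6b4b0d3255bfef95601890afd80709"))
```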

License

MIT License.

Contributing

Pull requests are welcome! If you find an issue or want to request a new feature, open an issue in the repository.

Author

  • Deepak Kumar
  • deepak.kumar@cyware.com

Enjoy using cy_ioc_extract for threat intelligence extraction! 🚀
