
A tool for extracting Indicators of Compromise (IOCs) from security reports in HTML, PDF, and plain text formats.
Author: Marc Rivero | @seifreed
Version: 1.0.0
pip install iocparser-tool
# Clone the repository
git clone https://github.com/seifreed/iocparser.git
cd iocparser
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install as a package with all dependencies
pip install -e .
# Or install just the requirements
pip install -r requirements.txt
# Initialize and download MISP warning lists (do this first)
iocparser --init
# Analyze a PDF file
iocparser -f report.pdf
# Analyze an HTML file
iocparser -f report.html
# Analyze a text file
iocparser -f report.txt
# Force specific file type (pdf, html, text)
iocparser -f report -t pdf
iocparser -f report -t html
iocparser -f report -t text
# Save output to a specific file
iocparser -f report.pdf -o results.json
iocparser -f report.pdf -o results.txt
# Print results to screen only
iocparser -f report.pdf -o -
# Use JSON format (default is text)
iocparser -f report.pdf --json
# Analyze a report from a URL
iocparser -u https://example.com/report.html
# Specify content type for a URL
iocparser -u https://example.com/report -t html
--no-defang            Disable automatic defanging of IOCs
--no-check-warnings    Don't check IOCs against MISP warning lists
--force-update         Force update of MISP warning lists
--init                 Download and initialize MISP warning lists
-h, --help             Show help message
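Since defanging is on by default and `--no-defang` turns it off, it may help to see what defanging usually means. The sketch below illustrates the common convention (`hxxp` for `http`, `[.]` for `.`, `[@]` for `@`). The `defang` and `refang` helpers are hypothetical names used for illustration only; they are not part of IOCParser, and the tool's exact output format may differ.

```python
import re


def defang(value: str) -> str:
    """Return a copy of an indicator that is no longer clickable or resolvable."""
    # Neutralise a leading http/https scheme (https becomes hxxps).
    value = re.sub(r"^http", "hxxp", value, flags=re.IGNORECASE)
    # Bracket dots and at signs so domains, IPs, and emails are inert.
    return value.replace(".", "[.]").replace("@", "[@]")


def refang(value: str) -> str:
    """Reverse defang() for values that use the same convention."""
    value = value.replace("[.]", ".").replace("[@]", "@")
    return re.sub(r"^hxxp", "http", value, flags=re.IGNORECASE)


print(defang("http://evil.com/a"))
print(defang("192.168.1.1"))
```

Defanged values are safer to paste into tickets and chat tools; keeping the original form (as `--no-defang` does) is more convenient when the output feeds directly into blocklists or other tooling.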
You can use IOCParser as a library in your Python projects:
# Example 1: Extract IOCs from a file
from iocparser import extract_iocs_from_file
# Process a file (automatically detects file type)
normal_iocs, warning_iocs = extract_iocs_from_file('path/to/report.pdf')
print(f"Found {len(normal_iocs.get('domains', []))} normal domains")
print(f"Found {len(warning_iocs.get('domains', []))} potential false positive domains")
# With additional options
normal_iocs, warning_iocs = extract_iocs_from_file(
    'path/to/report.html',
    check_warnings=True,   # Check against MISP warning lists
    force_update=False,    # Don't force update MISP lists
    file_type='html',      # Force file type (optional)
    defang=True            # Defang the IOCs
)
# Example 2: Extract IOCs from text content directly
from iocparser import extract_iocs_from_text
text = "This sample malware contacts evil.com with IP 192.168.1.1 and uses hash 5f4dcc3b5aa765d61d8327deb882cf99"
normal_iocs, warning_iocs = extract_iocs_from_text(text)
# Print the extracted IOCs
for ioc_type, iocs_list in normal_iocs.items():
    print(f"{ioc_type}: {iocs_list}")
If you need more control, you can use the individual components directly:
from iocparser import IOCExtractor, PDFParser, HTMLParser, MISPWarningLists
# Extract text from a PDF or HTML file
parser = PDFParser("path/to/report.pdf")
# or
# parser = HTMLParser("path/to/report.html")
text_content = parser.extract_text()
# Extract IOCs
extractor = IOCExtractor(defang=True)
iocs = extractor.extract_all(text_content)
# Check against warning lists
warning_lists = MISPWarningLists()
normal_iocs, warning_iocs = warning_lists.separate_iocs_by_warnings(iocs)
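To make the idea behind the warning-list step concrete, here is a minimal, self-contained sketch of splitting extracted IOCs into "normal" and "likely false positive" groups. It assumes the IOCs are a dictionary mapping an IOC type to a list of strings, and it uses a plain set of known-benign values with exact matching. The function name `separate_by_warnings` and the sample data are hypothetical; the real MISP warning lists support richer matching (for example CIDR ranges and hostname suffixes), so this is not how `MISPWarningLists` is implemented.

```python
def separate_by_warnings(iocs, warning_values):
    """Split IOCs into (normal, flagged) using exact, case-insensitive matching."""
    normal, flagged = {}, {}
    for ioc_type, values in iocs.items():
        for value in values:
            # Anything found in the known-benign set goes to the flagged bucket.
            target = flagged if value.lower() in warning_values else normal
            target.setdefault(ioc_type, []).append(value)
    return normal, flagged


# Hypothetical sample data, not taken from a real report.
iocs = {
    "domains": ["evil.com", "google.com"],
    "ips": ["8.8.8.8", "203.0.113.7"],
}
known_benign = {"google.com", "8.8.8.8"}

normal_iocs, warning_iocs = separate_by_warnings(iocs, known_benign)
print(normal_iocs)
print(warning_iocs)
```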
from iocparser import IOCExtractor
extractor = IOCExtractor(defang=True)
# Extract specific IOC types
md5_hashes = extractor.extract_md5(text)
sha1_hashes = extractor.extract_sha1(text)
sha256_hashes = extractor.extract_sha256(text)
sha512_hashes = extractor.extract_sha512(text)
domains = extractor.extract_domains(text)
ips = extractor.extract_ips(text)
urls = extractor.extract_urls(text)
bitcoin = extractor.extract_bitcoin(text)
yara_rules = extractor.extract_yara_rules(text)
hosts = extractor.extract_hosts(text)
emails = extractor.extract_emails(text)
cves = extractor.extract_cves(text)
registry_keys = extractor.extract_registry(text)
filenames = extractor.extract_filenames(text)
filepaths = extractor.extract_filepaths(text)
# Extract all IOC types at once
all_iocs = extractor.extract_all(text) # Returns a dictionary with all IOCs
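The per-type hash extractors above can be thought of as pattern matching on hex strings of a fixed length. The sketch below shows one way to do that with the standard library alone: MD5, SHA-1, SHA-256, and SHA-512 digests are 32, 40, 64, and 128 hex characters long. The patterns and the `extract_hashes` helper are assumptions made for illustration; IOCParser's own regular expressions may be stricter or handle more edge cases.

```python
import re

# Hex digest lengths for each hash family; \b keeps a longer hash
# from being reported as a shorter one.
HASH_PATTERNS = {
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha1": re.compile(r"\b[a-fA-F0-9]{40}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "sha512": re.compile(r"\b[a-fA-F0-9]{128}\b"),
}


def extract_hashes(text):
    """Return a dict of hash type -> sorted list of unique matches."""
    return {
        name: sorted(set(pattern.findall(text)))
        for name, pattern in HASH_PATTERNS.items()
    }


sample = (
    "Dropper MD5 5f4dcc3b5aa765d61d8327deb882cf99 and payload SHA1 "
    "da39a3ee5e6b4b0d3255bfef95601890afd80709 were observed."
)
found = extract_hashes(sample)
print(found)
```

Length-based matching cannot tell a real digest from any other hex string of the same length, which is one reason a false-positive check such as the warning lists is useful.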
iocparser -f reports/APT28_report.pdf
iocparser -u https://example.com/security-report.pdf --json
iocparser -f report.html --no-defang
from iocparser import extract_iocs_from_file
import os
reports_dir = "path/to/reports"
for filename in os.listdir(reports_dir):
    if filename.endswith(".pdf") or filename.endswith(".html"):
        file_path = os.path.join(reports_dir, filename)
        print(f"Processing {filename}...")
        normal_iocs, warning_iocs = extract_iocs_from_file(file_path)
        # Do something with the extracted IOCs
        print(f"Found {sum(len(iocs) for iocs in normal_iocs.values())} IOCs")
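One thing you might do with the IOCs from a batch run is merge them into a single deduplicated set and write it out as JSON. The following sketch assumes each result is a dictionary mapping an IOC type to a list of strings, which is how the examples above treat the returned values. The `merge_iocs` helper and the sample data are hypothetical and stand in for real `extract_iocs_from_file` results.

```python
import json


def merge_iocs(results):
    """Combine several IOC dicts into one, removing duplicates per type."""
    merged = {}
    for iocs in results:
        for ioc_type, values in iocs.items():
            merged.setdefault(ioc_type, set()).update(values)
    # Sort types and values so the output is stable between runs.
    return {ioc_type: sorted(values) for ioc_type, values in sorted(merged.items())}


# Stand-ins for the normal_iocs dict returned for each report.
per_report = [
    {"domains": ["evil[.]com"], "md5": ["5f4dcc3b5aa765d61d8327deb882cf99"]},
    {"domains": ["evil[.]com", "bad[.]example"]},
]

combined = merge_iocs(per_report)
print(json.dumps(combined, indent=2))
```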
This project is available under the MIT License. You are free to use, modify, and distribute it, provided that you include the original copyright notice and attribution to the original author.
Required Attribution:
When using this project in your own work, please include a clear reference to the original author and repository.