httpz-scanner

Hyper-fast HTTP Scraping Tool

Version 2.1.9 on PyPI (1 maintainer)

HTTPZ Web Scanner

A high-performance, concurrent web scanner written in Python. HTTPZ scans domains for HTTP/HTTPS services and extracts information such as status codes, page titles, SSL/TLS certificate details, and more.

Requirements

  • Python

Installation

# Install from PyPI
pip install httpz_scanner

# The 'httpz' command will now be available in your terminal
httpz --help

From source

# Clone the repository
git clone https://github.com/acidvegas/httpz
cd httpz
pip install -r requirements.txt

Usage

Command Line Interface

Basic usage:

python -m httpz_scanner domains.txt

Scan with all flags enabled and output to JSONL:

python -m httpz_scanner domains.txt -all -c 100 -o results.jsonl -j -p

Read from stdin:

cat domains.txt | python -m httpz_scanner - -all -c 100
echo "example.com" | python -m httpz_scanner - -all

Filter by status codes and follow redirects:

python -m httpz_scanner domains.txt -mc 200,301-399 -ec 404,500 -fr -p

Show specific fields with custom timeout and resolvers:

python -m httpz_scanner domains.txt -sc -ti -i -tls -to 10 -r resolvers.txt

Full scan combining most options:

python -m httpz_scanner domains.txt -c 100 -o output.jsonl -j -all -to 10 -mc 200,301 -ec 404,500 -p -ax -r resolvers.txt

Distributed Scanning

Split scanning across multiple machines using the --shard argument:

# Machine 1
httpz domains.txt --shard 1/3

# Machine 2
httpz domains.txt --shard 2/3

# Machine 3
httpz domains.txt --shard 3/3

Each machine processes a different, non-overlapping subset of the input. For example, with 3 shards (input lines counted from 0):

  • Machine 1 processes lines 0,3,6,9,...
  • Machine 2 processes lines 1,4,7,10,...
  • Machine 3 processes lines 2,5,8,11,...

This allows efficient distribution of large scans across multiple machines.
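The round-robin split described above can be sketched in a few lines of Python. `shard_lines` and `parse_shard` are hypothetical helpers written for illustration, not part of httpz_scanner; they assume the conventions shown above (shards numbered from 1, input lines counted from 0).

```python
from typing import Iterable, Iterator


def parse_shard(spec: str) -> tuple[int, int]:
    """Parse an 'N/T' shard spec such as '2/3' into (2, 3)."""
    index, total = (int(part) for part in spec.split("/", 1))
    return index, total


def shard_lines(lines: Iterable[str], shard_index: int, total_shards: int) -> Iterator[str]:
    """Yield only the lines belonging to shard `shard_index` (1-based) of `total_shards`.

    Line i (counted from 0) goes to shard (i % total_shards) + 1, so shards are
    disjoint and together cover every input line.
    """
    # Validation runs on the first next() call, since this is a generator
    if not 1 <= shard_index <= total_shards:
        raise ValueError(f"shard index must be in 1..{total_shards}, got {shard_index}")
    for i, line in enumerate(lines):
        if i % total_shards == shard_index - 1:
            yield line
```

For example, `list(shard_lines(lines, *parse_shard("1/3")))` over ten lines keeps lines 0, 3, 6 and 9. Because line i always maps to shard (i mod T) + 1, every line lands in exactly one shard, so the machines only need to agree on T and read the same input file.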

Python Library

import asyncio
import urllib.request
from httpz_scanner import HTTPZScanner

async def scan_from_list() -> list[str]:
    # Download a domain list and return the first 20 non-empty entries
    # (urllib is blocking, so this briefly stalls the event loop)
    with urllib.request.urlopen('https://example.com/domains.txt') as response:
        content = response.read().decode()
        return [line.strip() for line in content.splitlines() if line.strip()][:20]

async def scan_from_url():
    # Async generator that streams domains from a remote list, one line at a time
    with urllib.request.urlopen('https://example.com/domains.txt') as response:
        for line in response:
            if domain := line.decode().strip():
                yield domain

async def scan_from_file():
    # Async generator that yields the non-empty lines of a local file
    with open('domains.txt', 'r') as file:
        for line in file:
            if line := line.strip():
                yield line
async def main():
    # Initialize the scanner with every available option (scalar values are the
    # defaults; show_fields, match_codes and exclude_codes are example values)
    scanner = HTTPZScanner(
        concurrent_limit=100,   # Number of concurrent requests
        timeout=5,              # Request timeout in seconds
        follow_redirects=False, # Follow redirects (max 10)
        check_axfr=False,       # Try AXFR transfer against nameservers
        resolver_file=None,     # Path to custom DNS resolvers file
        output_file=None,       # Path to JSONL output file
        show_progress=False,    # Show progress counter
        debug_mode=False,       # Show error states and debug info
        jsonl_output=False,     # Output in JSONL format
        shard=None,             # Tuple of (shard_index, total_shards) for distributed scanning
        
        # Control which fields to show (all False by default unless show_fields is None)
        show_fields={
            'status_code': True,      # Show status code
            'content_type': True,     # Show content type
            'content_length': True,   # Show content length
            'title': True,            # Show page title
            'body': True,             # Show body preview
            'ip': True,               # Show IP addresses
            'favicon': True,          # Show favicon hash
            'headers': True,          # Show response headers
            'follow_redirects': True, # Show redirect chain
            'cname': True,            # Show CNAME records
            'tls': True               # Show TLS certificate info
        },
        
        # Filter results
        match_codes={200,301,302},  # Only show these status codes
        exclude_codes={404,500,503} # Exclude these status codes
    )

    # Example 1: Process file
    print('\nProcessing file:')
    async for result in scanner.scan(scan_from_file()):
        print(f"{result['domain']}: {result['status']}")

    # Example 2: Stream URLs
    print('\nStreaming URLs:')
    async for result in scanner.scan(scan_from_url()):
        print(f"{result['domain']}: {result['status']}")

    # Example 3: Process list
    print('\nProcessing list:')
    domains = await scan_from_list()
    async for result in scanner.scan(domains):
        print(f"{result['domain']}: {result['status']}")

if __name__ == '__main__':
    asyncio.run(main())
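The `concurrent_limit` parameter (`-c` on the command line) caps how many checks run at the same time. A common way to enforce such a cap in asyncio is a semaphore. The sketch below shows that general pattern with a simulated probe; `fake_check` and `run_bounded` are hypothetical stand-ins, not the scanner's actual implementation.

```python
import asyncio
import random


async def fake_check(domain: str, state: dict) -> str:
    """Stand-in for an HTTP probe that records how many checks run at once."""
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(random.uniform(0.001, 0.01))  # simulated network latency
    state["in_flight"] -= 1
    return domain


async def run_bounded(domains: list[str], limit: int) -> tuple[list[str], int]:
    """Run fake_check over every domain with at most `limit` checks in flight."""
    semaphore = asyncio.Semaphore(limit)
    state = {"in_flight": 0, "peak": 0}

    async def guarded(domain: str) -> str:
        async with semaphore:  # waits here while `limit` checks are already running
            return await fake_check(domain, state)

    # gather preserves input order in its result list
    results = await asyncio.gather(*(guarded(d) for d in domains))
    return list(results), state["peak"]
```

Running `asyncio.run(run_bounded(domains, limit=5))` returns every domain in input order together with the peak number of simultaneous checks, which never exceeds 5. This version creates one task per domain up front; for very large or streaming inputs, a fixed pool of worker tasks pulling from an `asyncio.Queue` keeps memory use flat.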

The scanner accepts various input types:

  • File paths (string)
  • Lists/tuples of domains
  • stdin (using '-')
  • Async generators that yield domains

All inputs support sharding for distributed scanning using the shard parameter.
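One way to accept all four input kinds is to normalize them into a single async iterator before scanning. `iter_domains` below is a hypothetical helper for illustration only; HTTPZ's own input handling may differ. It treats the string `'-'` as stdin, any other string as a file path, and anything else as a sync or async iterable of domains.

```python
import sys
from collections.abc import AsyncIterable, AsyncIterator, Iterable
from typing import Union

DomainSource = Union[str, Iterable[str], AsyncIterable[str]]


async def iter_domains(source: DomainSource) -> AsyncIterator[str]:
    """Yield stripped, non-empty domains from a path, '-', an iterable, or an async iterable."""
    if isinstance(source, str):
        # '-' means stdin; any other string is treated as a file path
        stream = sys.stdin if source == "-" else open(source, encoding="utf-8")
        try:
            for line in stream:
                if domain := line.strip():
                    yield domain
        finally:
            if stream is not sys.stdin:
                stream.close()
    elif isinstance(source, AsyncIterable):
        # async generators, such as scan_from_file() in the example above
        async for item in source:
            if domain := item.strip():
                yield domain
    else:
        # lists, tuples, or any other synchronous iterable of strings
        for item in source:
            if domain := item.strip():
                yield domain
```

The `str` check comes first because strings are themselves iterable. File and stdin reads here are blocking, which is usually acceptable for line-oriented input but could be moved to a thread if it ever became a bottleneck.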

Arguments

| Argument | Long Form | Description |
|---|---|---|
| file | | File containing domains (one per line); use '-' for stdin |
| -d | --debug | Show error states and debug information |
| -c N | --concurrent N | Number of concurrent checks (default: 100) |
| -o FILE | --output FILE | Output file path (JSONL format) |
| -j | --jsonl | Output JSON Lines format to console |
| -all | --all-flags | Enable all output flags |
| -sh | --shard N/T | Process shard N of T total shards (e.g., 1/3) |
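Because `-o` writes JSON Lines (one JSON object per line), saved results can be post-processed with the standard library alone. A minimal reader sketch follows; `read_results` is a hypothetical helper, and the exact fields in each record are not documented here.

```python
import json
from pathlib import Path
from typing import Iterator


def read_results(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-blank line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line instead of failing with a bare decode error
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc
```

For example, looping over `read_results("results.jsonl")` and printing `record.get("domain")` and `record.get("status")` would list each scanned domain, assuming the file uses the same `domain` and `status` keys as the Python library example; `.get` avoids a crash if a field is absent.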

Output Field Flags

| Flag | Long Form | Description |
|---|---|---|
| -sc | --status-code | Show status code |
| -ct | --content-type | Show content type |
| -ti | --title | Show page title |
| -b | --body | Show body preview |
| -i | --ip | Show IP addresses |
| -f | --favicon | Show favicon hash |
| -hr | --headers | Show response headers |
| -cl | --content-length | Show content length |
| -fr | --follow-redirects | Follow redirects (max 10) |
| -cn | --cname | Show CNAME records |
| -tls | --tls-info | Show TLS certificate information |
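The `-ti`/`--title` field reports the page title. One way to pull a title out of a response body using only the standard library is sketched below; `TitleParser` and `extract_title` are hypothetical names, and HTTPZ may use a different parser.

```python
from html.parser import HTMLParser
from typing import Optional


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self) -> Optional[str]:
        # Collapse runs of whitespace, since titles often span several lines
        text = " ".join("".join(self._parts).split())
        return text or None


def extract_title(html: str) -> Optional[str]:
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

`HTMLParser` is lenient with malformed markup, which suits scanning arbitrary sites; pages without a `<title>` simply yield `None`.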

Other Options

| Option | Long Form | Description |
|---|---|---|
| -to N | --timeout N | Request timeout in seconds (default: 5) |
| -mc CODES | --match-codes CODES | Only show specific status codes (comma-separated) |
| -ec CODES | --exclude-codes CODES | Exclude specific status codes (comma-separated) |
| -p | --progress | Show progress counter |
| -ax | --axfr | Try AXFR transfer against nameservers |
| -r FILE | --resolvers FILE | File containing DNS resolvers (one per line) |
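The Usage section passes `-mc 200,301-399`, which suggests the code lists also accept inclusive ranges. Assuming that syntax, and assuming a result is shown only if it passes both the match and the exclude filter, the filtering logic can be sketched as follows. `parse_codes` and `keep_status` are hypothetical names, not functions exported by httpz_scanner.

```python
from typing import Optional


def parse_codes(spec: str) -> set[int]:
    """Parse a list such as '200,301-399' into a set of ints (ranges are inclusive)."""
    codes: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            if low > high:
                raise ValueError(f"invalid range: {part!r}")
            codes.update(range(low, high + 1))
        else:
            codes.add(int(part))
    return codes


def keep_status(status: int, match: Optional[set[int]] = None,
                exclude: Optional[set[int]] = None) -> bool:
    """Return True if a result with this status should be shown."""
    if match and status not in match:
        return False  # a match list is set and this code is not on it
    if exclude and status in exclude:
        return False  # explicitly excluded
    return True
```

With `match = parse_codes("200,301-399")` and `exclude = parse_codes("404,500")`, a 302 is kept, while 404 and 204 are dropped. Using sets keeps each membership check constant-time even for wide ranges.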
