
dt-validator

Malware was recently detected in this package.

Affected versions:

0.3.0


dt-validator

Read, validate, and print the contents of a publicly accessible (open) S3 object, with no AWS credentials required. Uses boto3 with an unsigned (anonymous) signature.

Features

  • Anonymous reads of public S3 objects (s3://, virtual-hosted, and path-style URLs).
  • A composable file-validation layer: extension, content-type, size bounds, non-empty, text-encoding, and checksum checks.
  • Cheap pre-flight validation via a HEAD request before downloading the body.
  • Typed exception hierarchy, opt-in logging, py.typed for type-checkers.
  • Library API and a CLI, both fully tested (pytest + moto).
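The three URL forms listed above can be reduced to a (bucket, key) pair before any request is made. The following is a minimal sketch using only the standard library; the function name `parse_s3_location` and the host patterns are assumptions for illustration, not the package's actual parser, and they only cover hosts under amazonaws.com.

```python
import re
from typing import Tuple
from urllib.parse import unquote, urlparse

# Hypothetical host patterns (assumption): bucket.s3[.region].amazonaws.com
# for virtual-hosted URLs, s3[.region].amazonaws.com for path-style URLs.
_VIRTUAL_HOSTED = re.compile(r"^(?P<bucket>.+)\.s3([.-][a-z0-9-]+)?\.amazonaws\.com$")
_PATH_STYLE = re.compile(r"^s3([.-][a-z0-9-]+)?\.amazonaws\.com$")


def parse_s3_location(url: str) -> Tuple[str, str]:
    """Return (bucket, key) for an s3://, virtual-hosted, or path-style URL."""
    parts = urlparse(url)
    if parts.scheme == "s3":
        # s3://bucket/key: the "host" part is the bucket name.
        bucket, key = parts.netloc, parts.path.lstrip("/")
    elif parts.scheme in ("http", "https"):
        host = parts.netloc.lower()
        path = unquote(parts.path.lstrip("/"))
        virtual = _VIRTUAL_HOSTED.match(host)
        if virtual:
            # Bucket is in the hostname, key is the whole path.
            bucket, key = virtual.group("bucket"), path
        elif _PATH_STYLE.match(host):
            # Bucket is the first path segment, key is the rest.
            bucket, _, key = path.partition("/")
        else:
            raise ValueError(f"not a recognized S3 host: {parts.netloc!r}")
    else:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not bucket or not key:
        raise ValueError(f"missing bucket or key in {url!r}")
    return bucket, key
```

Raising ValueError here mirrors the README's note that InvalidUriError is also a ValueError; keys containing "?" or "#" would need extra handling that this sketch omits.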

Install

pip install -e .            # runtime
pip install -e ".[dev]"     # + pytest, moto, coverage

CLI

# Print an object
dt-validator s3://my-open-bucket/path/to/file.txt

# Metadata only (HEAD, no download)
dt-validator s3://my-open-bucket/file.txt --head

# Read only the first 1 KB
dt-validator s3://my-open-bucket/big.log --max-bytes 1024

# Raw bytes to stdout
dt-validator s3://my-open-bucket/logo.png --binary > logo.png

# With validation: exits non-zero if any constraint is violated
dt-validator s3://my-open-bucket/data.csv \
    --ext csv --content-type text/csv \
    --max-size 1048576 --non-empty \
    --require-encoding utf-8 \
    --checksum sha256:9f86d0818...

Exit codes: 0 ok · 1 S3/network error · 2 bad usage · 3 validation failed · 4 not found · 5 access denied.
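One way a CLI could arrive at these exit codes is a small lookup from exception type to code. The sketch below is illustrative only: the exception classes are local stand-ins named after the ones in the Exceptions section, `exit_code_for` is a hypothetical helper, and mapping InvalidUriError to "bad usage" (2) is an assumption.

```python
from typing import Optional


# Local stand-ins for the package's exception names (assumption: the real
# classes live in dt_validator and may be structured differently).
class FileValidatorError(Exception):
    pass


class InvalidUriError(FileValidatorError, ValueError):
    pass


class ObjectNotFoundError(FileValidatorError):
    pass


class AccessDeniedError(FileValidatorError):
    pass


class RemoteReadError(FileValidatorError):
    pass


class ValidationError(FileValidatorError):
    pass


# Checked in order; anything unlisted falls back to 1 (S3/network error).
_EXIT_CODES = [
    (InvalidUriError, 2),      # bad usage (assumed mapping)
    (ValidationError, 3),      # validation failed
    (ObjectNotFoundError, 4),  # not found
    (AccessDeniedError, 5),    # access denied
]


def exit_code_for(exc: Optional[BaseException]) -> int:
    """Translate an outcome into the documented process exit code."""
    if exc is None:
        return 0
    for exc_type, code in _EXIT_CODES:
        if isinstance(exc, exc_type):
            return code
    return 1
```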

Library

from dt_validator import read_object, ValidationPolicy

# Simple read (str by default; encoding=None -> bytes)
text = read_object("s3://my-open-bucket/notes.txt")

# Read with a validation policy
policy = ValidationPolicy(
    allowed_extensions=[".csv"],
    allowed_content_types=["text/csv", "text/plain"],
    max_bytes=5 * 1024 * 1024,
    require_non_empty=True,
    expected_encoding="utf-8",
    checksum_algorithm="sha256",
    expected_checksum="9f86d0818...",
)
data = read_object("s3://my-open-bucket/data.csv", policy=policy)

Reading a file whose URL comes from an API

The API returns a file URL, and the package then reads that file. The endpoint's response acts as a level of indirection: you configure the file location on the endpoint instead of hard-coding it in your app.

Flow: call the API → extract the file URL from its response → read that file (s3:// or http(s)://) → return its contents.

from dt_validator import read_file_from_api, read_url, ValidationPolicy

# 1) call the API  2) read the file URL from its response  3) return that file's content
text = read_file_from_api()   # endpoint defaults to https://file-read.free.beeceptor.com

# Custom endpoint / JSON field / validation policy
text = read_file_from_api(
    "https://my-api.example.com/current-file",
    url_field="url",            # JSON field holding the file URL (default: "url")
    method="GET",              # or "POST"
    policy=ValidationPolicy(max_bytes=1_000_000, expected_encoding="utf-8"),
)

# Or read a file URL you already have (s3:// or http(s)://)
data = read_url("s3://my-open-bucket/notes.txt")

CLI:

dt-validator --via-api
dt-validator --via-api \
    --api-endpoint https://my-api.example.com/current-file \
    --api-method GET --api-url-field url

Configure your endpoint to return the file URL, e.g. response body:

{"url": "s3://my-open-bucket/notes.txt"}

(a bare URL as plain text works too).

Note: the default endpoint is a Beeceptor mock. Until you add a rule that returns a file URL, it replies with placeholder text and the package reports that no file URL was found.
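The "extract the file URL from its response" step can be sketched as follows, accepting either a JSON object with a URL field or a bare URL as plain text. `extract_file_url` and `ALLOWED_SCHEMES` are hypothetical names for illustration; the package's real extraction logic is not shown in this README and may differ.

```python
import json
from urllib.parse import urlparse

# Schemes the README says can be read (s3:// or http(s)://).
ALLOWED_SCHEMES = {"s3", "http", "https"}


def extract_file_url(body: str, url_field: str = "url") -> str:
    """Pull a file URL out of an API response body (JSON or plain text)."""
    text = body.strip()
    candidate = None
    try:
        payload = json.loads(text)
    except ValueError:
        # Not JSON: treat the whole body as a bare URL.
        candidate = text
    else:
        if isinstance(payload, dict):
            candidate = payload.get(url_field)
        elif isinstance(payload, str):
            candidate = payload
    if isinstance(candidate, str):
        candidate = candidate.strip()
        parsed = urlparse(candidate)
        if parsed.scheme in ALLOWED_SCHEMES and parsed.netloc:
            return candidate
    raise ValueError("no file URL found in API response")
```

Because the endpoint decides which file is read, a caller may also want to restrict the returned URL to an allowlist of known buckets or hosts before fetching it.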

Cheap metadata check without downloading:

from dt_validator import head_object

meta = head_object("s3://my-open-bucket/data.csv")
print(meta.size, meta.content_type, meta.etag)

Standalone validators (each raises a specific ValidationError subclass):

from dt_validator import (
    validate_extension, validate_content_type, validate_size,
    validate_not_empty, validate_encoding, validate_checksum,
)

validate_extension("data.csv", ["csv", ".tsv"])
validate_size(len(data), max_bytes=1_000_000)
validate_checksum(data, "sha256", expected_hex)
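For readers curious what such checks involve, here is a minimal standard-library sketch of an extension check and a checksum check. The names `check_extension` and `check_checksum` are hypothetical, and plain ValueError stands in for the package's ExtensionError and ChecksumError; this is not the package's implementation.

```python
import hashlib
import hmac
from pathlib import PurePosixPath
from typing import Iterable


def check_extension(name: str, allowed: Iterable[str]) -> None:
    """Accept 'csv' and '.csv' alike; comparison is case-insensitive."""
    normalized = {"." + ext.lower().lstrip(".") for ext in allowed}
    suffix = PurePosixPath(name).suffix.lower()
    if suffix not in normalized:
        raise ValueError(f"extension {suffix!r} not in {sorted(normalized)}")


def check_checksum(data: bytes, algorithm: str, expected_hex: str) -> None:
    """Hash the bytes and compare against the expected hex digest."""
    actual = hashlib.new(algorithm, data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(actual, expected_hex.strip().lower()):
        raise ValueError(f"{algorithm} mismatch: got {actual}")
```

Usage follows the same shape as the calls above, for example `check_extension("data.csv", ["csv", ".tsv"])`.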

Package layout

src/dt_validator/
  __init__.py        public API
  reader.py          parsing, anonymous read, HEAD, error mapping
  validation.py      validators + ValidationPolicy
  exceptions.py      typed exception hierarchy
  _logging.py        NullHandler + opt-in configure_logging()
  cli.py             argparse CLI
tests/
  test_reader.py               URI parsing
  test_validation.py           validators + policy
  test_reader_integration.py   reader against moto S3
  test_cli.py                  CLI against moto S3

Exceptions

All derive from FileValidatorError:

  • InvalidUriError (also a ValueError)
  • ObjectNotFoundError, AccessDeniedError, RemoteReadError
  • ValidationError, with subclasses FileSizeError, ExtensionError, ContentTypeError, EncodingError, ChecksumError

Note on "open" access

The object (or bucket) must allow anonymous s3:GetObject. This tool intentionally sends unsigned requests, so private objects return 403 AccessDenied.

Tests

pytest              # 46 tests
pytest --cov        # with coverage

Keywords

s3
