aswan

Data collection manager

  • 0.5.15
  • PyPI
Collect and organize data into a T1 data depot. The package is named after the Aswan Dam.

Collect and compress data from the internet for later parsing

  • quick, parallel, customizable to collect
  • compressed to store
  • quick to sync with a remote store
    • sync to continue collecting
    • sync to parse
  • immutable collection

To Set Up a Remote

Set the environment variables ASWAN_AUTH_HEX and ASWAN_AUTH_PASS as required by the zimmauth package, and set ASWAN_REMOTE to the name of the default remote.
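
Before syncing, it can help to check that the three variables named above are present. The following is a minimal sketch: `missing_remote_vars` is a hypothetical helper, not part of aswan, and it only checks that the values are set, not that zimmauth would accept them.

```python
import os

# Variable names taken from the setup instructions above.
REQUIRED_VARS = ("ASWAN_AUTH_HEX", "ASWAN_AUTH_PASS", "ASWAN_REMOTE")


def missing_remote_vars(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; aswan itself may validate
    its configuration differently.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_remote_vars()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("remote configuration looks complete")
```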

Concepts

  • objects
    • saved by collection events
  • events
    • collection
    • registration (v2: registration for parsing)
    • (v2) parsing
  • runs
    • manual run vs automated run
      • makes manually adding URLs easy but revertible
    • has unique id
    • generates events
    • linked to a specific version of the code
      • ideally commit hash + pip freeze
  • statuses
    • determined by base status + runs integrated
    • contains
      • which URLs need to be collected
      • (v2) which collected objects need to be parsed
    • SQLite file, constantly trimmed
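
The idea that a status is "determined by base status + runs integrated" can be sketched as a fold over a run's events. This is an illustration only: the class names, the `integrate` function, and the assumption that a registration event adds a URL to collect while a collection event removes it are all hypothetical, not aswan's actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    kind: str  # "registration" or "collection" (assumed event kinds)
    url: str


@dataclass
class Run:
    run_id: str
    events: list = field(default_factory=list)


@dataclass
class Status:
    pending_urls: set = field(default_factory=set)  # URLs still to collect
    integrated: list = field(default_factory=list)  # ids of integrated runs


def integrate(base, run):
    """Derive a new status from a base status plus one run's events."""
    pending = set(base.pending_urls)
    for event in run.events:
        if event.kind == "registration":
            pending.add(event.url)  # assumed: registering queues a URL
        elif event.kind == "collection":
            pending.discard(event.url)  # assumed: collecting clears it
    # The base status is left untouched; a new one is returned.
    return Status(pending, base.integrated + [run.run_id])


if __name__ == "__main__":
    run = Run("run-1", [
        Event("registration", "https://example.com/a"),
        Event("registration", "https://example.com/b"),
        Event("collection", "https://example.com/a"),
    ])
    status = integrate(Status(), run)
    print(status.pending_urls)  # {'https://example.com/b'}
    print(status.integrated)    # ['run-1']
```

Returning a new `Status` instead of mutating the base mirrors the "immutable collection" goal: a run can be dropped by simply not integrating it.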

Structure

  • objects

    • 00, 01, ...
  • runs

    • run-hash
      • context.yaml
        • commit-hash, pip-freeze, ...
      • events.zip
  • statuses

    • status-hash
      • context.yaml
        • parent-status, integrated
      • db.sqlite.zip
  • current-run

    • context.yaml
    • events
      • these to be compressed into ../runs
    • status.sqlite
  • there is a 'TEST' status

    • whatever is based on it cannot be integrated
    • a test run can be made on it...
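
To make the tree above concrete, the paths can be written out with `pathlib`. This is only a sketch of the layout as listed: `depot_layout` and its dictionary keys are hypothetical names, and how objects are distributed over the `00, 01, ...` directories is not specified here, so the sketch stops at the `objects` directory.

```python
from pathlib import Path


def depot_layout(root, run_hash, status_hash):
    """Map the entries of the structure above to concrete paths.

    Hypothetical helper; the file and directory names come from the
    listing above, the function and key names are illustrative only.
    """
    root = Path(root)
    run = root / "runs" / run_hash
    status = root / "statuses" / status_hash
    current = root / "current-run"
    return {
        "objects": root / "objects",
        "run_context": run / "context.yaml",
        "run_events": run / "events.zip",
        "status_context": status / "context.yaml",
        "status_db": status / "db.sqlite.zip",
        "current_context": current / "context.yaml",
        "current_events": current / "events",
        "current_status": current / "status.sqlite",
    }


if __name__ == "__main__":
    for name, path in depot_layout("depot", "run-hash", "status-hash").items():
        print(f"{name}: {path.as_posix()}")
```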

when starting a run:

  • check if current-run is empty
    • if not, fail with
  • find latest status
    • if it has not integrated all past runs, create a new status that has
  • start collection (+ registration)
  • whether the run stops properly or breaks, all events and objects are saved to disk
  • if it stops properly, move and compress stuff
    • based on the status that was the starter, and the current run id
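
The steps above can be sketched roughly as follows. This is a simplified illustration, not aswan's implementation: `start_run` and the `collect` callback are hypothetical, status lookup and integration are left out, and only the "check current-run, collect, then compress into runs" part is shown.

```python
import shutil
from pathlib import Path


def start_run(root, run_id, collect):
    """Run collect(events_dir) in current-run, then archive its events.

    Hypothetical sketch: finding and creating statuses is omitted.
    """
    root = Path(root)
    current = root / "current-run"
    current.mkdir(parents=True, exist_ok=True)

    # Refuse to start if a previous run left anything behind.
    if any(current.iterdir()):
        raise RuntimeError("current-run is not empty")

    events = current / "events"
    events.mkdir()

    # Collection writes events to disk as it goes, so if it raises,
    # the files stay in current-run for inspection.
    collect(events)

    # Reached only on a proper stop: compress events into runs/<run_id>.
    target = root / "runs" / run_id
    target.mkdir(parents=True)
    shutil.make_archive(str(target / "events"), "zip", root_dir=str(events))

    # Leave current-run empty for the next run.
    shutil.rmtree(current)
    current.mkdir()
    return target / "events.zip"
```

A caller would pass a function that writes event files into the directory it receives. If that function raises, `current-run` is left as it was, which is why the next call fails until it is cleaned up.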

Pre v1.0 laundry list

  • parallelize push / pull

  • parsing/connection/broken session error docs

  • transferring / ignoring cookies

  • template projects

    • oddsportal
      • updating thingy, based on latest match in season
    • footy
    • rotten
    • boxoffice
