Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

soupstars

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

soupstars

Declarative web parsers

  • 2.11.25
  • PyPI
  • Socket score

Maintainers
1

Soup Stars

Build Status

Version Python

Soup Stars is a framework for building web parsers with Python. It is designed to make building, deploying, and scheduling web parsers easier by simplifying what you need to get started.

Quickstart

pip install soupstars

Creating a parser

New parsers are created by typing soupstars create into a terminal, and supplying the name of a python module.

soupstars create myparser.py

Soup Stars will use a template parser to help you get started. This example creates a parser that extracts headlines from articles on the New York Times website.

from soupstars import data, follow

url = "https://www.nytimes.com"

@follow
def follow(url):
    return (url.domain == "www.nytimes.com") and (url.match("\d{4}\/\d{2}\/\d{2}"))

@parse
def h1(soup):
    return soup.h1.text

You can test that the parser functions correctly.

soupstars run myparser

Use soupstars --help to see a full list of available commands.

More documentation is available here.

Development

Start the docker services.

docker-compose up -d

Set up the containers.

docker-compose exec web flask s3 mb soupstars-archive
docker-compose exec web flask db upgrade
docker-compose exec web flask seed schedules
docker-compose exec web flask seed plans
docker-compose exec web flask seed user
docker-compose exec web flask seed parsers

Run the tests.

docker-compose run --rm client pytest -vs

Releasing

New tags that pass on CI will automatically be pushed to docker hub.

To deploy to PyPI requires manually running the following commands.

pip3 install twine
python3 setup.py sdist bdist_wheel
twine upload dist/*

Keywords

FAQs


Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc