scrab

Fast and easy to use scraper for the content-centered web pages, e.g. blog posts, news, etc.

0.0.6
PyPI

Maintainers: 1

scrab - Fuzzy content scraper

Fast and easy to use content scraper for topic-centred web pages, e.g. blog posts, news and wikis.

The tool uses heuristics to extract main content and ignores surrounding noise. No processing rules. No XPath. No configuration.
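To make "heuristics instead of rules" concrete, here is a minimal sketch of one common approach: pick the container element that holds the most text and ignore typical page chrome. This is an illustration only and is not taken from scrab's source; the names `extract_main_text`, `_BlockCollector`, `SKIP` and `CONTAINERS` are hypothetical, and scrab's real heuristics may work quite differently.

```python
from html.parser import HTMLParser

# Assumption for this sketch: these tags usually hold noise, not content.
SKIP = {"script", "style", "nav", "header", "footer", "aside"}
# Assumption for this sketch: these tags are candidate content containers.
CONTAINERS = {"article", "section", "div", "main", "body"}


class _BlockCollector(HTMLParser):
    """Group text by the innermost open container element."""

    def __init__(self):
        super().__init__()
        self.stack = []       # indexes of currently open containers
        self.blocks = []      # one list of text chunks per container
        self.skip_depth = 0   # > 0 while inside a SKIP element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in CONTAINERS:
            self.blocks.append([])
            self.stack.append(len(self.blocks) - 1)

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in CONTAINERS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and not self.skip_depth and self.stack:
            self.blocks[self.stack[-1]].append(text)


def extract_main_text(html):
    """Return the text of the container with the most characters."""
    parser = _BlockCollector()
    parser.feed(html)
    parser.close()
    best = max(
        parser.blocks,
        key=lambda chunks: sum(len(c) for c in chunks),
        default=[],
    )
    return "\n".join(best)
```

The sketch needs no per-site rules or XPath expressions, which is the property the description above emphasises; a real extractor would likely also weigh link density, punctuation and nesting.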

Installing

pip install scrab

Usage

scrab https://blog.post

Store extracted content to a file:

scrab https://blog.post > content.txt
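
The same redirect can be driven from Python by calling the command-line tool as a subprocess. The sketch below assumes only what the usage above shows: that a `scrab` executable is on the PATH and writes the extracted content to standard output. The helper name `save_extracted` and its `command` parameter are hypothetical and not part of scrab.

```python
import subprocess
from pathlib import Path


def save_extracted(url, out_path, command=("scrab",)):
    """Run `command url` and write its standard output to out_path.

    `command` defaults to the scrab CLI; it is a parameter only so the
    helper can be exercised without scrab installed.
    """
    out_path = Path(out_path)
    with out_path.open("wb") as fh:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run([*command, url], stdout=fh, check=True)
    return out_path
```

Calling `save_extracted("https://blog.post", "content.txt")` would then mirror the shell redirect shown above.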

ToDo List

Development

# Lint with flake8
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics

# Check with mypy
mypy ./scrab
mypy ./tests

# Run tests
pytest

Publish to PyPI:

rm -rf dist/*
python setup.py sdist bdist_wheel
twine upload dist/*

License

This project is licensed under the MIT License.

Keywords

scrab scraper crawler extractor converter web content html text
