Web crawling utility for downloading files from web pages.
This assumes you have Python 3.10+ installed and pip3 is on your path:
~$ pip3 install the-crawler
...
~$ the-crawler -h
usage: the-crawler [-h] [--recurse] [--output-directory OUTPUT_DIRECTORY] [--extensions EXTENSIONS [EXTENSIONS ...]] [--max-workers MAX_WORKERS] base_url

Crawls given url for content

positional arguments:
  base_url

options:
  -h, --help            show this help message and exit
  --recurse, -r
  --output-directory OUTPUT_DIRECTORY, -o OUTPUT_DIRECTORY
  --extensions EXTENSIONS [EXTENSIONS ...], -e EXTENSIONS [EXTENSIONS ...]
  --max-workers MAX_WORKERS
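The help text above looks like standard argparse output. As a rough illustration of how that interface behaves, here is a minimal sketch of an equivalent parser. This is a reconstruction from the usage line, not the project's actual source: the `build_parser` name, the `store_true` action for `--recurse`, and the `int` type for `--max-workers` are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the usage line shown above."""
    parser = argparse.ArgumentParser(
        prog="the-crawler",
        description="Crawls given url for content",
    )
    parser.add_argument("base_url")
    # Assumed to be a simple on/off flag, since the help shows no value for it.
    parser.add_argument("--recurse", "-r", action="store_true")
    parser.add_argument("--output-directory", "-o")
    # "EXTENSIONS [EXTENSIONS ...]" is how argparse renders nargs="+".
    parser.add_argument("--extensions", "-e", nargs="+")
    # Assumed to be an integer worker count.
    parser.add_argument("--max-workers", type=int)
    return parser


# The URL goes first so that --extensions does not swallow it as a value.
args = build_parser().parse_args(
    ["https://example.com/files/", "-r", "-o", "downloads", "-e", "pdf", "zip"]
)
print(args.base_url, args.recurse, args.output_directory, args.extensions)
```

If the real parser is built this way, an option that accepts several values will consume every following argument up to the next flag, so it is safest to put `base_url` before `-e` on the command line.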
This assumes you have git, Python 3.10+, and poetry installed already.
~$ git clone git@gitlab.com:woodforsheep/the-crawler.git
...
~$ cd the-crawler
the-crawler$ poetry install
...
the-crawler$ poetry run the-crawler -h
usage: the-crawler [-h] [--quiet] [--verbose] [--collect-only] [--force-collection] [--recurse]
                   [--output-directory OUTPUT_DIRECTORY] [--extensions [EXTENSIONS]]
                   [--max-workers MAX_WORKERS]
                   base_url

Crawls given url for content

positional arguments:
  base_url

options:
  -h, --help            show this help message and exit
  --quiet               Changes the console log level from INFO to WARNING; defers to --verbose
  --verbose             Changes the console log level from INFO to DEBUG; takes precedence over
                        --quiet
  --collect-only        Stops after collecting links to be downloaded; useful for checking the
                        cache before continuing
  --force-collection    Forces recollection of links, even if the cache file is present
  --recurse, -r         If specified, will follow links to child pages and search them for
                        content
  --output-directory OUTPUT_DIRECTORY, -o OUTPUT_DIRECTORY
                        The location to store the downloaded content; must already exist
  --extensions [EXTENSIONS], -e [EXTENSIONS]
                        If specified, will restrict the types of files downloaded to those
                        matching the extensions provided; case-insensitive
  --max-workers MAX_WORKERS
                        The maximum number of parallel downloads to support; defaults to
                        os.cpu_count()
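Three of the options above describe behavior that is easy to picture in code: the case-insensitive extension filter, the requirement that the output directory already exists, and the worker count that falls back to os.cpu_count(). The sketch below illustrates those rules only. It is not the-crawler's implementation; every function name is hypothetical, and the thread pool is an assumed way to run the parallel work.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePosixPath
from urllib.parse import urlparse


def matches_extension(url, extensions):
    """Return True if the URL path ends in one of the extensions, ignoring case."""
    if not extensions:
        return True  # no filter supplied: accept every link
    suffix = PurePosixPath(urlparse(url).path).suffix.lower().lstrip(".")
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    return suffix in wanted


def require_existing_directory(path):
    """Reject an output directory that has not been created beforehand."""
    if not os.path.isdir(path):
        raise NotADirectoryError(path)
    return path


def resolve_max_workers(max_workers=None):
    """Fall back to os.cpu_count() (or 1 if it is unknown) when no value is given."""
    return max_workers if max_workers is not None else (os.cpu_count() or 1)


def process_all(urls, worker, max_workers=None):
    """Apply worker to each URL in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=resolve_max_workers(max_workers)) as pool:
        return list(pool.map(worker, urls))


links = [
    "https://example.com/a.PDF",
    "https://example.com/b.txt",
    "https://example.com/c.pdf?x=1",
]
selected = [url for url in links if matches_extension(url, ["pdf"])]
# A stand-in worker that only extracts the file name, so nothing touches the network.
names = process_all(selected, lambda u: PurePosixPath(urlparse(u).path).name)
print(names)
```

The filter compares only the path component of the URL, so a query string such as `?x=1` does not hide the extension.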