Python library to work with ARC and WARC files, with fixes for ClueWeb09
Note: This is a fork of the original (now dead) warc repository.
Updated to handle problems with the ClueWeb09_ files.
.. _ClueWeb09: https://lemurproject.org/clueweb09/
Changes are based on this repository_, which only supports Python 2.
.. _repository: https://github.com/cdegroc/warc-clueweb/blob/clueweb09/warc/warc.py
WARC (Web ARChive) is a file format for storing web crawls.
This warc library makes it easy to work with WARC files::

    import warc

    with warc.open("test.warc") as f:
        for record in f:
            print(record['WARC-Target-URI'], record['Content-Length'])

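
For context on what the library parses, the following is a minimal standard-library sketch of the WARC record layout: a ``WARC/<version>`` line, ``Name: value`` header lines, an empty line, then exactly ``Content-Length`` bytes of content followed by a blank-line separator. It is not warc's implementation. ``iter_warc_records`` and the sample record are hypothetical, and the sketch assumes an uncompressed stream, ignores header continuation lines and does not fold header-name case::

    import io
    from typing import BinaryIO, Dict, Iterator, Tuple


    def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
        """Yield (headers, content) for each record in an uncompressed WARC stream.

        Hypothetical helper for illustration; it is not part of the warc library
        and skips gzip handling, header continuation lines and case folding.
        """
        while True:
            line = stream.readline()
            if not line:
                return  # end of stream
            if not line.strip():
                continue  # blank lines separate records
            if not line.startswith(b"WARC/"):
                raise ValueError(f"expected a WARC version line, got {line!r}")
            headers: Dict[str, str] = {}
            # "Name: value" header lines run until the first empty line.
            while True:
                raw = stream.readline().rstrip(b"\r\n")
                if not raw:
                    break
                name, _, value = raw.partition(b":")
                headers[name.strip().decode("utf-8")] = value.strip().decode("utf-8")
            # Content-Length gives the exact number of content bytes that follow.
            content = stream.read(int(headers["Content-Length"]))
            yield headers, content


    sample = (
        b"WARC/1.0\r\n"
        b"WARC-Type: resource\r\n"
        b"WARC-Target-URI: http://example.com/\r\n"
        b"Content-Length: 5\r\n"
        b"\r\n"
        b"hello"
        b"\r\n\r\n"
    )

    for headers, content in iter_warc_records(io.BytesIO(sample)):
        print(headers["WARC-Target-URI"], headers["Content-Length"], content)

Because ``Content-Length`` states the payload size up front, a reader can step over a record's content without inspecting it, which is also what makes jumping to a known record offset practical.
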

WET files can be read the same way::

    import warc

    with warc.open("test.warc.wet") as f:
        for record in f:
            print(record['WARC-Target-URI'], record['Content-Length'])

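
A WET file is itself a WARC file. In Common Crawl's WET files the first record is a ``warcinfo`` record, and each page's extracted plain text is stored in a record with ``WARC-Type: conversion``. The standard-library sketch below keeps only those conversion records. ``iter_wet_text`` and the sample data are hypothetical, and the sketch assumes an uncompressed stream (a ``.warc.wet.gz`` file would first need to be opened with ``gzip.open``)::

    import io
    from typing import BinaryIO, Dict, Iterator, Tuple


    def iter_wet_text(stream: BinaryIO) -> Iterator[Tuple[str, str]]:
        """Yield (target URI, plain text) for each 'conversion' record.

        Hypothetical helper, not part of the warc library; expects an
        uncompressed WET stream and compares header names case-insensitively.
        """
        while True:
            line = stream.readline()
            if not line:
                return  # end of stream
            if not line.strip():
                continue  # blank lines separate records
            headers: Dict[str, str] = {}
            # Header lines ("Name: value") end at the first empty line.
            while True:
                raw = stream.readline().rstrip(b"\r\n")
                if not raw:
                    break
                name, _, value = raw.partition(b":")
                headers[name.strip().decode("utf-8").lower()] = value.strip().decode("utf-8")
            content = stream.read(int(headers["content-length"]))
            # warcinfo and other record types carry no extracted page text.
            if headers.get("warc-type") == "conversion":
                yield headers["warc-target-uri"], content.decode("utf-8")


    sample = (
        b"WARC/1.0\r\nWARC-Type: warcinfo\r\nContent-Length: 4\r\n\r\ninfo\r\n\r\n"
        b"WARC/1.0\r\nWARC-Type: conversion\r\n"
        b"WARC-Target-URI: http://example.com/\r\n"
        b"Content-Length: 11\r\n\r\nHello world\r\n\r\n"
    )

    for uri, text in iter_wet_text(io.BytesIO(sample)):
        print(uri, text)

Filtering on ``WARC-Type`` rather than on the presence of ``WARC-Target-URI`` keeps the intent explicit: only conversion records carry extracted text.
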
The documentation of the warc library is available at http://warc.readthedocs.org/.
Apart from the pip installation instructions, which do not apply to this warc3 version, the interface described there is unchanged.
This software is licensed under GPL v2. See the LICENSE_ file for details.

.. _LICENSE: http://github.com/internetarchive/warc/blob/master/LICENSE
Original Python2 Versions:
Python3 Port:
Modifications:

- 0.2.5: Replace UTF-8 decoding errors in headers
- 0.2.4: Support ClueWeb09
- 0.2.3: Support seeking in WARC/WET files
- 0.2.2: Allow parsing WET files
- Older versions: see https://github.com/internetarchive/warc
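
The 0.2.5 entry does not say how the replacement is done. One plausible reading, sketched here as an assumption rather than the package's actual code, is decoding header bytes with Python's ``errors="replace"`` handler, so that invalid UTF-8 becomes U+FFFD instead of raising ``UnicodeDecodeError``. ``parse_header_line`` is a hypothetical helper::

    from typing import Tuple


    def parse_header_line(raw: bytes) -> Tuple[str, str]:
        """Split a 'Name: value' header line, replacing undecodable bytes.

        Hypothetical helper: invalid UTF-8 becomes U+FFFD instead of raising
        UnicodeDecodeError.
        """
        text = raw.rstrip(b"\r\n").decode("utf-8", errors="replace")
        name, _, value = text.partition(":")
        return name.strip(), value.strip()


    # 0xE9 is 'é' in Latin-1, but on its own it is not valid UTF-8.
    line = b"WARC-Target-URI: http://example.com/caf\xe9\r\n"

    try:
        line.decode("utf-8")  # strict decoding (the default) raises
    except UnicodeDecodeError as exc:
        print("strict decode failed:", exc.reason)

    print(parse_header_line(line))

The trade-off is that a damaged header value is kept, slightly altered, instead of aborting iteration over the rest of a large crawl file.
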