
YACRAWLER is a simple web crawler written in Python. It is designed to be easy to use and flexible, allowing users to customize the crawling behavior and output format.
YACRAWLER is fully asynchronous, so it can keep many requests in flight at once instead of waiting for each response in turn, which lets it crawl large sites efficiently. It uses the `aiohttp` library for making HTTP requests and `asyncio` for managing the asynchronous tasks.
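To illustrate how an asyncio-based crawler can work through many pages concurrently, here is a minimal, self-contained sketch of a breadth-first crawl in which a fixed pool of worker tasks shares an `asyncio.Queue`. It is not YACRAWLER's implementation: the `crawl` function and its injected `fetch` callable are hypothetical, and the `max_depth` and `max_workers` parameters only mirror the names of the arguments in the usage example. A stub `fake_fetch` over an in-memory link graph stands in for real HTTP requests.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Set, Tuple

# Hypothetical fetch signature: given a URL, return the URLs it links to.
# A real crawler would make an HTTP request here (e.g. with aiohttp).
Fetch = Callable[[str], Awaitable[List[str]]]


async def crawl(start_url: str, fetch: Fetch, max_depth: int, max_workers: int) -> Set[str]:
    """Breadth-first crawl from start_url using max_workers concurrent tasks.

    Pages up to max_depth links away from start_url are fetched; links found
    on pages at max_depth are not followed. Returns the successfully fetched URLs.
    """
    queue: "asyncio.Queue[Tuple[str, int]]" = asyncio.Queue()
    seen: Set[str] = {start_url}  # URLs already queued, to avoid duplicates
    visited: Set[str] = set()     # URLs fetched without error
    queue.put_nowait((start_url, 0))

    async def worker() -> None:
        while True:
            url, depth = await queue.get()
            try:
                links = await fetch(url)
                visited.add(url)
                if depth < max_depth:
                    for link in links:
                        # No await between the check and the add, so workers cannot race here.
                        if link not in seen:
                            seen.add(link)
                            queue.put_nowait((link, depth + 1))
            except Exception:
                pass  # skip pages that fail; a real crawler would log the error
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(max_workers)]
    await queue.join()  # returns once every queued URL has been processed
    for task in workers:
        task.cancel()  # workers are now idle in queue.get(); stop them
    await asyncio.gather(*workers, return_exceptions=True)
    return visited


# In-memory link graph standing in for real web pages (demo data only).
FAKE_WEB: Dict[str, List[str]] = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": ["e"], "e": []}


async def fake_fetch(url: str) -> List[str]:
    await asyncio.sleep(0)  # yield control, as a real network call would
    return FAKE_WEB.get(url, [])


if __name__ == "__main__":
    print(sorted(asyncio.run(crawl("a", fake_fetch, max_depth=2, max_workers=3))))  # ['a', 'b', 'c', 'd']
```

Because `seen` is checked and updated with no `await` in between, two workers never enqueue the same URL. `queue.join()` returns once every enqueued page has been handled, after which the idle workers are cancelled.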
YACRAWLER is built using the Textual library, a modern and powerful library for building rich text-based user interfaces in Python. Textual provides a simple and intuitive API for creating interactive applications with rich text and widgets.
To use YACRAWLER, create an instance of the `CrawlerTuiApp` class and pass it the necessary parameters. Here is an example:
```python
from yacrawler.core import Pipeline
from yacrawler.tui import CrawlerTuiApp
from yacrawler.utilities.aioadapter import AioRequest
from yacrawler.utilities.discoverers import SimpleRegexDiscoverer
from yacrawler.utilities.processors import parse_to_dict, write_dict_to_file

# Processors run in order: parse each crawled page into a dict, then write it to a file.
pipeline = Pipeline(
    processors=[
        parse_to_dict,
        write_dict_to_file,
    ]
)

app = CrawlerTuiApp(
    start_url="https://blog.yurin.top",
    max_depth=3,      # maximum link depth to follow from start_url
    max_workers=10,   # maximum number of concurrent crawl workers
    request_adapter=AioRequest(),
    discoverer_adapter=SimpleRegexDiscoverer(),
    pipeline=pipeline,
)
```
Then, start the crawling process by calling the `run` method:

```shell
python -m yacrawler YOUR_FILE.app
```
Pipelines are a powerful feature of YACRAWLER that allow users to customize the processing of the crawled data. Users can define their own processors and add them to the pipeline to perform tasks such as parsing the HTML content, extracting specific information, and writing the data to a file.
Note: processors in a pipeline are strictly type-checked, so you cannot add a processor whose expected input type does not match the type of the data it will receive.
YACRAWLER allows users to customize the request adapter to use their own HTTP client or library. The default request adapter is `AioRequest`, which uses the `aiohttp` library to make HTTP requests asynchronously.
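The README does not document the adapter interface, so this sketch assumes a hypothetical one: an object with an `async fetch(url) -> str` method, expressed as a `typing.Protocol` named `RequestAdapter`. `UrllibRequest` is likewise a hypothetical adapter that uses the standard library's `urllib.request` instead of `aiohttp`. Because `urlopen` blocks, each call runs in a worker thread via `asyncio.to_thread` (Python 3.9+) so it does not stall the event loop.

```python
import asyncio
import urllib.request
from typing import Protocol, runtime_checkable


@runtime_checkable
class RequestAdapter(Protocol):
    """Hypothetical adapter interface: fetch a URL and return its body as text."""

    async def fetch(self, url: str) -> str: ...


class UrllibRequest:
    """Hypothetical adapter built on urllib.request instead of aiohttp."""

    def __init__(self, timeout: float = 10.0) -> None:
        self.timeout = timeout

    def _get(self, url: str) -> str:
        # Blocking call; only ever run in a worker thread (see fetch).
        with urllib.request.urlopen(url, timeout=self.timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")

    async def fetch(self, url: str) -> str:
        # Offload the blocking request so other crawl tasks keep running.
        return await asyncio.to_thread(self._get, url)


if __name__ == "__main__":
    adapter: RequestAdapter = UrllibRequest()
    # urllib handles data: URLs locally, so this runs without network access.
    print(asyncio.run(adapter.fetch("data:text/plain;charset=utf-8,hello%20world")))
```

Any object with a matching `fetch` coroutine satisfies this protocol, including a test stub that returns canned HTML, which makes it easy to exercise crawl logic offline.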
YACRAWLER allows users to customize the discoverer adapter to use their own method for discovering new URLs to crawl. The default discoverer adapter is `SimpleRegexDiscoverer`, which uses regular expressions to discover new URLs from the HTML content of the crawled pages.
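As a rough illustration of regex-based URL discovery (not `SimpleRegexDiscoverer`'s actual pattern or interface, which the README does not show), the hypothetical `discover_links` function below extracts `href` values from the HTML, resolves them against the page URL with `urllib.parse.urljoin`, strips `#fragment` parts, and keeps only unique `http`/`https` links.

```python
import re
from typing import List
from urllib.parse import urldefrag, urljoin, urlparse

# Deliberately simple: matches href="..." or href='...' attribute values.
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def discover_links(page_url: str, html: str) -> List[str]:
    """Return absolute http(s) URLs linked from html, in first-seen order."""
    found: List[str] = []
    for raw in HREF_RE.findall(html):
        # Resolve relative links against the page URL, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, raw.strip()))
        if urlparse(absolute).scheme in ("http", "https") and absolute not in found:
            found.append(absolute)
    return found


if __name__ == "__main__":
    html = '<a href="/about">About</a> <a href="post.html">Post</a> <a href="mailto:me@example.com">Mail</a>'
    print(discover_links("https://example.com/blog/", html))
    # ['https://example.com/about', 'https://example.com/blog/post.html']
```

A regular expression like this is fast but can be fooled by unusual markup such as comments, inline scripts, or unquoted attributes; a discoverer built on an HTML parser (for example the standard library's `html.parser`) would be more robust.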
YACRAWLER is licensed under the MIT License. See the LICENSE file for more information.
YACRAWLER is built using the following libraries:

- `aiohttp`: A library for making HTTP requests asynchronously.
- `asyncio`: Python's standard-library module for managing asynchronous tasks.
- Textual: A library for building rich text-based user interfaces in Python.
- `aiofiles`: A library for handling file I/O operations asynchronously.

Contributions are welcome! If you have any ideas for improvements or features, please open an issue or submit a pull request.
Yet Another Internet Crawler
We found that yacrawler demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.