Feedsearch Crawler is a Python library for searching websites for RSS, Atom, and JSON feeds.
It is a continuation of my work on Feedsearch, which is itself a continuation of the work done by Dan Foreman-Mackey on Feedfinder2, which in turn is based on feedfinder - originally written by Mark Pilgrim and subsequently maintained by Aaron Swartz until his untimely death.
Feedsearch Crawler differs from all of the above in that it is now built as an asynchronous Web crawler for Python 3.7 and above, using asyncio and aiohttp, to allow much more rapid scanning of possible feed URLs.
An implementation using this library to provide a public Feed Search API is available at https://feedsearch.dev
Pull requests and suggestions are welcome.
The library is available on PyPI:
pip install feedsearch-crawler
The library requires Python 3.7+.
Feedsearch Crawler is called with the single function search:
>>> from feedsearch_crawler import search
>>> feeds = search('xkcd.com')
>>> feeds
[FeedInfo('https://xkcd.com/rss.xml'), FeedInfo('https://xkcd.com/atom.xml')]
>>> feeds[0].url
URL('https://xkcd.com/rss.xml')
>>> str(feeds[0].url)
'https://xkcd.com/rss.xml'
>>> feeds[0].serialize()
{'url': 'https://xkcd.com/rss.xml', 'title': 'xkcd.com', 'version': 'rss20', 'score': 24, 'hubs': [], 'description': 'xkcd.com: A webcomic of romance and math humor.', 'is_push': False, 'self_url': '', 'favicon': 'https://xkcd.com/s/919f27.ico', 'content_type': 'text/xml; charset=UTF-8', 'bozo': 0, 'site_url': 'https://xkcd.com/', 'site_name': 'xkcd: Chernobyl', 'favicon_data_uri': '', 'content_length': 2847}
If you are already running in an asyncio event loop, then you can import and await search_async instead. The search function is only a wrapper that runs search_async in a new asyncio event loop.
from feedsearch_crawler import search_async
feeds = await search_async('xkcd.com')
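As a minimal sketch of calling it from within async code (the main coroutine and the asyncio.run call are illustrative scaffolding, not part of the library):

import asyncio

from feedsearch_crawler import search_async

async def main():
    # Await the crawl directly instead of letting search() create its own event loop.
    feeds = await search_async('xkcd.com')
    for feed in feeds:
        print(str(feed.url))

asyncio.run(main())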
A search will always return a list of FeedInfo objects, each of which will always have a url property, which is a URL object that can be decoded to a string with str(url).
The returned FeedInfo objects are sorted by their score value from highest to lowest, with a higher score theoretically indicating a more relevant feed relative to the original URL provided. A FeedInfo can also be serialized to a JSON-compatible dictionary by calling its .serialize() method.
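A minimal sketch of working with the result list (json.dumps and the indent value are just standard-library usage, not part of feedsearch-crawler):

import json

from feedsearch_crawler import search

feeds = search('xkcd.com')

# Results are sorted by score, so the first entry is the best candidate.
best = feeds[0]
print(str(best.url), best.serialize()['score'])

# Dump every result as JSON via the serialize() dictionaries.
print(json.dumps([feed.serialize() for feed in feeds], indent=2))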
The crawl logs can be accessed with:
import logging
logger = logging.getLogger("feedsearch_crawler")
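To actually see the crawl output on the console, attach a handler and set a level in the usual standard-library way (the DEBUG level and format string here are only suggestions):

import logging

# Send log output to the console; DEBUG on the crawler's logger is the most verbose choice.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)
logging.getLogger("feedsearch_crawler").setLevel(logging.DEBUG)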
Feedsearch Crawler also provides a handy function to output the returned feeds as an OPML subscription list, encoded as a UTF-8 bytestring.
from feedsearch_crawler import output_opml
output_opml(feeds).decode()
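Since output_opml returns bytes, the subscription list can also be written straight to a file (the feeds.opml file name is just an example):

from feedsearch_crawler import output_opml, search

feeds = search('xkcd.com')

# output_opml() returns a UTF-8 encoded bytestring, so write the file in binary mode.
with open('feeds.opml', 'wb') as opml_file:
    opml_file.write(output_opml(feeds))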
search and search_async take the following arguments:
search(
url: Union[URL, str, List[Union[URL, str]]],
crawl_hosts: bool=True,
try_urls: Union[List[str], bool]=False,
concurrency: int=10,
total_timeout: Union[float, aiohttp.ClientTimeout]=10,
request_timeout: Union[float, aiohttp.ClientTimeout]=3,
user_agent: str="Feedsearch Bot",
max_content_length: int=1024 * 1024 * 10,
max_depth: int=10,
headers: dict={"X-Custom-Header": "Custom Header"},
favicon_data_uri: bool=True,
delay: float=0
)
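For example, a slower, politer crawl might be configured like this. This is a sketch only: the values are arbitrary, and the inline comments reflect a reading of the parameter names rather than documentation quoted above.

from feedsearch_crawler import search

feeds = search(
    'xkcd.com',
    try_urls=True,           # also try a list of common feed paths
    concurrency=5,           # fewer simultaneous requests
    total_timeout=30,        # overall crawl timeout in seconds
    request_timeout=5,       # per-request timeout in seconds
    user_agent='My Feed Bot',
    favicon_data_uri=False,  # skip encoding favicons as data URIs
    delay=0.5,               # seconds to wait between requests
)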
In addition to the url, FeedInfo objects may have the following values, as shown in the serialize() example above: title, description, version, score, hubs, is_push, self_url, favicon, content_type, bozo, site_url, site_name, favicon_data_uri, and content_length.