A .NET Standard web crawling library similar to WebMagic and Scrapy. It is a lightweight, efficient, and fast high-level web crawling & scraping framework for .NET.
Deprecated, as there is a new maintainer for the original HAP project. Please check the new repo at https://github.com/zzzprojects/html-agility-pack. This is a port of the HtmlAgilityPack library, created by Simon Mourrier and Jeff Klawiter, to the .NET Core platform. This NuGet package can be used with the Universal Windows Platform, ASP.NET 5 (using .NET Core), and the full .NET Framework 4.6. Original description: This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what System.Xml proposes, but for HTML documents (or streams).
The Crawler-Lib Engine is a general-purpose, workflow-enabled task processor. It has evolved from a web crawler through data mining and information retrieval. It is throughput-optimized and can perform thousands of tasks per second on standard hardware. Due to its workflow capabilities, it allows you to structure and parallelize even complex kinds of work. Please visit the project page for the complete view of the Crawler-Lib Engine. A license for the Anonymous Edition is included in the package. A license for the more powerful free Community Edition can be generated on the project page. An unrestricted license is available too.
HTTP, HTTPS, FTP, S3, Azure, Kvpbase, and filesystem crawlers for Komodo. Please either install Komodo.Daemon to integrate search within your application, or Komodo.Server to run a standalone server. Komodo is an information search, metadata, storage, and retrieval platform.
Official package. DotnetSpider is a high-performance, lightweight crawler developed in C#.
A crawler and scraping framework written in C#.
dcsoup is a .NET library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. This library is essentially a port of jsoup, a Java HTML parser library. See also: http://jsoup.org/. An API reference is available at: https://raw.githubusercontent.com/matarillo/dcsoup/master/sandcastle/Help/dcsoup.chm
Crawler-Lib Concurrency Testing allows you to write unit tests with multiple threads to test the concurrency behavior of components. It provides synchronization mechanisms to control the workflow of the threads and to record the execution steps. It can also be used for client/server tests, and it works in conjunction with any unit test framework or with handwritten tests.
Crawler resolver components for ASP.NET Core Detection.
A simple, easy-to-use, and efficient open-source .NET HTTP request framework with attitude! It can be used to build crawlers, make API requests, and more. It makes HTTP programming extremely simple: easier coding, cleaner code. For usage, see: https://github.com/stulzq/HttpCode.Core
DotnetSpider, a .NET Standard web crawling library. It is a lightweight, efficient, and fast high-level web crawling & scraping framework.
A filter that prerenders a JavaScript-rendered page using an external service and returns the HTML to the search engine crawler, for SEO.
A WebClient which is optimized for crawling.
Web scraper / crawler / spider. Supports robots protocol and user agent.
Crawler-Lib NHunspell is a spell check, hyphenation, word stemming and thesaurus library based on the OpenOffice spell check library Hunspell. NHunspell can use the vast amount of OpenOffice dictionaries. It is an alternative to NetSpell, GNU Aspell, ISpell, PSpell and Enchant. It wraps the native libraries for Hunspell and Hyphen and contains a fully managed version of MyThes. This version of the NuGet package automatically copies the native binaries to the output directory. NHunspell is licensed under GPL/LGPL/MPL. Free use in commercial applications is permitted according to the LGPL and MPL licenses; your commercial application can link against the NHunspell DLLs.
Crawler
An HttpClient extension class, and the .NET Core version of the HttpHelper class. A simple and flexible crawler base class library: the JsHttpClient class is a simple and flexible crawler base class library for .NET Core.
The Crawler-Lib Engine Test Helper simplifies the testing of tasks. It can be used to develop unit tests and integration tests for tasks.
This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what System.Xml proposes, but for HTML documents (or streams).
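A minimal sketch of the read/write DOM and XPath querying the description above refers to, assuming the HtmlAgilityPack NuGet package is referenced (the sample HTML, element names, and attribute values are illustrative):

```csharp
using System;
using HtmlAgilityPack;

class HapSketch
{
    static void Main()
    {
        // The parser tolerates "real world" malformed HTML
        // (note the unclosed <div> and missing </body>).
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body><div id='main'><a href='https://example.com'>Example</a>");

        // Query with plain XPath, much as you would with System.Xml.
        // SelectNodes returns null when nothing matches, so guard for that.
        var links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
                Console.WriteLine(link.GetAttributeValue("href", ""));
        }

        // The DOM is writable: modify a node in place and re-serialize.
        doc.GetElementbyId("main")?.SetAttributeValue("class", "content");
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```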
A .NET Standard web crawling library similar to WebMagic. It is a lightweight, modular, efficient, and fast high-level web crawling & scraping framework for .NET.
The Crawler-Lib Service Base is a foundation for the development of Windows Services, Cloud Services and Linux Daemons.
A .NET Standard port of JayBizzle's CrawlerDetect project (https://github.com/JayBizzle/Crawler-Detect).
A tool to operate on the NuGet catalog.
A crawler framework and distributed crawler extractor. Try Ruiji Scraper, a Chrome web crawler: https://chrome.google.com/webstore/detail/ruiji-scraper/klhahkhllngppofpkjdlbmnglnmnbbol?hl=zh-CN&authuser=0
Simplifies interaction with Selenium through simpler commands. Besides making unit testing easier, it can also be used for web crawling or web scraping.
HtmlMonkey is a lightweight HTML/XML parser written in C#. It allows you to parse an HTML or XML string into a hierarchy of node objects, which can then be traversed or queried using jQuery-like syntax. In addition, the node objects can be modified or even built from scratch using code. Finally, the classes can generate the HTML or XML from the data.
HtmlMonkey is a lightweight HTML/XML parser written in C#. It allows you to parse an HTML or XML string into a hierarchy of node objects, which can then be traversed or queried using jQuery-like selectors. In addition, the node objects can be modified or even built from scratch using code. Finally, the classes can generate the HTML or XML from the data.
ShapeCrawler (formerly SlideDotNet) is a .NET library for manipulating PowerPoint presentations. It provides fluent APIs to process slides without having Microsoft Office installed. This library provides a simplified object model on top of the Open XML SDK for manipulating PowerPoint documents without any COM+ or COM interop layers.