The HDX Python Scraper Library is designed to help you develop code that
assembles data from one or more tabular sources in CSV, XLS, XLSX or JSON format. It
uses a YAML file that specifies for each source what needs to be read and allows some
transformations to be performed on the data. The output is written to JSON, Google Sheets
and/or Excel and includes Humanitarian Exchange Language (HXL) hashtags specified in
the YAML file. Custom Python scrapers that conform to a defined specification can also
be written, and the framework handles the execution of both configurable and custom
scrapers.
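
To make the configuration-driven idea concrete, here is a minimal sketch that uses only the Python standard library. It does not use this library's API: the `CONFIG` dictionary, its key names, the `assemble` function and the sample data are all hypothetical, standing in for what a YAML file might specify. It reads the configured columns from a CSV source and emits a header row followed by a row of HXL hashtags, as the HXL convention places hashtags directly beneath the headers.

```python
import csv
import io
import json

# Hypothetical configuration, standing in for what a YAML file might specify.
# The key names are illustrative assumptions, not the library's actual schema.
CONFIG = {
    "input": ["Country", "Cases"],
    "output_columns": ["Country", "Total Cases"],
    "output_hxltags": ["#country+name", "#affected+infected"],
}

# Made-up sample data used only to exercise the sketch.
SAMPLE_CSV = "Country,Cases,Notes\nKenya,120,ok\nMali,45,ok\n"


def assemble(csv_text, config):
    """Return a header row, an HXL hashtag row and the configured data columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [config["output_columns"], config["output_hxltags"]]
    for record in reader:
        # Keep only the columns named in the configuration, in that order.
        rows.append([record[name] for name in config["input"]])
    return rows


rows = assemble(SAMPLE_CSV, CONFIG)
print(json.dumps(rows, indent=2))
```

In the library itself the equivalent settings live in the YAML file and the framework performs the reading and writing, so the documentation remains the reference for the real configuration keys.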
For more information, please read the documentation.
This library is part of the Humanitarian Data Exchange (HDX) project. If you have
humanitarian-related data, please upload your datasets to HDX.