locr

Tools for fetching OCRed text of Library of Congress items.

  • 0.4.4
  • PyPI

README

locr fetches the full text of Library of Congress (LOC) items from their OCR files. It returns the text when found, and None otherwise.

Usage

Fetcher accepts either a result item from a JSON API response or the URL of an item:

import requests
from locr import Fetcher

# From item or resource URL
Fetcher.full_text_from_url('https://www.loc.gov/resource/mss85943.001811/')

# From search result
# See https://libraryofcongress.github.io/data-exploration/requests.html
url = 'https://www.loc.gov/search/?fo=json&fa=subject:cats'
response = requests.get(url)
# Parse the JSON body; a Response object cannot be indexed directly
results = response.json()['results']
Fetcher(results[0]).full_text()

Note that the above example is not guaranteed to return text. In particular, not all objects have online text available.

Fetcher may raise the following exceptions:

  • ObjectNotOnline: when the object does not have any online formats.
  • AmbiguousText: when multiple fulltext options are found.
  • UnknownFormat: when locr is not sure how to handle the fulltext link's filetype.

If you encounter these exceptions, kindly file an issue or open a PR about the newly discovered edge case. Thanks.
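
When looping over many search results, it may help to skip items that raise one of the exceptions above rather than abort the whole run. The following is a minimal sketch, not part of locr: collect_full_texts and KNOWN_ERRORS are hypothetical names. Because this README does not say which module exports the exception classes, the sketch matches them by class name, and it assumes each result item is a dict with an 'id' key.

```python
# Names of the exceptions listed above. Matching on class name is an
# assumption of this sketch: the README does not say where they are exported.
KNOWN_ERRORS = {'ObjectNotOnline', 'AmbiguousText', 'UnknownFormat'}


def collect_full_texts(results, fetch):
    """Return (texts, skipped) for an iterable of result items.

    fetch is any callable that takes one item and returns text or None.
    """
    texts = {}
    skipped = []
    for item in results:
        item_id = item.get('id')  # assumes each result dict has an 'id' key
        try:
            text = fetch(item)
        except Exception as exc:
            name = type(exc).__name__
            if name not in KNOWN_ERRORS:
                raise  # unrelated failure: let it surface
            skipped.append((item_id, name))
            continue
        if text is None:
            skipped.append((item_id, 'no text'))
        else:
            texts[item_id] = text
    return texts, skipped
```

With locr installed, the fetch argument could be lambda item: Fetcher(item).full_text(), and the skipped list gives you the item ids to mention when filing an issue.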

Why LOCR?

The Library of Congress has put OCRed full text online for many of its items. However:

  • the API does not in general return the URLs to these items
  • OCRed text exists on different servers, with different URL formats; there is not one single way to construct the relevant URL for an item

While full text is easy to retrieve via the web site for a single item, perhaps you, like me, would like to fetch it programmatically.

Development

This package has a humiliating lack of tests, and I have done nothing to verify appropriate versions for dependencies. It really can use your help. PRs welcome.
