rescribe.xyz/bookpipeline package
This package contains various tools and functions for the OCR of
books, with a focus on distributed OCR using short-lived virtual
servers.
This is a Go package, and can be installed in the standard Go way,
by running go get rescribe.xyz/bookpipeline/... and the documentation
can be read with the go doc command or online at
https://pkg.go.dev/rescribe.xyz/bookpipeline.
If you just want to install and use the commands, you can get the
package with git clone https://git.rescribe.xyz/bookpipeline, and
then install them with go install ./... from within the bookpipeline
directory.
Commands
The commands in the cmd/ directory are at the heart of this
package. For more details on their usage, use go doc or read
doc.go in the package repository.
The key commands for the virtual server side are:
- bookpipeline : processes items from queues, doing preprocessing,
OCR and postprocessing, and moving items on to
the next queue step on completion. This is the
core command of the package.
- booktopipeline : uploads a book to the pipeline and adds it to the
appropriate queue.
- getpipelinebook : downloads the pipeline results for a book.
- lspipeline : prints useful information about the status of the
pipeline.
- mkpipeline : sets up storage buckets and queues for use by the
pipeline.
- spotme : starts up a short-lived virtual server running
bookpipeline.
There are also some commands which are more useful in a standalone
setting:
- confgraph : creates a graph showing average word confidence of
each page of hOCR in a directory.
- pagegraph : creates a graph showing average confidence of each
word in a page of hOCR.
- pdfbook : creates a searchable PDF from a directory of hOCR
and image files.
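To give a feel for what "average word confidence" means for a page of
hOCR, here is a small sketch that averages the x_wconf values found in
word title attributes. It is an illustration only and is not how
confgraph or pagegraph are implemented: the averageWordConfidence
function is hypothetical, and it uses a regular expression as a
shortcut where a real tool would more likely parse the hOCR properly.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// wconfRe matches the x_wconf property that hOCR word elements carry
// in their title attribute, e.g. title='bbox 10 10 50 30; x_wconf 93'.
var wconfRe = regexp.MustCompile(`x_wconf\s+([0-9]+(?:\.[0-9]+)?)`)

// averageWordConfidence (a hypothetical helper) returns the mean
// x_wconf value found in hocr and the number of words counted. ok is
// false if no confidence values were found.
func averageWordConfidence(hocr string) (mean float64, n int, ok bool) {
	var sum float64
	for _, m := range wconfRe.FindAllStringSubmatch(hocr, -1) {
		v, err := strconv.ParseFloat(m[1], 64)
		if err != nil {
			continue // skip anything that is not a valid number
		}
		sum += v
		n++
	}
	if n == 0 {
		return 0, 0, false
	}
	return sum / float64(n), n, true
}

func main() {
	page := `<span class='ocrx_word' title='bbox 10 10 50 30; x_wconf 90'>Hello</span>
<span class='ocrx_word' title='bbox 60 10 99 30; x_wconf 80'>world</span>`
	if mean, n, ok := averageWordConfidence(page); ok {
		// Prints "2 words, average confidence 85.0".
		fmt.Printf("%d words, average confidence %.1f\n", n, mean)
	}
}
```

Running the same function over every hOCR file in a directory would
give one value per page, which is the kind of data a per-page
confidence graph plots.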
Rescribe tool for local operation
While bookpipeline was built with cloud-based operation in mind, there is also
a local mode that can be used to run OCR jobs from a single computer, with all
the benefits of preprocessing, choosing the best threshold for each image,
graph creation, PDF creation, and so on that the pipeline provides.
Several of the commands accept a -c local flag for local operation, but now
there is also a new command, named rescribe, that is designed to make things
much simpler for people just wanting to do some OCR on their local computer.
More information about this, including links to prebuilt executables, can be
found on our blog at https://blog.rescribe.xyz/posts/desktop-tool/.
Contributions
Any and all comments, bug reports, patches or pull requests would
be very welcome. Please email them to nick@rescribe.xyz.
License
This package is licensed under the GPLv3. See the LICENSE file for
more details.