This script will try to add OCR data, and possibly a preview image, to notes in Joplin. It will only work on notes with a tag, which must be specified on startup.
When making the switch from Evernote to Joplin, there was one thing missing for me. Evernote uses OCR on attachments, which makes it possible to do a full text search. This is something that is lacking in Joplin. The excellent rest_uploader has the ability to upload files to Joplin and add OCR data as an HTML comment in the note. For existing Joplin notes, either created manually, by the hotfolder plugin or via the Evernote import, the OCR data will not be present.
Just like notes created by rest_uploader, I would like to have a nice preview of the PDF documents I've imported from Evernote.
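As a rough illustration of the idea of storing OCR text as an HTML comment in a note, here is a minimal sketch. The function name `append_ocr_comment` is hypothetical and is not taken from this script or from rest_uploader; the exact layout either tool uses may differ.

```python
def append_ocr_comment(note_body: str, ocr_text: str) -> str:
    """Return note_body with ocr_text appended as a hidden HTML comment.

    Hypothetical helper for illustration only; not part of ocr-joplin-notes.
    """
    safe_text = ocr_text.strip()
    # An HTML comment must not contain "--", so break up double hyphens.
    while "--" in safe_text:
        safe_text = safe_text.replace("--", "- -")
    return f"{note_body.rstrip()}\n\n<!-- {safe_text} -->\n"


if __name__ == "__main__":
    print(append_ocr_comment("# Invoice\n", "Total -- 42"))
```

The comment stays invisible when the note is rendered, but the text remains part of the note body, which is what makes it available to full text search.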
WARNING This script has the potential to mess up all your Joplin notes. But you are not worried, since you make regular backups of your Joplin notes already. Right??
For a quick installation, there is a docker version available. See the examples section of this readme.
If you don't like docker, there is the manual installation option. If you already installed rest_uploader, you are probably almost set up already. All that is left is to install this script and set up the environment variables.
pip install ocr-joplin-notes
Requirements:
JOPLIN_TOKEN. You can find your token in the webclipper settings.
TESSDATA_PREFIX. On Ubuntu, mine is set to /usr/share/tesseract-ocr/4.00/tessdata
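Before starting a run, it can help to confirm that both variables are actually visible to Python. The following is a small sketch, not part of this script; the helper name `missing_settings` is made up for the example.

```python
import os

# The two environment variables listed under Requirements above.
REQUIRED_VARS = ("JOPLIN_TOKEN", "TESSDATA_PREFIX")


def missing_settings(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Both variables are set.")
```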
Since this script will update your notes, several modes have been added so that you can verify that its detection works as expected.
TAG_NOTES
Tags all notes in Joplin based on their possible source and markup type.
Apart from adding a tag to every note in Joplin, it does not update any notes.
The tags it adds will not be used by this script itself and can be removed.
It can however be a fast way to tag all your notes. You'll see the need for tags in the other modes.
The format of these tags is: ojn_<markup|html>_
Parameters:
--tag
When supplied, only notes that have this tag are processed. By default, all notes in Joplin are processed.
DRY_RUN
Reports what would happen if this script processed the notes with a selected tag. It reports back via the output on the screen and also adds tags to your notes. Every note will be tagged with one of the following tags:
Parameters:
--tag="my_tag"
Only notes having the specified tag will be scanned.
--exclude_tags="other_tag"
Notes having the specified tag will be ignored. This parameter can be set multiple times to exclude multiple tags.
--add-preview=on|off
When on, adds a preview image for every PDF found in an HTML note. Default is on. Markdown notes already have a PDF preview in the client.
--autorotation=on|off
When on, tries to fix any skewed images. Default is on.
--language=<3 letter code>
The language to use for the OCR processing. Default is eng.
Note: OCR is quite a CPU-intensive process, and it might take some time for large quantities of files to get processed.
FULL_RUN
WARNING: This mode will make changes to your notes. Remember those backups I mentioned before.
The FULL_RUN mode will do the same as the DRY_RUN mode, but this time, it will make the changes to your Joplin notes.
This is the mode you are looking for.
python3 -m ocr_joplin_notes.cli --mode=TAG_NOTES
python3 -m ocr_joplin_notes.cli --mode=DRY_RUN --tag=my_notes_test --language=nld --add-previews=off
python3 -m ocr_joplin_notes.cli --mode=FULL_RUN --tag=my_notes_for_testing --language=get
python3 -m ocr_joplin_notes.cli --mode=FULL_RUN --tag=special_notes --exclude_tags=technical --exclude_tags=art
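If you want to drive these invocations from another Python program, the argument list can be assembled as below. This is only a sketch: `build_command` is a hypothetical helper, and it uses just the flags that appear in the examples above.

```python
import sys


def build_command(mode, tag=None, exclude_tags=(), language=None):
    """Assemble an argument list for `python -m ocr_joplin_notes.cli`.

    Hypothetical helper; only flags shown in the examples above are used.
    """
    cmd = [sys.executable, "-m", "ocr_joplin_notes.cli", f"--mode={mode}"]
    if tag:
        cmd.append(f"--tag={tag}")
    # --exclude_tags may be given several times, once per excluded tag.
    for excluded in exclude_tags:
        cmd.append(f"--exclude_tags={excluded}")
    if language:
        cmd.append(f"--language={language}")
    return cmd


if __name__ == "__main__":
    print(build_command("FULL_RUN", tag="special_notes",
                        exclude_tags=["technical", "art"]))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Starting with `DRY_RUN` and moving to `FULL_RUN` only after reviewing the output keeps the first pass free of note changes.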
There is a docker image available, for those who do not want to install all the Python dependencies.
docker run --env-file ./docker-env --network="host" plamola/ocr-joplin-notes:0.3.11 python -m ocr_joplin_notes.cli --mode=TAG_NOTES
For this to work, you need to save your Joplin token in a file. In the example, the file is called docker-env.
The contents of this file will look something like:
JOPLIN_TOKEN=f11db775b76e0f80ab39a932s3f79298d080d
Note the --network="host" parameter, which allows access to localhost. This only seems to work on Linux systems.
For Windows and Mac users, the workaround is to add the JOPLIN_SERVER environment variable to the docker-env file:
JOPLIN_SERVER=http://host.docker.internal:41184
# It also works with JOPLIN_SERVER=http://gateway.docker.internal:41184 as the docs indicate
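To see how a JOPLIN_SERVER override changes the address that gets contacted, here is a small sketch that builds Joplin Data API URLs from the environment. The helper `joplin_url` is hypothetical, and the fallback to `http://localhost:41184` is an assumption for this example (41184 is the port used in the examples above); how this script resolves the server internally may differ.

```python
import os
from urllib.parse import urlencode

# Assumption for this sketch: fall back to the local clipper service.
DEFAULT_SERVER = "http://localhost:41184"


def joplin_url(path, env=None, **params):
    """Build a Joplin Data API URL from JOPLIN_SERVER and JOPLIN_TOKEN."""
    env = os.environ if env is None else env
    server = env.get("JOPLIN_SERVER", DEFAULT_SERVER).rstrip("/")
    # The Data API expects the token as a query parameter.
    query = urlencode({"token": env.get("JOPLIN_TOKEN", ""), **params})
    return f"{server}/{path.lstrip('/')}?{query}"


if __name__ == "__main__":
    docker_env = {
        "JOPLIN_SERVER": "http://host.docker.internal:41184",
        "JOPLIN_TOKEN": "abc",
    }
    print(joplin_url("/ping", docker_env))
```

Requesting the `/ping` URL from inside the container is a quick way to check that the Joplin client is reachable before starting a long OCR run.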
Optional: Max image pixels environment variable
The OCR process for PDFs will hit errors and stop if your PDFs contain large images. A tunable parameter lets you process larger files without errors, in exchange for higher memory usage. You can tune it by setting the environment variable MAX_IMAGE_PIXELS in your docker-env file. The default maximum is 178956970. This default can cause errors with PDFs as small as 3MB, so if you experience these errors, you can experiment with increasing the value, as in the example below:
MAX_IMAGE_PIXELS=400000000
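To get a feel for which values make sense, the arithmetic can be sketched as follows. The helpers are hypothetical and only compare a page's pixel count with the configured limit; they assume the limit applies to width times height of the rendered image, which is not confirmed here.

```python
import os

# Default maximum as stated above.
DEFAULT_MAX_IMAGE_PIXELS = 178956970


def max_image_pixels(env=None):
    """Read MAX_IMAGE_PIXELS from the environment, or use the default."""
    env = os.environ if env is None else env
    return int(env.get("MAX_IMAGE_PIXELS", DEFAULT_MAX_IMAGE_PIXELS))


def page_pixels(width_inches, height_inches, dpi):
    """Pixel count of a page of the given size rendered at the given DPI."""
    return round(width_inches * dpi) * round(height_inches * dpi)


def fits(width_inches, height_inches, dpi, env=None):
    """True if the page's pixel count stays within the configured limit."""
    return page_pixels(width_inches, height_inches, dpi) <= max_image_pixels(env)


if __name__ == "__main__":
    # A 10 x 20 inch scan at 1000 DPI is 200,000,000 pixels.
    print(fits(10, 20, 1000, env={}))
    print(fits(10, 20, 1000, env={"MAX_IMAGE_PIXELS": "400000000"}))
```

Note that the pixel count depends on page size and scan resolution rather than file size, which is why a small PDF can still exceed the default.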