image:https://github.com/despeck/despeck/workflows/ubuntu/badge.svg["Build status (Ubuntu)", link="https://github.com/despeck/despeck/actions?workflow=ubuntu"] image:https://badge.fury.io/rb/despeck.svg["Gem Version", link="https://badge.fury.io/rb/despeck"]
= Despeck
Remove unwanted stamps or watermarks from scanned images
`despeck` is a Ruby gem that helps you remove unwanted stamps or watermarks from scanned images/PDFs, primarily prior to OCR.

Its image processing operations are based on libvips via the https://github.com/jcupitt/ruby-vips[ruby-vips] Ruby bindings.
It can be used to:
Assumptions on input:
GREEN SQUARE PATTERN", for all the pages that contain this mark, `despeck` will attempt to detect this pattern and remove them.

== Installation
=== General
Install gem manually:
Or add it to your Gemfile:

and then run `bundle install`.
=== OCR functions
To be able to extract text via the `despeck ocr` command, you'll need to install:
==== macOS
To install Tesseract itself (with all languages pre-installed):
Or you can install Tesseract with some languages manually:
To install ImageMagick:
The full list of trained language data files can be found here (note that they differ between Tesseract versions):
https://github.com/tesseract-ocr/tesseract/wiki/Data-Files#data-files-for-version-304305
==== Ubuntu/Debian
==== FAQ
I'm getting the following error:
'convert': No such file or directory @ rb_sysopen - /var/folders/2t/xmdrn2sd2lv2w49dv0zw9_q00000gp/T/1521805124.661379908.txt (RTesseract::ConversionError)
This error means you don't have the appropriate Tesseract language installed (or Tesseract is unable to find that language). See language installation instructions above.
== Usage (Command Line)
Getting actual help:
=== All-in-one (aka Despeck)
If you need to remove a watermark and extract text via OCR, you may want to use:
This is the same as running the following two commands:
=== Remove watermark
To remove a watermark:
With the command above, Despeck will try to find the watermark colour and apply the best filter settings to remove the watermark. Its guess may be wrong, so you can pass several parameters to help Despeck with that:
A list of available options:

* `--color 00FF00` - tells Despeck that the watermark is approximately green.
* `--sensitivity 120` - increases sensitivity (use this if the watermark is still visible with the default of 100).
* `--black-const -60` - by default, Despeck tries to improve text quality by increasing black by -110. This may be too much for you, so you can reduce that number.
* `--add-contrast` - disabled by default; increases the output image's contrast.
* `--accurate` - disabled by default. Applies filters to the area with the watermark only, leaving the rest of the image untouched.
* `--debug` - shows debug information during command execution.

==== "Accurate" option
By default, `despeck` applies colour filters to the entire image and tries to improve the quality of the image by increasing contrast and cleaning the image.

It may decrease the original image quality in some cases, so there is the `--accurate` option, which forces `despeck` to apply its filters only to the area where the watermark was found, leaving the rest of the image intact.
For example:
===== Original image
image::readme_images/watermarked.jpg[Original image]
===== Despecked with default options
image::readme_images/defaults.jpg[Despecked with defaults]
===== Despecked with --accurate option
image::readme_images/accurate.jpg[Despecked with --accurate option]
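
The difference between the default mode and `--accurate` can be illustrated with a toy model. The sketch below is not despeck's implementation and uses none of its API: it treats an image as a plain 2D array of `[r, g, b]` pixels, and every helper name (`parse_hex`, `near?`, `watermark_box`, `filter_pixel`, `despeck_toy`), the `tolerance` parameter, and the stand-in "filter" are assumptions made up for illustration.

```ruby
# Toy model only: an "image" is a 2D array of [r, g, b] pixels.
# None of these helpers exist in despeck; they are hypothetical.

# Parse a hex colour such as "00FF00" into [r, g, b].
def parse_hex(hex)
  hex.scan(/../).map { |pair| pair.to_i(16) }
end

# True when every channel of the pixel is within `tolerance` of the target.
def near?(pixel, target, tolerance)
  pixel.zip(target).all? { |a, b| (a - b).abs <= tolerance }
end

# Bounding box [top, left, bottom, right] of pixels close to the target
# colour, or nil when no pixel matches.
def watermark_box(image, target, tolerance)
  hits = []
  image.each_with_index do |row, y|
    row.each_with_index { |px, x| hits << [y, x] if near?(px, target, tolerance) }
  end
  return nil if hits.empty?

  ys = hits.map(&:first)
  xs = hits.map(&:last)
  [ys.min, xs.min, ys.max, xs.max]
end

# Stand-in filter: watermark-coloured pixels become white,
# all other pixels are darkened slightly.
def filter_pixel(pixel, target, tolerance)
  return [255, 255, 255] if near?(pixel, target, tolerance)

  pixel.map { |c| [c - 20, 0].max }
end

# accurate: false filters every pixel; accurate: true filters only
# pixels inside the watermark's bounding box.
def despeck_toy(image, hex, tolerance: 40, accurate: false)
  target = parse_hex(hex)
  box = watermark_box(image, target, tolerance)
  image.each_with_index.map do |row, y|
    row.each_with_index.map do |px, x|
      inside = box && y.between?(box[0], box[2]) && x.between?(box[1], box[3])
      accurate && !inside ? px : filter_pixel(px, target, tolerance)
    end
  end
end

image = [
  [[200, 200, 200], [10, 250, 10]],
  [[200, 200, 200], [200, 200, 200]]
]

p despeck_toy(image, "00FF00")                 # grey pixels are darkened too
p despeck_toy(image, "00FF00", accurate: true) # grey pixels are left as they were
```

In this toy model the default call changes every pixel, while the `accurate: true` call only touches the single near-green pixel, which mirrors the trade-off described above: global filtering cleans the whole page but can alter content that had no watermark on it.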
== Usage
(still under development)
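[source,ruby]
----
require "despeck"

# Configure the remover, load the scan via ruby-vips, then strip the watermark.
wr = Despeck::WatermarkRemover.new(black_const: -90, resize: 0.01)
image = Vips::Image.new_from_file("/path/to/image.jpg")
output_image = wr.remove_watermark(image)
----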