DKPro Similarity is an open source framework for text similarity. Our goal is to provide a comprehensive repository of text similarity measures which are implemented using standardized interfaces. The framework is designed to complement DKPro Core, a collection of software components for natural language processing based on the Apache UIMA framework.
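To illustrate the kind of "text similarity measure behind a standardized interface" the description refers to, here is a minimal, hypothetical sketch in plain Java. The interface and class names below are illustrative only and are not DKPro Similarity's actual API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a similarity measure behind a common interface;
// names are illustrative, not DKPro Similarity's real classes.
interface SimilarityMeasure {
    double getSimilarity(String[] tokens1, String[] tokens2);
}

// Jaccard coefficient over word sets: |intersection| / |union|.
class WordJaccardMeasure implements SimilarityMeasure {
    @Override
    public double getSimilarity(String[] tokens1, String[] tokens2) {
        Set<String> a = new HashSet<>(Arrays.asList(tokens1));
        Set<String> b = new HashSet<>(Arrays.asList(tokens2));
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        a.retainAll(b); // a now holds the intersection
        return union.isEmpty() ? 0.0 : (double) a.size() / union.size();
    }
}

public class SimilarityDemo {
    public static void main(String[] args) {
        SimilarityMeasure measure = new WordJaccardMeasure();
        double score = measure.getSimilarity(
                "the quick brown fox".split(" "),
                "the quick red fox".split(" "));
        System.out.printf("similarity = %.2f%n", score); // prints 0.60
    }
}
```

Because every measure shares the same interface, callers can swap in n-gram, vector-space, or knowledge-based measures without changing the surrounding code, which is the design idea behind the framework's standardized interfaces.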
MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
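As a hedged illustration of MALLET's topic-modeling side, the sketch below trains a tiny LDA model over three in-memory strings. It follows MALLET's documented pipe/instance API as I understand it, so exact class and method names should be checked against the release you actually use.

```java
import java.util.regex.Pattern;

import cc.mallet.pipe.CharSequence2TokenSequence;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SerialPipes;
import cc.mallet.pipe.TokenSequence2FeatureSequence;
import cc.mallet.topics.ParallelTopicModel;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;

public class MalletTopicSketch {
    public static void main(String[] args) throws Exception {
        // Pipe raw strings into the feature sequences MALLET's topic models expect.
        Pipe pipe = new SerialPipes(new Pipe[] {
                new CharSequence2TokenSequence(Pattern.compile("\\p{L}+")),
                new TokenSequence2FeatureSequence() });

        InstanceList instances = new InstanceList(pipe);
        String[] docs = {
                "cats and dogs are common household pets",
                "the cat chased the dog around the house",
                "stock markets rose after the trade report" };
        for (int i = 0; i < docs.length; i++) {
            instances.addThruPipe(new Instance(docs[i], null, "doc" + i, null));
        }

        // Train a small LDA model: 2 topics, alphaSum = 1.0, beta = 0.01.
        ParallelTopicModel model = new ParallelTopicModel(2, 1.0, 0.01);
        model.addInstances(instances);
        model.setNumThreads(1);
        model.setNumIterations(50);
        model.estimate();

        // Show the top words assigned to each topic.
        Object[][] topWords = model.getTopWords(5);
        for (int t = 0; t < topWords.length; t++) {
            System.out.println("topic " + t + ": " + java.util.Arrays.toString(topWords[t]));
        }
    }
}
```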
KefirBB is a Java library for text processing. It was initially developed for BBCode-to-HTML (BB2HTML) translation, but its flexible configuration allows it to be used in other cases, for example parsing Markdown or Textile and filtering HTML.
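For orientation, the typical call sequence looks roughly like the sketch below, based on the project's documented usage; check the class names against the version you depend on. A factory builds a TextProcessor, which then converts BBCode-style markup to HTML.

```java
import org.kefirsf.bb.BBProcessorFactory;
import org.kefirsf.bb.TextProcessor;

public class KefirBbSketch {
    public static void main(String[] args) {
        // Build a processor from the default BBCode-to-HTML configuration.
        TextProcessor processor = BBProcessorFactory.getInstance().create();

        // Translate BBCode markup to HTML.
        String html = processor.process("[b]Hello[/b], [i]world[/i]!");
        System.out.println(html); // e.g. <b>Hello</b>, <i>world</i>!
    }
}
```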
Provides modules that add basic language support for Chinese using the Solr/Lucene smartcn analyzer. This includes (1) a bundle providing the Solr Analyzer; (2) an NLP processing engine that detects sentences and tokenizes Chinese text; and (3) a LabelTokenizer needed to match tokens of the analyzed text with the labels of entities in the matched vocabularies.
Provides modules that bring language support for Japanese using the Solr/Lucene kuromoji analyzer. This includes (1) a bundle providing the Solr Analyzer; (2) an NLP processing engine that tokenizes, detects sentences, POS-tags, extracts named entities, and lemmatizes Japanese text; and (3) a LabelTokenizer needed to match tokens of the analyzed text with the labels of entities in the matched vocabularies.
GroupDocs.Editor for Java is a powerful HTML-based document editing API. The API can be used with any external HTML editor, open source or paid. It loads a document, converts it to HTML, provides the HTML to the external editing UI, and then saves the edited HTML back to the original document format. It can also be used to generate PDF files, Microsoft Word documents (DOC, DOCX), Excel spreadsheets (XLS, XLSX), PowerPoint presentations (PPT, PPTX), and TXT documents. Manipulation using HTML: load the document, edit its content using HTML, edit styles, perform editor operations, and convert back to a supported file format. An HTML editor is a program for editing HTML, the markup of a web page. Although the HTML markup of a web page can be written with any text editor, specialized HTML editors offer convenience and added functionality. For example, many HTML editors handle not only HTML but also related technologies such as CSS, XML, and JavaScript or ECMAScript; in some cases they also manage communication with remote web servers via FTP and WebDAV, and with version control systems such as Subversion or Git. Many word processing, graphic design, and page layout programs that are not dedicated to web design, such as Microsoft Word or QuarkXPress, can also function as HTML editors.
Several components for processing dramatic texts (theatre plays) with Apache UIMA.
Common framework for tools that process large text files.
GroupDocs.Viewer is an online document viewer that lets you read documents in your browser, regardless of whether you have the software they were created in. You can view many types of word processing documents (DOC, DOCX, TXT, RTF, ODT), presentations (PPT, PPTX), spreadsheets (XLS, XLSX), portable files (PDF), and image files (JPG, BMP, GIF, TIFF). For each file you get a high-fidelity rendering, showing the document just as it would appear if you opened it in the software it was created in. Layout and formatting are retained and you see an exact copy of the original. GroupDocs.Viewer lets you really read the document: you can search text documents, copy text, and even embed the document, GroupDocs.Viewer and all, in a web page. You can print or download the file from GroupDocs.Viewer if you need to work with it offline.
Library for text processing.
Text processor: a library for processing texts written in a block-like language.
Text processing utilities.
The BioMedical Information Collection and Understanding System (BioMedICUS) is a system for large-scale text analysis and processing of biomedical and clinical reports.
Software changelogs (Plain text processing)
Utility functions for text processing.
A collection of Scala and Java classes for basic natural language processing (NLP) of the Sanskrit language, contributed by the open source SanskritNLP project and friends. Some notable facilities: * Transliterate text from one script or encoding scheme to another. * Work with Babylon dictionaries. * Use bots to write to wiki projects (Wiktionary, Wikisource, etc.). * Basic metre identification. * Some grammar simulation. Contributions and suggestions are invited at https://github.com/sanskrit-coders/sanskritnlpjava (sister projects there may also be of interest).
Jamal macro library to process text files
Information extraction is the process of identifying specified classes of entities, relations, and events in natural language text – creating structured data from unstructured input. JET, the Java Extraction Toolkit, developed at New York University over the past fifteen years, provides a rich set of tools for research and education in information extraction from English text. These include standard language processing tools such as a tokenizer, sentence segmenter, part-of-speech tagger, name tagger, regular-expression pattern matcher, and dependency parser. Also provided are relation and event extractors based on the specifications of the U.S. Government's ACE [Automatic Content Extraction] program. The program is provided under an Apache 2.0 license.
Rich text processing and markup generation.
ScanditLabelCapture coordinates the process of simultaneously capturing data contained in multiple barcodes and text.
XBIS is an encoding format for XML documents that is fully convertible to and from text, with information set equivalence between the original document text and regenerated document text. It's intended for use in transmitting XML documents between application components, and is therefore designed for processing speed. The current Java language implementation offers several times the performance of SAX2 parsers working from text documents across a wide range of document types and sizes, and across JVMs tested, while also providing a substantial reduction in document size for most types of XML documents.
An in-memory EntityLinking engine that uses Lucene's FST (Finite State Transducer) technology. This engine is based on code provided by the Solr Text Tagger (https://github.com/OpenSextant/SolrTextTagger/) but provides deep integration with Apache Stanbol (DataFileProvider, NLP processing module, and the existing EntityLinking functionality).
GroupDocs.Watermark for Java is a powerful document watermarking API for adding image and text watermarks. Furthermore, the API can search for and remove watermarks that were previously added to documents by other third-party software. The watermarks added by this API are hard to remove with third-party tools. It is straightforward and self-descriptive to integrate into custom applications. The most notable features are: - Add text and image watermarks to documents and images - Search for possible watermarks in documents and remove them - Support various document formats: PDF; MS Office: Word, Excel, PowerPoint, Visio - Support various image formats: PNG, BMP, JPEG, JPEG 2000, GIF, TIFF, WebP (including multiframe GIF and TIFF) - Process documents and images attached to stored email messages (MSG, OFT, EML, and EMLX formats are supported) - Add watermarks to images inside documents of all supported formats - Two ways of adding/removing watermarks are supported: using the generalized approach or working with the specifics of a supported format
RuSH is an efficient, reliable, and easily adaptable rule-based sentence segmentation solution. It is specifically designed to handle the telegraphic written text found in clinical notes. It leverages a nested hash table to execute simultaneous rule processing, which reduces the impact of rule-base growth on execution time and eliminates the effect of rule order on accuracy. If you wish to cite RuSH in a publication, please use: Jianlin Shi; Danielle Mowery; Kristina M. Doing-Harris; John F. Hurdle. RuSH: a Rule-based Segmentation Tool Using Hashing for Extremely Accurate Sentence Segmentation of Clinical Text. AMIA Annu Symp Proc. 2016: 1587. The full text can be found at: https://knowledge.amia.org/amia-63300-1.3360278/t005-1.3362920/f005-1.3362921/2495498-1.3363244/2495498-1.3363247?timeStamp=1479743941616 This version allows defining section scopes for sentence segmentation.
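The nested-hash-table idea can be pictured with the toy sketch below. It is not the RuSH API, just a minimal illustration of storing boundary rules character by character in nested hash maps so that a single left-to-right scan checks all rules at once, keeping scan cost largely independent of how many rules exist and making rule order irrelevant.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch (not the RuSH API): boundary rules stored character by character
// in nested hash maps, so one scan of the text checks every rule at once.
public class NestedHashRules {

    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        String rule; // non-null if some rule ends at this node
    }

    private final Node root = new Node();

    public void addRule(String pattern) {
        Node node = root;
        for (char c : pattern.toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
        }
        node.rule = pattern;
    }

    /** Offsets just after each rule match, i.e. candidate sentence boundaries. */
    public List<Integer> findBoundaries(String text) {
        List<Integer> boundaries = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int i = start; i < text.length(); i++) {
                node = node.next.get(text.charAt(i));
                if (node == null) break;
                if (node.rule != null) boundaries.add(i + 1);
            }
        }
        return boundaries;
    }

    public static void main(String[] args) {
        NestedHashRules rules = new NestedHashRules();
        rules.addRule(". ");   // period followed by a space
        rules.addRule(".\n");  // period at the end of a line
        System.out.println(rules.findBoundaries("Pt stable. Afebrile.\nPlan f/u."));
        // prints [11, 21]
    }
}
```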
In some data processing tasks we need huge maps or sets that are bigger than the available JVM heap space or too slow to load into standard Java or Scala collections. We use the TSV format (a text file with tab-separated columns) to persist such maps or sets: some columns serve as the key and the remaining columns as the value. The idea of this library is simple. We prepare these maps once (sorted by key), store them to a file, and then use the file as a memory-mapped file. Searching for a key in a sorted file has log(n) complexity. If multiple processes use the same memory-mapped file, it exists in memory only once (on Linux), and the file can be loaded lazily by the OS.
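A minimal sketch of the underlying technique (not this library's actual API) is shown below: a TSV file, pre-sorted by its first column, is memory-mapped and a key is located by binary search over line starts, so nothing is loaded onto the JVM heap and the OS pages the file in lazily.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal sketch (not this library's API): key lookup by binary search over a
// memory-mapped TSV file that has been pre-sorted by its first (key) column,
// e.g. with a byte-order sort that matches String.compareTo for ASCII keys.
public class MappedTsvLookup {

    private final MappedByteBuffer buf;

    public MappedTsvLookup(Path sortedTsv) throws IOException {
        try (FileChannel ch = FileChannel.open(sortedTsv, StandardOpenOption.READ)) {
            buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Returns the value columns for the given key, or null if the key is absent. */
    public String get(String key) {
        int lo = 0, hi = buf.limit();
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            int lineStart = startOfLine(mid);
            int lineEnd = endOfLine(lineStart);
            String line = readBytes(lineStart, lineEnd);
            int tab = line.indexOf('\t');
            String lineKey = tab < 0 ? line : line.substring(0, tab);
            int cmp = key.compareTo(lineKey);
            if (cmp == 0) return tab < 0 ? "" : line.substring(tab + 1);
            if (cmp < 0) hi = lineStart;   // keep searching in earlier lines
            else lo = lineEnd + 1;         // keep searching after this line
        }
        return null;
    }

    // Scan backwards to the newline that precedes the given position.
    private int startOfLine(int pos) {
        while (pos > 0 && buf.get(pos - 1) != '\n') pos--;
        return pos;
    }

    // Find the next newline (or the end of the file).
    private int endOfLine(int start) {
        int end = start;
        while (end < buf.limit() && buf.get(end) != '\n') end++;
        return end;
    }

    // Read a byte range from the mapping as a UTF-8 string.
    private String readBytes(int from, int to) {
        byte[] bytes = new byte[to - from];
        for (int i = 0; i < bytes.length; i++) bytes[i] = buf.get(from + i);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Usage would be along the lines of `new MappedTsvLookup(Path.of("index.tsv")).get("someKey")`; because the mapping is read-only, several processes on Linux can share the same physical pages.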
The Noble Tools Suite is a set of Natural Language Processing (NLP) tools and Application Programming Interfaces (APIs) written in Java for interfacing with ontologies, auto-coding text, and extracting information from free text. The Noble Tools suite also includes a generic ontology API for interfacing with Web Ontology Language (OWL) files, OBO, and BioPortal ontologies, and a number of support utilities and methods useful for NLP (e.g. string normalization, n-grams, and stemming).
Utilities for processing AIS messages; e.g. tracking, free-text filter expressions, archiving, and more.
Utilities including math, CharSequence-based text processing, sequences, etc.
Java implementations of text processing algorithms, such as stemmers.