elasticsearch-synonym-toolkit

Toolkit for Elasticsearch Synonym files.

  • 0.2.3
  • PyPI
Elasticsearch Synonyms


This repository contains a curated dataset of synonyms in Solr format. These synonyms can be used to configure the Elasticsearch synonym token filter.

Additional helper tools in this repository:

  • synlint: Command-line tool to lint and validate synonym files.
  • synonyms.sublime-syntax: Syntax highlighting file for Sublime Text 3.

If you're using Elasticsearch with Django, you might find dj-elasticsearch-flex useful.

Why?

While trying to configure synonyms in Elasticsearch, I found the documentation surprisingly scattered. The docs that do exist don't do the topic much justice either, and they miss many corner cases.

For instance, an incorrect Solr mapping such as "hello, world," would be happily accepted into the index configuration. However, as soon as you tried to re-open the index, you would get a malform_input_exception (discussion thread).

This repository solves such problems by providing a linter tool that validates synonym files before they reach your index.
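To illustrate the kind of check such a linter performs, here is a minimal, hypothetical sketch. It is not synlint's implementation: the function name lint_solr_synonyms is made up, it only flags empty terms (stray or trailing commas) and repeated => separators, and it ignores Solr's backslash escaping. It also assumes that what makes the "hello, world," example invalid is its trailing empty term.

```python
from typing import List, Tuple

EMPTY_TERM = "empty term (stray or trailing comma)"
MULTI_ARROW = "more than one '=>' in mapping"


def lint_solr_synonyms(lines: List[str]) -> List[Tuple[int, str]]:
    """Hypothetical sketch (not synlint): return (line_number, message)
    pairs for suspicious lines in a Solr-format synonym file."""
    problems: List[Tuple[int, str]] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Blank lines and '#' comments carry no synonym rules.
        if not line or line.startswith("#"):
            continue
        # An explicit mapping has at most one '=>' between its two sides.
        sides = line.split("=>")
        if len(sides) > 2:
            problems.append((lineno, MULTI_ARROW))
            continue
        # Every comma-separated term on each side must be non-empty.
        for side in sides:
            terms = [term.strip() for term in side.split(",")]
            if any(not term for term in terms):
                problems.append((lineno, EMPTY_TERM))
                break
    return problems
```

For example, lint_solr_synonyms(["colour, color", "hello, world,"]) reports only line 2.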

Datasets

The synonym files in data/ can be used directly in an Elasticsearch configuration.
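One way to use a file directly is the synonym filter's synonyms_path setting. Elasticsearch resolves this path relative to its config directory (or takes it as absolute), and the file must be present on every node. The sketch below builds such a filter definition as a Python dict. The filter name be_ae_synonyms and the path analysis/be-ae.synonyms are assumptions, standing for a copy of data/be-ae.synonyms placed under the config directory.

```python
from typing import Dict


def synonym_file_filter(path: str, name: str = "be_ae_synonyms") -> Dict:
    """Analysis settings fragment defining a file-based synonym filter.

    The default name is hypothetical; pick any filter name you like.
    """
    return {
        "analysis": {
            "filter": {
                name: {
                    "type": "synonym",
                    "format": "solr",        # the files in data/ use Solr format
                    "synonyms_path": path,   # relative to the ES config dir, or absolute
                }
            }
        }
    }
```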

The following datasets are currently available:

  • be-ae: British English and American English spellings, from AVKO.org.

Installation

If you want to use the synlint tool, install the package from PyPI with pip:

pip install elasticsearch-synonym-toolkit

The Python package is installed as es_synonyms. Installing it also provides a linter command, es-synlint, which you can run with:

es-synlint [synonymfile]

Usage

In most cases, you'd want to use this module as a helper for loading validated synonyms from a file or a URL:

from es_synonyms import load_synonyms

# Load synonym file at some URL:
be_ae_syns = load_synonyms('https://to.noop.pw/2sI9x4s')
# Or, from filesystem:
other_syns = load_synonyms('data/be-ae.synonyms')
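The README does not spell out what load_synonyms returns, beyond passing it as synonyms= to a synonym filter, which expects a list of rule strings. Assuming that shape, a simplified, hypothetical stand-in could look like the following. The name load_synonym_lines is made up, and unlike the real helper it performs no validation.

```python
from typing import List
from urllib.request import urlopen


def load_synonym_lines(source: str) -> List[str]:
    """Hypothetical stand-in for load_synonyms: read a local path or an
    http(s) URL and return the non-empty, non-comment lines (no linting)."""
    if source.startswith(("http://", "https://")):
        with urlopen(source) as resp:
            text = resp.read().decode("utf-8")
    else:
        with open(source, encoding="utf-8") as fh:
            text = fh.read()
    stripped = (line.strip() for line in text.splitlines())
    # Drop blank lines and '#' comments; keep each rule as one string.
    return [line for line in stripped if line and not line.startswith("#")]
```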

Configuring a synonym token filter with elasticsearch-dsl-py is easy, too:

from elasticsearch_dsl import analyzer, token_filter
from es_synonyms import load_synonyms

be_ae_syns = load_synonyms('https://to.noop.pw/2sI9x4s')

# Create a tokenfilter
brit_spelling_tokenfilter = token_filter(
  'my_tokenfilter',     # Any name for the filter
  'synonym',            # Synonym filter type
  synonyms=be_ae_syns   # Synonyms mapping will be inlined
)
# Create analyzer
brit_english_analyzer = analyzer(
  'my_analyzer',
  tokenizer='standard',
  filter=[
    'lowercase',
    brit_spelling_tokenfilter
  ])
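If you use the low-level client instead of elasticsearch-dsl, the objects above correspond roughly to the following index analysis settings. This is a hand-written sketch: the exact body elasticsearch-dsl serializes may differ in detail, and inline_synonym_settings is a hypothetical helper name.

```python
from typing import Dict, List


def inline_synonym_settings(synonyms: List[str]) -> Dict:
    """Approximate analysis settings for the DSL example above, with the
    synonym rules inlined into the filter definition."""
    return {
        "analysis": {
            "filter": {
                "my_tokenfilter": {"type": "synonym", "synonyms": synonyms},
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Order matters: tokens are lowercased before synonym matching.
                    "filter": ["lowercase", "my_tokenfilter"],
                },
            },
        }
    }
```

Placing lowercase before the synonym filter means incoming tokens are lowercased before they are matched against the synonym rules.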

To use the underlying linter directly, import the SynLint class.

Development

  • Clone this repository.
  • Install the package dependencies with pip: pip install -r requirements.txt.
  • To run tests:
./panda test:all

License

The tools and code are licensed under MIT. The datasets are used under fair use and are derivatives of the original sources.
