search-string-overvaagning

SearchString is a custom implementation for searching strings for km24.dk.

Source: pip (PyPI)
Version: 0.5.3
Maintainers: 1

Search String

Installation

You can install search-string-overvaagning from PyPI:

$ pip install search-string-overvaagning

The package supports Python 3.9+. However, it is only compiled with mypyc from Python 3.10 onward, which makes it about twice as fast. Running it on Python 3.10+ is therefore strongly advised.

About

This package implements the search string object that is used across km24.dk for different types of surveillance.

It is used for searching a text. For a text to count as a match, it must match first_str and, if second_str is not empty, it must also match second_str. If not_str is not empty, the text must not match not_str. The three conditions are combined with a logical AND. Each of the three strings can be a collection of terms separated by semicolons, in which case the terms are combined with a logical OR. You can use '~' to mark a word boundary. Finally, you can append !global to a string to signal that that part should be checked globally, that is, against the whole text rather than sentence by sentence.

Quick examples:

>>> from search_string import SearchString
>>> ss = SearchString('example;hello', 'text', 'elephant', data=None)
>>> ss.match('This is an example text')
True
>>> ss.match('This text says hello')
True
>>> ss.match('This is just an example')
False
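
As a rough illustration of the rule described above, the following plain-Python sketch reimplements the AND/OR logic. It is not the package's implementation: the names `simple_match` and `_any_term_matches` are hypothetical, and both case-insensitive matching and the translation of '~' into a regex word boundary are assumptions. The !global suffix is not handled here.

```python
import re


def _any_term_matches(terms: str, text: str) -> bool:
    """Return True if any semicolon-separated term occurs in text (logical OR)."""
    for term in terms.split(';'):
        term = term.strip()
        if not term:
            continue
        # Assumption: '~' marks a word boundary and matching ignores case.
        pattern = r'\b'.join(re.escape(part) for part in term.split('~'))
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


def simple_match(first_str: str, second_str: str, not_str: str, text: str) -> bool:
    """Combine the three conditions with a logical AND; empty parts are skipped."""
    if first_str and not _any_term_matches(first_str, text):
        return False
    if second_str and not _any_term_matches(second_str, text):
        return False
    if not_str and _any_term_matches(not_str, text):
        return False
    return True
```

Under these assumptions, `simple_match('example;hello', 'text', 'elephant', ...)` gives the same three results as the quick examples, and a term such as '~kan~' matches 'Du kan skrive' but not 'En kanon'.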

Usage

Creating Search Strings

Start by importing the SearchString class:

>>> from search_string import SearchString

Construct a new search string by supplying the first_str, second_str, not_str and any data that can be useful to refer back to later, such as an ID:

>>> ss = SearchString('first', '', '', data=2)

Optionally, you can also supply a third_str that works in the same way as first_str and not_str but has to be supplied as a keyword argument:

>>> ss = SearchString('first', '', '', data=2, third_str='third')

Matching text

If you just need to find out whether a given search string matches a text, you can use the method .match on a SearchString instance.

Often, you will want to match a collection of search strings against a list of texts, e.g. sentences. You can do that as follows:

>>> from search_string import SearchString
>>> search_strings = [
...    SearchString('kan', '', 'ritzau', data=1),
...    SearchString('kan', '', 'ritzau!global', data=2)
... ]
>>> sentences = [
...    'Du kan skrive din tekst her.',
...    'Den kan bestå af flere sætninger.',
...    'Dig og Ritzau kan bestemme hvordan det skal være.',
...    'Nogle kan være lange, andre kan være korte.'
... ]
>>> res = SearchString.find_all(sentences, search_strings)
>>> res
[SearchString(kan, -, ritzau, data=1)]

For each of the matched search strings (in the above example, only one), you can extract the data and the matched text as follows:

>>> res[0].data
1
>>> res[0].matched_text
'Du kan skrive din tekst her. Den kan bestå af flere sætninger. (...) Nogle kan være lange, andre kan være korte.'
>>> res[0].matched_text_highligthed
'Du <b>kan</b> skrive din tekst her. Den <b>kan</b> bestå af flere sætninger. (...) Nogle <b>kan</b> være lange, andre <b>kan</b> være korte.'
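
In the find_all example above, only the search string with data=1 is returned, and its matched text skips the sentence that mentions Ritzau. One way to read this is that a plain not_str excludes individual sentences, while a not_str ending in !global rejects the whole text. The sketch below illustrates that reading in plain Python. The function `matching_sentences` is hypothetical, it assumes a non-empty first_str, and it uses case-insensitive substring matching as an assumption. It is not how the package is implemented.

```python
GLOBAL_SUFFIX = '!global'


def matching_sentences(sentences, first_str, not_str):
    """Return the sentences that match first_str and are not excluded by not_str."""
    # Assumption: '!global' makes the exclusion apply to the whole text.
    is_global = not_str.endswith(GLOBAL_SUFFIX)
    if is_global:
        not_str = not_str[:-len(GLOBAL_SUFFIX)]

    first_terms = [t.lower() for t in first_str.split(';') if t]
    not_terms = [t.lower() for t in not_str.split(';') if t]

    def contains_any(terms, sentence):
        lowered = sentence.lower()
        return any(term in lowered for term in terms)

    # A global exclusion anywhere in the text rejects every sentence.
    if is_global and any(contains_any(not_terms, s) for s in sentences):
        return []

    # Otherwise the exclusion is applied sentence by sentence.
    return [
        s for s in sentences
        if contains_any(first_terms, s) and not contains_any(not_terms, s)
    ]
```

With the four sentences from the example, `matching_sentences(sentences, 'kan', 'ritzau')` keeps the first, second and fourth sentence, while `matching_sentences(sentences, 'kan', 'ritzau!global')` returns an empty list.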

Creating search strings that match everything

If you construct a completely empty search string, it will match everything (even the empty string), but the matched text will always be the empty string. This is to allow for "catch-all" search strings:

>>> ss = SearchString('', '', '', data=1)
>>> ss.match('')
True
>>> ss.match('Some random text')
True
>>> ss.matched_text
''

SearchStringCollection - Matching when you have many search strings

If you repeatedly match new texts against the same collection of search strings, it is highly advised to use SearchStringCollection, which uses a trie behind the scenes to search efficiently when many search strings are present. Building the trie has some initial cost, so it is recommended that you initialize the collection once and then keep reusing it.

The most important method on SearchStringCollection is find_all, which takes a sentence (str) or list of sentences (list[str]) and returns the matched search strings, very similar to the familiar SearchString.find_all.

>>> from search_string import SearchString, SearchStringCollection
>>> search_strings = [
...    SearchString('kan', '', 'ritzau', data=1),
...    SearchString('kan', '', 'ritzau!global', data=2)
... ]
>>> sentences = ...  # Same as before
>>> ss_collection = SearchStringCollection(search_strings)
>>> res = ss_collection.find_all(sentences)
>>> res
[SearchString(kan, -, ritzau, data=1)]

Importantly, SearchStringCollection relies on the data attribute being set on every search string in the collection. If data is None, or if multiple search strings share the same value, the behavior is undefined.
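
To give a feel for why a trie helps when many search strings are present, here is a minimal sketch of a character trie that maps terms to identifiers and finds every indexed term in one scan of the text. The class `TermTrie` and its methods are hypothetical and are not the package's internals. The identifiers play the role that data plays above, which suggests why unique values matter in an index like this: two search strings with the same identifier could not be told apart.

```python
class TermTrie:
    """A minimal character trie mapping terms to the IDs that use them."""

    def __init__(self):
        self.children = {}
        self.ids = set()

    def add(self, term, item_id):
        """Index a term (case-insensitively, as an assumption) under item_id."""
        node = self
        for ch in term.lower():
            node = node.children.setdefault(ch, TermTrie())
        node.ids.add(item_id)

    def find_ids(self, text):
        """Return the IDs of all indexed terms occurring anywhere in text."""
        text = text.lower()
        found = set()
        for start in range(len(text)):
            node = self
            # Walk the trie from this start position until no branch fits.
            for i in range(start, len(text)):
                node = node.children.get(text[i])
                if node is None:
                    break
                found |= node.ids
        return found
```

A short usage example under the same assumptions:

```python
trie = TermTrie()
trie.add('kan', 1)
trie.add('kan', 2)
trie.add('ritzau', 3)
candidates = trie.find_ids('Dig og Ritzau kan bestemme hvordan det skal være.')
```

Here `candidates` contains the identifiers of all three indexed terms. A full matcher would then apply the second_str and not_str checks only to these candidates rather than to every search string.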
