extr

Named Entity Recognition (NER) and Relation Extraction (RE) library using Regular Expressions

0.0.44
PyPI

Install

pip install extr

Example

text = 'Ted is a Pitcher.'

1. Entity Extraction

Find Named Entities from text.

import re

from extr import RegEx, RegExLabel
from extr.entities import EntityExtractor

entity_extractor = EntityExtractor([
    RegExLabel('PERSON', [
        RegEx([r'ted'], re.IGNORECASE)
    ]),
    RegExLabel('POSITION', [
        RegEx([r'pitcher'], re.IGNORECASE)
    ]),
])

entities = entity_extractor.get_entities(text)

## entities == [
##      <Entity label="POSITION" text="Pitcher" span=(9, 16)>,
##      <Entity label="PERSON" text="Ted" span=(0, 3)>
## ]
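To make the idea of labeled spans concrete, here is a minimal standard-library sketch of regex-based entity matching. It is not extr's implementation: `Span` and `find_spans` are hypothetical names introduced only for illustration, and the sketch sorts results by start offset, which may differ from the order extr returns.

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class Span(NamedTuple):
    """Hypothetical stand-in for an entity; not extr's Entity class."""
    label: str
    text: str
    span: Tuple[int, int]


def find_spans(text: str, patterns: Dict[str, List[str]]) -> List[Span]:
    """Run every expression for every label and collect the matches."""
    found = []
    for label, expressions in patterns.items():
        for expression in expressions:
            for match in re.finditer(expression, text, re.IGNORECASE):
                found.append(Span(label, match.group(), match.span()))
    # Sort by start offset so the result reads left to right.
    return sorted(found, key=lambda s: s.span[0])


spans = find_spans(
    'Ted is a Pitcher.',
    {'PERSON': [r'ted'], 'POSITION': [r'pitcher']},
)
for s in spans:
    print(s.label, s.text, s.span)
```

The spans are half-open character offsets into the original string, so `text[start:end]` recovers the matched text.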

Alternatively, build the extractor with a knowledge base of known terms:

import re

from extr import RegEx, RegExLabel
from extr.entities import create_entity_extractor

entity_extractor = create_entity_extractor(
    [
        RegExLabel('POSITION', [
            RegEx([r'pitcher'], re.IGNORECASE)
        ]),
    ],
    kb={
        'PERSON': ['Ted']
    }
)

entities = entity_extractor.get_entities(text)

## entities == [
##      <Entity label="POSITION" text="Pitcher" span=(9, 16)>,
##      <Entity label="PERSON" text="Ted" span=(0, 3)>
## ]
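One plausible way a list of known terms can be matched is to compile each label's terms into a single escaped alternation. The sketch below shows that technique with the standard library only. How extr handles `kb` internally is not stated here, so `kb_to_patterns` is a hypothetical helper and the word-boundary and longest-first choices are assumptions.

```python
import re
from typing import Dict, List


def kb_to_patterns(kb: Dict[str, List[str]]) -> Dict[str, str]:
    """Turn each label's term list into one regular expression (illustrative only)."""
    patterns = {}
    for label, terms in kb.items():
        # Try longer terms first so 'Ted Williams' wins over 'Ted'.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = '|'.join(re.escape(term) for term in ordered)
        # Word boundaries keep 'Ted' from matching inside 'Tedious'.
        patterns[label] = r'\b(?:' + alternation + r')\b'
    return patterns


patterns = kb_to_patterns({'PERSON': ['Ted', 'Ted Williams']})
match = re.search(patterns['PERSON'], 'Ted is a Pitcher.')
print(match.group(), match.span())
```

Escaping each term matters because knowledge-base entries are literal strings, not expressions; a name containing `.` or `(` would otherwise change the pattern's meaning.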

2. Visualize Entities in HTML

Annotate text to display in HTML.

from extr.entities.viewers import HtmlViewer

viewer = HtmlViewer()
viewer.append(text, entities)

html = viewer.create_view(custom_styles="""
    .lb-PERSON {
        background-color: orange;
    }

    .lb-POSITION {
        background-color: yellow;
    }
""")
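For intuition about what the custom styles target, the following sketch wraps each entity span in an element whose class is `lb-` plus the label, matching the selectors above. The exact markup `HtmlViewer` emits is not shown in this README, so the `<span>` structure and the `annotate_html` helper are assumptions, and the sketch expects non-overlapping spans.

```python
from html import escape
from typing import List, Tuple


def annotate_html(text: str, entities: List[Tuple[str, int, int]]) -> str:
    """Wrap each (label, start, end) span in a <span class="lb-LABEL"> element."""
    parts = []
    cursor = 0
    # Walk the spans left to right, copying the plain text between them.
    for label, start, end in sorted(entities, key=lambda e: e[1]):
        parts.append(escape(text[cursor:start]))
        parts.append(f'<span class="lb-{label}">{escape(text[start:end])}</span>')
        cursor = end
    parts.append(escape(text[cursor:]))
    return ''.join(parts)


html_fragment = annotate_html(
    'Ted is a Pitcher.',
    [('POSITION', 9, 16), ('PERSON', 0, 3)],
)
print(html_fragment)
```

With markup of this shape, a rule such as `.lb-PERSON { background-color: orange; }` highlights every PERSON span.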

3. Relation Extraction

Annotate the text, then extract relationships between entities.

from extr.entities import EntityAnnotator
from extr.relations import RelationExtractor, \
                           RegExRelationLabelBuilder

## define relationship between PERSON and POSITION
relationship = RegExRelationLabelBuilder('is_a') \
    .add_e1_to_e2(
        'PERSON', ## e1
        [
            ## regular expression(s) for the text that links e1 to e2
            r'\s+is\s+a\s+',
        ],
        'POSITION' ## e2
    ) \
    .build()

relations_to_extract = [relationship]

## `entities` comes from 'Entity Extraction' above
annotated_text = EntityAnnotator().annotate(text, entities)
relations = RelationExtractor(relations_to_extract).extract(annotated_text, entities)

## relations == [
##      <Relation e1="Ted" r="is_a" e2="Pitcher">
## ]
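The annotate-then-match idea can be sketched with the standard library: replace each entity with a typed placeholder, then search the annotated text for "e1 placeholder, linking expression, e2 placeholder". This is only an illustration of the approach. The `[[LABEL_n]]` placeholder format and both helper functions are hypothetical, not extr's actual annotation format or API.

```python
import re
from typing import Dict, List, Tuple


def annotate(text: str, entities: List[Tuple[str, int, int]]) -> Tuple[str, Dict[str, str]]:
    """Replace each (label, start, end) span with a placeholder such as [[PERSON_1]]."""
    placeholders = {}
    annotated = text
    # Replace right to left so earlier offsets stay valid.
    ordered = sorted(entities, key=lambda e: e[1], reverse=True)
    for index, (label, start, end) in enumerate(ordered):
        token = f'[[{label}_{index}]]'
        placeholders[token] = text[start:end]
        annotated = annotated[:start] + token + annotated[end:]
    return annotated, placeholders


def extract_relations(annotated, placeholders, e1_label, link, e2_label, name):
    """Find e1 and e2 placeholders joined by the linking expression."""
    pattern = re.compile(
        r'(\[\[' + re.escape(e1_label) + r'_\d+\]\])'
        + link
        + r'(\[\[' + re.escape(e2_label) + r'_\d+\]\])'
    )
    return [
        (placeholders[m.group(1)], name, placeholders[m.group(2)])
        for m in pattern.finditer(annotated)
    ]


annotated, placeholders = annotate(
    'Ted is a Pitcher.',
    [('PERSON', 0, 3), ('POSITION', 9, 16)],
)
relations = extract_relations(
    annotated, placeholders, 'PERSON', r'\s+is\s+a\s+', 'POSITION', 'is_a'
)
print(annotated)
print(relations)
```

Matching on placeholders rather than raw text means the linking expression only has to describe the words between two entities, not the entities themselves.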

Keywords

Named Entity Recognition
