
extr-ds
Library to programmatically build labeled datasets for Named-Entity Recognition (NER) and Relation Extraction (RE) Machine Learning tasks.
pip install extr-ds
See the instructions on how to use the command-line utility to manage your project.
extr-ds --init
extr-ds --split
extr-ds --annotate -ents
extr-ds --annotate -rels
extr-ds --relate -label NO_RELATION=5,7,9
extr-ds --relate -delete 5,6,7
extr-ds --relate -recover 5,6,7
extr-ds --save -ents
extr-ds --save -rels
extr-ds --reset
extr-ds --help
import re

from extr import RegEx, RegExLabel
from extr.entities import EntityExtractor
from extr_ds.labelers import IOB

text = 'Ted Johnson is a pitcher.'

# Regex rules that define each entity type.
entity_extractor = EntityExtractor([
    RegExLabel('PERSON', [
        RegEx([r'(ted\s+johnson|ted)'], re.IGNORECASE)
    ]),
    RegExLabel('POSITION', [
        RegEx([r'pitcher'], re.IGNORECASE)
    ]),
])

# Placeholder: supply a 3rd party tokenizer here before running.
sentence_tokenizer = ...

label = IOB(sentence_tokenizer, entity_extractor).label(text)
## label == <Label tokens=..., labels=['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O']>
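To make the IOB scheme in the example above concrete, here is a minimal standalone sketch that produces the same labels for the same sentence using only the standard library. It is not how extr-ds implements `IOB`; the names `ENTITY_PATTERNS`, `TOKEN_PATTERN`, and `iob_label` are hypothetical, and the regex tokenizer is only a stand-in for the 3rd party tokenizer.

```python
import re

# Hypothetical entity patterns mirroring the example above.
ENTITY_PATTERNS = {
    'PERSON': re.compile(r'ted\s+johnson|ted', re.IGNORECASE),
    'POSITION': re.compile(r'pitcher', re.IGNORECASE),
}

# Stand-in tokenizer (assumption): runs of word characters, or single punctuation marks.
TOKEN_PATTERN = re.compile(r'\w+|[^\w\s]')


def iob_label(text):
    """Return (tokens, labels) for text using the IOB tagging scheme."""
    # Collect (start, end, label) character spans for every entity match.
    spans = [
        (match.start(), match.end(), label)
        for label, pattern in ENTITY_PATTERNS.items()
        for match in pattern.finditer(text)
    ]

    tokens, labels = [], []
    for token in TOKEN_PATTERN.finditer(text):
        tag = 'O'  # default: token is outside every entity
        for start, end, label in spans:
            if start <= token.start() and token.end() <= end:
                # The first token of a span gets B-, later tokens get I-.
                prefix = 'B' if token.start() == start else 'I'
                tag = f'{prefix}-{label}'
                break
        tokens.append(token.group())
        labels.append(tag)
    return tokens, labels


tokens, labels = iob_label('Ted Johnson is a pitcher.')
print(tokens)
print(labels)
```

The sketch tags a token only when it lies fully inside an entity span, so a tokenizer that splits differently from the entity patterns would leave such tokens as `O`.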

import re

from extr import RegEx, RegExLabel
from extr.entities import EntityExtractor
from extr.relations import RegExRelationLabelBuilder, RelationExtractor
from extr_ds.labelers import RelationClassification
from extr_ds.labelers.relation import (
    RelationBuilder,
    BaseRelationLabeler,
    RuleBasedRelationLabeler,
)
# Rule: a PERSON followed by " is a " and then a POSITION is labeled 'is_a'.
person_to_position_relationship = RegExRelationLabelBuilder('is_a') \
    .add_e1_to_e2(
        'PERSON',
        [
            r'\s+is\s+a\s+',
        ],
        'POSITION'
    ) \
    .build()

# Default label for PERSON/POSITION pairs that no rule matches.
base_relation_labeler = BaseRelationLabeler(
    RelationBuilder(relation_formats=[
        ('PERSON', 'POSITION', 'NO_RELATION')
    ])
)

rule_based_relation_labeler = RuleBasedRelationLabeler(
    RelationExtractor([person_to_position_relationship])
)

labeler = RelationClassification(
    EntityExtractor([
        RegExLabel('PERSON', [
            RegEx([r'(ted johnson|bob)'], re.IGNORECASE)
        ]),
        RegExLabel('POSITION', [
            RegEx([r'pitcher'], re.IGNORECASE)
        ]),
    ]),
    base_relation_labeler,
    relation_labelers=[
        rule_based_relation_labeler
    ]
)
results = labeler.label(text)
## results.relation_labels == [
## <RelationLabel sentence="<e1>Ted Johnson</e1> is a <e2>pitcher</e2>." label="is_a">
## ]
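The rule-based relation labeling above can be pictured with a small standalone sketch: find every PERSON/POSITION pair, mark the pair with `<e1>`/`<e2>` tags, and assign `is_a` when the text between them matches the rule, otherwise a default label. This is an illustration only, not the extr-ds implementation; `label_relations` and the module-level patterns are hypothetical names, and the sketch assumes the first entity appears before the second.

```python
import re

# Hypothetical patterns mirroring the example above.
PERSON = re.compile(r'ted johnson|bob', re.IGNORECASE)
POSITION = re.compile(r'pitcher', re.IGNORECASE)
IS_A = re.compile(r'\s+is\s+a\s+', re.IGNORECASE)


def label_relations(text, default_label='NO_RELATION'):
    """Return (annotated_sentence, label) for each PERSON -> POSITION pair."""
    results = []
    for e1 in PERSON.finditer(text):
        for e2 in POSITION.finditer(text):
            if e2.start() < e1.end():
                continue  # assumption: this sketch only handles e1 before e2

            # The rule applies only when the text between the entities matches it fully.
            between = text[e1.end():e2.start()]
            label = 'is_a' if IS_A.fullmatch(between) else default_label

            sentence = (
                text[:e1.start()]
                + f'<e1>{e1.group()}</e1>'
                + between
                + f'<e2>{e2.group()}</e2>'
                + text[e2.end():]
            )
            results.append((sentence, label))
    return results


for sentence, label in label_relations('Ted Johnson is a pitcher.'):
    print(label, sentence)
```

Pairs that no rule matches fall back to the default label, which is the role the `NO_RELATION` entry plays in the example above.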
from extr_ds.validators import check_for_differences
differences_in_labels = check_for_differences(
    ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O'],
    ['B-PERSON', 'O', 'O', 'O', 'B-POSITION', 'O']
)
## differences_in_labels.has_diffs == True
## differences_in_labels.diffs_between_labels == [
##     <Difference index=1, diff_type=DifferenceTypes.S2_MISSING>
## ]

differences_in_labels = check_for_differences(
    ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O'],
    ['B-PERSON', 'B-PERSON', 'O', 'O', 'B-POSITION', 'O']
)
## differences_in_labels.has_diffs == True
## differences_in_labels.diffs_between_labels == [
##     <Difference index=1, diff_type=DifferenceTypes.MISMATCH>
## ]
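As a rough illustration of what such a comparison involves, the sketch below walks two label sequences position by position and classifies each disagreement. It is not the library's code: `DiffType`, `Difference`, and `find_differences` are hypothetical names, and the rule that a label is "missing" when one side has `O` where the other has an entity tag is an assumption inferred from the two outputs shown above.

```python
from dataclasses import dataclass
from enum import Enum


class DiffType(Enum):
    S1_MISSING = 's1_missing'  # assumption: first sequence has 'O', second has a tag
    S2_MISSING = 's2_missing'  # assumption: second sequence has 'O', first has a tag
    MISMATCH = 'mismatch'      # both sequences have a tag, but the tags differ


@dataclass(frozen=True)
class Difference:
    index: int
    diff_type: DiffType


def find_differences(labels1, labels2):
    """Compare two equal-length label sequences position by position."""
    if len(labels1) != len(labels2):
        raise ValueError('label sequences must have the same length')

    diffs = []
    for index, (first, second) in enumerate(zip(labels1, labels2)):
        if first == second:
            continue
        if second == 'O':
            diff_type = DiffType.S2_MISSING
        elif first == 'O':
            diff_type = DiffType.S1_MISSING
        else:
            diff_type = DiffType.MISMATCH
        diffs.append(Difference(index, diff_type))
    return diffs


diffs = find_differences(
    ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O'],
    ['B-PERSON', 'O', 'O', 'O', 'B-POSITION', 'O'],
)
print(diffs)
```

A check like this is useful for comparing rule-generated labels against a hand-annotated version of the same sentence.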
We found that extr-ds demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.