
extr-ds
Library to quickly and programmatically build labeled datasets for Named-Entity Recognition (NER) and Relation Extraction (RE) machine learning tasks.
pip install extr-ds
See the instructions on how to use the command line utility to manage your project.
extr-ds --init
extr-ds --split
extr-ds --annotate -ents
extr-ds --annotate -rels
extr-ds --relate -label NO_RELATION=5,7,9
extr-ds --relate -delete 5,6,7
extr-ds --relate -recover 5,6,7
extr-ds --save -ents
extr-ds --save -rels
extr-ds --reset
extr-ds --help
import re

from extr import RegEx, RegExLabel
from extr.entities import EntityExtractor
from extr_ds.labelers import IOB

text = 'Ted Johnson is a pitcher.'

entity_extractor = EntityExtractor([
    RegExLabel('PERSON', [
        RegEx([r'(ted\s+johnson|ted)'], re.IGNORECASE)
    ]),
    RegExLabel('POSITION', [
        RegEx([r'pitcher'], re.IGNORECASE)
    ]),
])

# placeholder: replace with a 3rd-party sentence tokenizer
sentence_tokenizer = ...

label = IOB(sentence_tokenizer, entity_extractor).label(text)
## label == <Label tokens=..., labels=['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O']>
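To make the IOB scheme concrete, here is a standalone sketch that does not use extr or extr-ds at all. It assumes a naive regex tokenizer that splits words and individual punctuation marks, finds entity spans with the same two patterns, and tags the first token of each span B- and any following tokens I-, leaving everything else O. The helpers tokenize, find_entities and to_iob are hypothetical and only illustrate the labeling scheme; the IOB class above takes its tokenizer as an argument instead.

```python
import re
from typing import List, Tuple

# Hypothetical helpers for illustration only; not part of extr-ds.
Span = Tuple[int, int, str]  # (start, end, label) as character offsets


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Naive tokenizer: words and single punctuation marks, with offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def find_entities(text: str, patterns: List[Tuple[str, str]]) -> List[Span]:
    """Return (start, end, label) for every case-insensitive regex match."""
    spans = []
    for label, pattern in patterns:
        for m in re.finditer(pattern, text, re.IGNORECASE):
            spans.append((m.start(), m.end(), label))
    return spans


def to_iob(text: str, spans: List[Span]) -> Tuple[List[str], List[str]]:
    """Tag tokens fully inside a span as B-/I-; later spans overwrite earlier ones."""
    tokens = tokenize(text)
    labels = ["O"] * len(tokens)
    for start, end, label in spans:
        inside = False
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            if tok_start >= start and tok_end <= end:
                labels[i] = ("I-" if inside else "B-") + label
                inside = True
    return [tok for tok, _, _ in tokens], labels


text = "Ted Johnson is a pitcher."
spans = find_entities(text, [("PERSON", r"(ted\s+johnson|ted)"), ("POSITION", r"pitcher")])
tokens, labels = to_iob(text, spans)
print(tokens)  # ['Ted', 'Johnson', 'is', 'a', 'pitcher', '.']
print(labels)  # ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O']
```

Tokens that only partially overlap a span are left as O in this sketch, which is one reason the choice of tokenizer matters for real datasets.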
import re

from extr import RegEx, RegExLabel
from extr.entities import EntityExtractor
from extr.relations import RegExRelationLabelBuilder, \
    RelationExtractor
from extr_ds.labelers import RelationClassification
from extr_ds.labelers.relation import RelationBuilder, BaseRelationLabeler, RuleBasedRelationLabeler

# rule: PERSON, then " is a ", then POSITION -> 'is_a'
person_to_position_relationship = RegExRelationLabelBuilder('is_a') \
    .add_e1_to_e2(
        'PERSON',
        [
            r'\s+is\s+a\s+',
        ],
        'POSITION'
    ) \
    .build()

# base labeler: PERSON/POSITION pairs start out as NO_RELATION
base_relation_labeler = BaseRelationLabeler(
    RelationBuilder(relation_formats=[
        ('PERSON', 'POSITION', 'NO_RELATION')
    ])
)

# rule-based labeler: applies the regex rule defined above
rule_based_relation_labeler = RuleBasedRelationLabeler(
    RelationExtractor([person_to_position_relationship])
)

labeler = RelationClassification(
    EntityExtractor([
        RegExLabel('PERSON', [
            RegEx([r'(ted johnson|bob)'], re.IGNORECASE)
        ]),
        RegExLabel('POSITION', [
            RegEx([r'pitcher'], re.IGNORECASE)
        ]),
    ]),
    base_relation_labeler,
    relation_labelers=[
        rule_based_relation_labeler
    ]
)

# text = 'Ted Johnson is a pitcher.' (from the previous example)
results = labeler.label(text)
## results.relation_labels == [
##     <RelationLabel sentence="<e1>Ted Johnson</e1> is a <e2>pitcher</e2>." label="is_a">
## ]
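The sketch below mimics, in plain Python without extr or extr-ds, what the output above represents: the two entity mentions are wrapped in <e1>/<e2> markers, and a rule fires when the text between them matches its pattern; otherwise the pair falls back to NO_RELATION. Entities are assumed to be (start, end, label) character offsets with e1 before e2, and matching the whole in-between text with re.fullmatch is an assumption, not necessarily how RegExRelationLabelBuilder applies its patterns. The functions mark and classify are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical sketch; not the extr / extr-ds implementation.
Entity = Tuple[int, int, str]  # (start, end, label) as character offsets
Rule = Tuple[str, str, str, str]  # (e1 label, pattern between, e2 label, relation)


def mark(text: str, e1: Entity, e2: Entity) -> str:
    """Wrap e1 and e2 in <e1>...</e1> and <e2>...</e2>; assumes e1 precedes e2."""
    (e1_start, e1_end, _), (e2_start, e2_end, _) = e1, e2
    return (text[:e1_start] + "<e1>" + text[e1_start:e1_end] + "</e1>"
            + text[e1_end:e2_start]
            + "<e2>" + text[e2_start:e2_end] + "</e2>" + text[e2_end:])


def classify(text: str, e1: Entity, e2: Entity, rules: List[Rule],
             default: str = "NO_RELATION") -> str:
    """Return the first rule whose entity labels match and whose pattern
    matches the entire text between e1 and e2; otherwise the default label."""
    between = text[e1[1]:e2[0]]
    for e1_label, pattern, e2_label, relation in rules:
        if e1[2] == e1_label and e2[2] == e2_label and re.fullmatch(pattern, between):
            return relation
    return default


text = "Ted Johnson is a pitcher."
ted = (0, 11, "PERSON")
pitcher = (17, 24, "POSITION")
rules = [("PERSON", r"\s+is\s+a\s+", "POSITION", "is_a")]
print(mark(text, ted, pitcher))  # <e1>Ted Johnson</e1> is a <e2>pitcher</e2>.
print(classify(text, ted, pitcher, rules))  # is_a
```

Keeping the default label separate from the rules mirrors the split above between a base labeler that enumerates candidate pairs and rule-based labelers that refine them.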
from extr_ds.validators import check_for_differences

differences_in_labels = check_for_differences(
    ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O'],
    ['B-PERSON', 'O', 'O', 'O', 'B-POSITION', 'O']
)
## differences_in_labels.has_diffs == True
## differences_in_labels.diffs_between_labels == [
##     <Difference index=1, diff_type=DifferenceTypes.S2_MISSING>
## ]

differences_in_labels = check_for_differences(
    ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O'],
    ['B-PERSON', 'B-PERSON', 'O', 'O', 'B-POSITION', 'O']
)
## differences_in_labels.has_diffs == True
## differences_in_labels.diffs_between_labels == [
##     <Difference index=1, diff_type=DifferenceTypes.MISMATCH>
## ]
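A hypothetical reimplementation of the comparison, written only to show the logic the two outputs above imply: positions where the sequences agree are skipped, a position where the second sequence has O but the first has a tag is reported as S2_MISSING, and a position where both have different tags is a MISMATCH. The symmetric S1_MISSING case and the names DiffType, Difference and diff_labels are assumptions for illustration, not the extr_ds.validators API.

```python
from enum import Enum
from typing import List, NamedTuple


class DiffType(Enum):  # hypothetical names mirroring the output above
    S1_MISSING = "s1_missing"  # assumed: s1 has 'O' where s2 has a tag
    S2_MISSING = "s2_missing"  # s2 has 'O' where s1 has a tag
    MISMATCH = "mismatch"      # both have tags, but they differ


class Difference(NamedTuple):
    index: int
    diff_type: DiffType


def diff_labels(s1: List[str], s2: List[str]) -> List[Difference]:
    """Compare two IOB label sequences position by position."""
    if len(s1) != len(s2):
        raise ValueError("label sequences must be the same length")
    diffs = []
    for i, (a, b) in enumerate(zip(s1, s2)):
        if a == b:
            continue
        if b == "O":
            kind = DiffType.S2_MISSING
        elif a == "O":
            kind = DiffType.S1_MISSING
        else:
            kind = DiffType.MISMATCH
        diffs.append(Difference(i, kind))
    return diffs


reference = ['B-PERSON', 'I-PERSON', 'O', 'O', 'B-POSITION', 'O']
print(diff_labels(reference, ['B-PERSON', 'O', 'O', 'O', 'B-POSITION', 'O']))
# [Difference(index=1, diff_type=<DiffType.S2_MISSING: 's2_missing'>)]
```

A check like this is useful when comparing rule-generated labels against hand-corrected ones: an empty result means the two annotations agree.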