Mass-media text processing application for your Relation Extraction task, powered by AREkit.
ARElight is an application for a granular view of sentiments between named entities mentioned in texts.
pip install git+https://github.com/nicolay-r/arelight@v0.24.0
Infer sentiment attitudes from a text file in English:
python3 -m arelight.run.infer \
--sampling-framework "arekit" \
--ner-framework "deeppavlov" \
--ner-model-name "ner_ontonotes_bert" \
--ner-types "ORG|PERSON|LOC|GPE" \
--terms-per-context 50 \
--sentence-parser "nltk:english" \
--tokens-per-context 128 \
--bert-framework "opennre" \
--batch-size 10 \
--pretrained-bert "bert-base-cased" \
--bert-torch-checkpoint "ra4-rsr1_bert-base-cased_cls.pth.tar" \
--backend "d3js_graphs" \
--docs-limit 500 \
-o "output" \
--from-files "<PATH-TO-TEXT-FILE>"
The complete documentation is available via the -h flag:
python3 -m arelight.run.infer -h
Parameters:
sampling-framework -- we consider only the arekit framework by default.
    from-files -- list of filepaths to the related documents.
        for .csv files, each line of the specified column is treated as a separate document.
            csv-sep -- separator between columns.
            csv-column -- name of the column in the CSV file.
    collection-name -- name of the result files based on sampled documents.
    terms-per-context -- total number of words in a single sample.
    sentence-parser -- parser used to split a document into sentences; list of the [supported parsers].
    synonyms-filepath -- text file with listed synonymous entries, grouped by lines. [example].
    stemmer -- for word lemmatization (optional); we support [PyMystem].
    ner-framework -- type of the framework:
        deeppavlov -- [DeepPavlov] list of models.
        transformers -- [Transformers] list of models.
    ner-model-name -- model name within the utilized NER framework.
    ner-types -- list of types to be considered for annotation, separated by |.
    docs-limit -- the total limit of documents for sampling.
    translate-framework -- text translation backend (optional); we support [googletrans].
    translate-entity -- (optional) source and target language supported by the backend, separated by :.
    translate-text -- (optional) source and target language supported by the backend, separated by :.
bert-framework -- samples classification framework; we support [OpenNRE].
    text-b-type -- (optional) NLI or None [supported].
    pretrained-bert -- pretrained state name.
    batch-size -- number of samples per single inference iteration.
    tokens-per-context -- size of input.
    bert-torch-checkpoint -- fine-tuned state.
    device-type -- cpu or gpu.
    labels-fmt -- list of the mappings from label to integer value; p:1,n:2,u:0 by default, where:
        p -- positive label, which is mapped to 1.
        n -- negative label, which is mapped to 2.
        u -- undefined label (optional), which is mapped to 0.
backend -- type of the backend (d3js_graphs by default).
    host -- port on which we expect to launch localhost server.
    label-names -- default mapping is p:pos,n:neg,u:neu.
-o -- output folder for result collections and demo.

Framework parameters mentioned above, as well as their related setups, might be omitted.
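The values of labels-fmt, label-names and ner-types are compact delimited strings. The sketch below shows one way such strings could be read in Python. It is not ARElight's own parser: parse_mapping is a hypothetical helper written for illustration, and only the default values listed above are used as input.

```python
def parse_mapping(spec, value_type=str):
    """Parse a 'key:value,key:value' string into a dictionary.

    Hypothetical helper for illustration; not part of ARElight.
    """
    mapping = {}
    for pair in spec.split(","):
        key, value = pair.split(":", 1)
        mapping[key.strip()] = value_type(value.strip())
    return mapping


labels = parse_mapping("p:1,n:2,u:0", value_type=int)  # default labels-fmt
names = parse_mapping("p:pos,n:neg,u:neu")             # default label-names
ner_types = "ORG|PERSON|LOC|GPE".split("|")            # ner-types from the example run

# Join both mappings: integer label -> readable label name.
id_to_name = {labels[key]: names[key] for key in labels}
print(id_to_name)  # {1: 'pos', 2: 'neg', 0: 'neu'}
```

Both mappings share the keys p, n and u, which is what makes the final join possible.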
To launch the Graph Builder for D3JS and (optionally) start a DEMO server for the collections in the output dir:
cd output && python -m http.server 8000
Finally, you may follow the demo page at http://0.0.0.0:8000/
output/
├── description/
│   └── ...      // graph descriptions in JSON.
├── force/
│   └── ...      // force graphs in JSON.
├── radial/
│   └── ...      // radial graphs in JSON.
└── index.html   // main HTML demo page.
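To check which collections a run has produced, the layout above can be walked with the standard library. This is a minimal sketch under two assumptions: the three folders hold one .json file per collection, and list_collections is a hypothetical helper, not an ARElight function.

```python
from pathlib import Path


def list_collections(output_dir="output"):
    """Return {layout: [collection names]} for JSON graphs under output_dir.

    Hypothetical helper; assumes one <collection>.json file per layout folder.
    """
    found = {}
    for layout in ("description", "force", "radial"):
        folder = Path(output_dir) / layout
        if folder.is_dir():
            found[layout] = sorted(path.stem for path in folder.glob("*.json"))
        else:
            # A layout folder that has not been created yet yields no collections.
            found[layout] = []
    return found


print(list_collections("output"))
```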
For graph analysis, you can perform several graph operations with this script:
python3 -m arelight.run.operations \
--operation "<OPERATION-NAME>" \
--graph_a_file output/force/boris.json \
--graph_b_file output/force/rishi.json \
--weights y \
-o output \
--description "[OPERATION] between Boris Johnson and Rishi Sunak on X/Twitter"
python3 -m arelight.run.operations
arelight.run.operations allows you to operate on ARElight's outputs using graphs: you can merge graphs, find their similarities or differences.
--graph_a_file and --graph_b_file -- paths to the .json files for graphs A and B, which are used in the operations. These files should be located in the <your_output/force> folder.
--name -- name of the new graph.
--description -- description of the new graph.
--host -- determines the server port to host after the calculations.
-o -- path to the folder where you want to store the output. You can either create a new output folder or use an existing one that has been created by ARElight.
operation
Consider that you used the ARElight script for X/Twitter to infer relations from messages of UK politicians Boris Johnson and Rishi Sunak:
python3 -m arelight.run.infer ...other arguments... \
-o output --collection-name "boris" --from-files "twitter_boris.txt"
python3 -m arelight.run.infer ...other arguments... \
-o output --collection-name "rishi" --from-files "twitter_rishi.txt"
According to the results section, you will have an output directory with 2 force layout graph files:
output/
├── force/
│   ├── rishi.json
│   └── boris.json
You can do the following operations to combine several outputs, to better understand the similarities and differences between them:
UNION $(G_1 \cup G_2)$ - combine multiple graphs together.
python3 -m arelight.run.operations --operation UNION \
--graph_a_file output/force/boris.json \
--graph_b_file output/force/rishi.json \
--weights y -o output --name boris_UNION_rishi \
--description "UNION of Boris Johnson and Rishi Sunak Twits"
INTERSECTION $(G_1 \cap G_2)$ - what is similar between 2 graphs?
python3 -m arelight.run.operations --operation INTERSECTION \
--graph_a_file output/force/boris.json \
--graph_b_file output/force/rishi.json \
--weights y -o output --name boris_INTERSECTION_rishi \
--description "INTERSECTION between Twits of Boris Johnson and Rishi Sunak"
DIFFERENCE $(G_1 - G_2)$ - what is unique in one graph, that another graph doesn't have?
python3 -m arelight.run.operations --operation DIFFERENCE \
--graph_a_file output/force/boris.json \
--graph_b_file output/force/rishi.json \
--weights y -o output --name boris_DIFFERENCE_rishi \
--description "Difference between Twits of Boris Johnson and Rishi Sunak"
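The three calls above differ only in the operation name, so they can be generated in a loop. The following is a sketch rather than an ARElight utility: operation_command is a hypothetical helper that arranges the flags shown above into an argument list, which could then be passed to subprocess.run on a machine where ARElight is installed.

```python
import shlex


def operation_command(operation, name_a, name_b, output_dir="output", weights="y"):
    """Build the arelight.run.operations call for two force-layout graphs.

    Hypothetical helper; it only reuses flags that appear in the commands above.
    """
    return [
        "python3", "-m", "arelight.run.operations",
        "--operation", operation,
        "--graph_a_file", f"{output_dir}/force/{name_a}.json",
        "--graph_b_file", f"{output_dir}/force/{name_b}.json",
        "--weights", weights,
        "-o", output_dir,
        "--name", f"{name_a}_{operation}_{name_b}",
        "--description", f"{operation} of {name_a} and {name_b}",
    ]


for operation in ("UNION", "INTERSECTION", "DIFFERENCE"):
    # Print a copy-pasteable shell line for each operation.
    print(shlex.join(operation_command(operation, "boris", "rishi")))
```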
weights
You have the option to specify whether to include edge weights in calculations or not. These weights represent the frequencies of discovered edges, indicating how often a relation between two instances was found in the text analyzed by ARElight.
--weights
    y: the result will be based on the union, intersection, or difference of these frequencies.
    n: all weights of input graphs will be set to 1. In this case, the result will reflect the union, intersection, or difference of the graph topologies, regardless of the frequencies. This can be useful when the existence of relations is more important to you, and the number of times they appear in the text is not a significant factor.
Note that using or not using the weights option may yield different topologies.
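A toy example can show why the two settings may lead to different topologies. The sketch below is not ARElight's implementation, and its arithmetic is an assumption made for illustration: UNION adds frequencies, INTERSECTION keeps the smaller one, and DIFFERENCE subtracts graph B from graph A, keeping only edges with a positive result. The graphs are hypothetical {(source, target): frequency} dictionaries, not the actual JSON format.

```python
def combine(graph_a, graph_b, operation, use_weights=True):
    """Combine two graphs given as {(source, target): frequency} dictionaries.

    ASSUMPTION (not taken from ARElight): UNION adds frequencies, INTERSECTION
    keeps the smaller one, DIFFERENCE subtracts B from A; edges whose resulting
    weight is not positive are dropped.
    """
    if not use_weights:
        # Mirrors "--weights n": every existing edge counts exactly once.
        graph_a = {edge: 1 for edge in graph_a}
        graph_b = {edge: 1 for edge in graph_b}

    result = {}
    for edge in sorted(set(graph_a) | set(graph_b)):
        a, b = graph_a.get(edge, 0), graph_b.get(edge, 0)
        if operation == "UNION":
            weight = a + b
        elif operation == "INTERSECTION":
            weight = min(a, b)
        elif operation == "DIFFERENCE":
            weight = a - b
        else:
            raise ValueError(f"unknown operation: {operation}")
        if weight > 0:
            # Without weights only the existence of the edge is recorded.
            result[edge] = weight if use_weights else 1
    return result


# Hypothetical toy graphs; entities and frequencies are made up.
boris = {("UK", "EU"): 5, ("UK", "Ukraine"): 2}
rishi = {("UK", "EU"): 2, ("UK", "India"): 1}

print(combine(boris, rishi, "DIFFERENCE", use_weights=True))
# {('UK', 'EU'): 3, ('UK', 'Ukraine'): 2}
print(combine(boris, rishi, "DIFFERENCE", use_weights=False))
# {('UK', 'Ukraine'): 1}
```

Under these assumptions the UK-EU edge survives the weighted difference because it is more frequent in the first graph, while it disappears from the unweighted one because both graphs contain it.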
Our main interest, and my personal one, is to help you better explore and analyze attitude and relation extraction tasks with ARElight. Great research is also accompanied by faithful references. If you use or extend our work, please cite as follows:
@inproceedings{rusnachenko2024arelight,
title={ARElight: Context Sampling of Large Texts for Deep Learning Relation Extraction},
author={Rusnachenko, Nicolay and Liang, Huizhi and Kolomeets, Maxim and Shi, Lei},
booktitle={European Conference on Information Retrieval},
year={2024},
organization={Springer}
}