This library provides several versions of the Rhetorical Structure Theory (RST) parser for English and Russian. Below, you will find instructions on how to set up and run the parser either locally or using Docker.
The parser supports multiple languages and corpora. The table below reports end-to-end performance for each model version on each test corpus: segmentation (Seg), span (S), nuclearity (N), relation (R), and full-structure (Full) scores.
Tag / Version | Language | Train Data | Test Data | Seg | S | N | R | Full |
---|---|---|---|---|---|---|---|---|
gumrrg | En, Ru | GUM, RRG | GUM | 95.5 | 67.4 | 56.2 | 49.6 | 48.7 |
gumrrg | En, Ru | GUM, RRG | RRG | 97.0 | 67.1 | 54.6 | 46.5 | 45.4 |
rstdt | En | RST-DT | RST-DT | 97.8 | 75.6 | 65.0 | 55.6 | 53.9 |
rstreebank | Ru | RRT | RRT | 92.1 | 66.2 | 53.1 | 46.1 | 46.2 |
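When choosing a model version for a given input language, the language column of the table can be encoded as a small lookup. The sketch below is illustrative only: `VERSION_LANGUAGES` and `versions_for` are hypothetical names, not part of `isanlp_rst`, and the language codes are copied from the table.

```python
# Languages covered by each model version, copied from the table above.
# This mapping and the helper below are illustrative, not part of isanlp_rst.
VERSION_LANGUAGES = {
    "gumrrg": {"en", "ru"},
    "rstdt": {"en"},
    "rstreebank": {"ru"},
}


def versions_for(language: str) -> list[str]:
    """Return the model versions whose listed languages include `language`."""
    code = language.strip().lower()
    return sorted(v for v, langs in VERSION_LANGUAGES.items() if code in langs)


print(versions_for("en"))  # ['gumrrg', 'rstdt']
print(versions_for("ru"))  # ['gumrrg', 'rstreebank']
```

The returned name can then be used as the version value when initializing the parser.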
To use the IsaNLP RST Parser locally, follow these steps:
Installation:
First, install the isanlp and isanlp_rst libraries using pip:
pip install git+https://github.com/iinemo/isanlp.git
pip install isanlp_rst
Usage:
Below is an example of how to run a specific version of the parser using the library:
from isanlp_rst.parser import Parser
# Define the version of the model you want to use
version = 'gumrrg' # Choose from {'gumrrg', 'rstdt', 'rstreebank'}
# Initialize the parser with the desired version
parser = Parser(hf_model_name='tchewik/isanlp_rst_v3', hf_model_version=version, cuda_device=0)
# Example text for parsing
text = """
On Saturday, in the ninth edition of the T20 Men's Cricket World Cup, Team India won against South Africa by seven runs.
The final match was played at the Kensington Oval Stadium in Barbados. This marks India's second win in the T20 World Cup,
which was co-hosted by the West Indies and the USA between June 2 and June 29.
After winning the toss, India decided to bat first and scored 176 runs for the loss of seven wickets.
Virat Kohli top-scored with 76 runs, followed by Axar Patel with 47 runs. Hardik Pandya took three wickets,
and Jasprit Bumrah took two wickets.
"""
# Parse the text to obtain the RST tree
res = parser(text) # res['rst'] contains the binary discourse tree
# Display the structure of the RST tree
vars(res['rst'][0])
The output is an RST tree with the following structure:
{
'id': 7,
'left': <isanlp.annotation_rst.DiscourseUnit at 0x7f771076add0>,
'right': <isanlp.annotation_rst.DiscourseUnit at 0x7f7750b93d30>,
'relation': 'elaboration',
'nuclearity': 'NS',
'start': 0,
'end': 336,
'text': "On Saturday, ... took two wickets .",
}
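Because each node holds its children in `left` and `right`, the whole tree can be inspected with a short recursive walk. The sketch below is a minimal example, not part of the library: `Node` is a hypothetical stand-in that mirrors the attribute names shown in the output above, and the code assumes that leaf units (elementary discourse units) have both `left` and `right` set to `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in exposing the attributes shown in the output above."""
    id: int
    text: str
    relation: str = ""
    nuclearity: str = ""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def render(unit, depth=0):
    """Return indented lines describing a binary discourse tree, depth first."""
    left = getattr(unit, "left", None)
    right = getattr(unit, "right", None)
    indent = "  " * depth
    # Assumption: a unit without children is an elementary discourse unit.
    if left is None and right is None:
        return [f"{indent}EDU {unit.id}: {unit.text.strip()}"]
    lines = [f"{indent}{unit.relation} ({unit.nuclearity})"]
    for child in (left, right):
        if child is not None:
            lines.extend(render(child, depth + 1))
    return lines


demo = Node(3, "India won. Kohli scored 76.", "elaboration", "NS",
            left=Node(1, "India won."), right=Node(2, "Kohli scored 76."))
print("\n".join(render(demo)))
```

With a real parse, the same function should work on the parser output, for example `print("\n".join(render(res['rst'][0])))`, provided the units expose the attributes listed above.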
(Optional) Save the result in RS3 format:
You can save the resulting RST tree to an RS3 file by calling the tree's to_rs3 method:
res['rst'][0].to_rs3('filename.rs3')
The filename.rs3 file can be opened in RSTTool or rstWeb for visualization or editing.
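RS3 is an XML format, so a saved file can also be inspected programmatically. The sketch below uses Python's standard `xml.etree.ElementTree` on a small hand-written sample. It assumes the common RS3 layout, in which `<segment>` elements carry `id`, `parent`, and `relname` attributes; the file written by `to_rs3` may differ in detail. `SAMPLE_RS3` and `read_segments` are hypothetical names used for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the common RS3 layout (not actual parser output).
SAMPLE_RS3 = """<rst>
  <header>
    <relations>
      <rel name="elaboration" type="rst"/>
    </relations>
  </header>
  <body>
    <segment id="1" parent="3" relname="span">India won.</segment>
    <segment id="2" parent="1" relname="elaboration">Kohli scored 76.</segment>
    <group id="3" type="span"/>
  </body>
</rst>"""


def read_segments(rs3_xml: str):
    """Return (id, relname, text) for every <segment> element in an RS3 string."""
    root = ET.fromstring(rs3_xml)
    return [
        (seg.get("id"), seg.get("relname"), (seg.text or "").strip())
        for seg in root.iter("segment")
    ]


for seg_id, relname, text in read_segments(SAMPLE_RS3):
    print(seg_id, relname, text)
```

To read a file from disk instead of a string, `ET.parse('filename.rs3').getroot()` yields the same kind of root element.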
To run the IsaNLP RST Parser using Docker, follow these steps:
Run the Docker container:
Pull and run the Docker container with the desired model version tag:
docker run --rm -p 3335:3333 --name rst_rrt tchewik/isanlp_rst:3.0-rstreebank
Connect using the IsaNLP Python library:
Install the isanlp library. The isanlp_rst library is not required for dockerized parsers:
pip install git+https://github.com/iinemo/isanlp.git
Then connect to the running Docker container:
from isanlp import PipelineCommon
from isanlp.processor_remote import ProcessorRemote
# Put the container address here
address_rst = ('127.0.0.1', 3335)
ppl = PipelineCommon([
    (ProcessorRemote(address_rst[0], address_rst[1], 'default'),
     ['text'],
     {'rst': 'rst'})
])
res = ppl(text)
# res['rst'] will contain the binary discourse tree, similar to the previous example
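The container may take some time to start, so it can help to check that the mapped port accepts connections before sending text through the pipeline. The sketch below uses only the Python standard library. `wait_for_port` is a hypothetical helper, not part of isanlp, and an open port does not guarantee that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False


# Example, using the address from the Docker command above:
# if wait_for_port('127.0.0.1', 3335):
#     res = ppl(text)
```

If the check returns `False`, the container is probably still starting or the port mapping differs from the one used here.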
If you use the IsaNLP RST Parser in your research, please cite our work as follows:
gumrrg, rstdt, and rstreebank:
@inproceedings{chistova-2024-bilingual,
title = "Bilingual Rhetorical Structure Parsing with Large Parallel Annotations",
author = "Chistova, Elena",
booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand and virtual meeting",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.577",
pages = "9689--9706"
}