Final Year Project on EDU Segmentation:
This project aims to improve EDU segmentation performance using Segbot. Because Segbot has an encoder-decoder architecture, its bidirectional GRU encoder can be replaced with generative pretrained models such as BART and T5. The new model is evaluated on the RST dataset in a few-shot setting (e.g. training on 100 examples) rather than on the full dataset.
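The few-shot setting described above amounts to training on a small, fixed subset of the data. A minimal sketch of drawing such a subset reproducibly is shown below; the `sample_few_shot` helper and the `full_train` list are hypothetical stand-ins and are not part of the edu_segmentation package or the RST dataset tooling.

```python
import random


def sample_few_shot(examples, k=100, seed=0):
    """Return a reproducible random subset of k training examples."""
    if k > len(examples):
        raise ValueError(f"requested {k} examples but only {len(examples)} available")
    # A dedicated Random instance keeps the draw independent of global state.
    rng = random.Random(seed)
    return rng.sample(examples, k)


# Hypothetical stand-in for the full set of training documents.
full_train = [f"doc_{i}" for i in range(1000)]
few_shot_train = sample_few_shot(full_train, k=100, seed=42)
print(len(few_shot_train))
```

Fixing the seed means repeated runs train on the same 100 examples, which makes comparisons between encoders easier to interpret.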
Segbot:
http://138.197.118.157:8000/segbot/
https://www.ijcai.org/proceedings/2018/0579.pdf
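The Segbot paper linked above describes a pointer network that selects segment boundaries. As a rough illustration of how boundary positions map back to EDUs, the sketch below turns a list of boundary indices into token segments. The `boundaries_to_segments` helper and its index convention (each boundary is the index of the last token of a segment) are assumptions made for this example, not the package's internal representation.

```python
def boundaries_to_segments(tokens, boundaries):
    """Split tokens into segments; each boundary is the index of a segment's last token."""
    segments = []
    start = 0
    for end in sorted(set(boundaries)):
        if not 0 <= end < len(tokens):
            raise ValueError(f"boundary {end} is outside the token range")
        segments.append(tokens[start:end + 1])
        start = end + 1
    # Any tokens after the final boundary form a trailing segment.
    if start < len(tokens):
        segments.append(tokens[start:])
    return segments


tokens = "The food is good , but the service is bad .".split()
segments = boundaries_to_segments(tokens, [4, 10])
for segment in segments:
    print(" ".join(segment))
```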
To use the EDUSegmentation module, follow these steps:
1. Import the download module and download all models:
from edu_segmentation.download import download_models
download_models()
2. Import the edu_segmentation module and its related classes:
from edu_segmentation.main import EDUSegmentation, ModelFactory, BERTUncasedModel, BERTCasedModel, BARTModel
The edu_segmentation module provides an easy-to-use interface to perform EDU segmentation using different strategies and models. Follow these steps to use it:
1. Import the segmentation strategies:
from edu_segmentation.main import DefaultSegmentation, ConjunctionSegmentation
2. Choose a model type and create the model using ModelFactory:
model_type = "bert_uncased" # or "bert_cased", "bart"
model = ModelFactory.create_model(model_type)
3. Create an instance of EDUSegmentation using the chosen model:
edu_segmenter = EDUSegmentation(model)
4. Segment the text with the chosen granularity, conjunctions, and device:
text = "Your input text here."
granularity = "conjunction_words" # or "default"
conjunctions = ["and", "but", "however"] # Customize conjunctions if needed
device = 'cpu' # Choose your device, e.g., 'cuda:0'
segmented_output = edu_segmenter.run(text, granularity, conjunctions, device)
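To give a feel for what the conjunctions list expresses, here is a purely rule-based sketch that splits a sentence before each listed conjunction. This is not how the package segments text (it uses the selected model), and `split_on_conjunctions` is a hypothetical helper written only for illustration.

```python
import re


def split_on_conjunctions(text, conjunctions):
    """Split text before each standalone conjunction word (case-insensitive)."""
    if not conjunctions:
        return [text.strip()]
    alternatives = "|".join(re.escape(word) for word in conjunctions)
    # Match whitespace that is immediately followed by a whole conjunction word.
    pattern = r"\s+(?=(?:" + alternatives + r")\b)"
    parts = re.split(pattern, text, flags=re.IGNORECASE)
    return [part.strip() for part in parts if part.strip()]


result = split_on_conjunctions(
    "The food is good, but the service is bad.",
    ["and", "but", "however"],
)
print(result)
```

The word-boundary check keeps words such as "butter" from being treated as the conjunction "but".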
Here's a simple example demonstrating how to use the edu_segmentation module:
from edu_segmentation.download import download_models
from edu_segmentation.main import ModelFactory, EDUSegmentation
download_models()
# Create a BART model
model = ModelFactory.create_model("bart") # or "bert_cased" or "bert_uncased"
# Create an instance of EDUSegmentation using the model
edu_segmenter = EDUSegmentation(model)
# Segment the text using the conjunction-based segmentation strategy
text = "The food is good, but the service is bad."
granularity = "conjunction_words" # or "default"
conjunctions = ["and", "but", "however"] # customise as needed
device = 'cpu' # or 'cuda:0'
segmented_output = edu_segmenter.run(text, granularity, conjunctions, device)
print(segmented_output)
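Rather than hard-coding the device string, it can be chosen at runtime. The sketch below assumes the package runs on PyTorch (suggested by the 'cuda:0' device string, but not stated in this document); `pick_device` is a hypothetical helper, not part of edu_segmentation.

```python
def pick_device(preferred="cuda:0"):
    """Return the preferred GPU device if PyTorch reports CUDA support, else 'cpu'."""
    try:
        import torch  # assumption: PyTorch is the backend in use
    except ImportError:
        return "cpu"
    return preferred if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The resulting string can then be passed as the device argument to edu_segmenter.run in the example above.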
FAQs
We found that edu-segmentation demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.