
HMRF is a method for automatic keyword extraction from a text corpus. This method favors words that maximize the difference between their frequency in one class (positive class) and their frequency in the rest of the classes.
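The idea of scoring words by how much more frequent they are in the positive class than elsewhere can be illustrated with a minimal sketch. This is not the package's implementation: the helper names `relative_freqs` and `freq_difference_keywords`, the regex tokenizer, and the plain difference score are all assumptions made for illustration.

```python
import re
from collections import Counter


def relative_freqs(texts):
    """Relative frequency of each lowercase word across a list of texts."""
    counts = Counter(w for t in texts for w in re.findall(r"[a-z']+", t.lower()))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def freq_difference_keywords(texts, labels, positive_class=1, n=5):
    """Rank words by (frequency in positive class) - (frequency in the rest).

    Hypothetical helper for illustration only; hmrf's real scoring may differ.
    """
    pos = [t for t, y in zip(texts, labels) if y == positive_class]
    rest = [t for t, y in zip(texts, labels) if y != positive_class]
    pos_f, rest_f = relative_freqs(pos), relative_freqs(rest)
    scores = {w: f - rest_f.get(w, 0.0) for w, f in pos_f.items()}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]


demo_texts = ["great movie great acting", "great food", "bad movie", "bad food bad service"]
demo_labels = [1, 1, 0, 0]
top = freq_difference_keywords(demo_texts, demo_labels, n=2)
print(top)  # ['great', 'acting']
```

Words such as "movie" and "food" appear equally often in both classes, so their score is zero and they drop out of the ranking.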
lang: str, default = 'english'. Language of the texts.
positive_class: str or int, default = 1. Label of the positive class, i.e. the class whose keywords are extracted.
n: int, default = 50. Number of keywords to extract.
phrases: bool, default = False. Whether to extract key phrases as well.
n_phrases: int, default = 20. Number of key phrases to extract.
phrases_by: {'Freq', 'PMI', 'TTEST', 'CHI'}, default = 'PMI'. Strategy to extract key phrases.
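To give a sense of what the 'PMI' strategy refers to, the following sketch ranks adjacent word pairs by pointwise mutual information, log2(p(a, b) / (p(a) p(b))). It is a generic illustration, not hmrf's code: the function name `pmi_bigrams`, the whitespace tokenization, and the `min_count` filter are assumptions.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, n_phrases=3, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information.

    Hypothetical helper for illustration only; hmrf's phrase scoring may differ.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # PMI tends to overrate pairs seen only once
        p_ab = count / n_bi
        p_a, p_b = unigrams[a] / n_uni, unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    # Highest PMI first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda bg: (-scores[bg], bg))[:n_phrases]


tokens = "customer service was slow but customer service was polite and the food was fine".split()
phrases = pmi_bigrams(tokens, n_phrases=2)
print(phrases)  # [('customer', 'service'), ('service', 'was')]
```

"customer service" outranks "service was" because "was" also occurs in other contexts, which lowers the pair's PMI.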
import hmrf
texts = ["I absolutely loved this movie! The storyline was captivating, the acting was superb, and the cinematography was stunning.",
"This restaurant exceeded my expectations. The food was delicious, the service was impeccable, and the ambiance was delightful.",
"I'm so happy with my purchase! The product arrived on time, it works perfectly, and the customer support was excellent.",
"The hotel stay was amazing. The room was spacious and clean, the staff was friendly and accommodating, and the amenities were top-notch.",
"I highly recommend this book. The writing style is beautiful, the characters are well-developed, and the story kept me hooked till the end.",
"I was extremely disappointed with the quality of this product. It broke within a week, and the customer service was unhelpful.",
"The movie was a complete waste of time. The plot was confusing, the acting was terrible, and I regretted watching it.",
"The service at this restaurant was awful. The food took forever to arrive, the server was rude, and the prices were exorbitant.",
"I had a horrible experience with this airline. My flight was delayed, the seats were uncomfortable, and the staff was unprofessional.",
"I found this book to be poorly written. The characters were one-dimensional, the plot was predictable, and it lacked depth."]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
extractor = hmrf.Hmrf(n=10)
keywords = extractor.hmrf(texts, labels)
for kw in keywords:
    print(kw)
purchase
notch
kept
room
impeccable
hotel
hooked
recommend
spacious
stay
import hmrf
texts = ["I admire the government's efforts to promote education and create equal opportunities for all citizens.",
"The new policy on environmental conservation is a step in the right direction. It's crucial to protect our planet for future generations.",
"I strongly disagree with the recent tax reform. It places an unfair burden on the middle class and fails to address income inequality.",
"The foreign policy decisions taken by our leaders have enhanced our diplomatic relations and strengthened global cooperation.",
"I appreciate the government's commitment to healthcare reform. Accessible and affordable healthcare should be a priority for everyone.",
"What an incredible goal by the striker! The precision and power in that shot were absolutely amazing.",
"The team showed great resilience and teamwork throughout the game, securing a well-deserved victory.",
"The athlete's performance in the marathon was outstanding. They displayed remarkable endurance and determination.",
"The coach's strategic decisions and effective player substitutions turned the match around in our team's favor.",
"It's disappointing to see the player receive a red card. Their unsportsmanlike behavior tarnished the spirit of the game."]
labels = ["political", "political", "political", "political", "political", "sport", "sport", "sport", "sport", "sport"]
extractor = hmrf.Hmrf(positive_class="political", n=10)
keywords = extractor.hmrf(texts, labels)
for kw in keywords:
    print(kw)
healthcare
government
policy
reform
global
generations
income
future
inequality
accessible
Please cite the following work when using Hmrf:
@article{DELAPENASARRACEN2023103433,
title = {Systematic keyword and bias analyses in hate speech detection},
journal = {Information Processing \& Management},
volume = {60},
number = {5},
pages = {103433},
year = {2023},
issn = {0306-4573},
doi = {10.1016/j.ipm.2023.103433},
url = {https://www.sciencedirect.com/science/article/pii/S030645732300170X},
author = {Gretel Liz {De la Peña Sarracén} and Paolo Rosso},
}
FAQs
Package to extract keywords in one of the classes of a dataset
We found that hmrf demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.