hmrf

A package for extracting the keywords that characterize one of the classes of a labeled dataset.



Harmonic Mean of Relative Frequencies (HMRF)

HMRF is a method for automatic keyword extraction from a labeled text corpus. It favors words that maximize the difference between their frequency in one class (the positive class) and their frequency in the rest of the classes.
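The name suggests that two relative frequencies are combined with a harmonic mean, which is high only when both inputs are high. The sketch below is one plausible reading of that idea, not the formula implemented by the hmrf package. It assumes a word's "contrast" is its relative frequency in the positive class divided by the sum of its relative frequencies in the positive class and in the rest, and its "prevalence" is its positive-class relative frequency scaled by the largest such value. The helpers tokenize, hmrf_scores and top_keywords are hypothetical, and no stopword filtering is done.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def hmrf_scores(texts, labels, positive_class=1):
    """Score each word of the positive class (illustrative assumption, not the package's formula)."""
    pos, rest = Counter(), Counter()
    for text, label in zip(texts, labels):
        # Every label other than positive_class is pooled into "the rest".
        (pos if label == positive_class else rest).update(tokenize(text))

    total_pos = sum(pos.values()) or 1
    total_rest = sum(rest.values()) or 1
    # Relative frequency of each word inside the positive class.
    p = {w: c / total_pos for w, c in pos.items()}
    max_p = max(p.values(), default=1.0)

    scores = {}
    for w, pw in p.items():
        qw = rest[w] / total_rest      # relative frequency in the rest (0 if absent)
        contrast = pw / (pw + qw)      # 1.0 when w never appears outside the positive class
        prevalence = pw / max_p        # 1.0 for the most frequent positive-class word
        # Harmonic mean: high only if the word is both distinctive and frequent.
        scores[w] = 2 * contrast * prevalence / (contrast + prevalence)
    return scores


def top_keywords(texts, labels, positive_class=1, n=50):
    scores = hmrf_scores(texts, labels, positive_class)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]
```

With texts = ["great movie great acting", "great food", "terrible movie", "bad food"] and labels = [1, 1, 0, 0], top_keywords(texts, labels, n=2) returns ['great', 'acting']: both words occur only in the positive class, and "great" is also its most frequent word, while "movie" and "food" are penalized for appearing in the other class.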

Parameters

  • lang: str, default = 'english'. Language of the texts.

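  • positive_class: str or int, default = 1. Label of the class whose keywords are extracted; all other labels form the rest of the classes.

  • n: int, default = 50. Number of keywords to extract.

  • phrases: bool, default = False. Whether to also extract key phrases.

  • n_phrases: int, default = 20. Number of key phrases to extract.

  • phrases_by: {'Freq', 'PMI', 'TTEST', 'CHI'}, default = 'PMI'. Measure used to rank candidate key phrases: raw frequency, pointwise mutual information, Student's t-test, or chi-square.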
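The phrases_by options correspond to standard collocation measures. As a rough illustration of how two of them differ, the sketch below ranks adjacent word pairs either by raw count ('Freq') or by pointwise mutual information, PMI(x, y) = log2(P(x, y) / (P(x) P(y))). It is not the package's implementation: rank_bigrams is a hypothetical helper, candidate phrases are assumed to be bigrams, and 'TTEST' and 'CHI' are left out.

```python
import math
from collections import Counter


def rank_bigrams(tokens, by="PMI", n=20):
    """Rank adjacent word pairs by raw count ('Freq') or by PMI ('PMI')."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_words = len(tokens)
    n_pairs = max(n_words - 1, 1)

    if by == "Freq":
        scores = {pair: float(count) for pair, count in bigrams.items()}
    elif by == "PMI":
        scores = {}
        for (x, y), count in bigrams.items():
            p_xy = count / n_pairs         # joint probability of the adjacent pair
            p_x = unigrams[x] / n_words    # marginal probability of each word
            p_y = unigrams[y] / n_words
            scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    else:
        raise ValueError(f"measure not covered by this sketch: {by!r}")

    # Highest score first; ties broken lexicographically.
    return sorted(scores, key=lambda pair: (-scores[pair], pair))[:n]
```

For the tokens of "the cat the dog the cat sat", rank_bigrams(tokens, by="Freq", n=1) returns [('the', 'cat')], the only pair seen twice, while by="PMI" returns [('cat', 'sat')]. PMI rewards pairs whose words rarely occur apart, which is also why it tends to over-rank rare pairs in small corpora.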

Usage (Python)

Example 1

import hmrf

texts = ["I absolutely loved this movie! The storyline was captivating, the acting was superb, and the cinematography was stunning.",
         "This restaurant exceeded my expectations. The food was delicious, the service was impeccable, and the ambiance was delightful.",
         "I'm so happy with my purchase! The product arrived on time, it works perfectly, and the customer support was excellent.",
         "The hotel stay was amazing. The room was spacious and clean, the staff was friendly and accommodating, and the amenities were top-notch.",
         "I highly recommend this book. The writing style is beautiful, the characters are well-developed, and the story kept me hooked till the end.",
         "I was extremely disappointed with the quality of this product. It broke within a week, and the customer service was unhelpful.",
         "The movie was a complete waste of time. The plot was confusing, the acting was terrible, and I regretted watching it.",
         "The service at this restaurant was awful. The food took forever to arrive, the server was rude, and the prices were exorbitant.",
         "I had a horrible experience with this airline. My flight was delayed, the seats were uncomfortable, and the staff was unprofessional.",
         "I found this book to be poorly written. The characters were one-dimensional, the plot was predictable, and it lacked depth."]

labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

extractor = hmrf.Hmrf(n=10)
keywords = extractor.hmrf(texts, labels)

for kw in keywords:
    print(kw)

Output

purchase
notch
kept
room
impeccable
hotel
hooked
recommend
spacious
stay

Example 2

import hmrf

texts = ["I admire the government's efforts to promote education and create equal opportunities for all citizens.",
         "The new policy on environmental conservation is a step in the right direction. It's crucial to protect our planet for future generations.",
         "I strongly disagree with the recent tax reform. It places an unfair burden on the middle class and fails to address income inequality.",
         "The foreign policy decisions taken by our leaders have enhanced our diplomatic relations and strengthened global cooperation.",
         "I appreciate the government's commitment to healthcare reform. Accessible and affordable healthcare should be a priority for everyone.",
         "What an incredible goal by the striker! The precision and power in that shot were absolutely amazing.",
         "The team showed great resilience and teamwork throughout the game, securing a well-deserved victory.",
         "The athlete's performance in the marathon was outstanding. They displayed remarkable endurance and determination.",
         "The coach's strategic decisions and effective player substitutions turned the match around in our team's favor.",
         "It's disappointing to see the player receive a red card. Their unsportsmanlike behavior tarnished the spirit of the game."]

labels = ["political", "political", "political", "political", "political", "sport", "sport", "sport", "sport", "sport"]

extractor = hmrf.Hmrf(positive_class="political", n=10)
keywords = extractor.hmrf(texts, labels)

for kw in keywords:
    print(kw)

Output

healthcare
government
policy
reform
global
generations
income
future
inequality
accessible
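Example 2 passes positive_class explicitly because its labels are strings: the default value of 1 matches none of them. The hypothetical helper below (not part of the hmrf API) sketches the split implied by the method description, with texts labeled positive_class on one side and every other class pooled on the other, so the same call also covers datasets with more than two classes.

```python
def split_by_class(texts, labels, positive_class=1):
    """Return (positive_texts, rest_texts); every non-positive label is pooled into rest."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    positive, rest = [], []
    for text, label in zip(texts, labels):
        # Plain equality: the string "1" and the int 1 are different labels.
        (positive if label == positive_class else rest).append(text)
    return positive, rest
```

For texts = ["tax reform", "great goal", "new album", "health policy"] with labels ["political", "sport", "music", "political"], split_by_class(texts, labels, "political") returns (["tax reform", "health policy"], ["great goal", "new album"]), while relying on the default positive_class leaves the positive side empty.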

References

Please cite the following work when using HMRF:

@article{DELAPENASARRACEN2023103433,
title = {Systematic keyword and bias analyses in hate speech detection},
journal = {Information Processing & Management},
volume = {60},
number = {5},
pages = {103433},
year = {2023},
issn = {0306-4573},
doi = {10.1016/j.ipm.2023.103433},
url = {https://www.sciencedirect.com/science/article/pii/S030645732300170X},
author = {Gretel Liz {De la Peña Sarracén} and Paolo Rosso},
}
