github.com/wchunt/naive-bayes-spam-filter
This is a Naive Bayes classifier for spam detection in Go. It uses a simple bag-of-words model to train on a dataset of spam and non-spam (ham) emails and then classifies new emails as either spam or ham. The algorithm assumes that each word in an email is independent of the other words given the email's class, which is clearly not true in practice, hence the "naive" in the name of the algorithm.
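The independence assumption can be written out as a formula. The notation here is ours, not the project's: c is a class (spam or ham) and w_1, ..., w_n are the words of one email.

```latex
P(c \mid w_1, \dots, w_n) \;\propto\; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

Because each word contributes its own factor, the model needs only per-class word frequencies, not the joint behavior of word combinations.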
This project began as a school assignment written in C++ that I enjoyed. I later ported it to Go to practice writing Go code and to experiment with parallelization.
The project classifies a given text file as either spam or ham. The model is trained on a dataset of known spam and real text files and then applied to a new text file to determine its classification.
The classifier works as follows:
Read in a dataset of text files labeled as real and spam.
Tokenize the text files by splitting them into individual words and storing the frequency of each word in the dataset.
Calculate the prior probabilities of each class (spam and real) and the conditional probabilities of each word given each class.
To classify a new text file, tokenize it, calculate the log-probability of it belonging to each class using the prior and conditional probabilities, and choose the class with the highest probability.
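The steps above can be sketched in Go roughly as follows. This is a minimal illustration, not the repository's actual code: the names `Model`, `NewModel`, `Add`, `Scores`, `Classify`, and `Tokenize` are hypothetical, training documents are passed as strings rather than read from files, and add-one (Laplace) smoothing is an assumption made here so that words unseen in a class do not produce a log of zero.

```go
package main

import (
	"fmt"
	"math"
	"strings"
	"unicode"
)

// Model holds the counts gathered during training (hypothetical type).
type Model struct {
	docs   map[string]int            // number of training documents per class
	words  map[string]map[string]int // class -> word -> occurrence count
	totals map[string]int            // total number of words seen per class
	vocab  map[string]struct{}       // every distinct word seen in training
}

// NewModel returns an empty model ready for training.
func NewModel() *Model {
	return &Model{
		docs:   make(map[string]int),
		words:  make(map[string]map[string]int),
		totals: make(map[string]int),
		vocab:  make(map[string]struct{}),
	}
}

// Tokenize lowercases text and splits it on any rune that is
// neither a letter nor a digit.
func Tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Add records one labeled training document.
func (m *Model) Add(class, text string) {
	m.docs[class]++
	if m.words[class] == nil {
		m.words[class] = make(map[string]int)
	}
	for _, w := range Tokenize(text) {
		m.words[class][w]++
		m.totals[class]++
		m.vocab[w] = struct{}{}
	}
}

// Scores returns, for each class, log P(class) plus the sum of
// log P(word | class) over the words of text.
// Assumption: add-one smoothing over the training vocabulary.
func (m *Model) Scores(text string) map[string]float64 {
	n := 0
	for _, count := range m.docs {
		n += count
	}
	tokens := Tokenize(text)
	v := float64(len(m.vocab))
	scores := make(map[string]float64, len(m.docs))
	for class, count := range m.docs {
		s := math.Log(float64(count) / float64(n)) // log prior
		denom := float64(m.totals[class]) + v
		for _, w := range tokens {
			s += math.Log((float64(m.words[class][w]) + 1) / denom)
		}
		scores[class] = s
	}
	return scores
}

// Classify returns the class with the highest log-probability score.
func (m *Model) Classify(text string) string {
	best, bestScore := "", math.Inf(-1)
	for class, s := range m.Scores(text) {
		if s > bestScore {
			best, bestScore = class, s
		}
	}
	return best
}

func main() {
	m := NewModel()
	m.Add("spam", "win free money now")
	m.Add("spam", "free prize claim now")
	m.Add("real", "meeting agenda for monday")
	m.Add("real", "lunch on monday with the team")

	fmt.Println(m.Classify("claim your free money"))
	fmt.Println(m.Classify("agenda for the team meeting"))
}
```

Summing logarithms instead of multiplying raw probabilities avoids floating-point underflow on long documents, which is presumably why the description above speaks of log-probabilities.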
To use the classifier, clone the repository and follow the instructions in the README.md file.