
ValX is an open-source Python library for data cleaning tasks. It includes functions for detecting and removing profanity and personal information, as well as AI-based detection and removal of hate speech and offensive language.
> [!IMPORTANT]
> Please downgrade to `numpy` version 1.26.4. Our ValX DecisionTreeClassifier AI model relies on lower versions of `numpy` because it was trained on those versions. For more information, see: https://techoverflow.net/2024/07/23/how-to-fix-numpy-dtype-size-changed-may-indicate-binary-incompatibility-expected-96-from-c-header-got-88-from-pyobject/
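If you need to pin the version, a downgrade with pip might look like this (a sketch; adjust for your own environment and package manager):

```shell
# Pin numpy to the version the ValX model was trained against
pip install "numpy==1.26.4"
```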
Fixed a major incompatibility with `scikit-learn` caused by version changes in `scikit-learn` v1.3.0, which broke compatibility with versions later than 1.2.2. ValX can now be used with `scikit-learn` versions both earlier and later than 1.3.0! We've also removed the pinned `scikit-learn==1.2.2` dependency, as most versions of `scikit-learn` will now work.
We have introduced a new optional `info_type` parameter to the `detect_sensitive_information` and `remove_sensitive_information` functions, giving you fine-grained control over which types of sensitive information to detect or remove.
We have also introduced more detection patterns for other types of sensitive information, including:

- `"iban"`: International Bank Account Number.
- `"mrn"`: Medical Record Number (may not work correctly, depending on provider and country).
- `"icd10"`: International Classification of Diseases, Tenth Revision.
- `"geo_coords"`: Geo-coordinates (latitude and longitude in decimal degrees format).
- `"username"`: Username handles (@username).
- `"file_path"`: File paths (general patterns for both Windows and Unix paths).
- `"bitcoin_wallet"`: Bitcoin wallet addresses.
- `"ethereum_wallet"`: Ethereum wallet addresses.

We have refactored and changed the `detect_profanity` function: its results now include `Line`, `Column`, `Word`, and `Language` fields.

> [!NOTE]
> You can view ValX's package documentation for more information on these changes.
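To make the `info_type` idea concrete, here is a minimal, self-contained sketch of pattern-based detection with an optional type filter. The regexes and the `detect` helper below are simplified illustrations of the general approach, not ValX's actual patterns or API:

```python
import re

# Illustrative-only patterns (simplified assumptions, NOT ValX's real regexes),
# keyed by the same info_type names the changelog lists above.
PATTERNS = {
    "geo_coords": re.compile(r"-?\d{1,2}\.\d+,\s*-?\d{1,3}\.\d+"),
    "username": re.compile(r"@\w+"),
    "bitcoin_wallet": re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b"),
}

def detect(text, info_type=None):
    """Return (info_type, match) pairs; restrict to one type if given."""
    selected = {info_type: PATTERNS[info_type]} if info_type else PATTERNS
    hits = []
    for name, pattern in selected.items():
        for m in pattern.finditer(text):
            hits.append((name, m.group()))
    return hits

sample = "Contact @alice at 40.7128, -74.0060"
print(detect(sample, info_type="username"))  # [('username', '@alice')]
```

Passing an `info_type` narrows the scan to a single pattern, which is the kind of fine-grained control the new parameter is described as providing.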
Using the AI models in ValX, you can now automatically remove hate speech or offensive speech from your text data, without needing to run detection and write your own custom removal method.
You can install ValX using pip:

```shell
pip install valx
```
ValX supports the following Python versions:
Please ensure that you have one of these Python versions installed before using ValX. ValX may not work as expected on Python versions lower than those supported.
Below is a complete list of the supported languages for ValX's profanity detection and removal functions; these are the valid values for the `language` parameter:
```python
from valx import detect_profanity

# Detect profanity
results = detect_profanity(sample_text, language='English')
print("Profanity Evaluation Results", results)
```
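For readers curious how the `Line`, `Column`, `Word`, and `Language` result fields fit together, here is a standalone sketch of wordlist-based detection. The placeholder wordlist and matching logic are illustrative assumptions, not ValX's actual implementation:

```python
import re

PROFANE = {"darn", "heck"}  # placeholder wordlist, not ValX's real one

def detect_profanity_sketch(lines, language="English"):
    """Scan lines of text; report Line, Column, Word, and Language per hit."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for m in re.finditer(r"\w+", line):
            if m.group().lower() in PROFANE:
                hits.append({"Line": line_no, "Column": m.start() + 1,
                             "Word": m.group(), "Language": language})
    return hits

print(detect_profanity_sketch(["well darn it"]))
# [{'Line': 1, 'Column': 6, 'Word': 'darn', 'Language': 'English'}]
```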
```python
from valx import remove_profanity

# Remove profanity
removed = remove_profanity(sample_text, "text_cleaned.txt", language="English")
```
```python
from valx import detect_sensitive_information

# Detect sensitive information
detected_sensitive_info = detect_sensitive_information(sample_text)
```
> [!NOTE]
> We have updated this function, and it now includes an optional `info_type` argument, which can be used to detect only specific types of sensitive information. The same argument was also added to `remove_sensitive_information`.
```python
from valx import remove_sensitive_information

# Remove sensitive information
cleaned_text = remove_sensitive_information(sample_text2)
```
```python
from valx import detect_hate_speech

# Detect hate speech or offensive language
outcome_of_detection = detect_hate_speech("You are stupid.")
```
> [!IMPORTANT]
> The model's possible outputs are:
>
> - `['Hate Speech']`: the text was flagged and contained hate speech.
> - `['Offensive Speech']`: the text was flagged and contained offensive speech.
> - `['No Hate and Offensive Speech']`: the text was not flagged for any hate speech or offensive speech.
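If you are branching on these label lists in your own pipeline, a small dispatch helper keeps the comparisons in one place. The `route` function and its block/review/allow policy below are purely illustrative, built only on the three documented outputs:

```python
# Hypothetical routing of ValX's three documented label lists.
# The block/review/allow policy is our own illustration, not part of ValX.
def route(labels):
    if labels == ['Hate Speech']:
        return "block"
    if labels == ['Offensive Speech']:
        return "review"
    if labels == ['No Hate and Offensive Speech']:
        return "allow"
    raise ValueError(f"unexpected labels: {labels}")

print(route(['Offensive Speech']))  # review
```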
> [!NOTE]
> See our official documentation for more examples of how to use ValX.
Contributions are welcome! If you encounter any issues, have suggestions, or want to contribute to ValX, please open an issue or submit a pull request on GitHub.
ValX is released under the terms of the MIT License (Modified). Please see the LICENSE file for the full text.
ValX uses data from this GitHub repository: https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/ © 2012-2020 Shutterstock, Inc.
Creative Commons Attribution 4.0 International License: https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/blob/master/LICENSE
Modified License Clause
The modified license clause grants users permission to make derivative works based on the ValX software. However, it requires any substantial changes to the software to be clearly distinguished from the original work and distributed under a different name.