QazNLTK: a package for Kazakh language text processing.
![PyPI Downloads](https://img.shields.io/pypi/dm/qaznltk.svg?label=PyPI%20downloads)
What is it?
QazNLTK provides developers with a fast and convenient tool for processing text in the Kazakh language. Tailored to the linguistic characteristics of Kazakh, the library offers a set of natural language processing tools, including tokenization, sentence segmentation, similarity scoring, and transliteration of Kazakh text between Cyrillic and Latin scripts.
Main Features
Here are just a few of the things that qaznltk does well:
- Tokenizing Kazakh language text by keyword frequency:
from qaznltk import qaznltk as qnltk
qn = qnltk.QazNLTK()
text = input("Enter text: ")
tokens = qn.tokenize(text)
print(tokens)
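The README does not say what `tokenize` returns. As a rough illustration of frequency-based tokenization in general, and not of QazNLTK's implementation, the sketch below lowercases the text, extracts word tokens, and counts them with the standard library. The helper name `count_tokens` is hypothetical.

```python
import re
from collections import Counter


def count_tokens(text):
    """Return (token, frequency) pairs, most frequent first.

    Hypothetical helper for illustration; not part of qaznltk.
    """
    # \w matches Unicode letters, so Kazakh Cyrillic letters are kept.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens).most_common()


print(count_tokens("Күн жарық. Күн ыстық."))
```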
- Segmenting Kazakh language text into sentences:
from qaznltk import qaznltk as qnltk
qn = qnltk.QazNLTK()
text = input("Enter text: ")
sent_tokens = qn.sent_tokenize(text)
print(sent_tokens)
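For readers curious how sentence segmentation can work in principle, here is a minimal sketch that splits after terminal punctuation followed by whitespace. It is a simplification under that assumption, not the rule set used by `sent_tokenize`; the helper name `split_sentences` is hypothetical, and abbreviations or decimal numbers would be split incorrectly.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace.

    Hypothetical helper for illustration; not part of qaznltk.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop the empty string produced when the input is empty.
    return [part for part in parts if part]


print(split_sentences("Бүгін күн жарық. Сен қайдасың? Мен үйдемін!"))
```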
- Evaluate the similarity score between two texts:
from qaznltk import qaznltk as qnltk
qn = qnltk.QazNLTK()
textA = input("Enter text A: ")
textB = input("Enter text B: ")
similarity_score = qn.calc_similarity(textA, textB)
print(similarity_score)
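The README does not state which metric `calc_similarity` uses. One common choice for comparing two texts is cosine similarity over word counts; the sketch below shows that technique as an assumption, not as the library's method. The function name `cosine_similarity` is introduced here for illustration only.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of bag-of-words counts, in the range [0, 1]."""
    counts_a = Counter(re.findall(r"\w+", text_a.lower()))
    counts_b = Counter(re.findall(r"\w+", text_b.lower()))
    # Only words present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text has no words
    return dot / (norm_a * norm_b)


print(cosine_similarity("мен үйдемін", "мен үйдемін"))
print(cosine_similarity("мен үйдемін", "күн жарық"))
```

A score near 1 means the two texts use the same words in similar proportions; a score of 0 means they share no words.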
- Convert Kazakh language text from Cyrillic to Latin using the ISO 9 standard:
from qaznltk import qaznltk as qnltk
qn = qnltk.QazNLTK()
text = input("Enter text: ")
latin_text = qn.convert2latin_iso9(text)
print(latin_text)
- Convert Kazakh language text from Latin to Cyrillic using the ISO 9 standard:
from qaznltk import qaznltk as qnltk
qn = qnltk.QazNLTK()
text = input("Enter text: ")
cyrillic_text = qn.convert2cyrillic_iso9(text)
print(cyrillic_text)
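ISO 9 maps each Cyrillic letter to exactly one Latin character, which is what makes the conversion reversible. The sketch below illustrates that idea with a small, partial lowercase table. The table entries are assumed ISO 9:1995 values and should be checked against the standard before use; the names `CYR_TO_LAT`, `LAT_TO_CYR`, and `transliterate` are hypothetical and not part of qaznltk.

```python
# Partial lowercase table (assumed ISO 9:1995 values; verify before use).
CYR_TO_LAT = {
    "а": "a", "б": "b", "д": "d", "е": "e", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "р": "r", "с": "s", "т": "t",
    "қ": "ķ", "ң": "ņ", "ғ": "ġ", "ө": "ô", "ү": "ù", "і": "ì",
}
# The mapping is one-to-one, so inverting it gives the reverse direction.
LAT_TO_CYR = {lat: cyr for cyr, lat in CYR_TO_LAT.items()}


def transliterate(text, table):
    """Map each character through table; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


latin = transliterate("қала", CYR_TO_LAT)
print(latin)
print(transliterate(latin, LAT_TO_CYR))
```

A full implementation would also need uppercase letters and the letters whose ISO 9 form uses combining diacritics, which a character-by-character reverse lookup like this one does not handle.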
- Sentiment Analysis of Kazakh language text [negative: -1, neutral: 0, positive: 1]:
from qaznltk import qaznltk as qnltk
qn = qnltk.QazNLTK()
text = input("Enter text: ")
sentimize_score = qn.sentimize(text)
print(sentimize_score)
- Converting any number N into Kazakh number words [N <= 10^31]:
from qaznltk import qaznltk as qnltk
qn = qnltk.QazNLTK()
n = int(input())
print(qn.num2word(n))
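To show the general shape of number-to-words conversion, here is a minimal sketch limited to 0 through 999. It is not the implementation behind `num2word`, which the README says handles much larger values. The helper name `number_to_words` is hypothetical, and writing 100 as "жүз" rather than "бір жүз" is an assumption; the library's output may differ.

```python
ONES = ["", "бір", "екі", "үш", "төрт", "бес", "алты", "жеті", "сегіз", "тоғыз"]
TENS = ["", "он", "жиырма", "отыз", "қырық", "елу", "алпыс", "жетпіс", "сексен", "тоқсан"]


def number_to_words(n):
    """Spell out an integer from 0 to 999 in Kazakh (illustrative only)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n == 0:
        return "нөл"
    hundreds, rest = divmod(n, 100)
    tens, ones = divmod(rest, 10)
    words = []
    if hundreds:
        # Assumption: 100 is written "жүз", without a leading "бір".
        if hundreds > 1:
            words.append(ONES[hundreds])
        words.append("жүз")
    if tens:
        words.append(TENS[tens])
    if ones:
        words.append(ONES[ones])
    return " ".join(words)


print(number_to_words(345))
```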
- Extracting information from IIN (Individual Identification Number) [IIN: 12 digits]:
from qaznltk import qaznltk as qnltk
qn = qnltk.QazNLTK()
iin = input("Enter IIN: ")
print(qn.get_info_from_iin(iin))
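The README does not list which fields `get_info_from_iin` returns. As a hedged illustration of what a 12-digit IIN can encode, the sketch below assumes the commonly described layout: digits 1 to 6 hold the birth date as YYMMDD, and digit 7 encodes the century of birth and gender. That layout, the `parse_iin` helper, and the sample value are assumptions for illustration, not QazNLTK's implementation, and the check digit is not validated.

```python
from datetime import date

# Assumed meaning of the 7th digit: (century start, gender).
CENTURY_GENDER = {
    "1": (1800, "male"), "2": (1800, "female"),
    "3": (1900, "male"), "4": (1900, "female"),
    "5": (2000, "male"), "6": (2000, "female"),
}


def parse_iin(iin):
    """Extract birth date and gender from a 12-digit IIN string."""
    if len(iin) != 12 or not (iin.isascii() and iin.isdigit()):
        raise ValueError("IIN must be exactly 12 digits")
    if iin[6] not in CENTURY_GENDER:
        raise ValueError("unsupported 7th digit")
    century, gender = CENTURY_GENDER[iin[6]]
    # date() raises ValueError for impossible dates such as month 13.
    birth = date(century + int(iin[0:2]), int(iin[2:4]), int(iin[4:6]))
    return {"birth_date": birth, "gender": gender}


# Made-up sample value, not a real person's IIN.
print(parse_iin("950315400123"))
```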
- KNN Search on TF-IDF matrix embeddings of Kazakh language text:
from qaznltk import vectorizer
qn_vectorizer = vectorizer.QazNLTKVectorizer()
# `documents` must be defined beforehand as a list of Kazakh language strings.
tf_idf_matrix = qn_vectorizer.fit_transform(documents)
knn = vectorizer.KNN(tf_idf_matrix)
query = "Еліміздің алтын күні жарық күн."
query_vector = qn_vectorizer.transform([query])[0]
results = knn.search(query_vector, k=3)
for idx, distance in results:
    print(f"Document: {documents[idx]}, Distance: {distance}")
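To make the idea behind KNN search on TF-IDF vectors concrete, here is a small pure-Python sketch. It assumes whitespace tokenization, a smoothed `log(N / df) + 1` IDF, and cosine distance; the README does not say which weighting or distance QazNLTK uses, so treat every helper name and formula here as an illustrative assumption. The three sample documents are made up, and the query reuses the first document's vector to keep the example short.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return (vocabulary, vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(len(docs) / df[tok]) + 1.0 for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty docs
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def knn_search(vectors, query, k):
    """Return the k (index, distance) pairs closest to the query vector."""
    scored = [(i, cosine_distance(vec, query)) for i, vec in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]


docs = ["алма қызыл", "алма тәтті", "күн жарық"]  # made-up sample data
vocab, vectors = tf_idf_vectors(docs)
for idx, dist in knn_search(vectors, vectors[0], k=2):
    print(docs[idx], round(dist, 3))
```

A brute-force scan like this is linear in the number of documents, which is fine for small collections.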
Where to get it
The source code is currently hosted on GitHub at: https://github.com/silvermete0r/QazNLTK.git
Binary installers for the latest released version are available at the Python
Package Index (PyPI).
pip install qaznltk
![image](https://github.com/silvermete0r/QazNLTK/assets/108217670/b1e8eaa1-f25f-4019-9d75-dee8d25d6a28)
Dependencies
- The package is written in pure Python and uses only built-in functions.
License
![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)
Getting Help
📧 supwithproject@gmail.com
Contributing to qaznltk
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.