
A Python toolkit for processing Bangla texts.
There are three installation options for the bkit package:
bkit
: The most basic version of bkit, with normalization, cleaning, and tokenization capabilities. Install with: pip install bkit
bkit[lemma]
: Everything in the basic version plus lemmatization capability. Install with: pip install bkit[lemma]
bkit[all]
: Everything available in bkit, including normalization, cleaning, tokenization, lemmatization, NER, POS, and shallow parsing. Install with: pip install bkit[all]
bkit.utils.is_bangla(text) -> bool
: Checks whether the text contains only Bangla characters, digits, spaces, punctuation, and some symbols. Returns True if so, otherwise False.
bkit.utils.is_digit(text) -> bool
: Checks whether the text contains only Bangla digit characters. Returns True if so, otherwise False.
bkit.utils.contains_digit(text, check_english_digits) -> bool
: Checks whether the text contains any digits. By default, only Bangla digits are checked. Returns True if so, otherwise False.
bkit.utils.contains_bangla(text) -> bool
: Checks whether the text contains any Bangla character. Returns True if so, otherwise False.
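A minimal usage sketch of these helpers is shown below; the input strings and expected results are illustrative, based on the descriptions above rather than taken from the package documentation.
import bkit
# Illustrative inputs; expected results follow from the descriptions above.
print(bkit.utils.is_bangla('hello'))  # expected False: English letters only
print(bkit.utils.is_digit('১২৩'))  # expected True: only Bangla digit characters
print(bkit.utils.contains_digit('abc 123'))  # expected False: only Bangla digits are checked by default
print(bkit.utils.contains_digit('abc 123', check_english_digits=True))  # expected True: English digits also checked
print(bkit.utils.contains_bangla('abc বাংলা'))  # expected True: contains Bangla characters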
Text transformation includes the normalization and cleaning procedures. To transform text, use the bkit.transform module. Supported functionalities are:
This module normalizes Bangla text using the following steps (each corresponding to a flag of the Normalizer class in the example below):
import bkit
text = 'āĻ
āĻžāĻžāĻŽāĻžāĻŦāĻŧ āĨ¤ '
print(list(text))
# >>> ['āĻ
', 'āĻž', 'āĻž', 'āĻŽ', 'āĻž', 'āĻŦ', 'āĻŧ', ' ', 'āĨ¤', ' ']
normalizer = bkit.transform.Normalizer(
normalize_characters=True,
normalize_zw_characters=True,
normalize_halant=True,
normalize_vowel_kar=True,
normalize_punctuation_spaces=True
)
clean_text = normalizer(text)
print(clean_text, list(clean_text))
# >>> āĻāĻŽāĻžāϰāĨ¤ ['āĻ', 'āĻŽ', 'āĻž', 'āϰ', 'āĨ¤']
This function performs character normalization in Bangla text, applying nukta, Assamese, kar, legacy character, and punctuation normalization sequentially.
import bkit
text = 'āĻāĻŽāĻžāĻŦāĻŧ'
print(list(text))
# >>> ['āĻ', 'āĻŽ', 'āĻž', 'āĻŦ', 'āĻŧ']
text = bkit.transform.normalize_characters(text)
print(list(text))
# >>> ['āĻ', 'āĻŽ', 'āĻž', 'āϰ']
Normalizes punctuation spacing, i.e. adds necessary spaces before or after specific punctuation marks, and removes unnecessary ones.
import bkit
text = 'āϰāĻšāĻŋāĻŽ(ā§¨ā§Š)āĻ āĻāĻĨāĻž āĻŦāϞā§āύ āĨ¤āϤāĻŋāύāĻŋ ( āϰāĻšāĻŋāĻŽ ) āĻāϰāĻ āĻāĻžāύāĻžāύ, ā§§,⧍ā§Ē,ā§Šā§Ģ,ā§Ŧā§Ģā§Ē.ā§Šā§¨ā§Š āĻā§āĻāĻŋ āĻāĻžāĻāĻž āĻŦā§āϝāĻžā§ā§...'
clean_text = bkit.transform.normalize_punctuation_spaces(text)
print(clean_text)
# >>> āϰāĻšāĻŋāĻŽ (ā§¨ā§Š) āĻ āĻāĻĨāĻž āĻŦāϞā§āύāĨ¤ āϤāĻŋāύāĻŋ (āϰāĻšāĻŋāĻŽ) āĻāϰāĻ āĻāĻžāύāĻžāύ, ā§§,⧍ā§Ē,ā§Šā§Ģ,ā§Ŧā§Ģā§Ē.ā§Šā§¨ā§Š āĻā§āĻāĻŋ āĻāĻžāĻāĻž āĻŦā§āϝāĻžā§ā§...
There are two zero-width characters: the Zero Width Joiner (ZWJ) and the Zero Width Non-Joiner (ZWNJ). Generally, ZWNJ is not used in Bangla text, and ZWJ is used only with āϰ. So, these characters are normalized based on this intuition.
import bkit
text = 'āϰâā§āϝâāĻžāĻā§āĻ'
print(f"text: {text} \t Characters: {list(text)}")
# >>> text: āϰâā§āϝâāĻžāĻā§āĻ Characters: ['āϰ', '\u200d', 'ā§', 'āϝ', '\u200c', 'āĻž', 'āĻ', 'ā§', 'āĻ']
clean_text = bkit.transform.normalize_zero_width_chars(text)
print(f"text: {clean_text} \t Characters: {list(clean_text)}")
# >>> text: āϰâā§āϝāĻžāĻā§āĻ Characters: ['āϰ', '\u200d', 'ā§', 'āϝ', 'āĻž', 'āĻ', 'ā§', 'āĻ']
This function normalizes the halant (āĻšāϏāύā§āϤ) character [0x09CD] in Bangla text. When using this function, it is recommended to normalize the zero-width characters first, e.g. using the bkit.transform.normalize_zero_width_chars() function.
During normalization, it also handles the āϤ⧠-> ā§ conversion. In a valid conjunct letter (āϝā§āĻā§āϤāĻŦāϰā§āĻŖ) where 'āϤ' is the former character, the next character can be one of 'āϤ', 'āĻĨ', 'āύ', 'āĻŦ', 'āĻŽ', 'āϝ', or 'āϰ'. The conversion is performed based on this intuition.
During the halant normalization, the following cases are handled.
import bkit
text = 'āĻāϏāύā§ā§ā§āύ āĻāϏāĻĢāĻžāĻā§āϞā§āϞāĻžāĻšā§â āĻāϞāĻŦāϤā§â āĻāϞāĻŦāϤ⧠āϰâā§āϝāĻžāĻŦ āĻā§āϏāĻŋ'
print(list(text))
# >>> ['āĻ', 'āϏ', 'āύ', 'ā§', 'ā§', 'ā§', 'āύ', ' ', 'āĻ', 'āϏ', 'āĻĢ', 'āĻž', 'āĻ', 'ā§', 'āϞ', 'ā§', 'āϞ', 'āĻž', 'āĻš', 'ā§', '\u200c', ' ', 'āĻ', 'āϞ', 'āĻŦ', 'āϤ', 'ā§', '\u200d', ' ', 'āĻ', 'āϞ', 'āĻŦ', 'āϤ', 'ā§', ' ', 'āϰ', '\u200d', 'ā§', 'āϝ', 'āĻž', 'āĻŦ', ' ', 'āĻ', 'ā§', 'āϏ', 'āĻŋ']
clean_text = bkit.transform.normalize_zero_width_chars(text)
clean_text = bkit.transform.normalize_halant(clean_text)
print(clean_text, list(clean_text))
# >>> āĻāϏāύā§āύ āĻāϏāĻĢāĻžāĻā§āϞā§āϞāĻžāĻš āĻāϞāĻŦā§ āĻāϞāĻŦā§ āϰâā§āϝāĻžāĻŦ āĻāϏāĻŋ ['āĻ', 'āϏ', 'āύ', 'ā§', 'āύ', ' ', 'āĻ', 'āϏ', 'āĻĢ', 'āĻž', 'āĻ', 'ā§', 'āϞ', 'ā§', 'āϞ', 'āĻž', 'āĻš', ' ', 'āĻ', 'āϞ', 'āĻŦ', 'ā§', ' ', 'āĻ', 'āϞ', 'āĻŦ', 'ā§', ' ', 'āϰ', '\u200d', 'ā§', 'āϝ', 'āĻž', 'āĻŦ', ' ', 'āĻ', 'āϏ', 'āĻŋ']
Normalizes kar ambiguity with the vowels āĻ, āĻ, and āĻ. It removes any kar that is preceded by a vowel or consonant diacritic; for example, āĻāĻž will be normalized to āĻ. In case of consecutive occurrences of kars like āĻāĻžāĻžāĻžā§, only the first kar will be kept, e.g. āĻāĻž.
import bkit
text = 'āĻ
āĻāĻļāĻā§ āĻ
āĻāĻļāĻā§āϰāĻšāĻŖāĻā§ āĻāĻžāĻžāϰ⧠āĻāĻāύāĻā§ āĻāϞāĻŦāĻžāϰā§āϤā§ā§ āϏāĻžāϧā§ā§ āĻāĻžāĻžāĻžā§'
print(list(text))
# >>> ['āĻ
', 'āĻ', 'āĻļ', 'āĻ', 'ā§', ' ', 'āĻ
', 'āĻ', 'āĻļ', 'āĻ', 'ā§', 'āϰ', 'āĻš', 'āĻŖ', 'āĻ', 'ā§', ' ', 'āĻ', 'āĻž', 'āĻž', 'āϰ', 'ā§', ' ', 'āĻ', 'āĻ', 'āύ', 'āĻ', 'ā§', ' ', 'āĻ', 'āϞ', 'āĻŦ', 'āĻž', 'āϰ', 'ā§', 'āϤ', 'ā§', 'ā§', ' ', 'āϏ', 'āĻž', 'āϧ', 'ā§', 'ā§', ' ', 'āĻ', 'āĻž', 'āĻž', 'āĻž', 'ā§']
clean_text = bkit.transform.normalize_kar_ambiguity(text)
print(clean_text, list(clean_text))
# >>> āĻ
āĻāĻļāĻ āĻ
āĻāĻļāĻā§āϰāĻšāĻŖāĻ āĻāϰ⧠āĻāĻāύāĻ āĻāϞāĻŦāĻžāϰā§āϤ⧠āϏāĻžāϧ⧠āĻāĻž ['āĻ
', 'āĻ', 'āĻļ', 'āĻ', ' ', 'āĻ
', 'āĻ', 'āĻļ', 'āĻ', 'ā§', 'āϰ', 'āĻš', 'āĻŖ', 'āĻ', ' ', 'āĻ', 'āϰ', 'ā§', ' ', 'āĻ', 'āĻ', 'āύ', 'āĻ', ' ', 'āĻ', 'āϞ', 'āĻŦ', 'āĻž', 'āϰ', 'ā§', 'āϤ', 'ā§', ' ', 'āϏ', 'āĻž', 'āϧ', 'ā§', ' ', 'āĻ', 'āĻž']
Clean text using the following steps sequentially:
import bkit
text = '<a href=some_URL>āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļ</a>\nāĻŦāĻžāĻāϞāĻžāĻĻā§āĻļā§āϰ āĻā§āϤāύ ā§§.ā§Ēā§ āϞāĻā§āώ āĻāĻŋāϞā§āĻŽāĻŋāĻāĻžāϰ!!!'
clean_text = bkit.transform.clean_text(text)
print(clean_text)
# >>> āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļ āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļā§āϰ āĻā§āϤāύ āϞāĻā§āώ āĻāĻŋāϞā§āĻŽāĻŋāĻāĻžāϰ
Remove punctuation and replace it with the given replace_with character/string.
import bkit
text = 'āĻāĻŽāϰāĻž āĻŽāĻžāĻ ā§ āĻĢā§āĻāĻŦāϞ āĻā§āϞāϤ⧠āĻĒāĻāύā§āĻĻ āĻāϰāĻŋ!'
clean_text = bkit.transform.clean_punctuations(text)
print(clean_text)
# >>> āĻāĻŽāϰāĻž āĻŽāĻžāĻ ā§ āĻĢā§āĻāĻŦāϞ āĻā§āϞāϤ⧠āĻĒāĻāύā§āĻĻ āĻāϰāĻŋ
clean_text = bkit.transform.clean_punctuations(text, replace_with=' PUNC ')
print(clean_text)
# >>> āĻāĻŽāϰāĻž āĻŽāĻžāĻ ā§ āĻĢā§āĻāĻŦāϞ āĻā§āϞāϤ⧠āĻĒāĻāύā§āĻĻ āĻāϰāĻŋ PUNC
Remove any Bangla digit from the text, replacing it with the given replace_with character/string.
import bkit
text = 'āϤāĻžāϰ āĻŦāĻžāϏāĻž ā§ā§¯ āύāĻžāĻŽā§āĻŦāĻžāϰ āϰā§āĻĄā§āĨ¤'
clean_text = bkit.transform.clean_digits(text)
print(clean_text)
# >>> āϤāĻžāϰ āĻŦāĻžāϏāĻž āύāĻžāĻŽā§āĻŦāĻžāϰ āϰā§āĻĄā§āĨ¤
clean_text = bkit.transform.clean_digits(text, replace_with='#')
print(clean_text)
# >>> āϤāĻžāϰ āĻŦāĻžāϏāĻž ## āύāĻžāĻŽā§āĻŦāĻžāϰ āϰā§āĻĄā§āĨ¤
Clean multiple consecutive whitespace characters, including spaces, newlines, tabs, vertical tabs, etc. It also removes leading and trailing whitespace characters.
import bkit
text = 'āϤāĻžāϰ āĻŦāĻžāϏāĻž ā§ā§¯ \t\t āύāĻžāĻŽā§āĻŦāĻžāϰ āϰā§āĻĄā§āĨ¤\nāϏ⧠āĻā§āĻŦ \v āĻāĻžāϞ⧠āĻā§āϞā§āĨ¤'
clean_text = bkit.transform.clean_multiple_spaces(text)
print(clean_text)
# >>> āϤāĻžāϰ āĻŦāĻžāϏāĻž ā§ā§¯ āύāĻžāĻŽā§āĻŦāĻžāϰ āϰā§āĻĄā§āĨ¤ āϏ⧠āĻā§āĻŦ āĻāĻžāϞ⧠āĻā§āϞā§āĨ¤
clean_text = bkit.transform.clean_multiple_spaces(text, keep_new_line=True)
print(clean_text)
# >>> āϤāĻžāϰ āĻŦāĻžāϏāĻž ā§ā§¯ āύāĻžāĻŽā§āĻŦāĻžāϰ āϰā§āĻĄā§āĨ¤\nāϏ⧠āĻā§āĻŦ \n āĻāĻžāϞ⧠āĻā§āϞā§āĨ¤
Clean URLs from text and replace the URLs with any given string.
import bkit
text = 'āĻāĻŽāĻŋ https://xyz.abc āϏāĻžāĻāĻā§ āĻŦā§āϞāĻ āϞāĻŋāĻāĻŋāĨ¤ āĻāĻ ftp://10.17.5.23/books āϏāĻžāϰā§āĻāĻžāϰ āĻĨā§āĻā§ āĻāĻŽāĻžāϰ āĻŦāĻāĻā§āϞ⧠āĻĒāĻžāĻŦā§āĨ¤ āĻāĻ https://bn.wikipedia.org/wiki/%E0%A6%A7%E0%A6%BE%E0%A6%A4%E0%A7%81_(%E0%A6%AC%E0%A6%BE%E0%A6%82%E0%A6%B2%E0%A6%BE_%E0%A6%AC%E0%A7%8D%E0%A6%AF%E0%A6%BE%E0%A6%95%E0%A6%B0%E0%A6%A3) āϞāĻŋāĻā§āĻāĻāĻŋāϤ⧠āĻāĻžāϞ⧠āϤāĻĨā§āϝ āĻāĻā§āĨ¤'
clean_text = bkit.transform.clean_urls(text)
print(clean_text)
# >>> āĻāĻŽāĻŋ āϏāĻžāĻāĻā§ āĻŦā§āϞāĻ āϞāĻŋāĻāĻŋāĨ¤ āĻāĻ āϏāĻžāϰā§āĻāĻžāϰ āĻĨā§āĻā§ āĻāĻŽāĻžāϰ āĻŦāĻāĻā§āϞ⧠āĻĒāĻžāĻŦā§āĨ¤ āĻāĻ āϞāĻŋāĻā§āĻāĻāĻŋāϤ⧠āĻāĻžāϞ⧠āϤāĻĨā§āϝ āĻāĻā§āĨ¤
clean_text = bkit.transform.clean_urls(text, replace_with='URL')
print(clean_text)
# >>> āĻāĻŽāĻŋ URL āϏāĻžāĻāĻā§ āĻŦā§āϞāĻ āϞāĻŋāĻāĻŋāĨ¤ āĻāĻ URL āϏāĻžāϰā§āĻāĻžāϰ āĻĨā§āĻā§ āĻāĻŽāĻžāϰ āĻŦāĻāĻā§āϞ⧠āĻĒāĻžāĻŦā§āĨ¤ āĻāĻ URL āϞāĻŋāĻā§āĻāĻāĻŋāϤ⧠āĻāĻžāϞ⧠āϤāĻĨā§āϝ āĻāĻā§āĨ¤
Clean emoji and emoticons from text and replace those with any given string.
import bkit
text = 'āĻāĻŋāĻā§ āĻāĻŽā§āĻāĻŋ āĻšāϞ: đđĢ
đžđĢ
đŋđĢđŧđĢđŊđĢđžđĢđŋđĢđĢđģđĢđŧđĢđŊđĢđžđĢđŋđ§đǏđǎđĒšđĒēđĢđĢđĢđđđđĒŦđĒŠđĒĢđŠŧđŠģđ̧đĒĒđ°'
clean_text = bkit.transform.clean_emojis(text, replace_with='<EMOJI>')
print(clean_text)
# >>> āĻāĻŋāĻā§ āĻāĻŽā§āĻāĻŋ āĻšāϞ: <EMOJI>
Clean HTML tags from text and replace those with any given string.
import bkit
text = '<a href=some_URL>āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļ</a>'
clean_text = bkit.transform.clean_html(text)
print(clean_text)
# >>> āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļ
Remove multiple consecutive punctuations and keep the first punctuation only.
import bkit
text = 'āĻāĻŋ āĻāύāύā§āĻĻ!!!!!'
clean_text = bkit.transform.clean_multiple_punctuations(text)
print(clean_text)
# >>> āĻāĻŋ āĻāύāύā§āĻĻ!
Remove special characters like $, #, @, etc., and replace them with the given string. If no character list is passed, [$, #, &, %, @] are removed by default.
import bkit
text = '#āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļ$'
clean_text = bkit.transform.clean_special_characters(text, characters=['#', '$'], replace_with='')
print(clean_text)
# >>> āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļ
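If the characters argument is omitted, the default set listed above should be removed; a minimal sketch reusing the same text:
# Reusing `text` from the example above; with no characters argument,
# the default set [$, #, &, %, @] is removed (per the description above).
clean_text = bkit.transform.clean_special_characters(text)
print(clean_text)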
Remove non-Bangla characters, i.e. characters and punctuation not used in Bangla, such as English or other languages' alphabets, and replace them with the given string.
import bkit
text = 'āĻāĻ āĻļā§āĻāĻā§āĻ āĻšāĻžāϤāĻŋāĻļā§āĻāĻĄāĻŧ Heliotropium indicum, āĻ
āϤāϏā§, āĻāĻāύā§āĻĻ Calotropis gigantea āĻāĻžāĻā§āϰ āĻĒāĻžāϤāĻžāϰ āϰāϏāĻžāϞ⧠āĻ
āĻāĻļ āĻāĻšāĻžāϰ āĻāϰā§āĨ¤'
clean_text = bkit.transform.clean_non_bangla(text, replace_with='')
print(clean_text)
# >>> āĻāĻ āĻļā§āĻāĻā§āĻ āĻšāĻžāϤāĻŋāĻļā§āĻāĻĄāĻŧ , āĻ
āϤāϏā§, āĻāĻāύā§āĻĻ āĻāĻžāĻā§āϰ āĻĒāĻžāϤāĻžāϰ āϰāϏāĻžāϞ⧠āĻ
āĻāĻļ āĻāĻšāĻžāϰ āĻāϰā§
The bkit.analysis.count_words function can be used to get word counts. It has the following parameters:
"""
Args:
text (Tuple[str, List[str]]): The text to count words from. If a string is provided,
it will be split into words. If a list of strings is provided, each string will
be split into words and counted separately.
clean_punctuation (bool, optional): Whether to clean punctuation from the words count. Defaults to False.
punct_replacement (str, optional): The replacement for the punctuation. Only applicable if
clean_punctuation is True. Defaults to "".
return_dict (bool, optional): Whether to return the word count as a dictionary.
Defaults to False.
ordered (bool, optional): Whether to return the word count in descending order. Only
applicable if return_dict is True. Defaults to False.
Returns:
Tuple[int, Dict[str, int]]: If return_dict is True, returns a tuple containing the
total word count and a dictionary where the keys are the words and the values
are their respective counts. If return_dict is False, returns only the total
word count as an integer.
"""
# examples
import bkit
text='āĻ
āĻāĻŋāώā§āĻā§āϰ āĻāĻā§āϰ āĻĻāĻŋāύ āĻāϤāĻāĻžāϞ āϰā§āĻŦāĻŦāĻžāϰ āĻā§āĻžāĻļāĻŋāĻāĻāύ⧠āĻŦāĻŋāĻļāĻžāϞ āĻāĻ āϏāĻŽāĻžāĻŦā§āĻļā§ āĻšāĻžāĻāĻŋāϰ āĻšāύ āĻā§āϰāĻžāĻŽā§āĻĒāĨ¤ āϤāĻŋāύāĻŋ āĻāĻā§āĻā§āĻŦāϏāĻŋāϤ āĻāĻā§āϤ-āϏāĻŽāϰā§āĻĨāĻāĻĻā§āϰ āĻāĻŽā§āϰāĻŋāĻāĻžāϰ āĻĒāϤāύā§āϰ āϝāĻŦāύāĻŋāĻāĻž āĻāĻāĻžāύā§āϰ āĻ
āĻā§āĻā§āĻāĻžāϰ āĻāϰā§āύāĨ¤'
total_words = bkit.analysis.count_words(text)
print(total_words)
# >>> 21
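Per the docstring above, setting return_dict=True returns the total count together with a per-word frequency dictionary; a minimal sketch reusing the same text:
# Reusing `text` from the example above.
total_words, word_counts = bkit.analysis.count_words(text, return_dict=True, ordered=True)
print(total_words)  # total word count, as before
print(word_counts)  # {word: count, ...}, in descending order of count since ordered=True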
The bkit.analysis.count_sentences function can be used to get sentence counts. It has the following parameters:
"""
Counts the number of sentences in the given text or list of texts.
Args:
text (Tuple[str, List[str]]): The text or list of texts to count sentences from.
return_dict (bool, optional): Whether to return the result as a dictionary. Defaults to False.
ordered (bool, optional): Whether to order the result in descending order.
Only applicable if return_dict is True. Defaults to False.
Returns:
int or dict: The count of sentences. If return_dict is True, returns a dictionary with sentences as keys
and their counts as values. If return_dict is False, returns the total count of sentences.
Raises:
AssertionError: If ordered is True but return_dict is False.
"""
# examples
import bkit
text = 'āϤā§āĻŽāĻŋ āĻā§āĻĨāĻžā§ āĻĨāĻžāĻ? āĻĸāĻžāĻāĻž āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļā§āϰ\n āϰāĻžāĻāϧāĻžāύā§āĨ¤ āĻāĻŋ āĻ
āĻŦāϏā§āĻĨāĻž āϤāĻžāϰ! ⧧⧍/ā§Ļā§Š/⧍ā§Ļ⧍⧍ āϤāĻžāϰāĻŋāĻā§ āϏ⧠ā§Ē/āĻ āĻ āĻŋāĻāĻžāύāĻžā§ āĻāĻŋā§ā§ ⧧⧍,ā§Šā§Ēā§Ģ.ā§¨ā§Š āĻāĻžāĻāĻž āĻĻāĻŋā§ā§āĻāĻŋāϞāĨ¤'
count = bkit.analysis.count_sentences(text)
print(count)
# >>> 5
count = bkit.analysis.count_sentences(text, return_dict=True, ordered=True)
print(count)
# >>> {'āϤā§āĻŽāĻŋ āĻā§āĻĨāĻžā§ āĻĨāĻžāĻ?': 1, 'āĻĸāĻžāĻāĻž āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļā§āϰ\n': 1, 'āϰāĻžāĻāϧāĻžāύā§āĨ¤': 1, 'āĻāĻŋ āĻ
āĻŦāϏā§āĻĨāĻž āϤāĻžāϰ!': 1, '⧧⧍/ā§Ļā§Š/⧍ā§Ļ⧍⧍ āϤāĻžāϰāĻŋāĻā§ āϏ⧠ā§Ē/āĻ āĻ āĻŋāĻāĻžāύāĻžā§ āĻāĻŋā§ā§ ⧧⧍,ā§Šā§Ēā§Ģ.ā§¨ā§Š āĻāĻžāĻāĻž āĻĻāĻŋā§ā§āĻāĻŋāϞāĨ¤': 1}
Lemmatization is implemented based on our paper BanLemma: A Word Formation Dependent Rule and Dictionary Based Bangla Lemmatizer.
Lemmatize a given text. Generally expects the text to be a sentence.
import bkit
text = 'āĻĒā§āĻĨāĻŋāĻŦā§āϰ āĻāύāϏāĻāĻā§āϝāĻž ā§Ž āĻŦāĻŋāϞāĻŋā§āύā§āϰ āĻāĻŋāĻā§ āĻāĻŽ'
lemmatized = bkit.lemmatizer.lemmatize(text)
print(lemmatized)
# >>> āĻĒā§āĻĨāĻŋāĻŦā§ āĻāύāϏāĻāĻā§āϝāĻž ā§Ž āĻŦāĻŋāϞāĻŋā§āύ āĻāĻŋāĻā§ āĻāĻŽ
Lemmatize a word given the PoS information.
import bkit
text = 'āĻĒā§āĻĨāĻŋāĻŦā§āϰ'
lemmatized = bkit.lemmatizer.lemmatize_word(text, 'noun')
print(lemmatized)
# >>> āĻĒā§āĻĨāĻŋāĻŦā§
Stemming is the process of reducing words to their base or root form. Our implementation achieves this by conditionally stripping away predefined prefixes and suffixes from each word.
import bkit
stemmer = bkit.stemmer.SimpleStemmer()
stemmer.word_stemer('āύāĻāϰāĻŦāĻžāϏā§')
# >>> āύāĻāϰ
import bkit
stemmer = bkit.stemmer.SimpleStemmer()
stemmer.sentence_stemer('āĻŦāĻŋāĻā§āϞ⧠āϰā§āĻĻ āĻāĻŋāĻā§āĻāĻž āĻāĻŽā§āĻā§āĨ¤')
# >>> āĻŦāĻŋāĻā§āϞ āϰā§āĻĻ āĻāĻŋāĻā§ āĻāĻŽ
Tokenize a given text. The bkit.tokenizer module is used to tokenize text into tokens. It supports three types of tokenization.
Tokenize text into words. It also separates some punctuation marks, including the comma, danda (āĨ¤), question mark, etc.
import bkit
text = 'āϤā§āĻŽāĻŋ āĻā§āĻĨāĻžā§ āĻĨāĻžāĻ? āĻĸāĻžāĻāĻž āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļā§āϰ āϰāĻžāĻāϧāĻžāύā§āĨ¤ āĻāĻŋ āĻ
āĻŦāϏā§āĻĨāĻž āϤāĻžāϰ! ⧧⧍/ā§Ļā§Š/⧍ā§Ļ⧍⧍ āϤāĻžāϰāĻŋāĻā§ āϏ⧠ā§Ē/āĻ āĻ āĻŋāĻāĻžāύāĻžā§ āĻāĻŋā§ā§ ⧧⧍,ā§Šā§Ēā§Ģ āĻāĻžāĻāĻž āĻĻāĻŋā§ā§āĻāĻŋāϞāĨ¤'
tokens = bkit.tokenizer.tokenize(text)
print(tokens)
# >>> ['āϤā§āĻŽāĻŋ', 'āĻā§āĻĨāĻžā§', 'āĻĨāĻžāĻ', '?', 'āĻĸāĻžāĻāĻž', 'āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļā§āϰ', 'āϰāĻžāĻāϧāĻžāύā§', 'āĨ¤', 'āĻāĻŋ', 'āĻ
āĻŦāϏā§āĻĨāĻž', 'āϤāĻžāϰ', '!', '⧧⧍/ā§Ļā§Š/⧍ā§Ļ⧍⧍', 'āϤāĻžāϰāĻŋāĻā§', 'āϏā§', 'ā§Ē/āĻ', 'āĻ āĻŋāĻāĻžāύāĻžā§', 'āĻāĻŋā§ā§', '⧧⧍,ā§Šā§Ēā§Ģ', 'āĻāĻžāĻāĻž', 'āĻĻāĻŋā§ā§āĻāĻŋāϞ', 'āĨ¤']
Tokenize text into words and any punctuation.
import bkit
text = 'āϤā§āĻŽāĻŋ āĻā§āĻĨāĻžā§ āĻĨāĻžāĻ? āĻĸāĻžāĻāĻž āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļā§āϰ āϰāĻžāĻāϧāĻžāύā§āĨ¤ āĻāĻŋ āĻ
āĻŦāϏā§āĻĨāĻž āϤāĻžāϰ! ⧧⧍/ā§Ļā§Š/⧍ā§Ļ⧍⧍ āϤāĻžāϰāĻŋāĻā§ āϏ⧠ā§Ē/āĻ āĻ āĻŋāĻāĻžāύāĻžā§ āĻāĻŋā§ā§ ⧧⧍,ā§Šā§Ēā§Ģ āĻāĻžāĻāĻž āĻĻāĻŋā§ā§āĻāĻŋāϞāĨ¤'
tokens = bkit.tokenizer.tokenize_word_punctuation(text)
print(tokens)
# >>> ['āϤā§āĻŽāĻŋ', 'āĻā§āĻĨāĻžā§', 'āĻĨāĻžāĻ', '?', 'āĻĸāĻžāĻāĻž', 'āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļā§āϰ', 'āϰāĻžāĻāϧāĻžāύā§', 'āĨ¤', 'āĻāĻŋ', 'āĻ
āĻŦāϏā§āĻĨāĻž', 'āϤāĻžāϰ', '!', '⧧⧍', '/', 'ā§Ļā§Š', '/', '⧍ā§Ļ⧍⧍', 'āϤāĻžāϰāĻŋāĻā§', 'āϏā§', 'ā§Ē', '/', 'āĻ', 'āĻ āĻŋāĻāĻžāύāĻžā§', 'āĻāĻŋā§ā§', '⧧⧍', ',', 'ā§Šā§Ēā§Ģ', 'āĻāĻžāĻāĻž', 'āĻĻāĻŋā§ā§āĻāĻŋāϞ', 'āĨ¤']
Tokenize text into sentences.
import bkit
text = 'āϤā§āĻŽāĻŋ āĻā§āĻĨāĻžā§ āĻĨāĻžāĻ? āĻĸāĻžāĻāĻž āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļā§āϰ āϰāĻžāĻāϧāĻžāύā§āĨ¤ āĻāĻŋ āĻ
āĻŦāϏā§āĻĨāĻž āϤāĻžāϰ! ⧧⧍/ā§Ļā§Š/⧍ā§Ļ⧍⧍ āϤāĻžāϰāĻŋāĻā§ āϏ⧠ā§Ē/āĻ āĻ āĻŋāĻāĻžāύāĻžā§ āĻāĻŋā§ā§ ⧧⧍,ā§Šā§Ēā§Ģ āĻāĻžāĻāĻž āĻĻāĻŋā§ā§āĻāĻŋāϞāĨ¤'
tokens = bkit.tokenizer.tokenize_sentence(text)
print(tokens)
# >>> ['āϤā§āĻŽāĻŋ āĻā§āĻĨāĻžā§ āĻĨāĻžāĻ?', 'āĻĸāĻžāĻāĻž āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļā§āϰ āϰāĻžāĻāϧāĻžāύā§āĨ¤', 'āĻāĻŋ āĻ
āĻŦāϏā§āĻĨāĻž āϤāĻžāϰ!', '⧧⧍/ā§Ļā§Š/⧍ā§Ļ⧍⧍ āϤāĻžāϰāĻŋāĻā§ āϏ⧠ā§Ē/āĻ āĻ āĻŋāĻāĻžāύāĻžā§ āĻāĻŋā§ā§ ⧧⧍,ā§Šā§Ēā§Ģ āĻāĻžāĻāĻž āĻĻāĻŋā§ā§āĻāĻŋāϞāĨ¤']
Predicts the tags of the Named Entities of a given text.
import bkit
text = 'āϤā§āĻŽāĻŋ āĻā§āĻĨāĻžā§ āĻĨāĻžāĻ? āĻĸāĻžāĻāĻž āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļā§āϰ āϰāĻžāĻāϧāĻžāύā§āĨ¤ āĻāĻŋ āĻ
āĻŦāϏā§āĻĨāĻž āϤāĻžāϰ! ⧧⧍/ā§Ļā§Š/⧍ā§Ļ⧍⧍ āϤāĻžāϰāĻŋāĻā§ āϏ⧠ā§Ē/āĻ āĻ āĻŋāĻāĻžāύāĻžā§ āĻāĻŋā§ā§ ⧧⧍,ā§Šā§Ēā§Ģ.ā§¨ā§Š āĻāĻžāĻāĻž āĻĻāĻŋā§ā§āĻāĻŋāϞāĨ¤'
ner = bkit.ner.Infer('ner-noisy-label')
predictions = ner(text)
print(predictions)
# >>> [('āϤā§āĻŽāĻŋ', 'O', 0.9998692), ('āĻā§āĻĨāĻžā§', 'O', 0.99988306), ('āĻĨāĻžāĻ?', 'O', 0.99983954), ('āĻĸāĻžāĻāĻž', 'B-GPE', 0.99891424), ('āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļā§āϰ', 'B-GPE', 0.99710876), ('āϰāĻžāĻāϧāĻžāύā§āĨ¤', 'O', 0.9995414), ('āĻāĻŋ', 'O', 0.99989176), ('āĻ
āĻŦāϏā§āĻĨāĻž', 'O', 0.99980336), ('āϤāĻžāϰ!', 'O', 0.99983263), ('⧧⧍/ā§Ļā§Š/⧍ā§Ļ⧍⧍', 'B-D&T', 0.97921854), ('āϤāĻžāϰāĻŋāĻā§', 'O', 0.9271435), ('āϏā§', 'O', 0.99934834), ('ā§Ē/āĻ', 'B-NUM', 0.8297553), ('āĻ āĻŋāĻāĻžāύāĻžā§', 'O', 0.99728775), ('āĻāĻŋā§ā§', 'O', 0.9994825), ('⧧⧍,ā§Šā§Ēā§Ģ.ā§¨ā§Š', 'B-NUM', 0.99740463), ('āĻāĻžāĻāĻž', 'B-UNIT', 0.99914896), ('āĻĻāĻŋā§ā§āĻāĻŋāϞāĨ¤', 'O', 0.9998908)]
It takes the model's output and visualizes the NER tag for every word in the text.
import bkit
text = 'āϤā§āĻŽāĻŋ āĻā§āĻĨāĻžā§ āĻĨāĻžāĻ? āĻĸāĻžāĻāĻž āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļā§āϰ āϰāĻžāĻāϧāĻžāύā§āĨ¤ āĻāĻŋ āĻ
āĻŦāϏā§āĻĨāĻž āϤāĻžāϰ! ⧧⧍/ā§Ļā§Š/⧍ā§Ļ⧍⧍ āϤāĻžāϰāĻŋāĻā§ āϏ⧠ā§Ē/āĻ āĻ āĻŋāĻāĻžāύāĻžā§ āĻāĻŋā§ā§ ⧧⧍,ā§Šā§Ēā§Ģ.ā§¨ā§Š āĻāĻžāĻāĻž āĻĻāĻŋā§ā§āĻāĻŋāϞāĨ¤'
ner = bkit.ner.Infer('ner-noisy-label')
predictions = ner(text)
bkit.ner.visualize(predictions)
Predicts the tags of the parts of speech of a given text.
import bkit
text = 'āĻāϤ āĻāĻŋāĻā§āĻĻāĻŋāύ āϧāϰā§āĻ āĻā§āĻŦāĻžāϞāĻžāύāĻŋāĻšā§āύ āĻ
āĻŦāϏā§āĻĨāĻžā§ āĻāĻāĻāĻŋ āĻā§āĻ āĻŽāĻžāĻ āϧāϰāĻžāϰ āύā§āĻāĻžā§ ā§§ā§Ģā§Ļ āĻāύ āϰā§āĻšāĻŋāĻā§āĻāĻž āĻāύā§āĻĻāĻžāĻŽāĻžāύ āϏāĻžāĻāϰ⧠āĻāĻžāϏāĻŽāĻžāύ āĻ
āĻŦāϏā§āĻĨāĻžā§ āϰā§ā§āĻā§ āĨ¤'
pos = bkit.pos.Infer('pos-noisy-label')
predictions = pos(text)
print(predictions)
# >>> [('āĻāϤ', 'ADJ', 0.98674506), ('āĻāĻŋāĻā§āĻĻāĻŋāύ', 'NNC', 0.97954935), ('āϧāϰā§āĻ', 'PP', 0.96124), ('āĻā§āĻŦāĻžāϞāĻžāύāĻŋāĻšā§āύ', 'ADJ', 0.93195957), ('āĻ
āĻŦāϏā§āĻĨāĻžā§', 'NNC', 0.9960413), ('āĻāĻāĻāĻŋ', 'QF', 0.9912915), ('āĻā§āĻ', 'ADJ', 0.9810739), ('āĻŽāĻžāĻ', 'NNC', 0.97365385), ('āϧāϰāĻžāϰ', 'NNC', 0.96641904), ('āύā§āĻāĻžā§', 'NNC', 0.99680626), ('ā§§ā§Ģā§Ļ', 'QF', 0.996005), ('āĻāύ', 'NNC', 0.99434316), ('āϰā§āĻšāĻŋāĻā§āĻāĻž', 'NNP', 0.9141038), ('āĻāύā§āĻĻāĻžāĻŽāĻžāύ', 'NNP', 0.9856694), ('āϏāĻžāĻāϰā§', 'NNP', 0.7122378), ('āĻāĻžāϏāĻŽāĻžāύ', 'ADJ', 0.93841994), ('āĻ
āĻŦāϏā§āĻĨāĻžā§', 'NNC', 0.9965629), ('āϰā§ā§āĻā§', 'VF', 0.99680847), ('āĨ¤', 'PUNCT', 0.9963098)]
"It takes the model's output and visualizes the Part-of-Speech tag for every word in the text.
import bkit
text = 'āĻāϤ āĻāĻŋāĻā§āĻĻāĻŋāύ āϧāϰā§āĻ āĻā§āĻŦāĻžāϞāĻžāύāĻŋāĻšā§āύ āĻ
āĻŦāϏā§āĻĨāĻžā§ āĻāĻāĻāĻŋ āĻā§āĻ āĻŽāĻžāĻ āϧāϰāĻžāϰ āύā§āĻāĻžā§ ā§§ā§Ģā§Ļ āĻāύ āϰā§āĻšāĻŋāĻā§āĻāĻž āĻāύā§āĻĻāĻžāĻŽāĻžāύ āϏāĻžāĻāϰ⧠āĻāĻžāϏāĻŽāĻžāύ āĻ
āĻŦāϏā§āĻĨāĻžā§ āϰā§ā§āĻā§ āĨ¤'
pos = bkit.pos.Infer('pos-noisy-label')
predictions = pos(text)
bkit.pos.visualize(predictions)
Predicts the shallow parsing tags of a given text.
import bkit
text = 'āϤā§āĻŽāĻŋ āĻā§āĻĨāĻžā§ āĻĨāĻžāĻ? āĻĸāĻžāĻāĻž āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļā§āϰ āϰāĻžāĻāϧāĻžāύā§āĨ¤ āĻāĻŋ āĻ
āĻŦāϏā§āĻĨāĻž āϤāĻžāϰ! ⧧⧍/ā§Ļā§Š/⧍ā§Ļ⧍⧍ āϤāĻžāϰāĻŋāĻā§ āϏ⧠ā§Ē/āĻ āĻ āĻŋāĻāĻžāύāĻžā§ āĻāĻŋā§ā§ ⧧⧍,ā§Šā§Ēā§Ģ.ā§¨ā§Š āĻāĻžāĻāĻž āĻĻāĻŋā§ā§āĻāĻŋāϞāĨ¤'
shallow = bkit.shallow.Infer(pos_model='pos-noisy-label')
predictions = shallow(text)
print(predictions)
# >>> (S (VP (NP (PRO āϤā§āĻŽāĻŋ)) (VP (ADVP (ADV āĻā§āĻĨāĻžā§)) (VF āĻĨāĻžāĻ))) (NP (NNP ?) (NNP āĻĸāĻžāĻāĻž) (NNC āĻŦāĻžāĻāϞāĻžāĻĻā§āĻļā§āϰ)) (ADVP (ADV āϰāĻžāĻāϧāĻžāύā§)) (NP (NP (NP (NNC āĨ¤)) (NP (PRO āĻāĻŋ))) (NP (QF āĻ
āĻŦāϏā§āĻĨāĻž) (NNC āϤāĻžāϰ)) (NP (PRO !))) (NP (NP (QF ⧧⧍/ā§Ļā§Š/⧍ā§Ļ⧍⧍) (NNC āϤāĻžāϰāĻŋāĻā§)) (VNF āϏā§) (NP (QF ā§Ē/āĻ) (NNC āĻ āĻŋāĻāĻžāύāĻžā§))) (VF āĻāĻŋā§ā§))
It converts model predictions into an interactive shallow parsing tree for clear and intuitive analysis.
from bkit import shallow
text = "āĻāĻžāϤāĻžāϰ āĻŦāĻŋāĻļā§āĻŦāĻāĻžāĻĒā§ āĻāϰā§āĻā§āύā§āĻāĻŋāύāĻžāϰ āĻŦāĻŋāĻļā§āĻŦāĻāĻžāĻĒ āĻā§ā§ āĻŽāĻžāϰā§āϤāĻŋāύā§āĻā§āϰ āĻ
āĻŦāĻĻāĻžāύ āĻ
āύā§āĻāĨ¤"
shallow = shallow.Infer(pos_model='pos-noisy-label')
predictions = shallow(text)
shallow.visualize(predictions)
Predicts the dependency parsing tags of a given text.
from bkit import dependency
text = "āĻāĻžāϤāĻžāϰ āĻŦāĻŋāĻļā§āĻŦāĻāĻžāĻĒā§ āĻāϰā§āĻā§āύā§āĻāĻŋāύāĻžāϰ āĻŦāĻŋāĻļā§āĻŦāĻāĻžāĻĒ āĻā§ā§ āĻŽāĻžāϰā§āϤāĻŋāύā§āĻā§āϰ āĻ
āĻŦāĻĻāĻžāύ āĻ
āύā§āĻāĨ¤"
dep = dependency.Infer('dependency-parsing')
predictions = dep(text)
print(predictions)
# >>>[{'text': 'āĻāĻžāϤāĻžāϰ āĻŦāĻŋāĻļā§āĻŦāĻāĻžāĻĒā§ āĻāϰā§āĻā§āύā§āĻāĻŋāύāĻžāϰ āĻŦāĻŋāĻļā§āĻŦāĻāĻžāĻĒ āĻā§ā§ āĻŽāĻžāϰā§āϤāĻŋāύā§āĻā§āϰ āĻ
āĻŦāĻĻāĻžāύ āĻ
āύā§āĻ āĨ¤', 'predictions': [{'token_start': 1, 'token_end': 0, 'label': 'compound'}, {'token_start': 7, 'token_end': 1, 'label': 'obl'}, {'token_start': 4, 'token_end': 2, 'label': 'nmod'}, {'token_start': 4, 'token_end': 3, 'label': 'nmod'}, {'token_start': 7, 'token_end': 4, 'label': 'obl'}, {'token_start': 6, 'token_end': 5, 'label': 'nmod'}, {'token_start': 7, 'token_end': 6, 'label': 'nsubj'}, {'token_start': 7, 'token_end': 7, 'label': 'root'}, {'token_start': 7, 'token_end': 8, 'label': 'punct'}]}]
It converts model predictions into an interactive dependency graph for clear and intuitive analysis.
from bkit import dependency
text = "āĻāĻžāϤāĻžāϰ āĻŦāĻŋāĻļā§āĻŦāĻāĻžāĻĒā§ āĻāϰā§āĻā§āύā§āĻāĻŋāύāĻžāϰ āĻŦāĻŋāĻļā§āĻŦāĻāĻžāĻĒ āĻā§ā§ āĻŽāĻžāϰā§āϤāĻŋāύā§āĻā§āϰ āĻ
āĻŦāĻĻāĻžāύ āĻ
āύā§āĻāĨ¤"
dep = dependency.Infer('dependency-parsing')
predictions = dep(text)
dependency.visualize(predictions)
Predicts the coreferent clusters of a given text.
import bkit
text = "āϤāĻžāϰāĻžāϏā§āύā§āĻĻāϰ⧠( ā§§ā§Žā§ā§Ž - ⧧⧝ā§Ēā§Ž ) āĻ
āĻāĻŋāύā§āϤā§āϰ⧠āĨ¤ ā§§ā§Žā§Žā§Ē āϏāĻžāϞ⧠āĻŦāĻŋāύā§āĻĻāĻŋāύā§āϰ āϏāĻšāĻžāϝāĻŧāϤāĻžāϝāĻŧ āϏā§āĻāĻžāϰ āĻĨāĻŋāϝāĻŧā§āĻāĻžāϰ⧠āϝā§āĻāĻĻāĻžāύā§āϰ āĻŽāĻžāϧā§āϝāĻŽā§ āϤāĻŋāύāĻŋ āĻ
āĻāĻŋāύāϝāĻŧ āĻļā§āϰ⧠āĻāϰā§āύ āĨ¤ āĻĒā§āϰāĻĨāĻŽā§ āϤāĻŋāύāĻŋ āĻāĻŋāϰāĻŋāĻļāĻāύā§āĻĻā§āϰ āĻā§āώā§āϰ āĻā§āϤāύā§āϝāϞā§āϞāĻž āύāĻžāĻāĻā§ āĻāĻ āĻŦāĻžāϞāĻ āĻ āϏāϰāϞāĻž āύāĻžāĻāĻā§ āĻā§āĻĒāĻžāϞ āĻāϰāĻŋāϤā§āϰ⧠āĻ
āĻāĻŋāύāϝāĻŧ āĻāϰā§āύ āĨ¤"
coref = bkit.coref.Infer('coref')
predictions = coref(text)
print(predictions)
# >>> {'text': ['āϤāĻžāϰāĻžāϏā§āύā§āĻĻāϰā§', '(', 'ā§§ā§Žā§ā§Ž', '-', '⧧⧝ā§Ēā§Ž', ')', 'āĻ
āĻāĻŋāύā§āϤā§āϰā§', 'āĨ¤', 'ā§§ā§Žā§Žā§Ē', 'āϏāĻžāϞā§', 'āĻŦāĻŋāύā§āĻĻāĻŋāύā§āϰ', 'āϏāĻšāĻžāϝāĻŧāϤāĻžāϝāĻŧ', 'āϏā§āĻāĻžāϰ', 'āĻĨāĻŋāϝāĻŧā§āĻāĻžāϰā§', 'āϝā§āĻāĻĻāĻžāύā§āϰ', 'āĻŽāĻžāϧā§āϝāĻŽā§', 'āϤāĻŋāύāĻŋ', 'āĻ
āĻāĻŋāύāϝāĻŧ', 'āĻļā§āϰā§', 'āĻāϰā§āύ', 'āĨ¤', 'āĻĒā§āϰāĻĨāĻŽā§', 'āϤāĻŋāύāĻŋ', 'āĻāĻŋāϰāĻŋāĻļāĻāύā§āĻĻā§āϰ', 'āĻā§āώā§āϰ', 'āĻā§āϤāύā§āϝāϞā§āϞāĻž', 'āύāĻžāĻāĻā§', 'āĻāĻ', 'āĻŦāĻžāϞāĻ', 'āĻ', 'āϏāϰāϞāĻž', 'āύāĻžāĻāĻā§', 'āĻā§āĻĒāĻžāϞ', 'āĻāϰāĻŋāϤā§āϰā§', 'āĻ
āĻāĻŋāύāϝāĻŧ', 'āĻāϰā§āύ', 'āĨ¤'], 'mention_indices': {0: [{'start_token': 0, 'end_token': 0}, {'start_token': 6, 'end_token': 6}, {'start_token': 10, 'end_token': 10}, {'start_token': 16, 'end_token': 16}, {'start_token': 22, 'end_token': 22}]}}
It takes the model's output and creates an interactive visualization that clearly depicts coreference resolution, highlighting the relationships between entities in the text.
from bkit import coref
text = "āϤāĻžāϰāĻžāϏā§āύā§āĻĻāϰ⧠( ā§§ā§Žā§ā§Ž - ⧧⧝ā§Ēā§Ž ) āĻ
āĻāĻŋāύā§āϤā§āϰ⧠āĨ¤ ā§§ā§Žā§Žā§Ē āϏāĻžāϞ⧠āĻŦāĻŋāύā§āĻĻāĻŋāύā§āϰ āϏāĻšāĻžāϝāĻŧāϤāĻžāϝāĻŧ āϏā§āĻāĻžāϰ āĻĨāĻŋāϝāĻŧā§āĻāĻžāϰ⧠āϝā§āĻāĻĻāĻžāύā§āϰ āĻŽāĻžāϧā§āϝāĻŽā§ āϤāĻŋāύāĻŋ āĻ
āĻāĻŋāύāϝāĻŧ āĻļā§āϰ⧠āĻāϰā§āύ āĨ¤ āĻĒā§āϰāĻĨāĻŽā§ āϤāĻŋāύāĻŋ āĻāĻŋāϰāĻŋāĻļāĻāύā§āĻĻā§āϰ āĻā§āώā§āϰ āĻā§āϤāύā§āϝāϞā§āϞāĻž āύāĻžāĻāĻā§ āĻāĻ āĻŦāĻžāϞāĻ āĻ āϏāϰāϞāĻž āύāĻžāĻāĻā§ āĻā§āĻĒāĻžāϞ āĻāϰāĻŋāϤā§āϰ⧠āĻ
āĻāĻŋāύāϝāĻŧ āĻāϰā§āύ āĨ¤"
coref = coref.Infer('coref')
predictions = coref(text)
coref.visualize(predictions)