Keyword based text extraction Package (keytext)

  • 0.5
  • PyPI

keyword based text extraction toolkit (keytext)

What is it?

keytext is an all-in-one Python package for keyword-based text search, manipulation, and data cleansing. It extracts contextual information around specific keywords, removes unwanted terms from texts and dataframes, locates the positions of keywords within a Pandas DataFrame, and replaces a single keyword or a set of keywords.

Main Features

Here are just a few of the things that keytext does well:

  • Keyword Positioning: Locate the exact start and end positions of a keyword within a given text, facilitating precise information retrieval.
  • Keyword Frequency: Count the occurrences of a keyword or set of keywords within a given text.
  • Keyword Replacement: Replace a single keyword or a list of keywords with the corresponding replacement(s) in the given text.
  • Contextual Extraction: Extract the text, characters, words, and sentences to the left and right of a specified keyword, as well as the words between keywords.
  • Flexible Configuration: Customize the number of left and right characters, words, or sentences to tailor the extraction to your specific requirements.
  • Text Between Keywords: Extract the text between two occurrences of the same keyword, offering deeper insights into the context of your data.
  • Word Removal: Efficiently remove a list of specified words from texts, enhancing text cleanliness and relevance.
  • Dataframe Cleansing: Seamlessly remove unwanted words from text columns in Pandas DataFrames, ensuring data integrity.
  • Cell Positioning in DataFrame: Identify the row and column positions of a keyword within a Pandas DataFrame, enabling precise data manipulation.
  • Random Pattern Search: Check for a list of arbitrary patterns or regular expressions within the text data of a DataFrame.
  • Easy Integration: Integrate keytext into your Python projects effortlessly, enhancing your text processing and data cleansing workflows.

Installation Procedure

PyPI
pip install keytext

Dependencies:

  • Regex - Supports iterating over and finding keywords in texts and dataframes
  • Pandas - Supports working with dataframes

Functionalities (with parameters description):

keytext.keywords_occurrences(keywords, text)
- text (str): The input text
- keywords (str or list): The keyword or a list of keywords to count occurrences for
- Returns a dictionary mapping each keyword to its frequency in the text
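
The counting behaviour described above can be illustrated with a small standalone sketch built on Python's `re` module. This is not keytext's own implementation: the helper name `count_keywords` is hypothetical, and it assumes case-sensitive, non-overlapping substring matching, which may differ from what keytext does.

```python
import re

def count_keywords(keywords, text):
    """Hypothetical helper: count occurrences of each keyword in text."""
    if isinstance(keywords, str):
        keywords = [keywords]
    # re.escape makes each keyword match as literal text, not as a regex.
    # Assumption: case-sensitive, non-overlapping substring matches.
    return {kw: len(re.findall(re.escape(kw), text)) for kw in keywords}

print(count_keywords(["data", "text"], "text data and more data"))
# {'data': 2, 'text': 1}
```
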
keytext.replace_keywords(keywords, replacements, text)
- text (str): The input text
- keywords (str or list): The keyword or a list of keywords to be replaced
- replacements (str or list): The replacement string or a list of replacement strings corresponding to the keyword(s)
- Returns the text with replacements
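
As a rough illustration of pairing keywords with replacements, the sketch below performs all substitutions in a single pass so that a replacement is never replaced again by a later keyword. The name `replace_all` is hypothetical and the single-pass strategy is an assumption made here; keytext may apply replacements differently.

```python
import re

def replace_all(keywords, replacements, text):
    """Hypothetical helper: replace each keyword with its paired replacement."""
    if isinstance(keywords, str):
        keywords = [keywords]
    if isinstance(replacements, str):
        # Assumption: one replacement string applies to every keyword
        replacements = [replacements] * len(keywords)
    if len(keywords) != len(replacements):
        raise ValueError("keywords and replacements must have the same length")
    if not keywords:
        return text
    mapping = dict(zip(keywords, replacements))
    # Longest keywords first, so a longer keyword wins over its own prefix
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    return pattern.sub(lambda m: mapping[m.group(0)], text)

print(replace_all(["cat", "dog"], ["dog", "cat"], "cat chases dog"))
# dog chases cat
```
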
keytext.keypos_text(keyword, text)
- text (str): The input text
- keyword (str): The keyword to search for
- Returns every start and end position of the keyword in the text
- Output is a list of tuples
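
A list of (start, end) tuples like the one described above can be produced with `re.finditer`. The sketch below is independent of keytext; `keyword_positions` is a hypothetical name, and the end index is exclusive (Python slice convention), which is an assumption about the convention rather than a statement of keytext's behaviour.

```python
import re

def keyword_positions(keyword, text):
    """Hypothetical helper: (start, end) of every literal match, end exclusive."""
    return [m.span() for m in re.finditer(re.escape(keyword), text)]

positions = keyword_positions("ab", "ab cd ab")
print(positions)
# [(0, 2), (6, 8)]
```
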
keytext.extract_sents(keyword, text, format)
- text (str): The input text
- keyword (str): The keyword to search for in the sentences of the given text
- format (str): Defaults to l, which returns a list of sentences. Pass p to return the output as a paragraph.
- This function extracts all the sentences from the given text that contain the keyword
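
The list and paragraph output formats can be sketched as follows. This is a simplified stand-in rather than keytext's code: `sentences_with` is a hypothetical name, and the sentence splitter is deliberately naive (it splits after `.`, `!` or `?` followed by whitespace), so abbreviations such as "e.g." would be split incorrectly.

```python
import re

def sentences_with(keyword, text, fmt="l"):
    """Hypothetical helper: sentences containing keyword, as list or paragraph."""
    # Naive splitter: a sentence ends at ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = [s for s in sentences if keyword in s]
    # "p" joins the matching sentences into one paragraph string
    return " ".join(hits) if fmt == "p" else hits

sample = "Pandas is fast. Regex is flexible. Pandas reads CSV files."
print(sentences_with("Pandas", sample))
print(sentences_with("Pandas", sample, fmt="p"))
```
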
keytext.extract_words(keyword, text, left, right)
- text (str): The input text
- keyword (str): The keyword to search for in the given text
- left (int): The number of words to take from the left side of the keyword
- right (int): The number of words to take from the right side of the keyword
- This function extracts the words neighbouring the keyword in the given text.
- With left = 0, right = n it returns n words from the right side of the keyword
- With left = m, right = 0 it returns m words from the left side of the keyword
- With left = m, right = n it returns m words from the left and n words from the right of the keyword
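
The three cases above (right only, left only, both sides) reduce to list slicing once the text is split into words. The following sketch assumes a single-word keyword and whitespace tokenisation; `words_around` is a hypothetical name and the return shape (a list of `(left_words, right_words)` pairs) is chosen here for illustration only.

```python
def words_around(keyword, text, left=0, right=0):
    """Hypothetical helper: words to the left and right of each keyword hit."""
    words = text.split()
    results = []
    for i, word in enumerate(words):
        # Strip surrounding punctuation so "fox," still matches "fox"
        if word.strip(".,;:!?\"'()") == keyword:
            before = words[max(0, i - left):i]   # empty when left == 0
            after = words[i + 1:i + 1 + right]   # empty when right == 0
            results.append((before, after))
    return results

sentence = "the quick brown fox jumps over the lazy dog"
print(words_around("fox", sentence, left=2, right=1))
# [(['quick', 'brown'], ['jumps'])]
```
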
keytext.extract_chr(keyword, text, left_chr, right_chr)
- text (str): The input text
- keyword (str): The keyword to search for in the given text
- left_chr (int): The number of characters to take from the left side of the keyword
- right_chr (int): The number of characters to take from the right side of the keyword
- This function extracts the characters neighbouring the keyword in the given text.
- With left_chr = 0, right_chr = n it returns n characters from the right side of the keyword
- With left_chr = m, right_chr = 0 it returns m characters from the left side of the keyword
- With left_chr = m, right_chr = n it returns m characters from the left and n characters from the right of the keyword
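
Character windows work the same way, using match offsets instead of word indices. This sketch is not taken from keytext; `chars_around` is a hypothetical name and the `(left, right)` pair per match is an assumed output shape.

```python
import re

def chars_around(keyword, text, left_chr=0, right_chr=0):
    """Hypothetical helper: characters on each side of every keyword match."""
    results = []
    for m in re.finditer(re.escape(keyword), text):
        # max(0, ...) keeps the left window inside the string
        before = text[max(0, m.start() - left_chr):m.start()]
        after = text[m.end():m.end() + right_chr]
        results.append((before, after))
    return results

print(chars_around("key", "abckeydef", left_chr=2, right_chr=2))
# [('bc', 'de')]
```
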
keytext.left_texts(keyword, text, occurrence)
- text (str): The input text
- keyword (str): The keyword to search for in the given text
- occurrence (int or str): Which occurrence of the keyword to use. It should be 1, 2, ..., n, or 'all'
- This function returns the text to the left of the keyword, i.e. from the beginning of the text up to the keyword
- If 1 or 2 is passed as occurrence, it returns the text to the left of the 1st or 2nd occurrence of the keyword
- Returns the output as a list if occurrence is "all"
keytext.right_texts(keyword, text, occurrence)
- text (str): The input text
- keyword (str): The keyword to search for in the given text
- occurrence (int or str): Which occurrence of the keyword to use. It should be 1, 2, ..., n, or 'all'
- occurrence refers to the repetition of the keyword in the text
- This function returns the text to the right of the keyword, i.e. from the keyword to the end of the text
- If 1 is passed as occurrence, it returns the text to the right of the 1st occurrence of the keyword
- Returns the output as a list if occurrence is "all"
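
The left and right variants differ only in which side of each match is sliced, so one sketch can cover both. `side_texts` and its `side` argument are hypothetical and introduced only for this illustration; the 1-based `occurrence` follows the description above, and raising `ValueError` for an out-of-range occurrence is an assumption.

```python
import re

def side_texts(keyword, text, occurrence="all", side="left"):
    """Hypothetical helper: text left or right of the chosen keyword occurrence."""
    spans = [m.span() for m in re.finditer(re.escape(keyword), text)]
    if side == "left":
        parts = [text[:start] for start, _ in spans]
    else:
        parts = [text[end:] for _, end in spans]
    if occurrence == "all":
        return parts  # one entry per occurrence
    if not isinstance(occurrence, int) or not 1 <= occurrence <= len(parts):
        raise ValueError("occurrence must be 'all' or between 1 and the match count")
    return parts[occurrence - 1]  # occurrence is 1-based

print(side_texts("-", "a-b-c", "all", side="left"))
# ['a', 'a-b']
print(side_texts("-", "a-b-c", 1, side="right"))
# b-c
```
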
keytext.between_fixed_keyword(keyword, text)
- text (str): The input text
- keyword (str): The keyword that repeats in the given text
- Returns the parts of the text between two occurrences of the same keyword
- Output is a list
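
One simple way to collect the text between repeated occurrences of a keyword is to split on it and drop the first and last pieces. The sketch below assumes that consecutive occurrences are paired; `between_same` is a hypothetical name and keytext may pair occurrences differently.

```python
def between_same(keyword, text):
    """Hypothetical helper: text between consecutive occurrences of keyword."""
    # split() yields the pieces around each occurrence; the first piece lies
    # before the first occurrence and the last lies after the final one.
    return text.split(keyword)[1:-1]

print(between_same("|", "a|b|c|d"))
# ['b', 'c']
```
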
keytext.between_distinct_keywords(keyword_start, keyword_end, text, keyword_start_occurence, keyword_end_occurence)
- text (str): The input text
- keyword_start (str): The starting keyword
- keyword_end (str): The ending keyword, which should be different from the starting keyword
- keyword_start_occurence (int): Indicates which repetition of the starting keyword in the given string to use
- keyword_end_occurence (int): Indicates which repetition of the ending keyword in the given string to use
- Returns the part of the text between two distinct keywords
- Output is a list
- To get all snippets as a list, pass keyword_start_occurence = 0 and keyword_end_occurence = 0
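
The occurrence arguments, including the 0/0 case that returns every snippet, can be sketched as below. This is a guess at reasonable semantics, not keytext's implementation: `between_distinct` and its shortened parameter names are hypothetical, and in the 0/0 case each start keyword is paired with the nearest end keyword that follows it.

```python
import re

def between_distinct(start_kw, end_kw, text, start_occ=1, end_occ=1):
    """Hypothetical helper: text between a start keyword and an end keyword."""
    starts = [m.end() for m in re.finditer(re.escape(start_kw), text)]
    ends = [m.start() for m in re.finditer(re.escape(end_kw), text)]
    if start_occ == 0 and end_occ == 0:
        # Assumption: pair every start with the nearest end that follows it
        snippets = []
        for s in starts:
            following = [e for e in ends if e >= s]
            if following:
                snippets.append(text[s:following[0]])
        return snippets
    if not (1 <= start_occ <= len(starts) and 1 <= end_occ <= len(ends)):
        return []
    s, e = starts[start_occ - 1], ends[end_occ - 1]  # occurrences are 1-based
    return [text[s:e]] if s <= e else []

print(between_distinct("[", "]", "[x] and [y]", 0, 0))
# ['x', 'y']
```
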
keytext.text_keyword_remover(remover_list, text, replaced_by)
- text (str): The input text
- remover_list (list): List of keywords and regex patterns to be removed
- replaced_by (str): The string that replaces the unwanted keywords or patterns, such as a space (" ")
- Non-alphanumeric characters need to be written in regex format
- Returns the text after removing the unwanted keywords or patterns
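
The note about writing non-alphanumeric characters in regex format matters because each list entry is treated as a pattern: a literal `$` has to be written as `\$`. The sketch below shows the idea with the standard `re` module; `remove_terms` is a hypothetical name and this is not keytext's own code.

```python
import re

def remove_terms(remover_list, text, replaced_by=" "):
    """Hypothetical helper: replace every match of the listed patterns."""
    if not remover_list:
        return text
    # Each entry is a regex, so literal punctuation must be escaped (r"\$")
    pattern = re.compile("|".join(f"(?:{p})" for p in remover_list))
    # A function replacement keeps replaced_by literal, even with backslashes
    return pattern.sub(lambda m: replaced_by, text)

print(remove_terms([r"\$", r"\d+"], "pay $5 now", replaced_by=""))
# pay  now
```
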

keytext.text_pattern_finder(pattern_list, text)

- text (str): The input text
- pattern_list (list): List of regex patterns to search for within the text
- Returns each matched word with its location
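
A matched word with its location can be represented as a `(match, start, end)` tuple. That output shape is an assumption made for this sketch, and `find_patterns` is a hypothetical name rather than part of keytext.

```python
import re

def find_patterns(pattern_list, text):
    """Hypothetical helper: (matched_text, start, end) for every pattern match."""
    found = []
    for pattern in pattern_list:
        for m in re.finditer(pattern, text):
            found.append((m.group(0), m.start(), m.end()))
    return found

print(find_patterns([r"\d{4}", r"[A-Z]{2}"], "ID AB1234"))
# [('1234', 5, 9), ('ID', 0, 2), ('AB', 3, 5)]
```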

keytext.keypos_df(keyword, dataframe)

- dataframe (dataframe): The input table
- keyword (str): The keyword to search for in the dataframe
- Returns the cell positions of every occurrence of the keyword in the given dataframe
- Output is a list of tuples
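
Locating a keyword in a table amounts to scanning every cell and recording where it appears. The sketch below uses pandas directly and returns `(row_label, column_label)` tuples; whether keytext reports labels or integer positions is not stated above, so that choice, like the name `keyword_cells`, is an assumption.

```python
import pandas as pd

def keyword_cells(keyword, dataframe):
    """Hypothetical helper: (row_label, column_label) of cells containing keyword."""
    hits = []
    for row_label, row in dataframe.iterrows():
        for col_label, value in row.items():
            # str() lets numeric and missing cells be searched as text
            if keyword in str(value):
                hits.append((row_label, col_label))
    return hits

df = pd.DataFrame({"name": ["red apple", "banana"], "note": ["fresh", "apple pie"]})
print(keyword_cells("apple", df))
# [(0, 'name'), (1, 'note')]
```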

keytext.dataframe_keyword_remover(remover_list, dataframe, replaced_by)

- dataframe (dataframe): The input table
- remover_list (list): List of keywords and regex patterns to be removed
- replaced_by (str): The string that replaces the unwanted keywords or patterns, such as a space (" ")
- This function removes the keywords from the dataframe
- Non-alphanumeric characters need to be written in regex format
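
The same pattern-based removal can be applied cell by cell to a DataFrame. In this sketch only string cells are changed and the input DataFrame is left untouched; both choices are assumptions, and `clean_dataframe` is a hypothetical name rather than keytext's function.

```python
import re
import pandas as pd

def clean_dataframe(remover_list, dataframe, replaced_by=" "):
    """Hypothetical helper: replace pattern matches in every string cell."""
    if not remover_list:
        return dataframe.copy()
    pattern = re.compile("|".join(f"(?:{p})" for p in remover_list))

    def clean(value):
        # Non-string cells (numbers, NaN) pass through unchanged
        if isinstance(value, str):
            return pattern.sub(lambda m: replaced_by, value)
        return value

    cleaned = dataframe.copy()  # leave the caller's DataFrame untouched
    for col in cleaned.columns:
        cleaned[col] = cleaned[col].map(clean)
    return cleaned

df = pd.DataFrame({"text": ["a#b", "c"], "n": [1, 2]})
print(clean_dataframe(["#"], df, replaced_by="-"))
```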

keytext.dataframe_pattern_finder(pattern, dataframe)

- dataframe (dataframe): The input table
- pattern (str): List of regex patterns to search for within the dataframe
- This function finds the listed regex patterns in the dataframe
- Returns each matched word with its cell identity
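
Searching a DataFrame for regex patterns combines the two ideas above: scan each cell and record every match with the cell it came from. The `(match, row_label, column_label)` tuple is an assumed representation of "matched word with cell identity", and `find_in_cells` is a hypothetical name.

```python
import re
import pandas as pd

def find_in_cells(patterns, dataframe):
    """Hypothetical helper: (matched_text, row_label, column_label) per match."""
    if isinstance(patterns, str):
        patterns = [patterns]
    hits = []
    for row_label, row in dataframe.iterrows():
        for col_label, value in row.items():
            for pattern in patterns:
                for m in re.finditer(pattern, str(value)):
                    hits.append((m.group(0), row_label, col_label))
    return hits

df = pd.DataFrame({"a": ["x1", "y"], "b": ["z22", "w3"]})
print(find_in_cells(r"\d+", df))
# [('1', 0, 'a'), ('22', 0, 'b'), ('3', 1, 'b')]
```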

Contributing to keytext

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome. Feel free to ask questions on the mailing list.

Change Log

0.1 (03/01/2024)

  • First Release

0.2 (03/01/2024)

  • Second Release

0.3 (03/01/2024)

  • Third Release

0.4 (04/01/2024)

  • Fourth Release

0.5 (24/01/2024)

  • Fifth Release
