Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

text-chunker

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

text-chunker

A Python class that takes in long text as input and divides it into shorter chunks no longer than a specified length.

  • 0.2.2
  • PyPI
  • Socket score

Maintainers
1

TextChunker

TextChunker is a Python class that takes in long text as input and divides it into shorter chunks no longer than a specified length. The purpose of this project is to provide a simple and useful tool for data processing tasks such as natural language processing and information extraction.

Install

pip intsll text_chunker

Usage

You can use the TextChunker class in your Python code as follows:

from text_chunker import TextChunker

# Create a new TextChunker object with a maximum chunk length of 50 characters
chunker = TextChunker(maxlen=1000)

# Chunk a long text string into smaller chunks
text = "This is a long text string..."
for chunk in chunker.chunk(text):
    print(chunk)

The chunk method attempts to split paragraphs first while keeping chunk length below maxlen. If a paragraph is longer than maxlen, the method attempts to split the paragraph into sentences. If a sentence is longer than maxlen, it is split into smaller chunks no longer than maxlen.

There are also functions called paragraphs and sentences that divide the text into paragraphs and sentences, respectively.

from text_chunker import paragraphs

for p in paragraphs(text):
    print(p)
from text_chunker import sentences

for s in sentences(text):
    print(s)

The sentences function utilizes a tokenizer from the nltk library.

License

This project is distributed under the MIT license. See the LICENSE file for details.

FAQs


Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc