Real-time processing and delivery of sentences from a continuous stream of characters or text chunks.
Hint: If you're interested in state-of-the-art voice solutions you might also want to have a look at Linguflex, the original project from which stream2sentence is spun off. It lets you control your environment by speaking and is one of the most capable and sophisticated open-source assistants currently available.
pip install stream2sentence
Pass a generator of characters or text chunks to generate_sentences()
to get a generator of sentences in return.
Here's a basic example:
from stream2sentence import generate_sentences

# Dummy generator for demonstration
def dummy_generator():
    yield "This is a sentence. And here's another! Yet, "
    yield "there's more. This ends now."

for sentence in generate_sentences(dummy_generator()):
    print(sentence)
This will output:
This is a sentence.
And here's another!
Yet, there's more.
This ends now.
One main use case of this library is to enable fast text-to-speech synthesis for character feeds generated by large language models: it gives the fastest possible access to a complete sentence, or to a sentence fragment when the quick_yield_single_sentence_fragment flag is set, which can then be synthesized in real time. This usage is demonstrated in the test_stream_from_llm.py file in the tests directory.
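The LLM-to-speech use case can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from test_stream_from_llm.py: fake_llm_tokens(), speak(), and stream_to_speech() are hypothetical names introduced for illustration, standing in for a real LLM stream and a real text-to-speech engine. Only generate_sentences() and the quick_yield_single_sentence_fragment flag come from the library.

```python
from typing import Iterator


def fake_llm_tokens(text: str, chunk_size: int = 4) -> Iterator[str]:
    """Hypothetical stand-in for an LLM stream: yields the text in small chunks."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


def speak(sentence: str) -> None:
    """Hypothetical placeholder for a text-to-speech call."""
    print(f"[TTS] {sentence}")


def stream_to_speech(chunks: Iterator[str]) -> None:
    """Hand each sentence (or early fragment) to speak() as soon as it is ready."""
    # Imported inside the function so the helpers above can be used
    # without the package installed.
    from stream2sentence import generate_sentences

    for sentence in generate_sentences(
        chunks,
        quick_yield_single_sentence_fragment=True,  # flag named in the text above
    ):
        speak(sentence)
```

With stream2sentence installed, calling stream_to_speech(fake_llm_tokens("Streaming works well. Each sentence is spoken once it is complete.")) should pass text to speak() piece by piece while the stream is still being consumed. The exact fragment boundaries depend on the library's settings.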
The generate_sentences() function offers various parameters to fine-tune its behavior:
generator: Iterator[str]
context_size: int = 12
context_size_look_overhead: int = 12 (relates to context_size for sentence splitting)
minimum_sentence_length: int = 10
minimum_first_fragment_length: int = 10
These parameters control how quickly and frequently the generator yields sentence fragments:
quick_yield_single_sentence_fragment: bool = False
quick_yield_for_all_sentences: bool = False (also sets quick_yield_single_sentence_fragment to True)
quick_yield_every_fragment: bool = False (also sets quick_yield_for_all_sentences and quick_yield_single_sentence_fragment to True)
cleanup_text_links: bool = False
cleanup_text_emojis: bool = False
tokenize_sentences: Callable = None (see tokenizer)
tokenizer: str = "nltk"
language: str = "en"
log_characters: bool = False
sentence_fragment_delimiters: str = ".?!;:,\n…)]}。-"
full_sentence_delimiters: str = ".?!\n…。"
force_first_fragment_after_words: int = 15
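To make the role of a delimiter set more concrete, here is a deliberately naive, pure-Python illustration that cuts a chunk stream at the default full_sentence_delimiters characters listed above. It is not the library's algorithm: naive_sentences() is a hypothetical helper, and it ignores context, minimum lengths, and tokenizers entirely.

```python
from typing import Iterable, Iterator

# Default value copied from the parameter list above.
FULL_SENTENCE_DELIMITERS = ".?!\n…。"


def naive_sentences(
    chunks: Iterable[str],
    delimiters: str = FULL_SENTENCE_DELIMITERS,
) -> Iterator[str]:
    """Yield buffered text each time a delimiter character arrives (illustration only)."""
    buffer = ""
    for chunk in chunks:
        for char in chunk:
            buffer += char
            if char in delimiters:
                sentence = buffer.strip()
                if sentence:
                    yield sentence
                buffer = ""
    # Flush whatever is left when the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail


chunks = ["This is a sentence. And here's another! Yet, ", "there's more. This ends now."]
for sentence in naive_sentences(chunks):
    print(sentence)
```

A splitter this simple breaks on inputs such as "3.14" or "e.g.", which is one reason to rely on context and a proper sentence tokenizer instead.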
Any contributions you make are welcome and greatly appreciated.
git checkout -b feature/AmazingFeature
git commit -m 'Add some AmazingFeature'
git push origin feature/AmazingFeature
This project is licensed under the MIT License. For more details, see the LICENSE file.
Project created and maintained by Kolja Beigel.