Normalize the text before natural language processing
Process FileMaker Pro Advanced's Database Design Report (DDR) to produce textual representations of the design objects for use with version control systems, text editors, etc.
Process text and propose tonality.
AlchemyLanguage is a collection of APIs that offer text analysis through natural language processing.
A Ruby interface to the Enrycher text-processing API
git-scribe is a workflow tool for starting, writing, reviewing and publishing multiple forms of a book. It allows you to use AsciiDoc plain-text markup to write, review and translate a work, and provides a simple toolkit for generating common digital outputs for publishing: EPUB, MOBI, PDF and HTML. It is also integrated with GitHub functionality, letting you automate the publishing and collaboration process.
agenndy is a minimal text-based activity log (or personal agenda). It takes a text file which follows some very basic (but strict) rules and turns it into a CSV file (which includes times, activities and hours spent for each activity) suited for further processing. For the schema of the text-based agenda, check out the examples/ directory.
This Ruby gem leverages machine learning (ML) techniques to make predictions (forecasts) and classifications in various applications. It provides capabilities such as predicting next month's billing, forecasting upcoming sales orders, identifying a patient's potential findings (such as diabetes), determining user approval status, classifying text, generating similarity scores, and making recommendations. It uses Python 3 under the hood, powered by popular machine learning techniques including NLP (Natural Language Processing), Decision Tree, K-Nearest Neighbors, Logistic Regression, Random Forest and Linear Regression.
NoAccent is a Ruby gem designed to remove diacritic accents from text, providing cleaner and simpler text processing.
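One common way to strip diacritics in plain Ruby is sketched below. It is not NoAccent's API; the helper name `strip_accents` is hypothetical. It assumes UTF-8 input and relies on Unicode NFD decomposition, so letters that have no decomposed form (for example "ø" or "ß") pass through unchanged.

```ruby
# Hypothetical helper, for illustration only (not part of the NoAccent gem).
def strip_accents(text)
  # NFD splits an accented letter into a base letter plus combining marks,
  # e.g. "é" becomes "e" followed by U+0301. Dropping the marks (\p{Mn})
  # leaves only the base letters.
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts strip_accents("Crème brûlée à São Paulo")
# prints: Creme brulee a Sao Paulo
```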
A library to create text processing pipelines.
This is a small class library of useful text processing routines I tend to use on my personal projects.
A simple multilingual tokenizer for NLP tasks. This tool provides a CLI and a library for linguistic tokenization, which is an unavoidable step for many HLT (human language technology) tasks in the preprocessing phase for further syntactic, semantic and other higher-level processing goals. Use it for tokenization of German, English and French texts.
An encouraging process wrapper that texts a ship-it squirrel on completion.
Base32 is one of several base 32 transfer encodings. Base32 uses a 32-character set comprising the twenty-six upper-case letters A–Z, and the digits 2–7. Base32 is primarily used to encode binary data, but Base32 is also able to encode binary text like ASCII. Base32 is a notation for encoding arbitrary byte data using a restricted set of symbols that can be conveniently used by humans and processed by computers. Base32 consists of a symbol set made up of 32 different characters, as well as an algorithm for encoding arbitrary sequences of 8-bit bytes into the Base32 alphabet. Because more than one 5-bit Base32 symbol is needed to represent each 8-bit input byte, it also specifies requirements on the allowed lengths of Base32 strings (which must be multiples of 40 bits). The closely related Base64 system, in contrast, uses a set of 64 symbols.
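The encoding described above can be sketched in a few lines of Ruby. This is a minimal illustration, not this gem's API: the names `BASE32_ALPHABET` and `base32_encode` are hypothetical, and the `=` padding to a multiple of 8 symbols (40 bits) follows RFC 4648, which the description does not name explicitly.

```ruby
# Alphabet from the description: A-Z followed by the digits 2-7.
BASE32_ALPHABET = (("A".."Z").to_a + ("2".."7").to_a).freeze

# Hypothetical helper: encode a byte string as padded Base32.
def base32_encode(bytes)
  # One long bit string, 8 bits per input byte, most significant bit first.
  bits = bytes.unpack1("B*")
  # Pad with zero bits so the length is a multiple of 5.
  bits += "0" * ((5 - bits.length % 5) % 5)
  # Each 5-bit group indexes one symbol of the alphabet.
  encoded = bits.scan(/.{5}/).map { |chunk| BASE32_ALPHABET[chunk.to_i(2)] }.join
  # Pad the output with "=" to a multiple of 8 symbols (40 bits).
  encoded + "=" * ((8 - encoded.length % 8) % 8)
end

puts base32_encode("foobar")
# prints: MZXW6YTBOI======
```

Five input bytes (40 bits) map exactly onto eight symbols, which is why the output length is always a multiple of 8.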
Allows simple processing of Russian strings: transliteration, numerals as text and HTML beautification
Process text and calculate RAKE (Rapid Automatic Keyword Extraction) scores.
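Assuming RAKE here means Rapid Automatic Keyword Extraction, the scoring can be sketched as follows. This is not the gem's implementation: the stopword list is a tiny illustrative one and the name `rake_scores` is hypothetical. Candidate phrases are the runs of words between stopwords and punctuation; each word scores degree/frequency, and a phrase scores the sum of its words.

```ruby
# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = %w[a an and are as at be by for from in is it of on or that the this to with].freeze

# Hypothetical helper: returns { "phrase" => score }.
def rake_scores(text, stopwords: STOPWORDS)
  phrases = []
  # Punctuation always ends a candidate phrase.
  text.downcase.split(/[^[:alnum:]\s]+/).each do |fragment|
    current = []
    fragment.split.each do |word|
      if stopwords.include?(word)
        # A stopword also ends the current candidate phrase.
        phrases << current unless current.empty?
        current = []
      else
        current << word
      end
    end
    phrases << current unless current.empty?
  end

  freq = Hash.new(0)    # how often each word occurs in candidate phrases
  degree = Hash.new(0)  # total length of the phrases each word occurs in
  phrases.each do |phrase|
    phrase.each do |word|
      freq[word] += 1
      degree[word] += phrase.length
    end
  end

  phrases.uniq.to_h do |phrase|
    [phrase.join(" "), phrase.sum { |word| degree[word].to_f / freq[word] }]
  end
end

p rake_scores("Compatibility of systems of linear constraints")
```

Longer phrases tend to score higher because every word in them gets a larger degree.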
A Jekyll plugin that automatically converts in-text references ([name](url)) to Wikipedia-style superscripted numbered references and generates a Reference section at the end of the document. Only documents updated after a configurable date are processed.
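The conversion described above might look roughly like this in plain Ruby. It is a sketch, not the plugin's code: the helper name `number_references`, the `<sup>[n]</sup>` markup and the "References" heading are assumptions about the output format, and the date-based filtering is left out.

```ruby
# Hypothetical helper: rewrite [name](url) links as numbered references.
def number_references(markdown)
  refs = []
  # Match [name](url) but skip images, which start with "!".
  body = markdown.gsub(/(?<!!)\[([^\]]+)\]\(([^)\s]+)\)/) do
    name = Regexp.last_match(1)
    url = Regexp.last_match(2)
    # Reuse the number when the same URL is cited again.
    index = refs.index { |ref| ref[:url] == url }
    unless index
      refs << { name: name, url: url }
      index = refs.length - 1
    end
    "#{name}<sup>[#{index + 1}]</sup>"
  end
  return body if refs.empty?

  list = refs.each_with_index.map { |ref, i| "#{i + 1}. [#{ref[:name]}](#{ref[:url]})" }
  "#{body}\n\n## References\n\n#{list.join("\n")}\n"
end

puts number_references("See [Ruby](https://www.ruby-lang.org) docs.")
```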
A comprehensive Ruby gem that handles document processing, text extraction, and AI-powered analysis for PDF, Word, Excel, PowerPoint, images, archives, and more with a unified API. Includes agentic AI features for document analysis, summarization, and intelligent extraction.
== What's this?

{ComicFury}[https://comicfury.com] is an excellent no-bullshit webcomic hosting site created and maintained by the legend Kyo. You should support them on {Patreon}[https://www.patreon.com/comicfury]!

{Jekyll}[https://jekyllrb.com] is a highly regarded and widespread static site generator. It builds simple slowly-changing content into HTML files using templates.

RageRender allows you to use your ComicFury templates to generate a static version of your webcomic site using Jekyll. You just supply your templates, comics and blogs, and RageRender will output a site that mimics your ComicFury site.

Well, I say "mimics". Output is a static site, which means all of the interactive elements of ComicFury don't work. This includes comments, subscriptions, search, and comic management.

=== But why?!

RageRender allows those of us who work on making changes to ComicFury site templates to test our changes before we put them live.

With RageRender, you can edit your CSS, HTML templates and site settings before you upload them to ComicFury. This makes the process of testing changes quicker and makes it much more likely that you catch mistakes before any comic readers have a chance to see them.

RageRender doesn't compete with the most excellent ComicFury (whose Patreon you should contribute to, as I do!) – you should continue to use ComicFury for all your day-to-day artistic rage management needs. But if you find yourself making changes to a site design, RageRender may be able to help you.

== Getting started

First, you need to have {Ruby}[https://www.ruby-lang.org/] and {Bundler}[https://bundle.io/] installed. The Jekyll site has {good guides on how to do that}[https://jekyllrb.com/docs/installation/] depending on your operating system.

To set up a new site, open a terminal and type:

  mkdir mycomic && cd mycomic
  bundle init
  bundle add jekyll
  bundle add ragerender

Now you can add comics!
Add the image into an <tt>images</tt> folder:

  mkdir images
  cp 'cool comic.jpg' 'images/My first page.jpg'

The file name of the image will be the title of your comic page.

And that's it, you added your first comic!

If you want to add an author note, create a text file in a folder called <tt>_comics</tt> that has the same file name, but with a <tt>.md</tt> extension:

  mkdir _comics
  echo "Check out my cool comic y'all!" > '_comics/My first page.md'

Generate the site using:

  bundle exec jekyll build

Or start a local website to see it in your browser:

  bundle exec jekyll serve
  # Now visit http://localhost:4000!

=== Customising your site

You'll notice a few things that might be off about your site, including that the webcomic title and author name are probably not what you were expecting.

You can create a configuration file to tell RageRender the important details. Put something like this in your webcomic folder and call it <tt>_config.yml</tt>:

  title: "My awesome webcomic!"
  slogan: "It's the best!"
  description: >
    My epic story about how him and her fell into a romantic polycule with they and them

  defaults:
    - scope:
        path: ''
      values:
        author: "John smith"

  theme: ragerender

Your webcomic now has its basic information set up.

=== Adding your layouts

If you want to use your own layout code, then create a <tt>_layouts</tt> directory and put the contents of each of your ComicFury layout tabs in there, and then put your CSS in the main folder. You should end up with a full set of files like:

  _layouts
    archive.html
    blog-archive.html
    blog-display.html
    comic-page.html
    error-page.html
    overall.html
    overview.html
    search.html
  layout.css

Now when you build your site, your custom templates and styles will be used instead.

=== Adding blogs

Add your blogs into a folder called `_posts`:

  cat _posts/2025-05-29-my-new-comic.md
  Hey guys, welcome to my new comic! It's gonna be so sick!

Note that the name of your blog post has to include the date and the title, or it'll be ignored.

=== Customising comics and blogs

You can add {Front Matter}[https://jekyllrb.com/docs/front-matter/] to set the details of your author notes and blogs manually:

  ---
  title: "spooky comic page"
  date: "2025-03-05 16:20"
  image: "images/ghost.png"
  author: "Jane doe"
  custom:
    # use yes and no for tickbox settings
    spooky: yes
    # use text in quotes for short texts
    mantra: "live long and prosper"
    # use indented text for long texts
    haiku: >
      Testing webcomics
      Now easier than ever
      Thanks to RageRender
  comments:
    - author: "Skippy"
      date: "13 Mar 2025, 3.45 PM"
      comment: "Wow this is so sick!"
  ---
  Your author note still goes at the end, like this!

=== Adding extra pages

You can add extra pages just by adding new HTML files to your webcomic folder. The name of the file becomes the URL that it will use.

Pages by default won't be embedded into your 'Overall' layout. You can change that and more with optional Front Matter:

  ---
  # Include this line to set the page title
  title: "Bonus content"
  # Include this line to hide the page from the navigation menu
  hidden: yes
  # Include this line to embed this page in the overall layout
  layout: Overall
  ---
  <h1>yo check out my bonus content!</h1>

=== Controlling the front page

As on ComicFury, you have a few options for setting the front page of your site. You control this by setting a <tt>frontpage</tt> key in your site config.

- <tt>latest</tt> will display the latest comic (also the default)
- <tt>first</tt> will display the first comic
- <tt>chapter</tt> will display the first comic in the latest chapter
- <tt>blog</tt> will display the list of blog posts
- <tt>archive</tt> will display the comic archive
- <tt>overview</tt> will display the comic overview (blogs and latest page)
- anything else will display the extra page that has the matching <tt>slug</tt> in its Front Matter

=== Stuff that doesn't work

Here is a probably incomplete list of things you can expect to be different about your local site compared to ComicFury:

- Any comments you specify in Front Matter will be present, but you can't add new ones
- Search doesn't do anything at all
- Saving and loading your place in the comic isn't implemented
- GET and POST variables in templates are ignored and will always be blank
- Random numbers in templates will be random only once per site build, not once per page call

== Without Jekyll

RageRender can also be used without Jekyll to turn ComicFury templates into templates in other languages.

E.g:

  gem install ragerender
  echo "[c:iscomicpage]<div>[f:js|v:comictitle]</div>[/]" > template.html
  ruby $(gem which ragerender/to_liquid) template.html
  # {% if iscomicpage %}<div>{{ comictitle | escape }}</div>{% endif %}
  ruby $(gem which ragerender/to_erb) template.html
  # <% if iscomicpage %><div><%= js(comictitle) %></div><% end %>

You still need to pass the correct variables to these templates; browse {this unofficial documentation}[https://github.com/heyeinin/comicfury-documentation] or RageRender::ComicDrop etc. to see which variables work on which templates.

== Get help

That's not a proclamation but an invitation! Reach out if you're having trouble by {raising an issue}[https://github.com/simonwo/ragerender/issues] or posting in the ComicFury forums.
A library for text-to-speech functionality designed for AI applications and natural language processing, with the ability to detect and adapt to the operating system (Windows, macOS, Linux)
Gitingest is a powerful command-line tool that fetches files from GitHub repositories and generates consolidated text prompts for AI analysis. It features smart file filtering, concurrent processing, custom exclusion patterns, authentication support, and automatic rate limit handling. Perfect for creating context-rich prompts from codebases for AI assistants, documentation generation, or codebase analysis.
The app provides a command-line interface (CLI) to an Ollama AI model, allowing users to engage in text-based conversations and generate human-like responses. Users can import data from local files or web pages, which are then processed through three different modes: fully importing the content into the conversation context, summarizing the information for concise reference, or storing it in an embedding vector database for later retrieval based on the conversation.
This gem extends the internationalization (i18n) functionality in Ruby on Rails by adding a simple toggle button to your application’s UI. When activated, it overlays each translated string with a tooltip that displays its associated translation key on hover. This allows QA teams, product managers, and other non-technical stakeholders to quickly identify which translation keys correspond to specific pieces of text on the page—without diving into source code. Designed for simplicity and ease-of-use, the gem integrates seamlessly with the Rails asset pipeline and can be easily enabled or disabled as needed, streamlining your localization quality assurance process.
The Ruby library, Documentrix, is designed to provide a way to build and query vector databases for applications in natural language processing (NLP) and large language models (LLMs). It allows users to store and retrieve dense vector embeddings for text strings.
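The store-and-retrieve idea can be illustrated with a toy in-memory store ranked by cosine similarity. This is not Documentrix's API: `TinyVectorStore`, `add` and `nearest` are hypothetical names, and the two-dimensional vectors are hand-made stand-ins for embeddings that would normally come from an embedding model.

```ruby
# Cosine similarity between two equal-length numeric vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  dot / (norm.call(a) * norm.call(b))
end

# Hypothetical in-memory store mapping text strings to embeddings.
class TinyVectorStore
  def initialize
    @entries = {}
  end

  def add(text, embedding)
    @entries[text] = embedding
  end

  # Return the k stored texts whose embeddings are most similar to the query.
  def nearest(query, k: 1)
    @entries.max_by(k) { |_text, embedding| cosine_similarity(query, embedding) }
            .map(&:first)
  end
end

store = TinyVectorStore.new
store.add("cat", [1.0, 0.0])
store.add("dog", [0.9, 0.1])
store.add("car", [0.0, 1.0])
p store.nearest([1.0, 0.0], k: 2)
# prints: ["cat", "dog"]
```

A linear scan like this is fine for small collections; dedicated vector databases add indexing so that lookups stay fast as the collection grows.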
WatermarkToPdf is a Ruby gem that allows you to add text-based watermarks to PDF files. It uses the MiniMagick library to process PDF pages as images, apply customizable watermark text, and then converts the watermarked pages back into a single PDF. The gem supports setting different watermark text, font, size, position, and color for each page, providing a flexible solution for adding professional watermarks to your PDF documents.
=== What is GptHelpr?

It is sometimes necessary to provide context and explanations for your code. Instead of manually copying and formatting code snippets, GPT-Helpr automates the process with an interactive CLI, generating well-structured Markdown output, which can be copied to your clipboard or printed to a file.

=== Example Usage

  # note lmk is an alias for gpt_helpr -i -ln
  $ lmk
  == 🏴‍☠️ GptHelpr 0.2.3 ==
  Helping to dig your codebase and cook GPT-XX instructions
  [current directory /Users/etozzato/WorkSpace/_AINZ/pizzatarians.com]
  File Path (optional :start:end): TAB ->
  favicon.ico hey.md js random-acts-of-pizza.md _config.yml _site draft fonts
  images kneading-baking-academy.md _exe academy favicon.gif hands-in-dough.md
  index.md parties-and-events.md
  File Path (optional :start:end): hey.md 1:22
  Instructions: can you improve this text? Do you see any issues with the template?
  File Path (optional :start:end):

  # this is the generated output (also copied to the clipboard)
  ==== file source `hey.md 1:22`
  1: ---
  2: title: Hey, hello!
  3: layout: default
  4: ---
  5:
  6: # {{ page.title }}
  7: ----
  8:
  9: <div class="row">
  10: <div class="col-md-12">
  11: <p class='justin'>
  12: Nice to meet you, I am *Mek*!
  13: </p>
  14: <p class='listo'>
  15: I am a self-proclaimed pizza guru and I am here to teach & learn. Originally from Venice, Italy you can find me in San Diego, CA.
  16: </p>
  17: <p class='listo'>
  18: In my spare time, I write code @ PlayStation!
  19: </p>
  20: </div>
  21: </div>
  22:
  can you improve this text? Do you see any issues with the template?
  ==== end of `hey.md`
Goethe - Text processing library.
a text processing command-line tool that is driven by Ruby's `#each_line`
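As a reminder of what `#each_line` offers, here is a small standalone sketch; it does not reproduce this tool's interface, and the helper name `number_nonblank_lines` is hypothetical. It works on any IO-like object, so a command-line script could pass `ARGF` or `$stdin` instead of the `StringIO` used here. It needs Ruby 2.7 or later for `filter_map`.

```ruby
require "stringio"

# Hypothetical helper: number the non-blank lines of an IO-like object.
def number_nonblank_lines(io)
  count = 0
  # each_line(chomp: true) yields each line without its trailing newline.
  io.each_line(chomp: true).filter_map do |line|
    next if line.strip.empty? # filter_map drops the nil returned here
    count += 1
    format("%3d  %s", count, line)
  end
end

puts number_nonblank_lines(StringIO.new("alpha\n\nbeta\n"))
```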
A smart, static site generator that automatically manages dependencies to achieve blazing build times with minimal cognitive load. Only new and changed files, and files upstream of a changed dependency, are processed. Renders markdown or embedded-Ruby (Erb-like) content as HTML. Supports templates (embedded & layout), which may be included within content sources or other templates. Document metadata may be added using a plain-text preamble of key-value pairs. Generates a complete website that can be served by the built-in WEBrick server.
Hunyuan is a Ruby gem designed to simplify the integration of the Hunyuan API for chat completions into your Ruby applications. With Hunyuan, you can effortlessly add natural language processing capabilities, enabling your applications to provide intelligent responses to user queries. Whether you're building chatbots, virtual assistants, or any other application that requires text-based interactions, Hunyuan streamlines the process and empowers your Ruby code with advanced chat completion features.
Wikipedia articles are infamous for being heavily referenced. One article can suddenly become a rabbit hole where you start clicking on other links and soon get lost in the process. This Ruby script converts a Wikipedia URL into simple text. You will have all that you need without any references. Removing references also comes in handy when you are plugging the text into a text-to-audio converter.
The ruby-amazon-bedrock gem offers Ruby developers an efficient and user-friendly interface to Amazon Bedrock, an AWS service for AI-driven text and image generation. This gem simplifies the process of connecting to Amazon Bedrock's APIs, enabling developers to easily harness the capabilities of advanced machine learning models for generating high-quality text and images.