htmless

CLI tool to clean and minify HTML by removing scripts, styles, and attributes — optimized for LLM input.

Version: 1.0.0 (latest)
Source: npm
Weekly downloads: 2
Maintainers: 1


Lighten your HTML input. Keep the meaning, ditch the weight.

🧠 What is it?

htmless is a minimalist CLI tool that strips HTML down to the bone — removing unnecessary scripts, styles, attributes, and utility classes. The result is a clean, minified HTML output, ideal for feeding into LLMs where every token counts.

🤔 Why was it created?

I needed to extract semantically valuable content from HTML pages and send it to AI models. But raw HTML is full of bloat — especially utility classes from frameworks like Tailwind, inline styles, scripts, and other things that eat tokens without adding real value.

The goals were simple:

  • Preserve document structure – headings, paragraphs, text emphasis
  • Keep href attributes on <a> tags – they carry semantic meaning and useful context
  • Eliminate noise
  • Make it fast, simple, and automatable
  • Follow the Unix philosophy — do one thing and do it well

🔧 Installation

pnpm add -g htmless
# or
npm install -g htmless

🚀 Usage

cat input.html | htmless

Use it in a bash pipeline, before LLM processing, or to clean up WYSIWYG HTML exports.

💡 Example

Input:

<div class="bg-white p-4 text-sm text-gray-700">
  <h1 class="text-3xl font-bold">Welcome</h1>
  <p>This is a <strong>test</strong>.</p>
  <script>alert('Hi')</script>
  <style>body { background: red; }</style>
</div>

Output:

<div><h1>Welcome</h1><p>This is a <strong>test</strong>.</p></div>
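
To get a rough feel for the savings in this example, the two snippets can be compared by length. The sketch below is hypothetical: the helper `reductionPercent` and the variable names are not part of htmless, and it assumes character count as a loose proxy for token count, which real tokenizers will not match exactly.

```typescript
// Input and output copied from the example above.
const rawHtml: string = [
  '<div class="bg-white p-4 text-sm text-gray-700">',
  '  <h1 class="text-3xl font-bold">Welcome</h1>',
  '  <p>This is a <strong>test</strong>.</p>',
  "  <script>alert('Hi')</script>",
  '  <style>body { background: red; }</style>',
  '</div>',
].join("\n");

const cleanedHtml: string =
  "<div><h1>Welcome</h1><p>This is a <strong>test</strong>.</p></div>";

// Assumption: characters are only a rough stand-in for tokens.
// Returns the size reduction as a whole-number percentage.
function reductionPercent(before: string, after: string): number {
  if (before.length === 0) return 0; // avoid dividing by zero
  return Math.round((1 - after.length / before.length) * 100);
}

console.log(`${rawHtml.length} -> ${cleanedHtml.length} characters`);
console.log(`${reductionPercent(rawHtml, cleanedHtml)}% smaller`);
```

For an actual token count, the same comparison would need the tokenizer of the target model.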

🛠️ What gets removed?

  • all HTML attributes (class, id, style, data-*, etc.), except href on <a>, which is preserved
  • <script> and <style> blocks
  • HTML comments and redundant whitespace
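
The rules above can be approximated in a few lines of TypeScript. This is only an illustrative sketch: htmless itself is built on htmlparser2, and the function name `stripHtml` and the regex-based approach here are assumptions made for the example, not the tool's actual code. Regexes cope with simple markup like the example but can misfire on edge cases, such as a ">" inside a quoted attribute value.

```typescript
// Hypothetical sketch of the removal rules. NOT the htmless source.
function stripHtml(html: string): string {
  return html
    // 1. Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, "")
    // 2. Drop <script> and <style> blocks together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // 3. Drop attributes from opening tags, keeping only href on <a>.
    .replace(
      /<([a-zA-Z][\w-]*)([^>]*)>/g,
      (_match: string, tag: string, attrs: string): string => {
        if (tag.toLowerCase() === "a") {
          const href = /(?:^|\s)href\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/i.exec(attrs);
          if (href) return `<${tag} href=${href[1]}>`;
        }
        return `<${tag}>`;
      }
    )
    // 4. Collapse whitespace that sits between two tags.
    .replace(/>\s+</g, "><")
    .trim();
}

// Usage: attributes are dropped, but href on <a> survives.
console.log(stripHtml('<a class="btn" href="/docs">Docs</a>'));
// prints: <a href="/docs">Docs</a>
```

A real parser, as htmless uses, is the safer choice for arbitrary pages; the sketch is meant only to make the four rules concrete.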

🔎 Who is this for?

  • developers working with LLMs and prompt engineering
  • anyone who needs to get meaningful content from HTML without the fluff
  • scripting, scraping, automation pipelines

🧪 Tech info

  • built on top of htmlparser2 — fast and robust
  • outputs valid HTML (not plaintext)
  • written in TypeScript, clean CLI with commander

🧘 Philosophy

Less is more. Tokens are expensive. htmless helps LLMs process content, not the wrapper.

👤 Author

Made with ❤️ by BroJor

📄 License

ISC License © 2025 BroJor

Keywords

html

Package last updated on 15 Apr 2025
