html-text-extractor

A Node.js library that extracts and structures text from HTML files for full-text search indexing.

Version: 1.1.1 (latest)
Source: npm
Weekly downloads: 1 (-50%)
Maintainers: 1

An HTML parsing library for Node.js, designed to extract text sections associated with anchor tags and headings from HTML files in a directory and its subdirectories. The extracted text is structured for indexing in a full-text search engine. The library produces an array of sections, each with properties for the URL (based on the file path), the anchor (if present), the title (based on the following heading tag), and the text.
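
The section shape described above can be sketched as a TypeScript type. This is only an illustration: the field names `url`, `anchor`, `title`, and `text`, the `Section` interface, and the `sectionLink` helper are assumptions inferred from the description, not taken from the package's published type definitions.

```typescript
// Hypothetical shape of one extracted section. The field names are assumptions
// derived from the description above, not the package's actual typings.
interface Section {
  url: string // based on the file path
  anchor?: string // present only when the section has an anchor
  title: string // based on the heading that follows the anchor
  text: string // extracted text of the section
}

// Hypothetical helper: builds a link target for a search result,
// appending "#anchor" only when an anchor exists.
function sectionLink(section: Section): string {
  return section.anchor ? `${section.url}#${section.anchor}` : section.url
}

// Hand-written sample data standing in for real extraction output.
const exampleSections: Section[] = [
  { url: '/guide/install.html', anchor: 'setup', title: 'Setup', text: 'Install the package with npm.' },
  { url: '/guide/index.html', title: 'Guide', text: 'Welcome to the guide.' },
]

const links = exampleSections.map(sectionLink)
```

Keeping the anchor optional mirrors the "if present" wording above, so a consumer can still link to a page-level section that has no anchor.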

Features

  • ✅ Extracts text from HTML files in a folder (and its sub-folders)
  • ✅ Available as a simple API
  • ✅ Nano-sized: just 624 bytes (ESM, gzipped)
  • ✅ Tree-shakable and side-effect free
  • ✅ First-class TypeScript support
  • ✅ 100% Unit Test coverage

Example usage (API, as a library)

Setup

  • yarn: yarn add html-text-extractor
  • npm: npm install html-text-extractor

ESM

import { extract } from 'html-text-extractor'

const result = await extract('./dist')

CommonJS

const { extract } = require('html-text-extractor')

// same API as the ESM variant
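
Since the extracted sections are meant for full-text search indexing, here is a minimal sketch of how they might be fed into a tiny inverted index. It assumes that `extract` resolves to an array of objects with `url`, `anchor`, `title`, and `text` fields; those names, and the `buildIndex` and `search` helpers, are hypothetical. The sample data is hand-written so that the snippet runs without the package installed.

```typescript
// Assumed section shape; field names are hypothetical.
interface Section {
  url: string
  anchor?: string
  title: string
  text: string
}

// Maps each lowercase word to the positions of the sections that contain it.
function buildIndex(sections: Section[]): Record<string, number[]> {
  // Prototype-free object, so words like "constructor" are ordinary keys.
  const index: Record<string, number[]> = Object.create(null)
  sections.forEach((section, i) => {
    const words = `${section.title} ${section.text}`.toLowerCase().match(/[a-z0-9]+/g) || []
    for (const word of words) {
      const postings = index[word] || (index[word] = [])
      // Record each section at most once per word.
      if (postings[postings.length - 1] !== i) postings.push(i)
    }
  })
  return index
}

// Returns the sections containing the given word (case-insensitive).
function search(index: Record<string, number[]>, sections: Section[], word: string): Section[] {
  return (index[word.toLowerCase()] || []).map((i) => sections[i])
}

// Stand-in for the result of `await extract('./dist')`.
const sections: Section[] = [
  { url: '/docs/install.html', anchor: 'setup', title: 'Setup', text: 'Install the package with npm or yarn.' },
  { url: '/docs/api.html', anchor: 'extract', title: 'Extract', text: 'The extract function returns sections.' },
]

const index = buildIndex(sections)
const hits = search(index, sections, 'Extract')
```

A real search engine would add stemming, ranking, and stop-word handling; the sketch only shows where the section fields could plug in.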

Keywords

html

Package last updated on 14 Jul 2023
