
markdown-crawler
A powerful web crawler that extracts content from web pages and converts them to clean Markdown format, with support for code blocks and GitHub Flavored Markdown
A web crawler optimized for AI consumption. It strips page noise, extracts the core content, and converts it into structured Markdown suitable for processing by AI models. Its distinguishing feature is that it merges all related pages (the page at the given URL and every page in its subdirectories) into a single YAML file in which each entry holds one page's title and Markdown content.
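The description does not spell out how "related pages" are selected. One plausible reading is that a link counts as related when it shares the start URL's origin and sits at or below its path. The Python sketch below illustrates that reading only; the function name `in_scope` and the matching rule are assumptions, not the tool's actual implementation.

```python
from urllib.parse import urlsplit


def in_scope(start_url: str, candidate_url: str) -> bool:
    """Return True if candidate_url is the start page or lies beneath its path.

    Hypothetical helper: the real crawler's scoping rule may differ.
    """
    start = urlsplit(start_url)
    cand = urlsplit(candidate_url)

    # Pages on a different scheme or host are never considered related.
    if (start.scheme, start.netloc) != (cand.scheme, cand.netloc):
        return False

    # Treat the start path as a directory prefix, ignoring trailing slashes.
    base = start.path.rstrip("/")
    path = cand.path.rstrip("/")
    return path == base or path.startswith(base + "/")


if __name__ == "__main__":
    for url in (
        "https://example.com/docs/api/intro",
        "https://example.com/docs-old",
        "https://other.example/docs/x",
    ):
        print(url, in_scope("https://example.com/docs", url))
```

Comparing against `base + "/"` rather than `base` alone keeps sibling paths such as `/docs-old` from matching a start path of `/docs`.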
    # Basic usage
    npx markdown-crawler <url> <output-filename>

    # Example: crawl a website and save the result as output.yaml
    npx markdown-crawler https://example.com output

    # Wrap URLs that contain spaces in double quotes
    npx markdown-crawler "https://example.com/my page" output

    # The .yaml extension is appended to the output filename automatically.
    # Results are saved in the current working directory.
The tool integrates all related pages into a structured YAML format:
    - title: "Main Page Title"
      content: |
        # Main Page Content
        Here is the main page content...
    - title: "Subpage 1 Title"
      content: |
        # Subpage 1 Content
        Here is subpage 1 content...
    - title: "Subpage 2 Title"
      content: |
        # Subpage 2 Content
        Here is subpage 2 content...
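To consume this output downstream, the file can be read back into a list of title and content pairs. The Python sketch below uses only the standard library and handles just the shape shown above. It assumes `content:` is nested under each list item and the block text is indented four spaces; the name `parse_pages` is hypothetical. For real output, a full YAML parser is the safer choice.

```python
import json


def parse_pages(text: str) -> list[dict[str, str]]:
    """Parse the '- title / content: |' shape into a list of page dicts.

    Hypothetical helper: handles only the layout shown above, not general YAML.
    """
    pages = []
    for line in text.splitlines():
        if line.startswith("- title:"):
            raw = line[len("- title:"):].strip()
            # Simple double-quoted YAML strings are also valid JSON strings.
            title = json.loads(raw) if raw.startswith('"') else raw
            pages.append({"title": title, "lines": None})
        elif pages and pages[-1]["lines"] is None:
            # Waiting for the block-scalar header of the current page.
            if line.strip() == "content: |":
                pages[-1]["lines"] = []
        elif pages:
            # Inside the block scalar: drop the assumed 4-space indentation.
            stripped = line[4:] if line.startswith("    ") else line.strip()
            pages[-1]["lines"].append(stripped)
    return [
        {"title": p["title"], "content": "\n".join(p["lines"] or []).strip("\n")}
        for p in pages
    ]


sample = '''\
- title: "Main Page Title"
  content: |
    # Main Page Content
    Here is the main page content...
- title: "Subpage 1 Title"
  content: |
    # Subpage 1 Content
    Here is subpage 1 content...
'''

pages = parse_pages(sample)
for page in pages:
    print(page["title"], "->", len(page["content"]), "characters of Markdown")
```

Each resulting dict carries one page's Markdown, so pages can be passed to a model individually or concatenated in crawl order.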
Features:
This project is licensed under the MIT License - see the LICENSE file for details.