Package was removed
Sorry, it seems this package was removed from the registry

srcfull

Image extraction and source-resolution toolkit for high-quality web images.

Version: 2.0.1 (latest, npm)
Weekly downloads: 0
Maintainers: 1

Srcfull

srcfull is a package-first toolkit for extracting and upgrading web image URLs.

It is designed as a standalone library and CLI for image extraction and source resolution. The focus is:

  • extract image candidates from HTML
  • filter obvious junk like logos and icons
  • resolve CDN/transformed URLs back to larger originals
  • probe likely source variants when no curated pattern exists
  • optionally plug in HTML fetchers like ScrapingBee and fallback image providers like Firecrawl
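
To make the "resolve CDN/transformed URLs back to larger originals" idea concrete, here is a minimal sketch that strips common resize parameters from a URL. The function name `stripResizeParams` and the parameter list are assumptions for illustration, not srcfull's actual implementation; a real resolver would also need to confirm that the stripped URL still returns an image.

```typescript
// Hypothetical helper for illustration only; not part of srcfull's documented API.
// The parameter names below are a guess at common CDN resize/quality knobs.
const RESIZE_PARAMS = ["w", "h", "width", "height", "q", "quality", "fit", "crop", "dpr"];

function stripResizeParams(imageUrl: string): string {
  const url = new URL(imageUrl);
  for (const name of RESIZE_PARAMS) {
    // Deleting a parameter that is not present is a no-op.
    url.searchParams.delete(name);
  }
  // Unrelated parameters (cache busters, signatures) are left untouched.
  return url.href;
}

console.log(stripResizeParams("https://cdn.example.com/image.jpg?w=400&q=80"));
```

Removing parameters is only a candidate guess: some CDNs require a signature or serve a default size when parameters are missing, which is why probing and validation are listed as separate steps above.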

It handles the page-shape problems that usually make this kind of package annoying in practice:

  • relative image paths resolved against the page URL
  • lazy-loaded image attributes like data-src, data-srcset, and data-original
  • img srcset, picture source, inline background images, and social/meta image tags
  • private-host blocking for both page scraping and image validation
  • HEAD fallback to ranged GET for hosts that refuse metadata requests
  • persistent file-backed cache/pattern stores for repeat runs
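
As a rough illustration of private-host blocking, the sketch below rejects URLs whose hostname is localhost or a literal private IPv4 address. `isPrivateHost` is a hypothetical name, and this is an assumption about the general technique rather than srcfull's own check; in particular it does not resolve DNS, so a public hostname that points at a private address would pass.

```typescript
// Hypothetical helper for illustration only; not part of srcfull's documented API.
function isPrivateHost(urlString: string): boolean {
  let hostname: string;
  try {
    hostname = new URL(urlString).hostname.toLowerCase();
  } catch {
    return true; // Unparseable input is treated as blocked.
  }

  if (hostname === "localhost" || hostname.endsWith(".localhost")) return true;
  if (hostname === "[::1]") return true; // IPv6 loopback (URL keeps the brackets).

  const match = hostname.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!match) return false; // Not an IPv4 literal; DNS is not checked in this sketch.

  const a = Number(match[1]);
  const b = Number(match[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) || // link-local
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}
```

A check like this would run before fetching a page and again before validating each image URL, since a public page can reference images on internal hosts.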

Install

pnpm install
pnpm build

Library Usage

import { scrapePage, resolveImageUrl } from "srcfull";

const resolved = await resolveImageUrl(
  "https://cdn.example.com/image.jpg?w=400&q=80"
);

const page = await scrapePage("https://example.com/product-page");

scrapePage() normalizes relative candidates against the page URL before validation and resolution, so typical product/article HTML works without extra preprocessing.
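
The normalization step can be pictured with the standard `URL` constructor, which resolves a relative reference against a base. This is a minimal sketch under the assumption that non-HTTP(S) candidates such as `data:` URIs are dropped; `normalizeCandidate` is a hypothetical name, not a srcfull export.

```typescript
// Hypothetical helper for illustration only; not part of srcfull's documented API.
function normalizeCandidate(raw: string, pageUrl: string): string | null {
  const trimmed = raw.trim();
  if (trimmed === "") return null;
  try {
    // new URL(relative, base) handles "../", root-relative, and protocol-relative forms.
    const url = new URL(trimmed, pageUrl);
    // Keep only fetchable http(s) URLs; data:, blob:, javascript: and similar are dropped.
    return url.protocol === "http:" || url.protocol === "https:" ? url.href : null;
  } catch {
    return null;
  }
}

const page = "https://example.com/products/item";
console.log(normalizeCandidate("../img/hero.jpg", page));
console.log(normalizeCandidate("//cdn.example.com/a.png", page));
```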

If you need rendered HTML instead of plain fetch, inject a custom fetcher:

import { scrapePage } from "srcfull";
import { createScrapingBeeHtmlFetcher } from "srcfull/providers/scrapingbee";

const fetchHtml = createScrapingBeeHtmlFetcher({
  apiKey: process.env.SCRAPINGBEE_API_KEY!,
});

const result = await scrapePage("https://example.com", { fetchHtml });

If you want the built-in fetcher with different timeout or header behavior:

import { createDefaultHtmlFetcher, scrapePage } from "srcfull";

const fetchHtml = createDefaultHtmlFetcher({
  timeoutMs: 15_000,
  headers: {
    "Accept-Language": "en-GB,en;q=0.9",
  },
});

const result = await scrapePage("https://example.com", { fetchHtml });

For image-only fallback:

import { createFirecrawlImageFallback } from "srcfull/providers/firecrawl";

If you want candidate extraction without the rest of the pipeline:

import { extractImageCandidatesFromHtml } from "srcfull";

const candidates = extractImageCandidatesFromHtml(
  html,
  "https://example.com/product-page"
);
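
Part of what candidate extraction has to handle is `srcset`. The sketch below parses a `srcset` value and picks the largest entry; it is a simplified illustration with hypothetical names (`parseSrcset`, `pickLargest`, `SrcsetCandidate`), not the package's parser, and it assumes entries are separated by a comma followed by whitespace.

```typescript
// Hypothetical types and helpers for illustration only; not srcfull's documented API.
interface SrcsetCandidate {
  url: string;
  width?: number; // from a "400w" descriptor
  density?: number; // from a "2x" descriptor
}

function parseSrcset(srcset: string, pageUrl: string): SrcsetCandidate[] {
  const candidates: SrcsetCandidate[] = [];
  // Limitation: splitting on ", " keeps commas inside URLs intact but
  // misses entries written as "a.jpg 1x,b.jpg 2x" (no space after the comma).
  for (const part of srcset.split(/,\s+/)) {
    const [rawUrl, descriptor] = part.trim().split(/\s+/);
    if (!rawUrl) continue;
    let url: string;
    try {
      url = new URL(rawUrl, pageUrl).href; // resolve relative entries
    } catch {
      continue;
    }
    const candidate: SrcsetCandidate = { url };
    const match = descriptor?.match(/^(\d+(?:\.\d+)?)([wx])$/);
    if (match) {
      if (match[2] === "w") candidate.width = Number(match[1]);
      else candidate.density = Number(match[1]);
    }
    candidates.push(candidate);
  }
  return candidates;
}

function pickLargest(candidates: SrcsetCandidate[]): SrcsetCandidate | undefined {
  // A valid srcset uses either widths or densities, so one score covers both.
  const score = (c: SrcsetCandidate) => c.width ?? c.density ?? 1;
  return [...candidates].sort((a, b) => score(b) - score(a))[0];
}
```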

For repeat jobs, persist cache and learned patterns on disk:

import {
  createFileCache,
  createFilePatternStore,
  resolveImageUrl,
} from "srcfull";

const cache = createFileCache({ filePath: ".srcfull/cache.json" });
const patternStore = createFilePatternStore({
  filePath: ".srcfull/patterns.json",
});

const result = await resolveImageUrl("https://cdn.example.com/photo.jpg?w=400", {
  cache,
  patternStore,
});

CLI

srcfull resolve 'https://cdn.example.com/photo.jpg?w=300'
srcfull scrape 'https://example.com/listing' --max-images=12
srcfull scrape 'https://example.com/listing' --max-images=12 --min-size=300 --resolve-concurrency=8
srcfull --version

The JSON response from scrape includes stats.returned as well as found, resolved, failed, and durationMs.
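
If you consume that JSON from a script, a small type helps. The shape below is an assumption based only on the field names mentioned above (all five are assumed to live under `stats`); check the actual output before relying on it. `ScrapeStats` and `summarizeStats` are hypothetical names.

```typescript
// Assumed shape, inferred from the field names in the README; not a published type.
interface ScrapeStats {
  found: number;
  returned: number;
  resolved: number;
  failed: number;
  durationMs: number;
}

// Hypothetical helper that turns the stats object into a one-line summary.
function summarizeStats(stats: ScrapeStats): string {
  return (
    `${stats.returned}/${stats.found} returned, ` +
    `${stats.resolved} resolved, ${stats.failed} failed in ${stats.durationMs} ms`
  );
}
```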

Demo Page

There is a self-contained demo page at docs/demo/index.html.

pnpm demo:build
pnpm demo:serve

The page is generated from real calls to the package, so the HTML samples, extracted candidates, resolved URLs, and persisted cache/pattern snapshots are actual outputs rather than hand-written mockups.

Development

pnpm test
pnpm test:live-patterns
pnpm typecheck
pnpm build

pnpm test:live-patterns revalidates the researched real-world CDN fixtures in test/fixtures/curated-patterns.json against the network.

Package last updated on 04 Apr 2026
