A Node.js package that converts documents (PDF, DOCX, PPTX, and XLSX) to Markdown.
It uses tesseract.js, mammoth, pdf.js, and turndown to convert documents to Markdown format. For PDFs, it also provides an option to use vLLMs (Vision Large Language Models) for advanced OCR capabilities (using the OpenAI API).
npm install down-craft
import { downCraft } from 'down-craft';
import fs from 'fs/promises';

async function example() {
  // Read file buffer
  const fileBuffer = await fs.readFile('document.docx');

  // Convert to markdown (pass file buffer and file type)
  const markdown = await downCraft(fileBuffer, 'docx');
  console.log(markdown);
}
Converts a document buffer to Markdown format.
fileBuffer (Buffer): The document buffer to convert.
fileType (string, optional): File type ('pdf', 'docx', 'pptx', 'xlsx'). If omitted, the package attempts to auto-detect the file type.
options (Object, optional): Conversion options.
  pdfConverterType (string, optional): Converter to use for PDF files ('standard' | 'llm' | 'ocr'). Default: 'standard'.
  llmParams (Object, required for the 'llm' converter): LLM configuration.
    baseURL (string): Base URL for the LLM API.
    apiKey (string): API key for the LLM service.
    model (string): Model to use for OCR.
Returns: Promise that resolves to the Markdown content.
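Since fileType is optional, a caller may prefer to derive it from the file name rather than rely on auto-detection. The following is a minimal sketch under that assumption; fileTypeFromName is a hypothetical helper, not part of down-craft, and it only recognizes the four types listed above.

```javascript
// The four file types the parameter list above names.
const SUPPORTED_TYPES = ['pdf', 'docx', 'pptx', 'xlsx'];

// Hypothetical helper (not part of down-craft): derive the fileType
// argument from a file name's extension. Returns undefined for anything
// unsupported, so the caller can decide how to proceed.
function fileTypeFromName(fileName) {
  const dot = fileName.lastIndexOf('.');
  if (dot === -1 || dot === fileName.length - 1) return undefined;
  const ext = fileName.slice(dot + 1).toLowerCase();
  return SUPPORTED_TYPES.includes(ext) ? ext : undefined;
}

// Intended use (requires down-craft to be installed):
//   const fileType = fileTypeFromName('slides.pptx');
//   const markdown = await downCraft(fileBuffer, fileType);
```

Whether passing undefined behaves exactly like omitting fileType is an assumption here; if in doubt, call downCraft with the buffer alone when the helper returns undefined.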
For PDFs that require advanced OCR capabilities, you can use the vLLM converter:
const markdown = await downCraft(pdfBuffer, 'pdf', {
  pdfConverterType: 'llm',
  llmParams: {
    baseURL: 'https://api.llm-service.com',
    apiKey: 'your-api-key',
    model: 'your-model-name'
  }
});
This converter:
The llmParams object will attempt to read environment variables for baseURL, apiKey, and model if you have them defined.
See the .env.example file for an example configuration; it also shows how to define your own user/system prompts and lists various LLM providers and models.
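When the LLM settings are passed explicitly rather than through environment variables, it can help to check them before starting a conversion. The sketch below assumes exactly that case; buildPdfOptions is a hypothetical helper, not part of down-craft, and it is deliberately stricter than the package because it ignores any environment-variable fallback.

```javascript
// Hypothetical helper (not part of down-craft): build the options object
// for a PDF conversion and fail early if 'llm' is requested without the
// three llmParams fields documented above.
function buildPdfOptions(pdfConverterType = 'standard', llmParams = {}) {
  const allowed = ['standard', 'llm', 'ocr'];
  if (!allowed.includes(pdfConverterType)) {
    throw new Error(`Unknown pdfConverterType: ${pdfConverterType}`);
  }
  // Only the 'llm' converter needs llmParams.
  if (pdfConverterType !== 'llm') return { pdfConverterType };

  const required = ['baseURL', 'apiKey', 'model'];
  const missing = required.filter((key) => !llmParams[key]);
  if (missing.length > 0) {
    throw new Error(`llmParams is missing: ${missing.join(', ')}`);
  }
  const { baseURL, apiKey, model } = llmParams;
  return { pdfConverterType, llmParams: { baseURL, apiKey, model } };
}

// Intended use (requires down-craft to be installed):
//   const options = buildPdfOptions('llm', { baseURL, apiKey, model });
//   const markdown = await downCraft(pdfBuffer, 'pdf', options);
```

If you rely on environment variables for these values, skip this check, since the helper would reject a configuration the package itself might accept.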
This package is licensed under the Apache 2.0 license.
See LICENSE for details.
[0.1.0] - 2024-12-27
@opendocsg/pdf2md
mammoth and turndown
pptx-in-html-out and turndown
xlsx and turndown