docs2vector
A tool to process markdown files from GitHub repositories and store them in Upstash Vector
A Node.js tool that processes Markdown files from GitHub repositories, generates embeddings, and stores them in an Upstash Vector database. It is well suited to building document search systems, AI-driven documentation assistants, or knowledge bases.
Clone any GitHub repository
Recursively find all Markdown (.md) and MDX (.mdx) files
Chunk documents using LangChain's RecursiveCharacterTextSplitter for better text segmentation
Supports both OpenAI and Upstash embeddings
Stores document chunks and metadata in Upstash Vector for enhanced retrieval
Handles cleanup automatically
Preserves file metadata for better context during retrieval
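The recursive file discovery step listed above could look roughly like the following. This is a minimal sketch rather than the tool's actual code: the helper name `findMarkdownFiles` is hypothetical, and skipping the `.git` directory is an assumption about what a cloned repository walk would want.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper (not from docs2vector): recursively collect
// all .md and .mdx files under rootDir.
function findMarkdownFiles(rootDir) {
  const results = [];
  for (const entry of fs.readdirSync(rootDir, { withFileTypes: true })) {
    const fullPath = path.join(rootDir, entry.name);
    if (entry.isDirectory()) {
      // Assumption: version-control internals are never worth indexing.
      if (entry.name === '.git') continue;
      results.push(...findMarkdownFiles(fullPath));
    } else if (entry.isFile() && /\.mdx?$/i.test(entry.name)) {
      results.push(fullPath);
    }
  }
  return results;
}
```

Calling `findMarkdownFiles('./some-cloned-repo')` would return an array of file paths that can then be read and chunked.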
Go to GitHub Settings > Developer settings > Personal access tokens > Tokens (classic)
Click Generate new token > Generate new token (classic)
Select the following scopes:
repo (Full control of private repositories)
read:org (Read organization data)
Click Generate token
Note: If you're only accessing public repositories, you can create a token with just the public_repo scope instead of the full repo scope.
For security best practices:
mkdir github-docs-vectorizer
cd github-docs-vectorizer
Ensure the following files are included in your directory:
script.js: The main script for processing
package.json: Manages project dependencies
.env: Contains your environment variables (explained below)
Install dependencies:
npm install
Create a .env file in the root directory of your project with your credentials:
# Required for accessing GitHub repositories
GITHUB_TOKEN=your_github_token
# Required for storing vectors in Upstash
UPSTASH_VECTOR_REST_URL=your_upstash_vector_url
UPSTASH_VECTOR_REST_TOKEN=your_upstash_vector_token
# Optional: Provide if using OpenAI embeddings
OPENAI_API_KEY=your_openai_api_key
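Before running anything, it can help to confirm that the required variables are actually set. The check below is a sketch under assumptions: `REQUIRED_VARS` and `missingEnvVars` are illustrative names, and how the script itself loads `.env` is not shown in this README, so the function simply inspects an environment object such as `process.env`.

```javascript
// Variable names taken from the .env example above.
const REQUIRED_VARS = [
  'GITHUB_TOKEN',
  'UPSTASH_VECTOR_REST_URL',
  'UPSTASH_VECTOR_REST_TOKEN',
];

// Hypothetical helper: return the names of required variables
// that are unset or blank in the given environment object.
function missingEnvVars(env = process.env) {
  return REQUIRED_VARS.filter(
    (name) => typeof env[name] !== 'string' || env[name].trim() === ''
  );
}
```

For example, `missingEnvVars()` returning a non-empty array would be a reasonable point to print the missing names and exit early.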
Run the script by providing the GitHub repository URL as an argument:
node script.js https://github.com/username/repository
Example:
node script.js https://github.com/facebook/react
The script will:
OpenAI Embeddings (default if API key is provided)
Requires OPENAI_API_KEY in .env
Upstash Embeddings (used when OpenAI API key is not provided)
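The selection rule described above amounts to a single check on the environment. The following sketch states it explicitly; the function name and the returned labels `'openai'` and `'upstash'` are illustrative assumptions, not identifiers from the tool.

```javascript
// Hypothetical helper: pick the embedding provider following the rule above.
// OpenAI is used when OPENAI_API_KEY is set; otherwise Upstash embeddings.
function chooseEmbeddingProvider(env = process.env) {
  const key = env.OPENAI_API_KEY;
  return typeof key === 'string' && key.trim() !== '' ? 'openai' : 'upstash';
}
```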
To adjust how documents are split into chunks, you can update the configuration in script.js:
const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000, // Maximum number of characters per chunk
  chunkOverlap: 200 // Number of characters shared between consecutive chunks
});
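To build intuition for how `chunkSize` and `chunkOverlap` interact, the sketch below uses a plain sliding window over characters. This is deliberately not LangChain's algorithm (RecursiveCharacterTextSplitter splits on separators such as paragraph and line breaks before falling back to smaller units); `slidingWindowChunks` is a hypothetical helper for illustration only.

```javascript
// Simplified illustration: fixed-size windows that advance by
// (chunkSize - chunkOverlap) characters, so consecutive chunks
// share chunkOverlap characters.
function slidingWindowChunks(text, chunkSize, chunkOverlap) {
  if (chunkOverlap >= chunkSize) {
    throw new RangeError('chunkOverlap must be smaller than chunkSize');
  }
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With the defaults shown above (1000 and 200), each window starts 800 characters after the previous one. Larger overlap keeps more shared context between neighboring chunks at the cost of storing more vectors.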
Metadata accompanies each stored chunk for improved context:
The script is designed to handle errors gracefully in the following cases:
In case of errors, the script will:
Feel free to submit issues and enhancement requests!
MIT License - feel free to use this tool for any purpose.
This tool uses the following open-source packages:
FAQs
The npm package docs2vector receives a total of 2 weekly downloads. As such, docs2vector popularity was classified as not popular.
We found that docs2vector demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 0 open source maintainers collaborating on the project.