@7-docs/cli
`7d` is a powerful CLI tool to ingest content and store it in a vector database, ready to be queried the way you would query ChatGPT.
It uses the OpenAI APIs and is part of 7-docs.
This is still in the early days, but it already offers a variety of features:
- Plain text, Markdown, and PDF files are supported as input.
- Ingest from local files, from HTML pages over HTTP, and from GitHub repositories.
- The OpenAI `text-embedding-ada-002` model is used to create embeddings.
- Pinecone and Supabase are supported for vector storage.
- The OpenAI `gpt-3.5-turbo` model is used for chat completions from the CLI.
See the 7-docs overview for more packages and starter kits.
Prerequisites
- Node.js v16+
- OpenAI API key
- Pinecone or Supabase account, plus API keys
- When ingesting lots of files from GitHub, a GitHub token
Installation
You can install 7-docs in two ways:
- Global to manage knowledge base(s) from the command line.
- Local to manage the knowledge base(s) of a repository.
Global
Use `7d` from anywhere to manage your personal knowledge bases:
npm install --global 7-docs
Get an OpenAI API key and make it available as an environment variable:
export OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Alternatively, store it in `~/.7d.json` so it's available in your next session too:
7d set OPENAI_API_KEY sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
This works for the other `export` values shown later as well.
Local
Add `7d` to the `devDependencies` of a repository to manage its knowledge base(s):
npm install --save-dev 7-docs
Store the variables you need in a local `.env` file in the root of your project:
OPENAI_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
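The `.env` file follows the common dotenv format. The CLI presumably loads it with a library such as dotenv; this minimal parser just illustrates how lines like the one above are interpreted:

```javascript
// Sketch: minimal .env-style parsing (illustrative only; real tooling
// typically uses the dotenv package).
function parseDotEnv(text) {
  const vars = {};
  for (const line of text.split('\n')) {
    const match = line.match(/^\s*([A-Z0-9_]+)\s*=\s*(.*?)\s*$/i);
    if (!match) continue; // skip comments and blank lines
    let value = match[2];
    // Strip surrounding quotes, as in OPENAI_API_KEY="sk-..."
    if ((value.startsWith('"') && value.endsWith('"')) ||
        (value.startsWith("'") && value.endsWith("'"))) {
      value = value.slice(1, -1);
    }
    vars[match[1]] = value;
  }
  return vars;
}
```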
For local installations, use `npx 7d` (instead of just `7d`).
Now let's choose either Pinecone or Supabase!
Pinecone
Make sure you have a Pinecone account and set `PINECONE_API_KEY`:
export PINECONE_API_KEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Create or select an index:
7d pinecone-create-index --index [name] --environment [env]
Find the environment in your Pinecone Console (e.g. `us-east4-gcp`).
Keep working with this index by setting `PINECONE_URL`, which you can find in the Pinecone Console, like so:
export PINECONE_URL=xxxxx-xxxxxxx.svc.us-xxxxx-gcp.pinecone.io
export PINECONE_API_KEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Supabase
Make sure you have a Supabase account and set `SUPABASE_URL` and `SUPABASE_API_KEY`:
export SUPABASE_URL="https://xxxxxxxxxxxxxxxxxxxx.supabase.co"
export SUPABASE_API_KEY="ey..."
Print the SQL query to enable pgvector and create a table (paste the output in the Supabase web admin):
7d supabase-create-table --namespace my-collection
Ingestion
Let's ingest some text or Markdown files; make sure to adjust the `--files` pattern to match yours:
7d ingest --files README.md --files 'docs/**/*.md' --namespace my-collection
Note that ingestion from remote resources (GitHub and/or HTTP) has the benefit of linking back to the original
source when retrieving answers. This is not possible with local files.
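The link-back behavior described above can be pictured as metadata stored alongside each chunk; the field names below are illustrative, not the actual 7-docs storage schema:

```javascript
// Sketch: why remote ingestion can link back to its source. Each stored
// chunk could carry the origin URL in its metadata; local files have no
// public URL to point to. Field names are assumptions for illustration.
function toChunkRecord(text, source) {
  return {
    content: text,
    metadata: {
      url: source.url ?? null,   // null for local files
      title: source.title ?? null,
    },
  };
}
```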
GitHub
Use `--source github` and file patterns to ingest from a GitHub repo:
7d ingest --source github --repo reactjs/react.dev --files 'src/content/reference/react/*.md' --namespace react
You can start without it, but once you start fetching lots of files you'll need to set `GITHUB_TOKEN`:
export GITHUB_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
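The token matters because unauthenticated GitHub API requests are limited to 60 per hour per IP, while authenticated requests allow 5,000 per hour. The CLI presumably attaches the token as a header along these lines (the exact internals are an assumption):

```javascript
// Sketch: build GitHub API request headers, attaching GITHUB_TOKEN when
// available to lift the unauthenticated rate limit (60/h -> 5,000/h).
function gitHubHeaders(token = process.env.GITHUB_TOKEN) {
  const headers = { Accept: 'application/vnd.github+json' };
  if (token) headers.Authorization = `Bearer ${token}`;
  return headers;
}
```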
HTTP
Crawl content from web pages:
7d ingest --source http --url https://en.wikipedia.org/wiki/Butterfly
PDF
`7d` supports PDF files as well:
7d ingest --files ./my-article.pdf
7d ingest --source github --repo webpro/webpro.nl --files 'content/*.pdf'
If you see the `cannot find module "canvas"` error, please see node-canvas#compiling.
Ignore files
To exclude files from ingestion, use the `--ignore` argument:
7d ingest --files 'docs/**/*.md' --ignore 'folder/*' --ignore 'dir/file.md' --ignore '**/ignore.md'
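The patterns above are globs: `*` matches within one path segment, while `**` crosses directory boundaries. The filtering can be sketched with a tiny glob-to-regex converter; a real CLI would rely on a full glob library, so this only covers `*` and `**`:

```javascript
// Sketch: how --ignore patterns might filter matched files.
// Supports only "*" (one path segment) and "**" (any depth).
function globToRegExp(glob) {
  let out = '';
  let i = 0;
  while (i < glob.length) {
    const ch = glob[i];
    if (ch === '*') {
      if (glob[i + 1] === '*') {
        out += '.*'; // "**" crosses directory boundaries
        i += 2;
      } else {
        out += '[^/]*'; // "*" stays within one segment
        i += 1;
      }
    } else {
      // Escape anything that isn't a plain path character
      out += /[a-zA-Z0-9_\-\/]/.test(ch) ? ch : '\\' + ch;
      i += 1;
    }
  }
  return new RegExp(`^${out}$`);
}

function applyIgnores(files, ignorePatterns) {
  const regexps = ignorePatterns.map(globToRegExp);
  return files.filter((file) => !regexps.some((re) => re.test(file)));
}
```

So `folder/*` drops direct children of `folder/`, `dir/file.md` drops one exact file, and `**/ignore.md` drops any nested `ignore.md`.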
Query
Now you can start asking questions about the ingested content:
7d "Can you please give me a summary?"
Other commands
Other convenience flags and commands not mentioned yet.
--help
Shows available commands and how they can be used:
7d --help
openai-list-models
List available OpenAI models:
7d openai-list-models
pinecone-clear-namespace
Clear a single namespace from the current Pinecone index:
7d pinecone-clear-namespace --namespace my-collection
Token Usage
The OpenAI-recommended `text-embedding-ada-002` model is used to create embeddings. Ingestion can consume a significant
number of tokens when processing lots of files. Queries use only a few tokens (with the `gpt-3.5-turbo` model by default).
See the console output for details.