
@promptbook/website-crawler


Supercharge your use of large language models

Version: 0.74.0-8

Version published: 3 months ago

Weekly downloads: 1.3K (increased by 212.35%)

Maintainers: 0


Created: 4 months ago


Promptbook

Build responsible, controlled and transparent applications on top of LLM models!

✨ New Features

💙 Working on the Book language v1
📚 Support for .docx, .doc and .pdf documents
✨ Support for the OpenAI o1 model

⚠ Warning: This is a pre-release version of the library. It is not yet ready for production use. Please use the latest stable release instead.

📦 Package `@promptbook/website-crawler`

Promptbook is divided into several packages, all published from a single monorepo.
This package, `@promptbook/website-crawler`, is one part of the Promptbook ecosystem.

To install this package, run:

```bash
# Install the entire Promptbook ecosystem
npm i ptbk

# Install just this package to save space
npm install @promptbook/website-crawler
```

Crawl knowledge from the web
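The package's own API is not documented in this section. As a rough illustration of one step that crawling knowledge from the web involves, the sketch below turns a fetched HTML page into plain text that could then serve as knowledge. The function name `htmlToPlainText` is hypothetical, and the regex-based approach is a deliberate simplification; a real crawler would use a proper HTML parser.

```typescript
// Hypothetical helper for illustration only; not an export of @promptbook/website-crawler.
// Regex-based HTML stripping is a simplification that breaks on unusual markup.
function htmlToPlainText(html: string): string {
  const withoutTags = html
    // Drop <script> and <style> elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    // Treat <br> and the end of common block elements as line breaks.
    .replace(/<br\s*\/?>|<\/(p|div|li|h[1-6])>/gi, "\n")
    // Remove every remaining tag.
    .replace(/<[^>]+>/g, " ");

  const decoded = withoutTags
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    // Decode &amp; last so "&amp;lt;" becomes "&lt;" rather than "<".
    .replace(/&amp;/g, "&");

  // Collapse whitespace inside each line and drop empty lines.
  return decoded
    .split("\n")
    .map((line) => line.replace(/\s+/g, " ").trim())
    .filter((line) => line.length > 0)
    .join("\n");
}
```

Tags are removed before entities are decoded, so escaped markup such as `&lt;b&gt;` survives as literal text instead of being stripped.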

The rest of the documentation is common to the entire Promptbook ecosystem:

🤍 The Promptbook Whitepaper

If you have a simple, single prompt for ChatGPT, GPT-4, Anthropic Claude, Google Gemini, Llama 3, or whatever, it doesn't matter how you integrate it. Whether it's calling a REST API directly, using the SDK, hardcoding the prompt into the source code, or importing a text file, the process remains the same.

But often you will struggle with the limitations of LLMs, such as hallucinations, off-topic responses, poor quality output, language and prompt drift, word repetition repetition repetition repetition or misuse, lack of context, or just plain w𝒆𝐢rd resp0nses. When this happens, you generally have three options:

1. Fine-tune the model to your specifications or even train your own.
2. Prompt-engineer the prompt to the best shape you can achieve.
3. Orchestrate multiple prompts in a pipeline to get the best result.

In all of these situations, but especially in 3., the ✨ Promptbook can make your life waaaaaaaaaay easier.
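To make option 3 concrete, here is a minimal sketch of orchestrating prompts in a pipeline, where each step's output becomes a parameter for later steps. It is not the Promptbook API: `runPipeline`, `PipelineStep` and `CallModel` are hypothetical names, and the model call is a synchronous stand-in to keep the example short.

```typescript
// Minimal orchestration sketch; these names are hypothetical, not the Promptbook API.
type ParameterValues = Record<string, string>;

// Stand-in for an LLM call. Real clients return a Promise; a synchronous
// signature keeps the example short.
type CallModel = (prompt: string) => string;

interface PipelineStep {
  // Parameter name under which this step's output is stored.
  outputParameter: string;
  // Builds this step's prompt from the parameters available so far.
  buildPrompt: (parameters: ParameterValues) => string;
}

function runPipeline(steps: PipelineStep[], inputs: ParameterValues, callModel: CallModel): ParameterValues {
  // Copy the inputs so the caller's object is not mutated.
  const parameters: ParameterValues = { ...inputs };
  for (const step of steps) {
    // Each step sees the outputs of all previous steps.
    parameters[step.outputParameter] = callModel(step.buildPrompt(parameters));
  }
  return parameters;
}
```

With a real client, `callModel` would wrap an SDK request and the loop would `await` each call in turn, because later steps depend on earlier outputs.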

Separates concerns between prompt-engineer and programmer, between code files and prompt files, and between prompts and their execution logic. For this purpose, it introduces a new language called the 💙 Book.
Book allows you to focus on the business logic without having to write code or deal with the technicalities of LLMs.
Forget about low-level details like choosing the right model, tokens, context size, temperature, top-k, top-p, or kernel sampling. Just describe your intent and the persona responsible for the task, and let the library do the rest.
We have built-in orchestration of pipeline execution and many tools to make the process easier, more reliable, and more efficient, such as caching, compilation+preparation, just-in-time fine-tuning, expectation-aware generation, agent adversary expectations, and more.
Sometimes even the best prompts with the best framework like Promptbook :) can't avoid the problems. In this case, the library has built-in anomaly detection and logging to help you find and fix the problems.
Versioning is built in. You can A/B test multiple versions of a pipeline and see which one works best.
Promptbook is designed to use RAG (Retrieval-Augmented Generation) and other advanced techniques to bring the context of your business to a generic LLM. You can use knowledge to improve the quality of the output.
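Expectation-aware generation can be pictured as checking an output against constraints such as the `EXPECT MIN 1 Sentence` and `EXPECT MAX 1 Paragraph` commands used in the Book example later in this document. The following is a hypothetical sketch: how Promptbook actually counts sentences and paragraphs is not described here, so the splitting rules and the names `meetsExpectations` and `Expectation` are assumptions.

```typescript
// Hypothetical expectation check; the counting rules are simplifying assumptions.
type CountedUnit = "sentences" | "paragraphs";

interface Expectation {
  unit: CountedUnit;
  min?: number;
  max?: number;
}

function countUnits(text: string, unit: CountedUnit): number {
  const trimmed = text.trim();
  if (trimmed === "") {
    return 0;
  }
  if (unit === "paragraphs") {
    // Paragraphs are separated by at least one blank line.
    return trimmed.split(/\n\s*\n/).length;
  }
  // A sentence ends with ".", "!" or "?" followed by whitespace or the end of the text.
  return trimmed.split(/[.!?]+(?:\s+|$)/).filter((part) => part.trim() !== "").length;
}

function meetsExpectations(text: string, expectations: Expectation[]): boolean {
  return expectations.every(({ unit, min, max }) => {
    const count = countUnits(text, unit);
    return (min === undefined || count >= min) && (max === undefined || count <= max);
  });
}
```

A caller could, for instance, regenerate the output when `meetsExpectations` returns `false`.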

💜 The Promptbook Project

The Promptbook project is an ecosystem of multiple projects and tools. The most important pieces are:

| Project | Description | Link |
| --- | --- | --- |
| Core | Promptbook Core is a description and documentation of the basic inner workings of how Promptbook should be implemented, and defines which features must be describable by the Book language | https://ptbk.io, https://github.com/webgptorg/book |
| Book language | Book is a Markdown-like language to define core entities like projects, pipelines, knowledge, etc. It is designed to be understandable by non-programmers and non-technical people | |
| Promptbook TypeScript project | Implementation of Promptbook in TypeScript, published to NPM as multiple packages | https://github.com/webgptorg/promptbook + multiple packages on NPM |
| Promptbook studio | No-code studio for writing books without needing to write even the Markdown | https://promptbook.studio, https://github.com/hejny/promptbook-studio |
| Promptbook miniapps | Builder of LLM miniapps from book notation | |

💙 Book language (for prompt-engineer)

💙 The blueprint of book language

The following is the documentation and blueprint of the Book language.

Example

```markdown
# 🌟 My first Book

-   PERSONA Jane, marketing specialist with prior experience in writing articles about technology and artificial intelligence
-   KNOWLEDGE https://ptbk.io
-   KNOWLEDGE ./promptbook.pdf
-   EXPECT MIN 1 Sentence
-   EXPECT MAX 1 Paragraph

> Write an article about the future of artificial intelligence in the next 10 years and how metalanguages will change the way AI is used in the world.
> Look specifically at the impact of Promptbook on the AI industry.

-> {article}
```
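As an illustration of how the example above maps onto data, the following hypothetical parser extracts the title, the `PERSONA`, `KNOWLEDGE` and `EXPECT` commands, the blockquoted prompt, and the `-> {article}` return parameter. It handles only the constructs that appear in this one example; `parseBook` and `ParsedBook` are illustrative names, not Promptbook's parser.

```typescript
// Hypothetical parser covering only the constructs in the example book.
interface ParsedBook {
  title: string | null;
  persona: string | null;
  knowledge: string[];
  expectations: string[];
  prompt: string;
  returnParameter: string | null;
}

function parseBook(source: string): ParsedBook {
  const book: ParsedBook = {
    title: null,
    persona: null,
    knowledge: [],
    expectations: [],
    prompt: "",
    returnParameter: null,
  };
  const promptLines: string[] = [];

  for (const rawLine of source.split(/\r?\n/)) {
    const line = rawLine.trim();
    const heading = line.match(/^#+\s+(.+)$/);
    const command = line.match(/^-\s+(PERSONA|KNOWLEDGE|EXPECT)\s+(.+)$/);
    const returnStatement = line.match(/^->\s*\{(\w+)\}$/);

    if (heading && book.title === null) {
      // The first heading is taken as the book's title.
      book.title = heading[1];
    } else if (command) {
      const [, keyword, value] = command;
      if (keyword === "PERSONA") {
        book.persona = value;
      } else if (keyword === "KNOWLEDGE") {
        book.knowledge.push(value);
      } else {
        book.expectations.push(value);
      }
    } else if (returnStatement) {
      // "-> {name}" names the parameter that receives the result.
      book.returnParameter = returnStatement[1];
    } else if (line.startsWith(">")) {
      // Blockquote lines together form the prompt text.
      promptLines.push(line.replace(/^>\s?/, ""));
    }
  }

  book.prompt = promptLines.join("\n");
  return book;
}
```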

Goals and principles of book language

The file is designed to be easy to read and write. It is a strict subset of Markdown, understandable by both humans and machines without specific knowledge of the language.

A book is stored in a file with the .ptbk.md or .book extension, encoded as UTF-8 without a BOM (byte order mark).
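A minimal sketch of checking these two file-level rules, assuming the file is available as raw bytes. The helper name `checkBookFile` and the case-insensitive extension match are assumptions. The check only looks for the UTF-8 byte order mark (bytes `EF BB BF`) at the start of the file; it does not validate that the remaining bytes are well-formed UTF-8.

```typescript
// Hypothetical helper; its name and return shape are assumptions.
const BOOK_EXTENSIONS = [".ptbk.md", ".book"];

// Returns a list of problems; an empty list means both rules are satisfied.
function checkBookFile(fileName: string, bytes: Uint8Array): string[] {
  const problems: string[] = [];

  // Extensions are compared case-insensitively (an assumption).
  const lowerName = fileName.toLowerCase();
  if (!BOOK_EXTENSIONS.some((extension) => lowerName.endsWith(extension))) {
    problems.push(`unexpected extension: ${fileName}`);
  }

  // The UTF-8 byte order mark is the byte sequence EF BB BF.
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    problems.push("file starts with a UTF-8 byte order mark");
  }

  return problems;
}
```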

As it is source code, it can leverage all the features of version control systems like git and does not suffer from the problems of binary formats, proprietary formats, or no-code solutions.

But unlike programming languages, it is designed to be understandable by non-programmers and non-technical people.

Structure

A book is divided into sections, and each section starts with a heading. The language itself is not sensitive to the heading level (h1, h2, h3, ...), but it is recommended to use h1 for the header section and h2 for the other sections.
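Because the heading level carries no meaning, a reader of the format can treat any Markdown heading as a section break. The sketch below does exactly that. `splitIntoSections` is a hypothetical name; it recognizes only `#`-style headings and would misread `#` lines inside fenced code blocks, which a real parser must skip.

```typescript
// Hypothetical sketch; only "#"-style (ATX) headings are recognized.
interface Section {
  // Heading level (1 for "#", 2 for "##", ...), kept for reference only.
  level: number;
  title: string;
  lines: string[];
}

function splitIntoSections(source: string): Section[] {
  const sections: Section[] = [];
  for (const line of source.split(/\r?\n/)) {
    const heading = line.match(/^(#{1,6})\s+(.+?)\s*$/);
    if (heading) {
      // Any heading level starts a new section.
      sections.push({ level: heading[1].length, title: heading[2], lines: [] });
    } else if (sections.length > 0) {
      sections[sections.length - 1].lines.push(line);
    }
    // Lines before the first heading are ignored in this sketch.
  }
  return sections;
}
```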

Header

The header is the first section of the book. It contains metadata about the pipeline. Using an h1 heading for the header section is recommended but not required.

Parameter

Foo bar

Parameter names

Reserved words:

Each command, like PERSONA, EXPECT, KNOWLEDGE, etc.
content
context
knowledge
examples
modelName
currentDate
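A hypothetical validation helper for the reserved words listed above. Because the list of commands ends with "etc.", only the three commands named explicitly are included, so this check is incomplete; comparing names case-insensitively is also an assumption rather than a documented rule.

```typescript
// Hypothetical check; the command list is incomplete (only the commands named above).
const RESERVED_COMMANDS = ["PERSONA", "EXPECT", "KNOWLEDGE"];
const RESERVED_PARAMETER_NAMES = ["content", "context", "knowledge", "examples", "modelName", "currentDate"];

function isReservedParameterName(parameterName: string): boolean {
  // Case-insensitive comparison is an assumption.
  const lower = parameterName.toLowerCase();
  return RESERVED_COMMANDS.concat(RESERVED_PARAMETER_NAMES).some((word) => word.toLowerCase() === lower);
}
```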

Parameter notation

Template

Todo todo

Command

Todo todo

Block

Todo todo

Return parameter

Examples

📦 Packages (for developers)

This library is divided into several packages, all published from a single monorepo. You can install all of them at once:

```bash
npm i ptbk
```

Or you can install them separately:

⭐ Marked packages are worth trying first

⭐ ptbk - Bundle of all packages, when you want to install everything and you don't care about the size
promptbook - Same as ptbk
@promptbook/core - Core of the library, it contains the main logic for promptbooks
@promptbook/node - Core of the library for Node.js environment
@promptbook/browser - Core of the library for browser environment
⭐ @promptbook/utils - Utility functions used in the library but also useful for individual use in preprocessing and postprocessing LLM inputs and outputs
@promptbook/markdown-utils - Utility functions used for processing markdown
(Not finished) @promptbook/wizzard - Wizard for creating and running promptbooks in a single line
@promptbook/execute-javascript - Execution tools for javascript inside promptbooks
@promptbook/openai - Execution tools for OpenAI API, wrapper around OpenAI SDK
@promptbook/anthropic-claude - Execution tools for Anthropic Claude API, wrapper around Anthropic Claude SDK
@promptbook/azure-openai - Execution tools for Azure OpenAI API
@promptbook/langtail - Execution tools for Langtail API, wrapper around Langtail SDK
@promptbook/fake-llm - Mocked execution tools for testing the library and saving the tokens
@promptbook/remote-client - Remote client for remote execution of promptbooks
@promptbook/remote-server - Remote server for remote execution of promptbooks
@promptbook/pdf - Read knowledge from .pdf documents
@promptbook/documents - Read knowledge from documents like .docx, .odt,…
@promptbook/legacy-documents - Read knowledge from legacy documents like .doc, .rtf,…
@promptbook/website-crawler - Crawl knowledge from the web
@promptbook/types - Just typescript types used in the library
@promptbook/cli - Command line interface utilities for promptbooks

📚 Dictionary

The following glossary is used to clarify certain concepts:

General LLM / AI terms

Prompt drift is a phenomenon where the AI model starts to generate outputs that are not aligned with the original prompt. This can happen due to the model's training data, the prompt's wording, or the model's architecture.
Pipeline, workflow or chain is a sequence of tasks that are executed in a specific order. In the context of AI, a pipeline can refer to a sequence of AI models that are used to process data.
Fine-tuning is a process where a pre-trained AI model is further trained on a specific dataset to improve its performance on a specific task.
Zero-shot learning is a machine learning paradigm where a model performs a task without having seen any labeled examples of that task. Instead, the model is provided with a description of the task and is expected to generate the correct output.
Few-shot learning is a machine learning paradigm where a model performs a task given only a few labeled examples; for LLMs these are typically supplied directly in the prompt. This is in contrast to traditional machine learning, where models are trained on large datasets.
Meta-learning is a machine learning paradigm where a model is trained on a variety of tasks and is able to learn new tasks with minimal additional training. This is achieved by learning a set of meta-parameters that can be quickly adapted to new tasks.
Retrieval-augmented generation is a technique where a model generates text conditioned on relevant information retrieved from an external collection of documents. This approach combines the benefits of generative models and retrieval models.
Longtail refers to non-common or rare events, items, or entities that are not well-represented in the training data of machine learning models. Longtail items are often challenging for models to predict accurately.

Note: This section is not a complete dictionary, but rather a list of general AI / LLM terms that relate to Promptbook.
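To make the retrieval-augmented generation entry more concrete, here is a minimal sketch that picks the knowledge chunks most relevant to a question and places them in the prompt. It substitutes naive word-overlap scoring for the embedding-based similarity search that RAG systems commonly use, and the function names are hypothetical, not part of Promptbook.

```typescript
// Hypothetical RAG sketch; word overlap stands in for embedding similarity.
// Only ASCII letters and digits count as word characters here.
function words(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Returns up to `topK` chunks that share at least one word with the question,
// best matches first.
function retrieve(question: string, chunks: string[], topK: number): string[] {
  const questionWords = new Set(words(question));
  return chunks
    .map((chunk) => {
      // Count distinct chunk words that also occur in the question.
      const distinct = words(chunk).filter((word, index, all) => all.indexOf(word) === index);
      const score = distinct.filter((word) => questionWords.has(word)).length;
      return { chunk, score };
    })
    .filter(({ score }) => score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(({ chunk }) => chunk);
}

// Places the retrieved chunks into the prompt as context for the model.
function buildRagPrompt(question: string, chunks: string[]): string {
  const context = retrieve(question, chunks, 2).map((chunk) => `- ${chunk}`).join("\n");
  return `Answer using only the following context:\n${context}\n\nQuestion: ${question}`;
}
```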

Promptbook core

An organization (legacy name: collection) groups jobs, workforce, knowledge, instruments, and actions into one package. Entities in one organization can share resources (i.e., import resources from each other).
- Jobs
  - Task
  - Subtask
- Workforce
  - Persona
  - Team
  - Role
- Knowledge
  - Public
  - Private
  - Protected
- Instruments
- Actions

Book language

Book file
- Section
  - Heading
  - Description
  - Command
  - Block
  - Return statement
- Comment
- Import
- Scope

💯 Core concepts

Advanced concepts

Terms specific to Promptbook TypeScript implementation

Anonymous mode
Application mode

🔌 Usage in TypeScript / JavaScript

➕➖ When to use Promptbook?

➕ When to use

When you are writing an app that generates complex things via an LLM, like websites, articles, presentations, code, stories, songs, ...
When you want to separate code from text prompts
When you want to describe complex prompt pipelines and don't want to do it in the code
When you want to orchestrate multiple prompts together
When you want to reuse parts of prompts in multiple places
When you want to version your prompts and test multiple versions
When you want to log the execution of prompts and backtrace the issues
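Separating prompt text from code and reusing it in several places usually comes down to templates with named placeholders. The sketch below fills `{name}` placeholders, the same curly-brace notation as `-> {article}` in the example book, and refuses to produce a prompt with an unresolved placeholder. `fillTemplate` is a hypothetical helper, not Promptbook's API.

```typescript
// Hypothetical template helper; not Promptbook's API.
function fillTemplate(template: string, parameters: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_placeholder: string, parameterName: string) => {
    // hasOwnProperty avoids matching inherited keys such as "toString".
    if (!Object.prototype.hasOwnProperty.call(parameters, parameterName)) {
      // Fail loudly instead of sending a prompt with an unresolved placeholder.
      throw new Error(`Missing parameter: ${parameterName}`);
    }
    return parameters[parameterName];
  });
}
```

The same template string can then be filled with different parameters wherever it is needed, so the wording lives in one place.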

➖ When not to use

When you have already implemented a single simple prompt and it works fine for your job
When OpenAI Assistant (GPTs) is enough for you
When you need streaming (this may be implemented in the future, see discussion).
When you need to use something other than JavaScript or TypeScript (other languages are on the way, see the discussion)
When your main focus is on something other than text - like images, audio, video, spreadsheets (other media types may be added in the future, see discussion)
When you need to use recursion (see the discussion)

🐜 Known issues

🧼 Intentionally not implemented features

🖋️ Contributing

I am open to pull requests, feedback, and suggestions. Or if you like this utility, you can ☕ buy me a coffee or donate via cryptocurrencies.

You can also ⭐ star the promptbook package, follow me on GitHub or various other social networks.


Package last updated on 16 Nov 2024
