eliza-scraper


Awesome Scraper for eliza, scrape docs, tweets, and tokens

Version: 1.1.8 (latest, npm)
Maintainers: 0

Eliza Scraper

A powerful scraping and RAG (Retrieval Augmented Generation) toolkit for collecting, processing, and serving blockchain ecosystem data. Built with Bun, TypeScript, and modern vector search capabilities.

Features

  • 🤖 Automated Twitter data scraping with engagement filtering
  • 📊 Real-time token price and market data tracking via CoinGecko
  • 📑 Documentation site scraping with caching
  • 🔄 Configurable cron jobs for automated data collection
  • 🔍 Vector search with time-decay relevance scoring
  • 🚀 REST API with Swagger documentation

Architecture

RAG Pipeline for Tweet Processing

graph TD
    A[Twitter] --> B[Apify Scraper]
    B --> C[Filters]
    C --> D[Text Processing]
    D --> E[OpenAI Embedding]
    E --> F[Qdrant Vector DB]
    F --> G[Search API]
    
    subgraph "filters"
    C --> H[Min Replies]
    C --> I[Min Retweets]
    C --> J[Search Tags]
    C --> K[Kaito Most Mindshare handles]
    end
    
    subgraph "Vector Search"
    F --> L[Semantic Search]
    end 

System Architecture

graph TB
    CLI[CLI Interface] --> Services
    API[REST API] --> Services
    
    subgraph "Services"
    Tweets[Tweet Service]
    Tokens[Token Service]
    Docs[Site Service]
    end
    
    subgraph "Storage"
    Qdrant[(Qdrant Vector DB)]
    Postgres[(PostgreSQL)]
    end
    
    Services --> Storage

Scraper Quick Start

  • Install globally:
npm install -g eliza-scraper
  • Create .env file:
# Required APIs
APIFY_API_KEY=your_apify_key
COINGECKO_API_KEY=your_coingecko_key
OPENAI_API_KEY=your_openai_key

# Database
DATABASE_URL=postgresql://user:password@localhost:5432/token_scraper

# Vector DB
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION_NAME=twitter_embeddings

# Optional
TWITTER_SEARCH_TAGS=berachain,bera
APP_TOPICS=berachain launch,berachain token
  • Start required services:
docker-compose up -d
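
A missing key in the .env file is easier to catch at startup than in the middle of a scrape. The sketch below is not part of eliza-scraper: `loadConfig` and `splitList` are hypothetical helpers, and treating every variable outside the "Optional" group as required is an assumption based on the grouping above.

```typescript
// Hypothetical startup check; key names mirror the .env example above.
// Assumption: every variable outside the "# Optional" group is required.
const REQUIRED_KEYS = [
  "APIFY_API_KEY",
  "COINGECKO_API_KEY",
  "OPENAI_API_KEY",
  "DATABASE_URL",
  "QDRANT_URL",
  "QDRANT_COLLECTION_NAME",
] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];

type ScraperConfig = Record<RequiredKey, string> & {
  searchTags: string[];
  topics: string[];
};

// Split a comma-separated value into trimmed, non-empty entries.
function splitList(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

function loadConfig(env: Record<string, string | undefined>): ScraperConfig {
  const missing: string[] = [];
  const required = {} as Record<RequiredKey, string>;
  for (const key of REQUIRED_KEYS) {
    const value = env[key];
    if (value === undefined || value === "") {
      missing.push(key);
    } else {
      required[key] = value;
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    ...required,
    searchTags: splitList(env["TWITTER_SEARCH_TAGS"]),
    topics: splitList(env["APP_TOPICS"]),
  };
}
```

Under Bun or Node this could be called as `loadConfig(process.env)` before any service starts.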

CLI Commands

Start API Server

elizascraper server --port <number>

Tweet Scraping Service

elizascraper tweets-cron [options]

Options:
--port Port number (default: 3000)
--tags Custom search tags (comma-separated). Twitter advanced search strings are also accepted; see https://github.com/igorbrigadir/twitter-advanced-search
--start Start date (YYYY-MM-DD)
--end End date (YYYY-MM-DD)
--min-replies Minimum replies threshold (default: 4)
--retweets Minimum retweets threshold (default: 2)
--max-items Maximum tweets to fetch (default: 200)
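
The two engagement thresholds can be pictured as a simple predicate over each scraped tweet. This is a minimal sketch, not the package's actual filter: `passesEngagementFilter` is a hypothetical name, the field names are borrowed from the API response schema, and treating the thresholds as inclusive is an assumption.

```typescript
// Hypothetical engagement filter using the default thresholds listed above.
interface TweetCounts {
  replyCount: number;
  retweetCount: number;
}

interface EngagementThresholds {
  minReplies: number;
  minRetweets: number;
}

const DEFAULT_THRESHOLDS: EngagementThresholds = { minReplies: 4, minRetweets: 2 };

// Assumption: a tweet that exactly meets a threshold is kept.
function passesEngagementFilter(
  tweet: TweetCounts,
  thresholds: EngagementThresholds = DEFAULT_THRESHOLDS,
): boolean {
  return (
    tweet.replyCount >= thresholds.minReplies &&
    tweet.retweetCount >= thresholds.minRetweets
  );
}

const sample: TweetCounts[] = [
  { replyCount: 10, retweetCount: 5 },
  { replyCount: 1, retweetCount: 50 },
];
const kept = sample.filter((tweet) => passesEngagementFilter(tweet));
```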

Disclaimer: Avoid running the cron job at short intervals, with a start-to-end date range of less than 3 days, or fetching fewer than 50 tweets. Doing so can quickly get your Apify key banned.

To prevent a ban, a search query that is repeated within an hour is skipped.
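
One way to picture the one-hour skip rule is a small in-memory guard keyed by the search query. The README does not say how the package tracks previous runs, so the `QueryThrottle` class below is a hypothetical illustration, and normalizing queries by trimming and lowercasing is an assumption.

```typescript
// Hypothetical guard: remembers when each query last ran and skips repeats
// that fall inside the window (one hour by default).
const ONE_HOUR_MS = 60 * 60 * 1000;

class QueryThrottle {
  private lastRun = new Map<string, number>();
  private windowMs: number;

  constructor(windowMs: number = ONE_HOUR_MS) {
    this.windowMs = windowMs;
  }

  // Returns true (and records the run) if the query may run now.
  shouldRun(query: string, now: number = Date.now()): boolean {
    const key = query.trim().toLowerCase();
    const previous = this.lastRun.get(key);
    if (previous !== undefined && now - previous < this.windowMs) {
      return false; // same query ran too recently, skip it
    }
    this.lastRun.set(key, now);
    return true;
  }
}
```

A persistent store would be needed for the guard to survive restarts; the in-memory map only keeps the example short.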

Tracking Latest Token Data

elizascraper token-cron [options]

Options:
-p, --port <number> Port to listen on (default: 3008)
-i, --interval <number> Interval in minutes (default: 60)
--currency <string> Currency to track (default: 'usd')
--cg_category <string> CoinGecko category (default: 'berachain-ecosystem')

Documentation Scraping Service

elizascraper site-cron [options]

Options:
-p, --port <number> Port to listen on (default: 3010)
-i, --interval <number> Interval in minutes (default: 60)
-l, --link <string> URL to scrape (default: 'https://docs.berachain.com')

API Documentation

Tweets API

GET /tweets/search

Search for tweets based on semantic similarity.

Query Parameters:

  • searchString (required): Text to search for semantically similar tweets

Response Schema:

{
  status: "success" | "error",
  results: {
    tweet: {
      id: string,
      text: string,
      fullText: string,
      author: {
        name: string,
        userName: string,
        profilePicture: string
      },
      createdAt: string,
      retweetCount: number,
      replyCount: number,
      likeCount: number
    },
    combinedScore: number,
    originalScore: number,
    date: string
  }[]
}
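
Each result carries both an originalScore and a combinedScore alongside a date, which fits the "time-decay relevance scoring" feature. The README does not give the formula, so the sketch below only shows one common approach, exponential decay with a half-life. The `timeDecayScore` and `rerank` functions and the 7-day half-life are assumptions, not the package's implementation.

```typescript
// Hypothetical time-decay re-ranking: the similarity score is halved for
// every `halfLifeDays` of age. Formula and half-life are assumptions.
interface ScoredResult {
  originalScore: number;
  date: string; // ISO 8601 timestamp
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function timeDecayScore(
  originalScore: number,
  date: string,
  now: Date,
  halfLifeDays: number = 7,
): number {
  // Future-dated items are treated as age zero.
  const ageDays = Math.max(0, (now.getTime() - new Date(date).getTime()) / MS_PER_DAY);
  return originalScore * Math.pow(0.5, ageDays / halfLifeDays);
}

// Attach a combinedScore to each result and sort by it, highest first.
function rerank<T extends ScoredResult>(
  results: T[],
  now: Date,
  halfLifeDays: number = 7,
): (T & { combinedScore: number })[] {
  return results
    .map((result) => ({
      ...result,
      combinedScore: timeDecayScore(result.originalScore, result.date, now, halfLifeDays),
    }))
    .sort((a, b) => b.combinedScore - a.combinedScore);
}
```

With this scheme, a week-old tweet needs twice the similarity of a fresh one to rank equally.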

GET /tweets/random

Get random tweets by predefined topics.

Response Schema:

{
  topic: string,
  tweet: {
    tweet: {
      id: string,
      text: string,
      fullText: string,
      author: {
        name: string,
        userName: string,
        profilePicture: string
      },
      createdAt: string,
      retweetCount: number,
      replyCount: number,
      likeCount: number
    },
    combinedScore: number,
    originalScore: number,
    date: string
  }[]
}

Tokens API

GET /tokens

Get token market data.

Query Parameters:

  • symbol (optional): Token symbol to filter results

Response Schema:

{
  status: "success" | "error",
  results: {
    id: string,
    symbol: string,
    name: string,
    image: string,
    currentPrice: number,
    marketCap: number,
    marketCapRank: number,
    fullyDilutedValuation: number,
    totalVolume: number,
    high24h: number,
    low24h: number,
    priceChange24h: number,
    priceChangePercentage24h: number,
    marketCapChange24h: number,
    marketCapChangePercentage24h: number,
    circulatingSupply: number,
    totalSupply: number,
    maxSupply: number,
    ath: number,
    athChangePercentage: number,
    athDate: string,
    atl: number,
    atlChangePercentage: number,
    atlDate: string,
    lastUpdated: string
  }
}
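
A client call to this endpoint might look like the sketch below. `buildTokensUrl` and `fetchTokens` are hypothetical helpers, and the base URL is a placeholder for wherever `elizascraper server` is listening. The response is returned as `unknown` because the schema above has not been verified against a running server.

```typescript
// Hypothetical client helpers for GET /tokens.
function buildTokensUrl(baseUrl: string, symbol?: string): string {
  const url = new URL("/tokens", baseUrl);
  if (symbol !== undefined && symbol !== "") {
    url.searchParams.set("symbol", symbol); // optional filter
  }
  return url.toString();
}

async function fetchTokens(baseUrl: string, symbol?: string): Promise<unknown> {
  const response = await fetch(buildTokensUrl(baseUrl, symbol));
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

For example, `fetchTokens("http://localhost:3000", "bera")` would request `/tokens?symbol=bera`, assuming the server was started on port 3000.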

Documentation API

GET /docs/scrape

Scrape and cache documentation from a given URL.

Query Parameters:

  • url (required): URL of the documentation site to scrape

Response Schema:

{
  status: "success" | "error",
  data: {
    title: string,
    last_updated: string,
    total_sections: number,
    sections: {
      topic: string,
      source_url: string,
      overview: string,
      subsections: {
        title: string,
        content: string
      }[]
    }[]
  },
  source: "cache" | "fresh"
} | {
  status: "error",
  message: string
}

Cache Behavior:

  • Documentation is cached for 7 days
  • Returns cached version if available and not expired
  • Automatically refreshes cache if data is older than 7 days
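
The cache rules above amount to a freshness check against a 7-day window. The following is a minimal sketch under stated assumptions: the `CacheEntry` shape, the in-memory `Map`, and the `scrape` callback are hypothetical, and the README does not say where the package actually stores cached documentation.

```typescript
// Hypothetical 7-day cache lookup mirroring the "cache" | "fresh" source field.
const CACHE_TTL_MS = 7 * 24 * 60 * 60 * 1000;

interface CacheEntry<T> {
  data: T;
  cachedAt: number; // epoch milliseconds
}

// Return the cached entry only if it exists and has not expired.
function freshEntry<T>(
  cache: Map<string, CacheEntry<T>>,
  url: string,
  now: number,
): CacheEntry<T> | undefined {
  const entry = cache.get(url);
  if (entry === undefined || now - entry.cachedAt >= CACHE_TTL_MS) {
    return undefined;
  }
  return entry;
}

async function getDocs<T>(
  cache: Map<string, CacheEntry<T>>,
  url: string,
  scrape: (url: string) => Promise<T>, // stand-in for the real scraper
  now: number = Date.now(),
): Promise<{ data: T; source: "cache" | "fresh" }> {
  const cached = freshEntry(cache, url, now);
  if (cached !== undefined) {
    return { data: cached.data, source: "cache" };
  }
  const data = await scrape(url); // missing or expired: scrape again
  cache.set(url, { data, cachedAt: now });
  return { data, source: "fresh" };
}
```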

Error Handling

All routes use a standardized error response format:

{
  status: "error",
  code: string,
  message: string
}
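
On the client side, this shape can be checked with a type guard before a response is read as data. The `isApiError` function below is a hypothetical helper, not something the package exports. Because the /docs/scrape schema lists an error variant without `code`, the sketch treats `code` as optional.

```typescript
// Hypothetical type guard for the error format described above.
interface ApiError {
  status: "error";
  code?: string; // optional here: the /docs/scrape error variant omits it
  message: string;
}

function isApiError(body: unknown): body is ApiError {
  if (typeof body !== "object" || body === null) {
    return false;
  }
  const candidate = body as Record<string, unknown>;
  const code = candidate["code"];
  return (
    candidate["status"] === "error" &&
    typeof candidate["message"] === "string" &&
    (code === undefined || typeof code === "string")
  );
}
```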

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting pull requests.

License

MIT

Keywords

blockchain

FAQs

Package last updated on 09 Mar 2025
