strapi-plugin-semantic-search

Intelligent semantic search plugin for Strapi 5 powered by OpenAI embeddings. Automatically generates embeddings for your content and provides powerful semantic search capabilities.

Version: 1.1.0 (latest)
Source: npm
Maintainers: 1

Semantic Search Plugin for Strapi 5

Intelligent content discovery powered by OpenAI embeddings

Transform your Strapi CMS into an AI-powered content platform with automatic embedding generation and semantic search capabilities.

🆕 What's New in v1.1.0

  • Configurable Field Mapping - Customize which fields to process per content type
  • Enhanced Validation - Comprehensive configuration validation with helpful error messages
  • Improved Logging - Better visibility into plugin configuration and processing
  • Flexible Content Types - Support for any content type with custom field mappings

Features

  • Automatic Embedding Generation - Content embeddings created automatically on save
  • Semantic Search APIs - RESTful endpoints for intelligent content discovery
  • Multi-Content Type Search - Search across different content types simultaneously
  • High Performance - Sub-300ms search responses with vector similarity
  • Production Ready - Comprehensive error handling and rate limiting
  • Analytics - Monitor embedding coverage and search performance
  • No External Vector Database - Embeddings are stored in your existing Strapi database

Installation

1. Install Dependencies

cd src/plugins/semantic-search
npm install openai@^5.8.2 axios@^1.10.0

2. Configure Environment

Add your OpenAI API key to .env:

OPENAI_API_KEY=your_openai_api_key_here

3. Enable Plugin

Update config/plugins.ts:

export default ({ env }) => ({
  'semantic-search': {
    enabled: true,
    resolve: './src/plugins/semantic-search'
  },
});

4. Add Embedding Fields

Add these two fields to the attributes object in the schema.json of each content type you want to search:

{
  "embedding": {
    "type": "json"
  },
  "embeddingMetadata": {
    "type": "json"
  }
}

5. Restart Strapi

npm run develop

Configuration

Basic Configuration

By default, the plugin processes these content types and fields:

// Default configuration
{
  'api::article.article': ['title', 'content', 'summary'],
  'api::blog.blog': ['title', 'body', 'excerpt']
}

Custom Configuration

Configure which content types and fields to process by updating config/plugins.js (or config/plugins.ts in a TypeScript project):

// config/plugins.js
module.exports = ({ env }) => ({
  'semantic-search': {
    enabled: true,
    resolve: './src/plugins/semantic-search',
    config: {
      contentTypes: {
        'api::article.article': ['title', 'content', 'summary', 'tags'],
        'api::blog.blog': ['title', 'body', 'excerpt', 'category'],
        'api::product.product': ['name', 'description', 'features', 'benefits'],
        'api::course.course': ['title', 'overview', 'learningOutcomes']
      }
    }
  }
});

Configuration Options

Option       | Type   | Description
contentTypes | Object | Maps content type UIDs to arrays of field names

Content Type Format: Use Strapi's UID format: api::collection-name.collection-name

Field Names:

  • Must be valid field names from your content type schema
  • Can include any text-based fields (text, textarea, richtext, etc.)
  • Rich text fields are automatically converted to plain text for embedding

Configuration Validation

The plugin validates your configuration on startup:

  • Valid content type format: api::article.article
  • Valid field arrays: ['title', 'content']
  • Non-empty field names: No empty strings or invalid types
  • Rejected formats: missing api:: prefix, empty field arrays, empty field names, etc.

Invalid configurations log a warning and the plugin falls back to the defaults.
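
The validation rules above could be sketched roughly as follows. This is an illustration, not the plugin's source: the function name validateContentTypes, the UID regular expression, and the choice to fall back to the full default map on the first invalid entry are all assumptions.

```javascript
// Hypothetical sketch of the startup validation; not the plugin's actual code.
const DEFAULT_CONTENT_TYPES = {
  'api::article.article': ['title', 'content', 'summary'],
  'api::blog.blog': ['title', 'body', 'excerpt'],
};

// Assumed UID shape: api::<name>.<name> using lowercase letters, digits, hyphens.
const UID_PATTERN = /^api::[a-z0-9-]+\.[a-z0-9-]+$/;

function validateContentTypes(contentTypes, log = console.warn) {
  const isPlainObject =
    contentTypes !== null &&
    typeof contentTypes === 'object' &&
    !Array.isArray(contentTypes);
  if (!isPlainObject) {
    log('semantic-search: contentTypes must be an object; using defaults');
    return DEFAULT_CONTENT_TYPES;
  }

  for (const [uid, fields] of Object.entries(contentTypes)) {
    const validUid = UID_PATTERN.test(uid);
    // Field lists must be non-empty arrays of non-empty strings.
    const validFields =
      Array.isArray(fields) &&
      fields.length > 0 &&
      fields.every((f) => typeof f === 'string' && f.trim() !== '');
    if (!validUid || !validFields) {
      log(`semantic-search: invalid entry for "${uid}"; using defaults`);
      return DEFAULT_CONTENT_TYPES;
    }
  }
  return contentTypes;
}
```

Passing { 'product': ['name'] } to this sketch returns the default map, because the UID lacks the api:: prefix.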

Usage

Automatic Embedding Generation

Embeddings are generated automatically when you create or update content. The plugin extracts text from the fields you've configured for each content type (see Configuration section above).
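
As a rough illustration of the text extraction step, the sketch below joins the configured fields of an entry into one string. The helper names toPlainText and extractTextContent are hypothetical, and the tag-stripping regex assumes rich text arrives as an HTML-like string; the plugin's real conversion may work differently.

```javascript
// Hypothetical sketch; the plugin's real extraction logic may differ.
function toPlainText(value) {
  if (typeof value !== 'string') return '';
  return value
    .replace(/<[^>]*>/g, ' ') // drop HTML-style tags (naive, illustration only)
    .replace(/\s+/g, ' ')     // collapse runs of whitespace
    .trim();
}

// Combine the configured fields of one entry, skipping empty or missing ones.
function extractTextContent(entry, fields) {
  return fields
    .map((field) => toPlainText(entry[field]))
    .filter((text) => text.length > 0)
    .join('\n');
}
```

The combined string is what would then be sent to the embedding model.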

Search API

POST /api/semantic-search/search

Request:

{
  "query": "artificial intelligence and machine learning",
  "contentType": "api::article.article", 
  "limit": 10,
  "threshold": 0.1
}

Response:

{
  "success": true,
  "data": {
    "query": "artificial intelligence and machine learning",
    "contentType": "api::article.article",
    "results": [
      {
        "id": 1,
        "title": "Deep Learning Fundamentals",
        "similarityScore": 0.8945,
        "content": "...",
        "createdAt": "2025-01-15T10:30:00.000Z"
      }
    ],
    "metadata": {
      "totalResults": 5,
      "queryProcessing": {
        "embeddingDimensions": 1536
      }
    }
  }
}

POST /api/semantic-search/multi-search

Request:

{
  "query": "productivity and remote work",
  "contentTypes": ["api::article.article", "api::blog.blog"],
  "limit": 15,
  "aggregateResults": true
}
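
The README does not show the multi-search response or specify what aggregateResults does. One plausible reading is that hits from all requested content types are merged into a single list ranked by similarity. The sketch below illustrates that reading only; the function name, the input shape, and the added contentType tag are assumptions.

```javascript
// Assumption: each per-type list holds objects with a numeric similarityScore.
function aggregateResults(resultsByType, limit = 10) {
  const merged = [];
  for (const [contentType, results] of Object.entries(resultsByType)) {
    for (const result of results) {
      merged.push({ ...result, contentType }); // tag each hit with its source type
    }
  }
  // Highest similarity first, then keep only the requested number of hits.
  merged.sort((a, b) => b.similarityScore - a.similarityScore);
  return merged.slice(0, limit);
}
```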

Embedding Statistics

GET /api/semantic-search/stats

Response:

{
  "success": true,
  "data": {
    "api::article.article": {
      "total": 50,
      "withEmbeddings": 50,
      "coverage": "100.00%"
    },
    "api::blog.blog": {
      "total": 25, 
      "withEmbeddings": 23,
      "coverage": "92.00%"
    }
  }
}
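
The coverage value in this response is the share of entries that have an embedding, formatted with two decimals. A minimal sketch, assuming a hypothetical helper named embeddingCoverage and assuming an empty collection reports 0.00%:

```javascript
// Hypothetical helper that reproduces the stats shape shown above.
function embeddingCoverage(total, withEmbeddings) {
  const ratio = total > 0 ? withEmbeddings / total : 0; // avoid dividing by zero
  return {
    total,
    withEmbeddings,
    coverage: `${(ratio * 100).toFixed(2)}%`,
  };
}
```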

Configuration

Supported Content Types

The plugin can process content types that meet these rules:

  • The content type UID starts with api::
  • Admin and plugin content types are excluded
  • The list of processed types is configurable in the lifecycle registration

Text Field Mapping

Default fields processed for embedding generation:

const textFields = [
  'title', 'name', 'content', 'body', 
  'summary', 'description', 'excerpt'
];

Search Parameters

Parameter   | Type   | Default  | Description
query       | string | required | Search query text
contentType | string | required | Strapi content type UID
limit       | number | 10       | Maximum results (max: 50)
threshold   | number | 0.1      | Minimum similarity score
filters     | object | {}       | Additional database filters
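
The defaults in this table could be applied to an incoming request body along the following lines. This is a sketch under assumptions: the function name normalizeSearchParams is hypothetical, and clamping an oversized limit to 50 (rather than rejecting the request) is a guess about the plugin's behavior.

```javascript
const DEFAULT_LIMIT = 10;
const MAX_LIMIT = 50;
const DEFAULT_THRESHOLD = 0.1;

// Hypothetical normalizer for the search request body.
function normalizeSearchParams(body) {
  const { query, contentType, limit, threshold, filters } = body ?? {};
  if (typeof query !== 'string' || query.trim() === '') {
    throw new Error('query is required');
  }
  if (typeof contentType !== 'string' || contentType === '') {
    throw new Error('contentType is required');
  }
  // Fall back to the default when limit is missing or not a positive integer.
  const requested = Number.isInteger(limit) && limit > 0 ? limit : DEFAULT_LIMIT;
  return {
    query: query.trim(),
    contentType,
    limit: Math.min(requested, MAX_LIMIT), // assumption: clamp instead of reject
    threshold: typeof threshold === 'number' ? threshold : DEFAULT_THRESHOLD,
    filters: filters ?? {},
  };
}
```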

Architecture

Plugin Structure

semantic-search/
├── package.json           # Plugin metadata and dependencies
├── strapi-server.js       # Plugin entry point
└── server/
    ├── index.js           # Server exports
    └── src/
        ├── index.js       # Main plugin logic
        ├── controllers/   # API request handlers
        │   ├── index.js
        │   └── search-controller.js
        ├── services/      # Business logic
        │   ├── index.js
        │   ├── embedding-service.js    # OpenAI integration
        │   ├── vector-service.js       # Similarity calculations
        │   └── search-service.js       # Search orchestration
        └── routes/        # API endpoint definitions
            └── index.js

Data Flow

  • Content Creation/Update → Lifecycle hook triggered
  • Text Extraction → Combine relevant text fields
  • OpenAI API Call → Generate 1536-dimension embedding
  • Database Storage → Save embedding in JSON field
  • Search Request → Convert query to embedding
  • Similarity Search → Calculate cosine similarity
  • Result Ranking → Sort by similarity score
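
The last two steps, cosine similarity and ranking, can be sketched in plain JavaScript. The function names and the shape of the documents array (objects with id and embedding) are assumptions for illustration; the plugin's vector-service.js may be organized differently.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors have no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document that has an embedding, drop weak matches, sort, truncate.
function rankBySimilarity(queryEmbedding, documents, { limit = 10, threshold = 0.1 } = {}) {
  return documents
    .filter((doc) => Array.isArray(doc.embedding))
    .map((doc) => ({
      id: doc.id,
      similarityScore: cosineSimilarity(queryEmbedding, doc.embedding),
    }))
    .filter((hit) => hit.similarityScore >= threshold)
    .sort((a, b) => b.similarityScore - a.similarityScore)
    .slice(0, limit);
}
```

Because every stored embedding is compared against the query, the work grows linearly with the number of documents.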

Vector Storage

Embeddings are stored as JSON in your existing database:

{
  "embedding": [0.1234, -0.5678, 0.9012, ...], // 1536 dimensions
  "embeddingMetadata": {
    "model": "text-embedding-ada-002",
    "generatedAt": "2025-01-15T10:30:00.000Z",
    "dimensions": 1536,
    "processedText": "Machine learning fundamentals...",
    "originalLength": 1250,
    "processedLength": 1180
  }
}

Similarity Scores

Understanding similarity score ranges:

Score Range | Relevance         | Description
0.85 - 1.0  | Highly Relevant   | Direct topic match
0.75 - 0.85 | Relevant          | Related concepts
0.65 - 0.75 | Somewhat Relevant | Tangential connection
0.1 - 0.65  | Low Relevance     | Weak semantic relation
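
A client could turn a similarityScore into one of these labels with a small helper. The ranges in the table share their boundary values, so this sketch assumes each lower bound is inclusive. The function name and the 'Below Threshold' label for scores under 0.1 are also assumptions.

```javascript
// Hypothetical helper; lower bounds are treated as inclusive.
function relevanceLabel(score) {
  if (score >= 0.85) return 'Highly Relevant';
  if (score >= 0.75) return 'Relevant';
  if (score >= 0.65) return 'Somewhat Relevant';
  if (score >= 0.1) return 'Low Relevance';
  return 'Below Threshold'; // not part of the table; assumed label
}
```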

Performance

Benchmarks

  • Embedding Generation: 1-3 seconds per document
  • Search Latency: ~250ms end-to-end
  • Vector Comparison: ~50ms for 1000 documents
  • Memory Usage: ~1.5KB per embedding
  • Storage Overhead: ~6KB per document (embedding + metadata)

Optimization Tips

  • Batch Processing: Process multiple embeddings in parallel
  • Caching: Cache frequently searched embeddings
  • Indexing: Add database indexes on frequently filtered fields
  • Text Preprocessing: Optimize text extraction for your content types
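
As one way to apply the caching tip, the sketch below wraps an embedding function in a small least-recently-used cache so repeated queries skip the OpenAI call. Everything here is hypothetical: createEmbeddingCache is not part of the plugin, embed stands for whatever function returns an embedding for a query, and the default of 500 entries is arbitrary.

```javascript
// Hypothetical LRU cache for query embeddings; embed(query) may return a value or a promise.
function createEmbeddingCache(embed, maxEntries = 500) {
  const cache = new Map(); // Map keeps insertion order, which serves as recency order

  return function getEmbedding(query) {
    const key = query.trim();
    if (cache.has(key)) {
      const cached = cache.get(key);
      cache.delete(key); // re-insert to mark as most recently used
      cache.set(key, cached);
      return cached;
    }
    // Cache the promise itself so concurrent identical queries share one request.
    const pending = Promise.resolve(embed(query));
    pending.catch(() => cache.delete(key)); // do not keep failed lookups
    cache.set(key, pending);
    if (cache.size > maxEntries) {
      cache.delete(cache.keys().next().value); // evict the least recently used entry
    }
    return pending;
  };
}
```

Callers await the returned promise as they would await the uncached embedding call.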

Development

Local Development

# Install plugin dependencies
cd src/plugins/semantic-search
npm install

# Return to project root
cd ../../../

# Start development server
npm run develop

Testing

# Test the search endpoint
curl -X POST http://localhost:1337/api/semantic-search/search \
-H "Content-Type: application/json" \
-d '{"query": "test query", "contentType": "api::article.article"}'

# Check statistics
curl http://localhost:1337/api/semantic-search/stats

Debugging

Enable debug logging in your Strapi configuration:

// config/logger.js
module.exports = {
  level: 'debug',
  transports: [
    {
      type: 'console',
      options: {
        pool: true,
        format: 'combined',
        level: 'debug'
      }
    }
  ]
};

Production Deployment

Environment Variables

# Required
OPENAI_API_KEY=your_production_openai_key

# Optional
NODE_ENV=production

Security Considerations

  • API Authentication: Enable authentication for search endpoints
  • Rate Limiting: Configure appropriate rate limits
  • Input Validation: Validate search queries and parameters
  • API Key Security: Secure OpenAI API key storage

Monitoring

Monitor these key metrics:

  • Embedding generation success rate
  • Search response times
  • OpenAI API usage and costs
  • Similarity score distributions
  • Search query patterns

Extending the Plugin

Custom Content Type Support

Add support for additional content types:

// In registerEmbeddingLifecycles function
const contentTypes = [
  'api::article.article',
  'api::blog.blog', 
  'api::product.product',  // Add your content type
  'api::course.course'     // Add another content type
];

Custom Text Extraction

Modify text field extraction:

// In extractTextContent function
const textFields = [
  'title', 'content', 'summary',
  'customField',      // Add custom field
  'description'       // Add more fields as needed
];

Advanced Filtering

Add custom filtering logic:

// In search service
const searchWithCustomFilters = async (query, contentType, customFilters) => {
  const filters = {
    ...customFilters,
    publishedAt: { $notNull: true },  // Only published content
    featured: true                     // Only featured content
  };
  
  return await semanticSearch(query, contentType, { filters });
};

Resources

  • Complete Implementation Guide - Detailed technical documentation
  • Strapi Plugin Development - Official Strapi plugin docs
  • OpenAI Embeddings API - OpenAI documentation

Troubleshooting

Common Issues

1. Plugin not loading:

  • Check config/plugins.ts configuration
  • Verify plugin path resolution
  • Check for syntax errors in plugin files

2. Embeddings not generating:

  • Verify OpenAI API key is valid
  • Check network connectivity to OpenAI
  • Review content type field configuration

3. Search returning no results:

  • Verify embedding field exists on content type
  • Check similarity threshold (try lowering to 0.1)
  • Ensure content has embeddings generated

4. Performance issues:

  • Monitor OpenAI API rate limits
  • Check database query performance
  • Consider caching strategies

Debug Commands

# Check plugin loading
tail -f logs/strapi.log | grep "semantic-search"

# Test OpenAI connectivity
curl -X POST https://api.openai.com/v1/embeddings \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{"input": "test", "model": "text-embedding-ada-002"}'

# Validate embedding storage
curl http://localhost:1337/api/semantic-search/stats

License

MIT License

Contributing

This plugin is part of the semantic search demo project. Contributions and improvements are welcome.

Built with OpenAI Embeddings and Strapi 5

Keywords

strapi

Package last updated on 14 Jul 2025