bilingual-summarizer 1.2.13 (latest on npm)

Bilingual Summarizer

A powerful text summarization package for Arabic and English content that provides sentiment analysis, topic extraction, and more.

Features

  • Bilingual Support: Works with both Arabic and English text
  • Advanced Arabic Processing: Specialized algorithms for Arabic text summarization
  • HTML Support: Extract titles and meaningful content from HTML documents
  • Sentiment Analysis: Determine the sentiment of the text (positive, negative, neutral)
  • Topic Extraction: Identify the main topics discussed in the text
  • Reading Time Estimation: Calculate estimated reading time
  • Customizable Response: Control which fields are included in the response
  • Google Gemini AI Integration: Optional high-quality summaries using Google's Gemini models
  • Complete Sentence Guarantee: Ensures all generated summaries contain grammatically complete sentences

Installation

npm install bilingual-summarizer

Or using yarn:

yarn add bilingual-summarizer

Basic Usage

const { summarize } = require('bilingual-summarizer');

// CommonJS modules do not support top-level await, so wrap the calls in an async function
async function main() {
  // Summarize English text
  const englishResult = await summarize('Your English text here. It can be multiple sentences with various topics.');

  // Summarize Arabic text
  const arabicResult = await summarize('النص العربي الخاص بك هنا. يمكن أن يكون جملًا متعددة بمواضيع مختلفة.');

  console.log(englishResult);
  // {
  //   ok: true,
  //   title: '',
  //   summary: '...',
  //   language: 'en',
  //   languageName: 'English',
  //   sentiment: 'neutral',
  //   topics: ['...'],
  //   relatedTopics: ['...'],
  //   words: 12,
  //   sentences: 2,
  //   readingTime: 1,
  //   difficulty: 'easy'
  // }

  console.log(arabicResult); // Same shape, with language: 'ar'
}

main().catch(console.error);
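
The language and languageName fields above come from automatic language detection. This README does not describe how detection works. One common heuristic, sketched here as an assumption rather than the package's actual implementation, compares the number of Arabic-script and Latin-script characters. The name detectLanguage is hypothetical.

```javascript
// Hypothetical helper: NOT part of the bilingual-summarizer API.
// Guesses 'ar' or 'en' by counting Arabic-script vs. Latin-script characters.
function detectLanguage(text) {
  const arabicCount = (text.match(/\p{Script=Arabic}/gu) || []).length;
  const latinCount = (text.match(/\p{Script=Latin}/gu) || []).length;
  // Ties (including text with no letters at all) default to English in this sketch
  return arabicCount > latinCount ? 'ar' : 'en';
}

console.log(detectLanguage('Your English text here.'));
console.log(detectLanguage('النص العربي الخاص بك هنا.'));
```

A script-count heuristic like this only separates the two supported languages; it cannot tell Arabic from other Arabic-script languages such as Persian.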

Using Google Gemini AI for Summaries

For enhanced summarization quality, you can use Google's Gemini AI models. This requires an API key from Google AI Studio.

const { summarize } = require('bilingual-summarizer');

// Wrap the call in an async function: CommonJS has no top-level await
async function main() {
  // Summarize using Gemini AI
  const result = await summarize('Your text to summarize here.', {
    useAI: true,
    gemini: {
      apiKey: 'YOUR_GEMINI_API_KEY', // Required
      model: 'gemini-1.5-flash',     // Optional, defaults to gemini-1.5-flash
      temperature: 0.2,              // Optional, controls creativity (0.0-1.0)
      maxOutputTokens: 1000,         // Optional, limits response length (default: 800)
      objective: true                // Optional, defined in the interface but not currently implemented in the prompt
    }
  });

  console.log(result);
}

main().catch(console.error);

The AI summarization uses a specialized prompt designed for both Arabic and English text. The current prompt format instructs the model to act as a professional linguist in the detected language, creating a brief summary with complete sentences. The prompt is provided in English regardless of the input language, as Gemini models have strong multilingual capabilities.

The prompt structure is:

Context:  
You are a professional linguist in the [detected language]. Your task is to create a brief summary of articles and posts in a paragraph containing no more than [N] complete sentences.

Instructions:  
Analyze the text carefully. Do not use bullet points or numbered lists. Provide a unique, complete summary as your answer, and ensure it is written in the [detected language].

Input:  
The text to summarize is:  
[Original text]
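
The template above can be filled in with a small helper. This is a minimal sketch based only on the prompt text shown here; buildPrompt is a hypothetical name, and the package's internal code may assemble the prompt differently.

```javascript
// Hypothetical helper mirroring the documented prompt template.
// languageName is the detected language (e.g. 'Arabic'), sentenceCount is N.
function buildPrompt(languageName, sentenceCount, text) {
  return [
    'Context:',
    `You are a professional linguist in the ${languageName}. Your task is to create a brief summary of articles and posts in a paragraph containing no more than ${sentenceCount} complete sentences.`,
    '',
    'Instructions:',
    `Analyze the text carefully. Do not use bullet points or numbered lists. Provide a unique, complete summary as your answer, and ensure it is written in the ${languageName}.`,
    '',
    'Input:',
    'The text to summarize is:',
    text,
  ].join('\n');
}

console.log(buildPrompt('Arabic', 3, 'النص العربي الذي تريد تلخيصه هنا.'));
```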

You can also directly use the Gemini AI summarizer:

const { summarizeWithAI } = require('bilingual-summarizer');

// Wrap the call in an async function: CommonJS has no top-level await
async function main() {
  const summary = await summarizeWithAI('Your text to summarize.', 3, {
    apiKey: 'YOUR_GEMINI_API_KEY',
    objective: true // Optional, defined in the interface but not currently implemented in the prompt
  });

  console.log(summary); // AI-generated summary of up to 3 sentences
}

main().catch(console.error);

OpenAI Compatibility (Alternative Method)

If you prefer, you can also use the OpenAI library with Gemini models by setting the base URL. This approach might be useful if you're transitioning from OpenAI to Gemini:

import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: "YOUR_GEMINI_API_KEY",
    baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/"
});

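// Placeholder input; replace with the text you want to summarize
const yourText = "Your text to summarize here.";

const response = await openai.chat.completions.create({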
    model: "gemini-1.5-flash",
    messages: [
        { role: "system", content: "You are a summarization assistant." },
        { role: "user", content: "Summarize the following text: " + yourText },
    ],
});

console.log(response.choices[0].message.content);

Customizing Response Structure

You can specify which fields you want to include in the response using the responseStructure option. It supports three formats:

1. Array Format (Include Only)

const { summarize } = require('bilingual-summarizer');

// Wrap the call in an async function: CommonJS has no top-level await
async function main() {
  // Only include specific fields in the response
  const result = await summarize('Your text to summarize here.', {
    responseStructure: ['summary', 'language', 'sentiment']
  });

  console.log(result);
  // {
  //   ok: true, // 'ok' is always included unless explicitly excluded
  //   summary: '...',
  //   language: 'en',
  //   sentiment: 'neutral'
  // }
}

main().catch(console.error);

2. Object with Include Option

const result = await summarize('Your text to summarize here.', {
  responseStructure: {
    include: ['summary', 'language', 'sentiment']
  }
});

// Result is the same as the array format

3. Object with Exclude Option

const result = await summarize('Your text to summarize here.', {
  responseStructure: {
    exclude: ['topics', 'relatedTopics', 'difficulty']
  }
});

// Returns all fields except the excluded ones

Note: The include and exclude options cannot be combined in the same request. Passing both throws an error.
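
To make the three formats concrete, here is a minimal sketch of how such filtering could work. It follows only the behavior described in this section; applyResponseStructure and the error message are hypothetical, not the package's internal code.

```javascript
// Hypothetical helper: NOT the package's internal code.
// Filters a result object according to a responseStructure option.
function applyResponseStructure(result, responseStructure) {
  if (!responseStructure) return { ...result };

  // The array format is shorthand for { include: [...] }
  const spec = Array.isArray(responseStructure)
    ? { include: responseStructure }
    : responseStructure;

  if (spec.include && spec.exclude) {
    throw new Error('responseStructure cannot contain both include and exclude');
  }

  if (spec.include) {
    // 'ok' is always kept in include mode
    const keep = new Set(['ok', ...spec.include]);
    return Object.fromEntries(Object.entries(result).filter(([key]) => keep.has(key)));
  }

  // Exclude mode: drop only the listed fields ('ok' included, if listed)
  const drop = new Set(spec.exclude || []);
  return Object.fromEntries(Object.entries(result).filter(([key]) => !drop.has(key)));
}

const full = { ok: true, summary: '...', language: 'en', sentiment: 'neutral', topics: ['...'] };
console.log(applyResponseStructure(full, ['summary', 'language']));
console.log(applyResponseStructure(full, { exclude: ['topics'] }));
```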

Advanced Arabic Summarization

This package includes specialized support for Arabic text summarization:

const { summarizeArabic } = require('bilingual-summarizer');

// Use the dedicated Arabic summarizer
const summary = summarizeArabic('النص العربي الذي تريد تلخيصه هنا.', 3);
console.log(summary); // Returns a concise summary with 3 sentences

Optional Arabic NLP Libraries

For enhanced Arabic text processing, the package attempts to use several Arabic NLP libraries if they're available. The built-in Arabic processing will work well without these libraries, but they can enhance the results.

If you want to try optional Arabic NLP enhancements, you can install these packages:

npm install arabic-nlp@0.0.4 @flowdegree/arabic-strings arabic-persian-reshaper

Each library provides different capabilities:

  • arabic-nlp: Basic Arabic natural language processing utilities
  • @flowdegree/arabic-strings: Enhanced Arabic string manipulation and processing
  • arabic-persian-reshaper: Helps with proper rendering of Arabic characters (using either PersianShaper.convertArabic or ArabicShaper.convertArabic methods)

The package works without these optional dependencies: if the libraries aren't available, it automatically falls back to its basic Arabic processing methods.
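
A common way to implement this kind of optional dependency in Node.js is to wrap require in a try/catch. The sketch below illustrates the pattern under that assumption; tryRequire is a hypothetical helper, and the package's own loading logic may differ.

```javascript
// Hypothetical helper: returns the module if it can be loaded, otherwise null.
// For simplicity, this sketch treats any load failure as "not installed".
function tryRequire(name) {
  try {
    return require(name);
  } catch (err) {
    return null;
  }
}

const optionalLibraries = ['arabic-nlp', '@flowdegree/arabic-strings', 'arabic-persian-reshaper'];
const available = optionalLibraries.filter((name) => tryRequire(name) !== null);

if (available.length === 0) {
  // Same notice the package prints when no optional library is found
  console.log('Note: No specialized Arabic NLP libraries found. Using basic Arabic processing.');
} else {
  // Illustrative message only
  console.log(`Optional Arabic libraries loaded: ${available.join(', ')}`);
}
```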

API Reference

summarize(text, options)

Analyze and summarize the provided text.

Parameters:

  • text (string): The text to summarize.
  • options (object, optional): Configuration options.
    • sentenceCount (number): Number of sentences in the summary (default: 5).
    • title (string): Custom title for the summary.
    • includeTitleFromContent (boolean): Extract title from HTML content if available (default: true).
    • includeImage (boolean): Extract image URL from HTML content if available (default: true).
    • responseStructure (array | object): Control the response format:
      • As an array: Fields to include (e.g., ['summary', 'language', 'sentiment'])
      • As an object: With either include or exclude property (e.g., {include: ['summary']} or {exclude: ['topics']})
    • useAI (boolean): Whether to use Google Gemini AI for summarization (default: false).
    • gemini (object): Configuration for Gemini AI (required if useAI is true):
      • apiKey (string): Your Gemini API key from Google AI Studio.
      • model (string): The Gemini model to use (default: 'gemini-1.5-flash').
      • temperature (number): Controls creativity in the output (default: 0.2).
      • maxOutputTokens (number): Limits response length (default: 800).
      • objective (boolean): Defined in the interface for future implementation (not currently used).

Returns: A Promise that resolves to an object with the following properties (unless filtered by responseStructure):

  • ok (boolean): Whether the summarization was successful.
  • title (string): Title of the content (extracted or provided).
  • summary (string): The summarized text.
  • language (string): Detected language code.
  • languageName (string): Full name of the detected language.
  • sentiment (string): 'positive', 'negative', or 'neutral'.
  • topics (array): List of detected topics.
  • relatedTopics (array): List of related topics.
  • words (number): Word count of the original text.
  • sentences (number): Sentence count of the original text.
  • readingTime (number): Estimated reading time in minutes.
  • difficulty (string): 'easy', 'medium', or 'hard'.
  • image (string, optional): URL of the extracted image if available.
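
As an illustration of how words and readingTime relate, the sketch below counts whitespace-separated words and divides by a reading speed. The 200 words-per-minute figure and both function names are assumptions; this README does not state the value or method the package uses.

```javascript
// Assumed reading speed; the package's actual value is not documented here.
const WORDS_PER_MINUTE = 200;

// Hypothetical helper: counts whitespace-separated words.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Hypothetical helper: reading time in whole minutes, rounded up.
function estimateReadingTime(wordCount, wordsPerMinute = WORDS_PER_MINUTE) {
  // Any non-empty text takes at least one minute; empty text takes zero
  return wordCount === 0 ? 0 : Math.ceil(wordCount / wordsPerMinute);
}

const sample = 'Your English text here. It can be multiple sentences with various topics.';
const words = countWords(sample);
console.log({ words, readingTime: estimateReadingTime(words) });
```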

summarizeArabic(text, sentenceCount)

Directly summarize Arabic text using specialized techniques.

Parameters:

  • text (string): The Arabic text to summarize.
  • sentenceCount (number, optional): Number of sentences in the summary (default: 5).

Returns: A summarized version of the input text.

summarizeWithAI(text, sentenceCount, geminiConfig)

Directly summarize text using Google's Gemini AI models.

Parameters:

  • text (string): The text to summarize.
  • sentenceCount (number, optional): Number of sentences in the summary (default: 5).
  • geminiConfig (object): Configuration for Gemini AI:
    • apiKey (string): Your Gemini API key from Google AI Studio (required).
    • model (string): The Gemini model to use (default: 'gemini-1.5-flash').
    • temperature (number): Controls creativity in the output (default: 0.2).
    • maxOutputTokens (number): Limits response length (default: 800).
    • objective (boolean): Defined in the interface for future implementation (not currently used).

Returns: A Promise that resolves to the AI-generated summary.

Getting a Gemini API Key

To use the Gemini AI features:

  • Visit Google AI Studio
  • Create an account or sign in with your Google account
  • Navigate to the API keys section
  • Create a new API key
  • Copy the key and use it in your application

Available Gemini Models

Gemini offers several models with different capabilities and performance characteristics:

  • gemini-1.5-flash: Fastest model, good for most summarization tasks
  • gemini-1.5-pro: More powerful model for complex tasks
  • gemini-1.0-pro: Previous generation model
  • gemini-1.0-pro-vision: For processing images and text (if needed for future features)

For the latest model names and capabilities, see the Gemini documentation.

Troubleshooting

Arabic Libraries Messages

If you see the message:

Note: No specialized Arabic NLP libraries found. Using basic Arabic processing.

This message is expected and does not affect functionality. The package has built-in basic Arabic processing that works well for most cases.

If you see 404 errors when installing optional Arabic libraries:

GET https://registry.npmjs.org/arabicjs - 404

This is because some of the Arabic NLP packages suggested may no longer be available or maintained. The package is designed to work without these libraries. You can safely ignore these errors, or try installing one of the other suggested libraries.

Gemini API Errors

If you encounter errors with Gemini API integration:

  • Make sure your API key is valid and hasn't expired
  • Check that you're using a correct model name (model names may change over time)
  • Verify your internet connection
  • Note that the package automatically falls back to regular summarization if Gemini is unavailable
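
The fallback behavior can be pictured as a try/catch around the AI call. The sketch below shows the general pattern with injected stand-in functions so that it runs without an API key; summarizeWithFallback and both stubs are hypothetical and do not reflect the package's internals.

```javascript
// Hypothetical wrapper: tries an AI summarizer first, then a local one.
// Both summarizers are passed in, so the sketch does not depend on any package.
async function summarizeWithFallback(text, aiSummarizer, localSummarizer) {
  try {
    return { summary: await aiSummarizer(text), usedAI: true };
  } catch (err) {
    console.warn(`AI summarization failed (${err.message}); using local summarizer.`);
    return { summary: await localSummarizer(text), usedAI: false };
  }
}

// Stand-ins for demonstration: an AI call that always fails,
// and a trivial local summarizer that keeps the first sentence.
const failingAI = async () => { throw new Error('API key not valid'); };
const firstSentence = async (text) => text.split('. ')[0] + '.';

summarizeWithFallback('First point. Second point.', failingAI, firstSentence)
  .then((result) => console.log(result));
```

Returning a usedAI flag alongside the summary lets the caller tell which path produced the result.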

References

This package implements techniques from academic research on Arabic text summarization:

  • Modified PageRank algorithm for Arabic text (Ahmed Soliman, 2019)
  • Advanced morphological analysis for Arabic text
  • Specialized keyword extraction for Arabic content
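
For readers unfamiliar with PageRank-based summarization, the sketch below shows a generic TextRank-style ranker: sentences are graph nodes, edges are weighted by word overlap, and the highest-scoring sentences form the summary. It is not the modified algorithm this package implements, whose details are not given in this README. The Jaccard similarity, the damping factor of 0.85, the iteration count, and all function names are assumptions.

```javascript
// Generic TextRank-style sketch; NOT the package's modified algorithm.
function tokenize(sentence) {
  // Keep runs of letters and digits from any script, lowercased
  return sentence.toLowerCase().match(/[\p{L}\p{N}]+/gu) || [];
}

function similarity(a, b) {
  // Jaccard overlap between the two sentences' word sets
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 || setB.size === 0) return 0;
  let shared = 0;
  for (const word of setA) if (setB.has(word)) shared++;
  return shared / (setA.size + setB.size - shared);
}

function rankSentences(sentences, damping = 0.85, iterations = 30) {
  const n = sentences.length;
  const tokens = sentences.map(tokenize);
  // weights[i][j] = similarity between sentences i and j (0 on the diagonal)
  const weights = tokens.map((a, i) => tokens.map((b, j) => (i === j ? 0 : similarity(a, b))));
  const outSum = weights.map((row) => row.reduce((sum, w) => sum + w, 0));

  let scores = new Array(n).fill(1 / n);
  for (let iter = 0; iter < iterations; iter++) {
    scores = scores.map((_, i) => {
      let incoming = 0;
      for (let j = 0; j < n; j++) {
        if (outSum[j] > 0) incoming += (weights[j][i] / outSum[j]) * scores[j];
      }
      return (1 - damping) / n + damping * incoming;
    });
  }
  return scores;
}

function extractSummary(sentences, count) {
  const scores = rankSentences(sentences);
  return sentences
    .map((sentence, index) => ({ sentence, index, score: scores[index] }))
    .sort((a, b) => b.score - a.score)   // best-scoring first
    .slice(0, count)
    .sort((a, b) => a.index - b.index)   // restore original order
    .map((item) => item.sentence);
}

const demo = ['Cats like milk.', 'Cats like fish and milk.', 'Dogs like fish.', 'The stock market fell today.'];
console.log(extractSummary(demo, 2));
```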

License

MIT

Keywords

summarizer

Package last updated on 05 Apr 2025