mapintel

MapIntel is a system for acquiring intelligence from vast collections of text data by representing each document as a multidimensional vector that captures its own semantics. The system is designed to handle complex Natural Language queries and visual exploration of the corpus.

  • 1.0
  • PyPI

MapIntel

Category        Tools
Development     black, ruff, mypy, docformatter
Package         version, pythonversion, downloads
Documentation   mkdocs
Communication   gitter, discussions

Introduction

MapIntel is a system for acquiring intelligence from vast collections of text data by representing each document as a multidimensional vector that captures its semantics. The system handles complex natural language queries and provides question-answering functionality. It also supports visual exploration of the corpus. MapIntel uses a retriever engine that finds the nearest neighbors of the query embedding to identify the most relevant documents. It also projects the embeddings onto two dimensions while preserving the structure of the multidimensional space, which produces a map where semantically related documents form topical clusters that are captured using topic modeling. This map gives a fast overview of the corpus while allowing a more detailed exploration and an interactive information encountering process. MapIntel can be used to explore many types of corpora.
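
The retrieval step described above can be illustrated with a minimal, brute-force sketch. This is not MapIntel's implementation: the real system relies on a Haystack retriever and OpenSearch, while the function names, the toy 3-dimensional embeddings and the choice of cosine similarity below are assumptions made only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, documents, top_k=2):
    """Return the top_k documents most similar to the query embedding."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_embedding, doc["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy embeddings (hypothetical); real ones come from the retriever model.
corpus = [
    {"content": "Central bank raises interest rates", "embedding": [0.9, 0.1, 0.0]},
    {"content": "Team wins the championship final", "embedding": [0.0, 0.2, 0.9]},
    {"content": "Inflation slows as prices stabilise", "embedding": [0.8, 0.3, 0.1]},
]
query = [1.0, 0.2, 0.0]  # hypothetical embedding of an economics question
results = retrieve(query, corpus, top_k=2)
print([doc["content"] for doc in results])
```

With these toy vectors the two economics documents rank above the sports document, which is the behaviour a nearest-neighbor retriever is meant to provide at scale.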

MapIntel UI screenshot

Installation

For user installation, mapintel is available on PyPI and can be installed via pip:

pip install mapintel

Development installation requires cloning the repository and then using PDM to install the project as well as the main and development dependencies:

git clone https://github.com/NOVA-IMS-Innovation-and-Analytics-Lab/MapIntel.git
cd MapIntel
pdm install

Configuration

MapIntel aims to be a flexible system that can run with any user-provided corpus. To achieve this, it standardizes the data and models, and all services are expected to be deployed on AWS. An example of how to fully set up a MapIntel instance can be found at MapIntel-News. After deploying the required services, create a .env file at the root of the project containing the environment variables described below.

AWS credentials

The following environment variable should be included in the .env file:

  • AWS_PROFILE_NAME

The user associated with this profile should have permissions to interact with the services described below.

Data

An OpenSearch database instance should be deployed on AWS, with the documents stored in an index called document. Each document is expected to have the content, date, embedding, embedding2d and topic fields, with the following types:

  • content: text type that contains the main text of the document.
  • date: long type that represents the ordinal format of a date.
  • embedding: knn_vector type that represents the embedding vector of the document.
  • embedding2d: float type that represents the 2D embedding vector of the document.
  • topic: keyword type that assigns a topic label to each document.
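
As a rough sketch, the field types listed above could be expressed as an OpenSearch index body, together with a helper that builds one document. Several details are assumptions rather than part of MapIntel: the helper names, the embedding dimension (it must match your retriever model), the index.knn setting, and the reading of "ordinal format" as Python's date.toordinal().

```python
from datetime import date


def build_index_body(embedding_dim):
    """Index body for the `document` index, mirroring the listed field types."""
    return {
        "settings": {"index": {"knn": True}},  # assumption: enables k-NN search
        "mappings": {
            "properties": {
                "content": {"type": "text"},
                "date": {"type": "long"},
                "embedding": {"type": "knn_vector", "dimension": embedding_dim},
                "embedding2d": {"type": "float"},
                "topic": {"type": "keyword"},
            }
        },
    }


def make_document(content, published, embedding, embedding2d, topic):
    """Build one document; the date is stored as an ordinal (assumption)."""
    if len(embedding2d) != 2:
        raise ValueError("embedding2d must contain exactly two values")
    return {
        "content": content,
        "date": published.toordinal(),
        "embedding": list(embedding),
        "embedding2d": list(embedding2d),
        "topic": topic,
    }


# Toy example with a hypothetical 4-dimensional embedding.
index_body = build_index_body(embedding_dim=4)
doc = make_document(
    content="Central bank raises interest rates",
    published=date(2024, 1, 1),
    embedding=[0.1, 0.2, 0.3, 0.4],
    embedding2d=[1.5, -0.7],
    topic="economy",
)
print(doc["date"], date.fromordinal(doc["date"]))
```

The ordinal can be converted back with date.fromordinal(), which is a convenient way to check that stored values follow the convention you expect.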

The relevant environment variables are the following:

  • OPENSEARCH_ENDPOINT: The AWS endpoint of the OpenSearch deployed instance.
  • OPENSEARCH_PORT: The port of the instance.
  • OPENSEARCH_USERNAME: The username.
  • OPENSEARCH_PASSWORD: The password.
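
To show how these four variables fit together, the sketch below collects them into connection settings. The function name is hypothetical, and the example assumes that OPENSEARCH_ENDPOINT holds a bare host name and that the instance is served over HTTPS; the resulting dictionary follows the keyword arguments accepted by the opensearch-py client, which is not imported here.

```python
import os


def opensearch_settings(env=None):
    """Collect OpenSearch connection settings from environment variables."""
    env = os.environ if env is None else env
    return {
        "hosts": [
            {
                "host": env["OPENSEARCH_ENDPOINT"],
                "port": int(env["OPENSEARCH_PORT"]),
            }
        ],
        "http_auth": (env["OPENSEARCH_USERNAME"], env["OPENSEARCH_PASSWORD"]),
        "use_ssl": True,  # assumption: the AWS-hosted instance uses HTTPS
    }


# Placeholder values for illustration only; never hard-code real credentials.
example_env = {
    "OPENSEARCH_ENDPOINT": "search-example.us-east-1.es.amazonaws.com",
    "OPENSEARCH_PORT": "443",
    "OPENSEARCH_USERNAME": "admin",
    "OPENSEARCH_PASSWORD": "change-me",
}
settings = opensearch_settings(example_env)
print(settings["hosts"])
```

A missing variable raises KeyError, which surfaces configuration problems before any connection attempt is made.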

Models

MapIntel uses three models trained on the user-provided data. The first is a Haystack retriever model, the second reduces the dimensions of the embeddings to 2D, and the third is a generator model used for question-answering. The corresponding environment variables are the following:

  • HAYSTACK_RETRIEVER_MODEL: The value of the parameter embedding_model of the Haystack class EmbeddingRetriever.
  • SAGEMAKER_DIMENSIONALITY_REDUCTIONER_ENDPOINT: The SageMaker endpoint of the deployed dimensionality reduction model.
  • SAGEMAKER_GENERATOR_MODEL_ENDPOINT: The SageMaker endpoint of the deployed generator.
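
Before starting the application, it can help to check that the .env file defines every variable listed in this section. The following is a minimal sketch using only the standard library; the parser is deliberately simplistic (a dedicated library such as python-dotenv handles more cases), the function names are hypothetical, and all values in the sample are placeholders.

```python
REQUIRED_VARIABLES = (
    "AWS_PROFILE_NAME",
    "OPENSEARCH_ENDPOINT",
    "OPENSEARCH_PORT",
    "OPENSEARCH_USERNAME",
    "OPENSEARCH_PASSWORD",
    "HAYSTACK_RETRIEVER_MODEL",
    "SAGEMAKER_DIMENSIONALITY_REDUCTIONER_ENDPOINT",
    "SAGEMAKER_GENERATOR_MODEL_ENDPOINT",
)


def parse_env(text):
    """Parse simple NAME=value lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip("'\"")
    return values


def missing_variables(values):
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED_VARIABLES if not values.get(name)]


# Sample .env content with placeholder values; two variables are left out.
sample_env = """
# AWS credentials
AWS_PROFILE_NAME=default
# Data
OPENSEARCH_ENDPOINT=search-example.us-east-1.es.amazonaws.com
OPENSEARCH_PORT=443
OPENSEARCH_USERNAME=admin
OPENSEARCH_PASSWORD="change-me"
# Models
HAYSTACK_RETRIEVER_MODEL=sentence-transformers/all-MiniLM-L6-v2
"""
values = parse_env(sample_env)
print(missing_variables(values))
```

In practice the text would come from reading the .env file at the project root, and a non-empty result indicates which services still need to be configured.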

Usage

To run the application, use the following command:

mapintel

The server then starts and listens for connections at http://localhost:8080. Open this URL in a browser to interact with the MapIntel UI.
