MapIntel is a system for acquiring intelligence from vast collections of text data by representing each document as a multidimensional vector that captures its own semantics. The system is designed to handle complex Natural Language queries and visual exploration of the corpus.
MapIntel is a system for acquiring intelligence from vast collections of text data by representing each document as a multidimensional vector that captures its semantics. The system is designed to handle complex natural language queries and provides question-answering functionality. It also allows visual exploration of the corpus. MapIntel uses a retriever engine that finds the nearest neighbors of the query embedding to identify the most relevant documents. It also leverages the embeddings by projecting them onto two dimensions while preserving the multidimensional landscape, resulting in a map where semantically related documents form topical clusters, which are captured using topic modeling. This map aims to give a fast overview of the corpus while allowing more detailed exploration and an interactive information encountering process. MapIntel can be used to explore many types of corpora.
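The retrieval step described above can be illustrated with a toy example. This is a minimal sketch rather than MapIntel's implementation: it assumes cosine similarity as the distance measure, uses hand-written three-dimensional vectors in place of real embeddings, and the function names `cosine_similarity` and `nearest_documents` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


def nearest_documents(query_embedding, doc_embeddings, k=2):
    """Return the ids of the k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), doc_id)
        for doc_id, embedding in doc_embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy embeddings (assumed values, for illustration only).
docs = {
    "sports": [0.9, 0.1, 0.0],
    "politics": [0.0, 1.0, 0.1],
    "football": [0.8, 0.2, 0.1],
}

print(nearest_documents([1.0, 0.0, 0.0], docs, k=2))  # ['sports', 'football']
```

In a real deployment the embeddings would come from the retriever model and the search would be delegated to the document store, so a brute-force scan like this one is only useful for building intuition.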
For user installation, mapintel is currently available on PyPI, and you can install it via pip:
pip install mapintel
Development installation requires cloning the repository and then using PDM to install the project as well as the main and development dependencies:
git clone https://github.com/NOVA-IMS-Innovation-and-Analytics-Lab/MapIntel.git
cd MapIntel
pdm install
MapIntel aims to be a flexible system that can run with any user-provided corpus. To achieve this goal, it standardizes the data and models, while all services are expected to be deployed on AWS. An example of how to fully set up a MapIntel instance can be found at MapIntel-News. After deploying the required services, a .env file should be created at the root of the project with the environment variables described below.
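As a rough illustration of the file described above, the sketch below parses .env-style text and reports which of the variable names listed in this section are missing. It assumes the common `KEY=VALUE` dotenv layout, which the source does not spell out. The helper names `parse_env` and `missing_variables` are hypothetical, and the values shown are placeholders.

```python
# Variable names taken from this section of the README.
REQUIRED_VARIABLES = [
    "AWS_PROFILE_NAME",
    "OPENSEARCH_ENDPOINT",
    "OPENSEARCH_PORT",
    "OPENSEARCH_USERNAME",
    "OPENSEARCH_PASSWORD",
    "HAYSTACK_RETRIEVER_MODEL",
    "SAGEMAKER_DIMENSIONALITY_REDUCTIONER_ENDPOINT",
    "SAGEMAKER_GENERATOR_MODEL_ENDPOINT",
]


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_variables(env_values):
    """Return required names that are absent or empty."""
    return [name for name in REQUIRED_VARIABLES if not env_values.get(name)]


# Placeholder content; a real file would define all eight variables.
example = """
# AWS profile
AWS_PROFILE_NAME=my-profile
OPENSEARCH_PORT=443
"""

print(missing_variables(parse_env(example)))
```

Running a check like this before starting the application makes a missing variable visible early, instead of surfacing later as a connection error.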
The following environment variable should be included in the .env file:
AWS_PROFILE_NAME
This AWS profile should have permissions to interact with the services described below.
An OpenSearch database instance should be deployed in AWS with documents contained in an index called document. Each document is expected to have the content, date, embedding, embedding2d and topic fields with the following types:
- content: text type that contains the main text of the document.
- date: long type that represents the ordinal format of a date.
- embedding: knn_vector type that represents the embedding vector of the document.
- embedding2d: float type that represents the 2D embedding vector of the document.
- topic: keyword type that assigns a topic label to each document.
The relevant environment variables are the following:
- OPENSEARCH_ENDPOINT: The AWS endpoint of the deployed OpenSearch instance.
- OPENSEARCH_PORT: The port of the instance.
- OPENSEARCH_USERNAME: The username.
- OPENSEARCH_PASSWORD: The password.
MapIntel uses three models trained on the user-provided data. The first is a Haystack retriever model, the second is a model that reduces the dimensions of the embeddings to 2D, and the third is a generator model used for question-answering. The corresponding environment variables are the following:
- HAYSTACK_RETRIEVER_MODEL: The value of the parameter embedding_model of the Haystack class EmbeddingRetriever.
- SAGEMAKER_DIMENSIONALITY_REDUCTIONER_ENDPOINT: The SageMaker endpoint of the deployed dimensionality reduction model.
- SAGEMAKER_GENERATOR_MODEL_ENDPOINT: The SageMaker endpoint of the deployed generator.
To run the application, use the following command:
mapintel
The server then starts and listens for connections at http://localhost:8080. You can open this URL in a browser to interact with the MapIntel UI.
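Returning to the document fields expected in the OpenSearch index, the sketch below builds one document as a plain dictionary. It assumes that "ordinal format" means Python's proleptic Gregorian ordinal as returned by `date.toordinal()`, which the source does not confirm. The helper name `build_document` and all values are hypothetical.

```python
from datetime import date


def build_document(content, published, embedding, embedding2d, topic):
    """Assemble a dict with the five fields the index is expected to hold."""
    return {
        "content": content,              # main text of the document
        "date": published.toordinal(),   # assumed ordinal encoding of the date
        "embedding": list(embedding),    # full embedding vector
        "embedding2d": list(embedding2d),  # 2D projection used for the map
        "topic": topic,                  # topic label
    }


doc = build_document(
    content="Example text.",
    published=date(2024, 1, 1),
    embedding=[0.1, 0.2, 0.3],  # toy vector; real embeddings are much longer
    embedding2d=[1.5, -0.5],
    topic="example-topic",
)

# The ordinal can be converted back to a calendar date.
print(date.fromordinal(doc["date"]))
```

Storing the date as an integer keeps it compatible with the long field type while still allowing range filters on the index.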