New Case Study:See how Anthropic automated 95% of dependency reviews with Socket.Learn More
Socket
Sign inDemoInstall
Socket

concept-x-converter

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

concept-x-converter

  • 0.3.7
  • PyPI
  • Socket score

Maintainers
1

ConecptX_Converter

This tool is designed for converting ConceptX activation files into Word2Vec format. The resulting Word2Vec file can be used with gensim library in Python to find semantically similar words.

This tool could be built as a binary file, or it could be run as a cargo project. The following sections describe how to build and run the tool.

Usage

Generate a word2vec file from a ConceptX activation file.

cargo run <input_file_name> <output_file_name>

Replace <input_file_name> with the name of your ConceptX activation file, and <output_file_name> with the name you would like to give to your Word2Vec file.

For example, if your input file is named my_conceptx_file.txt and you want to name your output file my_word2vec_file.txt, you would run the following command:

cargo run text.in.tok.sent_len.activations-layer11.json output.txt

The output file will have the following format:

<num_words> <dim>
word1:<line_num>:<position_num> val11 val12 val13 ... val1dim
word2:<line_num>:<position_num> val21 val22 val23 ... val2dim
...

Where num_words is the number of words in the input file, dim is the dimension of the word vectors, and valij is the jth value of the ith word vector. line_num and position_num are the line number and position number of the word in the original context (This is designed for the purpose of retrieving the original context of the word from the input file).

Then, in your Python code, you can load the word2vec file using the following code:

import gensim

model = gensim.models.KeyedVectors.load_word2vec_format('output.txt', binary=False)

# find the most similar word of the word "He", which is in the 0th line and 0th position of the original context

print(model.most_similar(positive=['He:0:0'], topn=5))

Build Binary

cargo build --release

Run Binary

./target/release/conceptx_converter <input_file_name> <output_file_name>

FAQs


Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc