concept-x-converter 0.3.7 (PyPI)

ConceptX_Converter

This tool converts ConceptX activation files into Word2Vec format. You can load the resulting Word2Vec file with the gensim library in Python to find semantically similar words.

The tool can either be built as a binary or run directly as a Cargo project. The following sections describe both options.

Usage

Generate a word2vec file from a ConceptX activation file.

cargo run <input_file_name> <output_file_name>

Replace <input_file_name> with the name of your ConceptX activation file, and <output_file_name> with the name you would like to give to your Word2Vec file.

For example, if your input file is named text.in.tok.sent_len.activations-layer11.json and you want to name your output file output.txt, run the following command:

cargo run text.in.tok.sent_len.activations-layer11.json output.txt

The output file will have the following format:

<num_words> <dim>
word1:<line_num>:<position_num> val11 val12 val13 ... val1dim
word2:<line_num>:<position_num> val21 val22 val23 ... val2dim
...

Here, num_words is the number of words in the input file (every occurrence of a word is a separate entry), dim is the dimension of the word vectors, and valij is the jth value of the ith word vector. line_num and position_num are the line number and position of the word in the original context, so that each vector can be traced back to the sentence it came from in the input file.
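To make the layout concrete, here is a minimal standard-library sketch that reads a file in the text format above and splits each key back into its word, line number and position. The helpers `parse_key` and `load_vectors` are hypothetical names for illustration, not part of this tool. The sketch assumes whitespace-separated values and uses `rsplit` so that a word which itself contains `:` is still split correctly.

```python
import io
from typing import Dict, List, TextIO, Tuple

# Hypothetical helpers for illustration only; not part of concept-x-converter.


def parse_key(key: str) -> Tuple[str, int, int]:
    """Split 'word:<line_num>:<position_num>' into its three parts."""
    # Split from the right so a word containing ':' stays intact.
    word, line_num, position_num = key.rsplit(":", 2)
    return word, int(line_num), int(position_num)


def load_vectors(f: TextIO) -> Dict[Tuple[str, int, int], List[float]]:
    """Read a text-format Word2Vec file into {(word, line, pos): vector}."""
    num_words, dim = map(int, f.readline().split())
    vectors = {}
    for raw in f:
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        key, values = parts[0], [float(v) for v in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {key}, got {len(values)}")
        vectors[parse_key(key)] = values
    if len(vectors) != num_words:
        raise ValueError(f"header says {num_words} words, found {len(vectors)}")
    return vectors


sample = io.StringIO("2 3\nHe:0:0 0.1 0.2 0.3\nran:0:1 0.4 0.5 0.6\n")
vectors = load_vectors(sample)
print(vectors[("He", 0, 0)])  # [0.1, 0.2, 0.3]
```

For real files, pass an open file handle instead of the in-memory sample, e.g. `load_vectors(open("output.txt"))`.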

You can then load the Word2Vec file in Python with gensim:

import gensim

model = gensim.models.KeyedVectors.load_word2vec_format('output.txt', binary=False)

# find the 5 words most similar to "He", the word at line 0, position 0 of the original context

print(model.most_similar(positive=['He:0:0'], topn=5))
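Because every occurrence of a word gets its own key, a query such as `'He:0:0'` refers to one specific occurrence. The sketch below shows one possible way to list all occurrences of a word and to map a key back to its sentence. It assumes, hypothetically, that the original context is available as a list of whitespace-tokenized sentences in the same order as the activation file, with 0-based line and position numbers as in the `'He:0:0'` example; `occurrences_of` and `context_of` are illustrative names, not part of the tool or of gensim.

```python
from typing import Iterable, List, Tuple

# Hypothetical helpers; assumes 0-based line and position numbers,
# matching the 'He:0:0' example above.


def parse_key(key: str) -> Tuple[str, int, int]:
    """Split 'word:<line_num>:<position_num>' into its three parts."""
    word, line_num, position_num = key.rsplit(":", 2)
    return word, int(line_num), int(position_num)


def occurrences_of(word: str, keys: Iterable[str]) -> List[str]:
    """Return every key whose word part equals `word` (case-sensitive)."""
    return [k for k in keys if parse_key(k)[0] == word]


def context_of(key: str, sentences: List[str]) -> str:
    """Return the sentence for `key`, with the target word in brackets."""
    _, line_num, position_num = parse_key(key)
    # Assumption: positions index whitespace-separated tokens.
    tokens = sentences[line_num].split()
    tokens[position_num] = f"[{tokens[position_num]}]"
    return " ".join(tokens)


sentences = ["He went home", "Then he slept"]
keys = ["He:0:0", "went:0:1", "home:0:2", "Then:1:0", "he:1:1", "slept:1:2"]
print(occurrences_of("He", keys))       # ['He:0:0']
print(context_of("he:1:1", sentences))  # Then [he] slept
```

With gensim 4.x, `keys` could be `model.index_to_key`, and the keys returned by `model.most_similar(...)` (the first element of each `(key, score)` pair) can be passed to `context_of` to inspect where each similar word occurred.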

Build Binary

cargo build --release

Run Binary

./target/release/conceptx_converter <input_file_name> <output_file_name>
