nagisa-bert

A BERT model for nagisa

0.0.4
PyPI

Maintainers: 1

nagisa_bert

This library provides a tokenizer to use a Japanese BERT model for nagisa. The model is available in Transformers 🤗.

You can try fill-mask using nagisa_bert at Hugging Face Space.

Install

Python 3.7+ on Linux or macOS is required. Install nagisa_bert with pip.

$ pip install nagisa_bert

Usage

The model can be used with the pipeline function in Transformers.

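from transformers import pipeline
from nagisa_bert import NagisaBertTokenizer

text = "nagisaで[MASK]できるモデルです"
# Load the nagisa-based tokenizer that matches the pre-trained model.
tokenizer = NagisaBertTokenizer.from_pretrained("taishi-i/nagisa_bert")
# Build a fill-mask pipeline that uses the model together with this tokenizer.
fill_mask = pipeline("fill-mask", model="taishi-i/nagisa_bert", tokenizer=tokenizer)
# Print the candidate fillers for [MASK], ordered by score.
print(fill_mask(text))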

[{'score': 0.1385931372642517,
  'sequence': 'nagisa で 使用 できる モデル です',
  'token': 8092,
  'token_str': '使 用'},
 {'score': 0.11947669088840485,
  'sequence': 'nagisa で 利用 できる モデル です',
  'token': 8252,
  'token_str': '利 用'},
 {'score': 0.04910655692219734,
  'sequence': 'nagisa で 作成 できる モデル です',
  'token': 9559,
  'token_str': '作 成'},
 {'score': 0.03792576864361763,
  'sequence': 'nagisa で 購入 できる モデル です',
  'token': 9430,
  'token_str': '購 入'},
 {'score': 0.026893319562077522,
  'sequence': 'nagisa で 入手 できる モデル です',
  'token': 11273,
  'token_str': '入 手'}]
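
The pipeline returns a plain list of dictionaries, so ordinary Python is enough to post-process it. The sketch below copies the output shown above into a literal (rather than calling the model) and picks the highest-scoring candidate, then keeps only candidates above a threshold. The threshold value of 0.1 and the variable names are illustrative assumptions, not part of nagisa_bert.

```python
# Output copied from the fill-mask example above (no model call needed here).
predictions = [
    {"score": 0.1385931372642517, "sequence": "nagisa で 使用 できる モデル です",
     "token": 8092, "token_str": "使 用"},
    {"score": 0.11947669088840485, "sequence": "nagisa で 利用 できる モデル です",
     "token": 8252, "token_str": "利 用"},
    {"score": 0.04910655692219734, "sequence": "nagisa で 作成 できる モデル です",
     "token": 9559, "token_str": "作 成"},
    {"score": 0.03792576864361763, "sequence": "nagisa で 購入 できる モデル です",
     "token": 9430, "token_str": "購 入"},
    {"score": 0.026893319562077522, "sequence": "nagisa で 入手 できる モデル です",
     "token": 11273, "token_str": "入 手"},
]

# Pick the candidate with the highest score.
best = max(predictions, key=lambda p: p["score"])
print(best["sequence"])

# Keep only candidates at or above a hypothetical confidence threshold.
MIN_SCORE = 0.1  # assumption: chosen only for illustration
confident = [p for p in predictions if p["score"] >= MIN_SCORE]
print([p["token"] for p in confident])
```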

Tokenization and vectorization.

from transformers import BertModel
from nagisa_bert import NagisaBertTokenizer

text = "nagisaで[MASK]できるモデルです"
tokenizer = NagisaBertTokenizer.from_pretrained("taishi-i/nagisa_bert")
# Split the text into subword tokens.
tokens = tokenizer.tokenize(text)
print(tokens)
# ['na', '##g', '##is', '##a', 'で', '[MASK]', 'できる', 'モデル', 'です']

model = BertModel.from_pretrained("taishi-i/nagisa_bert")
# Encode the text as PyTorch tensors and take the final-layer hidden states.
h = model(**tokenizer(text, return_tensors="pt")).last_hidden_state
print(h)

tensor([[[-0.2912, -0.6818, -0.4097,  ...,  0.0262, -0.3845,  0.5816],
         [ 0.2504,  0.2143,  0.5809,  ..., -0.5428,  1.1805,  1.8701],
         [ 0.1890, -0.5816, -0.5469,  ..., -1.2081, -0.2341,  1.0215],
         ...,
         [-0.4360, -0.2546, -0.2824,  ...,  0.7420, -0.2904,  0.3070],
         [-0.6598, -0.7607,  0.0034,  ...,  0.2982,  0.5126,  1.1403],
         [-0.2505, -0.6574, -0.0523,  ...,  0.9082,  0.5851,  1.2625]]],
       grad_fn=<NativeLayerNormBackward0>)
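
In the token list printed above, pieces that begin with `##` continue the previous token, which is how "nagisa" ends up split into four pieces. As a minimal sketch, the helper below joins such pieces back into whole words. The name `merge_wordpieces` is hypothetical and not part of nagisa_bert; it assumes only the `##` continuation convention visible in the output.

```python
def merge_wordpieces(tokens):
    """Join '##'-prefixed continuation pieces onto the preceding token."""
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            # Continuation piece: strip the marker and append to the last word.
            words[-1] += token[2:]
        else:
            # Start of a new word (or a special token such as [MASK]).
            words.append(token)
    return words


# Token list copied from the tokenization example above.
tokens = ['na', '##g', '##is', '##a', 'で', '[MASK]', 'できる', 'モデル', 'です']
words = merge_wordpieces(tokens)
print(words)
# ['nagisa', 'で', '[MASK]', 'できる', 'モデル', 'です']
```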

Tutorial

The following notebooks cover Japanese NLP with pre-trained models and Transformers.

Notebook	Description
Fill-mask	How to use the pipeline function in transformers to fill in Japanese text.
Feature-extraction	How to use the pipeline function in transformers to extract features from Japanese text.
Embedding visualization	How to visualize embeddings from Japanese pre-trained models.
How to fine-tune a model on text classification	How to fine-tune a pre-trained model on a Japanese text classification task.
How to fine-tune a model on text classification with csv files	How to preprocess the data and fine-tune a pre-trained model on a Japanese text classification task.

Model description

Architecture

The model uses the same architecture as BERT bert-base-uncased: 12 layers, a hidden size of 768, and 12 attention heads.
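
To make these numbers concrete, the sketch below records the three stated values and derives the width of each attention head. The `BertBaseShape` class is a hypothetical container written for this illustration; its field names mirror the corresponding attribute names in transformers.BertConfig, but it does not load or inspect the actual model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertBaseShape:
    """Illustrative container for the architecture values stated above."""
    num_hidden_layers: int = 12
    hidden_size: int = 768
    num_attention_heads: int = 12

    @property
    def head_dim(self) -> int:
        # The hidden state is split evenly across the attention heads.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        return self.hidden_size // self.num_attention_heads


shape = BertBaseShape()
print(shape.head_dim)  # 64
```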

Training Data

The model was trained on the Japanese version of Wikipedia. The training corpus was generated from the Wikipedia CirrusSearch dump file as of August 8, 2022, using make_corpus_wiki.py and create_pretraining_data.py.

Training

The model was trained with the default parameters of transformers.BertConfig. Due to GPU memory limitations, the batch size is small: 16 instances per batch, for 2M training steps.
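
As a quick check on the scale of this schedule, the arithmetic below multiplies the batch size by the number of steps. It assumes one batch per training step and no gradient accumulation, which the description above does not state either way.

```python
batch_size = 16              # instances per batch, as stated above
training_steps = 2_000_000   # 2M training steps, as stated above

# Assumption: each step processes exactly one batch.
instances_seen = batch_size * training_steps
print(f"{instances_seen:,}")  # 32,000,000
```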
