Extract the most important keywords from your content using Natural's tf-idf. From their docs:
Term Frequency–Inverse Document Frequency (tf-idf) is implemented to determine how important a word (or words) is to a document relative to a corpus. The following formulas are used for calculating tf and idf:
- tf(t, d) is a so-called raw count, so just the count of the term in the document
- idf(t, D) uses the following formula: 1 + ln(N / (1 + n_t)) where N is the number of documents, and n_t the number of documents in which the term appears. The 1 + in the denominator is for handling the possibility that n_t is 0.
In our context, N is just 1, your page/post content.
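To make the formulas concrete, here is a minimal sketch in plain JavaScript. It is not Natural's own code: the `tf`, `idf`, `tfidf`, and `tokenize` helpers are hypothetical names written only to mirror the quoted definitions, and the tokenizer is deliberately naive.

```javascript
// Sketch of the quoted tf and idf formulas (illustrative, not Natural's implementation).

// tf(t, d): raw count of `term` in a tokenized document.
function tf(term, tokens) {
  return tokens.filter((token) => token === term).length;
}

// idf(t, D): 1 + ln(N / (1 + n_t)), where N is the number of documents
// and n_t is the number of documents that contain the term.
function idf(term, documents) {
  const N = documents.length;
  const nT = documents.filter((tokens) => tokens.includes(term)).length;
  return 1 + Math.log(N / (1 + nT));
}

function tfidf(term, tokens, documents) {
  return tf(term, tokens) * idf(term, documents);
}

// Hypothetical, naive tokenizer used only for this example.
const tokenize = (text) => text.toLowerCase().match(/[a-z]+/g) || [];

const page = tokenize("Gatsby plugins extend Gatsby. Plugins are small.");
const corpus = [page]; // N = 1: the page is the whole corpus

// Score each distinct term and sort from highest to lowest score.
const scores = [...new Set(page)]
  .map((term) => ({ term, score: tfidf(term, page, corpus) }))
  .sort((a, b) => b.score - a.score);

console.log(scores);
```

Under these formulas, when N is 1 every term that appears on the page has n_t = 1, so idf is the same constant for all of them and the ranking is driven by the raw counts.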
Supports both MD and MDX formats.
Installation
npm install --save gatsby-remark-extract-keywords
or
yarn add gatsby-remark-extract-keywords
It has gatsby as a peer dependency.
Usage
In your gatsby-config.js:
module.exports = {
  plugins: [
    {
      resolve: `gatsby-transformer-remark`,
      options: {
        plugins: [`gatsby-remark-extract-keywords`],
      },
    },
  ],
};
This creates a new field called keywords on each MD/MDX node, which you can use in your GraphQL queries:
query ListingQuery {
  allMarkdownRemark(sort: {fields: [frontmatter___date], order: DESC}) {
    edges {
      node {
        id
        frontmatter {
          title
        }
        fields {
          keywords
        }
      }
    }
  }
}
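Once the query has run, the result can be flattened into a simple list for rendering. The sketch below assumes that `fields.keywords` comes back as an array of strings; `toListing` is a hypothetical helper and the sample data is made up to match the shape of the query above.

```javascript
// Hypothetical helper: flattens the ListingQuery result into plain objects.
// Assumes `fields.keywords` is an array of strings; adjust if your schema differs.
function toListing(data) {
  return data.allMarkdownRemark.edges.map(({ node }) => ({
    id: node.id,
    title: node.frontmatter.title,
    // Fall back to an empty list when a node has no keywords field.
    keywords: (node.fields && node.fields.keywords) || [],
  }));
}

// Sample data shaped like the query result (values are illustrative only).
const sample = {
  allMarkdownRemark: {
    edges: [
      {
        node: {
          id: "a1",
          frontmatter: { title: "First post" },
          fields: { keywords: ["gatsby", "remark"] },
        },
      },
      {
        node: {
          id: "b2",
          frontmatter: { title: "Second post" },
          fields: null,
        },
      },
    ],
  },
};

const listing = toListing(sample);
console.log(listing);
```

In a page component, the same helper could be called with the `data` prop that Gatsby passes to pages with a page query.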
blacklist option as function
This returns only keywords longer than 5 characters.
// Keep only terms longer than 5 characters
const filterKeywords = term => term.length > 5;

module.exports = {
  plugins: [
    {
      resolve: `gatsby-transformer-remark`,
      options: {
        plugins: [
          {
            resolve: `gatsby-remark-extract-keywords`,
            options: {
              blacklist: filterKeywords,
            },
          },
        ],
      },
    },
  ],
};
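A function passed as blacklist can combine several rules. The sketch below assumes the plugin calls the function once per term and keeps the term when it returns true, which is how the example above behaves; `stopTerms` and `minLength` are hypothetical names, and whether terms arrive lower-cased is not documented here, so the predicate normalizes case itself.

```javascript
// Hypothetical rules for illustration: a minimum length plus a small stop list.
const stopTerms = new Set(["gatsby", "plugin"]);
const minLength = 4;

// Returns true for terms that should be kept as keywords.
const filterKeywords = (term) =>
  term.length >= minLength && !stopTerms.has(term.toLowerCase());

// Simulate the filtering step on a few candidate terms.
const candidates = ["Gatsby", "remark", "api", "keywords", "plugin"];
const kept = candidates.filter(filterKeywords);

console.log(kept); // keeps "remark" and "keywords"
```

The predicate can then be passed as the blacklist option exactly like `filterKeywords` in the configuration above.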
Options
Option | Description |
---|---|
max | Maximum number of keywords to return |
blacklist | String, array of strings, or function used to blacklist terms. If a function, it is used as the filter callback. |
Contributors ✨
This project follows the all-contributors specification. Contributions of any kind welcome!