websitecategorization
Website / Domain Categorization API is a Node.js module that uses a machine learning model to classify arbitrary blocks of input text or URLs into content categories.
Content categories are based on two taxonomies:
npm i @websitecategorization/websitecategorization
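// Requires the `request` package (npm i request).
const request = require('request');

const options = {
  method: 'POST',
  url: 'https://www.websitecategorizationapi.com/api/gpt/gpt_category1.php',
  headers: {
    'Content-Type': 'application/x-www-form-urlencoded'
  },
  // Sent as a URL-encoded form body.
  form: {
    query: 'earphone buds',
    // Placeholder; the curl examples in this README send an api_key field.
    api_key: 'YOUR_API_KEY'
  }
};

request(options, function (error, response) {
  // Rethrow the original error instead of wrapping it in a new one.
  if (error) throw error;
  console.log(response.body);
});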
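Because the `request` package is deprecated, the same call can also be sketched with the `fetch` and `URLSearchParams` globals built into Node.js 18 and later. This is a minimal sketch, not this package's own API: the helper names `buildRequest` and `categorize` are hypothetical, and the `api_key` form field is assumed from the curl examples in this README.

```javascript
// Endpoint taken from the example above.
const ENDPOINT =
  'https://www.websitecategorizationapi.com/api/gpt/gpt_category1.php';

// Build the fetch() arguments for one categorization query.
// `apiKey` is optional here; the curl examples send it as `api_key`.
function buildRequest(query, apiKey) {
  const form = new URLSearchParams({ query });
  if (apiKey) form.set('api_key', apiKey);
  return {
    url: ENDPOINT,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: form.toString()
    }
  };
}

// Send the query and return the raw response body as text.
// Parsing is left to the caller because the response format of this
// endpoint is not shown in this README.
async function categorize(query, apiKey) {
  const { url, init } = buildRequest(query, apiKey);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Request failed with HTTP status ${response.status}`);
  }
  return response.text();
}
```

A call would then look like `categorize('earphone buds', 'YOUR_API_KEY').then(console.log).catch(console.error);`, where `YOUR_API_KEY` is a placeholder.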
The Web Categorization API is used by a wide variety of companies for many different use cases.
It is suitable for Ad Exchanges, Demand Side Platforms (DSPs), Supply Side Platforms (SSPs) and Ad Networks. An SSP can, for example, use it to identify an advertiser’s category and check its eligibility for real-time bidding.
Other use cases include web content filtering, where a company can use it to filter out non-work-related websites such as social media networks and shopping platforms.
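The content-filtering use case could be sketched as follows. The sketch assumes a response shaped like the IAB example output shown in this README (a `classification` array of `category`/`value` pairs). The function name `shouldBlock`, the blocked-category list, and the 0.5 threshold are hypothetical choices made for illustration.

```javascript
// Decide whether a page should be blocked, given a classification result.
// Blocks when any blocked category scores at or above `threshold`.
function shouldBlock(result, blockedCategories, threshold = 0.5) {
  const blocked = new Set(blockedCategories);
  return result.classification.some(
    (entry) => blocked.has(entry.category) && entry.value >= threshold
  );
}

// Example input, abbreviated from the sample output in this README.
const filterSample = {
  classification: [
    { category: 'Style & Fashion', value: 0.6335134346543948 },
    { category: 'Religion & Spirituality', value: 0.31965677636420087 },
    { category: 'Shopping', value: 0.0014989265842864407 }
  ],
  language: 'en'
};

console.log(shouldBlock(filterSample, ['Shopping', 'Style & Fashion'])); // true
console.log(shouldBlock(filterSample, ['Shopping']));                    // false
```

The same comparison, with an allow-list instead of a block-list, would cover the advertiser-eligibility check described for SSPs.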
The Website / Domain Categorization API is based on a machine learning model that has been extensively tested and used in both small- and large-scale classification projects, including one with more than 30 million texts.
It is continuously developed, and its training data set is regularly updated to include new verticals that emerge each year.
Text classification is usually automated because it is often applied where the number of texts to classify runs into the millions.
For this reason, machine learning models are most often used for text classification.
In the early period of machine learning, the models most commonly used for text classification ranged from simpler ones, like Naive Bayes, to more complex ones, like Random Forests, Support Vector Machines and Logistic Regression.
Support Vector Machines perform especially well in terms of accuracy and F1 score, but they have a downside: the cost of training an SVM model grows rapidly with the number of texts in the training dataset.
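To make the simplest of these models concrete, here is a minimal multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It is an illustrative sketch only, not the model behind this API. The class name, the naive ASCII tokenizer, and the toy training texts and labels are all assumptions.

```javascript
// Lowercase and split on non-letter characters (a deliberately naive tokenizer).
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z]+/).filter(Boolean);
}

class NaiveBayes {
  constructor() {
    this.docCounts = new Map();  // label -> number of training texts
    this.wordCounts = new Map(); // label -> Map(word -> count)
    this.totalWords = new Map(); // label -> total word count
    this.vocabulary = new Set();
    this.totalDocs = 0;
  }

  train(text, label) {
    if (!this.wordCounts.has(label)) {
      this.docCounts.set(label, 0);
      this.wordCounts.set(label, new Map());
      this.totalWords.set(label, 0);
    }
    this.docCounts.set(label, this.docCounts.get(label) + 1);
    this.totalDocs += 1;
    const counts = this.wordCounts.get(label);
    for (const word of tokenize(text)) {
      counts.set(word, (counts.get(word) || 0) + 1);
      this.totalWords.set(label, this.totalWords.get(label) + 1);
      this.vocabulary.add(word);
    }
  }

  // Return the label with the highest log posterior.
  predict(text) {
    const words = tokenize(text);
    const vocabSize = this.vocabulary.size;
    let best = null;
    let bestScore = -Infinity;
    for (const [label, docs] of this.docCounts) {
      let score = Math.log(docs / this.totalDocs); // log prior
      const counts = this.wordCounts.get(label);
      const total = this.totalWords.get(label);
      for (const word of words) {
        // Add-one smoothing keeps unseen words from zeroing the score.
        score += Math.log(((counts.get(word) || 0) + 1) / (total + vocabSize));
      }
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }
}

const model = new NaiveBayes();
model.train('wireless earphone buds with charging case', 'Electronics');
model.train('polaroid land camera with film', 'Electronics');
model.train('low interest credit card offer', 'Finance');
model.train('credit score and personal loan rates', 'Finance');

console.log(model.predict('credit card rates')); // Finance
```

Training is a single pass of counting, which is why this family of models scales easily to millions of texts.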
In the last decade, with the rise of neural networks, more text classification models have come to rely on deep neural networks. Earlier deep networks for text classification were often based on LSTMs. More recently, other neural network architectures have been used successfully for text classification.
The authors of this highly cited paper, https://arxiv.org/pdf/1803.01271.pdf, evaluated convolutional networks for sequence modeling and concluded that even a simple convolutional architecture outperforms canonical recurrent networks, such as the LSTMs mentioned above, across a diverse range of tasks.
The model in question (TCN) is available at https://github.com/locuslab/TCN, with a Keras implementation at https://github.com/philipperemy/keras-tcn.
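The building block of the TCN architecture in that paper is the causal dilated 1D convolution: each output depends only on the current and earlier inputs, and the dilation controls how far back the filter taps reach. The plain JavaScript sketch below shows that single operation on one channel. It is not taken from either repository, and the function name and toy values are assumptions.

```javascript
// Causal dilated 1D convolution on a single channel.
// output[t] = sum over i of kernel[i] * input[t - (k - 1 - i) * dilation],
// where k = kernel.length and positions before the start count as zero.
function causalDilatedConv1d(input, kernel, dilation = 1) {
  const k = kernel.length;
  return input.map((_, t) => {
    let sum = 0;
    for (let i = 0; i < k; i++) {
      const source = t - (k - 1 - i) * dilation;
      if (source >= 0) sum += kernel[i] * input[source];
    }
    return sum;
  });
}

const signal = [1, 2, 3, 4];
console.log(causalDilatedConv1d(signal, [1, 1], 1)); // [ 1, 3, 5, 7 ]
console.log(causalDilatedConv1d(signal, [1, 1], 2)); // [ 1, 2, 4, 6 ]
```

Stacking such layers with dilations 1, 2, 4 and so on grows the receptive field exponentially with depth, which is how the architecture covers long histories without recurrence.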
The website categorization service can also be used through a dashboard UI, as seen here:
Example output from the IAB1 Website Categorization API for an example domain:
{
  "classification": [
    { "category": "Style & Fashion", "value": 0.6335134346543948 },
    { "category": "Religion & Spirituality", "value": 0.31965677636420087 },
    { "category": "Events and Attractions", "value": 0.028203161466589827 },
    { "category": "Pop Culture", "value": 0.008486557302356994 },
    { "category": "Books and Literature", "value": 0.0028975322143729425 },
    { "category": "Shopping", "value": 0.0014989265842864407 },
    { "category": "Fine Art", "value": 0.0014698938766846063 },
    { "category": "Family and Relationships", "value": 0.0008695569530150543 },
    { "category": "Hobbies & Interests", "value": 0.0007021051093678122 },
    { "category": "Travel", "value": 0.00045551400716377827 },
    { "category": "Movies", "value": 0.0003105774008160576 },
    { "category": "Television", "value": 0.0002812439624312471 },
    { "category": "Healthy Living", "value": 0.00027001968240167887 },
    { "category": "Careers", "value": 0.0002666186301324818 },
    { "category": "Food & Drink", "value": 0.0002460227720972317 },
    { "category": "Home & Garden", "value": 0.00021331353597162862 },
    { "category": "Medical Health", "value": 0.00018344636503169902 },
    { "category": "Music and Audio", "value": 0.00007348860474246987 },
    { "category": "Video Gaming", "value": 0.00006822010822593386 },
    { "category": "Real Estate", "value": 0.00006517844821148466 },
    { "category": "Pets", "value": 0.00006069812911973799 },
    { "category": "Education", "value": 0.00004860296854985923 },
    { "category": "News and Politics", "value": 0.000035123587801619264 },
    { "category": "Sports", "value": 0.00003402965849228489 },
    { "category": "Science", "value": 0.000026461875107857055 },
    { "category": "Automotive", "value": 0.000024825949895016523 },
    { "category": "Personal Finance", "value": 0.00001581204114251354 },
    { "category": "Technology & Computing", "value": 0.000015037047929356491 },
    { "category": "Business and Finance", "value": 0.000007820699466562138 }
  ],
  "language": "en"
}
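A response like the one above is typically reduced to its few highest-scoring categories. A minimal sketch, assuming the response has the shape shown: the helper name `topCategories` and the abbreviated sample are hypothetical, and formatting the scores as percentages assumes they are probabilities, which the sample suggests but this README does not state.

```javascript
// Return the `n` highest-scoring categories from a classification result.
// Sorts a copy, so it does not rely on the API returning sorted entries.
function topCategories(result, n = 3) {
  return [...result.classification]
    .sort((a, b) => b.value - a.value)
    .slice(0, n)
    .map((entry) => `${entry.category} (${(entry.value * 100).toFixed(1)}%)`);
}

// Abbreviated and deliberately shuffled copy of the sample output.
const outputSample = {
  classification: [
    { category: 'Shopping', value: 0.0014989265842864407 },
    { category: 'Religion & Spirituality', value: 0.31965677636420087 },
    { category: 'Style & Fashion', value: 0.6335134346543948 }
  ],
  language: 'en'
};

console.log(topCategories(outputSample, 2));
// [ 'Style & Fashion (63.4%)', 'Religion & Spirituality (32.0%)' ]
```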
Supported API calls (in curl) that can be adapted to JavaScript:
curl --location --request POST 'https://www.websitecategorizationapi.com/api/gpt/gpt_category1.php' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'query=polaroid land camera' \
  --data-urlencode 'api_key=b4dcde2ce5fb2d0b887b5e'

curl --location --request POST 'https://www.websitecategorizationapi.com/api/gpt/gpt_category2.php' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'query=polaroid land camera' \
  --data-urlencode 'api_key=b4dcde2ce5fb2d0b887b5e'

curl --location --request POST 'https://www.websitecategorizationapi.com/api/gpt/gpt_category3.php' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'query=polaroid land camera' \
  --data-urlencode 'api_key=b4dcde2ce5fb2d0b887b5e'

curl --location --request POST 'https://www.websitecategorizationapi.com/api/iab/gpt_category1.php' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'query=credit card' \
  --data-urlencode 'api_key=b4dcde2ce5fb2d0b887b5e'

curl --location --request POST 'https://www.websitecategorizationapi.com/api/iab/gpt_category2.php' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'query=credit card' \
  --data-urlencode 'api_key=b4dcde2ce5fb2d0b887b5e'
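One way to adapt these calls to JavaScript is a small helper that maps an API path and category level to the endpoint URL and builds the form body. This is a sketch: `endpointUrl`, `formBody`, and the level table are hypothetical names that simply mirror the five URLs listed above, and `YOUR_API_KEY` is a placeholder.

```javascript
const BASE = 'https://www.websitecategorizationapi.com/api';

// Highest category file number that appears above for each API path.
const MAX_LEVEL = new Map([
  ['gpt', 3],
  ['iab', 2]
]);

// Build the endpoint URL, rejecting combinations not listed above.
function endpointUrl(apiPath, level) {
  const max = MAX_LEVEL.get(apiPath);
  if (max === undefined || !Number.isInteger(level) || level < 1 || level > max) {
    throw new RangeError(`Unsupported endpoint: ${apiPath} level ${level}`);
  }
  // Both paths use the gpt_category<N>.php file name in the curl examples.
  return `${BASE}/${apiPath}/gpt_category${level}.php`;
}

// Build the URL-encoded body equivalent to the two --data-urlencode flags.
function formBody(query, apiKey) {
  return new URLSearchParams({ query, api_key: apiKey }).toString();
}

console.log(endpointUrl('iab', 2));
// https://www.websitecategorizationapi.com/api/iab/gpt_category2.php
console.log(formBody('credit card', 'YOUR_API_KEY'));
// query=credit+card&api_key=YOUR_API_KEY
```

The resulting URL and body can be passed to any HTTP client as a POST with the `Content-Type: application/x-www-form-urlencoded` header.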
The service supports categorization of texts written in German, French, Italian, Spanish, Portuguese and many other languages.
IAB taxonomy: https://iabtechlab.com/press-releases/tech-lab-releases-content-taxonomy-3-0/
Facebook Taxonomy: https://www.facebook.com/business/help/526764014610932?id=725943027795860
Survey of text classification models: https://github.com/kk7nc/Text_Classification
Introduction to product classification machine learning models: