Links Classifier
Use Case
We want to filter the links on a given webpage and classify them into document types such as Privacy Policy and Terms of Service.
Approach
The module exposes two functions: one that filters the links, removing external, invalid and duplicate ones, and another that classifies the remaining links into document types.
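The filtering step can be illustrated with a small sketch. This is not the links-classifier implementation. The function name sketchFilterLinks, the choice to work on plain href strings instead of DOM elements, and the specific rules (non-http(s) schemes dropped, fragments ignored when detecting duplicates) are assumptions made for illustration only.

```javascript
// Hypothetical sketch of link filtering; NOT the links-classifier implementation.
// Works on plain href strings so it can run outside a browser.
function sketchFilterLinks(hrefs, baseUrl) {
  const base = new URL(baseUrl);
  const seen = new Set();
  const result = [];
  for (const href of hrefs) {
    let url;
    try {
      // Resolve relative links against the page URL; throws on invalid input.
      url = new URL(href, base);
    } catch {
      continue; // invalid link
    }
    // Assumption: only http(s) links are useful (drops mailto:, javascript:, ...).
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
    // Drop external links, i.e. links to a different origin than the page.
    if (url.origin !== base.origin) continue;
    // Assumption: "/privacy#cookies" and "/privacy" count as duplicates.
    url.hash = '';
    if (seen.has(url.href)) continue;
    seen.add(url.href);
    result.push(url.href);
  }
  return result;
}
```

For example, sketchFilterLinks(['/privacy', '/privacy#cookies', 'https://other.example/terms', 'mailto:a@b.c'], 'https://example.com/') keeps only 'https://example.com/privacy': the fragment variant is a duplicate, the third link is external and the mailto: link is dropped.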
Usage
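// Runs in a browser context, since it reads document and window.
const { filterLinks, classifyLinks, keywords } = require('links-classifier');

// Collect every anchor element on the current page.
const links = document.querySelectorAll('a');

// Remove external, invalid and duplicate links.
const filteredLinks = filterLinks(
  links,
  window.location,
  ['en', 'fr', 'it'],
  false,
  console.log
);

// Classify the remaining links into document types using the bundled keywords.
const classifiedLinks = classifyLinks(filteredLinks, keywords, 'fr');
console.log(classifiedLinks);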
Data
This module ships with its own dataset, located in data/keywords.js, which contains keyword variations for each document type. It is exported from the index as keywords, but you are free to use your own dataset.
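To illustrate how a custom keyword dataset can drive classification, here is a minimal sketch. The dataset shape (document type, then language code, then a list of keywords), the link object shape ({ text, href }) and the helper sketchClassify are all hypothetical; they are not necessarily the format of data/keywords.js or the behaviour of classifyLinks.

```javascript
// Hypothetical custom dataset; this shape is an assumption for illustration
// and may not match the format of data/keywords.js.
const myKeywords = {
  privacy_policy: { en: ['privacy policy', 'privacy'], fr: ['politique de confidentialité'] },
  terms_of_service: { en: ['terms of service', 'terms of use'], fr: ["conditions d'utilisation"] },
};

// Hypothetical keyword matcher: returns the first document type whose keywords
// for the given language appear in the link text or URL, or null if none match.
function sketchClassify(link, dataset, lang) {
  const haystack = `${link.text} ${link.href}`.toLowerCase();
  for (const [type, byLang] of Object.entries(dataset)) {
    const words = byLang[lang] || [];
    if (words.some((w) => haystack.includes(w.toLowerCase()))) return type;
  }
  return null;
}
```

For example, sketchClassify({ text: 'Terms of Use', href: '/terms' }, myKeywords, 'en') returns 'terms_of_service'. To use your own dataset with the library itself, pass it in place of keywords (for instance classifyLinks(filteredLinks, myKeywords, 'fr')), provided it follows the same format as data/keywords.js.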