@ndpnt/open-terms-archive
Tracks and makes visible changes to the Terms of Service of online service
Services have terms that can change over time. Open Terms Archive enables user rights advocates, regulatory bodies and any interested citizen to follow the changes to these terms by being notified whenever a new version is published, and by exploring their entire history.
Note: Words in bold are business domain names.
Services are declared within Open Terms Archive with a declaration file listing all the documents that, together, constitute the terms under which this service can be used. These documents all have a type, such as “terms and conditions”, “privacy policy”, “developer agreement”…
In order to track their changes, documents are periodically obtained by fetching a web location and selecting content within the web page to remove the noise (ads, navigation menu, login fields…). Beyond selecting a subset of a page, some documents have additional noise (hashes in links, CSRF tokens…) that would be false positives for changes. Open Terms Archive thus supports specific filters for each document.
However, the shape of that noise can change over time. In order to recover in case of information loss during the noise filtering step, a snapshot is recorded every time there is a change. After the noise is filtered out from the snapshot, if there are changes in the resulting document, a new version of the document is recorded.
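The snapshot and version logic described above can be sketched as follows. This is a minimal in-memory illustration, not the project's actual recorder: `createRecorder`, `record` and `stripCsrf` are hypothetical names, and the CSRF pattern is an assumed example of noise.

```javascript
// Hypothetical in-memory recorder illustrating the snapshot/version distinction.
function createRecorder(filterNoise) {
  const snapshots = []; // raw content, kept so a filtering mistake can be recovered from
  const versions = [];  // filtered content, what readers actually follow

  return {
    snapshots,
    versions,
    record(rawContent) {
      // No change in the fetched content: nothing to record.
      if (snapshots[snapshots.length - 1] === rawContent) return 'unchanged';
      snapshots.push(rawContent);

      // Only record a version if the change survives noise filtering.
      const filtered = filterNoise(rawContent);
      if (versions[versions.length - 1] === filtered) return 'snapshot only';
      versions.push(filtered);
      return 'new version';
    },
  };
}

// Assumed example of noise: a CSRF token that changes on every fetch.
const stripCsrf = html => html.replace(/csrf=[a-z0-9]+/g, 'csrf=');

const recorder = createRecorder(stripCsrf);
recorder.record('<p>Terms v1</p><a href="?csrf=abc123">'); // 'new version'
recorder.record('<p>Terms v1</p><a href="?csrf=def456">'); // 'snapshot only'
recorder.record('<p>Terms v2</p><a href="?csrf=def456">'); // 'new version'

console.log(recorder.snapshots.length, recorder.versions.length); // 3 2
```

Keeping every raw snapshot is what allows versions to be regenerated later if a filter turns out to have removed meaningful content.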
Anyone can run their own private instance and track changes on their own. However, we also publish each version on a public instance that makes it easy to explore the entire history and enables notifying over email whenever a new version is recorded. Users can subscribe to notifications.
Note: For now, when multiple versions coexist, terms are only tracked in their English version and for the European jurisdiction.
We offer a public database of versions recorded each time there is a change in the terms of service and other contractual documents of tracked services: contrib-versions.
From the repository homepage contrib-versions, open the folder of the service of your choice (e.g. WhatsApp).
You will see the set of documents tracked for that service. Click on the document of your choice (e.g. WhatsApp's Privacy Policy). The latest version (updated hourly) will be displayed.
To view the history of changes made to this document, click on History at the top right of the document (for our previous example). The changes are ordered by date, with the latest first.
Click on a change to see what it consists of (for example this one). There are two types of display you can choose from the icons in the gray bar above the document.
You can go to the official website, opentermsarchive.org. From there, select a service and then the corresponding document type. After you enter your email and click on subscribe, we will add your email to the corresponding mailing list in SendInBlue and will not store it anywhere else. Then, every time a modification is found in the corresponding document, we will send you an email.
You can unsubscribe at any moment by clicking on the unsubscribe link at the bottom of the received email.
You can subscribe to receive an email whenever a document is updated in the database.
Beware, you are likely to receive a large number of notifications! You can unsubscribe by replying to any email you receive.
You can receive notifications for a specific service or document by subscribing to RSS feeds.
An RSS feed is a type of web page that contains information about the latest content published by a website, such as the date of publication and the address where you can view it. When this resource is updated, a feed reader app automatically notifies you and you can see the update.
To find out the address of the RSS feed you want to subscribe to:
1. Copy the address of the page showing the history of changes you are interested in. In the WhatsApp example, this would be `https://github.com/OpenTermsArchive/contrib-versions/commits/main/WhatsApp/Privacy%20Policy.md`.
2. Append `.atom` at the end of this address. In the WhatsApp example, this would become `https://github.com/OpenTermsArchive/contrib-versions/commits/main/WhatsApp/Privacy%20Policy.md.atom`.

| Updated for | URL |
|---|---|
| all services and documents | `https://github.com/OpenTermsArchive/contrib-versions/commits.atom` |
| all the documents of a service | Replace `$serviceId` with the service ID: `https://github.com/OpenTermsArchive/contrib-versions/commits/main/$serviceId.atom` |
| a specific document of a service | Replace `$serviceId` with the service ID and `$documentType` with the document type: `https://github.com/OpenTermsArchive/contrib-versions/commits/main/$serviceId/$documentType.md.atom` |
For example:
- For all `Facebook` documents, the URL is `https://github.com/OpenTermsArchive/contrib-versions/commits/main/Facebook.atom`.
- For the `Privacy Policy` from `Google`, the URL is `https://github.com/OpenTermsArchive/contrib-versions/commits/main/Google/Privacy%20Policy.md.atom`.
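The URL patterns above can be generated programmatically. The sketch below is only an illustration of those patterns; `feedUrl` is a hypothetical helper and not part of Open Terms Archive.

```javascript
// Base address of the commit history of the public versions repository.
const COMMITS_URL = 'https://github.com/OpenTermsArchive/contrib-versions/commits';

// Hypothetical helper: returns the feed address for everything, for one
// service, or for one document of a service, depending on the arguments given.
function feedUrl(serviceId, documentType) {
  if (!serviceId) return `${COMMITS_URL}.atom`;

  const service = encodeURIComponent(serviceId);
  if (!documentType) return `${COMMITS_URL}/main/${service}.atom`;

  // Spaces in document types must be percent-encoded ("Privacy%20Policy").
  return `${COMMITS_URL}/main/${service}/${encodeURIComponent(documentType)}.md.atom`;
}

console.log(feedUrl('Google', 'Privacy Policy'));
```

Called with the arguments above, it produces the same address as the Google example.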
Open Terms Archive exposes a JavaScript API to make some of its capabilities available in NodeJS. You can install it as an NPM module:
npm install "ambanum/OpenTermsArchive#main"
The following commands are available where the package is installed:
- `./node_modules/.bin/ota-lint-declarations [service_id]...`: check and normalise the format of declarations.
- `./node_modules/.bin/ota-validate-declarations [service_id]...`: validate declarations.
- `./node_modules/.bin/ota-track [service_id]...`: track services. Recorded snapshots and versions will be stored in the `data` folder at the root of the module where the package is installed.

In order to have them available globally in your command line, install it with the `--global` option.
The `fetch` module gets the MIME type and content of a document from its URL.
You can use it in your code by using `import fetch from 'open-terms-archive/fetch';`.
Documentation on how to use `fetch` is provided as JSDoc within `./src/archivist/fetcher/index.js`.
If you plan to use `executeClientScripts` as a parameter of `fetch`, the fetching will be done using a headless browser.
To avoid instantiating this browser at each fetch, starting and stopping the browser is your responsibility.
Here is an example of how to use it:
import fetch, { launchHeadlessBrowser, stopHeadlessBrowser } from 'open-terms-archive/fetch';
// Start the shared headless browser once, before any fetch that needs it.
await launchHeadlessBrowser();
await fetch({ executeClientScripts: true, ... });
await fetch({ executeClientScripts: true, ... });
await fetch({ executeClientScripts: true, ... });
// Stop the browser once all fetches are done.
await stopHeadlessBrowser();
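Because stopping the browser is the caller's responsibility, it can help to guarantee that it is stopped even when a fetch throws. The following is a sketch of one way to do that; `withHeadlessBrowser` is a hypothetical helper, not part of the package, and it takes the launch and stop functions as arguments so it makes no assumption about their internals.

```javascript
// Hypothetical helper: runs `task` between `launch` and `stop`, and makes
// sure `stop` is called even if `task` rejects.
async function withHeadlessBrowser(launch, stop, task) {
  await launch();
  try {
    return await task();
  } finally {
    await stop();
  }
}

// Possible usage with the functions shown above (other fetch options elided):
//
//   await withHeadlessBrowser(launchHeadlessBrowser, stopHeadlessBrowser, async () => {
//     await fetch({ executeClientScripts: true, /* … */ });
//   });
```

If the task fails, the error still propagates to the caller, but only after the browser has been stopped.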
The `fetch` module can also be configured as a `node-config` submodule.
If `node-config` is used in the project, the default `fetcher` configuration can be overridden by adding a `fetcher` object to the local config. See Configuration file for full reference.
The `filter` module transforms HTML or PDF content into a Markdown string.
It will filter content based on the document declaration.
You can use the filter in your code by using `import filter from 'open-terms-archive/filter';`.
The `filter` function documentation is available as JSDoc within `./src/archivist/filter/index.js`.
The `PageDeclaration` object is used to describe a page to be tracked by Open Terms Archive.
You can use the page declaration in your code by using `import pageDeclaration from 'open-terms-archive/page-declaration';`.
This module is built with Node and is tested on macOS, UNIX and Windows. You will need to install Node >= v16.x to run it.
- Clone the declarations repository: `git@github.com:OpenTermsArchive/contrib-declarations.git`.
- Go into the folder and install dependencies: `cd contrib-declarations; npm install`.
- Edit the declarations in the `./declarations/` folder, following these instructions.
- Test all declarations: `npm test`.
- Test a specific service: `npm test $serviceId`, e.g., `npm test HER`.
- Check only the schema: `npm run test:schema $serviceId`, e.g., `npm run test:schema HER`.

Common errors:
- `InaccessibleContentError`: Your selector is wrong and should be fixed.
- `TypeError`: The file declaration is invalid.

Testing runs multiple checks (e.g., that the file is valid, that the URL is correct and reachable, that the content is correctly gathered). As this may take some time, you may want to use `npm run test:schema` for a quicker check.
When referring to the base folder, we mean the folder into which you `git pull` everything.
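- Clone the core engine into the base folder: `git clone git@github.com:ambanum/OpenTermsArchive.git`.
- Install dependencies: `cd contrib-declarations; npm install`.
- To load declarations from a given repository (e.g., `dating-declarations`), create a new config file, `config/development.json`, and add: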
{
"services": {
"declarationsPath": "../<name of the repo>/declarations"
}
}
e.g.,
{
"services": {
"declarationsPath": "../dating-declarations/declarations"
}
}
- In the core engine folder (`OpenTermsArchive`), use `npm start`.
- The recorded data is stored in `data/`.
- Run `git pull` to have the latest updates, both in the core engine and in the declarations repos.
- Run `npm install` in the declarations repo at least once, and at least once each time `package.json` changes.
- Recorded data is split between `snapshots` and `versions` in `data/`.

You can clone as many declarations repositories as you want. The one that will be loaded at execution is defined through configuration.
The default configuration can be found in `config/default.json`. The full reference is given below. You are unlikely to want to edit all of these elements.
{
"services": {
"declarationsPath": "Directory containing services declarations and associated filters"
},
"recorder": {
"versions": {
"storage": {
"<storage-repository>": "Storage repository configuration object; see below"
}
},
"snapshots": {
"storage": {
"<storage-repository>": "Storage repository configuration object; see below"
}
}
},
"fetcher": {
"waitForElementsTimeout": "Maximum time (in milliseconds) to wait for elements to be present in the page when fetching document in a headless browser",
"navigationTimeout": "Maximum time (in milliseconds) to wait for page to load",
"language": "Language (in ISO 639-1 format) to pass in request headers"
},
"notifier": { // Notify specified mailing lists when new versions are recorded
"sendInBlue": { // SendInBlue API Key is defined in environment variables, see the “Environment variables” section below
"updatesListId": "SendInBlue contacts list ID of persons to notify on document updates",
"updateTemplateId": "SendInBlue email template ID used for updates notifications"
}
},
"logger": { // Logging mechanism to be notified upon error
"smtp": {
"host": "SMTP server hostname",
"username": "User for server authentication" // Password for server authentication is defined in environment variables, see the “Environment variables” section below
},
"sendMailOnError": { // Can be set to `false` if you do not want to send email on error
"to": "The address to send the email to in case of an error",
"from": "The address from which to send the email",
"sendWarnings": "Boolean. Set to true to also send email in case of warning"
}
},
"tracker": { // Tracking mechanism to create GitHub issues when document content is inaccessible
"githubIssues": {
"repository": "GitHub repository where to create issues",
"label": {
"name": "Label to attach to bot-created issues. This specific label will be created automatically in the target repository",
"color": "The hexadecimal color code for the label, without the leading #",
"description": "A short description of the label"
}
}
},
"dataset": { // Release mechanism to create dataset periodically
"title": "Title of the dataset; recommended to be the name of the instance that generated it",
"versionsRepositoryURL": "GitHub repository where the dataset will be published as a release; recommended to be the versions repository for discoverability and tagging purposes"
}
}
The default configuration is merged with (and overridden by) environment-specific configuration that can be specified at startup with the `NODE_ENV` environment variable. For example, you would run `NODE_ENV=development npm start` to load the `development.json` configuration file.
If you want to change your local configuration, we suggest you create a `config/development.json` file with overridden values. Example production configuration files can be found in the `config` folder.
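To illustrate what "merged with and overridden by" means in practice, here is a minimal sketch of a recursive merge. It is not the code used by the configuration library, which performs the merge itself; `mergeConfig` is a hypothetical function and the default values shown are made up for the example.

```javascript
function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Hypothetical merge: nested objects are merged key by key,
// any other value from `override` replaces the one in `base`.
function mergeConfig(base, override) {
  const result = { ...base };

  for (const [key, value] of Object.entries(override)) {
    result[key] = isPlainObject(value) && isPlainObject(base[key])
      ? mergeConfig(base[key], value)
      : value;
  }

  return result;
}

// Made-up default values, standing in for config/default.json.
const defaults = {
  services: { declarationsPath: './declarations' },
  fetcher: { language: 'en', navigationTimeout: 30000 },
};

// Stands in for config/development.json: only the overridden value is listed.
const development = {
  services: { declarationsPath: '../dating-declarations/declarations' },
};

const config = mergeConfig(defaults, development);
console.log(config.services.declarationsPath); // ../dating-declarations/declarations
console.log(config.fetcher.language);          // en
```

This is why an environment-specific file only needs to contain the keys that differ from the defaults.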
Two storage repositories are currently supported: Git and MongoDB. Each one can be used independently for versions and snapshots.
{
…
"storage": {
"git": {
"path": "Versions database directory path, relative to the root of this project",
"publish": "Boolean. Set to true to push changes to the origin of the cloned repository at the end of every run. Recommended for production only.",
"snapshotIdentiferTemplate": "Text. Template indicating where to find the referenced snapshot ID. Must contain a %SNAPSHOT_ID that will be replaced by the snapshot ID. Only useful for versions",
"author": {
"name": "Name to which changes in tracked documents will be credited",
"email": "Email to which changes in tracked documents will be credited"
}
}
}
…
}
{
…
"storage": {
"mongo": {
"connectionURI": "URI for defining connection to the MongoDB instance. See https://docs.mongodb.com/manual/reference/connection-string/",
"database": "Database name",
"collection": "Collection name"
}
}
…
}
Environment variables can be passed in the command-line or provided in a `.env` file at the root of the repository. See `.env.example` for an example of such a file.
- `SMTP_PASSWORD`: a password for email server authentication, in order to send email notifications.
- `SENDINBLUE_API_KEY`: a SendInBlue API key, in order to send email notifications with that service.
- `GITHUB_TOKEN`: a token with repository privileges to access the GitHub API.

If your infrastructure requires using an outgoing HTTP/HTTPS proxy to access the Internet, you can provide it through the `HTTP_PROXY` and `HTTPS_PROXY` environment variables.
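Before starting a run, it can be useful to check which of these variables are set. The sketch below is an assumption about how one might do that; `missingVariables` is a hypothetical helper, and whether a given variable is actually needed depends on which features (email notifications, GitHub issues) are enabled in your configuration.

```javascript
// Variables documented above. Not all of them are necessarily required:
// that depends on the features enabled in the configuration.
const DOCUMENTED_VARIABLES = ['SMTP_PASSWORD', 'SENDINBLUE_API_KEY', 'GITHUB_TOKEN'];

// Hypothetical helper: lists the names that are unset or empty in `env`.
function missingVariables(env, names = DOCUMENTED_VARIABLES) {
  return names.filter(name => !env[name]);
}

// In a real script, you would pass process.env instead of a literal object.
console.log(missingVariables({ SMTP_PASSWORD: 'secret' }));
```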
To get the latest versions of all documents:
npm start
The latest version of a document will be available in the versions path defined in your configuration, under `$versions_folder/$service_provider_name/$document_type.md`.
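As a small illustration of that path pattern, the hypothetical helper below assembles the location of a version file. The folder name `data/versions` used in the example is an assumption; use the versions path from your own configuration.

```javascript
// Hypothetical helper mirroring $versions_folder/$service_provider_name/$document_type.md
function versionPath(versionsFolder, serviceProviderName, documentType) {
  return `${versionsFolder}/${serviceProviderName}/${documentType}.md`;
}

// 'data/versions' is an assumed folder name for the example.
console.log(versionPath('data/versions', 'WhatsApp', 'Privacy Policy'));
// data/versions/WhatsApp/Privacy Policy.md
```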
To update documents automatically:
npm run start:scheduler
To get the latest version of a specific service's terms:
npm start <service_id>
The service ID is the case-sensitive name of the service declaration file without the extension. For example, for `Twitter.json`, the service ID is `Twitter`.
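For instance, a service ID can be derived from a declaration file name as sketched below. `serviceIdFromFile` is a hypothetical helper written for this illustration, not a function of the package.

```javascript
// Hypothetical helper: strips any folder prefix and the ".json" extension,
// and keeps the case untouched since service IDs are case-sensitive.
function serviceIdFromFile(fileName) {
  const baseName = fileName.split('/').pop();
  return baseName.endsWith('.json') ? baseName.slice(0, -5) : baseName;
}

console.log(serviceIdFromFile('declarations/Twitter.json')); // Twitter
```

The resulting ID is what you would pass to `npm start <service_id>`.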
See Ops Readme.
To generate a dataset:
npm run dataset:generate
To release a dataset:
npm run dataset:release
To release a dataset weekly:
npm run dataset:scheduler
Thanks for wanting to contribute! There are different ways to contribute to Open Terms Archive. We describe the most common below. If you want to explore other avenues for contributing, please contact us over email (contact@[our domain name]) or Twitter.
See the CONTRIBUTING file of the `OpenTermsArchive/contrib-declarations` repository. You will need knowledge of JSON and the web DOM.
To contribute to the core engine of Open Terms Archive, see the CONTRIBUTING file of this repository. You will need knowledge of JavaScript and NodeJS.
Beyond individual contributions, we need funds and committed partners to pay for a core team to maintain and grow Open Terms Archive. If you know of opportunities, please let us know! You can find on our website an up-to-date list of the partners and funders that make Open Terms Archive possible.
The code for this software is distributed under the European Union Public Licence (EUPL) v1.2. Contact the author if you have any specific need or question regarding licensing.