
reffy


reffy

W3C/WHATWG spec dependencies exploration companion. Features a short set of tools to study spec references as well as WebIDL term definitions and references found in W3C specifications.

1.0.0
Source
npm

Version published: 7 years ago

Weekly downloads: 468; decreased by 25.71%

Maintainers: 1

Created: 7 years ago

Reffy

Reffy is your W3C spec dependencies exploration companion. It features a short set of tools to study spec references as well as WebIDL term definitions and references found in W3C specifications.

See published reports for human-readable examples of reports generated by Reffy.

How to use

To launch the crawler and the report study tool, follow these steps:

Pre-requisites: Git, Node.js, a W3C account, an API key for the W3C API, and Pandoc if you want to generate an HTML version of the report.
Clone the repository: git clone git@github.com:tidoust/reffy.git
From the root folder of reffy, install required dependencies: npm install
Create a config.json file, initialized with { "w3cApiKey": [API key] }
To produce a W3C-centric vision of the Web platform using Editor's Drafts, run npm run w3c.
To produce a W3C-centric vision of the Web platform using latest published versions in /TR/, run npm run w3c-tr.
To produce a WHATWG-centric vision of the Web platform, run npm run whatwg.

Under the hood, these commands run the following steps (and related commands) in turn:

Crawling: Crawls a list of specs and outputs relevant information in a JSON structure in the specified folder. node crawl-specs.js ./specs-w3c.json ./reports/w3c [tr]. Add tr to tell the crawler to load the latest published version of TR specifications instead of the latest Editor's Draft.
Analysis: Analyses the result of the crawling step, and produces a study report. node study-crawl.js ./reports/w3c/crawl.json [url]. When the url parameter is given, the resulting analysis will only contain the results for the spec at that URL (multiple URLs may be given as a comma-separated value list without spaces). You will probably want to redirect the output to a file, e.g. using node study-crawl.js ./reports/w3c/crawl.json > reports/w3c/study.json.
Markdown report generation: Produces a human-readable report in Markdown format out of the report returned by the analysis step, or directly out of results of the crawling step. node generate-report.js ./reports/w3c/study.json [perspec|dep]. By default, the tool generates a report per anomaly, pass perspec to create a report per specification and dep to generate a dependencies report. You will probably want to redirect the output to a file, e.g. using node generate-report.js ./reports/w3c/study.json > reports/w3c/index.md.
Conversion to HTML: Takes the Markdown analysis per specification and prepares an HTML report with expandable sections. pandoc reports/w3c/index.md -f markdown -t html5 --section-divs -s --template report-template.html -o reports/w3c/index.html (where reports/w3c/index.md is the Markdown report)
Diff with latest published version of the crawl report: Compares the crawl results with the latest published crawl results and produces a human-readable diff in Markdown format. node generate-report.js ./reports/w3c/crawl.json diff https://tidoust.github.io/reffy-reports/w3c/crawl.json
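
As a rough illustration of how the first four steps chain together, here is a minimal sketch that only assembles the command lines quoted above. The buildPipeline and toShell helpers are hypothetical and not part of Reffy; the script names and arguments are taken from the steps as written.

```javascript
// Sketch only: buildPipeline and toShell are hypothetical helpers, not Reffy code.
// They assemble the commands quoted in the steps above without running them.
function buildPipeline(specList, reportFolder, { tr = false } = {}) {
  const crawlArgs = ['crawl-specs.js', specList, reportFolder];
  if (tr) crawlArgs.push('tr'); // crawl latest /TR/ versions instead of Editor's Drafts
  return [
    { cmd: 'node', args: crawlArgs },
    { cmd: 'node', args: ['study-crawl.js', `${reportFolder}/crawl.json`],
      redirectTo: `${reportFolder}/study.json` },
    { cmd: 'node', args: ['generate-report.js', `${reportFolder}/study.json`],
      redirectTo: `${reportFolder}/index.md` },
    { cmd: 'pandoc', args: [`${reportFolder}/index.md`, '-f', 'markdown', '-t', 'html5',
      '--section-divs', '-s', '--template', 'report-template.html',
      '-o', `${reportFolder}/index.html`] }
  ];
}

// Render one step as a shell command line, with output redirection if any.
function toShell(step) {
  const line = [step.cmd, ...step.args].join(' ');
  return step.redirectTo ? `${line} > ${step.redirectTo}` : line;
}

for (const step of buildPipeline('./specs-w3c.json', './reports/w3c')) {
  console.log(toShell(step));
}
```

Keeping the steps as data makes it easy to print them, or to hand each one to a process runner of your choice.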

Some notes:

The crawler may take some time to run.
The crawler uses a local cache for HTTP exchanges. It will create and fill a cache subfolder in particular.
The ./ prefix is needed to point the crawler and study tools at local files for the time being (one of the many things to improve in the code!)

Reffy's tools

Specs crawler

Reffy's crawler takes an initial list of spec URLs as input and generates a machine-readable report with facts about each spec, including:

Generic information such as the title of the spec or the URL of the Editor's Draft. This information is typically extracted from the W3C API.
The list of normative/informative references found in the spec.
Extended information about WebIDL term definitions and references that the spec contains.

Study tool

Reffy's report study tool takes the machine-readable report generated by the crawler, and creates a study report of potential anomalies found in the report. The study report can then easily be converted to a human-readable Markdown report. Reported potential anomalies are:

specs that do not seem to reference any other spec normatively;
specs that define WebIDL terms but do not normatively reference the WebIDL spec;
specs that contain invalid WebIDL term definitions;
specs that use obsolete WebIDL constructs (e.g. [] instead of FrozenArray);
specs that define WebIDL terms that are also defined in another spec;
specs that use WebIDL terms defined in another spec without referencing that spec normatively;
specs that use WebIDL terms for which the crawler could not find any definition in any of the specs it studied;
specs that link to another spec but do not include a reference to that other spec;
specs that link to another spec inconsistently in the body of the document and in the list of references (e.g. because the body of the document references the Editor's draft while the reference is to the latest published version).
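
To make the first two anomalies in the list above concrete, here is a minimal sketch of how such checks could be written. The crawl result shape (url, refs.normative, idlNames), the anomaly labels and the WebIDL URL are all assumptions made for illustration; the actual structure used by Reffy's study tool may differ.

```javascript
// Sketch under assumptions: the result shape below is hypothetical.
//   { url, refs: { normative: [{ url }] }, idlNames: [string] }
const WEBIDL_URL = 'https://webidl.spec.whatwg.org/'; // assumed URL of the WebIDL spec

function studyCrawl(results) {
  const anomalies = [];
  for (const spec of results) {
    const normative = (spec.refs && spec.refs.normative) || [];
    const definesIdl = (spec.idlNames || []).length > 0;

    // Anomaly: the spec does not seem to reference any other spec normatively.
    if (normative.length === 0) {
      anomalies.push({ url: spec.url, anomaly: 'noNormativeRefs' });
    }
    // Anomaly: the spec defines WebIDL terms but does not
    // normatively reference the WebIDL spec.
    if (definesIdl && !normative.some(ref => ref.url === WEBIDL_URL)) {
      anomalies.push({ url: spec.url, anomaly: 'noRefToWebIDL' });
    }
  }
  return anomalies;
}

const sample = [
  { url: 'https://example.org/spec-a/', refs: { normative: [] }, idlNames: [] },
  { url: 'https://example.org/spec-b/',
    refs: { normative: [{ url: 'https://example.org/spec-a/' }] },
    idlNames: ['Widget'] }
];
console.log(studyCrawl(sample));
```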

WebIDL terms explorer

See the related WebIDLPedia project and its repo.

Other tools

Some of the tools that compose Reffy may also be used directly.

The references parser takes the URL of a spec as input and generates a JSON structure that lists the normative and informative references found in the spec. To run the references parser: node parse-references.js [url]

The WebIDL extractor takes the URL of a spec as input and outputs the IDL definitions found in the spec as one block of text. To run the extractor: node extract-webidl.js [url]

The WebIDL parser takes the URL of a spec as input and generates a JSON structure that describes WebIDL term definitions and references that the spec contains. The parser uses WebIDL2 to parse the WebIDL content found in the spec. To run the WebIDL parser: node parse-webidl.js [url]

The Spec finder takes a JSON crawl report as input and checks a couple of sites that list Web specifications to detect new specifications that are not yet part of the crawl. To run the spec finder: node find-spec.js ./results.json

The crawl results merger merges a new JSON crawl report into a reference one. This tool is typically useful to replace the crawl results of a given specification with the results of a new run of the crawler on that specification. To run the crawl results merger: node merge-crawl-results.js [new crawl report] [reference crawl report] [crawl report to create]

The spec checker takes the URL of a spec, a reference crawl report and the name of the study report to create as inputs. It crawls and studies the given spec against the reference crawl report. Essentially, it applies the crawler, the merger and the study tool in order, to produce the anomalies report for the given spec. Note the tool can check multiple specs at once, provided the URLs are passed as a comma-separated value list without spaces. To run the spec checker: node check-specs.js [url] [reference crawl report] [study report to create]

For instance:

node parse-references.js https://w3c.github.io/presentation-api/
node extract-webidl.js https://www.w3.org/TR/webrtc/
node parse-webidl.js https://fetch.spec.whatwg.org/
node check-specs.js https://www.w3.org/TR/webstorage/ ./reports/w3c/crawl.json ./reports/study-webstorage.json
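
Several tools accept multiple URLs as a comma-separated value list without spaces. A minimal sketch of how such an argument could be split and validated follows; parseUrlList is a hypothetical helper, not a function exposed by Reffy.

```javascript
// Sketch only: parseUrlList is a hypothetical helper, not part of Reffy.
// Splits a comma-separated list of URLs and validates each entry with the
// WHATWG URL parser built into Node.js (throws a TypeError on invalid input).
function parseUrlList(arg) {
  return arg
    .split(',')
    .filter(part => part.length > 0)
    .map(part => new URL(part).href);
}

console.log(parseUrlList(
  'https://www.w3.org/TR/webstorage/,https://fetch.spec.whatwg.org/'));
```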

Technical notes

Reffy is still at an early stage of development. It may crash from time to time.

Reffy should be able to parse most of the W3C/WHATWG specifications that define WebIDL terms (both published versions and Editor's Drafts). The tool may work with other types of specs, but has not been tested with any of them.

List of specs to crawl

The recommended lists appear in specs-w3c.json and spec-whatwg.json. Both files reference a common list in specs-common.json. These lists were built out of the JavaScript APIs TR bucket, semi-manually completed to create a more comprehensive list.

It should be possible to crawl other specs, but note Reffy has not yet been tested with specs that do not define any WebIDL term, and would need to be adjusted to return "interesting" information. Feel free to try out other specs and report any issue!

Crawling a spec

Given the URL of a spec, the crawler basically goes through the following steps:

1. If the URL looks like http(s)://www.w3.org/TR/[something], the crawler extracts the shortname of the specification, and sends a couple of requests to the W3C API to retrieve the URL of the Editor's Draft, or the URL of the latest published version if the URL of the Editor's Draft could not be found. This new URL replaces the given one.
2. Fetch the URL. Note Reffy uses a network cache on the local filesystem, and sends conditional HTTP requests if the URL is already in that cache.
3. Render the response with jsdom, which should create a Window object.
4. If the document contains a "head" section that includes a link whose label looks like "single page", go back to step 2 and load the target of that link instead. This makes the crawler load the single page version of multi-page specifications such as HTML5.
5. If the document uses ReSpec, use the ReSpec HTML writer tool to render the document in a virtual browser environment and access the resulting DOM.
6. Run internal tools on the generated document to build the relevant information.
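
The shortname extraction in the first step can be sketched with a regular expression. This is an illustration only: extractShortname and its pattern are assumptions, and the check that Reffy actually performs may be looser or stricter.

```javascript
// Sketch only: extractShortname is a hypothetical helper, and the pattern is
// an assumption about what "looks like http(s)://www.w3.org/TR/[something]" means.
function extractShortname(url) {
  const match = /^https?:\/\/www\.w3\.org\/TR\/([^\/]+)\/?$/.exec(url);
  return match ? match[1] : null; // null: not a /TR/ shortname URL
}

console.log(extractShortname('https://www.w3.org/TR/webrtc/'));
console.log(extractShortname('https://fetch.spec.whatwg.org/'));
```

A null result would mean the crawler keeps the given URL and skips the W3C API lookup.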

The crawler processes 10 specifications at a time. Network and parsing errors should be reported in the crawl results.
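
One simple way to process a fixed number of specifications at a time is to work through the list in batches and record failures in the results instead of aborting. The sketch below assumes that strategy; the text does not say whether Reffy uses fixed batches or a sliding pool, and processInBatches is a hypothetical helper.

```javascript
// Sketch only: processInBatches is hypothetical. It runs `worker` on at most
// `batchSize` items concurrently and reports errors as part of the results.
async function processInBatches(items, batchSize, worker) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    const settled = await Promise.all(batch.map(async item => {
      try {
        return await worker(item);
      } catch (err) {
        // Keep going: the failure becomes an entry in the results.
        return { item, error: err.message };
      }
    }));
    results.push(...settled);
  }
  return results;
}

// Usage with a stand-in worker that fails on one URL.
processInBatches(['a', 'b', 'c'], 10, async url => {
  if (url === 'b') throw new Error('network error');
  return { url, title: `Spec ${url}` };
}).then(results => console.log(results));
```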

Config parameters

The crawler reads parameters from the config.json file. To be able to interact with the W3C API, that file must contain a w3cApiKey entry whose value is a valid W3C API Key.

Optional parameters:

avoidNetworkRequests: set this flag to true to tell the crawler to use the cache entry for a URL directly, instead of sending a conditional HTTP request to check whether the entry is still valid. This parameter is typically useful when developing Reffy's code to work offline.
resetCache: set this flag to true to tell the crawler to reset the contents of the local cache when it starts.
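
For illustration, a config.json with all three parameters could be validated as sketched below. The parseConfig helper is hypothetical, and treating both optional flags as false when absent is an assumption based on the "set this flag to true" wording.

```javascript
// Sketch only: parseConfig is a hypothetical helper, not Reffy's own loader.
// Defaults of `false` for the optional flags are an assumption.
function parseConfig(jsonText) {
  const config = Object.assign(
    { avoidNetworkRequests: false, resetCache: false },
    JSON.parse(jsonText)
  );
  if (typeof config.w3cApiKey !== 'string' || config.w3cApiKey === '') {
    throw new Error('config.json must contain a "w3cApiKey" entry');
  }
  return config;
}

// Example content of config.json (the key value is a placeholder).
const example = '{ "w3cApiKey": "YOUR_API_KEY", "avoidNetworkRequests": true }';
console.log(parseConfig(example));
```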

Hardcoded rules

Some rules or exceptions to the rule are hardcoded. In particular:

The URL of some of the Editor's Drafts returned by the W3C API can be invalid, or a document that when loaded redirects to another. The list is hardcoded in the completeWithInfoFromW3CApi method in crawl-specs.js. The crawler loads the latest published version for these specs.
Some specs cannot be loaded with jsdom for the time being, typically some specs that use ReSpec's markdown format. This should hopefully be fixed soon. The list is hardcoded in the completeWithInfoFromW3CApi method in crawl-specs.js.
Some specs load external scripts that may not run properly in jsdom. Such scripts are ignored. See details in loadSpecificationFromHtml function in util.js.
The heuristics used to find the "single page" link are defined in the loadSpecificationFromHtml function in util.js. They may need to be extended to support other cases.
For each spec, the crawler reports a list of URLs which may be considered as equivalent for the purpose of referencing. This list typically includes the initial shortname URL for W3C specs, the dated URL of the latest published version of the spec, and the URL of the Editor's Draft. For a couple of specs, it also includes links to previous or alternate "versions" of the spec. For instance, the versions of the HTML5.1 spec include the HTML5 W3C Recommendation and the WHATWG HTML Living Standard. The study tool uses that information when it checks the list of references to find missing ones. Ideally, the W3C API would return up-to-date information such as "supersedes" to clarify the relationship between versions of the same spec. The mapping is hardcoded in addKnownVersions in util.js.
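
The use of equivalent URLs when looking for missing references can be sketched as follows. The data shape (a versions array on each spec, a url on each reference), the referencesSpec helper and the URL normalization are all assumptions made for illustration.

```javascript
// Sketch only: hypothetical helpers and data shape.
// Ignore protocol and trailing slash when comparing URLs (an assumption).
function normalizeUrl(url) {
  return url.replace(/^https?:\/\//, '').replace(/\/$/, '');
}

// True when one of the references points at any URL considered
// equivalent to the target spec.
function referencesSpec(references, targetSpec) {
  const equivalents = new Set(targetSpec.versions.map(normalizeUrl));
  return references.some(ref => equivalents.has(normalizeUrl(ref.url)));
}

// Illustrative list of equivalent URLs for the HTML spec.
const html = {
  versions: [
    'https://www.w3.org/TR/html51/',
    'https://www.w3.org/TR/html5/',
    'https://html.spec.whatwg.org/multipage/'
  ]
};
console.log(referencesSpec([{ url: 'http://www.w3.org/TR/html5' }], html));
```

With such a mapping, a reference to an older or alternate version of a spec is not flagged as a missing reference.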

Contributing

Authors so far are François Daoust and Dominique Hazaël-Massieux.

Additional ideas, bugs and/or code contributions are most welcome. Create issues on GitHub as needed!

Licensing

The code is available under an MIT license.

v11.0.0 - 2022-11-28

This new major version modifies and completes the CSS extraction logic. See #1117 for details.

No other change was made, meaning breaking and non-breaking changes only affect CSS extracts.

Breaking changes

Arrays are now used throughout instead of indexed objects.
Function names are no longer enclosed in < and > because they are not defined in specs with these characters (as opposed to types). Beware though, references to functions in value syntax do use enclosing < and > characters.
The property valuespaces at the root level is now named values. An array is used there as well. The values property lists both function and type definitions that are not namespaced to anything in particular (it used to also contain namespaced definitions).
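
Code that used to look up a definition by key in an indexed object now needs to search an array. The sketch below shows one way to adapt; the extract fragment is illustrative, and the name field and the literal type strings are assumptions about the extract format.

```javascript
// Sketch only: illustrative fragment of a CSS extract. The `name` field
// and the `type` strings are assumed, not taken from a real extract.
const extract = {
  properties: [{ name: 'content' }, { name: 'quotes' }],
  values: [
    { name: '<content-list>', type: 'type' },
    { name: 'attr()', type: 'function' } // no enclosing < and > for functions
  ]
};

// Direct replacement for an `indexedObject[name]` lookup.
function findByName(definitions, name) {
  return definitions.find(def => def.name === name) || null;
}

// Build an index once when many lookups are needed.
function indexByName(definitions) {
  return new Map(definitions.map(def => [def.name, def]));
}

console.log(findByName(extract.properties, 'quotes'));
console.log(indexByName(extract.values).get('attr()'));
```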

Added

Selectors are now reported under a selectors property at the root level.
Possible values that some definition may take are now reported under a values property directly within the definition.
Functions and types that are namespaced to some other definition are included in the list of values of that definition.
Anomalies detected in the spec are now reported under a warnings property at the root of the extract. Four types of anomalies are reported:
1. Missing definition: when a production rule was found but the spec does not include a corresponding <dfn> (or when that <dfn> does not have a data-dfn-type attribute that identifies a CSS construct).
2. Duplicate definition: when the spec defines the same term twice.
3. Unmergeable definition: when the spec defines the same property twice and both definitions cannot be merged.
4. Dangling value: when the spec defines a CSS "value" definition (value, function or type) for something and that something cannot be found in the spec.
To distinguish between function, type and value definitions listed in a values property, definitions that appear in a values property have a type property.

Additional notes

Only namespaced values associated with a definition are listed under its values property. Non-namespaced values are not. For instance, <quote> is not listed as a value of the <content-list> type, even though its value syntax references it. This is to avoid duplicating constructs in the extracts.
Values are only listed under the deepest definition to which they apply. For instance, open-quote is only listed as a value of <quote> but neither as a value of the <content-list> type that references <quote> nor as a value of the content property that references <content-list>. This is also to avoid duplicating constructs in the extracts.
Some of the extracts contain things that may look weird at first, but that is "by design". For instance, CSS Will Change defines a <custom-ident> value construct whose actual value is the <custom-ident> type construct defined in CSS Values. Having both a namespaced value and a non-namespaced <type> is somewhat common in CSS specs.

Package last updated on 29 Mar 2018
