bat-publisher
Routines to identify publishers for the BAT Ledger:
A publisher identity is derived from a URL and is intended to correspond to the publisher associated with the URL.
var getPublisher = require('bat-publisher').getPublisher
var publisher = getPublisher('URL')
Note that because some domains host multiple publishers, a publisher identity may contain both a domain and a path separated by a solidus (/).
Also note that certain URLs aren't really appropriate for a publisher mapping. For example, if a URL returns a 302, don't bother mapping that URL.
Consider this URL:
https://foo.bar.example.com/component1/...?query
The label com from the URL's domain is a top-level domain (TLD), and the string example.com is a second-level domain (SLD). By convention, the relative domain (RLD) is the string to the left of the SLD (e.g., foo.bar), and the qualifying label (QLD) is the right-most label of the RLD (e.g., bar).
There are two popular types of TLDs: infrastructure and international country code (ccTLD). Although an SLD is normally thought of as being the next-to-last right-most label (e.g., example), for domains with a ccTLD, the convention differs.
Consider this URL:
http://search.yahoo.co.jp/search?query
The string co.jp corresponds to the TLD, the string yahoo.co.jp corresponds to the SLD, and the QLD and RLD are both the string search.
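The two examples above can be reproduced with a small sketch. This is not the package's implementation: the function name decompose and the abbreviated CC_SUFFIXES list are hypothetical, and a real implementation would consult a complete suffix list (the package credits tldjs for this).

```javascript
// Hypothetical, abbreviated list of two-label ccTLD suffixes.
// A real implementation would use a complete public suffix list.
const CC_SUFFIXES = ['co.jp', 'co.uk', 'com.au']

// Split a hostname into the TLD, SLD, RLD, and QLD described above.
function decompose (hostname) {
  const labels = hostname.toLowerCase().split('.')
  // The TLD spans two labels when the last two labels form a known ccTLD suffix.
  const lastTwo = labels.slice(-2).join('.')
  const tldLength = CC_SUFFIXES.includes(lastTwo) ? 2 : 1
  if (labels.length <= tldLength) return undefined // no SLD present

  const TLD = labels.slice(-tldLength).join('.')
  const SLD = labels.slice(-(tldLength + 1)).join('.')
  const rldLabels = labels.slice(0, -(tldLength + 1))
  const RLD = rldLabels.join('.')
  // The QLD is the right-most label of the RLD; both are '' when there is no RLD.
  const QLD = rldLabels.length > 0 ? rldLabels[rldLabels.length - 1] : ''
  return { TLD, SLD, RLD, QLD }
}

console.log(decompose('foo.bar.example.com'))
console.log(decompose('search.yahoo.co.jp'))
```

For a bare domain such as example.com, this sketch returns empty strings for both the RLD and the QLD.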
The ABNF syntax for a publisher identity is:
publisher-identity = site-identity / provider-identity
site-identity = domain [ "/" segment ]
domain = [ RLD "." ] SLD
RLD = *[ label "." ] QLD
QLD = label
SLD = label "." TLD
TLD = infraTLD / ccTLD
ccTLD = label "." 2ALPHA ; a two-letter country code, cf. ISO 3166
infraTLD = label ; ".com", ".gov", etc.
label = alphanum *62(alphanum / "-") ; any octet encoded according to RFC 2181
alphanum = ALPHA / DIGIT
path-abempty = *( "/" segment) ; as defined in Section 3.3 of RFC 3986
provider-identity = provider-scheme ":" provider-value
provider-scheme = provider-prefix "#" provider-suffix
provider-prefix = label
provider-suffix = label
provider-value = 1*(unreserved / pct-encoded)
pct-encoded = "%" HEXDIG HEXDIG ; as defined in section 2.1 of RFC 3986
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" ; as defined in section 2.3 of RFC 3986
Note that a site-identity must not include either a fragment (#...) or a query (?...).
var isPublisher = require('bat-publisher').isPublisher
if (isPublisher('...')) ...
The package uses a rule set expressed as a JavaScript array. Each rule in the array consists of an object with one mandatory property, condition, a JavaScript boolean expression. In addition, there is usually either a consequent property (a JavaScript expression returning either a string, null, or undefined), or a dom property.
To determine the publisher identity associated with a URL:
1. If the TLD associated with the URL's domain does not correspond to an infrastructure or ccTLD, then the publisher identity is undefined.
2. The URL is parsed into an object using the URL module.
3. The parsed object is extended with the URL, TLD, SLD, RLD, and QLD objects. If there is no RLD, the empty string ("") is used for both the RLD and QLD.
4. If the dom.publisher property of the rule is present, then the HTML associated with the URL must be present, and one additional object is present during evaluation, node, which is the result of jsdom(markup).body.querySelector(dom.publisher.nodeSelector), and the dom.publisher.consequent property is used instead of the consequent property for the rule in Step 5.2.
5. Each rule is examined, in order, starting from the first element:
5.1. If the condition evaluates to false, then execution continues with the next rule.
5.2. Otherwise, the consequent is evaluated.
5.3. If the resulting value is the empty string (""), then execution continues with the next rule.
5.4. If the resulting value is false, null or undefined, then the publisher identity is undefined.
5.5. Otherwise, the resulting value is used as the publisher identity.
6. If Step 5.5 is never executed, then the publisher identity is undefined.
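The rule loop in Steps 5.1 through 5.5 can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the rules here use plain functions for condition and consequent rather than JavaScript expressions, every rule is assumed to carry a consequent, and the name evaluateRules and the sample rules are hypothetical.

```javascript
// Walk the rules in order and return the first publisher identity found.
// Assumption: condition and consequent are functions of the parsed-URL context.
function evaluateRules (rules, context) {
  for (const rule of rules) {
    // Step 5.1: a false condition moves on to the next rule.
    if (!rule.condition(context)) continue
    // Step 5.2: otherwise the consequent is evaluated.
    const result = rule.consequent(context)
    // Step 5.3: an empty string also moves on to the next rule.
    if (result === '') continue
    // Step 5.4: false, null, or undefined ends the search with no identity.
    if (result === false || result === null || result === undefined) return undefined
    // Step 5.5: any other value is the publisher identity.
    return result
  }
  // Step 5.5 was never reached, so the identity is undefined.
  return undefined
}

// Hypothetical rules, for illustration only.
const sampleRules = [
  { condition: (c) => c.SLD === 'example.org', consequent: () => '' },
  { condition: (c) => c.SLD === 'example.net', consequent: () => null },
  { condition: () => true, consequent: (c) => c.SLD }
]

console.log(evaluateRules(sampleRules, { SLD: 'example.com' }))
console.log(evaluateRules(sampleRules, { SLD: 'example.net' }))
```

The distinction between Steps 5.3 and 5.4 matters: an empty string lets a later rule decide, while null stops evaluation immediately.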
The initial rule set is built by an npm script:
npm run build-rules
An initial rule set is available as:
require('bat-publisher').ruleset
Please submit a pull request with updates to the rule set.
If you are running the Brave Browser on your desktop, you can run
% node dump.js
in order to examine all the URLs you have visited in your current session (from the file session-store-1) and see the resulting publisher identities.
A page visit is just what you'd expect, but it requires both a URL and the duration of the focus (in milliseconds). A synopsis is a collection of page visits that have been reduced to a publisher and a score. The synopsis includes a rolling window so that older visits are removed.
var synopsis = new (require('bat-publisher').Synopsis)()
// each time a page is unloaded, record the focus duration
// markup is an optional third-parameter, cf., getPublisher above
synopsis.addVisit('URL', duration)
// addVisit is a wrapper around addPublisher
synopsis.addPublisher(publisher, props)
At present, these properties are examined:
duration - the number of milliseconds (mandatory)
markup - the HTML markup (optional)
In order to calculate the score, options can be provided when creating the object. The defaults are:
{ minPublisherDuration : 8 * 1000
, numFrames : 30
, frameSize : 24 * 60 * 60 * 1000
}
When addVisit is invoked, the duration must be at least minPublisherDuration milliseconds in length. If so, then one or more "scorekeepers" are run to calculate the score for the visit, using both the options and props.
At present, there are two scorekeepers:
concave - courtesy of @dimitry-xyz
visits - the total number of visits
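The gating and scoring flow described above might look like the following sketch. The function scoreVisit and the shape of the scorekeepers object are assumptions made for illustration; only the visits scorekeeper is modelled, on the assumption that each qualifying visit counts as one.

```javascript
// Default from the options shown above.
const visitOptions = { minPublisherDuration: 8 * 1000 }

// Hypothetical scorekeeper table: each qualifying visit counts once.
const sampleScorekeepers = {
  visits: function (props, options) { return 1 }
}

// Run every scorekeeper for a visit, or return undefined when the
// visit is shorter than minPublisherDuration.
function scoreVisit (props, options, scorekeepers) {
  if (props.duration < options.minPublisherDuration) return undefined
  const scores = {}
  for (const name of Object.keys(scorekeepers)) {
    scores[name] = scorekeepers[name](props, options)
  }
  return scores
}

console.log(scoreVisit({ duration: 12 * 1000 }, visitOptions, sampleScorekeepers))
console.log(scoreVisit({ duration: 3 * 1000 }, visitOptions, sampleScorekeepers))
```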
The concave scorekeeper rewards the publisher of a page according to:
The reward increases as the user spends more time on the page, but the model uses a concave quadratic (utility) function to provide diminishing returns as the time spent on the page increases. If we set the durationWeight parameter to zero, the model only takes into account the page hit and ignores the time spent on the page when calculating the reward.
Scorekeepers may be "tuned" using options. At present, only the concave scorekeeper makes use of these. The defaults are:
{ _d : 1 / (30 * 1000) // 0.0000333...
, _a : (1 / (_d * 2)) - minPublisherDuration // 7000 with the default minPublisherDuration
, _b : minPublisherDuration - _a // 1000 with the default minPublisherDuration
}
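As a worked check of these expressions, the sketch below evaluates them with the default minPublisherDuration of 8 * 1000 given earlier. With that default, _a comes out at approximately 7000 and _b at approximately 1000; both would be 5000 only if minPublisherDuration were 10 * 1000. The helper name concaveDefaults is hypothetical.

```javascript
// Evaluate the default tuning expressions for a given minPublisherDuration.
function concaveDefaults (minPublisherDuration) {
  const _d = 1 / (30 * 1000)
  // 1 / (_d * 2) is approximately 15000, up to floating-point rounding.
  const _a = (1 / (_d * 2)) - minPublisherDuration
  const _b = minPublisherDuration - _a
  return { _d: _d, _a: _a, _b: _b }
}

console.log(concaveDefaults(8 * 1000))
console.log(concaveDefaults(10 * 1000))
```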
The sliding window consists of numFrames frames, each having a timeframe of frameSize milliseconds. So, for the default values, the sliding window will be 30 days long.
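The 30-day figure follows directly from the defaults. The sketch below computes it and shows one way older visits could be dropped from a window; the pruneVisits helper and the timestamp field are assumptions for illustration, and the package may organize its frames differently.

```javascript
const windowOptions = { numFrames: 30, frameSize: 24 * 60 * 60 * 1000 }
const MS_PER_DAY = 24 * 60 * 60 * 1000

// Total window length: numFrames frames of frameSize milliseconds each.
function windowLength (options) {
  return options.numFrames * options.frameSize
}

// Hypothetical helper: keep only the visits that fall inside the window.
function pruneVisits (visits, now, options) {
  const cutoff = now - windowLength(options)
  return visits.filter(function (visit) { return visit.timestamp >= cutoff })
}

console.log(windowLength(windowOptions) / MS_PER_DAY) // 30 frames of one day each
```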
Once a synopsis is underway, the "top N" publishers can be determined. Each publisher has an associated weighted score, so that the sum of the scores "should approximate" 1.0:
// get the top "N" publishers
console.log(JSON.stringify(synopsis.topN(20), null, 2))
// e.g., [ { publisher: "example.com", weight: 0.0123456789 } ... ]
The parameter to the topN method is optional.
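To illustrate how weights that sum to roughly 1.0 can be produced, here is a minimal sketch that ranks raw scores and normalizes them. The function rankTopN and its input shape are hypothetical; whether the package normalizes over the selected publishers or over all of them is not stated above, so this sketch assumes the former.

```javascript
// Rank publishers by raw score and convert scores to weights that sum to 1.
// Assumption: the input is a plain object mapping publisher to raw score.
function rankTopN (scores, n) {
  const ranked = Object.keys(scores)
    .filter(function (publisher) { return scores[publisher] > 0 })
    .sort(function (a, b) { return scores[b] - scores[a] })
  // n is optional; without it every scored publisher is returned.
  const top = n === undefined ? ranked : ranked.slice(0, n)
  const total = top.reduce(function (sum, publisher) { return sum + scores[publisher] }, 0)
  return top.map(function (publisher) {
    return { publisher: publisher, weight: scores[publisher] / total }
  })
}

console.log(JSON.stringify(rankTopN({ 'example.com': 3, 'example.org': 1 }), null, 2))
```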
Similarly, to pseudo-randomly select a single publisher, using the weighted score:
// select a single publisher
console.log(synopsis.winner())
// e.g., "brave.com"
// or multiple winners
console.log(synopsis.winners(n))
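Weighted pseudo-random selection of this kind is commonly done by walking the cumulative weights. The sketch below shows that general technique, not the package's own code; pickWinner is a hypothetical name, and the random source is injectable so the behaviour can be checked deterministically.

```javascript
// Pick one publisher with probability proportional to its weight.
// `random` defaults to Math.random and must return a value in [0, 1).
function pickWinner (weighted, random) {
  const rand = random || Math.random
  const total = weighted.reduce(function (sum, entry) { return sum + entry.weight }, 0)
  let point = rand() * total
  for (const entry of weighted) {
    point -= entry.weight
    if (point < 0) return entry.publisher
  }
  // Floating-point rounding can leave point at exactly 0; fall back to the last entry.
  return weighted.length > 0 ? weighted[weighted.length - 1].publisher : undefined
}

const sampleWeights = [
  { publisher: 'example.com', weight: 0.25 },
  { publisher: 'example.org', weight: 0.75 }
]
console.log(pickWinner(sampleWeights))
```

Over many draws, example.org should be selected about three times as often as example.com.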
Many thanks to Elijah Insua for the excellent jsdom package, and to Thomas Parisot for the excellent tldjs package.