db-clean-station-name
Remove noise and fix common typographic issues in Deutsche Bahn (DB, German railways) and VBB (Berlin-Brandenburg transportation authority) station names, as returned, for example, by the db-hafas and vbb-hafas modules. For a list of changes that are applied to names, see the rules section.
npm install db-clean-station-name
const cleanStationName = require('db-clean-station-name')
const noisy = 'S+U Berlin Yorckstr. S2 U7 (S+U)'
const cleaned = cleanStationName(noisy)
console.log(cleaned) // 'Berlin Yorckstraße'
The module applies a specific set of rules, which were test-run against a dataset of ≈250,000 station names. The matches column in the following table specifies how many of those station names a rule could be applied to at least once. Samples for each rule were checked manually for QA.
Rule | Example | Matches |
---|---|---|
Fix typographically incorrect apostrophes | Up`n Kiwitt → Up’n Kiwitt | ≈100 |
Replace non-round braces | Bersarinplatz [Weidenweg] → Bersarinplatz (Weidenweg) | ≈100 |
Fix whitespace for braces | Frankfurt(Main)Hbf → Frankfurt (Main) Hbf | ≈2100 |
Fix whitespace for punctuation and slashes | St.Georg → St. Georg , Landau a.d.Isar → Landau a.d. Isar | ≈700 |
Replace generic abbreviations: Abzw., b., Ri. | Abzw. Baalborn → Abzweig Baalborn , Garching b. München → Garching bei München | ≈10000 |
Replace most common location abbreviations: Thür, Württ, Meckl, … Note that this rule will probably be replaced by a more generic location-parsing rule at some point | Minden(Westf) → Minden (Westfalen) | ≈5000 |
Replace Str. with Straße | Bülowstr. → Bülowstraße , Willy-Brandt-Str. → Willy-Brandt-Straße | ≈22000 |
Remove defined set of line and product names: U1, (U), (Bus), (S 1), S+U, … | Alexanderplatz (U) U5 U8, Berlin → Alexanderplatz, Berlin , Budberg (B63), Werl → Budberg (B63), Werl | ≈700 |
There are some additional rules which aren't listed here, but those only affect a handful of stations or fix the result of other rules (e.g. removing duplicate whitespace).
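To make the idea of a rule pipeline and the "matches" column more concrete, here is a minimal sketch of how a few of the rules above could be expressed as ordered string replacements. It is not the module's actual implementation: `sketchRules`, `applySketchRules` and `countMatches` are hypothetical names, and the regular expressions are simplified assumptions that only cover the examples from the table.

```javascript
// Illustrative only: NOT the rules shipped in db-clean-station-name.
// Each hypothetical rule is a name plus a pure string -> string function.
const sketchRules = [
  {
    name: 'non-round braces',
    apply: (s) => s.replace(/\[/g, '(').replace(/\]/g, ')')
  },
  {
    name: 'whitespace for braces',
    // add a space before "(" and after ")" when it is directly attached to a word
    apply: (s) => s
      .replace(/([^\s(])\(/g, '$1 (')
      .replace(/\)(?=[A-Za-zÄÖÜäöü0-9])/g, ') ')
  },
  {
    name: 'Str. to Straße',
    apply: (s) => s.replace(/Str\./g, 'Straße').replace(/str\./g, 'straße')
  },
  {
    name: 'duplicate whitespace',
    apply: (s) => s.replace(/\s+/g, ' ').trim()
  }
]

// Run every rule in order, feeding each rule the output of the previous one.
const applySketchRules = (name) =>
  sketchRules.reduce((current, rule) => rule.apply(current), name)

// Count, per rule, how many names the rule changed at least once
// (the same notion as the "Matches" column above).
const countMatches = (names) => {
  const counts = new Map(sketchRules.map((rule) => [rule.name, 0]))
  for (const name of names) {
    let current = name
    for (const rule of sketchRules) {
      const next = rule.apply(current)
      if (next !== current) counts.set(rule.name, counts.get(rule.name) + 1)
      current = next
    }
  }
  return counts
}

console.log(applySketchRules('Frankfurt(Main)Hbf')) // 'Frankfurt (Main) Hbf'
console.log(applySketchRules('Bersarinplatz [Weidenweg]')) // 'Bersarinplatz (Weidenweg)'
console.log(applySketchRules('Willy-Brandt-Str.')) // 'Willy-Brandt-Straße'
```

Keeping the rules in an ordered list makes it easy to let later rules tidy up after earlier ones, which is the role the text above describes for fixes like removing duplicate whitespace.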
This module also offers a method to attempt to remove location names from station names, e.g. the Berlin in Amrumer Straße, Berlin or Köln in Köln Messe/Deutz. The module also has a blacklist for particles for which the location is never removed even if it was detected correctly, e.g. for Frankfurt Süd, where the remaining part wouldn't really make sense on its own.
const cleanStationNameWithLocation = require('db-clean-station-name/lib/with-location')
// you must provide a station name as well as a geolocation for that station
const cleaned = cleanStationNameWithLocation('(S) Berlin Hauptbahnhof', { longitude: 13.0991973, latitude: 52.404288 })
const result = {
  full: 'Berlin Hauptbahnhof', // normal output of db-clean-station-name
  short: 'Hauptbahnhof', // will be `null` if no locations were detected
  // id(s) of the detected locations; you can use them to check e.g. whether two stations are in the same city.
  // Empty if no locations were detected. For cases like `Frankfurt Süd`, where `short` is `null` because a
  // blacklisted name prevented removal, the list can still contain values.
  matchedLocationIds: ['11000000']
}
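As a sketch of how the result object could be consumed, the following helpers check whether two results share a detected location and pick a display name that falls back to `full` when `short` is `null`. `sharesLocation` and `displayName` are hypothetical helpers, not part of the module. The sample objects are hand-written in the shape described above, and every id other than `'11000000'` is made up for illustration.

```javascript
// Hypothetical helpers operating on the result shape shown above;
// they are NOT exported by db-clean-station-name.

// True if both results contain at least one common matched location id.
const sharesLocation = (a, b) => {
  const idsOfA = new Set(a.matchedLocationIds)
  return b.matchedLocationIds.some((id) => idsOfA.has(id))
}

// Prefer the shortened name, but fall back to the full name when no
// location could be removed (`short` is `null`).
const displayName = (result) => (result.short !== null ? result.short : result.full)

// Hand-written sample results (assumed values, for illustration only).
const hauptbahnhof = { full: 'Berlin Hauptbahnhof', short: 'Hauptbahnhof', matchedLocationIds: ['11000000'] }
const amrumer = { full: 'Amrumer Straße, Berlin', short: 'Amrumer Straße', matchedLocationIds: ['11000000'] }
// `short` is null here because of the blacklist, yet an id is still reported
const frankfurtSued = { full: 'Frankfurt Süd', short: null, matchedLocationIds: ['made-up-frankfurt-id'] }

console.log(sharesLocation(hauptbahnhof, amrumer)) // true
console.log(sharesLocation(hauptbahnhof, frankfurtSued)) // false
console.log(displayName(frankfurtSued)) // 'Frankfurt Süd'
```

Comparing ids instead of name fragments keeps the check independent of how a location happens to be spelled inside a station name.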
If you found a bug or want to propose a feature, feel free to visit the issues page.