github.com/hauke96/osm-changeset-analyser

  • v0.0.0-20200324173100-338b7226e87f

OSM changeset analyser

A tool analysing the changesets from OpenStreetMap (OSM).

Compilation

The tool depends on sigolo (logging) and kingpin (CLI options). Once both are installed, it compiles with the normal Go toolchain.

go get github.com/hauke96/sigolo
go get github.com/hauke96/kingpin
go run .

Usage

Here is a short version of the --help output:

usage: OSM changeset analyser --analysers=ANALYSERS [<flags>] <file>

A tool analysing the changesets from OpenStreetMap (OSM).

Flags:
  -h, --help                 Show context-sensitive help (also try --help-long and --help-man).
  -d, --debug                Verbose mode, showing additional debug information
      --analysers=ANALYSERS  A comma separated list of analysers
  -v, --version              Show application version.

Args:
  <file>  The file to analyse

ANALYSERS:
  The 'analysers' flag is a comma separated list of analysers all creating their own CSV file:

  * editor-count : Counts the amount of the most common editors for each month.
  * no-source-count : Counts the amount of monthly changesets without source tag, sorted by editor.
  * user-without-source : Counts for each user the amount of changesets without source tag for each editor.
  * comment-keywords(foo,bar) : Takes keywords (in this case "foo" and "bar") and counts their occurrence per month. Comments and keywords are converted into lower case.
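
Because comment-keywords(foo,bar) itself contains commas, the analysers list cannot be split naively on every comma. The following is a minimal sketch of one way to split such a list while keeping parenthesised arguments together. The function name splitAnalysers is hypothetical, and the tool's real parsing may work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// splitAnalysers (hypothetical helper) splits a comma separated analyser
// list, ignoring commas inside parentheses so that an entry like
// "comment-keywords(foo,bar)" stays in one piece.
func splitAnalysers(s string) []string {
	var result []string
	depth := 0 // current nesting level of parentheses
	start := 0 // byte index where the current entry begins
	for i, r := range s {
		switch r {
		case '(':
			depth++
		case ')':
			if depth > 0 {
				depth--
			}
		case ',':
			if depth == 0 {
				if part := strings.TrimSpace(s[start:i]); part != "" {
					result = append(result, part)
				}
				start = i + 1
			}
		}
	}
	// Append the last entry after the final top-level comma.
	if part := strings.TrimSpace(s[start:]); part != "" {
		result = append(result, part)
	}
	return result
}

func main() {
	list := "editor-count,comment-keywords(foo,bar),no-source-count"
	for _, a := range splitAnalysers(list) {
		fmt.Println(a)
	}
}
```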

For example, this call analyses data.osm using three analysers: editor-count, no-source-count and user-without-source:

$> go build .
$> ./osm-changeset-analyser --analysers=editor-count,no-source-count,user-without-source data.osm
$> ll result*
-rw-r--r-- 1 hauke hauke 8,2K  7. Mär 15:03 result_editor-count.csv
-rw-r--r-- 1 hauke hauke 8,2K  7. Mär 15:03 result_no-source-count.csv
-rw-r--r-- 1 hauke hauke  529  7. Mär 15:03 result_user-without-source.csv

Input data and format

OSM changesets have a simple XML structure. Each changeset carries basic metadata as attributes (user, location, creation date, etc.) and more specific metadata (comment, source of data, etc.) as arbitrary key-value tag elements.

<changeset id="1234567"
		created_at="2020-01-12T14:03:44Z"
		open="false"
		comments_count="2"
		changes_count="154"
		closed_at="2020-01-12T14:04:15Z"
		min_lat="5.12"
		min_lon="2.56"
		max_lat="10.24"
		max_lon="20.48"
		uid="12345"
		user="mega-mapper-3000">
	<tag k="source" v="survey; Bing"/>
	<tag k="hashtags" v="#github;#example"/>
	<tag k="created_by" v="JOSM/1.5 (15492 en)"/>
	<tag k="comment" v="Useful information for other mappers"/>
</changeset>

The latest data for the whole planet can be downloaded from https://planet.openstreetmap.org/planet/changesets-latest.osm.bz2. The file is over 3 GB (approx. 34 GB decompressed) and contains all changesets from 2005 until now.

Performance

I tested the performance on my private computer (see below). Some other applications were running (e-mail client, browser, editors, etc.), but I wasn't using the machine during the execution.

Dataset

I used the changesets-200224.osm.bz2 file (download size: 3.2 GB / decompressed size: 34 GB).

My system:

  • CPU: Intel Xeon E3-1231 v3, 8x3.4GHz
  • RAM: 16GB DDR3 1333MHz
  • Drive: Samsung SSD 850 EVO

Measurements

Here are some example executions:

active analysers                                 execution time  processing speed  RAM usage (approx.)
no-editor                                        6m 39s          85 MB/s           6.8 GB
user-without-source                              7m 12s          78 MB/s           10 GB
no-editor, no-source-count, user-without-source  7m 21s          77 MB/s           10 GB

Output files

13K result_editor-count.csv
13K result_no-source-count.csv
52M result_user-without-source.csv

For developers

Multiple goroutines process the data concurrently. See the doc folder for more information.
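
As a rough illustration of this kind of design, the sketch below feeds every changeset to several analysers, each running in its own goroutine and reading from its own channel. This is a generic fan-out pattern under assumed names (changeset, analyse, countEditors, countMissingSource). The tool's real architecture is described in the doc folder and may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// changeset is a simplified, hypothetical stand-in for a parsed changeset.
type changeset struct {
	Editor    string
	HasSource bool
}

// analyserFunc consumes changesets until its channel is closed.
type analyserFunc func(in <-chan changeset) map[string]int

// countEditors counts changesets per editor.
func countEditors(in <-chan changeset) map[string]int {
	counts := make(map[string]int)
	for cs := range in {
		counts[cs.Editor]++
	}
	return counts
}

// countMissingSource counts changesets without source tag per editor.
func countMissingSource(in <-chan changeset) map[string]int {
	counts := make(map[string]int)
	for cs := range in {
		if !cs.HasSource {
			counts[cs.Editor]++
		}
	}
	return counts
}

// analyse sends every changeset to each analyser concurrently and returns
// one result per analyser, in the same order as the analysers slice.
func analyse(sets []changeset, analysers []analyserFunc) []map[string]int {
	chans := make([]chan changeset, len(analysers))
	results := make([]map[string]int, len(analysers))
	var wg sync.WaitGroup
	for i, a := range analysers {
		chans[i] = make(chan changeset, 64) // buffered to decouple producer and consumers
		wg.Add(1)
		go func(i int, a analyserFunc) {
			defer wg.Done()
			results[i] = a(chans[i])
		}(i, a)
	}
	for _, cs := range sets {
		for _, ch := range chans {
			ch <- cs
		}
	}
	for _, ch := range chans {
		close(ch) // signals the analysers that no more data will arrive
	}
	wg.Wait()
	return results
}

func main() {
	sets := []changeset{
		{Editor: "JOSM", HasSource: true},
		{Editor: "iD", HasSource: false},
		{Editor: "JOSM", HasSource: false},
	}
	results := analyse(sets, []analyserFunc{countEditors, countMissingSource})
	fmt.Println(results[0])
	fmt.Println(results[1])
}
```

Giving each analyser its own channel keeps the analysers independent of each other: none of them needs locks around its counters, and a slow analyser only blocks the producer once its buffer is full.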

Package last updated on 24 Mar 2020
