
github.com/duderman/gositemap


v0.0.0-20200124145240-dfaf8fa43ed3

Version published: 5 years ago

Created: 5 years ago


Importer benchmark

A sitemap importer of sorts, written in Go to find out whether we could benefit from using this language.

Details

After our discussion about the new architecture for importers, I decided to implement a test importer. Instead of mimicking our existing tooling, I implemented a workflow that matches how all importers are supposed to work in the future: download the data, parse it, extract the required fields to TSV, and upload the result to S3. With that in mind, this particular importer performs the following steps:

Parse the payload provided via an environment variable
Download sitemap from specified URL
Parse XML
Output links to TSV file
Upload the file to S3

To compare the whole workflow and its efficiency properly, the same script was also written in Ruby.

Benchmarks

Test	Go	Ruby	Difference
Resulting docker image size	17 MB	74 MB	78%
Image build time	~ 45 sec	~ 70 sec	36%
Run time (Average of 10 runs)	1163 ms	2356 ms	51%
Startup time (Average of 10 runs)	974 ms	1538 ms	37%
Memory consumption	70 MB	200 MB	65%
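
The Difference column appears to be the relative reduction of the Go figure compared to the Ruby one, that is (Ruby - Go) / Ruby. The sketch below recomputes it from the rounded values in the table; the formula is inferred from the numbers rather than stated by the author, and because the inputs are rounded the result can differ from the table by about a percentage point.

```go
package main

import "fmt"

// reduction returns how much smaller goVal is than rubyVal, in percent.
// The formula is an assumption inferred from the table above.
func reduction(goVal, rubyVal float64) float64 {
	return (rubyVal - goVal) / rubyVal * 100
}

func main() {
	rows := []struct {
		name           string
		goVal, rubyVal float64
	}{
		{"Image size (MB)", 17, 74},
		{"Image build time (sec)", 45, 70},
		{"Run time (ms)", 1163, 2356},
		{"Startup time (ms)", 974, 1538},
		{"Memory consumption (MB)", 70, 200},
	}
	for _, r := range rows {
		fmt.Printf("%-26s %.1f%%\n", r.name, reduction(r.goVal, r.rubyVal))
	}
}
```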

Let's go over those numbers:

Image size - Scripts written in Go are compiled into a small executable containing everything required to run, which makes the resulting Docker image as small as possible: it contains only the OS and the executable itself. The Ruby version, on the other hand, has to be shipped along with the interpreter and all the required libraries, which makes the final image much bigger.
Image build time - In this example the Ruby image could otherwise be built a bit faster, but Nokogiri, which is used for XML parsing, has to be compiled, and that results in a much longer image build.
Run time - The total time taken by the script to finish, measured from within the script itself.
Startup time - Measured by a small script called test.js. It shows how much time passes between calling the docker run command and the start of the script itself. I wanted to measure it to find out how long it takes to spin up the whole environment. In the case of Ruby it takes a bit longer because the interpreter has to start and load all the libraries.
Memory consumption - System memory used by the process at the end of the script. Go is much more efficient here, I guess due to its strongly typed nature.

How to use

The repository comes with a Makefile containing all the required commands.

To build containers:

make build
# or separately
make build-rb
make build-go

To run the script:

make run-minio # starts a local s3 attached to a new network
make run-rb
make run-go

To remove the created network and stop the local S3:

make clean

To measure startup time:

node test.js make run-rb
node test.js make run-go
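
The source of test.js is not shown here, so its exact method is unknown. One way to approximate the same measurement is to start the command and time how long it takes until it writes its first byte to stdout. The Go sketch below does that; it assumes the script prints something as soon as it starts, and the `timeToFirstOutput` helper is hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"os/exec"
	"time"
)

// timeToFirstOutput starts the command and returns the time elapsed until
// it writes its first byte to stdout.
func timeToFirstOutput(name string, args ...string) (time.Duration, error) {
	cmd := exec.Command(name, args...)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return 0, err
	}

	start := time.Now()
	if err := cmd.Start(); err != nil {
		return 0, err
	}

	// Block until the first byte arrives (or the stream ends).
	buf := make([]byte, 1)
	_, readErr := stdout.Read(buf)
	elapsed := time.Since(start)

	// Drain the remaining output so the process can exit, then reap it.
	io.Copy(io.Discard, stdout)
	waitErr := cmd.Wait()

	if readErr != nil {
		return 0, fmt.Errorf("no output received: %w", readErr)
	}
	return elapsed, waitErr
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: startup <command> [args...]")
		return
	}
	elapsed, err := timeToFirstOutput(os.Args[1], os.Args[2:]...)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("startup time: %d ms\n", elapsed.Milliseconds())
}
```

Built as a binary, it would be invoked the same way as test.js, with the command to measure passed as arguments.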

Comments

Go uses a different development paradigm, so writing code in it brings some complexity. Strong typing can be a bit difficult when working with third-party resources and flexible schemas. On the other hand, combined with compilation it gives more certainty and eliminates a lot of runtime errors. It is also worth considering the tooling around the language. Go has a lot of really powerful features, such as formatting, documentation and testing, out of the box. From my limited experience with the language I've also noticed how well it is supported by editors. I'm using VS Code, and it provides a lot more functionality for Go, like better refactoring and testing. It also works right out of the box, whereas with Ruby I had to spend a good amount of time setting everything up for a comfortable development process. I guess it's much easier to support stricter languages. On the downside, the time it took me to implement such a small script in Go is a bit disappointing. But it wasn't that hard to start coding right away, to be honest.


Package last updated on 24 Jan 2020
