Importer benchmark
A sitemap importer of sorts, written in Go to find out whether we could benefit from using the language.
Details
After our discussion about the new importer architecture, I decided to implement a test importer. Instead of mimicking our existing tooling, I implemented a workflow that matches how all importers are supposed to work in the future: download the data, parse it, extract the required fields to TSV, and upload the result to S3. With that in mind, this particular importer performs the following steps:
- Parse the payload provided via an environment variable
- Download the sitemap from the specified URL
- Parse the XML
- Output the links to a TSV file
- Upload the file to S3
To compare the whole workflow and its efficiency fairly, the same script was also written in Ruby.
Benchmarks
Test | Go | Ruby | Difference |
---|---|---|---|
Resulting docker image size | 17 MB | 74 MB | 78% |
Image build time | ~ 45 sec | ~ 70 sec | 36% |
Run time (Average of 10 runs) | 1163 ms | 2356 ms | 51% |
Startup time (Average of 10 runs) | 974 ms | 1538 ms | 37% |
Memory consumption | 70 MB | 200 MB | 65% |
Let's go over those numbers:
- Image size - Go programs compile into a small executable that contains everything needed to run, which keeps the resulting Docker image as small as possible: it contains only the OS and the executable itself. The Ruby version, on the other hand, has to ship with the interpreter and all required libraries, which makes the final image much bigger.
- Image build time - The Ruby image could be built faster than this, but it uses Nokogiri for XML parsing, and compiling Nokogiri's native extensions makes the image build much longer.
- Run time - The total time the script takes to finish, measured from within the script itself.
- Startup time - Measured by a small script called test.js. It shows how much time passes between calling the docker run command and the start of the script itself. I wanted to measure this to find out how long it takes to spin up the whole environment. For Ruby it takes a bit longer because the interpreter has to start and load all the libraries.
- Memory consumption - System memory used by the process at the end of the script. Go is much more efficient here, which I guess is mostly because it runs as a compiled binary with statically typed, compact data structures instead of an interpreter with all its libraries loaded.
How to use
Comes with a Makefile containing all the required commands
To build containers:
make build
make build-rb
make build-go
To run the script:
make run-minio
make run-rb
make run-go
To remove the created network and stop the local S3:
make clean
To measure startup time:
node test.js make run-rb
node test.js make run-go
Writing code in Go means working in a different paradigm, which brings some complexity. Strong typing can be a bit difficult when working with third-party resources and flexible schemas. On the other hand, combined with compilation it gives more certainty and eliminates many runtime errors.
The tooling around the language is also worth considering. Go ships with powerful features such as formatting, documentation, and testing out of the box. From my limited experience with the language, I've also noticed how well it is supported by editors. I'm using VS Code, and it provides a lot more functionality for Go, such as better refactoring and testing support. That also works right out of the box, whereas with Ruby I had to spend a good amount of time setting everything up for a comfortable development process. I guess stricter languages are simply easier for tools to support. On the downside, the time it took me to implement such a small script in Go was a bit disappointing, although it wasn't that hard to start coding right away, to be honest.