Go Embedded Database — in-process document database using Bolt (Badger soon)
Why?
A single server in 202x can handle almost any load almost any small-to-mid-sized company has. This means you can massively save on development time, costs and complexity by going back from the cloud to a dedicated unicorn server.
This won't be the right choice every time; there's availability and security boundaries to consider, too. However, a lot of companies could benefit from faster time-to-market and smaller development teams, and it is very often the right choice for startups in particular.
To realize those benefits, you need an efficient database:
- Very fast data access means that, instead of optimizing a tricky SQL, you can just loop over your data with code. Imagine the savings.
- Serialized writes simplify statistics & similar, eliminate a whole class of errors, and greatly simplify the code.
To access data fast, you want that data to already be a part of your process memory. In-process key-value datastores do that quickly an efficiently. We're using Bolt for now, but its maintenance story leaves a lot to be desired, so we'll be switching to Badger soon.
We use msgpack for encoding structs currently. More options are possible, but not available right now.
Usage
Install:
go get github.com/andreyvit/edb
Saves ordinary structs in the database, the first field is the primary key (use another struct for a composite key) and must be marked as msgpack:"-"
to avoid storing it as part of the value:
type Post struct {
ID string `msgpack:"-"`
Time time.Time `msgpack:"tm"`
Author string `msgpack:"a"`
Content string `msgpack:"c"`
Published bool `msgpack:"pub"`
}
Define schema:
var (
mySchema = edb.NewSchema(edb.SchemaOpts{})
postsTable = edb.AddTable[Post](mySchema, "posts", 1, nil, nil, nil)
)
Those nils are: indexer func, migration func, a list of indices, all optional. Let's add a couple of indices:
var (
postsTable = edb.AddTable(mySchema, "posts", 1, func (post *Post, ib *edb.IndexBuilder) {
if post.Author != "" {
ib.Add(postsByAuthor, post.Author)
}
if post.Published {
ib.Add(publishedPostsByTime, post.Time)
}
}, nil, []*edb.Index{
postsByAuthor,
publishedPostsByTime,
})
postsByAuthor = AddIndex[string]("by_author")
publishedPostsByTime = AddIndex[time.Time]("published_by_time")
)
Open a db:
db := must(edb.Open(filePath, mySchema, edb.Options{}))
Save a post:
post := &Post{
ID: "123",
Time: time.Now(),
Author: "alice",
Content: "This is my first post.",
Published: true,
}
ensure(db.Tx(true, func(tx *db.Tx) error {
edb.Put(tx, post)
return nil
}))
All operations must be performed inside a transaction. Those can be read-only (db.Tx(false, ...)
) or mutable (db.Tx(true, ...)
). We'll assume you have a transaction going in the code below.
Find a post by ID:
post := edb.Get[Post](tx, "123")
if post == nil {
log.Printf("not found")
} else {
log.Printf("post = %v", *post)
}
Find posts by author name:
for c := edb.ExactIndexScan[Post](tx, postsByAuthor, "alice"); c.Next(); {
post := c.Row()
log.Printf("found: %v", *post)
}
ExactIndexScan
is a helper that combines IndexScan
with ExactScan
option. To find posts by time, scanning backwards, we'll have to use these lower-level tools:
for c := edb.IndexScan[Post](tx, postsByAuthor, edb.ExactScan("alice").Reversed()); c.Next(); {
post := c.Row()
log.Printf("found: %v", *post)
}
You can scan the entire table:
for c := edb.TableScan[Post](tx, edb.FullScan()); c.Next(); {
post := c.Row()
log.Printf("found: %v", *post)
}
Use edb.All
to obtain a slice of all rows from a cursor (noting, of course, that this might use unbounded memory, so looping is preferrable whenever possible):
allPosts := edb.All(edb.ExactIndexScan[Post](tx, postsByAuthor, "alice"))
Migrate schema versions by adding a migrator func; the number you pass in to edb.AddTable
is the latest schema version number, and your func is responsible for migrating older versions:
var (
postsTable = edb.AddTable(mySchema, "posts", 2, ..., func(tx *edb.Tx, post *Post, oldVer uint64) {
if oldVer < 2 {
post.Author = strings.ToLower(posts.Author)
}
}, ...)
)
The migrator will be invoked when loading a post. Schema version is stored per row, and you don't need to migrate all rows immediately; it's okay to migrate them as they get saved. We'll add more options for schema migrations later.
These examples use the following two error handling helpers:
func must[T any](v T, err error) T {
if err != nil {
panic(err)
}
return v
}
func ensure(err error) {
if err != nil {
panic(err)
}
}
Technical Details
Buckets
We rely on scoped namespaces for keys called buckets. Bolt supports them natively. A flat database like Badger will simulate buckets via key prefixes. We use nested buckets in Bolt (tablename/data and tablename/i_indexname), but only for conveninece; flat buckets are fine.
Key Encoding
Keys are encoded using a tuple encoding. This concatenates all values together, and then appends a reversed variable-length encoding of lengths of each component except for the last one, and then the number of components. For single-component keys, the overhead is a single byte (1) at the end.
Value Encoding
Value = value header, encoded data, encoded index keys.
Value header:_
- Flags (uvarint).
- Schema version (uvarint).
- Data size (uvarint).
- Index size (uvarint).
Encoded data is just msgpack encoding of the struct.
Encoded index keys record the keys contributed by this row. If index computation changes in the future, we still need to know which index
keys to delete when updating the row, so we store all index keys added by the row. Format:
- Number of entries (uvarint).
- For each entry: index ordinal (uvarint), key length (uvarint), key bytes.
Table States
We store a meta document per table, called “table state”. This document holds info on which indexes are defined for the table, and assigns
an index ordinal (a unique positive integer) to each one. Ordinals are never reused as indexes are removed and added, so are safe to store inside values.
We should probably move to a single per-database state document.
Contributing
Contributions are welcome. There's a lot to do here still. Tests and documention will be much appreciated, too.
Auto-testing via modd (go install github.com/cortesi/modd/cmd/modd@latest
):
modd
MIT license
Copyright (c) 2023 Andrey Tarantsov. Published under the terms of the MIT license.