Baleen
Baleen is a fluent Kotlin DSL for validating data (JSON, XML, CSV, Avro).
Features
Example Baleen Data Description
import com.shoprunner.baleen.Baleen.describeAs
import com.shoprunner.baleen.ValidationError
import com.shoprunner.baleen.dataTrace
import com.shoprunner.baleen.types.StringType
val departments = listOf("Mens", "Womens", "Boys", "Girls", "Kids", "Baby & Toddler")
val productDescription = "Product".describeAs {
    "sku".type(StringType(min = 1, max = 500),
        required = true)
    "brand_manufacturer".type(StringType(min = 1, max = 500),
        required = true)
    "department".type(StringType(min = 0, max = 100))
        .describeAs {
            test("department is correct value") { data ->
                assertThat(data).hasAttribute("department") {
                    it.isOneOf(departments)
                }
            }
        }
}
val data: Data =
val validation: Validation = dataDesc.validate(data)
val cachedValidation: CachedValidation = validation.cache()
val isValid: Boolean = cachedValidation.isValid()
cachedValidation.results.forEach { }
cachedValidation.results.watch().forEach { }
val validationSummary: CachedValidation = cachedValidation.createSummary()
validationSummary.results.forEach { }
File("validation.html").writer().use {
HtmlPrinter(it).print(validationSummary.results)
}
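The cache() step above is easier to motivate with a small standalone sketch. It assumes validation results are produced lazily, which the README's emphasis on streaming suggests but does not spell out. CountingSource is a hypothetical class written for this illustration and is not part of Baleen; it only relies on the standard Kotlin sequence builder, which re-runs its producer on every traversal.

```kotlin
// Hypothetical stand-in for a lazily evaluated validation; not a Baleen class.
class CountingSource {
    // Counts how many times the producer block has been started.
    var runs = 0
        private set

    // Nothing is computed until the sequence is iterated, and every
    // new traversal starts the producer again from the top.
    val results: Sequence<String> = sequence {
        runs++
        yield("sku ok")
        yield("department invalid")
    }
}

fun main() {
    val source = CountingSource()

    source.results.count()
    source.results.count()
    println(source.runs) // 2: each traversal re-ran the producer

    // Materializing once lets later reads reuse the stored results.
    val cached = source.results.toList()
    println(cached.size)
    println(cached.first())
    println(source.runs) // 3: only the toList() call added a run
}
```

Under that assumption, caching trades memory for not repeating the work each time the results are read.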
Getting Help
Join the Slack channel
Core Concepts
- Tests are great
  There are a lot of great libraries for testing code. We should use those same concepts for testing data.
- Performance and streaming are important
  A data validation library should be able to handle large amounts of data quickly.
- Invalid data is also important
  Warnings and Errors need to be treated as first class objects.
- Data Traces
  Similar to a stack trace being used to debug a code path, a data trace can be used to debug a path through data.
- Don't map data to Types too early
  Type safe code is great but if the data hasn't been sanitized then it isn't really typed.
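To make the data trace idea concrete, here is a minimal standalone sketch. TraceSketch and findNulls are hypothetical names invented for this illustration; they do not reflect Baleen's own dataTrace API. The sketch walks nested maps and lists and reports each null value together with the path that led to it.

```kotlin
// Hypothetical trace type for illustration; not Baleen's data trace API.
data class TraceSketch(val segments: List<String> = emptyList()) {
    // Returns a new trace with one more step appended; the original is unchanged.
    operator fun plus(segment: String) = TraceSketch(segments + segment)

    override fun toString() = segments.joinToString(" -> ")
}

// Walks nested maps and lists, reporting every null value with its path.
fun findNulls(value: Any?, trace: TraceSketch): List<String> = when (value) {
    null -> listOf("null at $trace")
    is Map<*, *> -> value.entries.flatMap { (k, v) ->
        findNulls(v, trace + "attribute \"$k\"")
    }
    is List<*> -> value.flatMapIndexed { i, v ->
        findNulls(v, trace + "index $i")
    }
    else -> emptyList()
}

fun main() {
    val product = mapOf(
        "sku" to "A-1",
        "variants" to listOf(mapOf("color" to "red"), mapOf("color" to null))
    )
    findNulls(product, TraceSketch(listOf("file products.json"))).forEach(::println)
    // null at file products.json -> attribute "variants" -> index 1 -> attribute "color"
}
```

Like a stack trace, the path is built up one step at a time as the walk descends, so a failure deep in the data still says where it came from.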
Warnings
Sometimes you will want an attribute or type to warn instead of error. The asWarnings() method will transform the output from ValidationError to ValidationWarning for all nested tests run underneath that attribute/type.
import com.shoprunner.baleen.Baleen.describeAs
import com.shoprunner.baleen.ValidationError
import com.shoprunner.baleen.dataTrace
import com.shoprunner.baleen.types.StringType
import com.shoprunner.baleen.types.asWarnings
val productDescription = "Product".describeAs {
    "sku".type(StringType(min = 1, max = 500).asWarnings(), required = true)
    "brand_manufacturer".type(StringType(min = 1, max = 500), required = true).asWarnings()
    "department".type(StringType(min = 0, max = 100)).describeAs {
        test("department is correct value") { data ->
            assertThat(data).hasAttribute("department") {
                it.isOneOf(departments)
            }
        }
    }.asWarnings()
}
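The idea of downgrading every nested error to a warning can be sketched without the library. The types below (DemoResult, DemoError, DemoWarning, DemoSuccess) and the asWarningsSketch function are hypothetical and exist only for this illustration; Baleen's actual ValidationError and ValidationWarning classes may be shaped differently.

```kotlin
// Hypothetical result types for illustration; not Baleen's own classes.
sealed class DemoResult
data class DemoError(val message: String) : DemoResult()
data class DemoWarning(val message: String) : DemoResult()
object DemoSuccess : DemoResult()

typealias Validator = (Map<String, Any?>) -> Sequence<DemoResult>

// Wraps a validator so that every error it emits becomes a warning.
// Other results pass through untouched, and the sequence stays lazy.
fun asWarningsSketch(validate: Validator): Validator = { data ->
    validate(data).map { result ->
        if (result is DemoError) DemoWarning(result.message) else result
    }
}

fun main() {
    val skuRequired: Validator = { data ->
        if (data["sku"] is String) sequenceOf(DemoSuccess)
        else sequenceOf(DemoError("sku is missing or not a string"))
    }
    val lenient = asWarningsSketch(skuRequired)
    println(lenient(mapOf("brand" to "Acme")).toList())
    // [DemoWarning(message=sku is missing or not a string)]
}
```

Because the wrapper sits around the whole validator, anything nested inside it is downgraded as well, which mirrors the "all nested tests" behavior described above.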
Tagging
Baleen lets you add tags to tests so that you can more easily identify, annotate, and filter your results.
Tagging is useful in a couple of cases. For example, you may have an identifier, like a sku, that you want every
test to carry so that you can group failed tests by that identifier. Or you may have different
priority levels for your tests and want to highlight the most important errors.
val productDescription = "Product".describeAs {
    "sku".type(StringType().tag("priority", "critical").tag("sku", withValue()))
    "brand_manufacturer".type(StringType(), required = true)
        .tag("priority", "low")
        .tag("sku", withAttributeValue("sku"))
    "department".type(StringType(min = 0, max = 100))
        .tag("priority", "high")
        .tag("sku", withAttributeValue("sku"))
        .tag("gender") { d ->
            when {
                d is Data && d.containsKey("gender") ->
                    when(d["gender"]) {
                        "male" -> "male"
                        "mens" -> "male"
                        "female" -> "female"
                        "womens" -> "female"
                        else -> "other"
                    }
                else -> "none"
            }
        }
}
.tag("sku", withAttributeValue("sku"))
Tags can also be applied at the data evaluation level. When writing tests, additional tags can be passed in using the Tagger function.
"department".type(StringType(min = 0, max = 100)).describeAs {
    test("department is correct value", "sku" to withAttributeValue("sku")) { data ->
        assertThat(data).hasAttribute("department") {
            it.isOneOf(departments)
        }
    }
}
Some Baleen validation libraries, such as the XML or JSON validators, use tags to add line and column numbers as they
parse the original raw data. This helps identify errors in the raw data much more quickly.
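Once results carry tags, the grouping and filtering use cases described in this section come down to ordinary collection operations. The sketch below uses a hypothetical TaggedResult class and groupByTag helper rather than Baleen's real result types, and assumes each result exposes its tags as a simple string map.

```kotlin
// Hypothetical stand-in for a tagged validation result; not Baleen's result type.
data class TaggedResult(val message: String, val tags: Map<String, String>)

// Groups results by the value of one tag. Results that lack the tag
// are collected under the key "untagged".
fun groupByTag(results: List<TaggedResult>, tag: String): Map<String, List<TaggedResult>> =
    results.groupBy { it.tags[tag] ?: "untagged" }

// Keeps only the results whose "priority" tag is one of the given levels.
fun withPriority(results: List<TaggedResult>, levels: Set<String>): List<TaggedResult> =
    results.filter { result ->
        val priority = result.tags["priority"]
        priority != null && priority in levels
    }

fun main() {
    val results = listOf(
        TaggedResult("department is not a known value", mapOf("sku" to "A-1", "priority" to "high")),
        TaggedResult("brand_manufacturer is missing", mapOf("sku" to "A-1", "priority" to "low")),
        TaggedResult("sku is empty", mapOf("priority" to "critical"))
    )

    println(groupByTag(results, "sku").keys) // [A-1, untagged]
    println(withPriority(results, setOf("critical", "high")).map { it.message })
}
```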
Gotchas
- Baleen does not treat an attribute that is not set and an attribute that is set to null as the same thing.
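The same distinction exists in a plain Kotlin map, which makes for a quick standalone illustration. The describeAttribute function is a hypothetical helper written for this example and uses only the standard library; it is not part of Baleen.

```kotlin
// Distinguishes a missing key from a key that is present with a null value.
// A plain lookup (data[key]) returns null in both cases, so containsKey is needed.
fun describeAttribute(data: Map<String, Any?>, key: String): String = when {
    !data.containsKey(key) -> "not set"
    data[key] == null -> "set to null"
    else -> "set to ${data[key]}"
}

fun main() {
    val product: Map<String, Any?> = mapOf("sku" to "A-1", "department" to null)
    println(describeAttribute(product, "sku"))        // set to A-1
    println(describeAttribute(product, "department")) // set to null
    println(describeAttribute(product, "brand"))      // not set
}
```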
Similar Projects