cd2t
repository: https://gitlab.com/ko.no/cd2t
Key Features
- Feature-Rich Data Type and Value Validation: Many data types are available,
each with basic checks of its content.
- Unlimited Data Structure: Recursive linking of data types like lists or
objects can represent any data structure.
- Data Structure Nesting: Templates allow you to define repeating data
structures only once. Templates can be nested without limit, but loops are not
allowed.
- Referencing: Referencing can check the uniqueness of values at
different positions in the data structure (e.g. lists of objects with an ID
attribute). It can also enforce a consumer/producer model. For example,
strings at some positions can be collected as producers. Strings at other
positions must then match one of those produced strings. The scope of references
can be limited to a namespace.
- Multi Data Support: Multiple data sources can be checked with one schema
or many schemas. You can switch schemas while iterating over data sources.
Referencing and autogeneration work across schemas and data sources by using
namespaces and reference keys.
- Value Autogeneration: Some data types support the creation of missing
values. For example, unique IDs can be added to a data structure.
Uniqueness can be limited to namespaces.
- Schema Validation: Typos, syntax mistakes or missing required options are
reported as SchemaErrors (Exception) during schema loading. The reason and the path
through the schema structure are provided.
- Data Validation: Validation returns a list of findings. Each finding
provides the position within the data structure and the validation finding
reason.
Data Schema
A data schema describes the expected structure and the expected values of the data
to be validated. It also includes schema metadata and control options.
Version 2
version: 2
name: < str >
description: < str >
allow_data_type_shortcuts: < bool | default -> true >
root: { data type schema }
custom_data_types:
  < data type name >:
    type: < built-in data type name >
    < built-in data type option >: < value >
templates:
  < template name >: { data schema }
template_merge_options:
  recursive: < bool | default -> true >
  # How lists in data type options should be merged.
  list_merge: < append | append_rp | prepend | prepend_rp | replace
              | default -> append_rp >
Version 1
name: < str >
description: < str >
allow_data_type_shortcuts: < bool | default -> false >
root: { data type schema }
subschemas:
  < sub schema name >: { data type schema }
Built-In Data Types
Generic Data Types
'any' Data Type
'any' Description
This data type represents any data. The validator stops further data validation
or autogeneration.
'any' Limitations
- referencing is not supported
- autogeneration is not supported
'any' Schema
type: 'any'
description: < str >
'bool' Data Type
'bool' Description
This data type represents boolean values (true/false).
'bool' Limitations
- referencing is not supported
'bool' Schema
type: 'bool'
description: < str >
allowed_value: < bool >
autogenerate: < bool | default -> false >
autogenerate_default: < bool >
'enum' Data Type
'enum' Description
This data type represents a selection of allowed values.
'enum' Limitations
- referencing is not supported
- autogeneration is not supported
'enum' Schema
type: 'enum'
description: < str >
allowed_values:
- < value >
'float' Data Type
'float' Description
This data type represents float values.
'float' Limitations
None.
'float' Schema Keys
type: 'float'
description: < str >
reference: { unique options }
maximum: < float >
minimum: < float >
maximum_decimals: < int >
allowed_values:
- < float >
- round: < int >
  matches: < float >
- range_start: < float >
  range_end: < float >
not_allowed_values:
- < float >
- round: < int >
  matches: < float >
- range_start: < float >
  range_end: < float >
autogenerate: < bool | default -> false >
autogenerate_default: < float >
autogenerate_random_tries: < int | default -> 10 >
autogenerate_ranges:
- minimum: < float >
  maximum: < float >
autogenerate_random_decimals: < int | default -> 2 >
'float' Validation Process
If options are missing, corresponding checks are skipped.
- value >= minimum
- value <= maximum
- value rounded to maximum_decimals == value
- value is not in not_allowed_values
- value is in allowed_values
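The checks above can be sketched in plain Python. This is an illustration only, not cd2t's implementation: the function names `matches_entry` and `check_float` and the finding texts are hypothetical, ranges are assumed to be inclusive, and a `round`/`matches` entry is assumed to compare the value rounded to `round` decimals against `matches`.

```python
def matches_entry(value, entry):
    """Return True if value matches one (not_)allowed_values entry."""
    if isinstance(entry, dict):
        if "range_start" in entry:
            # Assumption: both range borders are inclusive.
            return entry["range_start"] <= value <= entry["range_end"]
        # Assumption: round the value first, then compare with 'matches'.
        return round(value, entry["round"]) == entry["matches"]
    return value == entry


def check_float(value, options):
    """Return a list of finding messages; an empty list means 'valid'."""
    findings = []
    # Checks for missing options are skipped.
    if "minimum" in options and value < options["minimum"]:
        findings.append("value is lower than minimum")
    if "maximum" in options and value > options["maximum"]:
        findings.append("value is greater than maximum")
    if "maximum_decimals" in options:
        if round(value, options["maximum_decimals"]) != value:
            findings.append("value has too many decimals")
    not_allowed = options.get("not_allowed_values", [])
    if any(matches_entry(value, entry) for entry in not_allowed):
        findings.append("value is not allowed")
    if "allowed_values" in options:
        allowed = options["allowed_values"]
        if not any(matches_entry(value, entry) for entry in allowed):
            findings.append("value is not in allowed values")
    return findings


print(check_float(1.234, {"minimum": 0.0, "maximum_decimals": 2}))
```

Returning a list instead of raising on the first problem mirrors the idea that data validation yields a list of findings.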
'idlist' Data Type
'idlist' Description
This data type represents a dictionary, where keys are IDs.
IDs can be strings or integers.
'idlist' Limitations
- autogeneration is not supported
'idlist' Schema (in Version 2)
type: 'idlist'
description: < str >
reference: { unique options }
minimum: <int | default -> 0 >
maximum: < int >
elements: { data type schema }
id: { data type schema }
'idlist' Schema (in Version 1)
type: 'idlist'
description: < str >
reference: { unique options }
minimum: <int | default -> 0 >
maximum: < int >
elements: { data type schema }
id_type: < 'integer' | 'string' | default -> 'string' >
id_minimum: <int | default -> 0 >
id_maximum: < int >
allowed_ids:
- < string | integer >
not_allowed_ids:
- < string >
'integer' Data Type
'integer' Description
This data type represents integer values.
'integer' Limitations
None.
'integer' Schema
type: 'integer'
description: < str >
reference: { unique options }
maximum: < int >
minimum: < int >
not_allowed_values:
- < int >
autogenerate: < bool | default -> false >
autogenerate_default: < int >
autogenerate_maximum: < int >
autogenerate_minimum: < int >
autogenerate_find: < 'next_higher' | 'next_lower' |
'random' | default -> 'next_higher' >
'list' Data Type
'list' Description
This data type represents a list of elements of the same data type.
If different data types are allowed in the list,
use data type 'multitype' as elements.
'list' Limitations
- Not customizable
- referencing is not supported - use referencing in the 'elements' data type
- autogeneration of list elements is not supported - but autogeneration
within existing elements data structure is supported (pass-through).
'list' Schema
type: 'list'
description: < str >
elements: { data type schema }
minimum: <int | default -> 0 >
maximum: < int >
allow_duplicates: < bool | default -> true >
'multitype' Data Type
'multitype' Description
This data type represents a selection of allowed data types.
'multitype' Features & Limitations
- Not customizable
- referencing is not supported - use referencing in the data types listed under 'types'
- autogeneration of data types is not supported - but autogeneration within
existing data structure is supported (pass-through).
- Multitype in multitype is not allowed
'multitype' Schema
type: 'multitype'
description: < str >
types:
- { data type schema }
'none' Data Type
'none' Description
This data type represents a none or null value.
'none' Limitations
- referencing is not supported
- autogeneration of data types is not supported - it is already none :wink:
'none' Schema Keys
type: 'none'
description: < str >
'object' Data Type
'object' Description
This data type represents an object with attributes.
Technically, it is a dictionary in Python.
Attributes of the object are keys in the dictionary.
'object' Limitations
- Not customizable
- autogeneration of missing keys is supported, if value data type supports autogeneration
'object' Schema (in Version 2)
type: 'object'
description: < str >
attributes:
  < attribute_name >: { data type schema }
required_attributes:
- < attribute_name >
required_xor_attributes:
- - < attribute_name >
ignore_undefined_attributes: < bool | default -> false >
dependencies:
  < attribute_name >:
    requires:
    - < attribute_name >
    # List of attribute names which must be in the object
    # if this attribute is present.
    excludes:
    - < attribute_name >
    # List of attribute names which must not be in the object
    # if this attribute is present.
allow_regex_attributes: < bool | default -> False >
autogenerate: < bool | default -> True >
reference:
  { reference options }
  attributes:
  - < attribute_name >
  # List of attribute names whose values are combined for the uniqueness check.
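The 'requires' and 'excludes' rules under 'dependencies' can be pictured with a short sketch. It is a simplified illustration, not cd2t's code: `check_dependencies` and its finding texts are hypothetical names, and regex attribute names are not handled.

```python
def check_dependencies(obj, dependencies):
    """Return findings for violated 'requires'/'excludes' rules of an object."""
    findings = []
    for name, rules in dependencies.items():
        if name not in obj:
            continue  # rules only apply if the attribute is present
        for required in rules.get("requires", []):
            if required not in obj:
                findings.append(f"'{name}' requires '{required}'")
        for excluded in rules.get("excludes", []):
            if excluded in obj:
                findings.append(f"'{name}' excludes '{excluded}'")
    return findings


# Hypothetical rules: 'address' needs 'netmask', 'dhcp' forbids 'address'.
dependencies = {
    "dhcp": {"excludes": ["address"]},
    "address": {"requires": ["netmask"]},
}
print(check_dependencies({"address": "10.0.0.1"}, dependencies))
```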
'object' Schema (in Version 1)
type: 'object'
description: < str >
attributes:
  < attribute_name >: { data type schema }
required_attributes:
- < attribute_name >
ignore_undefined_attributes: < bool | default -> false >
dependencies:
  < attribute_name >:
    requires:
    - < attribute_name >
    # List of attribute names which must be in the object
    # if this attribute is present.
    excludes:
    - < attribute_name >
    # List of attribute names which must not be in the object
    # if this attribute is present.
allow_regex_attributes: < bool | default -> False >
autogenerate: < bool | default -> True >
reference: { reference options }
reference_attributes:
- < attribute_name >
'schema' Data Type
'schema' Description
This data type does not represent an expected data value.
It uses a subschema's root data type to process the data structure.
'schema' Limitations
- Supported in Version 1 only (Use templates in version 2)
- Not customizable
'schema' Schema
type: 'schema'
description: < str >
subschema: < str >
'string' Data Type
'string' Description
This data type represents a string.
'string' Limitations
- autogeneration is not supported.
'string' Schema
type: 'string'
description: < str >
reference:
allow_namespace_lookups: < bool >
namespace_separator_char: < string >
minimum: <int | default -> 0 >
maximum: < int >
allowed_values:
- < string >
not_allowed_values:
- < string >
regex_mode: < bool | default -> false >
regex_multiline: < bool | default -> false >
regex_fullmatch: < bool | default -> true >
Special Data Types
'fqdn' Data Type
'fqdn' Description
This data type represents a fully qualified domain name (FQDN). An FQDN consists of
a hostname (first label; defined by RFC 952 and RFC 1123) and a domain name
(remaining labels; defined by RFC 1035, section 2.3.1).
The RFCs allow upper-case letters in hostnames and domain labels and treat
names as case-insensitive. cd2t is stricter by default and creates a finding
for upper-case letters. To accept upper-case letters, set the option
'strict_lower' to false.
'fqdn' Limitations
'fqdn' Schema
type: 'fqdn'
minimum: < 4-255 | default -> 4 >
maximum: < 4-255 | default -> 255 >
minimum_labels: < int | default -> 2 >
maximum_labels: < int >
allowed_values:
- < string >
not_allowed_values:
- < string >
strict_lower: < bool | default -> true >
'hostname' Data Type
'hostname' Description
This data type represents an Internet hostname as defined by RFC 952 and RFC 1123.
Thus, a hostname contains the ASCII characters a through z, the digits 0 through 9
and the hyphen character '-'. A hostname is 1 to 63 characters long.
The RFCs allow upper-case letters in hostnames and treat names as
case-insensitive. cd2t is stricter by default and creates a finding for
upper-case letters. To accept upper-case letters, set the option 'strict_lower' to false.
'hostname' Limitations
'hostname' Schema
type: 'hostname'
minimum: < 1-63 | default -> 1 >
maximum: < 1-63 | default -> 63 >
allowed_values:
- < string >
not_allowed_values:
- < string >
strict_lower: < bool | default -> true >
'hostname' Validation Process
- If 'strict_lower' is true: Check for upper cases in hostname
- Check on allowed characters
- Check minimum and maximum length
- Check on not allowed values
- Check on allowed values
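The validation order above can be sketched as follows. This is an illustration under assumptions, not the actual cd2t code: `check_hostname` and the finding texts are hypothetical, only the character set described above is checked, and the sketch collects all findings instead of stopping at the first one.

```python
import re

# Letters, digits and hyphen; upper case is handled by the strict_lower check.
_ALLOWED_CHARS = re.compile(r"[A-Za-z0-9-]+")


def check_hostname(value, minimum=1, maximum=63, strict_lower=True,
                   allowed_values=None, not_allowed_values=None):
    """Return a list of finding messages for a hostname string."""
    findings = []
    if strict_lower and value != value.lower():
        findings.append("upper-case characters are not allowed")
    if not _ALLOWED_CHARS.fullmatch(value):
        findings.append("hostname contains invalid characters")
    if not minimum <= len(value) <= maximum:
        findings.append("hostname length is out of range")
    if not_allowed_values and value in not_allowed_values:
        findings.append("hostname is not allowed")
    if allowed_values is not None and value not in allowed_values:
        findings.append("hostname is not in allowed values")
    return findings


print(check_hostname("Web-01"))
print(check_hostname("Web-01", strict_lower=False))
```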
'ip' Data Type
'ip' Description
This data type represents any IP object: an IPv4 or IPv6 address, network or
interface.
'ip' Limitations
'ip' Schema
type: ip
version: < 4 | 6 >
loopback: < bool >
link_local: < bool >
private: < bool >
public: < bool >
multicast: < bool >
allowed_values:
- < IP address string >
# Data must match one of the values.
not_allowed_values:
- < IP address string >
# Data must not match any of the values.
'ip_address' Data Type
'ip_address' Description
This data type represents an IP address.
'ip_address' Limitations
'ip_address' Schema
type: ip_address
version: < 4 | 6 >
loopback: < bool >
link_local: < bool >
private: < bool >
public: < bool >
multicast: < bool >
allowed_values:
- < IP address string >
# Data must match one of the values.
not_allowed_values:
- < IP address string >
# Data must not match any of the values.
allowed_subnets:
- < IP network string >
# IP address must be within one of the networks
not_allowed_subnets:
- < IP network string >
# IP address mustn't be within any of the networks
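The subnet options can be illustrated with Python's standard `ipaddress` module. The sketch below is an assumption about how such checks could look, not cd2t's implementation: `check_ip_address` and the finding texts are hypothetical, and the boolean options (loopback, private, ...) are left out because their exact semantics are not described here.

```python
import ipaddress


def check_ip_address(value, version=None, allowed_subnets=None,
                     not_allowed_subnets=None):
    """Return a list of finding messages for an IP address string."""
    try:
        address = ipaddress.ip_address(value)
    except ValueError:
        return ["value is not an IP address"]
    findings = []
    if version is not None and address.version != version:
        findings.append(f"IP version {version} expected")
    if allowed_subnets is not None:
        networks = [ipaddress.ip_network(net) for net in allowed_subnets]
        # The address must be within at least one of the networks.
        if not any(address in net for net in networks):
            findings.append("address is not within an allowed subnet")
    for net in not_allowed_subnets or []:
        # The address must not be within any of the networks.
        if address in ipaddress.ip_network(net):
            findings.append("address is within a not allowed subnet")
            break
    return findings


print(check_ip_address("10.1.1.21", version=4, allowed_subnets=["10.0.0.0/8"]))
print(check_ip_address("192.168.1.1", allowed_subnets=["10.0.0.0/8"]))
```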
'ip_network' Data Type
'ip_network' Description
This data type represents an IP network.
'ip_network' Limitations
'ip_network' Schema
type: ip_network
version: < 4 | 6 >
loopback: < bool >
link_local: < bool >
private: < bool >
public: < bool >
multicast: < bool >
allowed_values:
- < IP address string >
# Data must match one of the values.
not_allowed_values:
- < IP address string >
# Data must not match any of the values.
allowed_subnets:
- < IP network string >
# IP address must be within one of the networks
not_allowed_subnets:
- < IP network string >
# IP address mustn't be within any of the networks
minimum_prefix_length: < int >
# 0 < length < 32|128 (v4|v6)
maximum_prefix_length: < int >
# 'minimum_prefix_length' < length < 32|128 (v4|v6)
'ip_interface' Data Type
'ip_interface' Description
This data type represents an IP interface (address with subnet prefix length,
i.e. 10.1.1.21/24).
'ip_interface' Limitations
'ip_interface' Schema
type: ip_interface
version: < 4 | 6 >
loopback: < bool >
link_local: < bool >
private: < bool >
public: < bool >
multicast: < bool >
allowed_values:
- < IP address string >
# Data must match one of the values.
not_allowed_values:
- < IP address string >
# Data must not match any of the values.
allowed_subnets:
- < IP network string >
# IP address must be within one of the networks
not_allowed_subnets:
- < IP network string >
# IP address mustn't be within any of the networks
General Data Type Options
Default Value
Defines a default value for documentation purposes, e.g. the value that applies if an object attribute is missing.
default_value: < value >
Description
Description of the value. Can be a single string or a list of strings.
description: < string | list[string] >
Referencing
Referencing achieves two validation goals:
- It can make sure that a value is unique.
- It can make sure that a value is defined at another place within the data
structure.
Basically a reference is defined by a key, so that one or more 'producers' of
values can match with one or more 'consumers' of values.
In addition, many options are available to implement 1:1, 1:n or n:m relations,
scope of 'reachability' of values with namespaces and more ...
Reference Options
If a data type supports referencing, these options are available.
reference:
  key: < string >
  mode: < 'unique' | 'producer' | 'consumer' | default -> 'unique' >
  # - 'producer': Collects values as allowed values for 'consumer' positions.
  # - 'unique': Inherits 'producer' and checks uniqueness of the value
  #   among other values at other positions with the same key.
  # - 'consumer': Data value must match a 'producer' value.
  credits: < int >
  # - 'producer' and 'unique' modes have infinite credits by default.
  # - If 0 with 'producer' or 'unique' mode, no 'consumer' is allowed.
  # - If >0 with 'producer' or 'unique':
  #   - all producers' credits with the same value are summed up,
  #   - a consumer is allowed when enough credits are available,
  #   - each consumer's credits are subtracted from the producers' credit sum.
  # - In 'consumer' mode the credit value is 1 by default.
  # - 'consumer' credits must be greater than 0.
  allow_orphan_producer: < bool | default -> true >
  # If disabled, producer values without a consumer are not allowed.
  unique_scope: < 'namespace' | 'global' | default -> 'global' >
  # Ignored in 'producer' or 'consumer' mode
  provider_scope: < 'namespace' | 'global' | default -> 'global' >
  # Ignored in 'consumer' mode
  consumer_scope: < 'namespace' | 'global' | default -> 'global' >
  # Ignored in 'unique' or 'producer' mode
  # 'namespace' scopes to data of the same namespace only.
  # References across namespaces only work
  # if both 'ends' specify 'global'.
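To make the three modes more tangible, here is a deliberately reduced sketch for a single reference key. It is not how cd2t implements referencing: `reference_findings` and the finding texts are hypothetical, credits and scopes are ignored, and uniqueness is only checked among 'unique' entries.

```python
from collections import Counter


def reference_findings(entries):
    """entries: list of (mode, value) tuples that share one reference key."""
    findings = []
    # 'unique' values must not occur more than once.
    unique_counts = Counter(value for mode, value in entries if mode == "unique")
    for value, count in unique_counts.items():
        if count > 1:
            findings.append(f"'{value}' is not unique ({count} occurrences)")
    # 'unique' inherits 'producer', so both modes produce values.
    produced = {value for mode, value in entries if mode in ("unique", "producer")}
    for mode, value in entries:
        if mode == "consumer" and value not in produced:
            findings.append(f"'{value}' has no producer")
    return findings


entries = [("unique", "vlan10"), ("unique", "vlan20"), ("consumer", "vlan30")]
print(reference_findings(entries))
```

Findings of this kind can only be computed after all data has been seen, because a consumer may appear before its producer.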
Feature Details
Custom Data Types
Introduced in version 2
Custom data types provide an efficient way to define a data type with its options
once and use it at multiple locations within the data structure.
Data type options can be overwritten for each use.
Note: Custom data type names mustn't conflict with built-in data types.
Good practice is to start with a capital letter.
Example
root:
  type: object
  attributes:
    name: string
    age: NonNegativeInt
    rate:
    - type: NonNegativeInt
      maximum: 100
custom_data_types:
  NonNegativeInt:
    type: integer
    minimum: 0
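One way to picture how a custom data type and its per-use overrides combine is a simple dictionary merge. The sketch below is an assumption for illustration, not cd2t's implementation: `resolve_custom_type` is a hypothetical helper, and options given at the place of use are assumed to overwrite the options of the custom data type.

```python
def resolve_custom_type(type_schema, custom_data_types):
    """Expand a custom data type into a built-in data type schema."""
    if isinstance(type_schema, str):
        # Shortcut form such as 'age: NonNegativeInt'.
        type_schema = {"type": type_schema}
    base = custom_data_types.get(type_schema["type"])
    if base is None:
        return dict(type_schema)  # not a custom type, keep it unchanged
    resolved = dict(base)
    # Options at the place of use overwrite the custom type's options.
    resolved.update({k: v for k, v in type_schema.items() if k != "type"})
    return resolved


custom_data_types = {"NonNegativeInt": {"type": "integer", "minimum": 0}}
print(resolve_custom_type("NonNegativeInt", custom_data_types))
print(resolve_custom_type({"type": "NonNegativeInt", "maximum": 100},
                          custom_data_types))
```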
Python Code Example
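import os
import yaml
from cd2t import Validator

# Load the schema; safe_load does not construct arbitrary Python objects.
with open('my_schema.yml') as f:
    schema = yaml.safe_load(f)

validator = Validator()
validator.load_schema(schema)

results = []
data_dir = './my_data_folder'
for filename in os.listdir(data_dir):
    # os.listdir returns bare file names, so join them with the folder path.
    with open(os.path.join(data_dir, filename)) as f:
        test_data = yaml.safe_load(f)
    # Use the file name as the namespace for this data source.
    validator.change_namespace(filename)
    _results = validator.validate_data(test_data)
    results.extend(_results)

# Reference findings are collected after all data sources have been validated.
_results = validator.get_reference_findings()
results.extend(_results)
# Convert findings to strings before joining them.
print('\n'.join(str(finding) for finding in results))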