Carica - A Python Configurator
Carica is a Python application configurator. It acts as an interface between a pure Python config module and a TOML representation of that module.
Credits
A huge thank you goes to @sdispater, author of the fantastic tomlkit library, which makes it possible for this project to retain variable docstrings.
Project Goals
Python applications can be configured in a number of ways, each with its own advantages and limitations.
Common Configuration Methods
| Method | Advantages | Problems |
|---|---|---|
| Environment variables/Command line arguments | Easy to handle in code; Container/venv safe | Not scalable to large numbers of variables; Primitive data types only; Not human-friendly; No typing in code; No code autocompletion or other editor features; Difficult to version control |
| TOML config file | Container/venv safe; More scalable; More expressive, with tables; Easy to version control; Human friendly | Not easy to manage in code; No code autocompletion or other editor features; No dot syntax for objects; No typing in code |
| Python module with variables | Easy to handle in code; Easy to version control, with rich, human-readable diffs; Highly scalable; Completely expressive; Dot syntax for objects; Variable typing in code; Complete language and editor features | Not container/venv safe; Not human-friendly; Module must be accessible to the application namespace, which is difficult for packages |
Carica aims to combine the best parts of the two most convenient configuration methods, acting as an interface between pure Python modules and TOML config files.
Basic Usage
To use Carica, define your application configuration as a Python module.
Example Application
loginApp.py
import cfg
import some_credentials_manager
import re

print(cfg.welcome_message)

new_user_data = {}
for field_name, field_config in cfg.new_user_required_fields.items():
    print(field_config['display'] + ":")
    new_value = input()
    if re.match(field_config['validation_regex'], new_value):  # re.match takes the pattern first
        new_user_data[field_name] = new_value
    else:
        raise ValueError(f"The value for {field_name} did not pass validation")

some_credentials_manager.create_user(new_user_data)
cfg.py
welcome_message = "Welcome to the application. Please create an account:"

new_user_required_fields = {
    "username": {
        "display": "user-name",
        "validation_regex": "[a-z]+"
    },
    "password": {
        "display": "pw",
        "validation_regex": "\\b(?!password\\b)\\w+"
    },
}
Default config generation
Carica can auto-generate a default TOML config file for your application, using the values specified in your Python module as defaults:
>>> import cfg
>>> import carica
>>> carica.makeDefaultCfg(cfg)
Created defaultCfg.toml
The above code will produce the following file:
defaultCfg.toml
welcome_message = "Welcome to the application. Please create an account:"
[new_user_required_fields]
[new_user_required_fields.username]
display = "user-name"
validation_regex = "[a-z]+"
[new_user_required_fields.password]
display = "pw"
validation_regex = "\\b(?!password\\b)\\w+"
Loading a configuration file
Carica will map the variables given in your config file to those present in your Python module.
Since the Python config module contains default values, Carica does not require every variable to be specified:
myConfig.toml
[new_user_required_fields]
[new_user_required_fields.avatar]
display = "profile picture"
validation_regex = "[a-z]+"
>>> import cfg
>>> import carica
>>> carica.loadCfg(cfg, "myConfig.toml")
Config successfully loaded: myConfig.toml
>>> import loginApp
Welcome to the application. Please create an account:
profile picture:
123
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "loginApp.py", line 14, in <module>
    raise ValueError(f"The value for {field_name} did not pass validation")
ValueError: The value for avatar did not pass validation
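The mapping step described above can be pictured with a minimal sketch. This is not Carica's implementation: `apply_overrides` is a hypothetical helper, the TOML file is assumed to have already been parsed into a plain dict, and rejecting unknown names is an assumption made for the illustration.

```python
import types


def apply_overrides(module, overrides):
    """Copy parsed config values onto a module, keeping defaults for everything else."""
    for name, value in overrides.items():
        if not hasattr(module, name):
            # Assumption: names missing from the module are rejected, not added
            raise KeyError(f"{name} is not defined in {module.__name__}")
        setattr(module, name, value)


# Stand-in for the cfg.py module, holding the default values
cfg = types.ModuleType("cfg")
cfg.welcome_message = "Welcome to the application. Please create an account:"
cfg.new_user_required_fields = {
    "username": {"display": "user-name", "validation_regex": "[a-z]+"},
}

# Stand-in for the parsed contents of myConfig.toml
parsed = {
    "new_user_required_fields": {
        "avatar": {"display": "profile picture", "validation_regex": "[a-z]+"},
    },
}

apply_overrides(cfg, parsed)
print(cfg.welcome_message)                 # default kept
print(list(cfg.new_user_required_fields))  # table replaced by the config file
```

In this sketch the whole `new_user_required_fields` table is replaced rather than merged, which matches the session above, where only the avatar field is requested.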
Variable Pseudo-Docstrings
When encountering a comment in your python config module, Carica will treat it as a variable 'docstring' in the following cases:
- Inline comments on the same line as a variable declaration
- Line comments immediately preceding a variable declaration ('preceding comments') *Beta feature: still in testing*
- Line comments immediately preceding an existing preceding comment *Beta feature: still in testing*
Carica will consider your variable docstrings when building TOML config files:
cfg.py
welcome_message = "Welcome to the application. Please create an account:"

new_user_required_fields = {
    "username": {
        "display": "user-name",
        "validation_regex": "[a-z]+"
    },
    "password": {
        "display": "pw",
        "validation_regex": "\\b(?!password\\b)\\w+"
    },
}
>>> import cfg
>>> import carica
>>> carica.makeDefaultCfg(cfg)
Created defaultCfg.toml
The above code will produce the following file:
defaultCfg.toml
welcome_message = "Welcome to the application. Please create an account:"
[new_user_required_fields]
[new_user_required_fields.username]
display = "user-name"
validation_regex = "[a-z]+"
[new_user_required_fields.password]
display = "pw"
validation_regex = "\\b(?!password\\b)\\w+"
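To make the inline-comment case concrete, here is a minimal sketch of how a comment sharing a line with a top-level assignment could be found using the standard `tokenize` module. It is an illustration only, not Carica's code: `find_inline_comments` and the sample variables are hypothetical, and preceding comments are deliberately not handled.

```python
import io
import tokenize


def find_inline_comments(source):
    """Map top-level variable names to the comment on their assignment line."""
    found = {}
    pending = None   # (name, row) of the most recent top-level assignment
    previous = None  # the token seen just before the current one
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        is_assignment = (
            tok.type == tokenize.OP
            and tok.string == "="
            and previous is not None
            and previous.type == tokenize.NAME
            and previous.start[1] == 0  # name starts at column 0
        )
        if is_assignment:
            pending = (previous.string, tok.start[0])
        elif tok.type == tokenize.COMMENT and pending and tok.start[0] == pending[1]:
            found[pending[0]] = tok.string.lstrip("#").strip()
        previous = tok
    return found


SAMPLE = '''\
welcome_message = "Welcome"  # Shown when the application starts
max_attempts = 3
# A standalone comment, ignored by this sketch
timeout_seconds = 30  # How long to wait for input
'''

comments = find_inline_comments(SAMPLE)
print(comments)
```

In a TOML file, such a comment would typically be written after the value on the same line, following a `#`.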
Advanced Usage
Carica will handle non-primitive variable types according to a very simple design pattern:
The SerializableType type protocol
class SerializableType:
    def serialize(self, **kwargs): ...

    @classmethod
    def deserialize(cls, data, **kwargs): ...
Any type which defines serialize and deserialize member methods will be automatically serialized during config generation, and deserialized on config loading.
serialize must return a representation of your object built from primitive types, meaning types which can be written to TOML.
deserialize must be a class method, and should transform a serialized object representation into a new object.
Carica enforces this pattern on non-primitive types using the SerializableType type protocol, which allows for duck-typed serializable types. This protocol is exposed for use with isinstance.
Projects which prefer strong typing may implement the carica.ISerializable interface to enforce this pattern with inheritance. Carica will validate serialized objects against the carica.PrimativeType type alias, which is also exposed for use.
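The duck-typed check described above can be sketched with the standard library's `typing.Protocol`. This is an illustration under assumptions, not Carica's definition: `SerializableSketch`, `Point` and `PlainObject` are hypothetical names, and the real carica.SerializableType may be declared differently.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SerializableSketch(Protocol):
    """Hypothetical stand-in for a serializable type protocol."""

    def serialize(self, **kwargs) -> Any: ...

    @classmethod
    def deserialize(cls, data, **kwargs) -> Any: ...


class Point:
    """Satisfies the protocol without inheriting from it."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def serialize(self, **kwargs):
        return {"x": self.x, "y": self.y}

    @classmethod
    def deserialize(cls, data, **kwargs):
        return cls(data["x"], data["y"])


class PlainObject:
    """Defines neither method, so it does not satisfy the protocol."""


print(isinstance(Point(1, 2), SerializableSketch))   # True
print(isinstance(PlainObject(), SerializableSketch)) # False
```

A runtime-checkable protocol only checks that the named members exist, not their signatures or return types, which is why a separate validation of the serialized output is still useful.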
Example
cfg.py
class MySerializableClass:
    def __init__(self, myField):
        self.myField = myField

    def serialize(self, **kwargs):
        # Return a representation built only from primitive types
        return {"myField": self.myField}

    @classmethod
    def deserialize(cls, data, **kwargs):
        # Rebuild an instance from the serialized representation
        return cls(data["myField"])

mySerializableVar = MySerializableClass("hello")
Default config generation
>>> import cfg
>>> import carica
>>> carica.makeDefaultCfg(cfg)
Created defaultCfg.toml
The above code will produce the following file:
defaultCfg.toml
[mySerializableVar]
myField = "hello"
Config file loading
myConfig.toml
[mySerializableVar]
myField = "some changed value"
>>> import cfg
>>> import carica
>>> carica.loadCfg(cfg, "myConfig.toml")
Config successfully loaded: myConfig.toml
>>> cfg.mySerializableVar.myField
'some changed value'
Premade models
Carica provides serializable models that are ready to use (or extend) in your code. These models can be found in the carica.models package, which is imported by default.
SerializableDataClass
Removes the need to write boilerplate serializing functionality for dataclasses. This class is intended to be extended, adding definitions for your dataclass's fields. Extensions of SerializableDataClass must themselves be decorated with @dataclasses.dataclass in order to function correctly.
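As a rough picture of the boilerplate such a base class saves, the sketch below implements the serialize/deserialize pair once, using `dataclasses.asdict`. It is an assumption-laden illustration: `DataClassSketch` is a hypothetical name, and the real SerializableDataClass may behave differently, for example with nested serializable fields.

```python
from dataclasses import asdict, dataclass


class DataClassSketch:
    """Hypothetical base class giving every dataclass subclass the same pair of methods."""

    def serialize(self, **kwargs):
        # asdict converts the dataclass, and any nested dataclasses, to plain dicts
        return asdict(self)

    @classmethod
    def deserialize(cls, data, **kwargs):
        # Field names in the dict are passed straight to the generated __init__
        return cls(**data)


@dataclass
class UserDataField(DataClassSketch):
    name: str
    validation_regex: str


field = UserDataField(name="user-name", validation_regex="[a-z]+")
serialized = field.serialize()
restored = UserDataField.deserialize(serialized)
print(serialized)
```

The subclass still needs the `@dataclass` decorator, since the decorator is what generates the fields and `__init__` that the base class relies on.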
SerializablePath
An OS-agnostic filesystem path, extending pathlib.Path. The serializing/deserializing behaviour added by this class is minimal: a serialized SerializablePath is simply the string representation of the path, for readability. All other behaviour of pathlib.Path applies. For example, SerializablePath can be instantiated from a single path: SerializablePath("my/directory/path"), or from path segments: SerializablePath("my", "file", "path.toml").
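The two construction styles and the string round trip can be tried with the standard library alone. The sketch uses `pathlib.PurePosixPath` as a stand-in so that the output is the same on every operating system; it does not use SerializablePath itself.

```python
from pathlib import PurePosixPath

# Built from a single path string
single = PurePosixPath("my/directory/path")

# Built from separate path segments
segments = PurePosixPath("my", "file", "path.toml")

# A string is all that needs to be written to the config file
serialized = str(segments)

# Loading turns the string back into an equal path object
restored = PurePosixPath(serialized)

print(serialized)
print(restored == segments)
```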
SerializableTimedelta
datetime.datetime is already considered a primitive type by TomlKit, so no serializability needs to be added for you to use this class in your configs. However, datetime.timedelta is not serializable by default. SerializableTimedelta solves this issue as a serializable subclass. As a subclass, all timedelta behaviour applies, including the usual constructor. In addition, SerializableTimedelta.fromTimedelta is a convenience class method that accepts a datetime.timedelta and constructs a new SerializableTimedelta from it.
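A serializable timedelta subclass could look roughly like the sketch below. Everything here is hypothetical: the `TimedeltaSketch` class, the `from_timedelta` helper, and especially the days/seconds/microseconds table used as the serialized form, which is an assumption and may not match what SerializableTimedelta writes.

```python
from datetime import timedelta


class TimedeltaSketch(timedelta):
    """Hypothetical timedelta subclass with a serialize/deserialize pair."""

    def serialize(self, **kwargs):
        # Assumed representation: the three fields timedelta stores internally
        return {
            "days": self.days,
            "seconds": self.seconds,
            "microseconds": self.microseconds,
        }

    @classmethod
    def deserialize(cls, data, **kwargs):
        return cls(**data)

    @classmethod
    def from_timedelta(cls, delta):
        # Convenience constructor from a plain datetime.timedelta
        return cls(days=delta.days, seconds=delta.seconds,
                   microseconds=delta.microseconds)


timeout = TimedeltaSketch(minutes=5)  # the usual constructor still works
serialized = timeout.serialize()
restored = TimedeltaSketch.deserialize(serialized)
converted = TimedeltaSketch.from_timedelta(timedelta(hours=1))
print(serialized)
```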
Premade models example
The recommended usage pattern for SerializableDataClass is to keep your models in their own module/package, allowing for 'schema' definition as Python code. This pattern is not required: models can also be defined in your config file.
configSchema.py
from carica.models import SerializableDataClass
from dataclasses import dataclass

@dataclass
class UserDataField(SerializableDataClass):
    name: str
    validation_regex: str
config.py
from carica.models import SerializablePath, SerializableTimedelta
from configSchema import UserDataField
from datetime import datetime

new_user_required_fields = [
    UserDataField(
        name = "user-name",
        validation_regex = "[a-z]+"
    ),
    UserDataField(
        name = "password",
        validation_regex = "\\b(?!password\\b)\\w+"
    )
]

database_path = SerializablePath("default/path.csv")
birthday = datetime(day=1, month=1, year=1500)
connection_timeout = SerializableTimedelta(minutes=5)
Planned features
- Preceding comments: This functionality is 'complete' in that it functions as intended and passes all unit tests. However, an issue needs to be worked around before the feature can be enabled. To disambiguate between variables and table fields, the TOML spec requires that arrays and tables be placed at the end of a document. Carica currently depends upon documents being rendered with variables in the same order as they appear in the Python config module, which is not guaranteed. This leads to trailing and otherwise misplaced preceding comments.
- Config mutation: Carica should allow for loading an existing config, changing some values, and then updating the TOML document with new values. This should retain all formatting from the original document, including variable ordering and any comments that are not present in the python module.
Limitations
- No support for schema migration
- No support for asynchronous object serializing/deserializing
- Imperfect estimation of variables defined in Python modules: Listing the variables defined within a scope is not a known feature of Python, and so Carica estimates this information by iterating over the tokens in your module. Carica does not build an AST of your Python module.
This means that certain name definition structures will result in false positives/negatives. This behaviour has not been extensively tested, but one such false positive has been identified:
When invoking a callable (such as a class or function) with a keyword argument on a new, unindented line, the argument name will be falsely identified as a variable name. E.g.:
my_variable = dict(key1=value1,
key2=value2)
produces my_variable and key2 as variable names.
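The false positive above can be reproduced with a deliberately naive token scan. The sketch is not Carica's scanner: `naive_variable_names` is a hypothetical function that treats any name at column 0 followed by `=` as a variable. The AST-based `ast_variable_names` is included only for contrast.

```python
import ast
import io
import tokenize

SOURCE = "my_variable = dict(key1=value1,\nkey2=value2)\n"


def naive_variable_names(source):
    """Collect every name at column 0 that is directly followed by '='."""
    names = []
    previous = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if (tok.type == tokenize.OP and tok.string == "="
                and previous is not None
                and previous.type == tokenize.NAME
                and previous.start[1] == 0):
            names.append(previous.string)
        previous = tok
    return names


def ast_variable_names(source):
    """Collect simple assignment targets at module level using the AST."""
    return [
        target.id
        for node in ast.parse(source).body
        if isinstance(node, ast.Assign)
        for target in node.targets
        if isinstance(target, ast.Name)
    ]


print(naive_variable_names(SOURCE))  # ['my_variable', 'key2']
print(ast_variable_names(SOURCE))    # ['my_variable']
```

Indenting the continuation line is enough to avoid the false positive in this sketch, since `key2` then no longer starts at column 0.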