In Python, you often need to dump and load objects based on the schema you
have. It can be a dataclass model, a list of third-party generic classes or
whatever. Mashumaro not only lets you save and load things in different ways,
but it also does it super quickly.
Key features
🚀 One of the fastest libraries
☝️ Mature and time-tested
👶 Easy to use out of the box
⚙️ Highly customizable
🎉 Built-in support for JSON, YAML, TOML, MessagePack
📦 Built-in support for almost all Python types including typing-extensions
This library provides two fundamentally different approaches to converting
your data to and from various formats. Each of them is useful in different
situations:
Codecs
Mixins
Codecs are represented by a set of decoder / encoder classes and
decode / encode functions for each supported format. You can use them
to convert data of any Python built-in and third-party type to JSON, YAML,
TOML, MessagePack or a basic form accepted by other serialization formats.
For example, you can convert a list of datetime objects to a JSON array
containing string-represented datetimes and vice versa.
Mixins are primarily for dataclass models. They are represented by mixin
classes that add methods for converting to and from JSON, YAML, TOML,
MessagePack or a basic form accepted by other serialization formats.
If you have a root dataclass model, then a mixin is the easiest way to make it
serializable. All you have to do is inherit a particular mixin class.
In addition to serialization functionality, this library also provides a JSON
Schema builder that can be used in places where interoperability matters.
Installation
Use pip to install:
$ pip install mashumaro
The current version of mashumaro supports Python versions 3.8 to 3.13.
It's not recommended to use any version of Python that has reached its
end of life and is no longer receiving
security updates or bug fixes from the Python development team.
For convenience, there is a table below that outlines the
last version of mashumaro that can be installed on unmaintained versions
of Python.
If we need to serialize something different from a root dataclass,
we can use codecs. In the following example we create a JSON decoder and encoder
for a list of currencies:
This library works by taking the schema of the data and generating a
specific decoder and encoder for exactly that schema, taking into account the
specifics of the serialization format. This is much faster than inspecting
data types at runtime on every call of decoding or encoding.
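To illustrate the idea only (this is a toy, not mashumaro's actual code generator), the sketch below inspects a dataclass once, writes the source of a specialized to_dict function, and compiles it with exec. The names Event and build_to_dict are hypothetical.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Any, Callable, Dict


@dataclass
class Event:
    title: str
    day: date


def build_to_dict(cls: type) -> Callable[[Any], Dict[str, Any]]:
    """Generate a to_dict function specialized for the fields of `cls`.

    The field list is inspected once, here, instead of on every call.
    """
    lines = ["def to_dict(obj):", "    return {"]
    for f in fields(cls):
        if f.type is date:
            # the date handling is decided at build time, not per call
            lines.append(f"        {f.name!r}: obj.{f.name}.isoformat(),")
        else:
            lines.append(f"        {f.name!r}: obj.{f.name},")
    lines.append("    }")
    namespace: Dict[str, Any] = {}
    exec("\n".join(lines), namespace)  # compile the generated source once
    return namespace["to_dict"]


event_to_dict = build_to_dict(Event)
print(event_to_dict(Event("Release", date(2024, 1, 15))))
# {'title': 'Release', 'day': '2024-01-15'}
```

The generated function contains no type checks at all; every decision about field types was made while building it.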
These specific decoders and encoders are generated by
codecs and mixins:
When using codecs, these methods are compiled during the creation of the
decoder or encoder.
When using serialization
mixins, these methods are compiled at import time (or at runtime in some
cases) and are set as attributes of your dataclasses. To minimize the import
time, you can explicitly enable
lazy compilation.
Benchmark
macOS 14.0 Sonoma
Apple M1
16GB RAM
Python 3.12.0
The benchmark was run using pyperf with a GitHub Issue model. Please note that the
following charts use a logarithmic scale, as it is convenient for displaying
very large ranges of values.
[!NOTE]
Benchmark results may vary depending on the specific configuration and
parameters used for serialization and deserialization. However, we have made
an attempt to use the available options that can speed up and smooth out the
differences in how libraries work.
There are preconfigured codecs and mixin classes. However, you're free
to override some settings if necessary.
[!IMPORTANT]
As for codecs, you can
choose between convenience and efficiency. When you need to decode
or encode typed data more than once, it's highly recommended to create
and reuse a decoder or encoder specifically for that data type. For one-time
use with default settings, it may be convenient to use global functions that
create a disposable decoder or encoder under the hood. Remember that you
should not use these convenient global functions more than once for the same
data type if performance is important to you.
Basic form
Basic form denotes a Python object consisting only of basic data types
supported by most serialization formats. These types are:
str,
int,
float,
bool,
list,
dict.
This is also a starting point you can play with for a comprehensive
transformation of your data.
Efficient decoder and encoder can be used as follows:
from mashumaro.codecs import BasicDecoder, BasicEncoder
# or from mashumaro.codecs.basic import BasicDecoder, BasicEncoder
decoder = BasicDecoder(<shape_type>, ...)
decoder.decode(...)
encoder = BasicEncoder(<shape_type>, ...)
encoder.encode(...)
Convenient functions are recommended to be used as follows:
import mashumaro.codecs.basic as basic_codec
basic_codec.decode(..., <shape_type>)
basic_codec.encode(..., <shape_type>)
Mixin can be used as follows:
from mashumaro import DataClassDictMixin
# or from mashumaro.mixins.dict import DataClassDictMixin
@dataclass
class MyModel(DataClassDictMixin):
    ...
MyModel.from_dict(...)
MyModel(...).to_dict()
[!TIP]
You don't need to inherit DataClassDictMixin along with other serialization
mixins because it's a base class for them.
JSON
JSON is a lightweight data-interchange format. You can
choose between the standard library
json for compatibility and
the third-party orjson dependency for better
performance.
orjson library
Convenient functions can be used as follows:
from mashumaro.codecs.orjson import json_decode, json_encode
json_decode(..., <shape_type>)
json_encode(..., <shape_type>)
Convenient function aliases are recommended to be used as follows:
import mashumaro.codecs.orjson as json_codec
json_codec.decode(..., <shape_type>)
json_codec.encode(..., <shape_type>)
Mixin can be used as follows:
from mashumaro.mixins.orjson import DataClassORJSONMixin
@dataclass
class MyModel(DataClassORJSONMixin):
    ...
MyModel.from_json(...)
MyModel(...).to_json()
MyModel(...).to_jsonb()
YAML
YAML is a human-friendly data serialization language for
all programming languages. In order to use this format, the
pyyaml package must be installed.
You can install it manually or using an extra option for mashumaro:
pip install mashumaro[yaml]
Convenient functions can be used as follows:
from mashumaro.codecs.yaml import yaml_decode, yaml_encode
yaml_decode(..., <shape_type>)
yaml_encode(..., <shape_type>)
Convenient function aliases are recommended to be used as follows:
import mashumaro.codecs.yaml as yaml_codec
yaml_codec.decode(..., <shape_type>)
yaml_codec.encode(..., <shape_type>)
Mixin can be used as follows:
from mashumaro.mixins.yaml import DataClassYAMLMixin
@dataclass
class MyModel(DataClassYAMLMixin):
    ...
MyModel.from_yaml(...)
MyModel(...).to_yaml()
TOML
TOML is a config file format for humans.
In order to use this format, the tomli and
tomli-w packages must be installed.
In Python 3.11+, tomli is included as
the tomllib standard library
module and is used for this format. You can install the missing packages
manually or using an extra option
for mashumaro:
pip install mashumaro[toml]
The following data types will be handled by the
tomli /
tomli-w library by default: datetime, date, time.
Convenient functions can be used as follows:
from mashumaro.codecs.toml import toml_decode, toml_encode
toml_decode(..., <shape_type>)
toml_encode(..., <shape_type>)
Convenient function aliases are recommended to be used as follows:
import mashumaro.codecs.toml as toml_codec
toml_codec.decode(..., <shape_type>)
toml_codec.encode(..., <shape_type>)
Mixin can be used as follows:
from mashumaro.mixins.toml import DataClassTOMLMixin
@dataclass
class MyModel(DataClassTOMLMixin):
    ...
MyModel.from_toml(...)
MyModel(...).to_toml()
MessagePack
MessagePack is an efficient binary serialization format.
In order to use this format, the msgpack
package must be installed. You can install it manually or using an extra
option for mashumaro:
pip install mashumaro[msgpack]
The following data types will be handled by the
msgpack library by default: bytes, bytearray.
Convenient functions can be used as follows:
from mashumaro.codecs.msgpack import msgpack_decode, msgpack_encode
msgpack_decode(..., <shape_type>)
msgpack_encode(..., <shape_type>)
Convenient function aliases are recommended to be used as follows:
import mashumaro.codecs.msgpack as msgpack_codec
msgpack_codec.decode(..., <shape_type>)
msgpack_codec.encode(..., <shape_type>)
Mixin can be used as follows:
from mashumaro.mixins.msgpack import DataClassMessagePackMixin
@dataclass
class MyModel(DataClassMessagePackMixin):
    ...
MyModel.from_msgpack(...)
MyModel(...).to_msgpack()
Customization
Customization options of mashumaro are extensive and will most likely cover your needs.
When it comes to non-standard data types and non-standard serialization support, you can do the following:
Turn an existing regular or generic class into a serializable one
by inheriting the SerializableType class
Write different serialization strategies for an existing regular or generic type that is not under your control
using SerializationStrategy class
Define serialization / deserialization methods:
for a specific dataclass field by using field options
for a specific data type used in the dataclass by using Config class
Alter input and output data with serialization / deserialization hooks
Separate serialization scheme from a dataclass in a reusable manner using dialects
Choose from predefined serialization engines for the specific data types, e.g. datetime and NamedTuple
SerializableType interface
If you have a custom class or hierarchy of classes whose instances you want
to serialize with mashumaro, the first option is to implement the
SerializableType interface.
User-defined types
Let's look at this not very practical example:
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.types import SerializableType
class Airport(SerializableType):
    def __init__(self, code, city):
        self.code, self.city = code, city
    def _serialize(self):
        return [self.code, self.city]
    @classmethod
    def _deserialize(cls, value):
        return cls(*value)
    def __eq__(self, other):
        # compare both attributes at once as tuples
        return (self.code, self.city) == (other.code, other.city)
@dataclass
class Flight(DataClassDictMixin):
    origin: Airport
    destination: Airport
JFK = Airport("JFK", "New York City")
LAX = Airport("LAX", "Los Angeles")
input_data = {
"origin": ["JFK", "New York City"],
"destination": ["LAX", "Los Angeles"]
}
my_flight = Flight.from_dict(input_data)
assert my_flight == Flight(JFK, LAX)
assert my_flight.to_dict() == input_data
You can see how Airport instances are seamlessly created from lists of two
strings and serialized into them.
By default, the _deserialize method will get raw input data without any
prior transformations. This should be enough in many cases, especially when
you need to perform non-standard transformations yourself, but let's extend
our example:
If we pass the flight list as is into Itinerary._deserialize, our itinerary
will have something that we may not expect: list[dict] instead of
list[Flight]. The solution is quite simple. Instead of converting each dict
to Flight yourself, just use annotations:
Here we add annotations to the only argument of the _deserialize method and
to the return value of the _serialize method as well. The latter is needed for
correct serialization.
[!IMPORTANT]
use_annotations=True must be passed explicitly when defining a class,
because implicitly using annotations might break compatibility with old
code that wasn't aware of this feature. It will be enabled by default in a
future major release.
User-defined generic types
The great thing to note about using annotations in SerializableType is that
they work seamlessly with generic
and variadic generic types.
Let's see how this can be useful:
You can see that a formatted date is deserialized to a date object before being
passed to DictWrapper._deserialize, as a key or a value according to the generic
parameters.
If you have generic dataclass types, you can use SerializableType for them as well, but it's not necessary since
they're supported out of the box.
SerializationStrategy
If you want to add support for a custom third-party type that is not under your control,
you can write serialization and deserialization logic inside a SerializationStrategy class,
which is reusable and therefore well suited to cases where that third-party type is widely used.
SerializationStrategy is also good if you want to create strategies that are slightly different from each other,
because you can add the strategy differentiator in the __init__ method.
Third-party types
To demonstrate how SerializationStrategy works, let's write a simple strategy for datetime serialization
in different formats. In this example we will use the same strategy class for two dataclass fields,
but the string representing the date and time will be different.
There is no need to add use_annotations=True here because it's enabled implicitly
for generic serialization strategies.
For example, there is a third-party multidict package that has a generic MultiDict type.
A generic serialization strategy for it might look like this:
from dataclasses import dataclass
from datetime import date
from pprint import pprint
from typing import Generic, List, Tuple, TypeVar
from mashumaro import DataClassDictMixin
from mashumaro.types import SerializationStrategy
from multidict import MultiDict
T = TypeVar("T")
class MultiDictSerializationStrategy(SerializationStrategy, Generic[T]):
    def serialize(self, value: MultiDict[T]) -> List[Tuple[str, T]]:
        return [(k, v) for k, v in value.items()]
    def deserialize(self, value: List[Tuple[str, T]]) -> MultiDict[T]:
        return MultiDict(value)
@dataclass
class Example(DataClassDictMixin):
    floats: MultiDict[float]
    date_lists: MultiDict[List[date]]
    class Config:
        serialization_strategy = {
            MultiDict: MultiDictSerializationStrategy()
        }
example = Example(
    floats=MultiDict([("x", 1.1), ("x", 2.2)]),
    date_lists=MultiDict(
        [("x", [date(2023, 1, 1), date(2023, 1, 2)]),
         ("x", [date(2023, 2, 1), date(2023, 2, 2)])]
    ),
)
pprint(example.to_dict())
# {'date_lists': [['x', ['2023-01-01', '2023-01-02']],
#                 ['x', ['2023-02-01', '2023-02-02']]],
#  'floats': [['x', 1.1], ['x', 2.2]]}
assert Example.from_dict(example.to_dict()) == example
Field options
In some cases creating a new class just for one little thing could be
excessive. Moreover, you may need to deal with third-party classes that you are
not allowed to change. You can use the dataclasses.field function to
configure some serialization aspects through its metadata parameter. The next
section describes all supported options to use in the metadata mapping.
If you don't want to remember the names of the options you can use
field_options helper function:
from dataclasses import dataclass, field
from mashumaro import field_options
@dataclass
class A:
    x: int = field(metadata=field_options(...))
serialize option
This option allows you to change the serialization method. When using
this option, the serialization behaviour depends on what type of value the
option has. It could be either Callable[[Any], Any] or str.
A value of type Callable[[Any], Any] is a generic way to specify any callable
object like a function, a class method, a class instance method, an instance
of a callable class or even a lambda function to be called for serialization.
A value of type str sets a specific engine for serialization. Keep in mind
that all possible engines depend on the data type that this option is used
with. Currently, the following serialization engines are available:
| Applicable data types | Supported engines | Description |
|---|---|---|
| NamedTuple, namedtuple | as_list, as_dict | How to pack named tuples. By default, the as_list engine is used, which means your named tuple class instance will be packed into a list of its values. You can pack it into a dictionary using the as_dict engine. |
| Any | omit | Skip the field during serialization |
[!TIP]
You can pass a field value as is without changes on serialization using
pass_through.
Example:
from datetime import datetime
from dataclasses import dataclass, field
from typing import NamedTuple
from mashumaro import DataClassDictMixin
class MyNamedTuple(NamedTuple):
    x: int
    y: float
@dataclass
class A(DataClassDictMixin):
    dt: datetime = field(
        metadata={
            "serialize": lambda v: v.strftime('%Y-%m-%d %H:%M:%S')
        }
    )
    t: MyNamedTuple = field(metadata={"serialize": "as_dict"})
deserialize option
This option allows you to change the deserialization method. When using
this option, the deserialization behaviour depends on what type of value the
option has. It could be either Callable[[Any], Any] or str.
A value of type Callable[[Any], Any] is a generic way to specify any callable
object like a function, a class method, a class instance method, an instance
of a callable class or even a lambda function to be called for deserialization.
A value of type str sets a specific engine for deserialization. Keep in mind
that all possible engines depend on the data type that this option is used
with. Currently, the following deserialization engines are available:
| Applicable data types | Supported engines | Description |
|---|---|---|
| datetime, date, time | ciso8601, pendulum | How to parse datetime string. By default, the native fromisoformat of the corresponding class will be used for datetime, date and time fields. It's the fastest way in most cases, but you can choose an alternative. |
| NamedTuple, namedtuple | as_list, as_dict | How to unpack named tuples. By default, the as_list engine is used, which means your named tuple class instance will be created from a list of its values. You can unpack it from a dictionary using the as_dict engine. |
[!TIP]
You can pass a field value as is without changes on deserialization using
pass_through.
Example:
from datetime import datetime
from dataclasses import dataclass, field
from typing import List, NamedTuple
from mashumaro import DataClassDictMixin
import ciso8601
import dateutil.parser
class MyNamedTuple(NamedTuple):
    x: int
    y: float
@dataclass
class A(DataClassDictMixin):
    x: datetime = field(
        metadata={"deserialize": "pendulum"}
    )
@dataclass
class B(DataClassDictMixin):
    x: datetime = field(
        metadata={"deserialize": ciso8601.parse_datetime_as_naive}
    )
@dataclass
class C(DataClassDictMixin):
    dt: List[datetime] = field(
        metadata={
            "deserialize": lambda l: list(map(dateutil.parser.isoparse, l))
        }
    )
@dataclass
class D(DataClassDictMixin):
    x: MyNamedTuple = field(metadata={"deserialize": "as_dict"})
serialization_strategy option
This option is useful when you want to change the serialization logic
for a dataclass field depending on some defined parameters using a reusable
serialization scheme. You can find an example in the
SerializationStrategy chapter.
[!TIP]
You can pass a field value as is without changes on
serialization / deserialization using
pass_through.
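alias option
This option assigns an alias to a field, which is the key used for it in the serialized data. See Field aliases for details.
from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin, field_options
@dataclass
class DataClass(DataClassDictMixin):
    a: int = field(metadata=field_options(alias="FieldA"))
    b: int = field(metadata=field_options(alias="#invalid"))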
x = DataClass.from_dict({"FieldA": 1, "#invalid": 2}) # DataClass(a=1, b=2)
Config options
If inheritance is not an empty word for you, you'll fall in love with the
Config class. You can register serialize and deserialize methods, define
code generation options and other things in just one place, or configure
them differently in some classes if you need flexibility. Config options are
inherited, and a subclass can override them.
There is a base class BaseConfig that you can inherit for the sake of
convenience, but it's not mandatory.
In the following example you can see how
the debug flag is changed from class to class: ModelA will have debug mode enabled but
ModelB will not.
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
class BaseModel(DataClassDictMixin):
    class Config(BaseConfig):
        debug = True
@dataclass
class ModelA(BaseModel):
    a: int
@dataclass
class ModelB(BaseModel):
    b: int
    class Config(BaseConfig):
        debug = False
The next sections describe all the supported config options.
debug config option
If you enable the debug option, the generated code for your dataclass
will be printed.
code_generation_options config option
Some users may need functionality that wouldn't exist without extra cost, such
as valuable CPU time to execute additional instructions. Since not everyone
needs such instructions, they can be enabled by adding the corresponding
constant to the list, so the fastest basic behavior of the library always
remains the default.
The following table provides a brief overview of all the available constants
described below.
Adds context keyword-only argument to to_* methods.
serialization_strategy config option
You can register custom SerializationStrategy, serialize and deserialize
methods for specific types just in one place. It could be configured using
a dictionary with types as keys. The value could be either a
SerializationStrategy instance or a dictionary with serialize and
deserialize values with the same meaning as in the
field options.
from dataclasses import dataclass
from datetime import datetime, date
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
from mashumaro.types import SerializationStrategy
class FormattedDateTime(SerializationStrategy):
    def __init__(self, fmt):
        self.fmt = fmt
    def serialize(self, value: datetime) -> str:
        return value.strftime(self.fmt)
    def deserialize(self, value: str) -> datetime:
        return datetime.strptime(value, self.fmt)
@dataclass
class DataClass(DataClassDictMixin):
    x: datetime
    y: date
    class Config(BaseConfig):
        serialization_strategy = {
            datetime: FormattedDateTime("%Y"),
            date: {
                # you can use specific str values for datetime here as well
                "deserialize": "pendulum",
                "serialize": date.isoformat,
            },
        }
instance = DataClass.from_dict({"x": "2021", "y": "2021"})
# DataClass(x=datetime.datetime(2021, 1, 1, 0, 0), y=Date(2021, 1, 1))
dictionary = instance.to_dict()
# {'x': '2021', 'y': '2021-01-01'}
Note that you can register different methods for multiple logical types which
are based on the same type using NewType and Annotated,
see Extending existing types for details.
It's also possible to define a generic (de)serialization method for a generic
type by registering a method for its
origin type.
Although this technique is widely used when working with third-party generic
types using generic strategies, it can also be
applied in simple scenarios:
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
@dataclass
class C(DataClassDictMixin):
    ints: list[int]
    floats: list[float]
    class Config:
        serialization_strategy = {
            list: {  # origin type for list[int] and list[float] is list
                "serialize": lambda x: list(map(str, x)),
            }
        }
assert C([1], [2.2]).to_dict() == {'ints': ['1'], 'floats': ['2.2']}
aliases config option
Sometimes it's better to write the field aliases in one place. You can mix
aliases here with aliases in the field options, but the latter will always
take precedence.
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
@dataclass
class DataClass(DataClassDictMixin):
    a: int
    b: int
    class Config(BaseConfig):
        aliases = {
            "a": "FieldA",
            "b": "FieldB",
        }
DataClass.from_dict({"FieldA": 1, "FieldB": 2}) # DataClass(a=1, b=2)
serialize_by_alias config option
When this option is enabled, all the fields with aliases will be serialized
by their aliases by default. You can mix this config option with the
by_alias keyword argument.
from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin, field_options
from mashumaro.config import BaseConfig
@dataclass
class DataClass(DataClassDictMixin):
    field_a: int = field(metadata=field_options(alias="FieldA"))
    class Config(BaseConfig):
        serialize_by_alias = True
DataClass(field_a=1).to_dict() # {'FieldA': 1}
allow_deserialization_not_by_alias config option
When using aliases, the deserializer defaults to requiring the keys to match
what is defined as the alias.
If the flexibility to deserialize aliased and unaliased keys is required then
the config option allow_deserialization_not_by_alias can be set to
enable the feature.
from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
@dataclass
class AliasedDataClass(DataClassDictMixin):
    foo: int = field(metadata={"alias": "alias_foo"})
    bar: int = field(metadata={"alias": "alias_bar"})
    class Config(BaseConfig):
        allow_deserialization_not_by_alias = True
alias_dict = {"alias_foo": 1, "alias_bar": 2}
t1 = AliasedDataClass.from_dict(alias_dict)
no_alias_dict = {"foo": 1, "bar": 2}
# This would raise `mashumaro.exceptions.MissingField`
# if allow_deserialization_not_by_alias was False
t2 = AliasedDataClass.from_dict(no_alias_dict)
assert t1 == t2
omit_none config option
All the fields with None values will be skipped during serialization by
default when this option is enabled. You can mix this config option with
omit_none keyword argument.
from dataclasses import dataclass
from typing import Optional
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
@dataclass
class DataClass(DataClassDictMixin):
    x: Optional[int] = 42
    class Config(BaseConfig):
        omit_none = True
DataClass(x=None).to_dict() # {}
omit_default config option
When this option is enabled, all the fields that have values equal to the defaults
or the default_factory results will be skipped during serialization.
namedtuple_as_dict config option
Dataclasses are a great way to declare and use data models. But it's not the only way.
Python has a typed version of namedtuple
called NamedTuple
which looks similar to dataclasses:
from typing import NamedTuple
class Point(NamedTuple):
    x: int
    y: int
The same with a dataclass will look like this:
from dataclasses import dataclass
@dataclass
class Point:
    x: int
    y: int
At first glance, you can use both options. But imagine that you need to create
a bunch of instances of the Point class. Because of how dataclasses work, you will
have higher memory consumption compared to named tuples. In such a case it could
be more appropriate to use named tuples.
By default, all named tuples are packed into lists. But with the namedtuple_as_dict
option you have a drop-in replacement for dataclasses:
If you want to serialize only certain named tuple fields as dictionaries, you
can use the corresponding serialization and
deserialization engines.
allow_postponed_evaluation config option
PEP 563 solved the problem of forward references by postponing the evaluation
of annotations, so you can write the following code:
from __future__ import annotations
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
@dataclass
class A(DataClassDictMixin):
    x: B
@dataclass
class B(DataClassDictMixin):
    y: int
obj = A.from_dict({'x': {'y': 1}})
You don't need to write anything special here, forward references work out of
the box. If a field of a dataclass has a forward reference in the type
annotations, building of from_* and to_* methods of this dataclass
will be postponed until they are called once. However, if for some reason you
don't want the evaluation to be possibly postponed, you can disable it using
allow_postponed_evaluation option:
from __future__ import annotations
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
@dataclass
class A(DataClassDictMixin):
    x: B
    class Config:
        allow_postponed_evaluation = False
# UnresolvedTypeReferenceError: Class A has unresolved type reference B
# in some of its fields
@dataclass
class B(DataClassDictMixin):
    y: int
In this case you will get UnresolvedTypeReferenceError regardless of whether
class B is declared below or not.
dialect config option
This option is described below in the
Dialects section.
orjson_options config option
This option changes the default options of the orjson.dumps encoder, which is
used in DataClassORJSONMixin. For example, you can
tell orjson to handle non-str dict keys as the built-in json.dumps
encoder does. See the orjson documentation
to read more about these options.
import orjson
from dataclasses import dataclass
from typing import Dict
from mashumaro.config import BaseConfig
from mashumaro.mixins.orjson import DataClassORJSONMixin
@dataclass
class MyClass(DataClassORJSONMixin):
    x: Dict[int, int]
    class Config(BaseConfig):
        orjson_options = orjson.OPT_NON_STR_KEYS
assert MyClass({1: 2}).to_json() == '{"x":{"1":2}}'
lazy_compilation config option
By using this option, the compilation of the from_* and to_* methods will
be deferred until they are first called. This will reduce the import time
and, in certain instances, may enhance the speed of deserialization
by leveraging the data that is accessible after the class has been created.
[!CAUTION]
If you need to save a reference to a from_* or to_* method, you should
do it after the method is compiled. To be safe, you can always use a lambda
function:
sort_keys config option
When set, the keys on serialized dataclasses will be sorted in alphabetical order.
Unlike the sort_keys option in the standard library's json.dumps function, this option acts at class creation time and has no effect on the performance of serialization.
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
@dataclass
class SortedDataClass(DataClassDictMixin):
    foo: int
    bar: int
    class Config(BaseConfig):
        sort_keys = True
t = SortedDataClass(1, 2)
assert t.to_dict() == {"bar": 2, "foo": 1}
forbid_extra_keys config option
When set, the deserialization of dataclasses will fail if the input dictionary contains keys that are not present in the dataclass.
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
@dataclass
class DataClass(DataClassDictMixin):
    a: int
    class Config(BaseConfig):
        forbid_extra_keys = True
DataClass.from_dict({"a": 1, "b": 2}) # ExtraKeysError: Extra keys: {'b'}
It plays well with the aliases and allow_deserialization_not_by_alias options.
Passing field values as is
In some cases you need to pass a field value as is, without any changes,
during serialization / deserialization. There is a predefined
pass_through
object that can be used as the serialization_strategy or
serialize / deserialize options:
Extending existing types
There are situations where you might want some values of the same type to be
treated as their own type. You can create new logical types with
NewType,
Annotated
or TypeAliasType
and register serialization strategies for them:
Although using NewType is usually the most reliable way to avoid logical
errors, you have to pay for it with notable overhead. If you are creating
dataclass instances manually, then you know that type checkers will
force you to wrap a value in your "NewType" callable, which leads
to performance degradation:
$ python -m timeit -s "from typing import NewType; MyInt = NewType('MyInt', int)" "MyInt(42)"
10000000 loops, best of 5: 31.1 nsec per loop
$ python -m timeit -s "from typing import NewType; MyInt = NewType('MyInt', int)" "42"
50000000 loops, best of 5: 4.35 nsec per loop
However, when you create dataclass instances using the from_* method provided
by one of the mixins or using one of the decoders, there will be no performance
degradation, because the value won't be wrapped in the callable in the
generated code. Therefore, if performance is more important to you than
catching logical errors with type checkers, and you are actively creating or
changing dataclasses manually, then you should take a closer look at using
Annotated.
Field aliases
In some cases it's better to have different names for a field in your dataclass
and in its serialized view. For example, a third-party legacy API you are
working with might operate with camel case style, but you stick to snake case
style in your code base. Or you want to load data with keys that are
invalid identifiers in Python. Aliases can solve this problem.
There are multiple ways to assign an alias:
Using Alias(...) annotation in a field type
Using alias parameter in field metadata
Using aliases parameter in a dataclass config
By default, aliases only affect deserialization, but they can be extended to
serialization as well. If you want to serialize all the fields by aliases, you
have two options to do so:
serialize_by_alias config option
by_alias keyword argument in to_* methods
Here is an example with Alias annotation in a field type:
from dataclasses import dataclass
from typing import Annotated
from mashumaro import DataClassDictMixin
from mashumaro.types import Alias
@dataclass
class DataClass(DataClassDictMixin):
    foo_bar: Annotated[int, Alias("fooBar")]
obj = DataClass.from_dict({"fooBar": 42}) # DataClass(foo_bar=42)
obj.to_dict() # {"foo_bar": 42} # no aliases on serialization by default
The same with field metadata:
from dataclasses import dataclass, field
from mashumaro import field_options
@dataclass
class DataClass:
    foo_bar: str = field(metadata=field_options(alias="fooBar"))
And with a dataclass config:
from dataclasses import dataclass
from mashumaro.config import BaseConfig
@dataclass
class DataClass:
    foo_bar: str
    class Config(BaseConfig):
        aliases = {"foo_bar": "fooBar"}
[!TIP]
If you want to deserialize all the fields by their names as well as by their aliases,
there is a config option
for that.
Dialects
Sometimes you need different serialization and deserialization methods
depending on the data source where entities of the dataclass are stored, or on
the API that the entities are sent to or received from.
There is a special Dialect type that may contain all the differences from the
default serialization and deserialization methods. You can create different
dialects and use each of them for the same dataclass depending on
the situation.
Suppose we have the following dataclass with a field of type date:
@dataclass
class Entity(DataClassDictMixin):
    dt: date
By default, a field of date type serializes to a string in ISO 8601 format,
so the serialized entity will look like {'dt': '2021-12-31'}. But what if we
have, for example, two sensitive legacy Ethiopian and Japanese APIs that use
two different formats for dates — dd/mm/yyyy and yyyy年mm月dd日? Instead of
creating two similar dataclasses we can have one dataclass and two dialects:
serialization_strategy dialect option
This dialect option has the same meaning as the
similar config option
but for the dialect scope. You can register custom SerializationStrategy
instances, or serialize and deserialize methods, for specific types.
serialize_by_alias dialect option
This dialect option has the same meaning as the
similar config option
but for the dialect scope.
omit_none dialect option
This dialect option has the same meaning as the
similar config option but for the dialect scope.
omit_default dialect option
This dialect option has the same meaning as the
similar config option but for the dialect scope.
namedtuple_as_dict dialect option
This dialect option has the same meaning as the
similar config option
but for the dialect scope.
no_copy_collections dialect option
By default, all collection data types are serialized as a copy to prevent
mutation of the original collection. For example, if a dataclass contains
a field of type list[str], it will be serialized as a copy of the
original list, so you can safely mutate the result afterwards. The downside is
that copying is always slower than using a reference to the original collection.
In some cases we know beforehand that mutation doesn't take place or is even
desirable, so we can avoid unnecessary copies by setting
no_copy_collections to a sequence of origin collection data types.
This is applicable only for collections containing elements that do not
require conversion.
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
from mashumaro.dialect import Dialect
class NoCopyDialect(Dialect):
    no_copy_collections = (list, dict, set)
@dataclass
class DataClass(DataClassDictMixin):
    simple_list: list[str]
    simple_dict: dict[str, str]
    simple_set: set[str]
    class Config(BaseConfig):
        dialect = NoCopyDialect
obj = DataClass(["foo"], {"bar": "baz"}, {"foobar"})
data = obj.to_dict()
assert data["simple_list"] is obj.simple_list
assert data["simple_dict"] is obj.simple_dict
assert data["simple_set"] is obj.simple_set
This option is enabled for list and dict in the default dialects that
belong to mixins and codecs for the following formats:
You can change the default serialization and deserialization methods not only
in the serialization_strategy config
option but also using the dialect config option. If you have multiple
dataclasses without a common parent class, the default dialect can help you
reduce the amount of code you write:
There is a special Discriminator class that allows you to customize how
a union of dataclasses or their hierarchy will be deserialized.
It has the following parameters that affect class selection rules:
field — optional name of the input dictionary key (also known as tag)
by which all the variants can be distinguished
include_subtypes — allows deserializing subclasses
include_supertypes — allows deserializing superclasses
variant_tagger_fn — a custom function used to generate tag values
associated with a variant
By default, each variant that you want to discriminate by tags should have a
class-level attribute containing an associated tag value. This attribute should
have the name defined by the field parameter. The tag value could be in the
following forms:
without annotations: type = 42
annotated as ClassVar: type: ClassVar[int] = 42
annotated as Final: type: Final[int] = 42
annotated as Literal: type: Literal[42] = 42
annotated as StrEnum: type: ResponseType = ResponseType.OK
[!NOTE]
Keep in mind that by default only Final, Literal and StrEnum fields are
processed during serialization.
However, it is possible to use discriminator without the class-level
attribute. You can provide a custom function that generates one or many variant
tag values. This function should take a class as the only argument and return
either a single value of the basic type like str or int or a list of them
to associate multiple tags with a variant. The common practice is to use
a class name as a single tag value:
variant_tagger_fn = lambda cls: cls.__name__
Next, we will look at different use cases, as well as their pros and cons.
Subclasses distinguishable by a field
Often you have a base dataclass and multiple subclasses that are easily
distinguishable from each other by the value of a particular field.
For example, there may be different events, messages or requests with
a discriminator field "event_type", "message_type" or just "type". You could
list all of them in a Union type, but that would be verbose and impractical.
Moreover, deserialization of the union would be slow, since we need to iterate
over each variant in the list until we find the right one.
We can speed up subclass deserialization by using Discriminator as an
annotation within an Annotated type. We will use the field parameter and set
include_subtypes to True.
[!IMPORTANT]
The discriminator field should be accessible from the __dict__ attribute
of a specific descendant, i.e. defined at the level of that descendant.
A descendant class without a discriminator field will be ignored, but
its descendants won't.
Suppose we have a hierarchy of client events distinguishable by a class
attribute "type":
from dataclasses import dataclass
from ipaddress import IPv4Address
from mashumaro import DataClassDictMixin
@dataclass
class ClientEvent(DataClassDictMixin):
    pass
@dataclass
class ClientConnectedEvent(ClientEvent):
    type = "connected"
    client_ip: IPv4Address
@dataclass
class ClientDisconnectedEvent(ClientEvent):
    type = "disconnected"
    client_ip: IPv4Address
We use base dataclass ClientEvent for a field of another dataclass:
from typing import Annotated, List  # or from typing_extensions import Annotated
from mashumaro.types import Discriminator
@dataclass
class AggregatedEvents(DataClassDictMixin):
    list: List[
        Annotated[
            ClientEvent, Discriminator(field="type", include_subtypes=True)
        ]
    ]
Now we can deserialize events based on "type" value:
In rare cases you have to deal with subclasses that don't have a common field
name by which they can be distinguished. Since Discriminator can be
initialized without the "field" parameter, you can use it with only
include_subtypes enabled. The drawback is that we will have to go through all
the subclasses until we find a suitable one. It's almost like using a Union
type, but with subclass support.
Suppose we're making a brunch. We have some ingredients:
plate = Plate.from_dict(
{
"ingredients": [
{
"name": "hummus from the shop",
"made_of": "chickpeas",
"grams": 150,
},
{"name": "celery from my garden", "pieces": 5},
]
}
)
assert plate == Plate(
ingredients=[
Hummus(name="hummus from the shop", made_of="chickpeas", grams=150),
Celery(name="celery from my garden", pieces=5),
]
)
In some cases it's necessary to fall back to the base class if there is no
suitable subclass. We can set include_supertypes to True:
@dataclass
class Plate(DataClassDictMixin):
    ingredients: List[
        Annotated[
            Ingredient,
            Discriminator(include_subtypes=True, include_supertypes=True),
        ]
    ]
plate = Plate.from_dict(
{
"ingredients": [
{
"name": "hummus from the shop",
"made_of": "chickpeas",
"grams": 150,
},
{"name": "celery from my garden", "pieces": 5},
{"name": "cumin"} # <- new unknown ingredient
]
}
)
assert plate == Plate(
ingredients=[
Hummus(name="hummus from the shop", made_of="chickpeas", grams=150),
Celery(name="celery from my garden", pieces=5),
Ingredient(name="cumin"), # <- unknown ingredient added
]
)
Class level discriminator
It may often be more convenient to specify a Discriminator once at the class
level and use that class without Annotated type for subclass deserialization.
Depending on the Discriminator parameters, it can be used as a replacement for
subclasses distinguishable by a field
as well as for subclasses without a common field.
The only difference is that you can't use include_supertypes=True because
it would lead to a recursion error.
The reworked example looks like this:
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import List
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
from mashumaro.types import Discriminator
@dataclass
class ClientEvent(DataClassDictMixin):
    class Config(BaseConfig):
        discriminator = Discriminator(  # <- add discriminator
            field="type",
            include_subtypes=True,
        )
@dataclass
class ClientConnectedEvent(ClientEvent):
    type = "connected"
    client_ip: IPv4Address
@dataclass
class ClientDisconnectedEvent(ClientEvent):
    type = "disconnected"
    client_ip: IPv4Address
@dataclass
class AggregatedEvents(DataClassDictMixin):
    list: List[ClientEvent]  # <- use base class here
And now we can deserialize events based on "type" value as we did earlier:
The same is applicable for subclasses without a common field:
@dataclass
class Ingredient(DataClassDictMixin):
    name: str
    class Config:
        discriminator = Discriminator(include_subtypes=True)
...
celery = Ingredient.from_dict({"name": "celery from my garden", "pieces": 5})
assert celery == Celery(name="celery from my garden", pieces=5)
Working with union of classes
Deserializing a union of types distinguishable by a particular field is
much faster with Discriminator, because there is no need to go through all
the classes and try to deserialize each of them in turn.
This approach is typically used when you have multiple classes without a
common superclass or when you only need to deserialize some of the subclasses.
In the following example we will use include_supertypes=True to
deserialize two subclasses out of three:
Again, it's not necessary to have a common superclass. If you have a union of
dataclasses without a field by which they can be distinguished, you can still
use Discriminator, but deserialization will be almost the same as for a Union
type without Discriminator, except that subclasses can also be deserialized
with include_subtypes=True.
[!IMPORTANT]
When both include_subtypes and include_supertypes are enabled,
all subclasses will be attempted to be deserialized first,
superclasses — at the end.
In the following example you can see how priority works — first we try
to deserialize ChickpeaHummus, and if it fails, then we try Hummus:
Sometimes it is impractical to have a class-level attribute with a tag value,
especially when you have a lot of classes. We can have a custom tagger
function instead. This method is applicable for all scenarios of using
the discriminator, but for demonstration purposes, let's focus only on one
of them.
Suppose we want to use the middle part of Client*Event as a tag value:
from dataclasses import dataclass
from ipaddress import IPv4Address
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
from mashumaro.types import Discriminator
def client_event_tagger(cls):
    # not the best way of doing it, it's just a demo
    return cls.__name__[6:-5].lower()
@dataclass
class ClientEvent(DataClassDictMixin):
    class Config(BaseConfig):
        discriminator = Discriminator(
            field="type",
            include_subtypes=True,
            variant_tagger_fn=client_event_tagger,
        )
@dataclass
class ClientConnectedEvent(ClientEvent):
    client_ip: IPv4Address
@dataclass
class ClientDisconnectedEvent(ClientEvent):
    client_ip: IPv4Address
If we need to associate multiple tags with a single variant, we can return
a list of tags:
def client_event_tagger(cls):
    name = cls.__name__[6:-5]
    return [name.lower(), name.upper()]
Code generation options
Add omit_none keyword argument
If you want to have control over whether to skip None values on serialization
you can add omit_none parameter to to_* methods using the
code_generation_options list. The default value of omit_none
parameter depends on whether the omit_none
config option or omit_none dialect option is enabled.
from dataclasses import dataclass
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig, TO_DICT_ADD_OMIT_NONE_FLAG
@dataclass
class Inner(DataClassDictMixin):
    x: int = None
    # "x" won't be omitted since there is no TO_DICT_ADD_OMIT_NONE_FLAG here
@dataclass
class Model(DataClassDictMixin):
    x: Inner
    a: int = None
    b: str = None  # will be omitted
    class Config(BaseConfig):
        code_generation_options = [TO_DICT_ADD_OMIT_NONE_FLAG]
Model(x=Inner(), a=1).to_dict(omit_none=True) # {'x': {'x': None}, 'a': 1}
Add by_alias keyword argument
If you want to have control over whether to serialize fields by their
aliases you can add by_alias parameter to to_* methods
using the code_generation_options list. The default value of by_alias
parameter depends on whether the serialize_by_alias
config option is enabled.
from dataclasses import dataclass, field
from mashumaro import DataClassDictMixin, field_options
from mashumaro.config import BaseConfig, TO_DICT_ADD_BY_ALIAS_FLAG
@dataclass
class DataClass(DataClassDictMixin):
    field_a: int = field(metadata=field_options(alias="FieldA"))
    class Config(BaseConfig):
        code_generation_options = [TO_DICT_ADD_BY_ALIAS_FLAG]
DataClass(field_a=1).to_dict() # {'field_a': 1}
DataClass(field_a=1).to_dict(by_alias=True) # {'FieldA': 1}
Add dialect keyword argument
Support for dialects is disabled by default for performance reasons. You can enable
it using the ADD_DIALECT_SUPPORT constant:
from dataclasses import dataclass
from datetime import date
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig, ADD_DIALECT_SUPPORT
@dataclass
class Entity(DataClassDictMixin):
    dt: date
    class Config(BaseConfig):
        code_generation_options = [ADD_DIALECT_SUPPORT]
Add context keyword argument
Sometimes you need to pass a "context" object to the serialization hooks so
that they can take it into account. For example, you might want an option
to remove sensitive data from the serialization result.
You can add a context parameter to the to_* methods that will be passed to the
__pre_serialize__ and
__post_serialize__ hooks. The type of this context
as well as its mutability is up to you.
Along with user-defined generic types
implementing SerializableType interface, generic and variadic
generic dataclasses can also be used. There are two applicable scenarios
for them.
Generic dataclass inheritance
If you have a generic dataclass and want to serialize and deserialize its
instances depending on the concrete types, you can use inheritance for that:
You can override a TypeVar field with a concrete type or another TypeVar.
Partial specification of concrete types is also allowed. If a generic dataclass
is inherited without overriding its type parameters, the types of its fields
remain untouched.
Generic dataclass in a field type
Another approach is to specify concrete types in the field type hints. This can
help to have different versions of the same generic dataclass:
from dataclasses import dataclass
from datetime import date
from typing import Generic, TypeVar
from mashumaro import DataClassDictMixin
T = TypeVar('T')
@dataclass
class GenericDataClass(Generic[T], DataClassDictMixin):
    x: T
@dataclass
class DataClass(DataClassDictMixin):
    date: GenericDataClass[date]
    str: GenericDataClass[str]
instance = DataClass(
date=GenericDataClass(x=date(2021, 1, 1)),
str=GenericDataClass(x='2021-01-01'),
)
dictionary = {'date': {'x': '2021-01-01'}, 'str': {'x': '2021-01-01'}}
assert DataClass.from_dict(dictionary) == instance
GenericSerializableType interface
There is a generic alternative to SerializableType
called GenericSerializableType. It makes it possible to decide yourself how
to serialize and deserialize input data depending on the types provided:
As you can see, the code turns out to be massive compared to the
alternative but in rare cases such flexibility
can be useful. You should think twice about whether it's really worth using it.
Serialization hooks
In some cases you need to prepare input / output data or do some extraordinary
actions at different stages of the deserialization / serialization lifecycle.
You can do this with different types of hooks.
Before deserialization
For doing something with a dictionary that will be passed to deserialization
you can use __pre_deserialize__ class method:
from dataclasses import dataclass
from typing import Any, Dict
from mashumaro.mixins.json import DataClassJSONMixin
@dataclass
class DataClass(DataClassJSONMixin):
    abc: int
    @classmethod
    def __pre_deserialize__(cls, d: Dict[Any, Any]) -> Dict[Any, Any]:
        return {k.lower(): v for k, v in d.items()}
print(DataClass.from_dict({"ABC": 123}))  # DataClass(abc=123)
print(DataClass.from_json('{"ABC": 123}'))  # DataClass(abc=123)
After deserialization
For doing something with a dataclass instance that was created as a result
of deserialization you can use __post_deserialize__ class method:
For simple one-time cases, it's recommended to start with the configurable
build_json_schema function. It returns a JSONSchema object that can be
serialized to JSON or to a dict:
from dataclasses import dataclass, field
from typing import List
from uuid import UUID
from mashumaro.jsonschema import build_json_schema
@dataclass
class User:
    id: UUID
    name: str = field(metadata={"description": "User name"})
print(build_json_schema(List[User]).to_json())
Dataclass JSON Schemas may or may not be placed in the
definitions
section, depending on the all_refs parameter, whose default value comes
from the dialect used (False for Draft 2020-12, True for OpenAPI
Specification 3.1.0).
The omitted definitions can be found later in the Context object that
you might have created and passed to the function, but it may be easier
to use JSONSchemaBuilder for that. For example, you might find it handy
to build an OpenAPI Specification step by step, passing your models to the
builder and retrieving all the registered definitions at the end. This builder
has reasonable defaults but can be customized if necessary.
Apart from the required keywords, which are added automatically for certain
data types, you're free to use additional validation keywords.
They're represented by the corresponding classes in
mashumaro.jsonschema.annotations.
You can also change the "additionalProperties" key to a specific schema
by passing it a JSONSchema instance instead of a bool value.
JSON Schema and custom serialization methods
Mashumaro provides different ways to override default serialization methods for
dataclass fields or specific data types. In order for these overrides to be
reflected in the schema, you need to make sure that the methods have
annotations of the return value type.
from dataclasses import dataclass, field
from mashumaro.config import BaseConfig
from mashumaro.jsonschema import build_json_schema
def str_as_list(s: str) -> list[str]:
    return list(s)
def int_as_str(i: int) -> str:
    return str(i)
@dataclass
class FooBar:
    foo: str = field(metadata={"serialize": str_as_list})
    bar: int
    class Config(BaseConfig):
        serialization_strategy = {
            int: {
                "serialize": int_as_str
            }
        }
print(build_json_schema(FooBar).to_json())