
Transform is the main building block of data pipelines in fastai. And elsewhere if you want.
Install latest from the GitHub repository:
$ pip install git+https://github.com/AnswerDotAI/fasttransform.git
or from PyPI:
$ pip install fasttransform
Transform is a class that lets you create reusable data transformations.
You initialize a Transform by passing in or decorating a raw function.
The Transform then provides an enhanced version of that function via Transform.encodes, which can be used in your data pipeline.
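It provides various conveniences, each described below: reversibility, setup from the data set, type-based dispatch, and return-type handling.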
The simplest way to create a Transform is by decorating a function:
from fasttransform import Transform, Pipeline
@Transform
def add_one(x):
    return x + 1
# Usage
add_one(2)
3
To make a transform reversible, you provide the raw function and its inverse. This is useful in data pipelines where, for instance, you might want to normalize and then de-normalize numerical values, or encode to category indexes and then decode back to categories.
def enc(x): return x*2
def dec(x): return x//2
t = Transform(enc,dec)
t(2), t.decode(2), t.decode(t(2))
(4, 1, 2)
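Note that the pair above is only a one-sided inverse: dec undoes enc, but enc does not undo dec for odd inputs, because floor division drops the remainder. The check below is a small sketch in plain Python with no fasttransform import; round_trips and is_stable are hypothetical helper names used only for illustration.

```python
# Plain functions only; these are not fasttransform objects.
def enc(x): return x * 2
def dec(x): return x // 2

def round_trips(x):
    """True when decoding an encoded value gives back the original."""
    return dec(enc(x)) == x

def is_stable(y):
    """True when encoding a decoded value gives back the original."""
    return enc(dec(y)) == y

# Decoding undoes encoding for every integer tried here.
print(all(round_trips(x) for x in range(-5, 6)))
# Encoding does not undo decoding for odd values: 3 // 2 * 2 is 2, not 3.
print(is_stable(4), is_stable(3))
```

If you need decode to be exact in both directions, choose a pair such as x*2.0 and x/2.0, as the Pipeline example later does.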
You can customize an individual Transform instance at initialization time, so that it can depend on aggregate properties of the data set.
Here we define a z-score normalization Transform by subclassing Transform and defining the encodes and decodes methods directly, with setups computing the statistics from the data:
import statistics
class NormalizeMean(Transform):
    def setups(self, items):
        self.mean = statistics.mean(items)
        self.std = statistics.stdev(items)
    def encodes(self, x):
        return (x - self.mean) / self.std
    def decodes(self, x):
        return x * self.std + self.mean
normalize = NormalizeMean()
normalize.setup([1, 2, 3, 4, 5])
normalize.mean
3
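To make the arithmetic behind this normalization concrete, here is the same calculation as a sketch in plain Python, without fasttransform. The free functions encode and decode are hypothetical stand-ins for the encodes and decodes methods above. One detail worth knowing is that statistics.stdev is the sample standard deviation, which divides by n - 1.

```python
import math
import statistics

items = [1, 2, 3, 4, 5]
mean = statistics.mean(items)   # 3
std = statistics.stdev(items)   # sample standard deviation (divides by n - 1)

def encode(x):
    # z-score: distance from the mean in units of standard deviation
    return (x - mean) / std

def decode(z):
    # inverse of encode: scale back up, then shift by the mean
    return z * std + mean

z = encode(5)
print(z, decode(z))
# The squared deviations sum to 10, so the sample variance is 10 / 4 = 2.5.
print(math.isclose(std, math.sqrt(2.5)))
```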
Instead of providing one raw function, you can provide multiple raw functions which differ in their parameter types. Transform will use type-based dispatch to automatically execute the correct function.
This is handy when your inputs come in different types (e.g., different image formats, different numerical types).
def inc1(x:int): return x+1
def inc2(x:str): return x+"a"
t = Transform(enc=(inc1,inc2))
t(5), t('b')
(6, 'ba')
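If an input type does not match any of the type annotations then the original input is returned. Functions with no parameter annotation, such as add_one and the encodes method of normalize defined earlier, accept any input type: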
add_one(2.0)
3.0
normalize(3.0)
0.0
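The dispatch-with-fallback behavior described above can be approximated with the standard library's functools.singledispatch. This is only an analogy and makes no claim about how fasttransform implements dispatch; inc is a hypothetical name.

```python
from functools import singledispatch

@singledispatch
def inc(x):
    # Fallback for types with no registered implementation:
    # hand the input back unchanged.
    return x

@inc.register
def _(x: int):
    return x + 1

@inc.register
def _(x: str):
    return x + "a"

# int and str pick their own implementation; float falls through untouched.
print(inc(5), inc("b"), inc(2.0))
```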
A Transform’s encodes or decodes notes the return type of its raw function, which may be defined explicitly or implicitly, and enhances type-handling behavior in three ways:
Guaranteed return type. It will always return the return type of the raw function, promoting values if necessary.
Type preservation. It will return the runtime type of its argument, whenever that is a subtype of the return type.
Opt-out conversion. If you explicitly mark the raw function’s return type as None, then it will not perform any type conversion or preservation.
Examples help make this clear:
Say you define FS, a subclass of float. The usual Python type promotion behavior means that an FS times a float is still a float:
class FS(float):
    def __repr__(self): return f'FS({float(self)})'
f1 = float(1)
FS2 = FS(2)
val = f1 * FS2
type(val) # => float
float
With Transform, you can define a new multiplication operation which is guaranteed to return an FS, because Transform reads the raw function’s annotated return type:
def double_FS(x)->FS: return FS(2)*x
t = Transform(double_FS)
val = t(1)
assert isinstance(val,FS)
val
FS(2.0)
Let us say that we define a transform without any return type annotation, so that the raw function is defined only by the behavior of multiplying its argument by the float 2.0.
Multiplying the subtype FS with the float value 2 would normally return a float. However, Transform’s encodes will preserve the runtime type of its argument, so that it returns FS:
def double(x): return x*2.0 # no type annotation
t = Transform(double)
fs1 = FS(1)
val = t(fs1)
assert isinstance(val,FS)
val # => FS(2), an FS value of 2
FS(2.0)
Sometimes you don’t want Transform to do any type-based logic. You can opt out of this system by declaring that your raw function’s return type is None:
def double_none(x) -> None: return x*2.0 # "None" return type means "no conversion"
t = Transform(double_none)
fs1 = FS(1)
val = t(fs1)
assert isinstance(val,float)
val # => 2.0, a float of 2, because of fallback to standard Python type logic
2.0
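The three rules illustrated above can be approximated in a few lines of plain Python. The sketch below is an assumption-laden illustration of the rules as this README states them, not fasttransform's actual implementation; call_with_type_rules is a hypothetical helper, and it only handles single-argument functions whose return types can be constructed from the raw result.

```python
import inspect

class FS(float):
    def __repr__(self): return f'FS({float(self)})'

def call_with_type_rules(fn, x):
    """Apply fn to x, then adjust the result's type per the three rules."""
    result = fn(x)
    annotation = inspect.signature(fn).return_annotation
    if annotation is None:
        # Opt-out conversion: "-> None" means leave the raw result alone.
        return result
    # Use the explicit annotation if present, else the type fn produced.
    if annotation is inspect.Signature.empty:
        ret_type = type(result)
    else:
        ret_type = annotation
    if isinstance(x, ret_type) and not isinstance(result, type(x)):
        # Type preservation: the argument is a subtype of the return type.
        return type(x)(result)
    if not isinstance(result, ret_type):
        # Guaranteed return type: promote the raw result.
        return ret_type(result)
    return result

def double_FS(x) -> FS: return FS(2) * x
def double(x): return x * 2.0
def double_none(x) -> None: return x * 2.0

print(repr(call_with_type_rules(double_FS, 1)))        # annotated as FS
print(repr(call_with_type_rules(double, FS(1))))       # FS argument preserved
print(repr(call_with_type_rules(double_none, FS(1))))  # plain float
```

Checking the opt-out case first keeps the other two rules from ever seeing a None annotation, which mirrors the way the README presents it as an escape hatch.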
Transforms can be combined into larger Pipelines:
def double(x): return x*2.0
def halve(x): return x/2.0
dt = Transform(double,halve)
class NormalizeMean(Transform):
    def setups(self, items):
        self.mean = statistics.mean(items)
        self.std = statistics.stdev(items)
    def encodes(self, x):
        return (x - self.mean) / self.std
    def decodes(self, x):
        return x * self.std + self.mean
p = Pipeline((dt, normalize))
v = p(5)
v
4.427188724235731
p.decode(v)
5.0
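The round trip above works because encoding applies the steps in order while decoding applies their inverses in reverse order. A minimal sketch of that composition rule in plain Python follows; TinyPipeline is a hypothetical stand-in written for this illustration and is not fasttransform's Pipeline.

```python
import math
import statistics

class TinyPipeline:
    """Hypothetical stand-in: holds (encode, decode) function pairs."""
    def __init__(self, steps):
        self.steps = list(steps)

    def __call__(self, x):
        # Encode: apply each step's encoder, first to last.
        for enc, _ in self.steps:
            x = enc(x)
        return x

    def decode(self, x):
        # Decode: apply each step's decoder, last to first.
        for _, dec in reversed(self.steps):
            x = dec(x)
        return x

items = [1, 2, 3, 4, 5]
mean, std = statistics.mean(items), statistics.stdev(items)

p = TinyPipeline([
    (lambda x: x * 2.0, lambda x: x / 2.0),                  # double / halve
    (lambda x: (x - mean) / std, lambda z: z * std + mean),  # z-score
])

v = p(5)               # (5 * 2.0 - 3) / std
print(v, p.decode(v))  # decoding recovers the input up to float rounding
```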
This was just a quickstart. Learn more by reading the documentation.