pathtree
Named path hierarchies
The goal of this package is to make it easier to both define and understand folder structures in your scripts and pipelines. I find it difficult to visualize file structure when looking at a series of os.path.join
commands scattered throughout a script, so instead, you define all of the paths up top, and fill in the placeholders later.
Install
pip install path-tree
Tutorial
Quick start
NOTE: I want to update this to be more illustrative and less verbose. The problem is that this package can do a lot, but it's hard to show it concisely out of context. I'm not sure when I'll get a chance, but I'll try to get to it!
Also, I apologize for not having an API Reference. I haven't quite ironed that into my deployment pipeline yet!
import pathtree
paths = pathtree.tree('./logs', {
'{log_id}': {
'model.h5': 'model',
'model_spec.pkl': 'model_spec',
'plots': {
'epoch_{i_epoch:04d}': {
'{plot_name}.png': 'plot',
'': 'plot_dir'
}
}
}
})
...
paths.update(log_id=12345)
paths.update(log_id=12346)
...
plt.imwrite(paths.plot.format(i_epoch=5, plot_name='f1_score'))
The old way - makes my brain sad
Usually, you end up defining paths doing something like this (unless I'm doing something weird/dumb - lmk !!). And often,
these end up scattered throughout your project and getting a high level
picture of your path hierarchy is difficult.
base_log_dir = './blah/logs'
run_dir = os.path.join(base_log_dir, log_id)
resources_dir = os.path.join(run_dir, 'resources')
...
model_file = os.path.join(run_dir, 'model.h5')
model_spec = os.path.join(run_dir, 'model_spec.pkl')
...
pump_file = os.path.join(resources_dir, 'pump.pkl')
...
plot_dir = os.path.join(run_dir, 'plots', 'epoch_{i_epoch:04d}')
plot_file = os.path.join(plot_dir, '{plot_name}.png')
The new way ! *brain smiles*
Instead, you can define your path hierarchy all in one place and give each tree node a name.
import os
import pathtree
base_paths = pathtree.tree('./logs', {
'{log_id}': {
'model.h5': 'model',
'model_spec.pkl': 'model_spec',
'plots': {
'epoch_{i_epoch:04d}': {
'{plot_name}.png': 'plot',
'': 'plot_dir'
}
}
}
})
paths = base_paths.specify(log_id=12345)
Basic Concepts
Paths
- a collection of paths items as defined using pathtree.tree
.
Essentially, it is a wrapper around a flat dictionary of name -> path
and a data dictionary that is provided to all paths format.
Path
- a single path. It extends os.PathLike
so it can be used where
path objects are expected (e.g. open(path).read()
). It is a wrapper around a pathlib.Path
object and a data dictionary.
It provides basic path operations (join('subdir')
, .up(2)
to go up 2 parent directories, .glob()
glob replacing missing fields with '*'
).
Conversion to string
You can access the paths using the name defined in the tree:
assert str(paths.model_spec) == './logs/12345/model_spec.pkl'
A Paths
object (as defined above) is really just a dictionary of name => pathtree.Path
objects. This is a wrapper around string format pattern and a data dictionary.
To convert a Path
to a string, there are a few ways to get what you want.
For fully specified strings (meaning that str.format will run without a KeyError), all three of these methods return the same thing: a fully formatted string.
assert paths.model.format() == './logs/12345/model.h5'
assert paths.model.partial_format() == './logs/12345/model.h5'
assert paths.model.maybe_format() == './logs/12345/model.h5'
For an underspecified string (missing data keys), the return values are different:
try:
paths.plot.format(plot_name='f1_score')
assert False
except KeyError:
assert True
assert (paths.plot.partial_format(plot_name='f1_score') ==
'./logs/12345/plots/epoch_{i_epoch:04d}/f1_score.png')
assert isinstance(
paths.plot.maybe_format(plot_name='f1_score'), pathtree.Path)
Paths are castable to string which is the same as partial_format
. You can also access the unformatted path using Path().path
.
assert str(paths.plot) == paths.plot.partial_format()
assert paths.plot.path == the_unformatted_plot_path
pathtree.Path
subclasses os.PathLike
, meaning that os.path functions know how to convert it to a path automatically.
assert isinstance(paths.model, os.PathLike)
assert os.path.join(paths.model) == paths.model.format()
assert os.path.isfile(paths.model)
Updating format data
You can add path specificity at various points along the way. This is helpful when you need to reference subdirectories based on some loops or similar pattern. You can do that in a couple ways:
paths.update(log_id=12345)
paths2 = paths.specify(log_id=23456)
assert paths.data['log_id'] == 12345 and paths2.data['log_id'] = 23456
paths2 = paths2.unspecify('log_id')
assert 'log_id' not in paths2.data
plot_file = paths.plot
plot_file.update(plot_name='f1_score')
plot_file = paths.plot.specify(plot_name='f1_score')
assert 'plot_name' not in paths.data
assert plot_file.data['plot_name'] == 'f1_score'
plot_file = plot_file.unspecify('plot_name')
assert 'plot_name' not in plot_file.data
Additional Features
You can automatically do glob searching. Any missing fields will be filled with a glob wildcard (asterisk). Note that this would fail using plain string format because the leading zero formatter (:04d
) will throw an error if you try to insert '*'
(because it's a string).
plot_file = paths.plot.specify(plot_name='f1_score')
assert (plot_file.partial_format() ==
'./logs/12345/plots/epoch_{i_epoch:04d}/f1_score.png')
assert (plot_file.glob_pattern ==
'./logs/12345/plots/epoch_*/f1_score.png')
assert (plot_file.glob() ==
glob.glob('./logs/12345/plots/epoch_*/f1_score.png'))
You can also sometimes parse out data from a formatted string. Be warned,
it may not always work correctly because sometimes the parsing is ambiguous. See https://github.com/r1chardj0n3s/parse#potential-gotchas
plot_file = paths.plot.specify(root='some/logs')
assert (plot_file.partial_format() ==
'some/logs/12345/plots/epoch_{i_epoch:04d}/{plot_name}.png')
expected = {
'root': 'some/logs'
'log_id': '12345',
'i_epoch': '0002',
'plot_name': 'f1_score.png',
}
plot_data = plot_file.parse('./logs/12345/plots/0002/f1_score.png')
assert set(plot_data.items()) == set(expected.items())