
resource-backed-dask-array
experimental Dask array that opens/closes a resource when computing
ResourceBackedDaskArray is an experimental Dask array subclass that opens/closes a resource when computing (but only once per compute call).
pip install resource-backed-dask-array
Consider the following class that simulates a file reader capable of returning a dask array (using dask.array.map_blocks). The file handle must be open in order to read a chunk; otherwise it segfaults (or otherwise errors):
import dask.array as da
import numpy as np


class FileReader:
    def __init__(self):
        self._closed = False

    def close(self):
        """close the imaginary file"""
        self._closed = True

    @property
    def closed(self):
        return self._closed

    def __enter__(self):
        if self.closed:
            self._closed = False  # open
        return self

    def __exit__(self, *_):
        self.close()

    def to_dask(self) -> da.Array:
        """Method that returns a dask array for this file."""
        return da.map_blocks(
            self._dask_block,
            chunks=((1,) * 4, 4, 4),
            dtype=float,
        )

    def _dask_block(self):
        """simulate getting a single chunk of the file."""
        if self.closed:
            raise RuntimeError("Segfault!")
        return np.random.rand(1, 4, 4)
As long as the file stays open, everything works fine:
>>> fr = FileReader()
>>> dsk_ary = fr.to_dask()
>>> dsk_ary.compute().shape
(4, 4, 4)
However, if one closes the file, the dask array returned from to_dask will now fail:
>>> fr.close()
>>> dsk_ary.compute() # RuntimeError: Segfault!
A "quick-and-dirty" solution here might be to force the _dask_block method to temporarily reopen the file if it finds the file in the closed state, but if the file-open process takes any amount of time, this could incur significant overhead as it opens-and-closes for every chunk in the array.
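To make that overhead concrete, here is a minimal sketch of the reopen-on-demand approach. It uses no dask at all: SlowReader, read_chunk and open_count are hypothetical names invented for this illustration, and the counter simply stands in for an expensive open.

```python
class SlowReader:
    """Hypothetical stand-in for FileReader that counts how often it is opened."""

    def __init__(self):
        self.closed = True
        self.open_count = 0

    def __enter__(self):
        self.open_count += 1  # imagine an expensive file-open here
        self.closed = False
        return self

    def __exit__(self, *_):
        self.closed = True

    def read_chunk(self, index):
        """Quick-and-dirty: temporarily reopen the file if it is closed."""
        if self.closed:
            with self:
                return self._read(index)
        return self._read(index)

    def _read(self, index):
        if self.closed:
            raise RuntimeError("Segfault!")
        return [index] * 4  # fake chunk data


reader = SlowReader()
chunks = [reader.read_chunk(i) for i in range(4)]
print(reader.open_count)  # 4: one open/close cycle per chunk
```

With four chunks the file is opened four times; for an array with thousands of chunks the open/close cost grows in step.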
ResourceBackedDaskArray.from_array
This library attempts to provide a solution to the above problem with a ResourceBackedDaskArray object. This manages the opening/closing of an underlying resource whenever .compute() is called, and does so only once for all chunks in a single compute task graph.
>>> from resource_backed_dask_array import resource_backed_dask_array
>>> safe_dsk_ary = resource_backed_dask_array(dsk_ary, fr)
>>> safe_dsk_ary.compute().shape
(4, 4, 4)
>>> fr.closed # leave it as we found it
True
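The "open once, then leave it as we found it" behavior can be sketched in plain Python. This is not how the library is implemented; it is a rough illustration of the idea, and opened_once and Resource are hypothetical names that do not come from the package.

```python
from contextlib import contextmanager


class Resource:
    """Hypothetical resource that records how many times it was opened."""

    def __init__(self, closed=True):
        self.closed = closed
        self.open_count = 0

    def __enter__(self):
        self.open_count += 1
        self.closed = False
        return self

    def __exit__(self, *_):
        self.closed = True


@contextmanager
def opened_once(resource):
    """Open `resource` only if it is closed, and restore that state on exit."""
    was_closed = resource.closed
    if was_closed:
        resource.__enter__()
    try:
        yield resource
    finally:
        if was_closed:
            resource.__exit__(None, None, None)


res = Resource(closed=True)
with opened_once(res):
    # every "chunk" sees an open resource, but it was opened only once
    seen_open = [not res.closed for _ in range(4)]

already_open = Resource(closed=False)
with opened_once(already_open):
    pass  # an already-open resource is neither reopened nor closed
```

After the first block, res is closed again and was opened exactly once; already_open stays open and is never touched.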
The second argument passed to from_array must be a reusable context manager that additionally provides a closed attribute (like io.IOBase). In other words, it must implement the following protocol:
- an __enter__ method that opens the underlying resource
- an __exit__ method that closes the resource and optionally handles exceptions
- a closed attribute that reports whether or not the resource is closed

In the example above, the FileReader class itself implemented this protocol, and so was suitable as the second argument to ResourceBackedDaskArray.from_array above.
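The three requirements above can be written down as a typing.Protocol. This is only a sketch: ReusableResource and DummyResource are hypothetical names, and the package may not define or check any such type itself.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ReusableResource(Protocol):
    """Hypothetical name for the protocol described above."""

    @property
    def closed(self) -> bool:
        """Whether the resource is currently closed."""
        ...

    def __enter__(self) -> Any:
        """Open the underlying resource."""
        ...

    def __exit__(self, *args: Any) -> Any:
        """Close the resource, optionally handling exceptions."""
        ...


class DummyResource:
    """Minimal class satisfying the protocol (illustration only)."""

    def __init__(self) -> None:
        self._closed = True

    @property
    def closed(self) -> bool:
        return self._closed

    def __enter__(self) -> "DummyResource":
        self._closed = False
        return self

    def __exit__(self, *_: Any) -> None:
        self._closed = True


print(isinstance(DummyResource(), ReusableResource))  # True
print(isinstance(object(), ReusableResource))         # False
```

Note that a runtime isinstance check only confirms the three members exist. It cannot verify that the context manager is actually reusable, meaning that it can be entered again after it has been exited.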
This was created for single-process (and maybe just single-threaded?) use cases where dask's out-of-core lazy loading is still very desirable. Usage with dask.distributed is untested and may very well fail. Using stateful objects (such as the reusable context manager used here) in multi-threaded or multi-process tasks is error-prone.