
databricks-filesystem
Databricks Utils does not support several crucial file system operations, such as recursive directory listing, pattern matching on file names, listing only directories or only files, and more. This package provides these operations.
Databricks is a leading service for big data processing, yet it lacks several file system operations that developers commonly need, so they often have to write custom solutions to fill the gap. The missing operations include recursive directory listing, pattern matching on file names, and listing only directories or only files.
With this package, you can run these operations directly.
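For context, the kind of custom solution mentioned above might look like the following sketch. It is not this package's implementation. It assumes only that the listing call behaves like dbutils.fs.ls, returning entries with a path attribute and an isDir() method. The names list_recursive, FakeFileInfo, fake_ls, and TREE are hypothetical and exist only so the sketch runs outside Databricks.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class FakeFileInfo:
    """Hypothetical stand-in for the entries returned by dbutils.fs.ls."""
    path: str

    def isDir(self) -> bool:
        # Assumption: directory paths end with "/", as dbutils.fs.ls reports them.
        return self.path.endswith("/")


def list_recursive(ls: Callable[[str], Iterable], path: str) -> List[str]:
    """Depth-first walk collecting every file and directory path under `path`."""
    found = []
    for entry in ls(path):
        found.append(entry.path)
        if entry.isDir():
            # Descend into the subdirectory and collect its contents too.
            found.extend(list_recursive(ls, entry.path))
    return sorted(found)


# In-memory directory tree (hypothetical sample data).
TREE = {
    "dbfs:/FileStore/": ["dbfs:/FileStore/tables/", "dbfs:/FileStore/readme.txt"],
    "dbfs:/FileStore/tables/": ["dbfs:/FileStore/tables/part-0000.parquet"],
}


def fake_ls(path: str) -> List[FakeFileInfo]:
    """Return the direct children of `path` from the in-memory tree."""
    return [FakeFileInfo(p) for p in TREE.get(path, [])]


print(list_recursive(fake_ls, "dbfs:/FileStore/"))
```

In a Databricks notebook, the same helper would presumably be called as list_recursive(dbutils.fs.ls, "dbfs:/FileStore/"). Filtering, skipping, and case handling would still have to be written by hand, which is the work the package takes over.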
pip install databricks-filesystem
from databricks_filesystem import DatabricksFilesystem
adb_fs = DatabricksFilesystem(dbutils=dbutils)
Use the filesystem_list function of the package to recursively list files and directories. The examples below show it working with DBFS and with external storage systems such as Azure Data Lake Storage (ADLS), Azure Blob Storage, AWS S3, Google Storage, and more.
# List DBFS directory
adb_fs.filesystem_list(filesystem_path="dbfs:/FileStore/")
# List Azure Data Lake Storage directory (ADLS)
adb_fs.filesystem_list(filesystem_path="abfss://<container>@<storage-account>.dfs.core.windows.net/<directory>/")
# List AWS S3
adb_fs.filesystem_list(filesystem_path="s3a://<aws-bucket-name>/<path>")
# List Google Storage
adb_fs.filesystem_list(filesystem_path="gs://<bucket-name>/<path>")
filesystem_list(self, filesystem_path: str, recursive_flag: bool = True, list_directories: bool = True, list_files: bool = True, files_starts_with: Union[str, List[str]] = None, files_ends_with: Union[str, List[str]] = None, skip_files_starts_with: Union[str, List[str]] = None, skip_files_ends_with: Union[str, List[str]] = None, case_sensitive_comparison: bool = True, sorted_output: bool = True) -> list
Below are the parameters accepted by the filesystem_list function:
filesystem_path (str - Mandatory): Specify the file system path for listing.
recursive_flag (bool - Optional (Default: True)): When set to True, this flag enables recursive listing of the file system path, including all subdirectories.
list_directories (bool - Optional (Default: True)): When set to True, directories are included in the output.
list_files (bool - Optional (Default: True)): When set to True, files are included in the output.
files_starts_with (str or List[str] - Optional (Default: None)): A pattern or list of patterns. Only files whose names start with one of them are listed. This parameter takes effect only when "list_files" is True.
files_ends_with (str or List[str] - Optional (Default: None)): A pattern or list of patterns. Only files whose names end with one of them are listed. This parameter takes effect only when "list_files" is True.
skip_files_starts_with (str or List[str] - Optional (Default: None)): A pattern or list of patterns. Files whose names start with one of them are excluded from the output. This parameter takes effect only when "list_files" is True.
skip_files_ends_with (str or List[str] - Optional (Default: None)): A pattern or list of patterns. Files whose names end with one of them are excluded from the output. This parameter takes effect only when "list_files" is True.
case_sensitive_comparison (bool - Optional (Default: True)): When set to True, file pattern matching is case-sensitive. This parameter takes effect only when "list_files" is True and a value is provided for "files_starts_with", "files_ends_with", "skip_files_starts_with", or "skip_files_ends_with".
sorted_output (bool - Optional (Default: True)): When set to True, the output is returned sorted.
The function returns the list of file paths and directory paths.
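To make the interaction between the pattern parameters concrete, here is a minimal sketch of one plausible reading of them. It is not the package's code. It assumes that a list of patterns matches when any one of them matches, that the start and end filters must both pass, and that the skip filters are applied on top. The helper names keep_file and _as_tuple are hypothetical.

```python
from typing import List, Optional, Union

Patterns = Optional[Union[str, List[str]]]


def _as_tuple(patterns: Patterns, case_sensitive: bool) -> tuple:
    """Normalize None, a str, or a list of str into a tuple for str.startswith/endswith."""
    if patterns is None:
        return ()
    items = [patterns] if isinstance(patterns, str) else list(patterns)
    return tuple(p if case_sensitive else p.lower() for p in items)


def keep_file(
    name: str,
    files_starts_with: Patterns = None,
    files_ends_with: Patterns = None,
    skip_files_starts_with: Patterns = None,
    skip_files_ends_with: Patterns = None,
    case_sensitive_comparison: bool = True,
) -> bool:
    """Return True if a file name passes the include and skip filters (assumed semantics)."""
    cs = case_sensitive_comparison
    candidate = name if cs else name.lower()
    starts = _as_tuple(files_starts_with, cs)
    ends = _as_tuple(files_ends_with, cs)
    skip_starts = _as_tuple(skip_files_starts_with, cs)
    skip_ends = _as_tuple(skip_files_ends_with, cs)

    # Include filters: when given, the name must match at least one pattern.
    if starts and not candidate.startswith(starts):
        return False
    if ends and not candidate.endswith(ends):
        return False
    # Skip filters: when given, a match on any pattern excludes the name.
    if skip_starts and candidate.startswith(skip_starts):
        return False
    if skip_ends and candidate.endswith(skip_ends):
        return False
    return True


names = ["part-0000.parquet", "PART-0001.PARQUET", "test.json", "_SUCCESS"]
print([n for n in names if keep_file(n, files_starts_with="part", files_ends_with=".parquet")])
```

If the package combines the filters differently, for example by treating the start and end patterns as alternatives, the results will differ from this sketch, so check any edge cases against the real function.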
adb_fs.filesystem_list(filesystem_path="dbfs:/FileStore/")
adb_fs.filesystem_list(filesystem_path="dbfs:/FileStore/", recursive_flag=False)
adb_fs.filesystem_list(filesystem_path="dbfs:/FileStore/", list_directories=False)
adb_fs.filesystem_list(filesystem_path="dbfs:/FileStore/", list_files=False)
adb_fs.filesystem_list(filesystem_path="dbfs:/FileStore/", list_directories=False, files_ends_with=".csv")
adb_fs.filesystem_list(filesystem_path="dbfs:/FileStore/", list_directories=False, files_ends_with=[".csv", ".parquet", ".json"])
adb_fs.filesystem_list(filesystem_path="dbfs:/FileStore/", list_directories=False, files_starts_with="test")
adb_fs.filesystem_list(filesystem_path="dbfs:/FileStore/", list_directories=False, files_starts_with=["test", "temp"])
adb_fs.filesystem_list(filesystem_path="dbfs:/FileStore/", list_directories=False, files_starts_with="part", files_ends_with=".parquet")
adb_fs.filesystem_list(filesystem_path="dbfs:/FileStore/", list_directories=False, files_starts_with=["part", "test"], files_ends_with=[".parquet", ".json"])
adb_fs.filesystem_list(filesystem_path="dbfs:/FileStore/", list_directories=False, skip_files_ends_with=[".json", ".parquet"])
adb_fs.filesystem_list(filesystem_path="dbfs:/FileStore/", list_directories=False, skip_files_starts_with=["test", "temp"], skip_files_ends_with=[".crc"])
adb_fs.filesystem_list(filesystem_path="dbfs:/FileStore/", list_directories=False, files_starts_with="part", files_ends_with=".parquet", case_sensitive_comparison=False)
adb_fs.filesystem_list(filesystem_path="dbfs:/FileStore/", sorted_output=False)
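Because the function returns a list of paths, the result can be post-processed with ordinary Python. The sketch below groups a listing by file extension. It assumes the returned items are plain path strings and that directory paths carry no extension. The helper group_by_extension and the sample paths are hypothetical.

```python
import posixpath
from collections import defaultdict
from typing import Dict, List


def group_by_extension(paths: List[str]) -> Dict[str, List[str]]:
    """Group paths by lowercased file extension; extensionless paths go under ''."""
    groups = defaultdict(list)
    for path in paths:
        _, ext = posixpath.splitext(path)
        groups[ext.lower()].append(path)
    return dict(groups)


# Hypothetical sample standing in for the output of
# adb_fs.filesystem_list(filesystem_path="dbfs:/FileStore/").
sample = [
    "dbfs:/FileStore/tables/",
    "dbfs:/FileStore/tables/a.csv",
    "dbfs:/FileStore/tables/b.CSV",
    "dbfs:/FileStore/tables/part-0000.parquet",
]

for ext, members in group_by_extension(sample).items():
    print(repr(ext), len(members))
```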
You can get more information about this package here
We found that databricks-filesystem demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.