
fsql
The fsql package's goal is to simplify the task of getting data from any file system, local or remote, possibly divided into multiple partitions, into a single data frame.
The package has querying capabilities, thus the name stands for "file system query language".
The core is installed just via pip install fsql. Additional filesystem support or output representation is installed via pip install fsql[s3] or pip install fsql[dask].
For examples of usage (and sort-of documentation), we use selected test files accompanied with explanatory comments:
The canonical use case is that you have data on S3 stored e.g. as <table_name>/year=<yyyy>/month=<mm>/day=<dd>/<filename>.csv, and you want to fetch a part of it (e.g., a week from the last month, every Monday last year, ...) as a single Pandas or Dask DataFrame, via a short command -- without having to write the boto3 crawl, the bytes2csv, the csv2pandas, etc.
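To make the manual route concrete, here is a minimal standard-library sketch of the crawl, filter, and parse steps that the paragraph above says fsql spares you. It is an illustration under assumptions: it reads a local directory instead of S3, returns list[dict] instead of a DataFrame, and `load_partitioned_csv` is a hypothetical helper, not part of fsql's API.

```python
import csv
from pathlib import Path
from typing import Callable


def load_partitioned_csv(
    root: Path, keep: Callable[[dict[str, str]], bool]
) -> list[dict[str, str]]:
    """Crawl <root>/year=<yyyy>/month=<mm>/day=<dd>/*.csv and return matching rows.

    Hypothetical helper for illustration only; it is not fsql's API.
    """
    rows: list[dict[str, str]] = []
    for path in sorted(root.glob("year=*/month=*/day=*/*.csv")):
        # Turn the partition directories into a {column: value} dict.
        directories = path.relative_to(root).parts[:-1]
        partition = dict(part.split("=", 1) for part in directories)
        if not keep(partition):
            continue  # skip partitions the caller did not ask for
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Attach the partition columns to every row of the file.
                rows.append({**row, **partition})
    return rows
```

A call such as `load_partitioned_csv(Path("my_table"), lambda p: p["month"] == "05")` would then return the rows of every file under a `month=05` directory.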
The crawl/query part is traditionally covered by metastores, such as Hive Metastore or Glue Data Catalog.
Why, then, would you use fsql?
- fsql is faster and has no operation/maintenance costs.
- fsql is simpler.
- The <columnName>=<value> directory format is required by metastores, yet we often encounter just <value> with the column name provided externally.
- fsql is more efficient than manually fetching all partitions from the metastore and evaluating locally.
- You have multiple filesystems (e.g., S3 and GDrive), and you don't have a unifying layer -- fsql switches between those just by changing the URL prefix. This can be particularly handy in some integration tests, if you don't want to query real S3 but want to keep the same code (and point it at the local filesystem or Minio instead; we find it less hassle and more value than mocking, which is a better fit for unit tests).
However, if you have your metastore and are happy with it, there is no reason not to use it.
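The difference between `<columnName>=<value>` directories and bare `<value>` directories with externally provided column names can be sketched with a small parser. This is only an illustration of the idea; `parse_partitions` and its parameters are hypothetical and do not reflect how fsql implements or exposes it.

```python
from typing import Optional, Sequence


def parse_partitions(
    relative_dir: str, column_names: Optional[Sequence[str]] = None
) -> dict[str, str]:
    """Map a partition directory path to {column: value}.

    Hypothetical helper, not fsql's API. Without column_names every path
    segment must look like <columnName>=<value>; with column_names the
    segments are bare values matched to the names by position.
    """
    segments = [segment for segment in relative_dir.split("/") if segment]
    if column_names is None:
        pairs = []
        for segment in segments:
            name, separator, value = segment.partition("=")
            if not separator:
                raise ValueError(f"segment {segment!r} is not <columnName>=<value>")
            pairs.append((name, value))
        return dict(pairs)
    if len(segments) != len(column_names):
        raise ValueError("number of path segments and column names differ")
    return dict(zip(column_names, segments))
```

Both `parse_partitions("year=2023/month=05/day=17")` and `parse_partitions("2023/05/17", ["year", "month", "day"])` describe the same partition; only the second layout needs the column names supplied from outside.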
There are some advantages which fsql will likely never cover:
- fsql is not backed by any persistence to hold such data.
There is also some overlap with pandas.io.sql -- that one, however, focuses solely on pandas, whereas fsql can adapt to any data processing tool which allows partition-based input specification (e.g., Spark).
On the other hand, pandas.io.sql has good integration with SQLAlchemy and traditional database queries, whereas fsql is focused on partitioned file systems only.
This package is based on fsspec -- anything supported by that can be plugged in.
At the moment, we have test coverage only for local filesystem and s3.
Adding a new one mostly requires ensuring that authentication and URL parsing work correctly, and taking care of some odd corner cases such as caching in s3fs.
The supported output representations are at the moment Pandas, Dask and list[dict].
Adding a new one requires implementing a conversion from an Iterable[(Path, FileSystem)] to the desired object.
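As a rough sketch of what such a conversion could look like, the example below turns an iterable of (path, filesystem) pairs into a list[dict] by reading each file as CSV. The `FileSystem` protocol, the `InMemoryFS` toy class, and `to_list_of_dicts` are all assumptions made for illustration; fsql's real types and extension points may differ.

```python
import csv
import io
from pathlib import PurePosixPath
from typing import IO, Iterable, Protocol


class FileSystem(Protocol):
    """Minimal stand-in for a filesystem object; only `open` is assumed."""

    def open(self, path: str, mode: str = "rb") -> IO[bytes]: ...


class InMemoryFS:
    """Toy filesystem backed by a dict, used only to exercise the converter."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self._files = files

    def open(self, path: str, mode: str = "rb") -> IO[bytes]:
        return io.BytesIO(self._files[path])


def to_list_of_dicts(
    partitions: Iterable[tuple[PurePosixPath, FileSystem]]
) -> list[dict[str, str]]:
    """Read every (path, filesystem) pair as CSV and concatenate the rows."""
    rows: list[dict[str, str]] = []
    for path, fs in partitions:
        with fs.open(str(path), "rb") as raw:
            text = raw.read().decode("utf-8")
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows
```

Because the converter only relies on `open`, the same function works for any filesystem object that provides it, which is the property the paragraph above depends on.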
The query language is rather simplistic, so no proper parser & grammar & query optimiser is used at the moment.
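A query over partitions can be kept this simple because it only needs to answer yes or no for each partition's column values. The sketch below shows one way to compose such predicates from plain functions; the names `eq`, `all_of`, `any_of`, and `is_weekday` are hypothetical and are not fsql's query API.

```python
from datetime import date
from typing import Callable

Partition = dict[str, str]
Query = Callable[[Partition], bool]


def eq(column: str, value: str) -> Query:
    """Match partitions whose column has exactly the given value."""
    return lambda partition: partition.get(column) == value


def all_of(*queries: Query) -> Query:
    """Match partitions accepted by every sub-query."""
    return lambda partition: all(query(partition) for query in queries)


def any_of(*queries: Query) -> Query:
    """Match partitions accepted by at least one sub-query."""
    return lambda partition: any(query(partition) for query in queries)


def is_weekday(weekday: int) -> Query:
    """Match partitions whose year/month/day fall on the weekday (Monday is 0)."""

    def check(partition: Partition) -> bool:
        day = date(
            int(partition["year"]), int(partition["month"]), int(partition["day"])
        )
        return day.weekday() == weekday

    return check


# "Every Monday in 2023", expressed as a composition of predicates.
mondays_2023 = all_of(eq("year", "2023"), is_weekday(0))
```

Evaluating `mondays_2023` against each partition's column dict selects the directories to read, with no parser or optimiser involved.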
FAQs
Metastore-like capabilities for various filesystems
We found that fsql demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 2 open source maintainers collaborating on the project.