The FSDB "flat-file streaming database" format is a structured data file that includes column names, formatting specifications (e.g. tab vs space vs comma separators), and the history of commands that generated the file. PyFSDB is a Python implementation of the functionality originally implemented in Perl. Both the Perl and Python versions come with a long list of command line tools that can be used to quickly process datasets using traditional Unix pipeline processing. There is also a C implementation and a Go implementation (ref needed) of FSDB.
Getting-started documentation is below; see also the full documentation on readthedocs.
Using pip (or pipx):
pip3 install pyfsdb
Or manually:
git clone git@github.com:gawseed/pyfsdb.git
cd pyfsdb
pip install hatch
hatch build
pip install dist/pyfsdb-*.whl
The FSDB file format contains headers and footers that supplement the data within a file. The header names the columns and the separator; tab is the most common separator, but the format can also wrap CSVs and other datatypes (see the FSDB documentation for full details). The footers record every piped command that was used to create the file, so the history of its creation is documented in the file's own metadata.
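To make the header and footer layout concrete, here is a minimal sketch in plain Python that reads the kind of file described above. It is not how pyfsdb itself parses files: the function names `parse_fsdb_header` and `read_rows` are hypothetical, the sketch only understands the `-F t` and `-F s` separator codes, and it assumes every other option takes exactly one argument.

```python
def parse_fsdb_header(line):
    """Return (separator, column_names) from a '#fsdb ...' header line.

    Simplified sketch: only '-F t' (tab) and '-F s' (space) are mapped;
    any other -F code is used literally as the separator (an assumption).
    """
    tokens = line.split()
    if not tokens or tokens[0] != "#fsdb":
        raise ValueError("not an FSDB header line")
    separator = "\t"  # default chosen for this sketch only
    columns = []
    i = 1
    while i < len(tokens):
        if tokens[i] == "-F" and i + 1 < len(tokens):
            code = tokens[i + 1]
            separator = {"t": "\t", "s": " "}.get(code, code)
            i += 2
        elif tokens[i].startswith("-"):
            i += 2  # assume other options carry one argument; skip both
        else:
            columns.append(tokens[i])
            i += 1
    return separator, columns


def read_rows(text):
    """Return (column_names, rows), where rows is a list of dicts."""
    lines = text.splitlines()
    separator, columns = parse_fsdb_header(lines[0])
    rows = []
    for line in lines[1:]:
        if not line or line.startswith("#"):
            continue  # footer/trace comments are metadata, not data
        rows.append(dict(zip(columns, line.split(separator))))
    return columns, rows


if __name__ == "__main__":
    sample = (
        "#fsdb -F t col1 two andthree\n"
        "1\tkey1\t42.0\n"
        "2\tkey2\t123.0\n"
        "# | some_command\n"
    )
    print(read_rows(sample))
```

All values come back as strings in this sketch; converting them to numbers is left to the caller.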
Reading in row by row:
import pyfsdb
db = pyfsdb.Fsdb("myfile.fsdb")
print(db.column_names)
for row in db:
    print(row)
#fsdb -F t col1 two andthree
1 key1 42.0
2 key2 123.0
Writing a new file:
import pyfsdb
db = pyfsdb.Fsdb(out_file="myfile.fsdb")
db.out_column_names = ('one', 'two')
db.append([4, 'hello world'])
db.close()
Read below for further usage details.
The real power of FSDB comes from suites of tools that all exchange FSDB-formatted files, which lets you chain multiple commands together to achieve a goal. Though the original base set of tools is written in Perl, you don't need to know Perl to use most of them. For example, the following script, saved as mydemo.py, doubles the value column of each row it reads from standard input:
#!/usr/bin/env python3
import sys, pyfsdb
db = pyfsdb.Fsdb(file_handle=sys.stdin, out_file_handle=sys.stdout)
value_column = db.get_column_number('value')
for row in db:  # reads a row from the input stream
    row[value_column] = float(row[value_column]) * 2
    db.append(row)  # sends the row to the output stream
db.close()
And then feed it this file (test.fsdb):
#fsdb -F t col1 value
1 42.0
2 123.0
We can run it like so:
# cat test.fsdb | ./mydemo.py
#fsdb -F t col1 value
1 84.0
2 246.0
# | ./mydemo.py
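The trailing `# | ./mydemo.py` line in that output is the command trace described earlier. As a rough illustration of what such a filter does to the stream, the following standard-library sketch reproduces the same visible behavior without pyfsdb. The function name `double_column` is hypothetical, the sketch only accepts `#fsdb -F t` headers, and the exact footer spacing is copied from the output shown here rather than from the FSDB specification.

```python
def double_column(lines, column, command):
    """Double one column of a tab-separated FSDB stream.

    Returns the output lines, ending with a trace comment that names
    the command, in the style of the footer shown in this document.
    """
    header, *rest = [line.rstrip("\n") for line in lines]
    tokens = header.split()
    if tokens[:3] != ["#fsdb", "-F", "t"]:
        raise ValueError("this sketch only handles '#fsdb -F t' headers")
    index = tokens[3:].index(column)  # position of the column to change

    out = [header]
    for line in rest:
        if not line:
            continue
        if line.startswith("#"):
            out.append(line)  # keep trace comments from earlier commands
            continue
        fields = line.split("\t")
        fields[index] = str(float(fields[index]) * 2)
        out.append("\t".join(fields))
    out.append("# | " + command)  # record this step in the footer
    return out


if __name__ == "__main__":
    data = ["#fsdb -F t col1 value\n", "1\t42.0\n", "2\t123.0\n"]
    print("\n".join(double_column(data, "value", "./mydemo.py")))
```

Because earlier trace comments are passed through and a new one is appended, each stage of a pipeline adds one footer line.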
Or chain it together with multiple FSDB commands:
# cat test.fsdb | ./mydemo.py | dbcolstats value | dbcol mean stddev sum min max | dbfilealter -R C
#fsdb -R C mean stddev sum min max
mean: 165
stddev: 114.55
sum: 330
min: 84
max: 246
# | ./mydemo.py
# | dbcolstats value
# | dbcol mean stddev sum min max
# | dbfilealter -R C
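The figures in that summary can be checked by hand. The sketch below recomputes them with Python's standard `statistics` module from the two doubled values, 84.0 and 246.0. It assumes the reported stddev is the sample standard deviation (dividing by n - 1), which is consistent with the 114.55 shown; the helper name `column_summary` is hypothetical and is not part of pyfsdb or the Perl tools.

```python
import statistics


def column_summary(values):
    """Return the same five statistics selected in the pipeline above."""
    return {
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values),  # sample standard deviation
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    for name, value in column_summary([84.0, 246.0]).items():
        print(f"{name}: {value:.6g}")
```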
All the command line utilities that come with pyfsdb start with p by convention so as not to conflict with the utilities from the Perl package. The leading p also serves to distinguish the differences in CLI arguments (e.g. the Python versions allow file names to be specified on the command line, and most keys must be passed with a -k flag).
Wes Hardaker @ USC/ISI
The FSDB website and manual page for the original perl module: