github.com/caltechlibrary/datatools
datatools is a rich collection of command line programs targeting data conversion, cleanup and analysis directly from your favorite POSIX shell. It has proven useful for data collaborations where individual members of a project may prefer different toolsets in their analysis (e.g. Julia, R, Python) but want to work from a common baseline. It has also been used intensively for internal reporting from various Caltech Library metadata sources.
The tools fall into three broad categories: data transformation, data shaping, and command line utilities.
See the user manual for a complete list of the command line programs. The data transformation tools support formats such as Excel XML, CSV, tab delimited files, JSON, YAML and TOML.
Compiled versions of the datatools collection are provided for Linux (amd64), Mac OS X (amd64), Windows 10 (amd64) and Raspbian (ARM7). See https://github.com/caltechlibrary/datatools/releases.
Use the "-help" option for a full list of options for each utility (e.g. csv2json -help).
The transformation tooling covers data conversion, with tools that work with CSV, tab delimited files, JSON, TOML, YAML and Excel XML.
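As a rough illustration of what a CSV-to-JSON conversion involves, here is a minimal Python sketch. It is not datatools' own code (the actual conversions are separate programs such as csv2json), and the function name csv_to_json is hypothetical. It assumes the first CSV row is a header and turns each following row into a JSON object keyed by those header names.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text to a JSON array of objects (hypothetical helper).

    Assumes the first row is a header; every later row becomes an
    object whose keys are the header names. All values stay strings.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [dict(row) for row in reader]
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    sample = "name,dept\nAda,Library\nGrace,Archives\n"
    print(csv_to_json(sample))
```

Keeping every value as a string is a deliberate simplification; a real converter may need to decide whether to infer numbers or booleans.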
There is also tooling to change data shapes using JSON as the intermediate data format.
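Once data is in JSON, changing its shape becomes a matter of rearranging arrays and objects. The Python sketch below shows one such reshaping, turning a row-oriented JSON array of objects into a column-oriented object of arrays. The function name rows_to_columns is hypothetical and does not correspond to a specific datatools program.

```python
import json


def rows_to_columns(json_rows: str) -> str:
    """Reshape a JSON array of objects into one object of arrays.

    [{"a": 1, "b": 2}, {"a": 3, "b": 4}] -> {"a": [1, 3], "b": [2, 4]}
    Keys missing from a row are filled with null so columns stay aligned.
    """
    rows = json.loads(json_rows)
    # Collect every key in first-seen order across all rows.
    keys = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)
    # row.get(key) yields None (JSON null) when a row lacks that key.
    columns = {key: [row.get(key) for row in rows] for key in keys}
    return json.dumps(columns)


if __name__ == "__main__":
    print(rows_to_columns('[{"a": 1, "b": 2}, {"a": 3, "b": 4}]'))
```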
Finally, various utilities simplify work on the command line.
datatools provides the string command for working with text strings (limited to available memory). This is commonly needed when cleaning up data for analysis. The string command was created for cases where the old Unix standbys (grep, awk, sed, tr) are unwieldy or inconvenient. string provides operations common in most languages, like trimming, splitting, and transforming letter case. It also makes it easy to join a JSON string array into a single string using a delimiter, or to split a string into a JSON array based on a delimiter. The form of the command is string [OPTIONS] [ACTION] [ACTION_PARAMETERS...]
For example,
string toupper "one two three"
would yield "ONE TWO THREE".
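The action-based form described above can be sketched in Python to show how one command can dispatch on an ACTION word. This is not the datatools implementation: only the toupper action comes from the README, while the other action names (tolower, trim, split, join) and their parameter order are assumptions chosen to mirror the operations listed (letter case, trimming, splitting a string into a JSON array, joining a JSON string array with a delimiter).

```python
import json
import sys
from typing import List


# Only "toupper" is confirmed by the datatools README; the other action
# names and their parameter order are illustrative assumptions.
def run_action(action: str, params: List[str]) -> str:
    if action == "toupper":
        return params[0].upper()
    if action == "tolower":
        return params[0].lower()
    if action == "trim":
        # Strip leading and trailing whitespace.
        return params[0].strip()
    if action == "split":
        # Assumed form: split DELIMITER TEXT -> JSON array of strings
        delimiter, text = params
        return json.dumps(text.split(delimiter))
    if action == "join":
        # Assumed form: join DELIMITER JSON_ARRAY -> single string
        delimiter, json_array = params
        return delimiter.join(json.loads(json_array))
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python string_sketch.py toupper "one two three"
    print(run_action(sys.argv[1], sys.argv[2:]))
```

Routing every operation through a single action argument keeps the command surface small, which is the same design idea the string command's [ACTION] [ACTION_PARAMETERS...] form expresses.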
See the string documentation for the full list of features.
See INSTALL.md for details on installing pre-compiled versions of the programs.