
= kfold
kfold creates K-fold splits from data files and assists in training and testing on them, which is useful for cross-validation in supervised machine learning.
== Command overview
  help   Display global or [command] help documentation
  split  Split a data file into K partitions
  test   Apply trained models on a dataset previously split using kfold
  train  Train models on a dataset previously split using kfold
== Example usage
10-fold cross-validation of the standard MaltParser on a treebank named shuffled.c32.conll may be done as follows:
  kfold split -f -i shuffled.c32.conll --fold -d '\n\n'
  kfold train -f --base shuffled.c32.conll -- java -jar ~/Tools/malt-1.4.1/malt.jar -c %B.model_%N -i %T -m learn
  kfold test -f --base shuffled.c32.conll -- java -jar ~/Tools/malt-1.4.1/malt.jar -c %B.model_%N -i %T -o %O -m parse
  eval07.pl -q -g shuffled.c32.conll -s shuffled.c32.conll.output
The MaltParser does not like to put its models in a subdirectory, so rather than using the standard model files suggested by kfold (%M), we construct custom non-nested model filenames using %B.model_%N.
== Command details
The following is simply the output of the built-in help commands.
=== Splitting data files
NAME:
split
DESCRIPTION:
Given the data file INPUT, the partitions are written to files named INPUT.parts/{01..K}
SYNOPSIS:
kfold split -i INPUT [options]
EXAMPLES:
# Split the file sample.txt into 4 parts
kfold split -k4 sample.txt
# Split the double-newline-delimited file sample.conll into 10 parts
kfold split -d"\n\n" sample.conll
OPTIONS:
-i, --input FILE
Data file to split
-k, --parts N
The number of partitions desired
-d, --delimiter DELIM
String used to separate individual entries (newline per default)
-g, --granularity N
Ensure the number of entries in each partition is divisible by N (useful for block-structured data)
-f, --overwrite
Remove existing parts prior to executing
--fold
Additionally, create K folds of K-1 parts each in another folder
--parts-name STRING
Use the given name as suffix for the partitions folder created
--folds-name STRING
Use the given name as suffix for the folds folder created
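
The relationship between parts and folds can be sketched in a few lines of Python. This is only an illustration of the idea, not kfold's implementation: the function names `split_entries` and `make_folds` are hypothetical, and the contiguous, near-equal distribution of entries is an assumption, since the help text does not say how entries are assigned to parts.

```python
def split_entries(text, k, delimiter="\n"):
    """Split text into entries on `delimiter` and group them into k parts.

    Assumption: parts are contiguous and as equal in size as possible.
    """
    entries = [entry for entry in text.split(delimiter) if entry]
    base, extra = divmod(len(entries), k)
    parts, start = [], 0
    for i in range(k):
        # The first `extra` parts receive one additional entry.
        size = base + (1 if i < extra else 0)
        parts.append(entries[start:start + size])
        start += size
    return parts


def make_folds(parts):
    """Fold i holds every part except part i, i.e. K-1 parts per fold."""
    return [
        [entry for j, part in enumerate(parts) if j != i for entry in part]
        for i in range(len(parts))
    ]


parts = split_entries("a\nb\nc\nd\ne\nf\ng\n", k=3)
folds = make_folds(parts)
print(parts)     # the three held-out parts
print(folds[0])  # training data paired with part 01
```

Under this reading, a model trained on fold N has never seen part N, which is what makes part N usable as its test data.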
=== Training on the folds
NAME:
train
DESCRIPTION:
Given training data previously split into K parts and folds, train K models on the K folds.
Certain keywords in the training command and its arguments are interpolated at runtime:
* %N - fold number, e.g. '01'
* %F - fold filename, e.g. 'brown.train/01'
* %I - alias for %F
* %M - model filename, e.g. 'brown.models/01'
* %B - basename (as specified on the command line), e.g. 'brown'
SYNOPSIS:
kfold train --base NAME [options] -- CMD [--CMD-OPTIONS] [CMD-ARGS]
EXAMPLES:
# Train MaltParser for cross-validation
kfold train -f --base shuffled.c32.conll -- java -jar ~/Tools/malt-1.4.1/malt.jar -c %B.model_%N -i %T -m learn
OPTIONS:
-f, --overwrite
Remove existing models prior to executing
--base NAME
Default prefix of training folds and model files
--folds-name SUFFIX
Look for folds {01..K} in the folder BASE.SUFFIX
--models-name SUFFIX
Yield model names as BASE.SUFFIX/{01..K} as interpolation pattern %M
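
To make the keyword interpolation concrete, here is a minimal Python sketch of how one command line per fold might be expanded. It is not kfold's code: `interpolate` and `train_commands` are hypothetical names, the `train` and `models` folder suffixes are taken from the 'brown.train/01' and 'brown.models/01' examples above and may not match the real defaults, and `trainer` stands in for an arbitrary training command.

```python
def interpolate(args, values):
    """Replace every %X keyword inside each argument with its value."""
    expanded = []
    for arg in args:
        for keyword, value in values.items():
            arg = arg.replace(keyword, value)
        expanded.append(arg)
    return expanded


def train_commands(base, k, template, folds_name="train", models_name="models"):
    """Yield one expanded argument list per fold, numbered 01..K."""
    for n in range(1, k + 1):
        number = f"{n:02d}"
        fold = f"{base}.{folds_name}/{number}"
        values = {
            "%N": number,
            "%F": fold,
            "%I": fold,  # alias for %F
            "%M": f"{base}.{models_name}/{number}",
            "%B": base,
        }
        yield interpolate(template, values)


# "trainer" is a placeholder command, not a real tool.
template = ["trainer", "--model", "%B.model_%N", "--input", "%F"]
commands = list(train_commands("brown", 2, template))
for command in commands:
    print(" ".join(command))
```

The same pattern explains the MaltParser workaround: `%B.model_%N` combines two keywords into a flat filename instead of using the nested path that `%M` would produce.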
=== Testing the models on their reciprocal data file parts
NAME:
test
DESCRIPTION:
Process K parts of a split datafile using K previously trained models.
Certain keywords in the testing command and its arguments are interpolated at runtime:
* %N - part number, e.g. '01'
* %T - part filename, e.g. 'brown.test/01'
* %I - alias for %T
* %O - output filename, e.g. 'brown.outputs/01'
* %M - model filename, e.g. 'brown.models/01'
* %B - basename (as specified on the command line), e.g. 'brown'
SYNOPSIS:
kfold test --base NAME [options] -- CMD [--CMD-OPTIONS] [CMD-ARGS]
EXAMPLES:
# Apply trained MaltParser models for cross-validation
kfold test -f --base shuffled.c32.conll -- java -jar ~/Tools/malt-1.4.1/malt.jar -c %B.model_%N -i %T -o %O -m parse
OPTIONS:
-f, --overwrite
Remove existing test output prior to executing
--base NAME
Default prefix of model files and test outputs
--parts-name SUFFIX
Look for parts {01..K} to be processed in the folder BASE.SUFFIX
--models-name SUFFIX
Yield model names as BASE.SUFFIX/{01..K} as interpolation pattern %M
--outputs-name SUFFIX
Yield output filenames as BASE.SUFFIX/{01..K} as interpolation pattern %O
--output-name SUFFIX
Put the concatenated output of all models in BASE.SUFFIX
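
The concatenated output is what allows a single evaluation run against the original data file, as in the eval07.pl call in the example usage. A rough Python sketch of that final step follows. It assumes the per-part outputs are joined in part order with nothing inserted between them, which the help text does not confirm; `concatenate_outputs` is a hypothetical name, and the `outputs` and `output` suffixes are inferred from the examples in this document.

```python
import tempfile
from pathlib import Path


def concatenate_outputs(base, k, outputs_name="outputs", output_name="output"):
    """Join BASE.outputs/01..K, in order, into the single file BASE.output."""
    outputs_dir = Path(f"{base}.{outputs_name}")
    target = Path(f"{base}.{output_name}")
    with target.open("wb") as sink:
        for n in range(1, k + 1):
            # Parts are read in numeric order so the result lines up
            # with the order of entries in the original data file.
            sink.write((outputs_dir / f"{n:02d}").read_bytes())
    return target


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    base = str(Path(tmp) / "brown")
    outputs_dir = Path(f"{base}.outputs")
    outputs_dir.mkdir()
    (outputs_dir / "01").write_text("first\n")
    (outputs_dir / "02").write_text("second\n")
    combined = concatenate_outputs(base, 2).read_text()

print(combined)
```

If the parts were produced by contiguous splitting, joining them in order yields predictions aligned entry by entry with the gold data, which is what an evaluation script comparing two files needs.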