Corpus Replicator
Corpus Replicator is a corpus generation tool that enables the creation of multiple
unique output files based on templates. The primary intended use case is the
creation of a seed corpus that can be used by fuzzers. Support for additional output
formats can be added via the creation of Recipes. If a desired format is unsupported,
support can be added via the creation of a CorpusGenerator.
The goal is to create an efficient corpus that maximizes code coverage and minimizes
file size. Small unique files that execute quickly are preferred.
Currently four media types can be generated: animation, audio, image and video.
Requirements
Corpus Replicator relies on FFmpeg.
Installation
pip install corpus-replicator
Example
This is an example recipe file.
base:
  codec: "h264"
  container: "mp4"
  library: "libx264"
  medium: "video"
  tool: "ffmpeg"
default_flags:
  encoder:
    ["-c:v", "libx264"]
  resolution:
    ["-s", "320x240"]
variation:
  resolution:
    - ["-s", "640x480"]
    - ["-s", "32x18"]
    - ["-s", "64x64"]
  monochrome:
    - ["-vf", "hue=s=0"]
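As a rough illustration of how such a recipe might expand into FFmpeg flag sets, the Python sketch below assumes that each variation entry replaces the default flag group of the same name (or is added when no default with that name exists) and that one file is produced per entry. The merging rule and the helper name `expand_recipe` are assumptions made for illustration; they are not taken from Corpus Replicator's code.

```python
# Hypothetical sketch: the example recipe written as a plain dict,
# so no YAML parser is needed to run it.
RECIPE = {
    "base": {
        "codec": "h264",
        "container": "mp4",
        "library": "libx264",
        "medium": "video",
        "tool": "ffmpeg",
    },
    "default_flags": {
        "encoder": ["-c:v", "libx264"],
        "resolution": ["-s", "320x240"],
    },
    "variation": {
        "resolution": [["-s", "640x480"], ["-s", "32x18"], ["-s", "64x64"]],
        "monochrome": [["-vf", "hue=s=0"]],
    },
}


def expand_recipe(recipe):
    """Yield (variation name, index, flat flag list) for each variation entry.

    Assumption: a variation overrides the default flag group that shares
    its name and is appended as a new group otherwise.
    """
    for name, entries in recipe["variation"].items():
        for idx, flags in enumerate(entries):
            groups = dict(recipe["default_flags"])  # copy the defaults
            groups[name] = flags  # override or add this variation
            flat = [flag for group in groups.values() for flag in group]
            yield name, idx, flat


for name, idx, flags in expand_recipe(RECIPE):
    print(f"{name}-{idx:02d}: {' '.join(flags)}")
```

Under these assumptions the recipe yields four flag sets: three for the `resolution` variations and one for `monochrome`.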
Running the recipe will generate a corpus:
$ corpus-replicator example.yml video -t test
Generating templates...
1 recipe(s) will be used with 1 template(s) to create 4 file(s).
Generating 4 'video/libx264/h264/mp4' file(s) using template 'test'...
Optimizing corpus, checking for duplicates...
Done.
Resulting corpus:
$ ls generated-corpus/
video-h264-libx264-test-monochrome-00.mp4
video-h264-libx264-test-resolution-01.mp4
video-h264-libx264-test-resolution-00.mp4
video-h264-libx264-test-resolution-02.mp4
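The names in this listing appear to follow the pattern `<medium>-<codec>-<library>-<template>-<variation>-<index>.<container>`. The Python sketch below rebuilds the four names from that inferred pattern. The pattern and the helper `expected_names` are assumptions based only on the listing above, and the duplicate check may leave fewer files in practice.

```python
def expected_names(base, variation_counts, template):
    """Build file names following the pattern inferred from the listing.

    `variation_counts` maps each variation name to its number of entries.
    """
    prefix = "-".join(
        (base["medium"], base["codec"], base["library"], template)
    )
    return [
        f"{prefix}-{name}-{idx:02d}.{base['container']}"
        for name, count in variation_counts.items()
        for idx in range(count)
    ]


# Values copied from the "base" section of the example recipe.
BASE = {
    "codec": "h264",
    "container": "mp4",
    "library": "libx264",
    "medium": "video",
}

names = expected_names(BASE, {"resolution": 3, "monochrome": 1}, "test")
for file_name in sorted(names):
    print(file_name)
```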
A more complex corpus can be generated by using multiple Recipes and Templates at once.
Recipes are stored in src/corpus_replicator/recipes.