
This gem provides functionality for format identification and conversion. Converters may rely on the presence of third-party software, so read the class documentation carefully for the converters you want to use.
Add this line to your application's Gemfile:
gem 'libis-format'
And then execute:
$ bundle
Or install it yourself as:
$ gem install libis-format
The format database is the core of the format services. It stores information about all the known formats.
This class enables support for format identification. It uses three external tools to identify a file format: droid, fido and the Unix file tool. Based on several criteria, the results of these tools are given a score, and the result with the highest confidence is returned. The output of all tools can also be returned. The format identification result contains a MIME-type and a PRONOM PUID if they are known. The calculated score, the tool that produced the outcome and the recognition method are also listed. Some tools return more information, such as the format name and format version. If these are present in the most confident output, their values will also be part of the result info.
Unidentified files will be listed with MIME-type 'application/octet-stream' and PUID 'fmt/unknown'.
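The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the gem's actual implementation: the method name `best_result`, the constant `UNKNOWN_FORMAT` and the tie-breaking rule (first tool wins on equal scores) are assumptions; only the hash keys and the fallback values are taken from this README.

```ruby
# Hypothetical sketch: pick the most confident result from several tool outputs.
# Hash keys mirror the sample output in this README; the logic is assumed.

UNKNOWN_FORMAT = {
  mimetype: 'application/octet-stream',
  puid: 'fmt/unknown'
}.freeze

# results: Array of Hashes, each with a :score and optionally :mimetype/:puid.
def best_result(results)
  # Ignore outputs that carry no usable identification at all.
  candidates = results.select { |r| r[:mimetype] || r[:puid] }
  return UNKNOWN_FORMAT.dup if candidates.empty?

  # Highest score wins; on equal scores the earlier entry wins (assumed rule).
  candidates.each_with_index.max_by { |r, i| [r[:score], -i] }.first
end

tool_outputs = [
  { tool: :droid, puid: 'fmt/116', mimetype: 'image/bmp', matchtype: 'signature', score: 7 },
  { tool: :fido,  puid: 'fmt/116', mimetype: 'image/bmp', matchtype: 'signature', score: 7 },
  { tool: :file,  mimetype: 'image/bmp', matchtype: 'magic', score: 2 }
]

winner = best_result(tool_outputs)
puts "#{winner[:tool]}: #{winner[:mimetype]} (#{winner[:puid]}), score #{winner[:score]}"
```

Passing an empty list to `best_result` returns the unknown-format placeholder, which matches the fallback values listed above.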
The main method #get that performs the format identification takes two parameters:
The options hash accepts the following keys (Symbol types):
The result of the identification is a Hash with the following keys:
The identification outcome is a Hash of key-value pairs for each property returned. Keys are lower-case symbols. If the tools gave results that differ significantly, their outcomes are added to a list as :alternatives (see example).
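One way to picture how differing outcomes end up under :alternatives is the sketch below. It assumes that "significant" means a different PUID or MIME-type when both results provide a value; the gem's real criteria and the exact contents of the list may differ. The names `differs?`, `with_alternatives` and `SIGNIFICANT_KEYS` are hypothetical.

```ruby
# Hypothetical sketch: attach differing tool outputs as :alternatives.
# Assumes at least one result is passed in.

SIGNIFICANT_KEYS = %i[puid mimetype].freeze

# Two results differ when both have a value for a key and the values disagree.
def differs?(a, b)
  SIGNIFICANT_KEYS.any? { |k| a[k] && b[k] && a[k] != b[k] }
end

def with_alternatives(results)
  # Rank by descending score, keeping the original order on ties.
  ranked = results.each_with_index.sort_by { |r, i| [-r[:score], i] }.map(&:first)
  best = ranked.first.dup
  others = ranked.drop(1).select { |r| differs?(best, r) }
  best[:alternatives] = others unless others.empty?
  best
end

outputs = [
  { tool: :droid, puid: 'fmt/354', mimetype: 'application/pdf', score: 7 },
  { tool: :fido,  puid: 'fmt/19',  mimetype: 'application/pdf', score: 7 },
  { tool: :file,  mimetype: 'application/pdf', score: 2 }
]

result = with_alternatives(outputs)
puts result[:alternatives].map { |r| r[:tool] }.inspect
```

In this sketch the file tool's output is not listed as an alternative because it agrees with the winner on every property it reports.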
None of the tools used is able to identify an XML file by its content. In our context this is especially annoying for EAD XML files, as we want to identify them in order to process them differently from other XML files. The Identifier solves this by performing an extra check on files identified as XML files. It will validate each XML file against a list of XML schemas and return the matching MIME-type in the result. If the Type Database contains an entry for this format, the result will be extended with the information from the Type Database.
With the :xml_validation option this behaviour can be turned off, for instance in cases where no EAD files are to be expected or no further identification of XML files is needed. Note that if no XML files are present, the Identifier will not spend any time on XML validation anyway.
The list of validation schemas can be extended with the class method #add_xml_validation, which takes two parameters: a MIME-type and the path to an XML schema (XSD). If you also want to assign a fictitious PUID to the XML type, you should add an entry to your Type Database with the same MIME-type.
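The idea of a MIME-type to XSD mapping can be illustrated with a small stand-alone sketch. `XmlTypeRegistry` and `xml_type` are hypothetical names and not part of the gem; only the method name `add_xml_validation` and its two parameters come from the text above. The schema check itself is injected as a callable so that the sketch has no dependencies; in practice it could wrap an XSD validator such as Nokogiri's `Nokogiri::XML::Schema`.

```ruby
# Hypothetical sketch of a MIME-type => XSD registry (not the gem's API).
class XmlTypeRegistry
  def initialize(validator)
    @validator = validator # callable: (xml_path, xsd_path) -> true/false
    @schemas = {}          # MIME-type => path to XSD
  end

  def add_xml_validation(mimetype, xsd_path)
    @schemas[mimetype] = xsd_path
  end

  # Returns the MIME-type of the first schema the file validates against,
  # or nil when none matches.
  def xml_type(xml_path)
    match = @schemas.find { |_mimetype, xsd| @validator.call(xml_path, xsd) }
    match && match.first
  end
end

# Stand-in validator for demonstration only: it "validates" when the file name
# contains the schema's base name. A real validator would parse both files.
fake_validator = lambda do |xml_path, xsd_path|
  File.basename(xml_path).include?(File.basename(xsd_path, '.xsd'))
end

registry = XmlTypeRegistry.new(fake_validator)
registry.add_xml_validation('archive/ead', 'data/ead.xsd')

puts registry.xml_type('data/test-ead.xml').inspect
puts registry.xml_type('data/other.xml').inspect
```

Keeping the validator pluggable also makes it easy to skip validation entirely, which is comparable in spirit to turning the check off with the :xml_validation option.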
{
:messages => [
[0] [
[0] :debug,
[1] "XML file validated against XML Schema: [...]/data/ead.xsd"
],
[1] [
[0] :debug,
[1] "XML file validated against XML Schema: [...]/data/ead.xsd"
],
[2] [
[0] :debug,
[1] "XML file validated against XML Schema: [...]/data/ead.xsd"
]
],
:output => {
"[...]/data/Cevennes2.bmp" => [
[0] {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "fmt/116",
:mimetype => "image/bmp",
:format_name => "Windows Bitmap",
:format_version => "3.0",
:tool => :droid,
:TYPE => :BMP,
:GROUP => :IMAGE,
:score => 7
},
[1] {
:puid => "fmt/116",
:format_name => "Windows Bitmap",
:format_version => "Windows Bitmap 3.0",
:mimetype => "image/bmp",
:matchtype => "signature",
:tool => :fido,
:TYPE => :BMP,
:GROUP => :IMAGE,
:score => 7
},
[2] {
:mimetype => "image/bmp",
:matchtype => "magic",
:tool => :file,
:TYPE => :BMP,
:GROUP => :IMAGE,
:score => 2
}
],
"[...]/data/Cevennes2.jp2" => [
[0] {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "x-fmt/392",
:mimetype => "image/jp2",
:format_name => "JP2 (JPEG 2000 part 1)",
:format_version => "",
:tool => :droid,
:TYPE => :JP2,
:GROUP => :IMAGE,
:score => 7
},
[1] {
:puid => "x-fmt/392",
:format_name => "JP2 (JPEG 2000 part 1)",
:format_version => "JPEG2000",
:mimetype => "image/jp2",
:matchtype => "signature",
:tool => :fido,
:TYPE => :JP2,
:GROUP => :IMAGE,
:score => 7
},
[2] {
:mimetype => "image/jp2",
:matchtype => "magic",
:tool => :file,
:TYPE => :JP2,
:GROUP => :IMAGE,
:score => 2
}
],
[...]
},
:formats => {
"[...]/data/Cevennes2.bmp" => {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "fmt/116",
:mimetype => "image/bmp",
:format_name => "Windows Bitmap",
:format_version => "3.0",
:tool => :droid,
:TYPE => :BMP,
:GROUP => :IMAGE,
:score => 7
},
"[...]/data/Cevennes2.jp2" => {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "x-fmt/392",
:mimetype => "image/jp2",
:format_name => "JP2 (JPEG 2000 part 1)",
:format_version => "JPEG2000",
:tool => :droid,
:TYPE => :JP2,
:GROUP => :IMAGE,
:score => 7
},
"[...]/data/NikonRaw-CameraRaw.TIF" => {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "fmt/353",
:mimetype => "image/tiff",
:format_name => "Tagged Image File Format",
:format_version => "TIFF generic (little-endian)",
:tool => :droid,
:TYPE => :TIFF,
:GROUP => :IMAGE,
:score => 7
},
"[...]/data/NikonRaw-CaptureOne.tif" => {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "x-fmt/387",
:mimetype => "image/tiff",
:format_name => "Exchangeable Image File Format (Uncompressed)",
:format_version => "2.2",
:tool => :droid,
:TYPE => :TIFF,
:GROUP => :IMAGE,
:score => 7,
:alternatives => [
[0] {
:puid => "fmt/353",
:format_name => "Tagged Image File Format",
:format_version => "TIFF generic (little-endian)",
:mimetype => "image/tiff",
:matchtype => "signature",
:tool => :fido,
:TYPE => :TIFF,
:GROUP => :IMAGE,
:score => 7
},
[1] {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "x-fmt/387",
:mimetype => "image/tiff",
:format_name => "Exchangeable Image File Format (Uncompressed)",
:format_version => "2.2",
:tool => :droid,
:TYPE => :TIFF,
:GROUP => :IMAGE,
:score => 7
}
]
},
"[...]/data/test-ead.xml" => {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "fmt/101",
:mimetype => "archive/ead",
:format_name => "Encoded Archival Description (EAD)",
:format_version => "",
:tool => :xsd_validation,
:TYPE => :EAD,
:GROUP => :ARCHIVE,
:score => 7,
:tool => :droid,
:match_type => "xsd_validation"
},
"[...]/data/test.doc" => {
:matchtype => "container",
:ext_mismatch => "false",
:puid => "fmt/40",
:mimetype => "application/msword",
:format_name => "Microsoft Word Document",
:format_version => "97-2003",
:tool => :droid,
:TYPE => :MSDOC,
:GROUP => :TEXT,
:score => 9,
:alternatives => [
[0] {
:matchtype => "container",
:ext_mismatch => "false",
:puid => "fmt/40",
:mimetype => "application/msword",
:format_name => "Microsoft Word Document",
:format_version => "97-2003",
:tool => :droid,
:TYPE => :MSDOC,
:GROUP => :TEXT,
:score => 9
},
[1] {
:puid => "fmt/111",
:format_name => "OLE2 Compound Document Format",
:format_version => "OLE2 Compound Document Format",
:mimetype => nil,
:matchtype => "signature",
:tool => :fido,
:score => 3
}
]
},
"[...]/data/test.docx" => {
:matchtype => "container",
:ext_mismatch => "false",
:puid => "fmt/412",
:mimetype => "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
:format_name => "Microsoft Word for Windows",
:format_version => "2007 onwards",
:tool => :droid,
:TYPE => :MSDOCX,
:GROUP => :TEXT,
:score => 9
},
"[...]/data/test.xlsx" => {
:matchtype => "container",
:ext_mismatch => "false",
:puid => "fmt/214",
:mimetype => "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
:format_name => "Microsoft Excel for Windows",
:format_version => "2007 onwards",
:tool => :droid,
:TYPE => :MSXLSX,
:GROUP => :TABULAR,
:score => 9,
:alternatives => [
[0] {
:matchtype => "container",
:ext_mismatch => "false",
:puid => "fmt/214",
:mimetype => "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
:format_name => "Microsoft Excel for Windows",
:format_version => "2007 onwards",
:tool => :droid,
:TYPE => :MSXLSX,
:GROUP => :TABULAR,
:score => 9
},
[1] {
:mimetype => "application/octet-stream",
:matchtype => "magic",
:tool => :file,
:score => -2
}
]
},
"[...]/data/test_pdfa.pdf" => {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "fmt/354",
:mimetype => "application/pdf",
:format_name => "Acrobat PDF/A - Portable Document Format",
:format_version => "1b",
:tool => :droid,
:TYPE => :PDFA,
:GROUP => :TEXT,
:score => 7,
:alternatives => [
[0] {
:puid => "fmt/19",
:format_name => "Acrobat PDF 1.5 - Portable Document Format",
:format_version => "PDF 1.5",
:mimetype => "application/pdf",
:matchtype => "signature",
:tool => :fido,
:TYPE => :PDF,
:GROUP => :TEXT,
:score => 7
},
[1] {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "fmt/354",
:mimetype => "application/pdf",
:format_name => "Acrobat PDF/A - Portable Document Format",
:format_version => "1b",
:tool => :droid,
:TYPE => :PDFA,
:GROUP => :TEXT,
:score => 7
}
]
}
}
}
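A result shaped like the sample above can be post-processed with plain Hash operations. The sketch below works on a hand-written excerpt of that sample (reduced to a few keys) and builds a one-line summary per file; the variable names are illustrative and no gem call is involved.

```ruby
# Excerpt of an identification result, shaped like the sample above.
result = {
  formats: {
    '[...]/data/Cevennes2.bmp' => {
      puid: 'fmt/116', mimetype: 'image/bmp', tool: :droid, score: 7
    },
    '[...]/data/test_pdfa.pdf' => {
      puid: 'fmt/354', mimetype: 'application/pdf', tool: :droid, score: 7,
      alternatives: [
        { puid: 'fmt/19',  mimetype: 'application/pdf', tool: :fido,  score: 7 },
        { puid: 'fmt/354', mimetype: 'application/pdf', tool: :droid, score: 7 }
      ]
    }
  }
}

# Summarise each file: chosen format plus any PUIDs that differ from it.
summary = result[:formats].map do |path, info|
  alt_puids = info.fetch(:alternatives, []).map { |a| a[:puid] } - [info[:puid]]
  line = "#{File.basename(path)}: #{info[:mimetype]} (#{info[:puid]})"
  line += " [alternatives: #{alt_puids.join(', ')}]" unless alt_puids.empty?
  line
end

puts summary
```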
git checkout -b my-new-feature
git commit -am 'Add some feature'
git push origin my-new-feature