
This gem provides functionality for format identification and conversion. Converters may rely on the presence of 3rd party software, so read the class documentation carefully for the converters you want to use.
Add this line to your application's Gemfile:
gem 'libis-format'
And then execute:
$ bundle
Or install it yourself as:
$ gem install libis-format
The format database is the core of the format services. It stores information about all the known formats.
This class enables support for format identification. It uses three external tools to identify a file format: droid, fido and the Unix file tool. The results of these tools are scored on several criteria, and the result with the highest confidence is returned. The output of all tools can also be returned. The format identification result contains a MIME-type and a PRONOM PUID if they are known. It also lists the calculated score, the tool that produced the outcome and the recognition method. Some tools return additional information such as the format name and format version. If these are present in the most confident output, their values are also part of the result info.
Unidentified files will be listed with MIME-type 'application/octet-stream' and PUID 'fmt/unknown'.
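The selection step described above can be illustrated with a small sketch. This is not the gem's implementation: the method name best_identification and the constant UNKNOWN_FORMAT are hypothetical, and the scoring itself is assumed to have already happened, so each tool result simply carries a :score.

```ruby
# Hypothetical sketch, not the gem's code: pick the most confident tool result.
UNKNOWN_FORMAT = { mimetype: 'application/octet-stream', puid: 'fmt/unknown' }.freeze

def best_identification(tool_results)
  # Only results that actually name a format are candidates.
  identified = tool_results.select { |r| r[:mimetype] || r[:puid] }
  return UNKNOWN_FORMAT.dup if identified.empty?

  # Assumes every candidate already has a numeric :score.
  identified.max_by { |r| r[:score] }
end

tool_results = [
  { tool: :droid, puid: 'fmt/116', mimetype: 'image/bmp', score: 7 },
  { tool: :file, mimetype: 'image/bmp', score: 2 }
]
best = best_identification(tool_results)
unidentified = best_identification([])
```

Here best is the droid result, and unidentified falls back to the 'application/octet-stream' and 'fmt/unknown' pair mentioned above.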
The main method #get that performs the format identification takes two parameters:
The options hash accepts the following keys (Symbol types):
The result of the identification is a Hash with the following keys:
The identification outcome is a Hash of key-value pairs for each property returned. Keys are lower-case symbols. If the tools gave results that differ significantly, their outcomes will be added to a list as :alternatives (see example).
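One way to picture how :alternatives could be assembled is sketched below. The rule used here is an assumption, not the gem's documented criterion: two results are treated as significantly different when both report a :puid or :mimetype and the values disagree. The method names conflict? and with_alternatives are hypothetical.

```ruby
# Hypothetical sketch: the real criteria for "significant" differences may differ.
def conflict?(a, b, keys = %i[puid mimetype])
  # A conflict needs both results to report the property with different values.
  keys.any? { |k| a[k] && b[k] && a[k] != b[k] }
end

def with_alternatives(results)
  best = results.max_by { |r| r[:score] }
  rivals = results.select { |r| conflict?(best, r) }
  return best if rivals.empty?

  # List the winner together with the conflicting results, highest score first.
  best.merge(alternatives: [best, *rivals].sort_by { |r| -r[:score] })
end

doc_results = [
  { tool: :droid, puid: 'fmt/40', mimetype: 'application/msword', score: 9 },
  { tool: :fido, puid: 'fmt/111', mimetype: nil, score: 3 }
]
bmp_results = [
  { tool: :droid, puid: 'fmt/116', mimetype: 'image/bmp', score: 7 },
  { tool: :file, mimetype: 'image/bmp', score: 2 }
]
doc = with_alternatives(doc_results)
bmp = with_alternatives(bmp_results)
```

With these inputs, doc gains an :alternatives list because the two PUIDs disagree, while bmp does not because the file tool only omits the PUID rather than contradicting it.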
None of the tools used is able to identify an XML file by its content. In our context this is especially annoying for EAD XML files, as we want to identify them in order to process them differently from other XML files. The Identifier solves this by performing an extra check on files identified as XML files. It will validate each XML file against a list of XML schemas and return the matching MIME-type in the result. If the Type Database contains an entry for this format, the result will be extended with the information from the Type Database.
With the :xml_validation option this behaviour can be turned off, for instance in cases where no EAD files are to be expected or no further identification of XML files is needed. Note that if no XML files are present, the Identifier will not spend any time on XML validation anyway.
The list of validation schemas can be extended with the class method #add_xml_validation which takes two parameters: a MIME-type and the path to an XML schema (XSD). If you also want to assign a fictitious PUID to the XML type, you should add an entry to your Type Database with the same MIME-type.
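The XML refinement step can be sketched as a small registry that maps MIME-types to schema paths. This is a stand-in for illustration only: the class XmlTypeRegistry, its refine method and the injected validator are hypothetical, and only the add_xml_validation name and its two parameters come from the description above. Real schema validation is replaced by a stub so that the sketch runs without external libraries.

```ruby
# Hypothetical stand-in for the Identifier's schema list; not the gem's code.
class XmlTypeRegistry
  def initialize(validator)
    @validator = validator # callable: (xml_path, xsd_path) -> true or false
    @schemas = {}          # MIME-type => path to XSD
  end

  def add_xml_validation(mimetype, xsd_path)
    @schemas[mimetype] = xsd_path
  end

  # Refine a result only when validation is enabled and the file was identified as XML.
  def refine(path, result, xml_validation: true)
    return result unless xml_validation && xml?(result)

    match = @schemas.find { |_mime, xsd| @validator.call(path, xsd) }
    match ? result.merge(mimetype: match.first, tool: :xsd_validation) : result
  end

  private

  def xml?(result)
    %w[application/xml text/xml].include?(result[:mimetype])
  end
end

# Stub validator (assumption): pretends that only EAD files validate against ead.xsd.
stub = ->(xml, xsd) { xml.end_with?('ead.xml') && xsd.end_with?('ead.xsd') }
registry = XmlTypeRegistry.new(stub)
registry.add_xml_validation('archive/ead', 'data/ead.xsd')

generic = { mimetype: 'application/xml', tool: :droid }
refined = registry.refine('data/test-ead.xml', generic)
skipped = registry.refine('data/test-ead.xml', generic, xml_validation: false)
other = registry.refine('data/other.xml', generic)
```

In practice the validator would wrap a real XSD check. Results for non-XML files and calls with xml_validation: false pass through unchanged, which mirrors the note above that no time is spent on validation when it is not needed.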
{
:messages => [
[0] [
[0] :debug,
[1] "XML file validated against XML Schema: [...]/data/ead.xsd"
],
[1] [
[0] :debug,
[1] "XML file validated against XML Schema: [...]/data/ead.xsd"
],
[2] [
[0] :debug,
[1] "XML file validated against XML Schema: [...]/data/ead.xsd"
]
],
:output => {
"[...]/data/Cevennes2.bmp" => [
[0] {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "fmt/116",
:mimetype => "image/bmp",
:format_name => "Windows Bitmap",
:format_version => "3.0",
:tool => :droid,
:TYPE => :BMP,
:GROUP => :IMAGE,
:score => 7
},
[1] {
:puid => "fmt/116",
:format_name => "Windows Bitmap",
:format_version => "Windows Bitmap 3.0",
:mimetype => "image/bmp",
:matchtype => "signature",
:tool => :fido,
:TYPE => :BMP,
:GROUP => :IMAGE,
:score => 7
},
[2] {
:mimetype => "image/bmp",
:matchtype => "magic",
:tool => :file,
:TYPE => :BMP,
:GROUP => :IMAGE,
:score => 2
}
],
"[...]/data/Cevennes2.jp2" => [
[0] {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "x-fmt/392",
:mimetype => "image/jp2",
:format_name => "JP2 (JPEG 2000 part 1)",
:format_version => "",
:tool => :droid,
:TYPE => :JP2,
:GROUP => :IMAGE,
:score => 7
},
[1] {
:puid => "x-fmt/392",
:format_name => "JP2 (JPEG 2000 part 1)",
:format_version => "JPEG2000",
:mimetype => "image/jp2",
:matchtype => "signature",
:tool => :fido,
:TYPE => :JP2,
:GROUP => :IMAGE,
:score => 7
},
[2] {
:mimetype => "image/jp2",
:matchtype => "magic",
:tool => :file,
:TYPE => :JP2,
:GROUP => :IMAGE,
:score => 2
}
],
[...]
},
:formats => {
"[...]/data/Cevennes2.bmp" => {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "fmt/116",
:mimetype => "image/bmp",
:format_name => "Windows Bitmap",
:format_version => "3.0",
:tool => :droid,
:TYPE => :BMP,
:GROUP => :IMAGE,
:score => 7
},
"[...]/data/Cevennes2.jp2" => {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "x-fmt/392",
:mimetype => "image/jp2",
:format_name => "JP2 (JPEG 2000 part 1)",
:format_version => "JPEG2000",
:tool => :droid,
:TYPE => :JP2,
:GROUP => :IMAGE,
:score => 7
},
"[...]/data/NikonRaw-CameraRaw.TIF" => {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "fmt/353",
:mimetype => "image/tiff",
:format_name => "Tagged Image File Format",
:format_version => "TIFF generic (little-endian)",
:tool => :droid,
:TYPE => :TIFF,
:GROUP => :IMAGE,
:score => 7
},
"[...]/data/NikonRaw-CaptureOne.tif" => {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "x-fmt/387",
:mimetype => "image/tiff",
:format_name => "Exchangeable Image File Format (Uncompressed)",
:format_version => "2.2",
:tool => :droid,
:TYPE => :TIFF,
:GROUP => :IMAGE,
:score => 7,
:alternatives => [
[0] {
:puid => "fmt/353",
:format_name => "Tagged Image File Format",
:format_version => "TIFF generic (little-endian)",
:mimetype => "image/tiff",
:matchtype => "signature",
:tool => :fido,
:TYPE => :TIFF,
:GROUP => :IMAGE,
:score => 7
},
[1] {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "x-fmt/387",
:mimetype => "image/tiff",
:format_name => "Exchangeable Image File Format (Uncompressed)",
:format_version => "2.2",
:tool => :droid,
:TYPE => :TIFF,
:GROUP => :IMAGE,
:score => 7
}
]
},
"[...]/data/test-ead.xml" => {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "fmt/101",
:mimetype => "archive/ead",
:format_name => "Encoded Archival Description (EAD)",
:format_version => "",
:tool => :xsd_validation,
:TYPE => :EAD,
:GROUP => :ARCHIVE,
:score => 7,
:tool => :droid,
:match_type => "xsd_validation"
},
"[...]/data/test.doc" => {
:matchtype => "container",
:ext_mismatch => "false",
:puid => "fmt/40",
:mimetype => "application/msword",
:format_name => "Microsoft Word Document",
:format_version => "97-2003",
:tool => :droid,
:TYPE => :MSDOC,
:GROUP => :TEXT,
:score => 9,
:alternatives => [
[0] {
:matchtype => "container",
:ext_mismatch => "false",
:puid => "fmt/40",
:mimetype => "application/msword",
:format_name => "Microsoft Word Document",
:format_version => "97-2003",
:tool => :droid,
:TYPE => :MSDOC,
:GROUP => :TEXT,
:score => 9
},
[1] {
:puid => "fmt/111",
:format_name => "OLE2 Compound Document Format",
:format_version => "OLE2 Compound Document Format",
:mimetype => nil,
:matchtype => "signature",
:tool => :fido,
:score => 3
}
]
},
"[...]/data/test.docx" => {
:matchtype => "container",
:ext_mismatch => "false",
:puid => "fmt/412",
:mimetype => "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
:format_name => "Microsoft Word for Windows",
:format_version => "2007 onwards",
:tool => :droid,
:TYPE => :MSDOCX,
:GROUP => :TEXT,
:score => 9
},
"[...]/data/test.xlsx" => {
:matchtype => "container",
:ext_mismatch => "false",
:puid => "fmt/214",
:mimetype => "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
:format_name => "Microsoft Excel for Windows",
:format_version => "2007 onwards",
:tool => :droid,
:TYPE => :MSXLSX,
:GROUP => :TABULAR,
:score => 9,
:alternatives => [
[0] {
:matchtype => "container",
:ext_mismatch => "false",
:puid => "fmt/214",
:mimetype => "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
:format_name => "Microsoft Excel for Windows",
:format_version => "2007 onwards",
:tool => :droid,
:TYPE => :MSXLSX,
:GROUP => :TABULAR,
:score => 9
},
[1] {
:mimetype => "application/octet-stream",
:matchtype => "magic",
:tool => :file,
:score => -2
}
]
},
"[...]/data/test_pdfa.pdf" => {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "fmt/354",
:mimetype => "application/pdf",
:format_name => "Acrobat PDF/A - Portable Document Format",
:format_version => "1b",
:tool => :droid,
:TYPE => :PDFA,
:GROUP => :TEXT,
:score => 7,
:alternatives => [
[0] {
:puid => "fmt/19",
:format_name => "Acrobat PDF 1.5 - Portable Document Format",
:format_version => "PDF 1.5",
:mimetype => "application/pdf",
:matchtype => "signature",
:tool => :fido,
:TYPE => :PDF,
:GROUP => :TEXT,
:score => 7
},
[1] {
:matchtype => "signature",
:ext_mismatch => "false",
:puid => "fmt/354",
:mimetype => "application/pdf",
:format_name => "Acrobat PDF/A - Portable Document Format",
:format_version => "1b",
:tool => :droid,
:TYPE => :PDFA,
:GROUP => :TEXT,
:score => 7
}
]
}
}
}
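As a brief illustration of consuming such a result, the sketch below walks a Hash shaped like the :formats part of the example. The file names and the trimmed-down entries are made up for the illustration; only the key names (:mimetype, :puid, :GROUP, :alternatives) are taken from the example output.

```ruby
# Sample data shaped like the :formats hash above; paths and entries are illustrative.
formats = {
  'a.bmp' => { mimetype: 'image/bmp', puid: 'fmt/116', GROUP: :IMAGE },
  'b.doc' => { mimetype: 'application/msword', puid: 'fmt/40', GROUP: :TEXT,
               alternatives: [{ puid: 'fmt/40' }, { puid: 'fmt/111' }] },
  'c.bin' => { mimetype: 'application/octet-stream', puid: 'fmt/unknown' }
}

# File paths per format group (entries without a :GROUP end up under nil).
by_group = formats.group_by { |_path, info| info[:GROUP] }
                  .transform_values { |pairs| pairs.map(&:first) }

# Files where the tools disagreed, and files that could not be identified.
ambiguous = formats.select { |_path, info| info.key?(:alternatives) }.keys
unknown = formats.select { |_path, info| info[:puid] == 'fmt/unknown' }.keys
```

Files listed in ambiguous or unknown are natural candidates for manual review before any conversion is attempted.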
git checkout -b my-new-feature
git commit -am 'Add some feature'
git push origin my-new-feature