addext - npm Package Compare versions

+88

addext/pronom_xml_to_json.py

		#!/usr/bin/env python3

		"""
		Save selected information from PRONOM exports
		to JSON file, using PUID as key

		PRONOM exports available from Ross Spencer:
		https://github.com/exponential-decay/
		pronom-archive-and-skeleton-test-suite

		Positional args:
		pronom_export: Path to directory containing PRONOM XML exports
		json_path: Path for new JSON file
		"""

		import json
		from lxml import etree, objectify
		import os
		import sys


		def main():

		    # Save abspaths for args
		    pronom_export = os.path.abspath(sys.argv[1])
		    json_path = os.path.abspath(sys.argv[2])

		    # Create dict to store data
		    puids = dict()

		    # Walk pronom_exports and parse XML files
		    for root_dir, _, files in os.walk(pronom_export):
		        for file_ in files:
		            # Save filepath
		            file_path = os.path.join(root_dir, file_)

		            # Skip file if not XML
		            if not file_path.lower().endswith('xml'):
		                continue

		            # Open XML file and strip namespaces
		            tree = etree.parse(file_path)
		            root = tree.getroot()
		            for elem in root.getiterator():
		                if not hasattr(elem.tag, 'find'):
		                    continue
		                i = elem.tag.find('}')
		                if i >= 0:
		                    elem.tag = elem.tag[i + 1:]
		            objectify.deannotate(root, cleanup_namespaces=True)

		            # Create dict to save format information
		            format_info = dict()
		            puid = ''
		            file_extensions = list()

		            # Parse XML
		            for target in root.findall('.//FileFormat'):

		                # Save format and version to format info dict
		                format_info['file_format'] = target.find('FormatName').text
		                format_info['version'] = target.find('FormatVersion').text.strip()

		                # Save PUID to variable
		                for target1 in target.findall('.//FileFormatIdentifier'):
		                    id_type = target1.find('IdentifierType').text
		                    if id_type == 'PUID':
		                        puid = target1.find('Identifier').text

		                # Save file extensions to list
		                for target2 in target.findall('.//ExternalSignature'):
		                    signature_type = target2.find('SignatureType').text
		                    if signature_type == 'File extension':
		                        file_extensions.append(target2.find('Signature').text)

		            # Add file extensions list to format info dict
		            format_info['file_extensions'] = file_extensions

		            # Add to dict with PUID as key
		            puids[puid] = format_info

		    # Write dict to file as JSON
		    with open(json_path, 'w') as f:
		        json.dump(puids, f, indent=2)


		if __name__ == '__main__':
		    main()

+95

README.md

		## addext

		### Version: 2.0.0

		[![Build Status](https://travis-ci.org/timothyryanwalsh/addext.svg?branch=master)](https://travis-ci.org/timothyryanwalsh/addext)

		Python script that adds file extensions to files that lack them, based on Siegfried's identification of each file's PRONOM ID (PUID).

		### Calling addext

		`addext.py` takes two positional arguments:
		* `target`: Path to target file or directory
		* `json`: Path to the addext PRONOM JSON file. `pronom_v95.json` is included in this repository for convenience; see the PRONOM JSON file section below for instructions on creating a new JSON file in the expected format from PRONOM XML exports.

		Options include:
		* `-d, --dryrun`: Perform dry run (print would-be changes to terminal instead of renaming files)
		* `-m, --manual`: Manually choose the extension to add to files when PRONOM gives several options (not available on Windows)
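The only difference between a dry run and a real run is whether the rename is actually performed. The sketch below illustrates this with a hypothetical `add_extension` helper (not part of `addext`'s API); like `addext`, it appends the new extension to the full filename rather than replacing an existing suffix.

```python
import os


def add_extension(filepath, extension, dryrun=False):
    """Hypothetical helper: append `extension` to the filename.

    With dryrun=True the would-be change is only printed; otherwise the
    file is renamed in place. The new path is returned in both cases.
    """
    new_filepath = f"{filepath}.{extension}"
    if dryrun:
        # Dry run: report the change, leave the file untouched
        print(f"Rename {os.path.basename(filepath)} -> {os.path.basename(new_filepath)}")
    else:
        os.rename(filepath, new_filepath)
    return new_filepath
```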

		### Behavior

		#### Default mode

		In its default mode, `addext` adds a file extension to a file only if all of the following conditions are met:
		* Siegfried can positively identify a PUID for the file
		* There is at least one file extension associated with the PUID in PRONOM
		* The file does not already have one of the extensions listed in PRONOM for that PUID (case-insensitive)

		If all conditions are met, `addext` renames the file in place, appending the first file extension listed in PRONOM for that PUID. It is recommended that you try a dry run first to review the proposed changes before renaming any files.
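The three default-mode conditions can be sketched as two small functions. `puid_or_none` and `extension_to_add` are hypothetical names. The sketch assumes the `files[0].matches` layout of `sf -json` output that `_puid_or_none` in `addext.py` reads, and additionally assumes that Siegfried reports unidentified files with the id `UNKNOWN`, which is worth checking against your Siegfried version.

```python
import json
import os


def puid_or_none(sf_json):
    """Return the PRONOM ID from Siegfried's -json output, or None."""
    data = json.loads(sf_json)
    for match in data["files"][0]["matches"]:
        # Assumption: unidentified files carry the id "UNKNOWN"
        if match["ns"] == "pronom" and match["id"] != "UNKNOWN":
            return match["id"]
    return None


def extension_to_add(filename, puid, pronom_data):
    """Apply the default-mode conditions; return an extension or None."""
    if puid is None or puid not in pronom_data:
        return None  # condition 1: no positive identification
    extensions = pronom_data[puid]["file_extensions"]
    if not extensions:
        return None  # condition 2: PRONOM lists no extension for the PUID
    current = os.path.splitext(filename)[1][1:].lower()
    if current in (ext.lower() for ext in extensions):
        return None  # condition 3: file already has a listed extension
    return extensions[0]  # default mode uses the first listed extension
```

Checking `puid not in pronom_data` also avoids a `KeyError` when a PUID is missing from the JSON file, for instance one added to PRONOM after the file was generated.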

		#### Manual mode

		In `-m, --manual` mode, `addext` applies the following logic:
		* If Siegfried cannot positively identify a PUID for the file, skip the file
		* If there is only one file extension associated with the PUID in PRONOM and the file does not already have this extension (case-insensitive), add that extension
		* If there is more than one file extension associated with the PUID in PRONOM and the file does not already have one of these extensions, prompt the user to choose which extension to add, then rename the file in place

		Note that for directories containing many files, working through them one by one in manual mode may take some time. Running `addext` as a dry run in manual mode lists the files that have more than one possible extension, which gives an idea of how many manual choices you will be asked to make.
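The manual-mode branches can be sketched as follows, assuming the PUID has already been resolved to its list of extensions. `manual_extension` is a hypothetical name, and its `choose` callback is a hypothetical stand-in for the Inquirer prompt, which keeps the logic testable without a terminal.

```python
import os


def manual_extension(filename, extensions, choose):
    """Manual-mode logic: return the extension to add, or None to skip.

    `choose` is a hypothetical callback standing in for the Inquirer
    prompt; it receives the list of options and returns one of them.
    """
    current = os.path.splitext(filename)[1][1:].lower()
    # Skip if PRONOM lists nothing or the file already has a listed extension
    if not extensions or current in (e.lower() for e in extensions):
        return None
    # A single candidate is added without asking
    if len(extensions) == 1:
        return extensions[0]
    # Several candidates: defer to the user
    return choose(extensions)
```

In `addext` itself the choice comes from an `inquirer.List` question passed to `inquirer.prompt`, as in `_process_file`.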

		Due to its dependency on [Inquirer](https://github.com/magmax/python-inquirer), manual mode is not available on Windows.

		### Requirements

		* Python 3.6+
		* [Siegfried](https://github.com/richardlehane/siegfried)
		* [Inquirer](https://github.com/magmax/python-inquirer): For selection between extension options in `-m, --manual` mode (Linux/macOS only); installed with `pip install inquirer`
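Because `addext` shells out to Siegfried's `sf` command, it can help to confirm that `sf` is on the `PATH` before processing files. A minimal sketch with a hypothetical helper, `identify_with_siegfried`: `subprocess.check_output` raises `FileNotFoundError`, not `CalledProcessError`, when the executable itself is missing, so the check is done up front with `shutil.which`.

```python
import json
import shutil
import subprocess


def identify_with_siegfried(filepath, executable="sf"):
    """Run `sf -json` on filepath and return the parsed JSON report.

    Raises RuntimeError if the Siegfried executable cannot be found on
    PATH (hypothetical helper, not part of addext).
    """
    if shutil.which(executable) is None:
        raise RuntimeError(f"'{executable}' not found on PATH; is Siegfried installed?")
    output = subprocess.check_output([executable, "-json", filepath])
    return json.loads(output)
```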

		### Installation

		#### Install Siegfried

		Install Siegfried by following the instructions in the [Siegfried repository](https://github.com/richardlehane/siegfried).

		#### Install via git clone/download

		The easiest way to use `addext` is to clone or download this repository and then run the script with `python3 /path/to/addext.py [options]`.

		If taking this route, install the additional Python dependencies with `pip install -r requirements.txt` or `pip install inquirer` (this may require sudo permissions).

		#### Install via PyPI

		`addext` can also be installed with `pip install addext`. This installs a script in the `/usr/local/bin` directory (assuming a Linux/macOS installation), so that `addext` can be called from anywhere simply as `addext.py [options]`.

		Note that following installation, you will need to download or create a PRONOM JSON file to use with `addext`.

		### PRONOM JSON file

		#### Description

		The PRONOM JSON file is a lightweight representation of the information from PRONOM that `addext` needs to function. It contains one object per format, keyed by the format's PRONOM ID (PUID) and structured like the following example:

		```
		"fmt/858": {
		  "file_format": "Navisworks Document",
		  "version": "2010",
		  "file_extensions": [
		    "nwd",
		    "nwc"
		  ]
		}
		```
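The example above is a fragment; the full file is a single JSON object holding many such entries. A minimal sketch of loading it and looking up a PUID, using the README's example entry inline (`addext` itself reads the file with `json.load`; the `describe` helper is hypothetical):

```python
import json

# The README's example entry, wrapped in braces to form a complete JSON object
PRONOM_JSON = """
{
  "fmt/858": {
    "file_format": "Navisworks Document",
    "version": "2010",
    "file_extensions": ["nwd", "nwc"]
  }
}
"""


def describe(puid, pronom_data):
    """Return a one-line summary of a PUID entry, or None if it is absent."""
    entry = pronom_data.get(puid)
    if entry is None:
        return None
    exts = ", ".join(entry["file_extensions"]) or "none"
    return f'{entry["file_format"]} {entry["version"]} ({puid}): {exts}'


pronom_data = json.loads(PRONOM_JSON)
```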

		#### Updating the PRONOM JSON file

		`pronom_v95.json` is currently up-to-date with PRONOM release v95.

		To create a new PRONOM JSON file (for instance, after a new PRONOM release):
		* Get a PRONOM XML export from Ross Spencer's [Release repository for The Skeleton Test Suite](https://github.com/exponential-decay/pronom-archive-and-skeleton-test-suite), which provides a set of DOIs for archives of PRONOM releases.
		* Run `addext/pronom_xml_to_json.py` to create a new PRONOM JSON file from the XML exports, for example from the repository root: `python3 addext/pronom_xml_to_json.py /path/to/pronom/export/directory pronom.json`
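The conversion performed by `pronom_xml_to_json.py` can be sketched with the standard library's `xml.etree.ElementTree` instead of `lxml`. The sample record below is hypothetical and trimmed to the elements the script reads (`FileFormat`, `FormatName`, `FormatVersion`, `FileFormatIdentifier`, `ExternalSignature`); the wrapper element names and the namespace URI are placeholders, not a copy of a real PRONOM export.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed record; the namespace URI is a placeholder
SAMPLE = """<PRONOM-Report xmlns="http://example.org/pronom">
  <report_format_detail>
    <FileFormat>
      <FormatName>Navisworks Document</FormatName>
      <FormatVersion>2010</FormatVersion>
      <FileFormatIdentifier>
        <Identifier>fmt/858</Identifier>
        <IdentifierType>PUID</IdentifierType>
      </FileFormatIdentifier>
      <ExternalSignature>
        <Signature>nwd</Signature>
        <SignatureType>File extension</SignatureType>
      </ExternalSignature>
      <ExternalSignature>
        <Signature>nwc</Signature>
        <SignatureType>File extension</SignatureType>
      </ExternalSignature>
    </FileFormat>
  </report_format_detail>
</PRONOM-Report>"""


def record_to_entries(xml_text):
    """Map each FileFormat with a PUID to the addext JSON entry layout."""
    root = ET.fromstring(xml_text)
    # Strip "{namespace}" prefixes so plain tag names work in find()
    for elem in root.iter():
        if isinstance(elem.tag, str) and "}" in elem.tag:
            elem.tag = elem.tag.split("}", 1)[1]
    entries = {}
    for fmt in root.findall(".//FileFormat"):
        puid = None
        for ident in fmt.findall(".//FileFormatIdentifier"):
            if ident.findtext("IdentifierType") == "PUID":
                puid = ident.findtext("Identifier")
        if puid is None:
            continue  # skip records without a PUID
        entries[puid] = {
            "file_format": fmt.findtext("FormatName"),
            "version": (fmt.findtext("FormatVersion") or "").strip(),
            "file_extensions": [
                sig.findtext("Signature")
                for sig in fmt.findall(".//ExternalSignature")
                if sig.findtext("SignatureType") == "File extension"
            ],
        }
    return entries
```

Unlike the script, this sketch skips records without a PUID rather than storing them under an empty key, and uses `findtext` with a fallback so that an empty `FormatVersion` element (whose `.text` is `None`) does not raise an `AttributeError`.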

		### Creators

		* Canadian Centre for Architecture
		* Tim Walsh

		This project was initially developed in 2016-2017 for the [Canadian Centre for Architecture](https://www.cca.qc.ca) by Tim Walsh, Digital Archivist, as part of the development of the Archaeology of the Digital project.

+4

-4

addext.egg-info/PKG-INFO

		Metadata-Version: 1.1
		Name: addext
		Version: 1.0.1
		Summary: Adds file extensions to files based on their PRONOM identifiers (PUIDs).
		Version: 2.0.0
		Summary: Adds file extensions based on PRONOM ID
		Home-page: https://github.com/timothyryanwalsh/addext
		@@ -22,6 +22,6 @@ Author: Tim Walsh
		Classifier: Operating System :: Microsoft :: Windows
		Classifier: Programming Language :: Python :: 2.7
		Classifier: Programming Language :: Python :: 3.5
		Classifier: Programming Language :: Python :: 3.6
		Classifier: Programming Language :: Python :: 3.7
		Classifier: Programming Language :: Python :: 3.8
		Classifier: Topic :: System :: Filesystems
		Classifier: Topic :: Utilities

+2

-3

addext.egg-info/SOURCES.txt

		@@ -1,7 +0,6 @@
		MANIFEST.in
		README.md
		setup.py
		addext/__init__.py
		addext/addext.py
		addext/pronom-xml-to-sqlite.py
		addext/pronom.db
		addext/pronom_xml_to_json.py
		addext.egg-info/PKG-INFO
		@@ -8,0 +7,0 @@ addext.egg-info/SOURCES.txt

+185

-189

addext/addext.py

		@@ -1,223 +0,219 @@
		#!/usr/bin/env python
		# -- coding: utf-8 --
		#!/usr/bin/env python3

		"""
		Adds file extensions to files based on their PUIDs.
		addext
		------
		CLI utility to add file extensions to files without them based on PRONOM ID

		Tim Walsh
		November 2017
		Script has three modes:
		* Default: Adds first file extension associated with PUID in PRONOM
		* Dry run: Preview changes from Default mode without making any changes
		to the files
		* Manual: Manually choose extension to add to files when PRONOM gives several
		options (Linux/macOS only)

		Requires Siegfried and inquirer. See README for installation instructions
		"""

		import argparse
		import csv
		import inquirer
		import logging
		import json
		import os
		import shutil
		import sqlite3
		import subprocess
		import sys
		import tempfile
		try:
		# python3
		from urllib.request import urlopen
		except ImportError:
		# fall back to python 2's urllib2
		from urllib2 import urlopen


		def _make_parser():
		parser = argparse.ArgumentParser()
		parser.add_argument("-d", "--dryrun",
		help="Perform dry run: print would-be changes to terminal",
		action="store_true")
		parser.add_argument("-m", "--manual",
		help="Manually choose extension to add to files when PRONOM gives several options (not available in Windows)",
		action="store_true")
		parser.add_argument("--droid_csv",
		help="Path to DROID CSV (created by DROID or Siegfried) for files",
		action="store")
		parser.add_argument("file",
		help="Path to file or files where extensions will be added")
		parser.add_argument(
		"-d",
		"--dryrun",
		help="Perform dry run: print would-be changes to terminal",
		action="store_true",
		)
		parser.add_argument(
		"-m",
		"--manual",
		help="Manually choose extension when multiple options (Linux/macOS)",
		action="store_true",
		)
		parser.add_argument("target", help="Path to target file or directory")
		parser.add_argument("json", help="Path to PRONOM JSON file")

		return parser

		def download_pronom_db():

		def _configure_logging():
		"""
		Download pronom.db from Github to script directory.
		Configure logging to write to logfile created in
		user's current directory and to stdout
		"""

		print("Addext could not find pronom.db file in script directory.")
		print("Downloading file now. This should only be necessary once.")
		logging.basicConfig(
		level=logging.INFO,
		format="%(asctime)s - %(levelname)s - %(message)s",
		handlers=[
		logging.FileHandler("addext.log"),
		logging.StreamHandler(sys.stdout)
		],
		)
		logger = logging.getLogger()
		return logger

		# url for pronom.db
		url = "https://github.com/timothyryanwalsh/addext/blob/master/addext/pronom.db?raw=true"

		# download file to current directory
		file_name = "pronom.db"
		u = urlopen(url)
		f = open(file_name, 'wb')
		block_sz = 8192
		while True:
		buffer = u.read(block_sz)
		if not buffer:
		break
		f.write(buffer)
		f.close()
		def _puid_or_none(sf_matches):
		    """
		    From input list of dictionaries describing Siegfried
		    matches for given file, return PUID or None
		    """
		    puid = None
		    for match in sf_matches:
		        if match["ns"] == "pronom":
		            puid = match["id"]
		    return puid

		# check that file was successfully downloaded
		if os.path.isfile(file_name) and os.path.getsize(file_name) > 0:
		print("File successfully downloaded.")

		def _check_file_extension(filepath, extensions):
		"""
		Return True if file extension (case-insensitive)
		is present in list, and False if not
		"""
		# Get lower-cased file extension from path
		_, file_extension = os.path.splitext(filepath)
		file_extension_lower = file_extension[1:].lower()
		# Make lower-cased list
		extensions_lower = list()
		for item in extensions:
		extensions_lower.append(item.lower())
		# Check equivalency
		if file_extension_lower in extensions_lower:
		return True
		else:
		print("Error downloading database. Check permissions in script directory.")
		sys.exit(69)
		return False


		def _rename_file(filepath, new_file, new_filepath, logger):
		    """
		    Rename file in place and log OSErrors
		    """
		    try:
		        os.rename(filepath, new_filepath)
		        logger.info(f"{filepath} renamed -> {new_file}")
		    except OSError as e:
		        logger.error(f"Unable to rename {filepath}. Details: {e}")


		def _process_file(root, filepath, pronom_data, args, logger):
		    """
		    Identify and rename file, respecting user args
		    """
		    file_ = os.path.basename(filepath)

		    # Attempt to determine PUID with Siegfried
		    cmd = ["sf", "-json", filepath]
		    try:
		        sf_json = subprocess.check_output(cmd)
		    except subprocess.CalledProcessError as e:
		        logger.error("Unable to call Siegfried. Is it installed and on path?")
		        sys.exit(1)
		    sf_data = json.loads(sf_json)
		    puid = _puid_or_none(sf_data["files"][0]["matches"])

		    # Return if unidentified
		    if not puid:
		        logger.info(f"Skipping {filepath} - format not identifiable")
		        return

		    # Save file format
		    file_format = pronom_data[puid]["file_format"]

		    # Return if already has one of extensions listed in PRONOM
		    extensions = pronom_data[puid]["file_extensions"]
		    extension_in_place = _check_file_extension(filepath, extensions)
		    if extension_in_place:
		        logger.info(
		            f"Skipping {filepath} - already has correct extension for {file_format} ({puid})"
		        )
		        return

		    # Return if no extensions listed for format in PRONOM
		    if not extensions:
		        logger.info(
		            f"Skipping {filepath} - no extensions listed in PRONOM for {file_format} ({puid})"
		        )
		        return

		    # If manual mode and > 1 extension available, prompt for user input
		    if args.manual and len(extensions) > 1:
		        # Log all known extensions
		        extensions_str = ", ".join([x for x in extensions])
		        logger.info(
		            f"{filepath} identified as {file_format} ({puid}). Possible extensions: {extensions_str}"
		        )
		        # If --dryrun, return
		        if args.dryrun:
		            return
		        # Otherwise, prompt user for extension and rename file in place
		        else:
		            # Use Inquirer to let user choose from list
		            questions = [
		                inquirer.List(
		                    "extension",
		                    message="Which extension would you like to add?",
		                    choices=extensions,
		                )
		            ]
		            # Get chosen extension
		            answers = inquirer.prompt(questions)
		            extension_to_add = answers["extension"]
		            # Rename file
		            new_file = f"{file_}.{extension_to_add}"
		            new_filepath = os.path.join(root, new_file)
		            _rename_file(filepath, new_file, new_filepath, logger)
		            return

		    # If default (auto) mode or only 1 extension, use first extension
		    extension_to_add = extensions[0]
		    new_file = f"{file_}.{extension_to_add}"
		    new_filepath = os.path.join(root, new_file)
		    # If --dryrun, log change to make and return
		    if args.dryrun:
		        logger.info(
		            f"{filepath} identified as {file_format} ({puid}). Rename {file_} -> {new_file}"
		        )
		        return
		    # Otherwise, rename file in place
		    _rename_file(filepath, new_file, new_filepath, logger)


		def main():

		# parse arguments
		# Parse arguments
		parser = _make_parser()
		args = parser.parse_args()

		source = os.path.abspath(args.file)
		# Store fs references as abspaths
		target = os.path.abspath(args.target)
		pronom_json = os.path.abspath(args.json)

		# connect to pronom.db
		THIS_DIR = os.path.dirname(os.path.realpath(__file__))
		db = os.path.join(THIS_DIR, 'pronom.db')
		# download copy of pronom.db if not in same directory as script
		if not os.path.isfile(db):
		download_pronom_db()
		try:
		conn = sqlite3.connect(db)
		conn.text_factory = str # allows utf-8 data to be stored
		cursor = conn.cursor()
		except:
		print("Error connecting to pronom.db database. Shutting down.")
		sys.exit(69)
		# Configure logging
		logger = _configure_logging()

		# create DROID CSV if user didn't pass one to script
		if args.droid_csv:
		droid_csv = os.path.abspath(args.droid_csv)
		else:
		# create tempdir for droid csv
		tmpdir = tempfile.mkdtemp()
		tmpdir_path = os.path.abspath(tmpdir)
		droid_csv = os.path.join(tmpdir_path, 'droid.csv')
		# create droid csv with siegfried
		subprocess.call("sf -droid '%s' > '%s'" % (source, droid_csv), shell=True)
		# Load PRONOM JSON as dictionary
		with open(pronom_json, "r") as f:
		pronom_data = json.load(f)

		# loop through files
		for rt, dirs, files in os.walk(source):
		for f in files:
		filepath = os.path.join(rt, f)
		puid = ''
		# search DROID CSV for path, get PUID
		with open(droid_csv) as droid:
		r = csv.reader(droid)
		for row in r:
		if row[3] == filepath:
		puid = row[14]
		fileformat = row[16]

		# if PUID found, carry on
		if puid != '':
		# if manual, give option to user whenever > 1 possible extension is found
		if args.manual:
		# get list of possible extensions using puid
		sql = "SELECT id from puids WHERE puid='%s';" % (puid)
		cursor.execute(sql)
		pk = cursor.fetchone()[0]
		sql = "SELECT extension from extensions WHERE puid='%s';" % (pk)
		cursor.execute(sql)
		file_ext_list = [item[0] for item in cursor.fetchall()]
		# if >= 1 extension found, carry on
		if file_ext_list:
		# check if dry run - if so, print results to terminal
		if args.dryrun == True:
		print("File %s is format %s (%s). Possible extensions: %s" % (filepath, fileformat, puid, ', '.join(map(str, file_ext_list))))
		else:
		# if only one possible extension, just add it and report to user
		if len(file_ext_list) == 1:
		# append filename to file in-place
		file_ext = "." + file_ext_list[0]
		new_filepath = filepath + file_ext
		new_filename = f + file_ext
		# check if file already ends in correct extension before adding
		if not filepath.lower().endswith(file_ext):
		try:
		os.rename(filepath, new_filepath)
		print("File " + filepath + " only has one possible extension. Renamed to " + new_filename)
		except OSError as err:
		print("Error renaming file " + filepath + ": ", err)
		else:
		print("File " + filepath + " already has correct extension. Skipping file.")
		# if > 1 extension, give control to user
		else:
		# get user input
		if (sys.version_info > (3, 0)):
		choice = input("File %s is format %s (%s). Possible extensions: %s. Add an extension? (y/n)" % (filepath, fileformat, puid, ', '.join(map(str, file_ext_list))))
		else:
		choice = raw_input("File %s is format %s (%s). Possible extensions: %s. Add an extension? (y/n)" % (filepath, fileformat, puid, ', '.join(map(str, file_ext_list))))
		# if input is yes, display options and apply change
		if choice.lower() in ['yes', 'y']:
		# use Inquirer to let user choose from list
		questions = [
		inquirer.List('extension',
		message="Which extension would you like to add?",
		choices=file_ext_list,
		),
		]
		# get chosen extension
		answers = inquirer.prompt(questions)
		file_ext = "." + answers['extension']
		# append filename to file in-place
		new_filepath = filepath + file_ext
		new_filename = f + file_ext
		try:
		os.rename(filepath, new_filepath)
		print("File " + filepath + " renamed to " + new_filename)
		except OSError as err:
		print("Error renaming file " + filepath + ": ", err)
		else:
		print("File " + filepath + " skipped.")
		# Check if target is file
		if os.path.isfile(target):
		root = os.path.split(target)[0]
		_process_file(root, target, pronom_data, args, logger)
		return

		else:
		print("File " + filepath + " identified as " + puid + ". No extensions are registered in PRONOM for this PUID. Skipping file.")
		# else, use default extension (first listed in PRONOM for PUID)
		else:
		sql = "SELECT default_extension from puids WHERE puid='%s';" % (puid)
		cursor.execute(sql)
		file_ext = cursor.fetchone()[0]
		if file_ext:
		new_filepath = filepath + "." + file_ext # filename + extension
		new_filename = f + "." + file_ext # new filename without path
		# check if dry run - if so, print results to stdout
		if args.dryrun == True:
		if not filepath.lower().endswith(file_ext):
		print("File %s is format %s (%s). Rename %s -> %s" % (filepath, fileformat, puid, f, new_filename))
		else:
		print("File " + filepath + " already has correct extension. Skipping file.")
		else:
		# check if file already ends in correct extension before adding
		if not filepath.lower().endswith(file_ext):
		try:
		os.rename(filepath, new_filepath)
		print("File " + filepath + " renamed to " + new_filename)
		except OSError as err:
		print("Error renaming file " + filepath + ": ", err)
		else:
		print("File " + filepath + " already has correct extension. Skipping file.")
		else:
		print("File " + filepath + " identified as " + puid + ". No extensions are registered in PRONOM for this PUID. Skipping file.")
		else:
		print("File " + filepath + " not identified. Skipping file.")
		# If target is dir, walk recursively
		for root, _, files in os.walk(target):
		for file_ in files:
		filepath = os.path.join(root, file_)
		_process_file(root, filepath, pronom_data, args, logger)

		# delete DROID tempdir if applicable
		if not args.droid_csv:
		shutil.rmtree(tmpdir_path)

		# close db, print finished message
		conn.commit()
		conn.close()
		print("Process complete.")

		if __name__ == '__main__':
		main()
		if __name__ == "__main__":
		main()

+4

-4

PKG-INFO

		Metadata-Version: 1.1
		Name: addext
		Version: 1.0.1
		Summary: Adds file extensions to files based on their PRONOM identifiers (PUIDs).
		Version: 2.0.0
		Summary: Adds file extensions based on PRONOM ID
		Home-page: https://github.com/timothyryanwalsh/addext
		@@ -22,6 +22,6 @@ Author: Tim Walsh
		Classifier: Operating System :: Microsoft :: Windows
		Classifier: Programming Language :: Python :: 2.7
		Classifier: Programming Language :: Python :: 3.5
		Classifier: Programming Language :: Python :: 3.6
		Classifier: Programming Language :: Python :: 3.7
		Classifier: Programming Language :: Python :: 3.8
		Classifier: Topic :: System :: Filesystems
		Classifier: Topic :: Utilities

+0

-1

setup.cfg

		[egg_info]
		tag_build =
		tag_date = 0
		tag_svn_revision = 0

+16

-20

setup.py

		from setuptools import setup

		setup(
		name = 'addext',
		version = '1.0.1',
		url = 'https://github.com/timothyryanwalsh/addext',
		author = 'Tim Walsh',
		author_email = 'timothyryanwalsh@gmail.com',
		name='addext',
		version='2.0.0',
		url='https://github.com/timothyryanwalsh/addext',
		author='Tim Walsh',
		author_email='timothyryanwalsh@gmail.com',
		packages=['addext'],
		package_data={
		'addext': ['pronom.db']
		},
		include_package_data=True,
		py_modules = ['addext'],
		scripts = ['addext/addext.py'],
		install_requires = ['inquirer'],
		description = 'Adds file extensions to files based on their PRONOM identifiers (PUIDs).',
		keywords = 'extensions identification',
		platforms = ['POSIX', 'Windows'],
		classifiers = [
		py_modules=['addext'],
		scripts=['addext/addext.py'],
		install_requires=['inquirer'],
		description='Adds file extensions based on PRONOM ID',
		keywords='extensions identification',
		platforms=['POSIX', 'Windows'],
		classifiers=[
		'Development Status :: 4 - Beta',
		@@ -25,3 +21,3 @@ 'License :: OSI Approved :: MIT License',
		'Intended Audience :: Developers',
		'Natural Language :: English',
		'Natural Language :: English',
		'Operating System :: MacOS',
		@@ -31,8 +27,8 @@ 'Operating System :: MacOS :: MacOS X',
		'Operating System :: Microsoft :: Windows',
		'Programming Language :: Python :: 2.7',
		'Programming Language :: Python :: 3.5',
		'Programming Language :: Python :: 3.6',
		'Programming Language :: Python :: 3.7',
		'Programming Language :: Python :: 3.8',
		'Topic :: System :: Filesystems',
		'Topic :: Utilities'
		],
		)
		)

-128

addext/pronom-xml-to-sqlite.py

		#!/usr/bin/env python
		# -- coding: utf-8 --

		"""
		Crawls XML output from Ross Spencer's pronom-xml-export
		(https://github.com/exponential-decay/pronom-xml-export)
		and writes selected info into a sqlite db.

		Tim Walsh
		November 2017

		"""

		import argparse
		import os
		from lxml import etree, objectify
		import sqlite3

		def _make_parser():
		    parser = argparse.ArgumentParser()
		    parser.add_argument("source",
		                        help="Path of PRONOM XML export directory")
		    parser.add_argument("destination",
		                        help="Path of directory to write sqlite db")

		    return parser

		def main():

		    # parse arguments
		    parser = _make_parser()
		    args = parser.parse_args()

		    # make abspaths for source and dest dirs
		    source = os.path.abspath(args.source)
		    dest = os.path.abspath(args.destination)

		    # create sqlite db
		    db = os.path.join(dest, 'pronom.db')
		    conn = sqlite3.connect(db)
		    conn.text_factory = str # allows utf-8 data to be stored
		    cursor = conn.cursor()

		    # create db tables
		    cursor.execute("DROP TABLE IF EXISTS puids")
		    cursor.execute("DROP TABLE IF EXISTS extensions")
		    cursor.execute("CREATE TABLE puids (id integer PRIMARY KEY AUTOINCREMENT, puid text, fileformat text, version text, default_extension text);")
		    cursor.execute("CREATE TABLE extensions (id integer PRIMARY KEY AUTOINCREMENT, extension text, puid text, FOREIGN KEY (puid) REFERENCES puids(id));")

		    # for PUID XML file in export, write info to dict, then db
		    for rt, dirs, files in os.walk(source):
		        for name in files:
		            file_path = os.path.join(rt, name)

		            # skip if not xml
		            if not file_path.lower().endswith('xml'):
		                continue

		            # open xml file and strip namespaces
		            tree = etree.parse(file_path)
		            root = tree.getroot()
		            for elem in root.getiterator():
		                if not hasattr(elem.tag, 'find'): continue # (1)
		                i = elem.tag.find('}')
		                if i >= 0:
		                    elem.tag = elem.tag[i+1:]
		            objectify.deannotate(root, cleanup_namespaces=True)

		            # create dict for PUID
		            puid = dict()

		            # parse xml
		            for target in root.findall(".//FileFormat"):

		                # add basic info to dict
		                puid['name'] = target.find("FormatName").text
		                puid['version'] = target.find("FormatVersion").text.strip()

		                # add identifiers to dict
		                identifiers = list()
		                for target1 in target.findall(".//FileFormatIdentifier"):
		                    id_dict = dict()
		                    id_dict['identifier'] = target1.find("Identifier").text
		                    id_dict['id_type'] = target1.find("IdentifierType").text
		                    identifiers.append(id_dict)
		                puid['identifiers'] = identifiers

		                # add external signatures to dict
		                ext_sigs = list()
		                for target2 in target.findall(".//ExternalSignature"):
		                    sig_dict = dict()
		                    sig_dict['sig_id'] = target2.find("ExternalSignatureID").text
		                    sig_dict['signature'] = target2.find("Signature").text
		                    sig_dict['sig_type'] = target2.find("SignatureType").text
		                    ext_sigs.append(sig_dict)
		                puid['ext_sigs'] = ext_sigs

		            # parse info from puid dict
		            format_name = puid['name']
		            format_version = puid['version']

		            # always only one PUID
		            pronom_ids = [x['identifier'] for x in puid['identifiers'] if x['id_type'] == "PUID"]
		            if pronom_ids:
		                pronom_id = pronom_ids[0]
		            else:
		                pronom_id = ''

		            # 0 to many extensions - keep all in list and save first value separately
		            file_exts = [x['signature'] for x in puid['ext_sigs'] if x['sig_type'] == "File extension"]
		            if file_exts:
		                default_ext = file_exts[0]
		            else:
		                default_ext = ''

		            # write into db puid table
		            cursor.execute("INSERT INTO puids(puid, fileformat, version, default_extension) VALUES (?,?,?,?);", (pronom_id, format_name, format_version, default_ext))
		            puid_pk = cursor.lastrowid # get pk of row written
		            # write extensions into extensions table
		            for ext in file_exts:
		                cursor.execute("INSERT INTO extensions(extension, puid) VALUES (?,?);", (ext, puid_pk))

		    # close db
		    conn.commit()
		    conn.close()

		if __name__ == '__main__':
		    main()

addext/pronom.db

Sorry, the diff of this file is too big to display

-1

MANIFEST.in

include addext/pronom.db
