LibGuides Tools
A Golang package for working with LibGuides exported XML.
Introduction
Caltech Library periodically needs to work with exported LibGuides XML. This is a Golang
package for working with the exported data. Go provides a robust way of mapping simple data structures
to and from XML (or JSON), which makes working with XML easy and consistent. It seemed time to move beyond my usual Bash/sed/python scripts.
Two programs are currently provided with springytools: lgxml2json, which converts a LibGuides
XML export file into JSON, and lglinkreport (run lglinkreport -h for a brief description).
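To illustrate the struct-based mapping mentioned above, here is a minimal sketch of decoding XML into Go structs and re-encoding the result as JSON. The Guide and Export types, the element names, and the xmlToJSON helper are assumptions made up for this example. They are not the real LibGuides export schema and not the package's actual API.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
	"log"
)

// Guide is a hypothetical record type. The element names are illustrative
// and are not taken from the real LibGuides export schema.
type Guide struct {
	ID   int    `xml:"id" json:"id"`
	Name string `xml:"name" json:"name"`
	URL  string `xml:"url" json:"url,omitempty"`
}

// Export is a hypothetical root element holding a list of guides.
type Export struct {
	XMLName xml.Name `xml:"libguides" json:"-"`
	Guides  []Guide  `xml:"guides>guide" json:"guides"`
}

// xmlToJSON decodes XML into an Export value and re-encodes it as JSON.
// The same struct carries both sets of tags, so no manual copying is needed.
func xmlToJSON(src []byte) ([]byte, error) {
	var export Export
	if err := xml.Unmarshal(src, &export); err != nil {
		return nil, err
	}
	return json.Marshal(export)
}

func main() {
	// Made-up sample data standing in for an export file.
	src := []byte(`<libguides><guides>
  <guide><id>1</id><name>Chemistry</name><url>https://example.org/chem</url></guide>
  <guide><id>2</id><name>Physics</name></guide>
</guides></libguides>`)

	out, err := xmlToJSON(src)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```

The struct tags do all the work here: the xml tags describe where each value lives in the source document and the json tags describe how it is written out.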
Installation
This is a Golang package providing two commands for working with LibGuides' exported XML. To
compile you will need Go 1.16 or later, GNU Make, and Stephen Dolan's jq for browsing the JSON output.
Steps to compile from source
- clone the repository
- change into the clone directory
- test
- build the command line tool lgxml2json
- use lgxml2json and test output with jq
  - Replace "LibGuides_export_XXXXX.xml" with the file path to your exported LibGuides XML file
- install lgxml2json
Example commands to execute in the shell (e.g. Terminal on macOS, xterm on Linux)
git clone git@github.com:caltechlibrary/springytools
cd springytools
make
make test
make install
By default installation is to your $HOME/bin directory. This directory should be in your shell's "PATH".
You can get a brief description of the commands using the -h option with the command.
lgxml2json -h
lglinkreport -h
Known issues and limitations
This library is currently written to perform the LibGuides link analysis.
It only provides the commands I needed to do the data analysis. It will grow as needed.
The XML exported from LibGuides may not be valid UTF-8, and UTF-8 encoding
is required to parse the export file successfully. Looking at the raw XML markup in vim,
I noticed a number of control code sequences, and these corresponded to the errors reported when parsing
the unsanitized XML file. The problem characters appear as ^A, ^K, ^L, ^S, ^C and ^R.
Control codes like these are not permitted in XML 1.0 documents, which is consistent with the parse errors. They
may be non-UTF-8 characters embedded as UTF-8 when rich text documents were pasted in via
the LibGuides edit UI. My hunch is that they were pasted in or imported from Word documents. Removing
the offending characters allowed the export to parse successfully. These edits are destructive,
as some of the codes probably represent UTF-8 characters used in non-English European names or
terminology.
Getting help
File an issue on GitHub.
License
Software produced by the Caltech Library is Copyright © 2021 California Institute of Technology. This software is freely distributed under a BSD/MIT type license. Please see the LICENSE file for more information.
Authors and history
- R. S. Doiel, Software Developer, Digital Library Development, Caltech Library
Acknowledgments
This work was funded by the California Institute of Technology Library.