
Security News
TypeScript is Porting Its Compiler to Go for 10x Faster Builds
TypeScript is porting its compiler to Go, delivering 10x faster builds, lower memory usage, and improved editor performance for a smoother developer experience.
@fetsorn/js-gedcom
Advanced tools
Parser, serializer, type-checker, and validator for GEDCOM generally and v7 in particular
This repository has plain dependency-free JavaScript that can process GEDCOM in various ways and forms. It is designed to have various operating modes which are handled by separate module files to facilitate uses which do not need them all.
This repository does not currently handle different character sets. It assumes you have correctly parsed bytes into a JavaScript string before processing.
To use this project as a FamilySearch GEDCOM 7 validator, visit https://gedcom7code.github.io/js-gedcom/.
querySelector
and querySelectorAll
accepting "HEAD.GEDC"
-type tag paths.validate()
find
and findOrCreate
accepting arbitrarily-nested structure types and payload values (e.g. for finding a record with a given EXID
and EXID-TYPE
).So far, the testing has been limited to starting with maximal70.ged augmented with various extensions and verifying the following properties, mostly by hand, also checking that all warnings and errors issued are correct:
gedc = GEDCStruct.fromString(maximal, g7ConfGEDC)
maximal2 = gedc.toString()
// assert(maximal2 == maximal)
json_gedc = gedc.map(e=>e.toJSON())
gedc2 = GEDCStruct.fromJSON(json)
maximal3 = gedc2.map(e => e.toString('\n',-1,false)).join('')
// assert(maximal3 == maximal)
ged7 = G7Dataset.fromGEDC(gedc, g7validation)
gedc3 = ged7.toGEDC()
maximal4 = gedc3.toString()
// assert(maximal4 == maximal modulo some reordering and normalization)
json_ged7 = ged7.toJSON()
ged72 = G7Dataset.fromJSON(json, g7validation)
gedc4 = ged72.toGEDC()
maximal5 = gedc4.toString()
// assert(maximal5 == maximal4)
I've also done just a little ad-hoc testing to verify that if I create a G7Dataset programmatically it it populates its schema and otherwise serializes as expected.
Parses a GEDCOM dataset string into a sequence of GEDC structures. Each structure contains
tag
payload
, which is one of
null
for an encoded pointer with no destinationsub
, which is a list of other structuresFor internal use, we also track the following:
sup
, the (unique) structure that this structure is in the sub
list of, or null for top-level structuresreferences
, a (usually empty) list of other structures that this is the payload
ofid
, a recommended xref_id to use in serializing pointers to this structureBoth GEDCStruct
and the list of GEDCStruct returned by fromJSON
and fromString
have two utility methods, querySelect
and querySelectAll
,
modeled after the corresponding methods in DOM Elements
but using GEDCOM dot-notation paths instead. In particular,
XYZ
matches any structure with tag XYZ
.XYZ
matches any top-level structure with tag XYZ
ABC.XYZ
matches any structure with tag XYZ
that is a substructure of a structure with tag ABC
ABC..XYZ
matches any structure with tag XYZ
that contained within a structure with tag ABC
GEDC parsing takes care of converting xref_id to pointers and managing CONT and CONC pseudostructures; GEDC serializing handles these going the other way.
GEDC parsing and serializing both accept a configuration object with the following keys.
Parsing configurations:
len
= 0
{.js}
positive: limit lines to this many characters
zero: no length limit
negative: no length limit and no CONC allowed
tag
= /.*/
{.js}
A regex to limit the set of permitted tags.
Tags will always match at least /^[^@\p{Cc}\p{Z}][^\p{Cc}\p{Z}]*$/u
{.js}:
that is, 1 or more characters,
no whitespace or control characters,
and not beginning with @
.
xref
= /.*/
{.js}
A regex to limit the set of permitted cross-reference identifiers.
Cross-reference identifiers will always match at least /^([^@#\p{Cc}]|\t)([^@\p{Cc}]|\t)*$/u
{.js}:
that is, one or more characters,
no non-tab control characters,
no @
,
and not beginning with #
.
linesep
= /.*/
{.js}
A regex to limit what is considered a line separation. Line separations will always match at least /^[\n\r]\p{WSpace}*$/u: that is, a carriage return or line feed followed by whitespace.
delim
= /.*/
{.js}
A regex to limit what is considered a delimiter. Delimiters will always match at least /^[ \t\p{Zs}]+$/u: that is, linear whitespace.
A single space will always be used during serialization, regardless of the value of delim
.
payload
= /.*/
{.js}
A regex to limit permitted string payloads.
zeros
= false
If true
, allow leading zeros on levels (e.g. 00
or 01
)
Serializing configurations:
newline
= '\n'
{.js}
A string to insert between lines when serializing.
Should match linesep
.
escapes
= false
If true
, serialize payloads beginning @#
as @#
instead of @@#
.
Both always deserialize as the same thing.
Two special config objects are provided to match the GEDCOM 5.x and FamilySearch GEDCOM 7.x specs:
/** GEDCOM 5.x-compatible configuration */
const g5ConfGEDC = {
len: 255,
tag: /^[0-9a-z_A-Z]{1,31}$/u,
xref: /^[0-9a-z_A-Z][^\p{Cc}@]{0,19}$/u,
linesep: /^[\r\n][\r\n \t]*$/,
delim: /^ $/,
zeros: false,
escapes: true,
}
/** GEDCOM 7.x-compatible configuration */
const g7ConfGEDC = {
len: -1,
tag: /^([A-Z]|_[0-9_A-Z])[0-9_A-Z]*$/u,
xref: /^([A-Z]|_[0-9_A-Z])[0-9_A-Z]*$/u,
linesep: /^(\r\n?|\n\r?)$/,
delim: /^ $/,
payload: /^.+$/,
zeros: false,
escapes: false,
}
As of commit 34dd91ad90ce5e8301e943b4d559a603028b45c9 (2023-07-18), the implementation round-trips maximal70.ged
from https://gedcom.io/tools/; that is maximal70.ged → fromString → toJSON → fromJSON → toString == maximal70.ged.
Note that this does not constitute an exhaustive test
and the code may contain bugs.
Uses a parsed schema like g7validation.json to convert a GEDC dataset into a FamilySearch GEDCOM 7 dataset. Various FamilySearch GEDCOM 7 rules are embedded within the code, including extension handling, payload datatypes, and structure ordering rules.
A G7Structure contains
type
, a URI or unregistered extension tagpayload
, which may have many different types depending on the type
sub
, which is a map with type
keys and list-of-G7Structure valuesThe type
is omitted during JSON serialization as it is available in the G7Structure's containing structure.
For internal use, we also track the following:
sup
, the (unique) structure that this structure is in the sub
list of, or null for top-level structuresreferences
, a (usually empty) list of other structures that this is the payload
ofid
, a recommended xref_id to use in serializing pointers to this structureBecause some operations are handled centrally (such as determining which extension tags are in use), a G7Dataset is used to enclose the G7Structures; it contains
header
, a G7Structure
with type https://gedcom.io/terms/v7/HEADrecords
, which is exactly like G7Structure's sub
This code is released under both the MIT and UNLICENSE. The dual licensing is motivated by the following observations:
More specifically, this code and all its dependencies are compatible with this licensing choice. Any dependencies (direct and transitive) will always be limited to permissive licenses. This code will never depend on code that is not permissively licensed. This means rejecting any dependency that uses a copyleft license such as the GPL, LGPL, MPL or any of the Creative Commons ShareAlike licenses.
Reports of errors or gaps in the code are very welcome, preferably as issues on github. Pull requests extending functionality or fixing errors are also welcome.
FAQs
Parser, serializer, type-checker, and validator for GEDCOM generally and v7 in particular
The npm package @fetsorn/js-gedcom receives a total of 1 weekly downloads. As such, @fetsorn/js-gedcom popularity was classified as not popular.
We found that @fetsorn/js-gedcom demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 0 open source maintainers collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
TypeScript is porting its compiler to Go, delivering 10x faster builds, lower memory usage, and improved editor performance for a smoother developer experience.
Research
Security News
The Socket Research Team has discovered six new malicious npm packages linked to North Korea’s Lazarus Group, designed to steal credentials and deploy backdoors.
Security News
Socket CEO Feross Aboukhadijeh discusses the open web, open source security, and how Socket tackles software supply chain attacks on The Pair Program podcast.