Normalized sort key for sorting Library of Congress call numbers.
Sorting Library of Congress call numbers is tricky. This library generates a sort key for an LC call number such that, for a list of call numbers, their sort keys will sort (in natural byte order) in the same order the call numbers should sort in.
# It's often useful to store the sort_key in a db
sort_key = Lcsort.normalize(callnum)
If the input can't be recognized as an LC call number, nil will be returned.
This code is intended for ASCII-only input. If you have non-ASCII UTF-8 characters in your call numbers, we don't know what will happen.
# Or if you have a list of call numbers in memory, easy
# enough to just sort them in memory:
call_num_array.sort_by {|callnum| Lcsort.normalize(callnum) }
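Because normalize returns nil for input it can't recognize, a plain sort_by will raise if the list contains any such call number (nil can't be compared with a String). A minimal sketch of one way to set those aside follows. The helper name sort_call_numbers is hypothetical and not part of Lcsort; it takes the normalizer as a callable so the sketch does not depend on the gem being loaded.

```ruby
# Sort call numbers by their sort keys, setting aside any that the
# normalizer does not recognize (i.e. for which it returns nil).
#
# `normalizer` is any callable that returns a String sort key or nil.
# With the gem loaded, Lcsort.method(:normalize) should fit here.
def sort_call_numbers(callnums, normalizer)
  keyed = callnums.map { |callnum| [callnum, normalizer.call(callnum)] }
  recognized, unrecognized = keyed.partition { |_, key| key }
  sorted = recognized.sort_by { |_, key| key }.map(&:first)
  [sorted, unrecognized.map(&:first)]
end
```

With the gem available, usage would look something like `sorted, rejected = sort_call_numbers(call_num_array, Lcsort.method(:normalize))`.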
Call numbers are diverse, both in standard LC and local practice. We wouldn't have the hubris to say we can properly recognize and sort EVERY possible LC call number including local practices. But we sure can handle a lot, including:
R 169.1 .B59 1990
R 169.B59.C39
R169 B59C39 1990
KF 4558 15th .G6 sorts after KF 4558 1st .G6
Q11 .P6 vol. 4 no. 4 sorts before Q11 .P6 vol. 12 no. 1
.R 179 .C79ab. Common local practice, and also used in NLM call numbers. (No guarantee that every NLM call number can be handled by this library for LC call numbers, but it seems to work okay for NLM.)
OCLC's docs on MARC 050 include some information on possible LC call number components.
Once you have a bunch of Lcsort keys in your database, you may want to search to find all call numbers beginning with, say, EG 101. So that might include EG 101.5, EG 101 .C23 1990, etc.
The truncated_range_end method gives you a proper ending range to get what you want, say:
sort_key >= #{Lcsort.normalize("EG 101")} AND sort_key <= #{Lcsort.truncated_range_end('EG 101')}
This can also be used for finding a range of call numbers. Say you want all call numbers from those beginning with AB 101 to AB 500:
sort_key >= #{Lcsort.normalize("AB 101")} AND sort_key <= #{Lcsort.truncated_range_end('AB 500')}
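The snippets above interpolate the keys straight into the SQL string. A rough sketch of building the same condition with bind placeholders instead is below. The helper name range_condition is hypothetical, the sort_key column name is taken from the examples above, and the nil guard assumes that both methods return nil for unrecognized input (the README only states this for normalize).

```ruby
# Build a parameterized SQL condition covering every call number from
# those beginning with `first` through those beginning with `last`.
#
# `normalize` and `range_end` are callables; with the gem loaded,
# Lcsort.method(:normalize) and Lcsort.method(:truncated_range_end)
# should fit. Returns nil if either bound could not be produced.
def range_condition(first, last, normalize:, range_end:)
  low  = normalize.call(first)
  high = range_end.call(last)
  return nil if low.nil? || high.nil?

  ["sort_key >= ? AND sort_key <= ?", low, high]
end
```

The returned array can be splatted into a query interface that accepts a condition string followed by bind values, such as ActiveRecord's `where`. Passing the same call number as both `first` and `last` gives the "begins with" search.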
truncated_range_end works with as many or as few call number components as you want. Lcsort.truncated_range_end('AB 101.1') will find AB 101.123 or AB 101.1 .A5 too. Lcsort.truncated_range_end("AB 101 .C45") will find AB 101 .C456, AB 101 .C45 .B5, etc.
At the moment, truncated_range_end actually pretty much just adds a ~ onto the end of the normalized sort key. But it did more complicated things in past versions of the normalization algorithm, and we do have tests ensuring it finds what is expected.
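To see why a trailing ~ works as a range end, here is a small pure-Ruby illustration. The key strings are made up for the example and are NOT real Lcsort output; the only assumption is that sort keys contain no character that sorts after ~ in byte order.

```ruby
# "~" is 0x7E, the highest printable ASCII character, so any key that
# extends `start_key` with lower-valued characters sorts at or before
# `start_key + "~"` under Ruby's byte-order String comparison.
def keys_in_truncated_range(keys, start_key)
  range_end = start_key + "~"
  keys.select { |key| key >= start_key && key <= range_end }
end

# Hypothetical keys, for illustration only.
keys = ["AB.0101", "AB.0101.5", "AB.0101.C23", "AB.0102"]
p keys_in_truncated_range(keys, "AB.0101")
# => ["AB.0101", "AB.0101.5", "AB.0101.C23"]
```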
Sometimes you want to add something on to the end of a normalized call number, as a payload, or to ensure normalized sort key uniqueness.
You can pass an :append_suffix to have it appended in a way that won't otherwise change the sort order of the original call number.
I use this to add the bib ID on to the end of the normalized sort key. If two bibs have identical call numbers, their sort keys would otherwise collide, and my functions work better when all sort keys are unique.
sortkey = Lcsort.normalize(callnumber, :append_suffix => bibID)
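As a sketch of that pattern applied to a batch of records, assuming each record is a [bib_id, call_number] pair: the helper name unique_sort_keys is hypothetical, and the normalizer is passed in as a callable so the example does not depend on the gem being loaded.

```ruby
# Map each bib ID to a sort key that embeds the bib ID as a suffix, so
# records sharing a call number still get distinct keys.
#
# `records` is an array of [bib_id, call_number] pairs. `normalizer` is
# any callable taking (call_number, options) and returning a key or nil;
# with the gem loaded, Lcsort.method(:normalize) should fit.
# Records whose call number is not recognized are skipped.
def unique_sort_keys(records, normalizer)
  records.each_with_object({}) do |(bib_id, callnum), keys|
    key = normalizer.call(callnum, append_suffix: bib_id)
    keys[bib_id] = key if key
  end
end
```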
Original regex and code by Bill Dueber. Original port to Ruby by Nikitas Tampakis. LC handling advice from Naomi Dushay and her code.