Automated Metadata Service
Manage metadata from different sources. The examples in the package are specific to Caltech repositories, but could be generalized. This package is currently in development and will have additional sources and matchers added over time.
You need to have Python 3.7 or greater on your machine (Miniconda is a great installation option).
If you just need the Python functions to write your own code (like codemeta_to_datacite), open a terminal and type pip install ames
You need to have Python 3.7 on your machine (Miniconda is a great installation option). Test whether you have Python installed by opening a terminal or anaconda prompt window and typing python -V, which should print version 3.7 or greater.
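The same version check can also be done from inside Python. This is a minimal sketch, not part of ames; the helper name python_is_supported is made up for illustration, and the 3.7 minimum is taken from the instructions above.

```python
import sys

# Minimum version stated in the installation instructions.
MINIMUM_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    # Print the running interpreter version and whether it is new enough.
    print("Python", sys.version.split()[0], "supported:", python_is_supported())
```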
It's best to download this software using git. To install git, type conda install git in your terminal or anaconda prompt window. Then find where you want the ames folder to live on your computer in File Explorer or Finder (this could be the Desktop or Documents folder, for example). Type cd in anaconda prompt or terminal and drag the location from the file browser into the terminal window. The path to the location will show up, so your terminal will show a command like
cd /Users/tmorrell/Desktop
Hit enter. Then type
git clone https://github.com/caltechlibrary/ames.git
Once you hit enter you'll see an ames folder. Type cd ames
Now that you're in the ames folder, type python setup.py install
You can now run all the different operations described below.
When there is a new version of the software, go to the ames folder in anaconda prompt or terminal and type git pull. You shouldn't need to re-do the installation steps unless there are major updates.
The run scripts show examples of using ames to perform a specific update operation.
In the test directory there is an example of using the codemeta_to_datacite function to convert a codemeta file to DataCite standard metadata.
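To give a feel for what such a conversion involves, here is a rough sketch of mapping a few codemeta fields onto DataCite-style fields. It is an illustration only and is not the ames implementation: the function name codemeta_to_datacite_sketch is hypothetical, the choice of fields is an assumption, and the output keys only loosely follow DataCite JSON conventions. The example record is placeholder data.

```python
def codemeta_to_datacite_sketch(codemeta):
    """Map a handful of codemeta fields to DataCite-style fields (illustrative only)."""
    datacite = {}

    # codemeta "name" becomes the single DataCite title.
    if "name" in codemeta:
        datacite["titles"] = [{"title": codemeta["name"]}]

    # codemeta "description" becomes an abstract-type description.
    if "description" in codemeta:
        datacite["descriptions"] = [
            {"description": codemeta["description"], "descriptionType": "Abstract"}
        ]

    # Each codemeta author becomes a creator named "Family, Given".
    creators = []
    for author in codemeta.get("author", []):
        family = author.get("familyName", "")
        given = author.get("givenName", "")
        name = ", ".join(part for part in (family, given) if part)
        creators.append({"name": name})
    if creators:
        datacite["creators"] = creators

    if "version" in codemeta:
        datacite["version"] = str(codemeta["version"])

    return datacite


# Placeholder input, not a real record.
example = {
    "name": "example-software",
    "description": "A placeholder description",
    "author": [{"givenName": "Ada", "familyName": "Lovelace"}],
    "version": "1.0",
}
converted = codemeta_to_datacite_sketch(example)
```

For real conversions, use the codemeta_to_datacite function shipped with ames and check the example in the test directory for its expected input.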
Collect GitHub records in CaltechDATA, search for a codemeta.json file, and update CaltechDATA with new metadata.
You need to set an environment variable with your token to access CaltechDATA:
export TINDTOK=
Then type python run_codemeta.py
Harvest citation data from the Crossref Event Data API, harvest records in CaltechDATA, match the two, update metadata in CaltechDATA, and send email to the user.
You need to set environment variables with your tokens to access CaltechDATA and Mailgun:
export TINDTOK=
export MAILTOK=
Then type python run_event_data.py
You'll be prompted for confirmation if any new citations are found.
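The run scripts depend on tokens supplied through environment variables. If you write your own script along the same lines, it can help to fail early with a clear message when a token is missing. The sketch below is one way to do that with the standard library; the helper name require_env is hypothetical and is not part of ames.

```python
import os


def require_env(*names, environ=None):
    """Return the named environment variables as a dict, or raise if any is unset or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


if __name__ == "__main__":
    # Variable names taken from the instructions above.
    try:
        tokens = require_env("TINDTOK", "MAILTOK")
        print("Found tokens for:", ", ".join(tokens))
    except RuntimeError as err:
        print(err)
```

Passing environ explicitly makes the helper easy to test without touching the real environment.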
Update media records in DataCite that indicate the files associated with a DOI.
You need to set an environment variable with the password for your DataCite account:
export DATACITE=
Then type python run_media_update.py
This will run checks on the quality of metadata in CaltechDATA. Currently this verifies whether redundant links are present in the related identifier section. It also can update metadata with DataCite.
You need to set an environment variable with your token to access CaltechDATA:
export TINDTOK=
Then type python run_caltechdata_checks.py
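As an illustration of what a redundancy check on related identifiers might look like, the sketch below flags repeated entries in a list of DataCite-style related identifiers. It is not the check that ames runs: the function name is hypothetical, and the definition of "redundant" (same identifier, compared case-insensitively, with the same relation type) is an assumption.

```python
def find_redundant_identifiers(related_identifiers):
    """Return entries that repeat an earlier (identifier, relationType) pair."""
    seen = set()
    redundant = []
    for entry in related_identifiers:
        # Normalize the identifier so trivial differences do not hide duplicates.
        identifier = entry.get("relatedIdentifier", "").strip().lower()
        key = (identifier, entry.get("relationType"))
        if key in seen:
            redundant.append(entry)
        else:
            seen.add(key)
    return redundant


# Placeholder identifiers for illustration.
links = [
    {"relatedIdentifier": "10.1234/abc", "relationType": "IsSupplementTo"},
    {"relatedIdentifier": "10.1234/ABC", "relationType": "IsSupplementTo"},
    {"relatedIdentifier": "10.1234/abc", "relationType": "Cites"},
]
duplicates = find_redundant_identifiers(links)
```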
This will improve the quality of metadata in CaltechDATA. This option is broken up into updates that should run frequently (currently every 10 minutes) and updates that should run daily. Frequent updates include adding a recommended citation to the descriptions, and daily updates include adding CaltechTHESIS DOIs to CaltechDATA.
You need to set an environment variable with your token to access CaltechDATA:
export TINDTOK=
Then type python run_caltechdata_updates.py or python run_caltechdata_daily.py
This will harvest download and view information from Matomo and format it into a COUNTER report. This feature is still being tested.
You need to set an environment variable with your token to access Matomo:
export MATTOK=
Then type python run_usage.py
Runs reports on ArchivesSpace. Current reports:
Example usage:
python run_archives_report.py accession_report accession.csv -subject "Manuscript Collection"
Perform update operations using the Eprints API. Supports URL updates to https for the resolver field, special character updates, and adjusting the item modified date (which also regenerates the public view of the page).
Example usage:
python run_eprints_updates.py update_date authors -recid 83420 -user tmorrell -password
Runs reports on Caltech Library repositories. Current reports:
doi_report: Records (optionally filtered by year) and their DOIs.
thesis_report: Matches Eprints tsv export for CaltechTHESIS
thesis_metadata: Matches Eprints metadata tsv export for CaltechTHESIS
creator_report: Finds records where an Eprints Creator ID has an ORCID but it is not included on all records. Also lists cases where an author has two ORCIDs.
creator_search: Export a Google Sheet with the author lists of all publications associated with an author id. Requires the -creator argument
people_search: Search across the CaltechPEOPLE collection by division
file_report: Records that have potential problems with the attached files
status_report: Reports on any records with an incorrect status in feeds
record_number_report: Reports on records where the record number and resolver URL don't match
alt_url_report: Reports on records with the discontinued alt_url field
license_report: Report out the license types in CaltechDATA
Type something like python run_coda_report.py doi_report thesis report.tsv -year 1977-1978
Some reports include a -year option to return just the records from a specific year (1977) or a range (1977-1978)
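A value such as 1977 or 1977-1978 has to be turned into a range before records can be filtered. The sketch below shows one way to do that; the function names are hypothetical, the inclusive treatment of both ends of the range is an assumption, and the actual parsing in ames may differ.

```python
def parse_year_option(value):
    """Parse '1977' or '1977-1978' into an inclusive (start, end) tuple."""
    parts = value.strip().split("-")
    if len(parts) == 1:
        start = end = int(parts[0])
    elif len(parts) == 2:
        start, end = int(parts[0]), int(parts[1])
    else:
        raise ValueError("Expected a year or a start-end range, got: " + value)
    if start > end:
        raise ValueError("Start year is after end year: " + value)
    return start, end


def year_matches(record_year, value):
    """Return True when record_year falls inside the requested year or range."""
    start, end = parse_year_option(value)
    return start <= int(record_year) <= end
```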
Some reports include a -group option to return just the records with a specific group name. Surround long names with quotes (e.g. "Keck Institute for Space Studies")
Some reports include a -item option to return just records with a specific item type. Supported types include:
There are some additional technical arguments if you want to change the default behavior.
Adding -source eprints will pull report data from Eprints instead of feeds. This is very slow. You may need to add -username and -password to provide login credentials.
Adding -sample XXX allows you to select a number of randomly selected records. This makes it more reasonable to pull data directly from Eprints.
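The idea behind sampling is to pick a fixed number of record identifiers at random before making the slow per-record requests. A minimal sketch with the standard library follows; the function name and the optional seed parameter are hypothetical and are not taken from ames.

```python
import random


def sample_records(record_ids, sample_size, seed=None):
    """Return sample_size randomly chosen ids, or all ids if fewer are available."""
    record_ids = list(record_ids)
    if sample_size >= len(record_ids):
        return record_ids
    # A dedicated Random instance keeps the choice reproducible when a seed is given.
    rng = random.Random(seed)
    return rng.sample(record_ids, sample_size)


# Placeholder record ids for illustration.
all_ids = list(range(1000, 1100))
chosen = sample_records(all_ids, 5, seed=42)
```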
You can combine multiple options to build more complex queries, such as this request for reports from a group:
python run_coda_report.py doi_report authors keck_tech_reports.csv -group "Keck Institute for Space Studies" -item technical_report project_report discussion_paper
python run_coda_report.py people_search people chem.csv -search "Chemistry and Chemical Engineering Division"
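To show how options like these can combine on one command line, here is a sketch of an argparse parser that accepts the first command above. It is not the parser used by run_coda_report.py: the positional argument names, the choice to accept one value for -group and several for -item, and the "feeds" default for -source are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser resembling the report command line (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of a report command line")
    # Positional arguments: which report, which repository, and the output file.
    parser.add_argument("report_name")
    parser.add_argument("repository")
    parser.add_argument("output")
    # Optional filters described in the text above.
    parser.add_argument("-year", help="single year or range, e.g. 1977-1978")
    parser.add_argument("-group", help="group name; quote names containing spaces")
    parser.add_argument("-item", nargs="+", help="one or more item types")
    parser.add_argument("-source", default="feeds", help="where to pull data from")
    parser.add_argument("-sample", type=int, help="number of random records")
    return parser


# Parse the example command from the text, passed as an explicit list.
args = build_parser().parse_args(
    [
        "doi_report",
        "authors",
        "keck_tech_reports.csv",
        "-group",
        "Keck Institute for Space Studies",
        "-item",
        "technical_report",
        "project_report",
        "discussion_paper",
    ]
)
```

Because -item uses nargs="+", every value after it is collected into a list until the next option or the end of the command.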