the day the world changed

The world can change on you without you noticing.

So, some time after December 10, 2022 9:19pm Eastern the "default" website changed from cdli.ucla.edu to cdli.mpiwg-berlin.mpg.de.

That's the Max Planck Institute for the History of Science in Berlin, Germany.

I emailed the support link to confirm my suspicion that this was a permanent redirect.

A Professor of Assyriology from University of Oxford / Wolfson College replied:

I understand that you and many other users had grown used to the old Framework. However, it could not be sustained any longer and we have migrated all data this Summer to the new Framework. If you explore search settings, advanced settings, and create a profile on the new framework you will be able to mimic very closely the experience of the old CDLI. Please do write if you have trouble finding certain texts or words etc.

So, my little art project world changed just like that.

I was used to the search interface and layout of the old site. I didn't relish updating my web-scraping code to compensate for the changes.

If I wanted my silly little project to continue, I had maybe a week to sort it out. That's how many days I had prepared in advance.

Dear reader, if you know me, you know that I'm lazy and a procrastinator. If you didn't before, you do now. I did poke around the site some. Honestly, I was annoyed. You need JavaScript to do some of the fancier stuff. Thankfully, I really don't need to do fancy stuff.

The top-level landing page has a "simple" search text entry. I poked around and kept notes on the URL used in the search and on the results pages. I also did lots of "view source" and "inspect element" stuff, as one does.

I haven't done much "complex" searching in the past but I can adjust if I have to in the future. That's a "future gizmo" problem, and good luck to them.

I spent some time looking at the toggle-able elements of an object page and figuring out how to scrape them. I didn't really do much fiddling because I noticed an "export artifact" feature.

I don't recall the old framework having an "export" feature for an object. You can export into a variety of formats based on categories:

  • Metadata/catalogue
    • Flat catalogue
      • CSV
      • TSV
    • Expanded catalogue
      • JSON
    • Linked catalogue
      • TTL
      • JSON-LD
      • RDF/JSON
      • RDF/XML
  • Chemical Data
    • Seal Chemistry
      • ATF
      • JTF
  • Text/annotations
    • Text data
    • Linked annotations
      • TTL
      • JSON-LD
      • RDF/JSON
      • RDF/XML
  • Related publications
    • CSV
    • BibTex

Some of those are familiar to me and some most definitely aren't.

I looked at the JSON option. It also has a nicely consistent URL:
https://cdli.mpiwg-berlin.mpg.de/artifacts/ARTIFACT_NUMBER/json

ARTIFACT_NUMBER is the CDLI number with the leading P removed along with any leading zeroes.
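
That rule is small enough to sketch in Python. This is only an illustration of the conversion described above, not the code I actually run; the function names are made up for the example, and it assumes the CDLI number is a "P" followed by digits.

```python
BASE_URL = "https://cdli.mpiwg-berlin.mpg.de/artifacts"


def artifact_number(cdli_number):
    """Turn a CDLI number such as 'P000123' into the bare number '123'.

    Hypothetical helper: drops the leading P, then any leading zeroes.
    """
    text = cdli_number.strip()
    if text[:1] in ("P", "p"):
        text = text[1:]
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a CDLI number: {cdli_number!r}")
    # int() discards the leading zeroes; str() turns it back into text.
    return str(int(text))


def artifact_json_url(cdli_number):
    """Build the JSON export URL for one artifact."""
    return f"{BASE_URL}/{artifact_number(cdli_number)}/json"
```

For example, artifact_json_url("P000123") gives https://cdli.mpiwg-berlin.mpg.de/artifacts/123/json.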

The data returned contained a lot of things I generally care about:

  • material
  • genre
  • subgenre, which is now "comments" under the genre key
  • object type, which is now artifact_type

The best part is that any text that has been recorded for the artifact is in that data, which makes it easy to look for English translations.
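
As a rough sketch of that check: in ATF, English translation lines start with "#tr.en:", so once the text has been pulled out of the JSON it can be scanned line by line. The function name here is made up, and exactly where the text sits inside the JSON is not something this sketch assumes; it just takes the ATF text as a string.

```python
def english_translation_lines(atf_text):
    """Return the English translation lines found in a block of ATF text.

    Hypothetical helper: looks only for lines starting with '#tr.en:'.
    """
    prefix = "#tr.en:"
    found = []
    for line in atf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            # Keep just the translation, without the marker.
            found.append(stripped[len(prefix):].strip())
    return found
```

An empty list back means no English translation was recorded, so that tablet can be skipped.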

So, the change of framework was a decently sized disruption of my workflow for finding tablets to post. It even shortened that workflow a bit, since I now create a file that has most of the information I review before adding a tablet to my list of tablets to post.

A net win even if it did take me some time to figure things out. I guess it keeps my brain occupied and malleable to change.