Friday 5 October 2012

Bringing the “Book of the Dead” Places to Pelagios

The Book of the Dead is a collection of spells which accompanied the deceased in the realm of the dead. The spells supplied information about the residents, places and incidents in the afterlife which helped the dead person to avert danger and to be accepted among the gods. Altogether, the Book of the Dead - as a text corpus - comprises c. 200 spells. The actual composition of a single book, in contrast, varies, so that each source is unique.


The instances of the Book of the Dead are transmitted on papyri, mummy wrappings, shrouds, coffins etc. In total, there are nearly 3,000 objects. Their records and photographs (about 20,000) have been gathered and worked on within the Book of the Dead Project, which started at the University of Bonn in the 1990s. Recently, they have been integrated into a digital archive, in cooperation with the Cologne Center for eHumanities (CCeH).

There are two kinds of place references included in the data records: references to each object's current location (country, place and institution) and to its provenance (place of origin and specific locality). The latter are being integrated into the Pelagios network.

Alignment to the Pleiades Gazetteer


The places of origin have been chosen for the mapping because their level of granularity corresponds roughly to that of the precisely located places identified in Pleiades (e.g. Athribis or El Kurru).
The first step in the alignment process was to import the Pleiades+ dataset via Oxygen and convert it to XML. This was done because the digital archive is based upon an eXist database and the data thus reside entirely in the XML cosmos. An XSLT script was written to map the Book of the Dead place occurrences to the places in the Pleiades+ dataset. That way, about half of the places could be mapped automatically. The results were checked against the geographic coordinates given in the archive. There was only one mismatch: "Theben" was mapped to the Greek Thebes instead of the Egyptian Thebai.
Most of the remaining places were identified manually by means of their corresponding Greek place names, since many of the place names in the archive are German transliterations of Arabic names. Up to now about 90% of the place names have been identified.
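
The automatic step can be pictured roughly as follows. This Python sketch is not the project's XSLT script; it only illustrates matching by name and then checking candidates against the archive's coordinates, which is how a mismatch like the Thebes case can be caught. The gazetteer entries, the placeholder IDs, the approximate coordinates and the 50 km threshold are all assumptions made for illustration.

```python
import math

# Hypothetical stand-in for the Pleiades+ data: lower-cased name -> candidates.
# IDs are placeholders and coordinates are rough, illustrative values.
GAZETTEER = {
    "thebes": [
        ("example-greek-thebes", 38.32, 23.32),
        ("example-egyptian-thebes", 25.70, 32.64),
    ],
    "athribis": [("example-athribis", 30.47, 31.18)],
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def match_place(name, lat, lon, gazetteer=GAZETTEER, max_km=50.0):
    """Return the ID of the nearest same-named candidate within max_km, else None."""
    best = None
    for place_id, c_lat, c_lon in gazetteer.get(name.strip().lower(), []):
        distance = haversine_km(lat, lon, c_lat, c_lon)
        if distance <= max_km and (best is None or distance < best[1]):
            best = (place_id, distance)
    return best[0] if best else None
```

A name with no candidate, or with candidates that are all too far from the archive's coordinates, returns None and is left for manual identification.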


Annotation and Dataset Metadata Creation


A main component of the digital archive's data model is an external knowledge base simply called "Wissen" (knowledge), which brings together additional information such as geographic coordinates, year dates for periodization, canonical and selective lists for spells, and degrees of kinship. Part of this, and relevant to the Pelagios alignment, is a list of distinct places of origin, to which the Pleiades IDs have been assigned.


An XQuery module was created to provide single annotations, a dataset dump and a dataset metadata description in RDF/XML. Single annotations and the dataset dump are created on the fly by joining the data records' place occurrences to the Pleiades IDs in the knowledge base. So far, the annotations are organized as one single dataset as their number is relatively small (currently, there are 1346 annotations).
As a next step the Pelagios tools and widgets will be explored in order to deploy them on the Book of the Dead project website.
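
The join described above can be sketched in Python. This is not the project's XQuery module, and it emits N-Triples-style lines rather than RDF/XML to stay short. The annotation vocabulary (an OAC-style namespace with hasBody and hasTarget), the Pleiades URI pattern, the example.org URIs and the placeholder ID are assumptions for illustration.

```python
PLEIADES_BASE = "https://pleiades.stoa.org/places/"  # assumed URI pattern
OAC = "http://www.openannotation.org/ns/"            # assumed annotation vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_annotations(records, place_to_pleiades, base_uri):
    """Join each (record URI, place of origin) pair to a Pleiades ID.

    Places without an assigned ID are skipped, mirroring the fact that
    not every place name has been identified yet.
    """
    lines = []
    for number, (record_uri, place) in enumerate(records, start=1):
        pleiades_id = place_to_pleiades.get(place)
        if pleiades_id is None:
            continue
        annotation = f"<{base_uri}#annotation{number}>"
        lines.append(f"{annotation} <{RDF_TYPE}> <{OAC}Annotation> .")
        lines.append(f"{annotation} <{OAC}hasBody> <{PLEIADES_BASE}{pleiades_id}> .")
        lines.append(f"{annotation} <{OAC}hasTarget> <{record_uri}> .")
    return lines


# Hypothetical input: two records, only one of which has a mapped place.
example_lines = build_annotations(
    [("http://example.org/record/1", "Theben"),
     ("http://example.org/record/2", "Unbekannt")],
    {"Theben": "12345"},  # placeholder ID, not a real assignment
    "http://example.org/dump",
)
```

Because the join runs at request time, a newly assigned Pleiades ID in the knowledge base shows up in the next dump without any separate export step.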

Tuesday 2 October 2012

The Portable Antiquities Scheme joins Pelagios

Hacking Pelagios RDF in the ISAW library, June 2012
Earlier in 2012, the excellent Linked Ancient World Data Institute was held in New York at the Institute for the Study of the Ancient World (ISAW). During this symposium, Leif and Elton convinced many participants that they should contribute their data to the Pelagios project, and I was one of them.

I work for a project based at the British Museum called the Portable Antiquities Scheme, which encourages members of the public within England and Wales to voluntarily record objects that they discover whilst pursuing their hobbies (such as metal-detecting or gardening). The centrepiece of this project is a publicly accessible database which has been on-line in various guises for over 13 years, and the latest version can now produce interoperable data much more easily than before.

The Portable Antiquities Scheme database

Within the database that I have designed and built (using Zend Framework, jQuery, Solr and Twitter Bootstrap), we now hold records for over 812,000 objects, with a high proportion of these being Roman coin records (175,000+ at the time of writing, some with more than one coin per record). Many of these coins have mints attached (over 51,000 are available to all access levels on our database, with a further 30,000 or so held back due to our workflow model). Aligning these mints with Pleiades place identifiers was straightforward because only a limited number of places are involved; it simply required adding columns to our database. Where possible, these mints have also been assigned identifiers from Nomisma, Geonames and Yahoo!'s WOEID system (although that might be on the way out with the recent BOSS news). Some mints, however, I haven't been able to assign: for instance 'mint moving with Republican issuer', or the 'C' mint, which has an unknown location.

Once these identifiers were assigned in the database, it became easy to create RDF for use by the Pelagios project and to use their widgets to enhance our site further. To create the RDF for ingestion by Pelagios, a cron job runs every Sunday night: a cURL request dumps XML from our Solr search index, XSLT transforms the dump on our server, and s3sync sends the result to Amazon S3 (where we have incremental snapshots). These data grow at a rate of around 100 to 200 coins a week, depending on staff time, knowledge and whether the state of the coin allows one to attribute a mint (around 45% of the time). The PAS database also has a facility for error reporting and commenting on records, so if you use the attributions provided through Pelagios and find a mistake, do tell us!
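
As a rough illustration of the middle step, the Python sketch below pulls record/mint pairs out of a Solr XML response. Our real pipeline uses XSLT rather than Python, and the field names (id, pleiadesID), the sample record identifiers, the placeholder Pleiades ID and the URI pattern are assumptions, not the actual PAS schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape of a Solr XML response.
SAMPLE_SOLR_XML = """<response>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="id">EXAMPLE-0001</str>
      <str name="pleiadesID">12345</str>
    </doc>
    <doc>
      <str name="id">EXAMPLE-0002</str>
    </doc>
  </result>
</response>"""


def extract_mint_links(solr_xml, id_field="id", pleiades_field="pleiadesID"):
    """Return (record id, Pleiades URI) pairs for records with an attributed mint."""
    root = ET.fromstring(solr_xml)
    links = []
    for doc in root.iter("doc"):
        # Each child of <doc> is a typed field element carrying a name attribute.
        fields = {field.get("name"): (field.text or "").strip() for field in doc}
        record_id = fields.get(id_field)
        pleiades_id = fields.get(pleiades_field)
        if record_id and pleiades_id:
            links.append((record_id, "https://pleiades.stoa.org/places/" + pleiades_id))
    return links
```

Records without a mint identifier are simply left out, which matches the situation where the state of the coin does not allow a mint to be attributed.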

At some point in the future, I plan to try to match data extracted by natural language processing (using Yahoo geo tools and OpenCalais) against Pleiades identifiers, and so make more annotations available to researchers and Pelagios.

For example, this object WMID-3FE965, the Staffordshire Moorlands patera or trulla (shown below):

Has the following inscription with place names:

This is a list of four forts located at the western end of Hadrian's Wall: Bowness (MAIS), Drumburgh (COGGABATA), Stanwix (UXELODUNUM) and Castlesteads (CAMMOGLANNA). It incorporates the name of an individual, AELIUS DRACO, and a further place-name, RIGOREVALI. The forts can be given Pleiades identifiers as follows:
  1. Bowness: 89239
  2. Drumburgh: 89151
  3. Stanwix: 967060430
  4. Castlesteads: 89133
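
Turning those identifiers into links is then a one-liner. The Python sketch below uses the four identifiers listed above; the URI pattern is an assumption about how Pleiades addresses its place pages.

```python
# Identifiers as listed above, kept as strings so they are used verbatim.
FORT_IDS = {
    "Bowness": "89239",
    "Drumburgh": "89151",
    "Stanwix": "967060430",
    "Castlesteads": "89133",
}


def pleiades_uri(place_id):
    """Build a Pleiades place URI from an identifier (assumed URI pattern)."""
    return f"https://pleiades.stoa.org/places/{place_id}"


FORT_URIS = {name: pleiades_uri(place_id) for name, place_id in FORT_IDS.items()}
```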

Integrating the Pelagios widget and awld.js

Using Pleiades and Nomisma identifiers allows the PAS database to enrich records further via the use of RDFa in view scripts and by the incorporation of the Pelagios widget and the ISAW JavaScript library on a variety of pages. For example, the screenshot below gives a view of a gold aureus of Nero recorded in the North East of England with the Pelagios widget activated:
The Pelagios widget embedded on a coin record: DUR-B4E094
The JavaScript library by Nick Rabinowitz and Sebastian Heath also allows for enriched web pages; this page for Nero shows the library in action:

These emperor pages also pull in various resources from third-party websites (such as Adrian Murdoch's excellent talking head video biographies of Roman emperors), data from DBpedia, Nomisma and VIAF, and results from the site's internal search engine. The same approach is also used, in a more pared-down way, for all other issuer periods on our website, for example: Cnut the Great.


Integrating Johan's map tiles

Following on from Johan's posting on the magnificent set of map tiles that he's produced for the Pelagios project (and as seen in use over at the Pleiades site and OCRE), I've now integrated these into our mapping system. I've done it slightly differently to the examples that Johan gave; due to the volume of traffic that we serve up, it wasn't fair to saddle the Pelagios team with extra bandwidth. Therefore, Johan provided zipped downloads of the map tiles and I store these on our server (if you're a low traffic site, feel free to use our tile store):
Imperium map layer, with parish boundary. Zoom level 10.
The map zoom has been set to the level (10 for Great Britain) at which we decided site security was ensured for the discovery points (although Johan has made tiles available to level 11). This complements the other layers we use:

  • Open Street Map
  • terrain 
  • satellite
  • soil map
  • Stamen map watercolor
  • Stamen map toner 
  • NLS historic OS maps
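
The zoom cap mentioned above can be sketched in Python. This assumes the tiles follow the common XYZ "slippy map" numbering, which the post does not state; the function name and the default cap parameter are hypothetical.

```python
import math

MAX_PUBLIC_ZOOM = 10  # the level at which find spots are considered safe to show


def tile_for(lat, lon, zoom, max_zoom=MAX_PUBLIC_ZOOM):
    """Return (zoom, x, y) for a point, never exceeding max_zoom.

    Uses the standard Web Mercator tile formula (assumed tile scheme).
    """
    z = min(zoom, max_zoom)
    n = 2 ** z
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Keep indices inside the grid for points on the far edges of the map.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return z, x, y
```

A request for a deeper zoom level is simply answered with the tile at the capped level, so a find spot is never located more precisely than that tile.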
Each find spot is also reverse geocoded to produce a WOEID and a Geonames identifier and to obtain its elevation, and we subsequently link to Aaron Straup Cope's excellent woedb for further enhancement of place data. We also serve up boundaries derived from the Ordnance Survey Opendata BoundaryLine dataset, split from shapefiles and converted to KML by ogr2ogr scripts. The incorporation of this layer allows researchers (over 300 projects currently use our data) to interpret the results that they get from searches on our database against the road network and settlement data much more easily, and it has already gathered many positive comments from our staff and research colleagues.
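
For anyone wanting to do the same conversion, here is a minimal Python sketch that builds an ogr2ogr call for one shapefile. The file and directory names are hypothetical, and our own scripts may pass additional options; only the basic "-f KML destination source" form is shown.

```python
from pathlib import Path


def ogr2ogr_kml_command(shapefile, out_dir):
    """Build the argument list converting one shapefile to a KML file in out_dir."""
    source = Path(shapefile)
    destination = Path(out_dir) / (source.stem + ".kml")
    # ogr2ogr takes the destination before the source.
    return ["ogr2ogr", "-f", "KML", str(destination), str(source)]


# Hypothetical file names, for illustration only.
example_command = ogr2ogr_kml_command("parish_region.shp", "kml")
```

With GDAL installed, the list can be passed to subprocess.run(example_command, check=True); building it separately makes the conversion easy to loop over a directory of split shapefiles.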

By contributing to the Pelagios project, we hope that people will find our resources more easily and that we in turn can promote the efforts of all the fantastic projects that have been involved in this programme. What we've managed to implement from joining the Pelagios project already outweighs the time spent coding the changes to our system. If you run a database or website with ancient world references, you should join too!


Monday 1 October 2012

2-Way Linked Data? It just, you know, works.


Another title for this short post could be "ISAW Papers now in Pelagios," but that's a little dry. And beyond announcing more data in the growing ecosystem, I do want to highlight the "2-way" part of this most recent addition.

 But first, what's ISAW Papers? That's easy. It's the online journal of NYU's Institute for the Study of the Ancient World (ISAW). Following the link will take you to more information.

 Here's another link, this one to the first ISAW Papers annotations, with more to come soon. And just FYI, those are currently all from ISAW Papers 2 by Catharine Lorber and Andrew Meadows so many thanks to them for being part of the fun.

 Next question is, "What do you mean by '2-way'?" In the list of annotations I linked to above, there is one to "Cyprus" that shows the URL:

Note the fragment identifier "#p8". The archival format of an ISAW Papers article is HTML, which makes it easy to assign an identifier to every paragraph. As part of its publication model, ISAW partners with NYU's library to deliver articles, and that relationship is the source of the link you see above. The library runs the 'dlib.nyu.edu' host. If you click above, it will take you directly to the eighth paragraph of ISAW Papers 2.
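
 To make the idea concrete, here is a small Python sketch of building and reading such paragraph links. The "p" plus number pattern follows the "#p8" example; the example.org address is a placeholder, not the real article URL.

```python
from urllib.parse import urldefrag


def paragraph_url(article_url, number):
    """Return the article URL pointing at paragraph `number` (pattern: #pN)."""
    base, _ = urldefrag(article_url)
    return f"{base}#p{number}"


def paragraph_number(url):
    """Return the paragraph number encoded in the fragment, or None."""
    _, fragment = urldefrag(url)
    if fragment.startswith("p") and fragment[1:].isdigit():
        return int(fragment[1:])
    return None


# Placeholder address, for illustration only.
example_link = paragraph_url("http://example.org/isaw-papers/2/", 8)
```

 Because each paragraph has its own stable address, an annotation can point at the exact passage that mentions a place rather than at the article as a whole.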

 But that's still just one-way linked data. Try hovering over the underlined reference to Cyprus. You should see a map in a pop-up, next to which is a link to "Further references at Pelagios". Follow that to the Pelagios page telling you there is a reference to Cyprus in ISAW Papers as well as in other resources.  It's 'two-way' in that you can go back-and-forth, back-and-forth on the basis of the stable identifier for Cyprus as provided by Pleiades. And as many of you may know, clicking through to the Pleiades page will show the link to ISAW Papers. Now we're talking N-way linked data, which is what we really want. As in, "Now we're talkin'! Sweet!!!"

 And just for further context, the pop-up is implemented by the "Ancient World Javascript Library," another ISAW project hoping to deliver usable tools to all who might be interested in them.

 Of course, the "just works" part of the title downplays all the effort by many people to make this seamless. But that's how it should look to users. With such ease-of-use coming into being, it will be cool to see what people do with all these links.