Friday, 7 March 2014

Greeking Out

This week marks a new and exciting milestone in the Pelagios 3 project - the start of work on the ancient Greek geographic tradition. There's more Latin to do of course: our work packages run on a staggered, overlapping 6-month basis, and, while we already have 19 documents in the system (some in both Latin and their modern language translation), future additions will include some major itinerary lists (including the Antonine Itineraries and Ravenna Cosmography) as well as a number of smaller but fascinating geographic sources such as the Haidra mosaic, some more inscribed vessels, and the Piazzale delle Corporazioni at Ostia.

But from today we'll start introducing Greek documents into the system. Ancient Greek traditions of knowledge about geography extend far beyond Plato's "frogs around a pond" metaphor for Greek settlements around the Aegean Sea. From Homer's Odyssey onwards, Greek texts push the boundaries of travel, exploration and knowledge, and Odysseus, the man who 'saw the cities of many men and knew their minds', stands as the archetypal explorer for Greeks who settled in places as far off as the Black Sea, Massalia (Marseille) and Libya. Later Greek authors like Hecataeus, Herodotus, Aristotle, Pytheas, Eratosthenes, Hipparchus, Posidonius, Artemidorus and Ptolemy are largely responsible for the way we conceptualise geography today (indeed, Eratosthenes invents the discipline), and we still use the terms that they came up with: equator, meridian, parallel, latitude and longitude. At the same time, much Greek geography is almost cosmological in nature, an attempt to understand the form of the earth and its place in the universe.

Remarkably, however, given the number and detail of these ancient witnesses, almost no Greek maps survive, and it is debated whether maps were even a feature of Greek traditions of geographical knowledge. (A map documented in Herodotus's Histories, carried by a certain Aristagoras of Miletus, becomes the site of contestation and debate, while Herodotus himself 'laughs at' the schematic representations of his contemporaries.) Instead, Greek conceptualisations of the world were almost exclusively in narrative form, from numerous periploi (sailing itineraries) to Strabo, whose Geographica remains central to our understanding of global geography in the transition to Empire.

Working with Ancient Greek texts will introduce some new challenges for us to tackle. To begin with, the change in alphabet will take a little getting used to for some of the team! Fortunately, recent work by Bruce Robertson, Greg Crane and others on OCRing ancient Greek means that we should be able to include a range of previously inaccessible texts. We can also draw on experience from the Hestia project and a promising new approach developed by Thomas Efer at the University of Leipzig that can identify toponyms in a Greek text by comparing it to a previously marked up English text. We don't yet know what will be the most efficient combination of methodologies, but at least we have plenty to choose from.

We have enormously enjoyed working with the Latin texts and will continue doing so, but the possibilities for analysis opened up by annotating documents from these two strongly related yet radically divergent traditions are incredibly exciting.

Jerusalem depicted in the Madaba Mosaic (6th C. AD). Image from Wikimedia Commons.

Tuesday, 25 February 2014

Latin Groove

In our two previous posts we introduced Recogito, a tool we are developing in order to efficiently extract, annotate and verify geographic references in texts. The development of Recogito is still continuing at full steam, and the team (and Leif in particular ;-) is feeding our feature backlog with a steady flow of new ideas & requirements. But despite the fact that there’s still a slight ambience of a busy construction site around Recogito, we have not just been developing. We have also been using it heavily to annotate new documents.

Prior to the start of Pelagios 3, we assembled a list of potential ancient sources to work on in each content work package. The sources we selected are specifically geographical works, i.e. documents where the authors give accounts of their world in their time. For some of the more extensive sources (such as Pliny’s Natural History), we restricted ourselves to only the specifically geographical chapters.

At the moment, we are about halfway through our first content work package, dealing with the Latin tradition (3 months out of 6). It’s therefore a good time to share with you the progress we have made so far. We have already introduced the first three documents (the Vicarello Beakers, the Bordeaux Itinerary and Pliny’s Natural History). We've since found our groove and the list has grown much longer. Here are some documents we are currently working on:

Fig.1. The Bordeaux Itinerary (Part 1) in Recogito (» View Map)

Pomponius Mela: De Chorographia (around 43 AD)

Pomponius Mela lived during the reign of Claudius and presumably died around the year 45 AD. His most famous work, cited by other great geographers such as Pliny the Elder, was De Chorographia. This work consists of three volumes and was written during the 40s AD. Each volume is dedicated to an area of the known Roman world. In the first volume, Mela describes the world and its regions in general terms, then the Mediterranean coasts of Africa and the Near East, starting from the Strait of Gibraltar. The second volume describes the coasts from the Near East to Hispania, covering Greece, Italy and Gaul. Finally, the third volume describes the Atlantic territories, Britannia, and the more remote territories, such as the German Limes, Arabia and India. » Map in Recogito

Laterculus Veronensis (AD 304-324?)

The Laterculus Veronensis is a list of the Roman provinces that existed during the reigns of Diocletian and Constantine. It can therefore be dated to between the years 284 and 337. The work takes its name from the single surviving manuscript, which is preserved in the Library of Verona. This source describes twelve dioceses comprising a total of over 100 provinces. » Map in Recogito

Avienus: Ora Maritima (AD IV)

Rufius Festus Avienus was an Etrurian poet, astronomer and geographer who lived in the 4th century AD. He wrote several books and poems, of which the most prominent is Ora Maritima. This work is based on the voyage of the Greek Euthymenes of Massalia in the sixth century BC. Avienus also used other sources, such as the work of the fourth-century BC Greek historian Ephorus. Drawing on such old sources has introduced much confusion, making some places difficult to locate and resulting in a mix of material originating from very different times. » Map in Recogito

Rutilius Namatianus: A Voyage Home to Gaul (AD 416)

Rutilius Namatianus was a poet from southern Gaul, active at the beginning of the 5th century AD. His only preserved work is the poem De reditu suo libri duo. It must have been written between 416 and 420 AD, and is composed in elegiac meter. Originally written in two volumes, the poem describes a trip along the coast from Rome to Gaul. Unfortunately, however, many parts (especially from the second volume) are lost, and the extant text stops at the port of Luna. » Map in Recogito

Jordanes: Getica (AD VI)

Jordanes lived during the sixth century AD and was of partially Gothic origin. It is believed that during his public career he was a notary and that he might later have had a religious career, eventually becoming a bishop. Jordanes' fame comes from two major works: De regnorum ac temporum successione, a world history from the creation to the 6th century, and De origine actibusque Getarum, better known as Getica. The latter is the one we have included in Pelagios 3 (restricted to the chapters with geographic descriptions). It is the only preserved source that explains the origin and characteristics of the Goths. » Map in Recogito

Bede: The ecclesiastical history of our island and nation (AD 731)

Bede, also referred to as Saint Bede, was born in England in the seventh century AD. He was a monk in the kingdom of Northumbria. Bede is known for his work Historia Ecclesiastica gentis Anglorum, completed around the year 731 AD. This work consists of five books. It begins with the invasion of Caesar in 55 BC and ends with the fifth book, in the time of Bede himself. In Pelagios 3, we have included only the first chapters of this source, which are devoted to a geographical description of the British Isles. » Map in Recogito

Ammianus Marcellinus: Roman History (before 391)

This is a document we are just starting to work on. Ammianus Marcellinus was a historian of the fourth century AD, probably born in Antioch. After a military career, he wrote one of the most famous histories of antiquity. His Res Gestae described the history of Rome from the accession of Nerva in 96 to the death of Valens in 378. Unfortunately, the first thirteen books are lost, and the remaining eighteen have gaps. The surviving books are dedicated to the events between the years 353 and 378. As in other cases, we only included those chapters where the geographic aspect was most prominent. » Map in Recogito

In numbers, we have already progressed to a total of 20,164 annotations (as of today), with an overall verification rate of 37.3% (which means we've confirmed more than 7,500 place references so far). But there are more Latin sources on our list which we have yet to address over the next three months. And our Greek content work package is about to start as well. So lots of exciting work ahead of us.
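As a quick sanity check, the figures quoted above can be reproduced with a couple of lines of arithmetic. This is only an illustration of how the verification rate relates to the two totals; the variable names are our own and not anything taken from Recogito.

```python
# Figures quoted in the post (annotation total and verification rate).
total_annotations = 20164
verification_rate = 0.373  # 37.3%

# Approximate number of verified place references implied by the rate.
verified = round(total_annotations * verification_rate)

print(verified)         # roughly seven and a half thousand
print(verified > 7500)  # consistent with "more than 7,500"
```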

You can follow our progress live at http://pelagios.org/recogito!

- Ada, Pau & Rainer

Tuesday, 21 January 2014

There's Pliny of Room at the Bottom1 - Introducing Recogito Pt. 2

In our last post, we introduced Recogito, a tool we built to verify and correct the results of our automatic text-to-map conversion process. Last time, we focused primarily on Recogito's map-based interface, in which we clean up the results of geo-resolution - the step that automatically assigns gazetteer IDs to toponyms.

In this post, we want to talk about Recogito's second view: the text annotation interface. And as usual, we'd like to seize the opportunity to introduce our next Early Geospatial Document along with it: the Natural History by Pliny the Elder.

Naturalis Historia

The Natural History (Naturalis Historia) by Pliny the Elder is an encyclopedia published ca. AD 77–79. This amazing work covers the Roman civilization's knowledge about astronomy, geography, zoology, botany, medicine and mineralogy. In total, it consists of 37 books, and builds on more than 400 sources from the Latin and Greek worlds. Books 3, 4, 5 and 6 focus on geography. In these books, Pliny describes the known world from the Atlantic to the Near East, and from the North of Europe to Africa. He records all the peoples and cities known, with all the geographic features prominent in each territory, such as rivers, mountains, gulfs, or islands.

Fig. 1. Pliny Books 3 and 4 - work in progress in Recogito.

Recogito Text Annotation UI

The Natural History is the largest text we have addressed so far. Fig. 1 shows our current progress with it. (In numbers, we're 98% through the toponyms of Book 3, and have just started Book 4 - now at 5.5%.) It also differs from our previous itinerary texts in that it is prose, and not structured into an almost 'tabular' format. Time to enter our 'reading view' in Recogito: the text annotation interface.

Fig. 2. Recogito text annotation interface.

The text annotation interface (see Fig. 2) is the place where we inspect and correct the results of geo-parsing - the automatic processing step that identifies toponyms in our source texts. Initially, when we start off with a new document, this view shows us our source text, marked up with grey 'highlights' wherever the geoparser thinks it has identified a toponym. We can then remove false matches, annotate toponyms the geoparser has missed, or modify things the geoparser got wrong (e.g. merge multiple identifications into one, turning separate consecutive identifications such as 'Mount' and 'Atlas' into a single toponym 'Mount Atlas').
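To make the 'merge' correction a bit more concrete, here is a minimal sketch of how two consecutive identifications could be combined into one. It assumes that a toponym is stored as a pair of character offsets into the source text; the `Span` class and `merge_spans` function are hypothetical names for illustration, not Recogito's actual data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """A toponym marked in the text (hypothetical model: character offsets)."""
    start: int  # offset of the first character
    end: int    # offset one past the last character


def merge_spans(spans: List[Span]) -> Span:
    """Return one span covering all the given spans, including the gaps between them."""
    return Span(min(s.start for s in spans), max(s.end for s in spans))


text = "beyond Mount Atlas lies the ocean"

# Two separate identifications, as the geoparser might produce them.
mount_start = text.index("Mount")
atlas_start = text.index("Atlas")
mount = Span(mount_start, mount_start + len("Mount"))
atlas = Span(atlas_start, atlas_start + len("Atlas"))

merged = merge_spans([mount, atlas])
print(text[merged.start:merged.end])  # Mount Atlas
```

Taking the smallest start and the largest end means the whitespace between the two words ends up inside the merged toponym, which is what we want here.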

Going through the source texts is a time-consuming task, and we have made every attempt to make the process as quick and painless as possible. The video above shows how the interface works in practice. Select text in the user interface as you would normally (using click and drag with your mouse, or double click), and confirm the action in the dialog window that pops up. Depending on what you select, the tool will automatically perform the appropriate action: either create a new annotation, delete one, or modify the annotation(s) in the selection. To speed up work even further, there is also an 'advanced' mode that skips the confirmation step.
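The idea of picking the action from the selection can be sketched roughly as follows. The exact rules Recogito applies are not spelled out in this post, so the ones below are an assumption made for illustration: a selection that touches no existing annotation creates one, a selection that exactly matches a single annotation deletes it, and anything else modifies the annotation(s) it overlaps. `choose_action` is a hypothetical helper, and annotations are again modelled as `(start, end)` character offsets.

```python
from typing import List, Tuple

Offsets = Tuple[int, int]  # (start, end) character offsets, end exclusive


def choose_action(selection: Offsets, annotations: List[Offsets]) -> str:
    """Decide what a text selection should do (assumed rules, for illustration only)."""
    sel_start, sel_end = selection
    # Annotations that share at least one character with the selection.
    overlapping = [(s, e) for (s, e) in annotations if s < sel_end and sel_start < e]
    if not overlapping:
        return "create"
    if overlapping == [selection]:
        return "delete"
    return "modify"


existing = [(7, 12), (13, 18)]  # e.g. 'Mount' and 'Atlas'

print(choose_action((20, 25), existing))  # create
print(choose_action((7, 12), existing))   # delete
print(choose_action((7, 18), existing))   # modify
```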

There is one more thing you can see in Fig. 2: annotations are coloured to indicate their 'sign-off status'. We have already talked about this briefly in our previous post. It's a consequence of our practice of manually checking every annotation before releasing it into the wild. Green annotations are those we have verified, and for which we have confirmed a valid gazetteer ID. Yellow are the ones we've verified as valid toponyms, but for which we have not yet been able to identify a suitable gazetteer ID. Grey are the ones we've either not looked at yet, or that are still 'work in progress' because we haven't verified their gazetteer mapping.
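The colour scheme boils down to a small decision rule, which can be sketched like this. The field names (`toponym_verified`, `gazetteer_id`, `mapping_verified`) are assumptions made for the sake of the example rather than Recogito's real attributes.

```python
from enum import Enum
from typing import Optional


class SignOff(Enum):
    GREEN = "verified toponym with confirmed gazetteer ID"
    YELLOW = "verified toponym, no suitable gazetteer ID found yet"
    GREY = "not looked at yet, or gazetteer mapping not verified"


def sign_off_status(toponym_verified: bool,
                    gazetteer_id: Optional[str],
                    mapping_verified: bool) -> SignOff:
    """Map the (hypothetical) state of an annotation to its highlight colour."""
    if toponym_verified and gazetteer_id is not None and mapping_verified:
        return SignOff.GREEN
    if toponym_verified and gazetteer_id is None:
        return SignOff.YELLOW
    return SignOff.GREY


# "gaz:123" is a made-up placeholder ID, not a real gazetteer entry.
print(sign_off_status(True, "gaz:123", True).name)    # GREEN
print(sign_off_status(True, None, False).name)        # YELLOW
print(sign_off_status(False, "gaz:123", False).name)  # GREY
```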

Combined with the map-based interface you can think of this as creating the two parts of an annotation. The text annotation interface presents us with a reference to a place in a document (the 'target' of the annotation in Open Annotation terminology), while the map interface identifies a place in a gazetteer (the 'body' of the annotation). Although there are two steps to the process, they are fairly quick and easy. Maybe even fun!
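For readers who like to see the two halves side by side, here is a deliberately simplified sketch of such an annotation as a plain Python dictionary. It is not the Open Annotation serialisation itself, and the URIs and the `make_annotation` helper are made-up placeholders: the point is only that the text interface supplies the 'target' and the map interface supplies the 'body'.

```python
import json
from typing import Any, Dict


def make_annotation(document_uri: str, start: int, end: int,
                    gazetteer_uri: str) -> Dict[str, Any]:
    """Combine a text reference (target) and a gazetteer place (body)."""
    return {
        # From the text annotation interface: where in the document the place is mentioned.
        "target": {"source": document_uri, "start": start, "end": end},
        # From the map interface: which gazetteer entry the mention refers to.
        "body": gazetteer_uri,
    }


annotation = make_annotation(
    "http://example.org/documents/pliny-book-3",  # placeholder document URI
    7, 18,                                        # placeholder character offsets
    "http://example.org/gazetteer/places/123",    # placeholder gazetteer URI
)

print(json.dumps(annotation, indent=2))
```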

1 "There's Plenty of Room at the Bottom" was a lecture given by physicist Richard Feynman in 1959. The talk is considered to be a seminal event in the history of nanotechnology, as it inspired the conceptual beginnings of the field decades later.