The Danger with Data: Common Pitfalls in Archaeological Data Collation

In the second blog instalment for my directed studies topic in maritime archaeology, I have decided to focus on data collation and the issues that I have personally experienced or could foresee for those conducting similar projects. The end result of my directed studies will be a professional field report from two field seasons of maritime archaeological work on Phillip Island, Victoria, that will be used by my industry partner, Heritage Victoria. The idea for this blog topic arose out of my many frustrations from trying to collate and make ‘usable’ the raw data collected over the course of two field schools over a three year period, 2012 to 2014. I was only involved in the 2014 field work as a supervisor, which made that aspect of the data collation a little easier; the hard part is trying to make sense of the work done by others during a field school I was not a part of. I imagine that this is a common problem amongst archaeologists: trying to piece together the relevant data collected by others, sometimes years prior. This exercise has not only made it possible for me to identify the common pitfalls for those who find themselves wading through the collective data of other researchers, but has also shown me how best to go about collecting and organising data in general in case it is to be used in the future by others.

The Round-Up

Wouldn’t it be great to show up to the archaeology lab and have every single bit of data from your field project sitting on a glowing silver platter as cherubs sprinkle flower petals down upon you while a harp plays in the background…? It is fun to dream, but here in ‘reality’ you have to find this information on your own. You may have a great professor who gives you a large bag of what at first appears to be waste bin material, but which later turns out to be the data recorded from the last field school. I was lucky to have most of the data given to me, although not all of it. I could foresee issues with having to track down people to retrieve field notes or photographs or pieces of information that ‘you know they know’ but that they never recorded. It would be wise to think about what data you have at the beginning, what might be missing, and then revisit it all just before completion to ensure all of the important ‘stuff’ is actually accounted for.

It takes longer than you think:

When I was first given the task to digitise all of the field reporting, drawings, maps, forms and logbooks from the two field schools, I was not under the impression that it was a large task. I thought that I would be able to, in my own words, “knock this out in an afternoon”. I look back now and laugh at my naivety. It took four full days to digitise all of the logbooks and associated paperwork. That was just the physical act of scanning the pieces (my apologies for commandeering the scanner for four days to anyone I inconvenienced). I then had to organise the PDFs into easy-to-use files on a USB drive for archival purposes, creating a small database of keywords/file names. Once that was complete, I then did it all over again due to a series of unfortunate events (see next section). The point to be made here is to be realistic about how long the ‘housekeeping’ portion of data collation will actually take.
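For anyone comfortable with a little scripting, the ‘small database of keywords/file names’ can be generated rather than typed out by hand. The following is only a minimal Python sketch, not the method used for this project. It assumes the scans are PDFs named with underscores or hyphens between words (for example, a hypothetical 2014_logbook_dive-01.pdf), and the function name build_index and the column names are made up for illustration.

```python
import csv
from pathlib import Path


def build_index(archive_dir, index_path):
    """Write a CSV listing every PDF under archive_dir, with keywords from its file name."""
    archive_dir = Path(archive_dir)
    rows = []
    for pdf in sorted(archive_dir.rglob("*.pdf")):
        # Assumption: file names separate words with underscores or hyphens.
        keywords = pdf.stem.replace("-", "_").lower().split("_")
        rows.append({
            "file": pdf.relative_to(archive_dir).as_posix(),
            "folder": pdf.parent.name,
            "keywords": " ".join(k for k in keywords if k),
        })
    with open(index_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["file", "folder", "keywords"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling build_index("field_school_scans", "index.csv") (both names hypothetical) would produce a spreadsheet-friendly list that can be searched by keyword. It only helps if the files were named sensibly in the first place, which is one more reason to budget time for the ‘housekeeping’.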

Back it up, then back it up again:

Probably the largest and most dramatic problem arose when I fell asleep while typing in bed. I then inadvertently kicked my laptop off the bed and destroyed the hard drive, losing not only the digitised copies of logbooks, drawings and maps that took me four solid days to scan, but also everything I had written in the report (along with my thesis work as well…but that’s a whole different can of worms). Needless to say, a well thought out backup system could have saved me a lot of time (and stress!). Then, to make matters worse, the USB drive holding the only copies of some of my written work was stolen while I was at the university library, using their computers while mine was being fixed. The old adage “those who do not learn from their mistakes are doomed to repeat them” is completely at play in this scenario. I now have two USB drives, an internet based file storage system (Dropbox), and an external hard drive that is ONLY for backing up my system. It may also help to print off some hard copies of any major written sections just in case; a hard copy of my thesis work was a sight for sore eyes during what I call the “dark period” of this current semester.

Figure 1. The ‘Back Up Arsenal’. Photo by Chelsea Colwell-Pasch
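Copying the same folder to several places by hand is easy to forget, so the routine can be scripted. What follows is a rough Python sketch under stated assumptions, not a description of the setup pictured above: it assumes each backup location (a USB drive, an external hard drive, a locally synced Dropbox folder) appears as an ordinary folder on the computer, and the function names sha256_of and back_up are invented for this example. After each copy it compares checksums so that a corrupted copy is reported rather than silently trusted.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source_dir, destinations):
    """Copy every file under source_dir to each destination and verify the copies.

    Returns a list of (destination, relative_path) pairs whose copy did not match.
    Assumption: none of the destinations sits inside source_dir.
    """
    source_dir = Path(source_dir)
    failures = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        expected = sha256_of(src)
        for dest_root in destinations:
            target = Path(dest_root) / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)  # copies contents and timestamps
            if sha256_of(target) != expected:
                failures.append((str(dest_root), rel.as_posix()))
    return failures
```

An empty list returned from back_up means every copy matched its original. Note that this sketch overwrites older copies and never deletes anything from the destinations, so it is a simple mirror rather than a versioned backup.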

Handwriting and Size Formats…Sigh

Another frustrating aspect of data collation is the little things that may not seem like a big deal to those who don’t have to deal with them, such as deciphering handwriting and scribbled drawings, and trying to scan the ‘un-scannable’! When it comes to handwriting, unfortunately, unless you know the person and can track them down to translate, it usually comes down to a best guess. Legibility is an absolute must when it comes to recording, and if you take nothing else from this blog, please remember to write as clearly and concisely as possible. The data is useless unless you can use it, and researchers like me will thank you for it. The same goes for drawings: try to be clear and please, please, please put in a scale! As for size formats, this one is tricky. The only real issue is in trying to digitise the ‘wonky’ sized medium. It is impossible to get a great scan of a site plan that was drawn on an arbitrarily sized piece of hand cut mylar (plastic paper for underwater recording) that is too large for the scanner’s copying surface. Scanners are machines, and machines can’t cope with anything they are not preprogrammed for.

Make it your own

In this final section, I offer not an issue but a tip. When you collate the data, do it in a way that makes sense. Organise the files on your computer (and on your USB, external hard drive, and Dropbox) and compile data into easy to use and understand spreadsheets, graphs and diagrams. Look at every piece of data and think about how you would USE it and then present it in the best possible manner to fulfil its use. But beware! Too often archaeologists present data as end results, forgoing the analysis and interpretation processes. Data are only the means by which you make your interpretations and conclusions; they are not an end in themselves.


3 responses to “The Danger with Data: Common Pitfalls in Archaeological Data Collation”

  1. Robin Coles

    I totally agree Chelsea

  2. Susan Arthure

    Great blog post Chelsea, particularly your tip about rounding up all the data before everybody disappears. And your backup experience is a timely lesson to us all! Susan

  3. Andrew Wilkinson

    Great post Chelsea. Definitely learn to use spreadsheets and databases properly, and use them as you are collecting the data. Makes life so much easier in the long term.