PASIG 2016 talk
This serves as a placeholder for the talk I will be giving at PASIG later this week. The talk will be posted after I deliver it, since I have a tendency to revise all the way up to walking to the lectern.
I’ve been meaning to put the text of this up since I spoke at Personal Digital Archiving back in the spring on the ethics of web archiving. A 5-minute talk was probably not the best vehicle to knit together both the ethics of web archiving and the huge-o topic of the right to be forgotten, but I gave it my best shot. You can see me blast through it somewhere in one of the Friday lightning talk videos. There is also an annotated bibliography I released shortly before the talk.
Side note: The work Ed Summers and Bergis Jules have done on web archiving around Ferguson and the terrorism in Charleston has made me massively reassess my thinking on where the balance lies when it comes to work in the public interest and elevating voices too often missing from the archival landscape. With that in mind, I still think that as a profession we need an ethical framework for determining what’s okay for us to accession into our repositories when we are working with materials for which we have no donor agreement.
The text below is what I prepared for my talk in New York, though I tend to ad-lib quite a bit once I’m at the lectern.
###
People often say, “Don’t post anything online you wouldn’t want to be seen or shared by the whole world,” which I suspect is an easy thing to say if your personal online content has never been used in ways you didn’t anticipate. Last year, news spread that a team of researchers at the University of Southern California was studying the phenomenon of black Twitter. Many users’ initial reactions to the research study likened it to forms of historical surveillance activity against black Americans, and questioned the research ethics, since there was no informed consent from those being studied {Kim, 2014a; Kim, 2014b; Newitz, 2014}.
The idea that if self-published personal content is publicly findable on the web, it’s fair game for journalistic, academic, or archival re-use is so common that few question it or consider the downstream effects. This deeply concerns me, particularly with large-scale archiving of personal content when we have not worked to secure permission from individual users. I realize this provocative position flies in the face of how archivists must race to save the ephemeral digital record before it’s lost.
I am not advocating to stop archiving others’ self-published personal content. Indeed, as many have pointed out, harvesting and archiving online content before it disappears is critical to preserving the voices that are often missing from traditional archival custody. However, I am asking us archivists to consider how we balance openness and privacy from the point of accession to access. For example, consider if we archive and publish content around a political disruption that has long-term ramifications. How should we respond if authorities subpoena the archives? Would our response be different if we learn that the subpoena came after activists removed their content offline, fearing for their safety? What are our responsibilities to either party?
While US courts have generally assumed that if you put something on the web, you’ve surrendered your right to privacy, users’ online privacy expectations are dramatically different from the courts’ usual treatment of the public/private dichotomy.
Few users who engage in social media or other forms of online self-publishing view their output as fitting a definition of public consistent with the courts’ interpretation. Researchers have shown that users have different degrees of privacy expectations depending on their intended audience for disclosure {McNealy, 2011-2012}, and that they often rely on obscurity to stand in for privacy {Hartzog and Stutzman, 2013}.
If you sit outside and have a conversation with a friend about your horrible sister, it’s understood that the context, not the setting, is what makes it a private conversation. Claiming that anything posted publicly online is fair game misunderstands that our expectations of privacy do not simply evaporate in an online environment.
Many research communities are beginning to formulate ethical best practices when working with self-published online content. It’s time for archivists to work through ensuring privacy in an environment where large-scale archiving of online user-content does not include a donor agreement, and where archivists don’t always seek user consent. We need to have this conversation now, because if the right to be forgotten gains traction, the legal landscape may force our hand before archival ethics have caught up.
The right to be forgotten is an idea gaining ground, intended to give users the right to request removal of their content in many situations. It was recently tested in court, when a Spanish citizen unsuccessfully attempted to get a newspaper to remove digitized back issues documenting the earlier forced sale of his home. The man felt that the Google search results linking to the digitized back issues damaged his reputation, since he had long since cleared his debts. The European Court of Justice ultimately ruled against Google, requiring it to remove search result links to the Spanish newspaper story.
Even in the current EU proposal, a recent and significant revision watered down the right to be forgotten to a right to erasure, and explicitly allows archives to process personal data in the public interest, with a recommendation for further work on issues of archival confidentiality {European Parliament, 2014}.
Archivists are very familiar with how records have been abused throughout history to hurt people who were not cognizant of how their public statements could be captured and used out of context against them. It is incumbent upon us to ensure that however we archive other people’s online public lives, we do it in a way that protects their right to privacy.
European Parliament. “European Parliament Legislative Resolution of 12 March 2014 on the Proposal for a Regulation of the European Parliament and of the Council on the Protection of Individuals with Regard to the Processing of Personal Data and on the Free Movement of Such Data (General Data Protection Regulation),” October 14, 2014. http://www.europarl.europa.eu/sides/getDoc.do?type=TA&reference=P7-TA-2014-0212&language=EN.
Hartzog, Woodrow, and Frederic Stutzman. “The Case for Online Obscurity.” California Law Review 101 (2013): 1.
Kim, Dorothy. “Social Media and Academic Surveillance: The Ethics of Digital Bodies.” Model View Culture, October 7, 2014. https://modelviewculture.com/pieces/social-media-and-academic-surveillance-the-ethics-of-digital-bodies.
Kim, Dorothy. “The Rules of Twitter.” Hybrid Pedagogy, December 4, 2014. http://www.hybridpedagogy.com/journal/rules-twitter/.
McNealy, Jasmine. “The Privacy Implications of Digital Preservation: Social Media Archives and the Social Networks Theory of Privacy.” Elon Law Review 3 (2011–2012): 133.
Newitz, Annalee. “What Happens When Scientists Study ‘Black Twitter’?” io9. Accessed April 21, 2015. http://io9.com/what-happens-when-scientists-study-black-twitter-1630540515.
In early August, I attended the Humanities Intensive Learning and Teaching institute (HILT) at the University of Maryland. I attended the Digital Forensics course, and kept a daily trip report during my week. It took me a while to clean it up, and while there’s still some informal language (and possible tense-switches!), I didn’t want to procrastinate any longer on getting the report up.
DAY 1:
Our instructors: Kam Woods (UNC-Chapel Hill) and Porter Olsen (UMD PhD student, Community Lead for BitCurator).
Our group: seven archivists/librarians/students
The first part of our workshop was based loosely on the 2-day SAA workshop on Digital Forensics. We covered lecture content for the first few days, then moved on to hands-on digital forensics work later in the week with the disks we brought.
We started by reviewing the general concepts behind digital forensics and how they apply to archival workflows. Digital forensics originated in the law enforcement community as a way to obtain legally admissible digital evidence. The methods have been adopted by the digital archives community in order to establish the authenticity of a record and to demonstrate what interactions took place with a record over time. Digital forensics techniques are used to capture a larger package of files (e.g., a disk image) than what can be seen with “the naked eye” through the GUI. Capturing a disk image ensures that digital archivists have access to metadata, file structures, and hidden files critical to preserving the archival qualities of electronic records.
The second half of Day 1 dug into the many technical challenges associated with digital forensics. This was primarily about understanding how data is stored on disks. This took us on a whirlwind tour of thinking about different levels of digital representation (data as part of a group of digital objects, data as a single digital object, data in a GUI, data through the file system, data’s physical manifestation). As our instructors pointed out, even electronic records have some form of physical representation because of the method in which the data is recorded (e.g., pits on a CD).
Kam then gave us a long lesson on counting in binary. If you’ve only ever counted in base 10 (where each “place” in a number represents a power of 10: 152 is 1 hundred, 5 tens, and 2 ones), learning to count in binary feels like a real brain teaser at first. We also touched on compression: data can be compressed by identifying the duplicative parts of a bitstream and substituting shorter representations for them.
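To make the binary-and-compression idea concrete, here is a quick Python sketch of my own (not something from the course): it builds the base-2 representation of a number by repeated division, and does a crude run-length substitution on a string of bits.

```python
# Toy illustration of counting in binary and of spotting duplication in a bitstream.
# This is my own sketch, not course material.

def to_binary(n: int) -> str:
    """Build the base-2 representation of n by repeatedly dividing by 2."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))   # the remainder is the next binary digit
        n //= 2
    return "".join(reversed(bits))

def run_length_encode(bitstream: str) -> str:
    """Replace runs of repeated bits with (bit, count) pairs -- a crude stand-in
    for how compression substitutes shorter representations for duplication."""
    out = []
    i = 0
    while i < len(bitstream):
        j = i
        while j < len(bitstream) and bitstream[j] == bitstream[i]:
            j += 1
        out.append(f"{bitstream[i]}x{j - i}")
        i = j
    return " ".join(out)

print(to_binary(152))                         # 10011000 -- one 128, one 16, one 8
print(run_length_encode("0000000011110000"))  # 0x8 1x4 0x4
```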
At the end of the day, we had a lecture by Tara McPherson, who spoke about some of the digital humanities projects on her radar. It was during this talk that it really struck me how much the digital humanities and librarian/archivist communities need to talk to each other. McPherson talked about her regret that they did not talk to librarians sooner when starting Vectors, and admonished the audience to work with their librarians more. This caused some discussion on Twitter; I guess I am still a bit shocked that someone starting a big project involving questions of open access and data management would not think to consult librarians. This says a lot about the gap between what we know we can provide and others’ willingness to use our services (or even knowledge that they exist!).
DAY 2:
On Day 2 of our workshop we finished off the lecture content. For a long time I’ve understood that taking disk images is a best practice for working with digital archives, but I’m not sure I could really articulate why until today. A disk image is an exact replica of the entire bitstream of a disk. Not having a computer science background, I never really thought about the ways in which data is stored on disks, much less what happens when you delete data.
I understand now that when data is deleted, it typically isn’t wiped clean automatically. When you “delete” something, that space on the disk is reallocated to be written over in the future, and the data sits there until it’s written over. This is probably CS101 for many, but was a big revelation for me. In addition to diving into the deep end on file system architecture and file allocation, we talked about the nature of files themselves; for example, I did not know that a file’s name is not inherently part of the file itself, but essentially a directory entry. This crash course in computer science was something that I don’t think many archivists and librarians are exposed to on a regular basis, as was clear from our group’s discussion on the way to lunch, when we kept trying to remember how we would explain slack space to someone else.
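That lunch conversation stuck with me, so here is a little toy model I put together afterward. It is not a real file system, just a conceptual sketch of why a “deleted” file’s bytes linger until the space is actually reused.

```python
# Toy model of why "deleted" files linger: deletion removes the directory entry
# and frees the blocks for reuse, but the bytes stay on "disk" until overwritten.
# This is a conceptual sketch, not a real file system.

class ToyFileSystem:
    def __init__(self, num_blocks: int = 8, block_size: int = 16):
        self.blocks = [b"\x00" * block_size] * num_blocks  # raw "disk" contents
        self.free = set(range(num_blocks))                  # blocks available for reuse
        self.directory = {}                                 # filename -> list of block numbers

    def write(self, name: str, data: bytes):
        block = self.free.pop()          # grab a free block (single-block files only)
        self.blocks[block] = data
        self.directory[name] = [block]

    def delete(self, name: str):
        for block in self.directory.pop(name):
            self.free.add(block)         # block is reusable, but its bytes are untouched

fs = ToyFileSystem()
fs.write("memo.txt", b"secret memo")
fs.delete("memo.txt")

print("memo.txt" in fs.directory)                       # False: the file looks gone
print(any(b"secret" in blk for blk in fs.blocks))        # True: the bytes are still there
```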
When you begin learning about what’s really on a disk, whether it’s a USB thumb drive or a laptop’s hard drive, you quickly learn that what is seen through the GUI is only a small portion of what’s actually there. This is why digital archivists have widely embraced making disk images: much of the data needed to prove the authenticity of files over time, and to support the metadata and preservation needs of the material, is simply not visible from the GUI. A disk image captures the information that normally sits under the hood.
A point that our instructors made over and over was that while taking a complete disk image might not be necessary for all projects all the time, it is the best way to capture all the essential information of a potentially archival nature one might want, now or in the future, from digital materials. In addition, using digital forensics methods allows interaction with material without inadvertently writing over it. We heard many examples of how access/modification dates, file names, and file content can be significantly altered when directly interacting with materials. When using digital forensics methods, one would use some kind of write-blocker: a physical device or piece of software that allows the archivist to read the source material but not write to it.
The best analogy our instructor shared was this: simply copying and pasting files off a drive without a disk image would be like accepting a box of photocopies of manuscripts instead of getting the original documents. This means that when we do disk imaging, we are not making “copies” per se; we are getting the originals, not some kind of partial material. We were encouraged to image materials first and do analysis later. This really flips the archival process of appraisal and accessioning on its head, since appraisal is usually done prior to accessioning in traditional archival workflows. In a digital forensics model, the archivist makes an early appraisal decision about which disks are worth imaging, images (i.e., accessions) the data, and then does additional appraisal post-accession to decide how to handle the various files comprising the disk image.
We ended the day by taking an early journey with BitCurator, learning about forensic disk image file formats (AFF is being phased out; E01 is the most commonly used, and while it remains proprietary, it has been reverse-engineered), and making some disk images and comparing their checksums.
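For anyone wondering what “comparing checksums” actually looks like, here is a minimal Python sketch (mine, with made-up file names, not the course exercise) that hashes two image files in chunks and reports whether the fixity values match.

```python
# Minimal fixity check: hash two files and compare the digests.
# The paths below are hypothetical placeholders, not files from the course.
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 digest of a file, reading in chunks so large
    disk images don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

original = sha256_of_file("floppy_original.img")
copy = sha256_of_file("floppy_copy.img")

print(original)
print(copy)
print("Checksums match" if original == copy else "Checksums DIFFER -- investigate!")
```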
HILT Ignite was the final event of the day. It was similar to a set of lightning talks. Here were some of the presentations:
French pamphlets translations and digitization at UMD College Park, funded by http://mith.umd.edu/research/project/digital-humanities-incubator/
@keenera of Northwestern on Digital Apparatus for Renaissance texts
Nabil Kashyap of Swarthmore on translation of Russian texts and creating middleware to visualize translation activities
Arden Kirkland (@ardeninred) of Vassar on various projects — she started with Vassar costume collection — she’s building CostumeCore for a metadata profile for historic clothing http://www.ardenkirkland.com/costumecore/
George Williams @georgeonline on Accessibility in digital environments — will be having a series of workshops, the next two are in Nebraska and Atlanta http://www.accessiblefuture.org/
Jim McGrath on the Our Marathon Boston marathon “crowdsourced archive” — will eventually become part of Northeastern’s SpecColl. Uses Neatline, add-on tool for Omeka http://marathon.neu.edu/bca This looks super duper awesome too — http://www.northeastern.edu/nulab/
Chip Oscarson from BYU — ecological networks and linkages — “topic modeling” — this seems to be a form of text cluster analysis (to what degree are words and terminology showing up in text?)
Priscilla Pena Ovalle of University of Oregon — idea for a pedagogical tool on hair — how does the appearance of hair influence a person’s agency in media depictions? SO AWESOME! This was in the idea stage; she is considering a phone or website app
Raffaele Viglianti from UMD “Performing the Digital Edition” — performing a digital edition of a music score — scores that can listen to you to figure out when to turn the page automatically. Music Encoding Initiative — like TEI. Plotting breath marks over a digital score.
DAY 3:
Today we worked with bulk_extractor to get a glimpse of what was going on inside our disk images. By doing this, we were able to see what sorts of URLs, email addresses, PII, or other sensitive information might be in our files. On a large disk image, these things turn up with surprising frequency. Knowing where on the disk this information resides allows archivists to make redaction or embargo decisions regarding content or files that might otherwise be made public.
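bulk_extractor does far more than this, but as a rough illustration of the kind of pattern scanning it automates, here is a toy Python sketch of my own (with made-up sample text, and nothing like the real tool’s stream-level scanning) that pulls email addresses and URLs out of text with regular expressions.

```python
# Toy illustration of the kind of pattern scanning bulk_extractor automates:
# pull email addresses and URLs out of raw text. Real forensic tools scan the
# entire bitstream (including deleted and slack space), not just a string.
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
URL_RE = re.compile(r"https?://\S+")

def scan_for_features(text: str) -> dict:
    """Return lists of email addresses and URLs found in the text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "urls": URL_RE.findall(text),
    }

sample = "Contact jane.doe@example.org or see http://example.org/report for details."
print(scan_for_features(sample))
# {'emails': ['jane.doe@example.org'], 'urls': ['http://example.org/report']}
```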
This probably says a lot about my priorities while traveling away from home, but one of the highlights of the week was eating at the Maryland Food Cooperative — aka the cooperatively owned sandwich shop in the UMD student center. It was everything I hoped it would be — funky and delicious. Yum. Definitely check them out if you’re ever in College Park.
During Wednesday afternoon, we attended a number of field trips to check out local cultural heritage organizations. I went to the Holocaust Museum — during Q&A with the curators, I was able to ask a question that has often been on my mind at various points during my career, which is how cultural heritage professionals deal with intensely disturbing and traumatic materials encountered in their work. At my last institution, I often handled plantation records that had evidence of profoundly violent things done to enslaved people and their families, as well as vivid descriptions of scenes during the Civil War. More than once while working with these materials I had horrifying nightmares. I’ve always wondered how others handle these issues, and I’m very grateful that the USHMM staff shared their thoughts about this with me.
DAY 4:
This was the day we really got into the weeds and put all the legacy media we brought with us to work. We started off by thinking about a very common question digital archivists encounter: if we get a big stack of floppy disks, where do we start? Floppy disks come in a variety of formats, sizes, encodings, operating systems, etc. There is no single source that can tell you exactly what you have in hand, so it’s important to look for whatever clues are available on the disk itself. Wikipedia has an extensive list of disk formats, which is critical information when making disk images. Some of the forensics tools require the user to tell them how to read the disk, meaning you must have the disk information, including capacity, number of tracks, density, and so on.
Much of this day we spent in the UMD MITH lab, in the basement of Hornbake Library. MITH is a great space, and our group set up at several computers containing all manner of drives. The middle of this article on building a forensics workstation has a good picture of the set-up. I brought a massive bag of legacy media, and a partner and I tried imaging the following items: a 5.25” floppy with a finding aid, several 3.5” floppies, an optical disc (a CD-R; these are deceptively easy to image, though I didn’t appreciate how susceptible they are to damage until I tried one and thousands of sectors were identified as damaged, even though the disc was still readable), and a USB drive. Where necessary, we used write blockers to prevent accidentally writing over the data. It is still pretty easy to image most 3.5” floppy disks (new and cheap 3.5” floppy-to-USB drives are available online), but 5.25” drives are no longer made. This means the archivist must buy a used one, and many libraries use a device called the FC5025, a controller that connects a 5.25” floppy drive to USB. This will likely reveal my age, but I had never handled an 8” floppy until today, though apparently they’re still quite popular for US nuclear capability. Unfortunately, finding a way to image these has proven a significant challenge for the profession.
The Special Collections at the University of Maryland has a FRED machine that we also visited, though we did not use it. FRED machines are widely used for law enforcement purposes, though an increasing number of cultural heritage institutions are buying them. Although a FRED does many cool things, it still requires the purchase of external floppy drives, and the machines are expensive. Many institutions choose to build their forensics workstations iteratively, starting with a DIY workstation and adding components gradually, with an eventual purchase of a FRED if the situation warrants it.
After our class, I spoke with Trevor Munoz from MITH about their efforts at beginning a Digital Humanities incubator that targeted librarians for its first rounds of programming. A major topic throughout the conference has been how digital humanities projects factor into reappointment, promotion, and tenure (RPT) criteria. I believe a related concern for librarians is how they acquire new skills, and receive the required support, to be successful in these new areas of digital work.
DAY 5:
On our last day, we reviewed a few of the additional tools in BitCurator. One of the pretty cool ones that can be used from the command line is sdhash, which generates similarity digests (“fuzzy hashes”) for files and compares them to estimate how similar the files’ contents are.
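I won’t pretend to reproduce sdhash’s actual algorithm here, but the underlying idea (hash small pieces of each file and measure how much the sets of piece-hashes overlap) can be sketched in a few lines of Python. This is strictly a toy illustration of my own; sdhash’s real similarity digests are far more sophisticated.

```python
# Toy illustration of content similarity via chunk hashing. This is NOT how
# sdhash actually works; its similarity digests are much more sophisticated.
import hashlib

def chunk_hashes(data: bytes, chunk_size: int = 64) -> set:
    """Hash fixed-size chunks of the data and return the set of digests."""
    return {
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }

def similarity(a: bytes, b: bytes) -> float:
    """Jaccard similarity between the two files' chunk-hash sets (0.0 to 1.0)."""
    ha, hb = chunk_hashes(a), chunk_hashes(b)
    if not ha and not hb:
        return 1.0
    return len(ha & hb) / len(ha | hb)

doc_v1 = b"Dear colleagues, attached is the draft report for review. " * 40
doc_v2 = doc_v1 + b"P.S. Please send comments by Friday."

print(f"similarity: {similarity(doc_v1, doc_v2):.2f}")  # a high score for near-duplicate files
```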
One of the highlights of this day was our discussion with Matt Kirschenbaum. Kirschenbaum is a UMD faculty member, Associate Director of MITH, and co-PI on BitCurator. Kirschenbaum has done significant work with digital forensics in born-digital archives. We discussed the nature of this changing set of skills, and how access to digital archives may change over time. I really respect the connections Matt has actively cultivated among archivists, and it was a great way to cap off our coursework.
At the final event of the week, all the groups were asked to prepare a brief (5 minute!) show and tell.
Concluding thoughts:
I went into this course not feeling very confident about the hands-on work associated with digital forensics. As a result of the several days we spent together, I feel much more comfortable returning to my institution and doing some early groundwork on recovering material. Of course, this means that I will be assembling a proposal to build a DIY digital forensics workstation. Even though this course exposed me to a lot, it’s clear I still have plenty to learn in this area. We covered so much of the “first steps” work of accessioning materials, but there are still a lot of open questions about the access side of born-digital archives.
Overall, the entire HILT MITH experience was phenomenal. Many thanks to our instructors, Kam Woods and Porter Olsen, for a superb job. The folks at HILT put together a hell of a week for us, and next year it sounds like the show will be on the road toward my neck of the woods: look for the next round to be held at IUPUI. This was a wonderful event that had many librarians and archivists in attendance, and I hope y’all will consider putting it in your conference and travel budget requests for the upcoming year.
Bibliography
The recommended pre-readings for our course
Readings that were directly or indirectly referenced during the Digital Forensics workshop — some recommended, some I found Googling around on my own
Things I got to through internet rabbit holes that have me thinking about the intersection of DH and archives/libraries
Interesting things I saw on the #hilt2014 Twitter feed
The Professional…
A little bit over a year ago, I celebrated finishing my MLIS by romping around in Berlin for a week over Christmas. At the time, I had no clue that the following year I would be celebrating Christmas back in my hometown of Cincinnati – and not as a guest at my parents’ house, but as a resident with an address of my own. I had been in New Orleans since 2008, working at Tulane University’s Louisiana Research Collection, and not really sure where I’d ultimately end up after grad school.
Last spring, I decided to go on the job market, primarily targeting the Midwest for a variety of personal and professional reasons. Being on the job market was incredibly difficult in many ways, because there is so, so much that is simply out of your hands. I also had my partner in the picture (who, thankfully, found a great job in the area and in his field shortly after I accepted my current position). A few years ago at Tulane I served on a search committee that received well over 100 applications. I’ve always been grateful for that experience, because it helped me adjust my own expectations when I went on the market. Still, even knowing what’s happening on the other side of the table does not do a lot to temper the anxiety of the unknown.
This autumn, I was very (unbelievably) lucky to be offered a tenure-track faculty position at the University of Cincinnati, my alma mater and where I got my first library experience, as a student worker. My official title is Digital Archivist/Records Manager. I’m continuing the work of the long-running university records management program, and also planning for UC’s electronic records and digital archives workflows. UC is now a Hydra partner, and it is very exciting and humbling to be in the same room with my incredibly smart colleagues during our planning meetings. UC is undergoing some interesting transitions right now, and I’m glad I’m here.
During SAA’s annual meeting, I was approached to run for SAA Nominating Committee. I don’t know if I have hardcore impostor syndrome or what, but I still can’t quite believe I was asked. I’m thrilled to run, and the slate is full of people I deeply respect and whose company I enjoy. SAA has made progress on several fronts, but there’s still work to be done. Hopefully I’ll have a hand in some of it over the next year. More information on the election is over here.
The Personal…
I’ve always been a big enthusiast for fun projects with a semi-distinct finish line (long ago, I took a picture every day for a year). This year my big fun project was starting All the Bonds, a blog in which I’m reviewing every James Bond movie. I’m almost at the end of the Roger Moore era. Hopefully, if I get back on a consistent schedule, I can get to Skyfall by the summer.
This year’s big project is going to be reading more books. Over the last several years, I’ve tended to only read about 8-10 books for personal pleasure each year. This year I really want to be more deliberate about my reading habits, and read at least 25 books. I’m planning to read only female authors this year, with the exception of book club selections and work-related reading (and the Keith Richards autobiography I started before Christmas).
I’m really happy to be back in Cincinnati. So much of this city has changed in the last 5 years, and I think most of the changes are very good – particularly the explosion of craft breweries. We’re living in East Walnut Hills, which has some pretty awesome stuff going on and some of the best views of the Ohio River. Of course I miss New Orleans (especially with today being Twelfth Night, the official start of Carnival season), and will carry the Crescent City in my heart until the end of my days. Luckily the cost of living in Cincinnati is excellent, so we can sock away more money for our NOLA-vacation funds.
Oh, and it’s also great to be back in a city where it’s only 84 days until this.
Down here in New Orleans we’re a few weeks out from the Council of State Archivists/Society of American Archivists joint annual meeting. I’m coordinating the local host blog and looking forward to the conference, where I’ll be sitting for the Certified Archivist exam and chairing Session 301 about the digital divide within the archival profession.
In addition to preparing for the many archivists visiting the Crescent City, I’m winding down my assignment on the Society of American Archivists’ Communications Task Force. We’ve been looking at every aspect of how SAA communicates with members – I’ve learned a lot about the inner workings of SAA, and hope that our final recommendations will be adopted.
Just as SAA winds down, preparations for another conference begin: I’m serving as Vice-Chair of local arrangements for the next Society of Southwest Archivists meeting, to be held in New Orleans in May 2014.
This past spring I participated in a BBC4 radio documentary on the many attempts to adapt A Confederacy of Dunces into a movie (the Louisiana Research Collection, where I currently work, has the papers of author John Kennedy Toole). I had a fantastic time talking to the producers and sharing some of the more colorful letters from the collection.
At the end of 2012, I completed my MLIS through San José State University’s School of Library and Information Science.
At the beginning of 2013, I attended THATCamp AHA. See my guest post on ArchivesNext.
I’m currently serving on the SAA local host committee for the 2013 meeting in New Orleans. I’m coordinating the local host blog.