The AEOLIAN project, or Artificial Intelligence for Cultural Organizations, is a collaboration among digital humanities researchers in the United States and the United Kingdom. In the US, the effort is being led at the University of Illinois by the HathiTrust Research Center (HTRC, https://www.hathitrust.org/htrc) jointly with the Luddy School of Informatics at Indiana University Bloomington. On the UK side, the AEOLIAN project is being led by Loughborough University, with major partners at Durham University and the University of Glasgow. The project has received funding from the National Endowment for the Humanities (NEH) in the US and from the Arts and Humanities Research Council in the UK.
Part of the grant’s output will be a series of workshops on important AI topics. I was fortunate to join the first workshop in early July, which was excellent; be sure to check the project page (https://www.aeolian-network.net/) for more upcoming workshops this fall. I recently interviewed AEOLIAN project PI Mr. Glen Worthey (Associate Director for Research Support Services, HathiTrust Research Center, University of Illinois, Urbana-Champaign) via email.
Your grant focuses on how artificial intelligence in cultural organizations can make born-digital and digitized collections more accessible. Can you talk about how this grant will address this work?
Although we generally think of digital collections as automatically more accessible than print ones — accessible remotely and around the clock, easily searchable, etc. — they also have some inherent traits that can effectively make them less accessible than print.
Perhaps the most obvious of these is size: while there are of course many human-scale uses and research questions that can be supported by digital collections, the temptation to make new uses, to ask new research questions, of large-scale collections is strong, and (at least for many researchers) a positive one. Such uses require reading at a scale far beyond human capacities of time and mortality, so it’s only natural that we’d want to enlist tools that extend those capacities, and it happens to be the case that the set of tools and approaches
There are many, many ways and means in which artificial intelligence and access are related, some experimental, others well established, and some merely speculative. Our contention is that all AI methods and uses, particularly in the cultural heritage sphere, demand special thought, focused community discussion, shared best practices, critique, and caution. Rather than focusing purely on the development of AI methods themselves, the main purpose of our grant is to create a sort of social infrastructure to ensure that we get these things right: a network of people working with AI in the cultural heritage sector who can think through, discuss, critique, and support one another in their work.
The activities funded as part of our AEOLIAN grant are all intended precisely to establish this network of people engaged in the use of “Artificial Intelligence for Cultural Organizations” (which is what “AEOLIAN” is meant to stand for). We will establish this network by gathering in workshops to share our AI-enabled work and our critiques of it, by publishing case studies of that work, and by drawing some appropriate collective, community conclusions about it.
Can you talk a bit about the particular privacy challenges in AI technologies?
The most obvious challenges are amply described (if never sufficiently addressed) in the popular press: advanced facial recognition with obvious surveillance potential; mining of personal data for unwelcome commercial or nefarious purposes; unsupervised algorithms used uncritically in ways that can negatively impact human lives; and so forth. In the cultural heritage sector, many of these same concerns abide (though often in different or more hidden modalities), and we should of course be mindful of them.
At the same time, our project is based in part on the premise that AI technologies and AI-enhanced research methods, when used thoughtfully, can actually help mitigate privacy challenges! One important application in this vein is the machine-enabled study of born-digital archival materials: for example, a lifetime’s worth of emails that are part of an individual’s archival collection of interest to researchers. There is no doubt that collections like these are already essential for the study of cultural figures of our times, and that they will grow increasingly prevalent. And yet there’s also no doubt that most email archives, whether they’re knowingly included as part of an archival acquisition or unintentionally and indiscriminately swept up with an archival subject’s computer, box of diskettes, etc., can’t help but contain multitudes of sensitive, private bits of data, such as personally identifying numbers of various kinds, correspondence intended as private, and so forth. When human readers are given access to such things, the result can be an unintended violation of privacy. But if a machine can be trained to identify and flag material like this, it can be used to filter that material out and hide it from the human gaze.
It’s not so much that the technology is “neutral” — but rather that we should develop and use our technologies critically and thoughtfully and to ethical ends, while retaining a healthy skepticism about their potential abuses, and loudly protesting their actual ones.
What would your advice be to practitioners looking to balance AI technologies with equitable access?
It seems to me that putting AI technologies and equitable access in the same balance is a bit misleading, an apples-to-oranges sort of comparison: AI is (or can be) a means to some end, whereas access is one such end. But I think I know what you’re getting at! Perhaps a better way to frame it is to say that our institutions, as they experiment with and implement AI technologies, should seek to balance their more esoteric uses (which are fascinating, and can provide lots of material for research) with their more equity-focused uses, such as reducing cataloging or archival processing backlogs through machine-learned labeling.
By the way, our first workshop program abounded in examples of this sort of “balance,” as well as in thoughtful critique of where attempts to find this balance might fail. Among many memorable presentations, one of the most vivid examples of this idea was that of John Stack, Digital Director of the Science Museum Group in the UK. John presented a really charming (and rather addictive) game-like interface called “What the Machine Saw” that juxtaposes AI-assigned and curator-assigned labels for photographs of items in Science Museum collections. On the one hand, it may be useful to have an algorithm tag a backlog of uncatalogued images, just to provide some sort of “better than nothing” search access. But on the other hand, the “machine” is hilariously wrong for a shockingly high proportion of these photographs! This was a vivid reminder to all that, no matter how interesting or useful artificial intelligence can be, the most salient thing about it is that it is most certainly not human intelligence.
How will the five case studies (Wellcome Trust, Frick Collection, The National Library of Wales and History of Parliament, HathiTrust, The National Library of Scotland) be used in this project?
The specific content and subject of these case studies (that is, whatever is more specific than “AI and cultural heritage”) will be largely up to the discretion of their authors, all partners in our growing network. Our hope is that, as the authors draft them, and as the community discusses and digests them, we’ll be able to identify common themes to include in the project’s final white paper. These may include, as I mentioned above, values and cautions commonly held by members of the cultural heritage community; best practices that have developed in our collective labors; important critical questions that are shared among the case studies; and other ideas and challenges that we hope to identify in the course of our work.
Are there specific types of collections identified at these institutions that the grant team will be most interested in researching?
We’re fairly agnostic as to the collection types our network participants will be discussing, and the uses they make of AI. There are archival collections in the form of digital or digitized photographs in need of tagging (as described above from the UK Science Museum Group); massive digital library collections that are largely intelligible only through machine-enhanced methods (such as those presented by my own organization, HathiTrust); AI-enhanced public and visitor-facing services (such as are being explored by the Frick Collection); and many others that will arise in the course of the six international workshops we have planned.
For the final report, what kind of output do you expect will be included?
As described earlier, the final report will depend strongly on what our case studies, our workshops, and the conversations surrounding them happen to reveal. We don’t have a predetermined set of outcomes or a set agenda, aside from our shared philosophy of exploration, access, critique, and a strong desire to ensure that our uses of AI are ethical and true both to the missions of our various cultural heritage organizations and to the norms of humanities scholarship.
Would you have any suggestions for best practices to recommend for practitioners looking to embark on AI integration into their work?
Come to our workshops, and stay tuned for our case studies and final white paper as they’re published over the next 18 months!
Although many of our institutions may seem very “advanced” in their use of AI to those who haven’t yet embarked on it, “advanced” is a concept that I try always to use cautiously, and often even ironically: I don’t believe in a sort of teleological end-goal or prescribed “path forward” for AI that all cultural heritage institutions should strive to follow.
That also means that nobody is actually “behind” in any of these activities! Embark on “AI integration” if you will — if the topic and the methods themselves are of interest, or if you truly believe that your collections and their users can benefit, or even if you just want to experiment to find out whether they can benefit! But do it thoughtfully, critically, and in collaboration with the growing AI and cultural heritage community of which our AEOLIAN Network is a part.
I’m also interested in how this project came about, and if there are any related projects or research you would like to include here.
Our PI in the United Kingdom, Dr. Lise Jaillant of Loughborough University, leads a similarly themed project in the UK and Republic of Ireland called AURA: Archives in the UK and Republic of Ireland & AI. The AEOLIAN Network is very much a trans-Atlantic expansion of the AURA Network, and a continuation of its work and mission.
Special thanks to Mr. Worthey for taking the time to participate in this email interview; I am quite excited to participate in the remaining workshops.
Virginia Dressler is the Digital Projects Librarian at Kent State University. Her specialty areas are project management and digitization, working primarily with the university’s unique collections. She holds a Master of Library and Information Science from Kent State University (2007), a Master of Arts in Art Gallery and Museum Studies from the University of Leeds (2003), and a certificate in advanced librarianship (digital libraries) from Kent State University (2014). Her research areas include privacy in digital collections and the Right to be Forgotten. She is the author of Framing Privacy in Digital Collections with Ethical Decision Making (Morgan & Claypool, 2018).