From the Wayback Machine: Do we need a Systems Analysis of the Digital Humanities?

This post was first published at ideasunderground.com on 24 May, 2009. I’ve reproduced it here partly because that blog no longer exists, partly because it’s a lazy-but-efficient way of offering an idea I’ve been mulling over for some time to a new audience, and partly because I’m (sadly, perhaps) still quite taken with it. It fits well with my belief that scholars – especially in a post-Edward Snowden world – need to understand the engineered nature of the tools they use in their work (regardless of whether they want to build digital outputs or not). Achieving a robust level of scholarly self-consciousness in the digital age is a challenge that most people have (I suggest) given up on in the face of technological advance, rather than as a matter of methodological choice or epistemological orientation. This has huge implications for the integrity of future scholarship, but it opens up equally fascinating areas for research and analysis.

I’m working with a team of systems analysts at the moment and it has got me thinking about what kind of ‘business intelligence’ we digital humanists have at our disposal. The humanities have developed organically, in a kind of ‘conversational’ or maybe dialectical process that remains opaque and resistant to formal analysis, but I’m wondering if the convergence of our disciplines with technology necessitates more than this. I accept that I’m flirting with nonsense here: technology is just a delivery mechanism and needn’t determine future directions in the humanities, so why waste time analysing it? We should just use it to deliver our products. It’s a fair attitude to adopt if you’re more focussed on what I’ll call ‘traditional products’ (books, journal articles etc), but I think it’s a bit short-sighted for those of us interested in pressing the digital humanities towards what they might worthily be.

So what am I suggesting? The idea would be to undertake a formal systems analysis of the engineered sub-structure to our new field, combined with an analysis of the ‘interface’ between traditional humanist outputs and this sub-structure, potential new directions made possible by that sub-structure, and problems imposed by that sub-structure. It would perhaps need to be done at a disciplinary level initially, with those findings combined into a meta-analysis of the entire field when we’d gathered enough information. I am sure, for instance, that an analysis undertaken by an English professor would be quite different to one undertaken by a historian. Sounds like an idea that’s far too heavy to get off the ground, eh.

Ignoring the leaden quality of the idea, though, what do I mean by ‘sub-structure’? Commercial systems analysts would probably suggest it’s a woolly idea (as it may well be), but the aim would be to offer digital humanists a basic overview of the underlying ‘wiring’ of the internet, including domain name servers, routers, key data centres and suchlike. The next layer of analysis would perhaps be to describe governing bodies like ICANN, which oversee (for want of a better word) the naming and addressing systems that direct the flow of information across the internet, and to point out their interactions with international law, national governments and, by extension, humanists sitting in their offices producing their digital outputs. Wikipedia offers a decent overview and good descriptions are available in various other places, but I’m suggesting that humanists need to do this work themselves as well, and that a collaborative approach would suit such a project quite well. We all know how descriptive analysis tends to track the interests of the inquirer, and I have a feeling humanists could offer some interesting perspectives.
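To make that ‘wiring’ slightly more concrete, here is a minimal sketch in Python of the DNS resolution that happens, invisibly, every time a scholar requests a digital resource. The domain name is purely illustrative, and the sketch is mine rather than part of any proposed analysis.

```python
# A minimal illustration of the DNS resolution that sits beneath every web request.
# The domain name is purely illustrative; any public domain would do.
import socket

def resolve(domain):
    """Ask the Domain Name System which IP addresses serve this domain."""
    try:
        return sorted({info[4][0] for info in socket.getaddrinfo(domain, 80)})
    except socket.gaierror as error:
        return f"Could not resolve {domain}: {error}"

if __name__ == "__main__":
    # Every request for a digital resource begins with a lookup like this,
    # mediated by name servers and registries coordinated by bodies such as ICANN.
    print(resolve("example.org"))
```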

Part of my interest in this topic stems from a 1960 article by Ernest Nagel in Philosophy and Phenomenological Research titled ‘Determinism in History’, where he brilliantly explores the parallels between tightly defined scientific articulations of a ‘system’ and the way that historians conceive historical ‘systems’. The article was written in the heat of the history-as-system debates prompted by Marxist class analyses, which claimed that History was a closed and definable system determined by economic forces. As is well known, the idea that history was determined in this way sparked furious debate and eventually led to an unspoken (if not uniformly accepted) consensus that this couldn’t possibly be the case: it revolted human sensibilities to suggest that our destiny was in the hands of impersonal systemic forces. Nagel’s analysis was brilliant because, although arguing against naive determinism in historical interpretations, he was brave enough to dive right in and argue that, properly conceived, History is indeed systemically determined (just like a chemical reaction, no less), but in an indeterminate way. Drawing on the very small / very large anomalies observed in the quantum world, Nagel noted that although basically deterministic, historical systems were simply too large (or, in some cases, too small) and too complex for the deterministic forces at work to be identified. The article isn’t particularly well known, but it still stuns me to think that at a time when most mainstream historical theorists were doggedly refusing to acknowledge that History was determined to any degree at all, lest Marxists leverage the gap, Nagel was capable of offering a sophisticated analysis which allowed for what is, after all, a basic truism of all systems – historical or otherwise.

So, to my point: humanists now work in the context of a large, complex system, which determines (a la Nagel) their work and their future to an unknown degree. It’s still very early days in the digital humanities, so it makes sense to engage in a formal systems analysis to start to work out to what degree this system imposes itself on our work. Leaden, I know, but I think potentially fascinating too.

References

Nagel, E., 1960. Determinism in History. Philosophy and Phenomenological Research, XX(3), 291-317.

Requirements for a New Zealand Humanities eResearch Infrastructure

This is the text of a talk given at eResearch 2013, University of Canterbury, New Zealand, July 03, 2013.

I can only offer a very formative overview of this subject here, but I’m keen to at least put it on the radar. As everyone knows, vast amounts of our cultural heritage are either being digitized and put online or being born online, and this has significant implications for the arts and humanities. In particular, it forces us to start increasing our understanding of, and capability with, the engineered technologies that deliver resources to us online. It will always be difficult getting the balance right – we’re never going to be engineers – but we need to start working through the issues. In this talk I’ll give you a quick overview of the international context, try to convey something about what eResearch in the Humanities actually is, describe where we’re at nationally, and suggest some very formative requirements that might help us work out what direction we need to go in.

I find myself complaining about how far behind the rest of the world we are in New Zealand, but sometimes I think I’m being a bit harsh. Humanities Computing has been around for decades, and New Zealand researchers have never really taken a strong interest in it, but the development of humanities eResearch infrastructure is in its infancy everywhere. That said, the United States, United Kingdom, and to some extent Australia, have been building their capability (and their actual infrastructures) for some time now. New Zealand has basically the same component parts that could be used for a Humanities infrastructure as those countries, but the US, UK and Australia have benefited from ongoing strategic conversations that have allowed them to leverage their assets and develop roadmaps far better than we have.

The conversation in the US was given an additional prompt in 2006 with the American Council of Learned Societies’ report “Our Cultural Commonwealth”, which started to identify areas where requirements for the Humanities and Social Sciences differ from those of the hard sciences and engineering. This was in the context of the development of the HathiTrust Digital Library and mass digitization efforts by Google, Microsoft and the Internet Archive. In the same year the US NEH established what came to be known as the Office of Digital Humanities, which has funneled significant amounts of funding into digital humanities research and into infrastructure and capability-building projects. Australians often suggest they’re behind the US and UK, but you’ve just seen what fantastic work is happening with the HuNI project [the previous talk was by Richard Rothwell, from the Australian Humanities Networked Infrastructure (HuNI) project]. Very few New Zealand humanists are even aware of HuNI, but I suspect they’ll get a wake-up call when they realize what their Australian colleagues have available, especially if high-quality research articles start being produced.

What do humanities researchers need in terms of infrastructure? HuNI provides an excellent model for us, and in many ways I’d suggest we should be modeling the requirements for our infrastructure on theirs, but we need to think about the fundamentals for ourselves as well. This is because, in broader terms – and this is why cultural change within our humanities research communities will be needed alongside improved infrastructure – we’re talking about systems capable of supporting a fundamental change in the way *some humanities researchers* approach sources.

In brief, that change involves a shift from thinking about sources (documents, books, images, audio, video etc) as objects, to viewing them as data. This creates a parallel shift from lone scholars engaging with individual objects one by one, to inter-disciplinary teams of researchers analyzing and making connections between data held in a variety of datasets. Not all humanities researchers will work like this, but we need to accept that programmatic access to our humanities datasets will become more and more important in the coming decades. Upgrading our infrastructure to allow for this, and providing opportunities for researchers to develop the skills needed to undertake programmatic analysis of large datasets, is something we need to start now.
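To give a concrete, if deliberately simple, sense of what treating sources as data can look like in practice, here is a sketch in Python. The CSV file and its column name are entirely hypothetical assumptions of mine; the point is only that the whole corpus is queried programmatically rather than read item by item.

```python
# A deliberately simple, hypothetical sketch of treating a digitized collection as data:
# the whole corpus is queried programmatically rather than read item by item.
# The file name and column name are illustrative assumptions, not a real dataset.
import csv
from collections import Counter

def term_frequencies(path, text_column="transcription"):
    """Count word frequencies across every record in a digitized collection."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            counts.update(row.get(text_column, "").lower().split())
    return counts

if __name__ == "__main__":
    for word, count in term_frequencies("digitized_records.csv").most_common(20):
        print(word, count)
```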

The Office of Digital Humanities in the US, the United Kingdom’s Joint Information Systems Committee, and funding agencies in Canada and the Netherlands feel the same way, and have established the ‘Digging into Data’ challenge to help surface and resolve the issues involved in this shift. The Digging into Data challenge invites teams to choose a large dataset and engage in humanistic research on it, reporting back on their problems and successes. It’s one important way the humanities community can assess what kind of impact this shift towards programmatic analysis is going to have.

In June of last year a report was published detailing preliminary findings based on early Digging into Data projects. Two findings that struck me relate to the data-centric nature of Humanities eResearch: aside from noting the massive opportunities opened up by eResearch, the report authors noted that

The Digging into Data Challenge presents us with a new paradigm: a digital ecology of data, algorithms, metadata, analytical and visualization tools, and new forms of scholarly expression that result from this research.

They also noted that

It is the combination of algorithmic analysis and human curation of data that helps humanists…

Humanities eResearch occupies a fuzzy place in the broader eResearch landscape. It is both programmatic and subjective; it involves some researchers with solid programming skills and others who are more comfortable working with word clouds; and, most importantly, the data is messy. It isn’t only Google Books OCR that’s the problem – most humanities datasets would make a scientist give up after a first look: nineteenth-century ships’ logs, medical records from early twentieth-century leper colonies, you name it, it’ll be hard to read, incomplete and possibly inaccurate. We have traditional methods for dealing with issues like that, but the problems are multiplied many times over when we port the sources into data formats. Curation, services and tools help a lot, but we also need time to identify and work through the issues, and to develop appropriate training schemes for academic staff and their graduate students. So where are we at in New Zealand?

Our biggest problem is awareness. New Zealand academic humanists have had little interest in either humanities computing or digital humanities, and because of this they simply aren’t aware of what opportunities exist, or even of what other countries are doing. As a result, our ability to contribute to a national conversation around eResearch infrastructure is low. I suspect most humanists have outsourced these issues to scientists, social scientists and engineers without understanding just what a big impact decisions made outside their disciplines could have on them. We do have the component parts for an excellent Humanities eResearch infrastructure, though:

  • DigitalNZ has been the envy of Australian colleagues for years now;
  • The National Digital Heritage Archive is ahead of even the British Library in preservation of our digital heritage;
  • We have a range of large-scale digital assets in the National Library’s Timeframes archive, the Appendices to the Journals of the House of Representatives, and Papers Past;
  • A significant government digital archive is about to come online.
  • [It was noted, quite correctly, afterwards that we have a wide variety of other assets as well. The list above merely offers a snapshot of central government assets].

These could all be leveraged, and added to, to great effect if we can come up with a sensible strategy that connects these central government resources with humanities research communities. So, in short order, what we need is:

  • Education: Efforts are underway to secure regular digital humanities training from a Canadian team who are starting to provide global services, but we also need to increase our efforts to develop digital humanities at undergraduate and graduate levels in order to offer the next generation of humanists the skills they need to engage with these issues.
  • Capability building: We need to develop more digital projects, and provide guidance and feedback to humanists who would like to start their own. We also need to develop peer review mechanisms for student projects and for PBRF (New Zealand’s Performance-Based Research Fund) so we can begin to understand what a ‘quality’ humanities digital output actually looks like.
  • International collaboration will help us develop more quickly. The international DH community is very welcoming, and many opportunities exist to collaborate.
  • We need to start talking to humanities researchers about what they want from eResearch infrastructures. We could make some educated guesses, but as primary stakeholders they need to be consulted.
  • We also need to work with NeSI [the National eScience Infrastructure team who organized eResearch 2013] and central government to turn any wish-list into strategically aligned and well defined requirements that can realistically be implemented.

Above all else, of course, we need funding – and not only funding for the final infrastructure, but for the education and capability building we need to do first. Compared to the final amounts required to implement a Humanities eResearch infrastructure this will be paltry, but it’s important we make a start.

Theory, Systems and Vino

I’ve been watching the current Theory Debate via Digital Humanities Now this past week or so with interest, but have only just found the time to write down my reaction to it. It’s a topic that has been dear to my heart for some time now. It touches on the question of where the digital humanities stand in relation to the core tradition, and what direction the field is going to take as a practice (I’m not sure I’m keen for it to become a ‘discipline’ in the traditional sense of the term). I’ve often said that if DH is to be taken seriously by the analog humanities it will need to begin to engage with some core humanities practices, develop some kind of theoretical framework(s), identify some core methodologies, and generally produce some writing that has recognizable intellectual ‘grunt’. We’re developing a new community of practice within a 2000+ year tradition that includes some rather weighty names, after all. We should aspire to the development of a sub-tradition that draws on everything at our disposal. And that implies ‘a whole bunch of stuff’. A few years ago, while I was working in London, I actually started a website called d-hist that aimed to unite digital history with the core methodological and theoretical traditions taught in post-graduate History classes. I wanted to write a series of essays along the lines of ‘The Historical Dialectic and Digital History’ – the idea was just to get in the kitchen and start throwing ideas together. Unfortunately I was working in IT rather than academe and it went nowhere. My point here is that I’m by no means against Theory, which is crucial to the long-term viability of the digital humanities.

As one of my professors used to say, though, ‘Theory is like wine. It’s best enjoyed when it’s of high quality, and even then only in moderation’. The problem with humanistic theorizing, of course, is that it can have a stultifying effect. I love it, but it’s also a good part of the reason why the humanities are in such a dire position in today’s intellectual world. I won’t open that can of worms any further, except to note that my understanding is that ‘DH Culture’ (such as it is) has been resistant to theory partly for this reason. Many early DHers fled to the practice precisely to avoid overly-theorized colleagues arguing in ever-decreasing circles. We enjoyed being insulated from the culture wars of the eighties and nineties, and it’s possible to argue that if this hadn’t been a guiding assumption in those post-Netscape years we wouldn’t be in as good a position as we are today. The argument would go that DH is fresh, new, and untainted by the (I think necessary, if also painful and destructive) battles of the 1980s and 1990s. If we provide hope to embattled colleagues and can be described as a healthy and rare green shoot for the humanities now, it is at least in part because we’re free of the theoretical baggage that is sinking (or has sunk) other humanities disciplines. We need to tread carefully, and although we should encourage digital humanists to emphasize humanities content over computer science, that doesn’t necessarily mean we should subvert our vital connection with computer science by theorizing for the sake of theorizing.

To perhaps grossly over-simplify things, the current conversation seems to me to hinge around two poles that can be broadly characterized as ‘Theory versus Code’. On the one hand we have an understandable desire to make our practice more recognizably humanistic by wrapping it in the kind of critical analysis that is grist to our mill (which requires Theory as the bedstone); on the other we have a commitment to code, because our facility with this is the thing that marks DH as a fundamental and exciting departure from the main tradition. Natalia Cecire has represented the former position very well over the past couple of weeks, and it is one that deserves respect. The other side of the equation is perhaps best represented by Tom Scheinfeldt’s comment that ‘DH arguments are encoded in code’ and don’t need elaboration. As the paragraph above might suggest, I’ve got a lot of time for this position. DHers who believe we should focus on tools and code take a pragmatic stance that doesn’t proscribe Theory, but implies that ‘the code is the theory’, or at least that ‘the code is all we need’. The implication is that the empowering feature of DH is that it puts the means of production into the hands of humanists by developing skill at the level of code. Such a position implies that DH is primarily about making and hacking: I suspect there’s a deep-seated concern across the community that if we lose that focus in an attempt to theorize the domain, we risk sliding back into the insufferable cultural, professional and intellectual situation that code allowed us to escape. I’m by no means suggesting that every person at the Code end of the spectrum thinks this is the case (there’s nothing in their writing that suggests hostility), but I certainly agree with those who feel the development of Theory is risky for DH. To state things baldly, it threatens a return to everything we stand against: partisan politics, ideological wars, normative rather than inclusive practices. The worst-case scenario is that it acts as a Trojan horse, importing all that is wrong with the humanities into all that is good.

My ‘worst case scenario’ could also be described as ‘paranoid anti-theoretical nonsense’, of course. As I suggested at the start of this post, the application of high quality theory and method to the digital humanities is essential if we are to develop as an intellectual community and be taken seriously by our peers. The significant point here, for me, is that the way to do this is to scale out our focus on tools and code towards a broader, dare I say it more holistic, view that encompasses governance and infrastructure alongside theory and method, content and philosophy. This is the position I put forward in my piece (currently still in preview) in Digital Humanities Quarterly. In that article I argue that if we’re going to create a robust theoretical and methodological apparatus for DH we’re going to need to look to some IT approaches like enterprise architecture, and adapt them to our purposes. While I certainly wouldn’t describe myself as having anything more than rudimentary coding skills, I have no doubt that code should remain at the center of our community and our practice: it is the lodestone to which we’ll need to return again and again to keep ourselves on track. This doesn’t imply we all need to become computer scientists, though. Patrick Murray-John’s suggestion that DHers should at least be able to intelligently discuss a data model, or (to clumsily paraphrase) understand why developers can’t turn lead into gold, strikes me as a sensible approach to take. We need to understand code well enough to use it and hack at it for basic purposes. Some will go further and become experts in a particular language, thereby empowering the community as a whole with deep channels of knowledge, but we’re too open a community for this to become the sine qua non of acceptance. We need basic coding skills so we can do some things ourselves and can move beyond the veil of new media surfaces to understand how digital products function ‘behind the curtain’ – this much is essential for both practical and critical reasons. We also need to understand the technical and creative constraints code places on designers and developers, and be capable of contributing intelligently and usefully to a software development team, but beyond that I suspect anything goes. As an aside, and not that any DHers have suggested it yet, it’s important to remember that if the IT industry demanded coding proficiency for entrance to the profession, developers would have to do their thing without the support of a plethora of project managers, business analysts, technical writers and testers (not to mention managers). These people don’t necessarily code, but they sure understand IT.

I think we need to go even further, though, and avoid the simplistic Code versus Theory bind I outlined (and perhaps erroneously identified) above. While code should remain our lodestone, Murray-John’s comment brought home to me that our value to the humanities is broader than this. It lies in the fact that we understand how software and web applications work not only at the code level but also at the systems and infrastructure level. I don’t think we should leave the digital humanities to either a reliance on //code comments, or to purely humanistic traditions derived from literary or historical practice. We need something more radical and broad-ranging than that: we need theoretical and methodological frameworks (note the plural) that can describe the intellectual, cultural and (this is of vital importance) systemic nature of our craft. Humanistic theories and methods imported from literary and historical analysis will of course be central to this, but those approaches will not stop code breaking, or ensure that future humanistic infrastructures are well integrated and sustainable. Theory, then, will occupy a central position in the digital humanities, and we need more of it rather than less, but we also need to remind ourselves that unlike other disciplines we’re building systems: systems of code, systems of metadata – the list goes on. Unlike other humanities disciplines, our theory needs to be what I will clumsily term ‘systemically functional’: it needs to support not only our cultural and intellectual needs, but our technical needs as well. This is no small task; in fact I suspect it’s a ‘big hairy goal’ that will remain aspirational for a long time in the absence of anyone getting a grasp on what’s really required, but it strikes me that it’s a task worthy of our attention. Right now, it’s probably time for a glass of wine.

Academic AMIs: Ready to Eat Digital Humanities Infrastructure

A few comments (specifically from @jasonaboyd) about infrastructure at the recent Victoria THATCamp sparked an idea, and I’ve thrown together a site called Academic AMIs: Ready to Eat Digital Humanities Infrastructure. The idea is that, while Amazon Web Services might not be suitable for all (or even many) digital humanities projects, and the platform isn’t exactly user-friendly for people uncomfortable with the command line, it does offer an extremely scalable cloud infrastructure and a nice way to package up web application stacks for distribution. Hopefully it will offer digital humanists an easy introduction to the Amazon platform, and perhaps an interesting exercise for DH classes. I hope it’s seen as a step in the right direction, even if it isn’t the final destination.

The first Amazon Machine Image (AMI) on the site is a LAMP server running Open Journal Systems from the Public Knowledge Project. I’ll package up new ones when time permits, hopefully at the rate of about one a week. Please read the FAQs and About pages before jumping in. If there’s enough interest I’d welcome contributions.
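For readers who want to experiment, the minimal sketch below shows roughly how an image like this could be launched programmatically with the boto3 Python library rather than through the AWS console. The AMI ID, key pair and security group names are placeholders of mine, not values from the Academic AMIs site – substitute your own.

```python
# A hypothetical sketch of launching a prepared AMI as a single EC2 instance.
# The AMI ID, key pair and security group below are placeholders only;
# replace them with the values listed on the Academic AMIs site and in your AWS account.
import boto3

def launch_ami(ami_id, key_name, security_group, instance_type="t2.micro"):
    """Start one EC2 instance from a pre-built machine image and return its ID."""
    ec2 = boto3.client("ec2", region_name="us-east-1")
    response = ec2.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        KeyName=key_name,
        SecurityGroups=[security_group],
        MinCount=1,
        MaxCount=1,
    )
    return response["Instances"][0]["InstanceId"]

if __name__ == "__main__":
    # Placeholder values -- this will not run against a real account as-is.
    print(launch_ami("ami-00000000", "my-key-pair", "default"))
```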
