Sometimes 2: About Data About Data

Meta/Data, Infra/Structure, and Other THINGS-BESIDE-THEMSELVES

The Rules of Meta/data

That web search for “diane wilson barn” that I screenshotted in the previous segment worked to find that photograph of her in her barn amidst tumbling piles of stuff because of metadata. Because the designers and maintainers of PECE wanted that photograph to be accessible, as qualitative data, to anyone who looked for it, we made PECE so that, as archival infrastructure, it outfitted that image file with metadata that would make it FAIR – findable, accessible, interoperable, and re-usable.

In trying to understand what qualitative data are, and what metadata are, we’re at risk of being misled by language. That previous sentence casts two distinct entities, data and metadata; data are the first noun-object, and then a different noun-object named infrastructure was said “to outfit” (verb) that first noun-object with a second noun-object, metadata. So when we are talking or writing like normal people talk or write, we write that “metadata is data about data:” two different things, and one is (made to be) “about” the other.

It's all well and good to talk normally. You can accomplish a lot, and no one gets confused or frustrated. But because data and metadata are a little bit not normal, their somewhat perverse nature will require a bit of perversity on the part of language as well.

###

The title above writes data and metadata together, divided, as meta/data. In this instance as in so many others, we follow Gayatri Spivak’s advice to “honor the slash”: respect this foreign entrant, awkward and a bit haughty, treasure what it’s there to remind you of.

The slash of meta/data reminds us, first, that “data” is not a self-sufficient entity; data can’t stand alone and needs, absolutely has to have, metadata.  This is not a controversial or debatable claim; not only would no librarian or archivist disagree, no scientist would either. Data about data is essential to data. If you don’t want to go so far as to say that interpretation is essential to data, how about: data without data about data is meaningless. Data without metadata is useless. Data is essentially relational, and that necessary relationality introduces movement, a difference slashing the relational unit: meta/data.

To emphasize the stringency of these matters, we can write them as a formal principle, with three versions:

Version 1: If data, then metadata.

Version 2: Data if and only if metadata.

Version 3: Only ever meta/data.

The photograph is data because, unlike the stuff in Diane’s barn, it is available and discoverable, and it is available and discoverable because it has metadata. Hence from the axioms it follows logically that:

Availability if and only if (IFF) meta/data.[1]

But by the additional proposition –

Meta/data IFF archive.

–we can then conclude that

Availability IFF archive.

Availability of meta/data requires that there be a place prepared for it, a place that someone must have readied and now maintains. (This by the Law of Meta/data Hospitality and its Domestic Care Clause, which I have proven elsewhere.)

Given all of the above, then by the Law of Archive Fever (also proven elsewhere):

Availability IFF archon (power/authority/ruler/State/G-d/SysAdmin).
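For readers who like their chains of inference spelled out, the steps above can be sketched in symbols. The letters are shorthand introduced here, not notation used elsewhere in this essay, and step (4) is an assumption: it reads the Law of Archive Fever as the biconditional "archive IFF archon," which is what the final step needs but which is not stated in that form above. The sketch assumes the amsmath package.

```latex
% Shorthand (introduced for this sketch only):
%   A = availability, M = meta/data, R = archive, K = archon
\begin{align*}
&(1)\quad A \leftrightarrow M && \text{derived above}\\
&(2)\quad M \leftrightarrow R && \text{the additional proposition}\\
&(3)\quad A \leftrightarrow R && \text{from (1) and (2), by transitivity of } \leftrightarrow\\
&(4)\quad R \leftrightarrow K && \text{assumed reading of the Law of Archive Fever}\\
&(5)\quad A \leftrightarrow K && \text{from (3) and (4), by transitivity of } \leftrightarrow
\end{align*}
```

Nothing here does more than chain biconditionals; all the weight rests on whether one grants premises (2) and (4).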

 ______________________________________________________


[1] I could have written an essay on data and data science entirely around marks like “/” that are (absolutely) fundamental to (post)structuralist language ideologies yet annoy the shit out of anyone operating with a representationalist language ideology (which includes all of us, to some meaningful degree). More importantly but not unrelatedly, data science has difficulty accommodating and working with slashed entities that do not coincide with themselves, concepts under erasure, parenthetical traces of meanings, disseminated and transmuting senses, etc. etc. TL;DR: computers don’t (yet) queer very well and “sense,” as renowned logician Charles Dodgson has shown, is essentially queer, or at least curiouser and curiouser.

Infra/structure and Interpretation Are Perverse

The National Bridge Inventory makes important data available concerning the state of these infrastructural structures, findable and accessible as interoperable ASCII files. Normal data, and in this case, normal data is good data. My former student (and Lead Platform Architect of the PECE Platform) Lindsay Poirier requires students in her “Intro to Data Studies” class at Smith College to find that available data, download it, and begin to work with it. They quickly find that that data is only meaningful as data because the Federal Highway Administration also makes available numerous documents, such as “Revision of Coding Guide, Item 113 - Scour Critical Bridges,” detailing the evaluative judgments beneath, after, within, or simply about (some of the possible readings of the meta- of metadata) the coded values in the data set. Her students must then avail themselves of this necessary supplement, one among numerous other such documents and data sets that together constitute an extensive and elaborate disseminatory structure of metadata, the interpretive apparatus enveloping even the most ostensibly positivist data and data systems. These interpretivist moves are both central and disruptive to positivist structures, a “‘deviation’ from the norm and yet more compatible with positive social goals.”
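What those students run into can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the field name `condition_code`, the codes, and their labels are invented for illustration and are not taken from the National Bridge Inventory or from any FHWA coding guide. The point is only structural: the same records are readable with the codebook and opaque without it.

```python
# Hypothetical illustration only: the field name, codes, and labels below are
# invented. They are NOT the National Bridge Inventory's actual fields or the
# FHWA Coding Guide's actual values.

# The "data": records whose key field is a bare code.
RECORDS = [
    {"structure_id": "0001", "condition_code": "B"},
    {"structure_id": "0002", "condition_code": "C"},
    {"structure_id": "0003", "condition_code": "Z"},  # a code the codebook never documents
]

# The "metadata": a codebook recording the evaluative judgment behind each code.
CODEBOOK = {
    "condition_code": {
        "A": "hypothetical label: evaluated, no action needed",
        "B": "hypothetical label: evaluated, monitoring advised",
        "C": "hypothetical label: not yet evaluated",
    }
}

UNREADABLE = "uninterpretable without documentation"


def read_value(field, code, codebook):
    """Return the documented meaning of a coded value, or None if the codebook is silent."""
    return codebook.get(field, {}).get(code)


def interpret(records, codebook, field="condition_code"):
    """Attach a meaning to each record, flagging values the codebook cannot explain."""
    interpreted = []
    for record in records:
        meaning = read_value(field, record[field], codebook)
        interpreted.append({**record, "meaning": meaning if meaning is not None else UNREADABLE})
    return interpreted


if __name__ == "__main__":
    for row in interpret(RECORDS, CODEBOOK):
        print(row["structure_id"], row["condition_code"], "->", row["meaning"])
```

Calling `interpret(RECORDS, {})`, that is, with the codebook withheld, marks every record as uninterpretable. That is the sketch's small version of the claim that data without metadata is useless.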

 

Now you may say that such interpretation of bridge data is not what we really mean by interpretation. I find such constraints perverse, but whatever; as feminist psychoanalyst Muriel Dimen notes, “Perversion may be defined, after all, as the sex that you like and I don’t.” Real interpretation, some anthropologists will object, is what happens when my self, as an embodied research instrument, encounters people and situations in all their near-ineffable complexity and nuance, residing with them over lengthy periods of time, and then interpreting out of that rich context into a published text likewise shaped by subtle connections to rich literatures and a complex intellectual genealogy. No metadata can adequately capture that and make it, too, available.

To which our reply is: maybe so, but why don’t we try? Why not undertake a few experiments, undoubtedly crude at first but surely refinable and extensible, to see how annotations and related metadata (infra)structures can serve to create this kind of scholarly provenance, archiving it and making it available along with other data?

This is what we are working on and towards. The real perversity is: we can’t really know if our elaborate, expensive, and often frustrating and hair-pulling efforts will in fact be worth it. But this is the perverse risk of the experimental style in general, even in its most positivist forms: if it is in fact experimental, the outcome is uncertain.

 

Part of such a shift is a related perversion of availability, where the key concerns and questions pertaining to any and all data are not matters of degree or quantity, but matters of (infra)structure and style: what data is made available, and how is it made available—through what structures and relationalities? In other words: it’s the metadata, stupid.

 

So I am excited about any initiative that makes qualitative data shareable, like the Qualitative Data Repository, and think it a great thing that these are now multiplying. We are happy to see an increasing (albeit still relatively small) number of cultural anthropologists become data-curious and work and advocate hard for more of that openness, and we have designed PECE to support the goal of making as much new ethnographic data available as possible within ethical limits, situationally constituted. The first step is providing that metadata-structured place where an ethnographer can place the interview she just recorded and/or transcribed, the scan of the piece of ephemera he picked up at last week’s clinic, the field sketches they drew in their notebook from the shareholders’ meeting.

(Even this kind of availability involves a lot of really hard work, collective if not collaborative in nature, and is resource intensive – i.e., the expensive work of infrastructuring.)

But such availability alone isn’t enough for a more perverse interpretivist positivism – one harkening back. Here I agree with much of what Andrew Moravcsik writes about availability (not only data availability, but analytic availability as well):

Qualitative research’s distinctive epistemology implies that to track the interpretation and analysis, a reader requires more than just access to a source. One must specify where within a descriptive or causal narrative each piece of evidence fits, and which specific textual passage in the source is critical. As historians, legal academics, and interpretivist social scientists insist, an informed reader needs to know not just what a scholar cites, but why. [i]

In these kinds of systems, the fundamental units or structures are not data objects, then, but readings. The act of reading, or figuration, confounds interpretivism and positivism.
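One crude way to picture a "reading" as the basic unit is a record that carries not only what is cited but where it fits and why. The sketch below assumes its own field names (`source_id`, `passage`, `narrative_step`, `rationale`); they are hypothetical, chosen to mirror Moravcsik's three requirements, and do not describe PECE's actual data model or any existing annotation standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """A hypothetical record of one act of reading, not a bare pointer to a data object."""
    source_id: str       # what is cited (the archived item)
    passage: str         # which specific passage in the source is critical
    narrative_step: str  # where in the descriptive or causal narrative the evidence fits
    rationale: str       # why the scholar cites it


def is_analytically_available(reading: Reading) -> bool:
    """True only when the reading says what, which passage, where it fits, and why."""
    fields = (reading.source_id, reading.passage, reading.narrative_step, reading.rationale)
    return all(value.strip() for value in fields)


# A bare citation gives access to a source and nothing more.
bare_citation = Reading(source_id="interview-01", passage="", narrative_step="", rationale="")

# An annotated reading (all values invented for illustration).
annotated = Reading(
    source_id="interview-01",
    passage="hypothetical excerpt from the transcript",
    narrative_step="hypothetical step in the argument",
    rationale="hypothetical reason this passage supports that step",
)

if __name__ == "__main__":
    print(is_analytically_available(bare_citation))
    print(is_analytically_available(annotated))
```

The check is deliberately shallow: it can tell whether a rationale was recorded, not whether it is a good one. Judging that remains the work of another reading.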

____________________________________________________________

[i] Moravcsik, A. (2019). Transparency in Qualitative Research. In P. Atkinson, S. Delamont, A. Cernat, J.W. Sakshaug, & R.A. Williams (Eds.), SAGE Research Methods Foundations. doi: 10.4135/9781526421036863782