I know you've already seen the octopus.  I set it up that way, like the tried and true linguist's trick -- don't think of an elephant! -- for showing the difference between the information content of a locution ("This is an order to forbid your brain from cogitating about a pachyderm.") and its illocutionary force and perlocutionary effect (Elephant!).  I wouldn't have chosen to start this essay there, to the extent this essay can be said to have had a start.  But Nature forced my hand, and I'm grateful for it.  The immediacy of the octopus figure, the way it would have framed, set up, or pre-figured every Nature reader's understanding of the issues and content of the article they were just about to read, is what I most wanted to draw out here: how data, in all its apparently primal, pre-figured glory, is always already (I'm so old fashioned!) figured, framed and lit slantingly from the margins.  Including data about how data is re-shaping scientists: who they are, what they do, and whether we and they think that who they are and what they are doing is good or bad...

I don't think it's just me -- although that's always the ethnographic risk, isn't it? -- but you can't be a reader from the U.S. or UK, encountering this scientist-octopus at the top of a page in Nature, and not flash on Thomas Nast's 1904 depiction of Standard Oil as an octopus.  Or maybe Joseph Keppler's 1890 image for Puck of the octopus of gambling.  Just Google around, as I did, and you should be able to quickly confirm that, although we know now that octopi are smart and clever and perhaps playfully mischievous creatures, they still signify the slimy, slippery, inescapable threat of something submerged, oceanic, and worst of all, multiple...

So although the scientist-octopus figured here in Nature is smiling and kindly, probably aged in his (my reading) bespectacled overseeing of his busy-ness, and even though all the figures at the tips of his tentacles are also smiling and seem to welcome the intervention of his appendages -- pointing out things on the screen, tapping on a keyboard, waving a magic wand, signing authorizations or publications -- I can't help but feel there's something unseemly going on in this scene.  There's something fishy here...would seem to be the submerged message.

I found my way to this Nature article only secondarily, through a footnote in another scientific article I was reading, describing a new scientometric analysis of genome-wide association studies (GWAS).  I was really interested and somewhat surprised by one of its findings in particular: that the most frequently cited scientist in the last fifteen years of these kinds of genomics studies is Kári Stefánsson of deCODE Genetics.  Kári requires literally an entire book, so all I will say here is: if any scientist deserves to be figured as an octopus, I promise you it is Kári Stefánsson.

But it is precisely because I feel justified in making such a promise about Kári -- an award-winning, widely respected and influential scientist, I will also point out, in addition to being an octopus -- that I feel the blanket figuration of the scientist-collaborator as octopus is an injustice. It's an injustice, furthermore, that can be challenged by some of the very data which powers the article's analysis and figuration, once we surface that data from the depths to which it has been relegated...

That GWAS article, shortly after discussing Kári and other deCODE actors, makes the following statement:

In a recent Nature article describing hyperprolific authors, Uitterlinden provides a candid explanation of his authorship. In addition to making long hours he attributes his success to the richness of the phenotypes and diseases available in the data at his disposal. Regarding his high number of co-authorships, he argues that it is not problematic, but rather reflects the sheer magnitude of the network and effort required to achieve these types of scientific discoveries (Supp Mat)32. (p. 5)

Reference 32 points to the article in Nature (2018) by John Ioannidis et al., "Thousands of scientists publish a paper every five days" -- the octopus article. "Candid explanation" was the phrase that really caught my eye -- sounded like qualitative data to me! Plus, I really like the work of John Ioannidis, one of the most prominent and respected meta-analysts of science who, perhaps more than anyone, has drawn attention to the so-called "reproducibility crisis" in at least some scientific fields.  So I went and found that article...

I'm not going to try to engage or analyze this article ("comment," technically) more fully here, but only make a broad argument to get more quickly to issues of the underlying, marginalized, qualitative data.  The authors try to make a point, in their second paragraph, that they are not pre-judging their data or the situation that they portray through it: 

 We must be clear: we have no evidence that these authors are doing anything inappropriate. Some scientists who are members of large consortia could meet the criteria for authorship on a very high volume of papers. Our findings suggest that some fields or research teams have operationalized their own definitions of what authorship means. (p.167)

In other words: No octopi here! Please ignore any illustration looming above to the contrary and anyway we have no control over the graphics department at Nature...

But the opening of the article, the first sentence in the first paragraph preceding the paragraph quoted above, reads: "Authorship is the coin of scholarship -- and some researchers are minting a lot." Ambiguous, but the implication seems to be either that some people have gotten undeservedly rich, or that they are counterfeiters. Either way, something "inappropriate" is suggested.

A sense that something about the current publication situation is indeed "inappropriate" -- despite the claim that there is "no evidence" of anything inappropriate -- can also be found in how the article has been covered in the wider scientific media, where it made something of a splash; try googling "hyperprolific authors": https://www.google.com/search?q=hyperprolific+authors&rlz=1C5CHFA_enUS757US761&ei=1LA7XKDoBYLR8AP2p5i4Aw&start=10&sa=N&ved=0ahUKEwigosH33OvfAhWCKHwKHfYTBjcQ8tMDCGc&biw=1342&bih=669

Feel free to disagree, of course, but, with confidence, I would characterize the overall media tone as: Octopi at large!  Inappropriateness on the loose!

Finally, then, let's consider some questions of "evidence."  Ioannidis et al. are playing a double game here (and let me stress that in my view, they [and I and anyone else for that matter] cannot not play a double game): on the one hand, they stress they have "no evidence that these authors are doing anything inappropriate." And, truthfully, the only evidence they have that is supposed to actually count as evidence is a set of changing numbers -- quantitative data. They describe how careful they've been to make those numbers as appropriate as possible: they excluded high-energy physicists from the list, since their field has its own norms for handling large numbers of authors, and they excluded Chinese and Korean names since they have "disambiguation issues."  (Which is a kinda weird set of delimitations, but whatever.)  That left 265 authors to analyze, and email; 81 wrote back.  Here is how that qualitative data is summarized in the paper, before being relegated to the "supplement":

We e-mailed all 265 authors asking for their insights about how they reached this extremely productive class. The 81 replies are provided in the Supplementary Information. Common themes were: hard work; love of research; mentorship of very many young researchers; leadership of a research team, or even of many teams; extensive collaboration; working on multiple research areas or in core services; availability of suitable extensive resources and data; culmination of a large project; personal values such as generosity and sharing; experiences growing up; and sleeping only a few hours per day. (p. 168)

That Dangerous Supplement

To get only a minimal sense of the kinds of data relegated to the "Supplement" as marginal, trivial, or of uncertain value but so...promising to the anthropologist, begging for interpretation, consider the comments from Rinaldo Bellomo:

My first comment is that some investigators seem impossibly short and some impossibly tall :-)

Some investigators seem impossibly non-productive and others impossibly prolific. No mystery for either: it’s the normal distribution curve with people at each tail end.

To people in the middle, each tail end will look improbable. They are right. By definition, they are. Gauss would be proud.

As to why I am in the prolific tail end, I could say that it is because I work 80hrs/week and have done so for 35 years (and my wife, God bless her, lets me), or because I absolutely love research or because I have created networks of collaborators to expand the reach of what we do, or because I really enjoy writing and explaining things. In truth, all of these explanations never even get close to the “core reason” and are fundamentally flawed because of hindsight bias.

How do I feel about being in this tail end? Don’t think about it much. Too busy :-)

Is Rinaldo Bellomo kind of...octopussy? Maybe. I could interpret some of the stuff he says there as indicative of a tendency to octopussiness. I'd also say it's funny, insightful, provocative, and worth talking further about with him.

Or UCLA's Matthew Budoff:

I have over 40 people working in my lab, each of them working on different projects. As I have supervisory roles with all of them, I help design the papers, contribute to the manuscript both in preparation and final editing, and take full responsibility for the results. Many of my post-docs are trying to get into residency or fellowship, so they are very motivated to write papers to help with their personal chances of furthering their chances of getting into a US based residency or cardiology fellowship. Further, I have masters students I supervise who are REQUIRED to write papers and do research, so they also publish regularly. Finally, I am the core lab director for cardiac CT for many NIH based trials, and thus have a responsibility to help with data collection, manuscript preparation and the methods section for these NIH based papers, and thus tend to be included when my labs work is used as the basis of the investigation. You should know that those of us in epidemiology have many papers that are derived from our phenotyped work, and thus able to publish on a few lines of investigation, getting multiple papers out of these observations. I am fully funded by the NIH and thus am glad to be considered a hyperprolific author. 

Real octopus potential there, for sure. But I'd take him seriously.  As I would Cyrus Cooper:

The main reason for my "hyper-prolificity" seems to be the large number of collaborators in NCD epidemiology and genetics who use our invaluable cohort resources (questionnaire; physical exam; intensively phenotyped; and DNA/omic samples). I would put myself mainly in the category of UK Biobank or Framingham/Olmsted County as a metaphor, where I would expect quite a high output rate also. I am usually part of a large consortium of authors; often do not lead the paper but contribute/comment; and am asked to join the listings. My government and charitable funders seem to welcome this approach. Clearly, it is hard to see exactly how I can be viewed as equivalent in authorship of these papers to those that I actually lead in research terms, or contribute to as a significant co-investigator. In my highly cited listing (say top 125 in an h-index of 125) there are a large number of these sorts of papers and looking at the highly cited listings recently, there are many authors like myself in there. Finally, I am relatively late career (over 60ys) and that enhances productivity/reputation etc. I certainly see no adverse consequence to being in such a listing if interpreted appropriately (I would say I actively contribute importantly to around a third to 50% of the papers that actually have me named, but sample/data provision is the criterion most used by external collaborators) and I would happily decline authorship if that became the convention. Please feel free to use this response in your paper if it helps, and do send me a blind copy if you get a moment, as it is hard to comment without seeing how pejoratively you view the finding. To me, it is a reflection of successful construction of internationally renowned data resources, and successful leadership of a large multi-disciplinary research unit. If the result is that convention dictates that I should decline authorship often, to the many leads worldwide who approach me for it, I should happily comply. 
It is the science, rather than the authorship, that actually stimulates me over this last 35 years. 

Or quite possibly my favorite octopus, Enrico Drioli:

Pleased to be in your list.

I am a senior Researcher and a Professor still active and interested to continue to learn and to teach daily. I decided many years ago to work on membrane systems, to understand better membrane phenomena, to develop new membrane operations of interest for solving strategic problems for an advanced industrial Society trying to reproduce what membranes have been and are doing in Nature. I promoted the creation of the first Institute on Membrane Technology by the CNR in Italy, in the late 80ies.A very multidisciplinary and multinational structure where hundreds of students and young researchers have been very active in the last 20-25 years. I promoted and coordinated the first Erasmus Doctorate School on Membrane Engineering sponsored by the European Union, where around 45 students from all around the World have been educated in Membrane Science and Engineering. Today Membrane Science and Membrane Engineering are attracting more and more attention in a large variety of industrial areas, in medicine, in biotecnology, in energy. In desalination, in waste waters treatments and reuse, in fuel cells, in some artificial hybrid organs etc, membrane systems are dominant technologies already. The large number of students and younger colleagues from Italy, China, Korea, Saudi Arabia and more who are interacting and collaborating with me, covering different expertise and topics, but all educated and attracted by the potentalities of Membrane Engineering is at the origin of our productivity. Their enthusiasm, expertise and visions made possible to solve all problems and obstacle present in promoting new ideas and new solutions.

Hopefully more will come!

Ioannidis 2018 "supplemental" information

There are two supplements to the Ioannidis et al. 2018 octopus article: this one is titled "Supplementary text" -- i.e., it's not considered "data" and not even "information," as it is called out on the main article page itself ("Supplementary information").  The other supplement is called "...

I will count again more carefully, but of the 81 respondents, only three were women.  I quote one of those responses here in its entirety, because...it's righteous:

I am not amused by the content of your email.

Your apology upfront that ‘you have no evidence’ reads like a waiver and suggests that you are convinced that hyperprolific authors are doing inappropriate things. Being an author that always fulfils the authorship requirements, this feels very unfair. I do understand that every hyperprolific author will say the same thing and I admit from previous experience that this is often not true. However, it will be impossible to proof ‘innocence’ to your accusation. In fact, you should check at the individual author level and case-by-case, and such an approach is obviously not in your interest. I invite you to contact all my co-authors of any paper you want in order to check about my contribution.

The sad thing is that I even think (when writing I have not seen your paper yet) that I understand what you are trying to make clear about authorship and I will admit that I probably largely agree. But your generalizations may harm individual authors and cause collateral damage. I find it the more painful to be (an unintended) part of your criticism, since I read many of your papers, use them for educational purposes and often agree with their content.

I want to provide a few explanations for my hyperprolific output:

- I have an academic affiliation and I spend 80% of my time (more than fulltime hours) to clinical research. This is uncommon in many countries but definitely not in the country I live and work in (the Netherlands)

- My field of expertise is methodology of outcome measures. These outcome measures are widely applied, and I am often asked to consult about their application in clinical studies, be involved in the study subsequently, and this is finally leading to co-authorships.

- I have a large network, spanning academia, professional organisations and pharmaceutical companies, leading to many collaborations that result in many co-authorships.

- I have always published many papers: 2010-2016 an average of 55 per year

- As far as I could see I hit your cut-off (n>70 per annum) in 2014 only (n=81). This was even artificial since the Annals of Rheumatic Diseases, a journal I often publish in, had extra pages in 2014 to publish the backlog of ePub-ahead papers from 2013. In 2013 my number of publications was 38 and the average of 59, within my normal range, would not even have been noticed by you.

- I am afraid your criticism pertains to pharmaceutical industries’ ‘key-opinion-leaders’ that piggyback on the work of medical writers from industry and get their publications for free. I can assure you that -in my case- you have made a mistake: 53 of my publications in 2014 were academic papers without pharma influence (own and collaborative research); 12 publications were in collaboration with professional organizations, not being industry (such as guidelines); and 16 publications were in collaboration with pharmaceutical industry.

- I can also assure you that I have never accepted co-authorship solely for my contribution of patients or data to studies or trials; When being co-author it is because of involvement in the design, analysis and interpretation of the study (in addition to the manuscript drafting and final approval).

I understand the definition of hyperprolific author you have used, and I understand the need to set a cut-off. At first sight, ‘one paper every 5 days’ sounds impossible, but you should realize that this is a ‘frame’. Indeed, if I had to write all these papers by myself, it would have been impossible. Many of the papers have been written by people I supervise or by other co-authors. Only the papers in collaboration with industry have been written by medical writers, and this is always mentioned in the paper. If so, I contribute extensively to various drafts (after the phase of design, analysis and interpretation of data).

If it is your intention to criticize scientifically inappropriate authorship, you have missed many that are indeed inappropriate authors (for instance since they have only contributed by including patients in trials) but did not meet your artificial threshold for being hyperprolific (these authors will never publish too many papers as they cannot be part of too many trials).

What you are also unable to judge is how many authorship-offers I refuse, either immediately as I judge the study insufficiently sound or not interesting, or during the process, when I disagree with the content of the paper.

To answer your question on ‘how I feel about belonging to this class’: I can safely tell you that belonging to your ‘exceptional class’ does not give me any satisfaction, and I do not need it for maintaining or improving my academic position.

To this end, for me scientific success is not based on how other people judge my output quantitatively, but rather on its quality and the pleasure I have in my daily work. I do enjoy my work, and I vouch for its integrity. I will try to accept your frames by referring to a Dutch saying: ‘High trees catch a lot of wind’ (such as ‘Big trees fall hard’ or ‘Big ideas make a loud noise when they land’ by Shawn). 

To re-iterate:

1. Matters of propriety should indeed be central to how we as a culture think about and handle authorship, but these are matters requiring complex interpretive analysis in addition to quantitative analysis.  Marginalizing as "supplementary" the very qualitative data that is in fact central to such complex interpretive efforts is, while understandable and to some degree inevitable, nevertheless inappropriate.

2. This case also shows that scientists themselves are willing and even eager to be involved in the elicitation, production, and publication of rich, nuanced, qualitative data pertaining to their data and publication practices, including when such data is personally identifiable.  Was such rich, nuanced qualitative data also overdetermined by egoisms, privilege, and the pretty severe limitations of survey tools in general and this one in particular, to name but a few? Oh yes.  But as the statistician's aphorism goes: all models are wrong, but some are useful. I found the limited and rich, structured and nuanced, qualitative data in the "Supplement" to be useful for extending our thinking about authorship, work, credit, data sharing, norms and their routine transgressions, and other such issues. We need better mechanisms for discovering such data as already exists in marginalized, supplementary, or otherwise difficult to access sources, and curating and archiving it for further re-use and re-interpretation.  Which is what we're trying to do with PECE.