Judging by the Twitter feed that came out of last week’s #osc2018 in Berlin, open data was, unsurprisingly, at the heart of many of the talks and discussions, be it the current hot topic of “data on request” or the much anticipated call for the European Data Platform.
What caught my attention was an aside about another kind of data, the less attractive stepchild if you will: metadata.
The inherent value of good metadata escapes many actors involved in the scholarly publishing life cycle; the trouble is, nobody really wants to deal with it. As the tweet by @najmehshaghaei suggests, some folks think that librarians should take on the responsibility. However, how can librarians create decent metadata, which is crucial to discoverability, if they are unfamiliar with the content and perhaps even the author?
We discussed this in our open science team and opinions varied widely. Here are the three main positions that have emerged so far:
- Authors have to make sure that their work has good metadata. They know all the pertinent details, from funding instruments to author affiliations to keywords and abstracts. There is literally nobody who can provide as much information about a research output as its author. It is also in the authors’ interest to provide this information to ensure the widest possible visibility and discoverability for their publication. The trouble is that many authors are unsure about what metadata really is, how it helps them and what to do to ensure its high quality. It would be interesting to find out how many active researchers at any given institution are aware of the metadata standards in their fields, or have ever heard of Crossref.
- If authors do not look after their metadata, the task will fall to the libraries. This is what seems to have transpired at #osc2018. What does that entail? Here in Bern all our researchers deposit their bibliographic records (and copies wherever possible) in our repository BORIS. The entered information is used for internal evaluation, so compliance is high, if not enthusiastic. With every entry round, there are problems ranging from simple typos to incorrect or missing information. Unsurprisingly, fixing this metadata takes considerable staff time and resources.
- Let’s face it: most authors won’t provide decent metadata or engage with the issue. Additionally, libraries often lack the resources to curate metadata on top of everything else they have to do. There is one player in the game, however, who is ideally placed to get the job done: the publisher. A publisher is, at least in theory, extremely close to the work at hand, long before it is available online or in print. A publisher deals with the author, ushers the work through its various stages of quality control, and may even be involved in copyright clearance or the hosting of the research data that informs the manuscript. Additionally, a publisher has, again in theory, a vital interest in decent metadata: the bigger the audience, the more clicks they receive and/or money they rake in. So in contrast to the author, who may not realize or even care about the value of the metadata that anchors her work in the endless sea of scholarly publications, the publisher has a clear incentive to get the cleanest possible metadata for their products.
In the end, everyone will have to chip in. As Crossref’s Executive Director Ed Pentz said: “Everybody in scholarly communications has a responsibility to improve metadata.” We will have to watch initiatives like http://www.metadata2020.org/ to see how “everybody” can be galvanized to contribute.