Consultation on Guiding Principles on Management of Research Information and Data

For the sake of transparency and to stimulate discussion we share our submitted responses to the Consultation on Guiding Principles on Management of Research Information and Data, held in June 2020 in the Netherlands by the Dutch association of universities (VSNU). Below, please find the responses by:

Jeroen Bosman orcid.org/0000-0001-5796-2727 (@jeroenbosman)
Bianca Kramer orcid.org/0000-0002-5965-6560 (@MsPhelps)
Jeroen Sondervan orcid.org/0000-0002-9866-0239 (@jeroenson)

The responses were drafted independently of each other and were submitted on 19 June 2020.

A detailed annotation of the guiding principles, with comments from all three of us, can be accessed here:
https://tinyurl.com/vsnu-gp-comments

Finally, in the appendix we share information on a discussion session on the topic at the 2020 Open Publishing Fest.


Consultation response Jeroen Bosman

Do the principles offer clear and effective guidance for Dutch research institutions?

  • The principles come too late. They were spurred by, and drafted during, the very negotiation to which they should most importantly apply. This means that they have lost the chance to make a difference and, perhaps more importantly, that the issues dealt with in the principles are potentially too strongly oriented towards the type of deals and collaborations of that specific deal. They are molded for that (Elsevier) deal but not applied to it.
  • Overall, the principles are too vague in content and language. Terms like community, knowledge institutions, scholarly capital etc. should be used more consistently and should be better defined.
  • Overall, it should be made much clearer who has which role in the process and who is to be held to which principles.
  • It should be made clearer to which types of projects/services, and with which types of partners, these guidelines pertain. Currently, that scope is not clear.
  • The decision process, roles and timeline for an Open Knowledge Base (OKB) should be made more transparent. There is reference to an ambition, but it is unclear where that comes from.
  • Many aspects of the guidelines pertain to scholarly metadata. It is common practice for most publishers, and certainly for institutions practicing open science, to share such data fully openly, often with a CC0 license. For those (meta)data you cannot strive to be in control, because you have already set the data free. So the guidelines partially conflict with the institutions’ own open science practices.

Are there any significant aspects missing within the principles?

  • The decision process, roles and timeline for an OKB should be made more transparent. There is reference to an ambition, but it is unclear where that comes from. 
  • Many aspects of lock-in and oligopolistic market behaviour are not addressed by these principles. In particular, package deals, UX compatibility, procedure adaptation and collaboration opportunities still make it difficult to switch to another vendor or to combine offerings from different vendors. See Figure 1 below for forms of lock-in.
  • The principles currently do not preclude public investment from being commercially appropriated, and they allow creativity and IP created collaboratively to become fully owned by the commercial partner, without the right for institutions to reuse their own creativity and investment in future collaborative endeavours. The principles should very clearly reserve the rights to publicly generated IP for the institutions, allowing them to share it in any way they like, preferably fully openly, with a very liberal license allowing reuse for any purpose by any party.
  • The principles should make it clear that collaborations to create infrastructure, or contracts to buy services, should not be an integral part of a publish & read contract for content, but should be dealt with separately and publicly procured. This is important to ensure that there is not too much pressure to creatively search for options that are barely compliant with the guiding principles, just to avoid jeopardizing the deal as a whole.

Figure 1 – Six forms of vendor lock-in


Consultation response Bianca Kramer

Do the principles offer clear and effective guidance for Dutch research institutions?

There is a lack of clarity regarding both the scope of the principles and how they are to be implemented:

  • It is unclear whether the principles apply to infrastructure regarding research information (metadata on research outputs) or also to infrastructure supporting the creation of research output (e.g. data analysis and archiving, publication from preregistration to peer review). 
  • More specifically, it is unclear whether the principles apply only to the creation of a ‘Dutch Open Knowledge base’ (or projects that could contribute to that) or also to collaboration on, or procurement of, other research tools/platforms – either by individual institutions and/or consortia of institutions.
  • It is unclear whether the principles apply only to collaboration with commercial parties, or equally to collaborations with non-profit and/or public parties. This is not simply a distinction between ‘buy’ and ‘make’, as collaborations with external non-profit and public third parties can be considered and invested in similarly to agreements with commercial third parties.
  • It is unclear to what extent the main aim of the principles is to ensure open availability of metadata (including provenance) for any party to use and build upon (consistent with open science practices), or conversely, to control access to and use of metadata. If the former, it could be helpful to separate requirements for provenance and openness of metadata from ownership and governance of the infrastructures themselves. 

The above points are fundamental questions that are, in my opinion, not sufficiently answered in the document. Part of this is due to the ambiguous use of terminology in the document. More clarity on scope, and more specificity in the use of terminology, would be helpful.

Are there any significant aspects missing within the principles?

  • In the principles, the focus seems to be on transparency and interoperability at the level of (meta)data: “data in-data out” (including attention to enriched/derivative data, which is good). However, what is lacking is attention to open source, open algorithms, and IP for the creation of the infrastructure.
  • It is not sufficiently outlined what the role of public procurement in the selection of third parties will be, and what measures are envisioned to prevent soft vendor lock-in, for instance by uncoupling unique content provision from exchangeable service provision.
  • The participation of third parties in infrastructure governance (GP6) should not jeopardize the principle of community-owned governance. In particular, public contributions to infrastructure should not be allowed to be commercially enclosed, but should remain open to the community to use and build upon, even after a contract with a third party has ended.
  • The fact that these principles have been developed as an extension of an initial set of principles agreed upon with Elsevier for collaboration on services for a Dutch Open Knowledge Base remains deeply uncomfortable. It would have been far preferable if the Dutch research institutions had independently drafted and consulted on such principles, and defined ambitions for a project like an Open Knowledge Base, prior to negotiations with any third party.

Consultation response Jeroen Sondervan

Do the principles offer clear and effective guidance for Dutch research institutions?

This set of principles is a good and important start to the urgent discussions on this topic, which we should be addressing continuously within academia over the next few years. But it is only a start. Important aspects (e.g. ownership, interoperability and ‘community’ governance) seem to be addressed, but what is confusing throughout the structure of the document is that ‘metadata (and research information)’ and ‘research data’ appear to be used as similar entities.

The introduction (heading 2) focuses on metadata and the importance of this type of data being open for others to ‘access, reuse, enrich and describe according to existing, open standards, identifiers, ontologies and thesauri’. Principles for research data, which are very much needed and may share similarities with metadata but also have their own specificities, are not made explicit.

However, it is important to make these principles meaningful for both categories from the start, to avoid any ambiguity in subsequent discussions. This must be made much clearer across the entire document. Are these principles applicable to both categories? Are there any exemptions, and why? Will they differ regarding the use of third-party infrastructure and services? Are there principles missing that can only be applied to one of the categories?

Another important issue, which can lead to tunnel vision in the discussion(s), is the use of the term ‘commercial third parties’. This is too narrow. These principles should be applied to every entity (profit, non-profit, governmental, etc.) that academia deals with outside its own premises with regard to developing infrastructure and publishing services.

The way in which the ‘Open Knowledge Base’ is presented in the document reads as if it is already a done deal. A sentence like ‘the Dutch research institutions have the ambition to create themselves an Open Knowledge Base (OKB)’ could be read as if the idea of the OKB came first, and we needed these principles to back it up.

Is this idea of the OKB consolidated amongst the institutions? Have discussions been held within faculties and amongst universities? This is unclear. I am not aware of any consolidation of this idea of an OKB other than two recently published public blog posts that address the concept of an OKB. The discussion should be the other way around: we should first work towards common ground and broad acceptance of these principles, and only then start thinking about an OKB and its features.

Are there any significant aspects missing within the principles?

  • It is important to have clarity on the definitions used throughout the document. What do ‘we’ mean by the specific terms being used? What do we mean by, e.g., ‘community’, ‘ownership’, etc.? A list of definitions could help clarify this.
  • The word licensing, or better, ‘open’ licensing, is nowhere to be found. The document would gain strength if this were explicitly stated as part of the principle(s) (e.g. in the ownership and/or interoperability sections). Scholarly metadata should ideally be licensed under a CC0 license in order to be as reusable as possible. How does this relate to ownership as stated in GP6 – Community owned governance? This principle seems to focus on research data only. For this type of data, other licenses could or should apply. This shows the evident importance of being absolutely clear about the typologies of ‘data’.
  • Make more explicit under what conditions the future infrastructure (and/or publishing services) would operate (e.g. open source, open standards/APIs, data licensed under CC0, etc.). It is important to define the bare minimum.
  • Add ‘control’ to ownership, so it is clear that academia not only owns but also keeps control (under these guiding principles) of the research (information) data.
  • How to achieve transparency in the entire process should be made more explicit. Something like the ‘transparent agreement system’ (GP3) is too vague, or even unclear, and should be explained. So transparency measures are needed not only for ‘technical, legal and operational agreements for metadata sharing’, but for the entire governance of these guiding principles.

Appendix:


Fig. 2 Open Publishing Fest session

On May 28, we organized and moderated a 1-hour panel discussion on the proposed guiding principles during the Open Publishing Fest, with participants including researchers, non-profit and for-profit tool providers, and proponents of open infrastructure. A video recording of the session is available on YouTube.

Below are some of the points that were raised in this discussion:

  • It is important to take a values-driven approach, which can then be translated into what is built and how
  • Focus should be on providing rights, rather than on regulating/restricting collaboration
  • Opportunities and barriers for (smaller) players, and the need for clarity on the criteria for participation
  • Do the principles represent the needs of the research community?
  • More clarity is needed on what is wanted, also in terms of openness
  • Better alignment with existing principles, like the SPARC NA Good Practice Principles for Scholarly Communication Services
  • Tools from external parties can be used, but implementation and control of infrastructure has to remain in an academic-controlled organization