Consultation response – NPOS2030 Ambition Document

In November-December 2021, the Dutch National Program Open Science (NPOS) set up an open consultation (archived link) to give all Dutch stakeholders the opportunity to provide input on the NPOS2030 Ambition Document, comprising NPOS’ vision for 2030, the guiding principles underlying this vision, and the proposed program framework and key action lines.

Here we share our submitted response to the consultation. The response was drafted in collaboration and submitted on 2021-12-22.

Jeroen Bosman  (@jeroenbosman)
Bianca Kramer (@MsPhelps)
Jeroen Sondervan  (@jeroenson)


NPOS2030 Ambition document infographic (source)

Part 1: NPOS Guiding Principles

General remarks:

  • The document lacks a sense of urgency. Apart from the Citizen Science aspect, it is not sufficiently clear in what ways this ambition document adds to or deviates from the previous NPOS programme. We also miss reflection on the previous programme. Has it been successful in all respects and, if not, how does the current ambition address that? Will those acting in this space (esp. on open access and FAIR data) do anything differently because of this ambition? Will they feel inspired or supported by the guidance offered in the ambition?
  • We are somewhat disappointed by the relatively narrow scope of the ambition and by the lack of concrete proposals to make real steps forward in practising open science. Especially so because this should be an ambition for the next 8 years, a period in which many developments in different aspects of Open Science can be expected.

    Just a few suggestions of the type of actions and goals that we miss:
    • in open science in education: embed the open science skill set and mindset in all bachelor and master programmes at universities and universities of applied sciences;
    • in open access: move towards 50% diamond article publishing by 2027 and create a national open source and publicly governed ORE type publication platform;
    • also in open access: create a national campaign to deposit all retrospective output of currently affiliated researchers using Taverne;
    • in rewards & recognition: foster a culture in which journal/publisher level evaluation of publications/researchers is no longer desired;
    • in public engagement: create support to help researchers add plain language summaries to all their publications, deposit those with the articles in repositories, and also create infrastructure to leverage those summaries in public engagement;
    • also in public engagement: explicitly reward researchers for using 1% of their research time to check and improve Wikipedia on their topics.
  • Overall, the structure of the document could be improved and we suggest shortening it. Parts of chapter 1 (especially section 1.3) could be integrated into the following chapters 2, 3 and 4. As a reader it is confusing to first read quite elaborately about these topics and then see them detailed even more in separate chapters. We suggest integrating 1.3 into the three following chapters, thus dealing with the vision, mission and action lines in a coherent way in one place for each of the three domains.
  • The structure of Chapter 1 is unbalanced. It begins with the UNESCO Recommendation on Open Science, but on further reading this definition seems to stand on its own. Throughout the document, little or no connection is made with the Recommendation or with the extent to which NPOS adopts this definition in full. Making this clearer and, more importantly, justifying the choices that are guided by the UNESCO Recommendation would make the NPOS statement much stronger and better embedded. It would probably also make it much easier to connect principles to vision and action lines.
  • In line with the previous comment, the ‘Guiding Principles’ as they are currently presented seem to be a selection. Moreover, it is not well motivated why this particular set of principles was chosen; this should be explained and justified more. Section 1.1.1 works as a perfect hook, but the following section(s) do not use that hook sufficiently.
  • We suggest adding a glossary with definitions of the main concepts, to make the document more accessible to readers who are new to open science discussions. It will also help with the consistent use of definitions throughout the document (e.g. confusion is currently created by using all kinds of alternatives to the ‘as open as’ adage).

Remarks on specific Guiding principles:

  • The concepts of digital and academic sovereignty need much more explanation, especially as they are relatively new concepts and because they play an important role in various parts of the document. Also, it should be made more clear to what extent these are considered a driving force for open science (‘the interest of transparent, inclusive and reliable knowledge creation’) or, conversely, as a barrier to openness.
  • The concept of subsidiarity also needs further clarification. It is implied that it will guide the process of the transition and determine which stakeholder will take up which role. It invokes the question of who this document is for. Is the document voicing the ambition of all stakeholders in NPOS? Also, the implications of this principle should be made clearer. What types of issues require national or centralized action, even if that might at some points deviate from when and how individual stakeholders would have acted (as has for instance been done in the case of national read and publish deals)? The report mentions several instances of national initiatives, esp. for a number of platforms, OKB and such, as if decisions have already been taken. How does that fit the subsidiarity principle?

Part 2: NPOS Vision for 2030

  • We miss a more elaborate and contextualised vision for those aspects in the ambition that are new compared to the previous programme. It should be made clear why aspects like citizen science, digital/academic sovereignty, and the ambition to look at public values and to become less dependent on (commercial) publishers are so important. It should be clear to the reader what would go wrong if these were not addressed in the ambition; that is currently not the case. We also miss reference to the latest version of the Guiding Principles on Management for Research Information and its recommendations.
  • We appreciate the aspect of making open science normative, but would suggest not leaving this solely to open science communities to address. There is a clear need for leadership and influential role models, such as deans and prize winners, to explicitly state a preference for practising open science and to expect it from others.

Part 3: Programme lines and requirements

  • The justification for the programme lines should be made much more prominent. Further on in the document, it is stated that the three programme lines are not the only developments that are deemed important, but that these are the ones where central coordination is considered crucial. The question of why this is crucial remains unanswered; answering it prominently would explain the choices made for NPOS2030. Also, the reason for disregarding other aspects of open science (Open Education, Research Integrity and Reproducibility of scientific results are mentioned) is not very strong. It is stated that these are either already being taken up or relatively new. In our view, neither justifies leaving these aspects out.
  • The central role of recognition and rewards in the transition to open science should be mentioned early on in the document, and a justification given as to why it is not in itself considered part of NPOS2030. Currently, it is only explicitly linked to Open Access. Similarly, research integrity gets a mention in relation to the organization of publishing, but in the current ambition document it does not seem to be considered an integral aspect of open science.
  • We support the suggestion for alternative programme lines as proposed by the Open Science Communities Netherlands in their feedback (available at https://docs.google.com/document/d/1B4XGJQQSGwvGy1LzevB6n2nUoHiB2qDz/ ).
  • The programme line Citizen Science seems to take a specific view on the relation between Open Science, Citizen Science and RRI (Figure 3). The emphasis on these as separate developments with only limited overlap encourages compartmentalization, rather than considering open science as an integrated approach towards more relevant, robust and efficient research. 
  • The requirements are not well integrated with the rest of the document. In the introduction to the requirements, it is mentioned that ‘(…) the Programme Lines will address a set of essential requirements needed for this culture change‘ – but this is not reflected in the description of the programme lines themselves.

Part 4: Key lines of action for the programme lines:

We only highlight the most pressing issues here and will not go into specific details (issues around consistent use of definitions, terminology, (business) models, etc.).

Open Access

  • The action line on applying open access to all output is great, but there is not even the beginning of an idea on how to realize it. It is also unclear whether the 100% OA goal applies to all output types.
  • The importance of metadata is lacking in the mission and action lines for open access, though it is said to be part of the OA programme line. Explicit attention for metadata is important in relation to OA of all scholarly output, as well as openness of this metadata. It could be part of negotiations with publishers as well as a consideration in creating publishing infrastructure.

FAIR data

  • We regret that there is no action line on increasing the amount of publicly shared research data sets. It is somewhat disappointing that there is no guidance or ambition on what is expected regarding making data open and open-licensed, beyond just making it FAIR. Additionally we regret that the adage ‘as open as possible, as closed as necessary’ has been watered down to ‘Open as early as possible, and closed when necessary’. We welcome that early opening up is seen as important but would advise to maintain the gradual nature of closedness, as in the original adage. In that context the concept of protected sharing introduced in this ambition should be framed as a way to share data that would otherwise remain closed and not as an excuse to not share in a fully open manner in cases where that is perfectly viable.

Citizen science

  • We suggest adding a few lines defining citizen science. It would also help to make clear how citizen science and public engagement are related, and how they are different concepts.
  • Currently, the Citizen Science section does not feel well integrated with the other sections. We would welcome a vision on the relation between FAIR and open data, open access and citizen science.

Green OA: publishers and journals allowing zero embargo and CC-BY

Jeroen Bosman and Bianca Kramer, Utrecht University, July 2020
Accompanying spreadsheet: https://tinyurl.com/green-OA-policies

Introduction

We witness increased interest in the role of green open access and how it can contribute to the goals of open science. This interest focuses on immediacy (reducing or eliminating embargoes) and usage rights (through open licenses), as these can contribute to wider and faster dissemination, reuse and collaboration in science and scholarship. 

On July 15, 2020, cOAlition S announced their Rights Retention Strategy, providing authors with the right to share the accepted manuscript (AAM) of their research articles with an open license and without embargo, as one of the ways to comply with Plan S requirements. This raises the question to what extent immediate, openly licensed self-archiving of scholarly publications is already possible and practised. Here we provide the results of some analyses carried out earlier this year, intended to at least partially answer that question. We limit this brief study to journal articles and only looked at CC-BY licenses (not CC0, CC-BY-SA and CC-BY-ND, which can also meet Plan S requirements).

Basically, there are two possible approaches to making an inventory of journals that currently allow immediate green archiving under a CC-BY license:

  • policy-based – by checking journal- or publisher policies, either directly or through Sherpa Romeo or Share Your Paper from Open Access Button.
  • empirically – by checking evidence for green archiving with 0 embargo and CC-BY license (with potential cross-check against policies to check for validity).

Here we only report on the first approach.

A full overview of journal open access policies and allowances (such as will be provided by the Journal Checker Tool that cOAlition S announced early July 2020) was beyond our scope here. Therefore, we carried out a policy check for a limited set of 36 large publishers to get a view of currently existing options for immediate green archiving with CC-BY license, supplemented with anecdotal data on journals that offer a compliant option. We also briefly discuss the potential and limitations of an empirical approach, and potential publisher motivations behind (not) allowing immediate sharing and sharing under a CC-BY license, respectively.

Our main conclusions are that:

  1. Based on stated policies we found very few (18) journals that currently allow the combination of immediate and CC-BY-licensed self archiving.
  2. Based on stated policies of 36 large publishers, there are currently ~2800 journals with those publishers that allow immediate green, but all disallow or do not explicitly allow CC-BY.

Large publishers – policies

We checked the 36 largest non-full-OA publishers, based on the number of 2019 articles according to Scilit (which uses Crossref data), for self-archiving policies allowing immediate sharing in (institutional) repositories. Of these 36 publishers, 18 have zero-embargo allowances for at least some of their journals for green sharing of AAMs from subscription (incl. hybrid) journals in institutional or disciplinary repositories. Overall that pertains to at least 2785 journals. Elsevier only allows this in the form of updating a preprint shared on arXiv or RePEc. Of these large publishers, those with the most journals allowing zero-embargo repository sharing are Sage, Emerald, Brill, CUP, T&F (for social sciences), IOS and APA. Notably, though not a large publisher in terms of papers or journals, the AAAS also allows immediate sharing through repositories.

None of these policies allow using a CC-BY license for sharing in repositories. Three explicitly mention another CC license (NC or NC-ND); others do not mention licenses at all or ask authors to state that the copyright belongs to the publisher. Sometimes CC licenses are not explicitly mentioned, but it is indicated that AAMs shared in repositories are for personal and/or non-commercial use only.

For the data see columns F-H in the tab ‘Green OA‘ in the accompanying spreadsheet.

Other evidence

From the literature and news sources we know of a few examples of single publishers allowing zero embargo sharing in repositories combined with a CC-BY license:

  • ASCB:
    • Molecular Biology of the Cell (PV OA (CC-BY) after 2 months, AAM 0 embargo with CC-BY)
  • MIT Press:
    • Asian Development Review (full OA but PV has no open license)
    • Computational Linguistics (full OA but PV=CC-BY-NC-ND)
  • Microbiology Society:
    • Microbiology
    • Journal of General Virology
    • Journal of Medical Microbiology
    • Microbial Genomics
    • International Journal of Systematic and Evolutionary Microbiology
    • JMM Case Reports
  • Royal Society:
    • Biology Letters
    • Interface
    • Interface Focus
    • Notes and Records
    • Philosophical Transactions A
    • Philosophical Transactions B
    • Proceedings A
    • Proceedings B

A check of the long tail of smaller publishers could yield additional examples of journals compliant with 0 embargo / CC-BY sharing.

Empirical analysis of green archiving

Empirical analysis of actual green archiving behaviour (e.g. using Unpaywall and/or Unpaywall data in Lens.org) could also provide leads to journals allowing early sharing.

Since Unpaywall data do not contain information on the date a green archived copy was made available in a repository, a direct empirical analysis of zero-embargo archiving is not readily possible. As a proxy, one could select articles published in the 3 months before a given database snapshot and then identify those that are only available as green OA. A period of 3 months, rather than 1 month or less, would allow for some delay in posting to a repository.

The benefit of using Lens.org for such an analysis is the availability of a user-friendly public interface to perform queries in real time. The disadvantage is that, although Lens sources OA information from Unpaywall, no license information for green OA is included, and no distinction is made between submitted, accepted and published versions. Analyses could also be done on a snapshot of the Unpaywall database directly, which includes license information for green OA (where available) and provides version information.
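As an illustration of what such a snapshot-based proxy could look like, the sketch below was not part of the analysis described here. Field names such as oa_status, published_date, oa_locations, version, license and journal_name follow our reading of the Unpaywall data format; the file name, snapshot date and 3-month window are assumptions.

```r
# Sketch only (assumed file name, snapshot date and field names): approximate
# recent zero-embargo green OA with a CC-BY accepted manuscript from a Unpaywall
# snapshot in JSON Lines format.
library(jsonlite)
library(dplyr)
library(purrr)

snapshot_date <- as.Date("2020-07-01")                         # assumed snapshot date
records <- stream_in(file("unpaywall_snapshot_sample.jsonl"))  # hypothetical sample file

recent_green <- records %>%
  filter(oa_status == "green",                                 # OA copies only in repositories
         !is.na(published_date),
         as.Date(published_date) >= snapshot_date - 90) %>%    # published within ~3 months
  filter(map_lgl(oa_locations, function(locs) {
    if (is.null(locs) || !all(c("version", "license") %in% names(locs))) return(FALSE)
    any(locs$version %in% "acceptedVersion" & locs$license %in% "cc-by")
  }))

count(recent_green, journal_name, sort = TRUE)                 # journals with most candidates
```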

Gap analysis report

In our previous gap analysis report that gave a snapshot of publication year 2017, we did harvest policies from Sherpa Romeo systematically for the subset of journals included in the gap analysis (journals in Web of Science publishing articles resulting from Plan S-funded research). As explained above, updating this approach was beyond our scope for this exercise. 

In our original gap analysis data, we found no examples of journals that allowed 0 embargo in combination with CC-BY. 

Journal policies for green OA: embargo lengths and licenses
(source: Open access potential and uptake in the context of Plan S – a partial gap analysis)

Potential publisher motivations 

From checking policies and behaviour, different publisher approaches emerge regarding embargoes and licenses for self-archived article versions. It seems that the reluctance of publishers to allow immediate sharing is weaker overall than the reluctance to allow CC-BY for green OA. That may have to do with the reasons behind these two types of reluctance. 

The reason to not allow immediate sharing may concern fears of losing subscription income and perhaps also a dwindling number of visitors to their platforms. However, several publishers have noted that this fear may be unfounded, as libraries do not yet unsubscribe just because some percentage of articles is also immediately available as AAM: open availability is incomplete, and libraries also wish to provide access to published versions in their platform context. Some publishers (e.g. Sage) have also publicly stated that they do not witness a negative effect on subscriptions.

For the reluctance to allow CC-BY licenses we expect other reasons to be at play, primarily the desire to be in control over how, where and in what form content is shared. This relates to protecting income from derivative publications (reprints, printing-on-demand, anthologies etc.) and also to preventing others from having any monetary gain from including content on competing platforms.

Another aspect is the inability of publishers to require linking back to the publisher version in cases where the CC-BY licensed AAM in the repository is reused, rather than depending on community norms to provide information on and links to various versions of a publication.

Looking at the empirical evidence and these considerations, it can be expected that, across publishers, a move towards shorter embargoes might be easier to achieve than a move towards a fully open license for green-archived versions. It should be noted that while there are examples of publishers allowing shorter embargoes in response to specific funder mandates (e.g. from Wellcome, NIH), to our knowledge there has not, prior to Plan S, been funder or institutional pressure to require open licenses for green archived AAMs. It thus remains to be seen whether publishers would be inclined to move in this direction in response. The reactions to the letter cOAlition S sent to a large number of publishers to inform them of the cOAlition S Rights Retention Strategy should provide clarity on that.

In addition to funder policies, institutions and governments could further support this development through policies and legislation relating to copyright retention, as well as zero embargoes and licenses for green OA archiving of publications resulting from publicly funded research. This could provide authors with more rights and put pressure on publishers to seriously reconsider their stance on these matters. 

Linking impact factor to ‘open access’ charges creates more inequality in academic publishing

[this piece was first published on May 16, 2018 on the site of Times Higher Education under a CC-BY license]

The prospectus SpringerNature released on April 25* in preparation for its intended stock market listing provides a unique view into what the publisher thinks are the strengths of its business model and where it sees opportunities to exploit them, including its strategy on open access publishing. Whether the ultimate withdrawal of the IPO reflected investors’ doubt about the presented business strategies, or whether SpringerNature’s existing debts were deemed too great a risk, the prospectus has nonetheless given the scholarly community an insight into the publisher’s motivations in supporting and facilitating open access.

In the document, aimed at potential shareholders, the company outlines how it stands to profit from APC (article processing charge)-based gold open access in an otherwise traditional publishing system that remains focused on high-impact factor journals. From this perspective, a market with high barriers to entry for new players is a desirable situation. Any calls for transparency of contracts, legislation against exclusive ownership of content by publishers, public discussion on pricing models and a move towards broader assessment criteria – beyond impact factors – are all seen as a threat to the company’s profits. Whether this position also benefits the global research community is a question worth asking.

The open access market is seen by SpringerNature as differentiated by impact factor, making it possible to charge much higher APCs for publishing open access in high impact factor journals. Quite revealing is that on page 99 of the prospectus, SpringerNature aims to exploit the situation to increase prices: “We also aim at increasing APCs by increasing the value we offer to authors through improving the impact factor and reputation of our existing journals.”

First, this goes to show that APCs are paid not just to cover processing costs but to buy standing for a researcher’s article (if accepted). This is not new: other traditional publishers such as Elsevier, but even pure open access publishers such as PLoS and Frontiers, tier their market and ask higher APCs for their more selective journals.

Second, this prospectus section shows SpringerNature interprets impact factors and journal brands as what makes a journal valuable to authors and justifies high APCs – and not aspects such as quality and speed of peer review, manuscript formatting, or functionality and performance of the publishing platform.

Third, and most striking, is the deliberate strategy to raise APCs by securing and increasing impact factors of journals. SpringerNature admits it depends on impact factor thinking among researchers and seeks to exploit it.

The explicit aim to exploit impact factors and the presumed dependence of researchers on journal reputation is in sharp contrast with SpringerNature (to be precise BioMedCentral, SpringerOpen and Nature Research) having signed the San Francisco Declaration on Research Assessment (DORA). By signing, these SpringerNature organisations agree with the need to “greatly reduce emphasis on the journal impact factor as a promotional tool”, as the declaration states.

Additionally, in their 2016 editorial, “Time to remodel the journal impact factor” the editors of SpringerNature’s flagship journal Nature wrote: “These [impact factor] shortcomings are well known, but that has not prevented scientists, funders and universities from overly relying on impact factors, or publishers (Nature’s included, in the past) from excessively promoting them. As a result, researchers use the impact factor to help them decide which journals to submit to – to an extent that is undermining good science.”

The information revealed through the prospectus now raises the question whether signing DORA and the Nature editorial statements were in effect merely paying lip service to appease those worried by toxic effects of impact factor thinking, or whether they have real value and drive policy decisions by journal and publisher leadership. It could be argued that commercial publishers are foremost responsible for their financial bottom line, and that if enough researchers (or their institutions or funders) are willing and able to pay higher APCs for high impact factor journals, then that is a valid business model.

However, scientific publishers do not simply “follow the market”. For better or for worse, their business models influence the way academic research is prioritised, disseminated and evaluated. High APCs make it harder for researchers without substantial funds (e.g. researchers from middle- and low-income countries, unaffiliated researchers and citizen scientists) to publish their research (or create a dependency on waivers), and a continued push for publishing in high impact factor journals by publishers, researchers and funders/institutions alike hampers developments towards more rigorous, relevant and equitable research communication.

How do we break out of this? It is promising to see initiatives from publishers and funders/institutions such as registered reports (where a decision to publish is made on the basis of the research proposal and methodology, independent of the results), the TOP guidelines that promote transparency and openness in published research, and moves towards more comprehensive assessment of quality of research by institutions and funders, as highlighted on the DORA website.

This will all help researchers do better research that is accessible and useful to as many people as possible, as might alternative publishing options coming from researchers, funders and institutions. Simply adding an “open access” option to the existing prestige-based journal system at ever increasing costs, however, will only serve to increase the profit margin of traditional publishers without contributing to more fundamental change in the way research is done and evaluated.

Jeroen Bosman (@jeroenbosman) and Bianca Kramer (@MsPhelps)
Utrecht University Library

* The prospectus has since been taken offline. We secured an offline copy for verification purposes, but unfortunately cannot share this copy publicly.

Stringing beads: from tool combinations to workflows

[update 20170820: the interactive online table now includes the 7 most often mentioned ‘other’ tools for each question, next to the 7 preset choices. See also heatmap, values and calculations for this dataset]

With the data from our global survey of scholarly communication tool usage, we want to work towards identifying and characterizing full research workflows (from discovery to assessment).

Previously, we explained the methodology we used to assess which tool combinations occur together in research workflows more often than would be expected by chance. How can the results (heatmap, values and calculations) be used to identify real-life research workflows? Which tools really love each other, and what does that mean for the way researchers (can) work?
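As a purely illustrative sketch of that pairwise idea (made-up data; the actual analysis script is described in the post linked above and may use a different test), checking whether two tools co-occur more often than expected by chance amounts to testing a 2x2 contingency table per pair of tools, for instance with Fisher's exact test:

```r
# Minimal sketch with a hypothetical logical matrix 'responses': one row per
# respondent, one column per tool, TRUE where the respondent uses that tool.
set.seed(1)
responses <- matrix(runif(2000 * 3) < 0.3, nrow = 2000,
                    dimnames = list(NULL, c("GitHub", "Zenodo", "ResearchGate")))

co_occurrence_test <- function(responses, tool_a, tool_b) {
  tab <- table(uses_a = responses[, tool_a], uses_b = responses[, tool_b])
  fisher.test(tab)   # odds ratio > 1: used together more often than expected by chance
}

co_occurrence_test(responses, "GitHub", "Zenodo")
```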

Comparing co-occurrences for different tools/platforms
First of all, it is interesting to compare the sets of tools that are specifically used together (or not used together) with different tools/platforms. To make this easier, we have constructed an interactive online table (http://tinyurl.com/toolcombinations, with a colour-blind safe version available at http://tinyurl.com/toolcombinations-cb) that allows anyone to select a specific tool and see those combinations. For instance, comparing tools specifically used by people publishing in journals from open access publishers vs. traditional publishers (Figures 1 and 2) reveals interesting patterns.

For example, while publishing in open access journals is correlated with the use of several repositories and preprint servers (institutional repositories, PubMedCentral and bioRxiv, specifically), publishing in traditional journals is not. The one exception here is sharing publications through ResearchGate, an activity that seems to be positively correlated with publishing regardless of venue….

Another interesting finding is that while both people who publish in open access and traditional journals specifically use the impact factor and Web of Science to measure impact (again, this may be correlated with the activity of publishing, regardless of venue), altmetrics tools/platforms are used specifically by people publishing in open access journals. There is even a negative correlation between the use of Altmetric and ImpactStory and publishing in traditional journals.

Such results can also be interesting for tool/platform providers, as they provide information on other tools/platforms their users employ. In addition to the data on tools specifically used together, providers could also use absolute numbers on tool usage to identify tools that are popular, but not (yet) specifically used with their own tool/platform. This could identify opportunities to improve interoperability and integration of their own tool with other tools/platforms. All data are of course fully open and available for any party to analyze and use.


Figure 1. Tool combinations – Topical journal (Open Access publisher)


Figure 2. Tool combinations – Topical journal (traditional publisher)

Towards identifying workflows: clusters and cliques
The examples above show that, although we only analyzed combinations of any two tools/platforms so far, these data already bring to light some interesting differences between research workflows. There are several possibilities to extend this analysis from separate tool combinations into groups of tools typifying full research workflows. Two of these possibilities are looking at clusters and cliques, respectively.

1. Clusters: tools occurring in similar workflows
Based on our co-occurrence data, we can look at which tools occur in similar workflows, i.e. have the most tools in common that they are or are not specifically used with. This can be done in R using a clustering analysis script provided by Bastian Greshake (see GitHub repo with code, source data and output). When run with our co-occurrence data, the script basically sorts the original heatmap with green and red cells by placing tools that have a similar pattern of correlation with other tools closer together (Figure 3). The tree structure on both sides of the diagram indicates the hierarchy of tools that are most similar in this respect.
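For readers who want to try something similar, here is a minimal sketch (random data and an assumed matrix layout, not Bastian Greshake's actual script linked above) of clustering tools by the similarity of their co-occurrence profiles and reordering a heatmap accordingly:

```r
# 'cooc' is an assumed square matrix of pairwise co-occurrence scores
# (+1 positive association, -1 negative, 0 none), tools as rows and columns.
set.seed(42)
tools <- paste0("tool", 1:8)
cooc <- matrix(sample(c(-1, 0, 1), 64, replace = TRUE), nrow = 8,
               dimnames = list(tools, tools))
cooc[lower.tri(cooc)] <- t(cooc)[lower.tri(cooc)]   # make the matrix symmetric
diag(cooc) <- 1

d  <- dist(cooc)                        # distance between tools' co-occurrence profiles
hc <- hclust(d, method = "average")     # hierarchical clustering (the tree in Figure 3)
heatmap(cooc, Rowv = as.dendrogram(hc), Colv = as.dendrogram(hc),
        scale = "none")                 # heatmap reordered so similar tools sit together
```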


Figure 3. Cluster analysis of tool usage across workflows (click on image for larger version). Blue squares A and B indicate clusters highlighted in Figure 4. A color-blind safe version of this figure can be found here.

Although the similarities (indicated by the length of the branches in the hierarchy tree, with shorter lengths signifying closer resemblance) are not that strong, some clusters can still be identified. For example, one cluster contains popular, mostly traditional tools (Figure 4A) and another contains mostly innovative/experimental tools that apparently occur in similar workflows together (Figure 4B).


Figure 4. Two examples of clusters of tools (both clusters are highlighted in blue in Figure 3).

2. Cliques: tools that are linked together as a group
Another approach to defining workflows is to identify groups of tools that are all specifically used with *all* other tools in that group. In network theory, such groups are called ‘cliques’. Luckily, there is a good R library (igraph) for identifying cliques from co-occurrence data. Using this library (see GitHub repo with code, source data and output) we found that the largest cliques in our set of tools consist of 17 tools. We identified 8 of these cliques, which are partially overlapping. In total, there are over 3000 ‘maximal cliques’ (cliques that cannot be enlarged) in our dataset of 119 preset tools, varying in size from 3 to 17 tools. So there is lots to analyze!
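As a minimal illustration of the approach (random data, not our actual co-occurrence results), finding cliques with the igraph R package looks roughly like this:

```r
# Treat tools as nodes and draw an edge where two tools are specifically used together.
# 'adj' is an assumed binary adjacency matrix; the real analysis uses the survey's
# co-occurrence results instead of random data.
library(igraph)

set.seed(7)
tools <- paste0("tool", 1:12)
adj <- matrix(rbinom(144, 1, 0.4), nrow = 12, dimnames = list(tools, tools))
adj <- 1 * ((adj + t(adj)) > 0)                     # symmetric: edge if either draw was 1
diag(adj) <- 0                                      # no self-links

g <- graph_from_adjacency_matrix(adj, mode = "undirected")
largest_cliques(g)                                  # biggest groups of mutually linked tools
length(max_cliques(g, min = 3))                     # number of maximal cliques of size >= 3
```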

An example of one of the largest cliques is shown in Figure 5. This example shows a workflow with mostly modern and innovative tools, with an emphasis on open science (collaborative writing, sharing data, publishing open access, measuring broader impact with altmetrics tools), but surprisingly, these tools are apparently also all used together with the more traditional ResearcherID. A hypothetical explanation might be that this represents the workflow of a subset of people actively aware of and involved in scholarly communication, who started using ResearcherID when there was not much else, still have that, but now combine it with many other, more modern tools.


Figure 5. Example of a clique: tools that all specifically co-occur with each other

Clusters and cliques: not the same
It’s important to realize the difference between the two approaches described above. While the clustering algorithm considers similarity in patterns of co-occurrences between tools, the clique approach identifies closely linked groups of tools that can, however, each also co-occur with other tools in workflows.

In other words, tools/platform that are clustered together occur in similar workflows, but do not necessarily all specifically occur together (see the presence of white and red squares in Figure 4A,B). Conversely, tools that do all specifically occur together, and thus form a clique, can appear in different clusters, as each can have a different pattern of co-occurrences with other tools (compare Figures 3/5).

In addition, it is worth noting that these approaches to identifying workflows are based on statistical analysis of aggregated data – thus, clusters or cliques do not necessarily have an exact match with individual workflows of survey respondents. We are thus not describing actual observed patterns, but inferring patterns based on observed strong correlations of pairs of tools/platforms.

Characterizing workflows further – next steps
Our current analyses of tool combinations and workflows are based on survey answers from all participants, for the 119 preset tools in our survey. We would like to extend these analyses to include the tools most often mentioned by participants as ‘others’. We also want to focus on differences and similarities between workflows of specific subgroups (e.g. different disciplines, research roles and/or countries). The demographic variables in our public dataset (on Zenodo or Kaggle) allow for such breakdowns, but it would require coding an R script to generate the co-occurrence probabilities for different subgroups; a minimal sketch of that idea is shown below. And finally, we can add variables to the tools, for instance classifying which tools support open research practices and which don’t. This then allows us to investigate to what extent full Open Science workflows are not only theoretically possible, but already put into practice by researchers.
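Purely as a sketch (made-up data and an assumed structure; the public dataset's actual column names will differ), the subgroup idea amounts to filtering respondents before rerunning the pairwise test:

```r
# Hypothetical usage matrix and discipline vector; restrict to one subgroup,
# then apply the same pairwise co-occurrence test as before.
set.seed(3)
responses  <- matrix(runif(2000 * 2) < 0.3, nrow = 2000,
                     dimnames = list(NULL, c("GitHub", "Zenodo")))
discipline <- sample(c("Life Sciences", "Other"), 2000, replace = TRUE)

life_sci <- responses[discipline == "Life Sciences", ]              # subgroup only
fisher.test(table(life_sci[, "GitHub"], life_sci[, "Zenodo"]))      # same test as before
```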

See also our short video, added below:

header image: Turquoise Beads, Circe Denyer, CC0, PublicDomainPictures.net

GitHub and more: sharing data & code

A recent Nature News article ‘Democratic databases: Science on GitHub‘ discussed GitHub and other programs used for sharing code and data. As a measure of GitHub’s popularity, Nature News looked at citations of GitHub repositories in research papers from various disciplines (source: Scopus). The article also mentioned BitBucket, Figshare and Zenodo as alternative tools for data and code sharing, but did not analyze their ‘market share’ in the same way.

Our survey on scholarly communication tools asked a question about tools used for archiving and sharing data & code, and included GitHub, Figshare, Zenodo and Bitbucket among the preselected answer options (Figure 1). Thus, our results can provide another measurement of the use of these online platforms for sharing data and code.


Figure 1 – Survey question on archiving and sharing data & code

Open Science – in word or deed

Perhaps the most striking result is that of the 14,896 researchers among our 20,663 respondents (counting PhD students, postdocs and faculty), only 4,358 (29.3%) reported using any tools for archiving/sharing data. Tellingly, of the 13,872 researchers who answered the question ‘Do you support the goals of Open Science’ (defined in the survey as ‘openly creating, sharing and assessing research, wherever viable’), 80.0% said ‘yes’. Clearly, for open science, support in theory and adoption in practice are still quite far apart, at least as far as sharing data is concerned.


Figure 2 – Support for Open Science among researchers in our survey

Among those researchers that do archive and share data, GitHub is indeed the most often used, but just as many people indicate using ‘others’ (i.e. tools not mentioned as one of the preselected options). Figshare comes in third, followed by Bitbucket, Dryad, Dataverse, Zenodo and Pangaea (Figure 3).


Figure 3 – Survey results: tools used for archiving and sharing data & code

Among ‘others’, the most often mentioned tool was Dropbox (mentioned by 496 researchers), with other tools trailing far behind. Unfortunately, the survey setup invalidates direct comparison of the number of responses for preset tools and tools mentioned as ‘others’ (see: Data are out. Start analyzing. But beware). Thus, we cannot say whether Dropbox is used more or less than GitHub, for example, only that it is the most often mentioned ‘other’ tool.

Disciplinary differences

As mentioned above, 29.3% of researchers in our survey reported engaging in the activity of archiving and sharing code/data. Are there disciplinary differences in this percentage? We explored this earlier in our post ‘The number games‘. We found that researchers in engineering & technology are the most inclined to archive/share data or code, followed by those in the physical and life sciences. Medicine, social sciences and humanities are lagging behind at more or less comparable levels (Figure 4). But it is also clear that in all disciplines, archiving/sharing data or code is an activity that only a minority of researchers engage in.


Figure 4 – Share of researchers archiving/sharing data & code

Do researchers from different disciplines use different tools for archiving and sharing code & data? Our data suggest that they do (Table 1, data here). Percentages given are the share of researchers (from a given discipline) that indicate using a certain tool. For this analysis, we looked at the population of researchers (n=4,358) that indicated using at least one tool for archiving/sharing data (see also figure 4). As multiple answers were allowed for disciplines as well as tools used, percentages do not add up to 100%.
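As a sketch of the kind of calculation behind Table 1 (hypothetical long-format data with assumed column names respondent, discipline and tool; the public dataset's layout may differ):

```r
# Among researchers using at least one archiving/sharing tool, compute the share per
# discipline that ticked each tool. Multiple disciplines and tools per respondent are
# allowed, so rows do not sum to 100%.
library(tibble)
library(dplyr)
library(tidyr)

usage <- tribble(
  ~respondent, ~discipline,         ~tool,
  1,           "Life Sciences",     "GitHub",
  1,           "Life Sciences",     "Figshare",
  2,           "Physical Sciences", "GitHub",
  2,           "Physical Sciences", "Bitbucket",
  3,           "Medicine",          "Dataverse"
)

usage %>%
  group_by(discipline) %>%
  mutate(n_discipline = n_distinct(respondent)) %>%           # researchers per discipline
  group_by(discipline, tool, n_discipline) %>%
  summarise(n_tool = n_distinct(respondent), .groups = "drop") %>%
  mutate(share = n_tool / n_discipline) %>%
  select(discipline, tool, share) %>%
  pivot_wider(names_from = tool, values_from = share, values_fill = 0)
```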

While it may be no surprise that researchers from Physical Sciences and Engineering & Technology are the most dominant GitHub users (and also the main users of BitBucket), GitHub use is strong across most disciplines. Figshare and Dryad are predominantly used in Life Sciences, which may partly be explained by the coupling of these repositories to journals in this domain (i.e. PLOS to Figshare and Gigascience, along with many others, to Dryad).


Table 1: specific tool usage for sharing data & code across disciplines

As a more surprising finding, Dataverse seems to be adopted by some disciplines more than others. This might be due to the fact that there is often institutional support from librarians and administrative staff for Dataverse (which was developed by Harvard and is in use at many universities). This might increase use by people who have somewhat less affinity with ‘do-it-yourself’ solutions like GitHub or Figshare. An additional reason, especially for Medicine, could be the possibility of private archiving of data in Dataverse, with control over whom to give access. This is often an important consideration when dealing with potentially sensitive and confidential patient data.

Another surprising finding is the overall low use of Zenodo – a CERN-hosted repository that is the recommended archiving and sharing solution for data from EU-projects and -institutions. The fact that Zenodo is a data-sharing platform that is available to anyone (thus not just for EU project data) might not be widely known yet.

A final interesting observation, which might go against common assumptions, is that among researchers in Arts & Humanities who archive and share code, use of these specific tools is not lower than in Social Sciences and Medicine. In some cases, it is even higher.

A more detailed breakdown, e.g. across research role (PhD student, postdoc or faculty), year of first publication or country is possible using the publicly available survey data.

Support for Open Science in EU member states

In preparation for the EU Open Science Conference on April 4-5 in Amsterdam, we looked at what our survey data reveal about declared support for Open Access and Open Science among researchers in the EU.

Support for Open Access and Open Science

Of the 20,663 survey respondents, 10,297 were from the EU, of which 7,358 were researchers (from PhD-students to faculty). Most respondents provided an answer to the two multiple-choice questions on whether or not they support the goals of Open Access and Open Science, respectively. A large majority expressed support for Open Access (87%) and Open Science (79%) (see Fig 1).


Fig. 1 Responses from EU researchers to survey questions on support for Open Access and Open Science

Even though support for Open Science is lower than for Open Access, this does not mean that many more people actively state they do NOT support Open Science, as compared to Open Access (see Fig 1). Rather, more people indicate ‘I don’t know’ in answer to the question on Open Science. This could mean they have not yet reached an opinion on Open Science, that they perhaps support some aspects of Open Science and not others, or simply that they found the wording of the question confusing.

It is interesting to note that the Open Access support figure roughly corresponds with results from the Taylor & Francis Open Access surveys of 2013 and 2014, in which only 16 and 11 percent of respondents, respectively, agreed with the statement that there are no fundamental benefits to Open Access publication.


Differences between member states

When we look at the differences in professed support for Open Access and Open Science in the various EU member states (see Fig 2, Table 1), we see that support for Open Access is relatively high in many Western European countries. Here, more funding opportunities for Open Access are often available, either through institutional funds or increasingly through negotiations with publishers, where APCs are included in institutional subscriptions for hybrid Open Access journals. Perhaps many researchers in Southern and Eastern member states associate Open Access with either expensive APCs or with “free” or nationally oriented journals they wish to avoid because they are required to publish in “international, highly ranked” venues.

Conversely, support for Open Science is higher in many countries in Southern and Eastern Europe. As pure conjecture, we might suggest that in these regions, with sometimes less developed research infrastructures, the benefits of Open Science, e.g. for collaboration, are more apparent. The observed outliers to this general pattern (e.g. Belgium and Italy) illustrate both the limitations of these survey data (number of responses and possible bias) and the fact that the whole picture is likely to be more complicated.


Fig. 2 Level of support for Open Access (left panel) and Open Science (right panel) in individual EU member states. Scale is based on non-weighted country averages. Results for states with fewer than 20 individual responses are omitted (see Table 1).

In general, the above differences between member states come into even clearer focus when support for Open Science is compared to that for Open Access for each country. Fig 3 shows whether support for Open Science in a given country is higher or lower than for Open Access. Again, in most Western European countries Open Access is easily embraced, while Open Science, perhaps because it goes further and is a more recent development, meets more doubt or even resistance. In many Southern and Eastern European countries, the pattern is reversed. Clearly though, this cannot be the full story. Finding out what is behind these differences may valuably inform discussions on how to proceed with Open Access/Open Science policies and implementation.


Fig. 3 Ratio of support for Open Science (OS) and Open Access (OA) in individual EU member states (red = relatively more support for OA than for OS, green = relatively more support for OS than OA). Scale is based on non-weighted country ratios. Results for states with fewer than 20 individual responses were omitted (see Table 1).

Irrespective of differences between countries, the large majority support for Open Access as well as Open Science among European researchers is perhaps the most striking result. Of course, support does not automatically imply that one puts ideas into practice. It will therefore be interesting to look at the actual research workflows of the researchers who took our survey, to see how far their practices align with their stated support for Open Access and Open Science. Also, since our survey used a self-selected sample (though distribution was very broad), care should be taken in the interpretation of the results, as they might be influenced by self-selection bias.

Data

The aggregated data underlying this post are shown in Table 1. For this analysis, we did not yet look at differences between scientific disciplines or career stage. Full (anonymized) data on this and all other survey questions will be made public on April 15th.

Country | Do you support the goal of Open Access? (Yes / No / I don’t know / # responses) | Do you support the goals of Open Science? (Yes / No / I don’t know / # responses)
Austria 95% 2% 3% 60 83% 3% 14% 66
Belgium 89% 5% 6% 103 88% 3% 9% 102
Bulgaria 81% 14% 5% 21 72% 0% 28% 18
Croatia 85% 12% 3% 33 94% 0% 6% 31
Cyprus 69% 8% 23% 13 69% 8% 23% 13
Czech Republic 73% 13% 13% 75 69% 13% 18% 78
Denmark 90% 1% 9% 80 84% 0% 16% 82
Estonia 85% 8% 8% 13 92% 8% 0% 13
Finland 84% 4% 12% 92 83% 3% 14% 95
France 87% 5% 8% 686 79% 5% 16% 699
Germany 87% 3% 9% 1165 76% 7% 18% 1179
Greece 81% 7% 12% 214 85% 4% 12% 222
Hungary 89% 9% 2% 45 83% 10% 7% 41
Ireland 81% 5% 15% 62 82% 5% 13% 62
Italy 79% 7% 14% 407 77% 4% 18% 413
Latvia 86% 0% 14% 7 83% 0% 17% 6
Lithuania 88% 0% 13% 8 75% 13% 13% 8
Luxembourg 86% 0% 14% 7 57% 0% 43% 7
Malta 100% 0% 0% 8 75% 0% 25% 8
Netherlands 89% 2% 9% 1610 75% 5% 20% 1627
Poland 86% 7% 7% 85 88% 5% 7% 83
Portugal 88% 5% 8% 129 84% 5% 11% 133
Romania 80% 5% 15% 82 85% 5% 10% 82
Slovakia 70% 5% 25% 20 82% 6% 12% 17
Slovenia 96% 0% 4% 27 96% 0% 4% 28
Spain 87% 3% 10% 537 88% 2% 10% 542
Sweden 90% 3% 6% 146 76% 6% 19% 145
United Kingdom 88% 3% 9% 1113 79% 4% 17% 1123
Total 87% 4% 9% 6848 79% 5% 17% 6923

Table 1 Aggregated data on support of Open Access and Open Science per EU member state.
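For reference, a small worked example (our own sketch, using the percentages from Table 1 above) of the ratio plotted in Fig 3: the share of researchers supporting Open Science divided by the share supporting Open Access, per country.

```r
# OS/OA support ratio behind Fig 3, computed from Table 1 percentages.
# Values above 1 indicate relatively more support for Open Science than Open Access.
support <- data.frame(
  country = c("Germany", "Netherlands", "Croatia", "Spain"),
  oa_yes  = c(0.87, 0.89, 0.85, 0.87),   # share answering 'yes' on Open Access
  os_yes  = c(0.76, 0.75, 0.94, 0.88)    # share answering 'yes' on Open Science
)
support$os_oa_ratio <- round(support$os_yes / support$oa_yes, 2)
support
# Germany ~0.87, Netherlands ~0.84, Croatia ~1.11, Spain ~1.01
```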