Eight routes towards Plan S compliance

by Jeroen Bosman & Bianca Kramer

Plan S

Much has already been said and written about Plan S, the initiative of a group of European research funders to drastically increase and speed up the transition to full open access. Instead of adding to that with statements on whether it is a good idea or on which elements we like and which we do not like, here we present and dissect eight possible routes towards compliance. For each of those routes, the scheme shows examples (please treat them as such), assessments of the effects on various stakeholders and on overall cost, and also whether the route aligns with expected changes in the evaluation system.

The routes

In our view it is useful to discern 5 potential gold routes and 3 potential green routes.

  1. Using existing or new APC-based gold journals / platforms.
  2. Using existing or new non-APC-based gold journals / platforms (a.k.a. diamond).
  3. Flipping journals to an APC-based gold model, by publishers or by editors taking the journal with them.
  4. ‘Soft-flipping’ journals to APC gold (leaving the subscription/hybrid journal intact): this means creating an APC-based full OA sister journal with the same scope, editors, policies, etc.
  5. Flipping journals to non-APC-based gold (diamond), by publishers or editors.
  6. Archiving the publisher version, on publication, with copyright retained and an open license.
  7. Archiving the accepted author manuscript, on publication, with copyright retained and an open license.
  8. Sharing preprints (e.g. in dedicated preprint archives) and using overlay journals for peer review.

Discuss

We hope this is valuable in supporting discussions or that it will at least provoke some comments. For the latter you can either use the comments function below, use Hypothesis or use the Google Slides version of the scheme.

The scheme

Scheme with characteristics of eight routes towards Plan S compliance

Plan S – response to alternatives proposed by Kamerlin et al.

The recent substantial critique* of Plan S by a group of mainly chemistry researchers has garnered a lot of discussion on Twitter and in blogposts (e.g. Plan S, Antwort auf die Kritik), mostly around the risks the authors associate with the implementation of Plan S in its current form. The authors, in their well-thought-out piece, also include four solutions as alternatives to Plan S, and these have as yet, to our knowledge, not been given as much attention as they deserve. To further the healthy debate around both Plan S and alternative (existing) options for open access, we hereby provide our point-by-point response to the four scenarios sketched by the authors (quoted below in italics) and how we feel they relate to the goals and methods proposed in Plan S.

(1) One possible solution would be to convince all subscription (TA) journals to make all papers fully OA after an embargo period of 6-12 months, without APCs. In this environment, libraries would still buy subscriptions to allow scientists to catch up with the most recent developments, and the broader public would have access to all research without a paywall (but with a slight delay). While this plan does not provide immediate access to everyone, it is a safe and easy solution that would be beneficial for most stakeholders. Under this model, most publications would be read by scientists in the first 6-12 months after publication, and after the embargo period is over, no further costs should be accrued to access a scientific paper. In a modification of Plan S, rather than an indiscriminate blanket ban on all non-pure Gold OA journals, it would then be possible to exclude any (non-society) journals that won’t accept this policy from the list of ‘allowed’ journals. This will likely still result in some journals being excluded as possible publication venues, but is a smaller infringement on academic freedom, and could become an acceptable situation for most researchers and a model to which any journal can easily adapt without compromising on quality. We note that according to Robert-Jan Smits, the European Commission’s Open Access Envoy, even an embargo period of 6-12 months is “unacceptable”, but he does not explain why exactly that should be the case. Very recently, Belgium accepted a new law following this exact 6-12 month embargo model. This embargo period is intended to “give authors the chance to publish their papers in renowned journals, and prevents that publishers are damaged by a loss in income from subscriptions’, as is the opinion of Peeters’ cabinet.”

This option is currently offered by a number of journals/publishers, and is often referred to as delayed OA. While this would indeed be an option that would not disrupt the current reputation-driven publication system (the disruption of which is arguably one of the goals of Plan S), it also has several issues:

1) By limiting immediate access to subscribers, it would restrict such access to those researchers (typically from richer institutions) that can afford subscriptions, excluding researchers from other institutions, non-affiliated researchers, members of society, NGOs, small and medium (and large) companies, start-ups and non-profits from immediate access to scientific and scholarly findings and the benefits flowing from them. Thus, this is arguably not an optimal solution for most stakeholders.

2) Currently, most delayed open access models do not include an open license for the publications involved, making this a read-only model rather than a true open access model that enables access as well as re-use.

3) Currently, as far as we know, publishers making journals available free to read after a number of months or years do not guarantee in any way that they will remain available. If the journal is sold to another publisher, volumes may become unavailable again.

4) This does not solve the problem of currently unsustainable subscription prices, one of the very reasons for the push for OA.

NB1 The law recently approved in Belgium deals with authors’ right to archive and share the manuscript of a publication after a 6-12 month embargo, e.g. in a repository, not with the publisher making closed publications open on the publisher platform. It therefore more closely relates to solution 2 proposed by the authors (see below).

NB2 It is unclear why the authors seem to argue that society journals should be exempted from this model (“it would then be possible to exclude any (non-society) journals that won’t accept this policy”).

 

(2) Another model, which can be implemented in conjunction with point (1), is a mandate on depositing preprints in appropriate online repositories (Green OA), similar to the Open Access requirements of the US National Institutes of Health. This is the model frequently employed by scientists to meet funders’ Open Access requirements. These are then easily searchable using a range of search tools, including (but not limited to), most easily, Google Scholar. This is a solution with great benefits to the reader and limited risks to the author, as it allows for rapid early-stage dissemination of research, the provision of real time feedback to the authors, while opening up research to the scientific community and general public much faster than waiting for the very long publication time scales inherent to some journals. (…)

There seems to be a misunderstanding here about the difference between preprints and the deposition of published articles (either the publisher version or the author version after acceptance by the publisher). The OA requirements of NIH and many other funders concern the latter (e.g. through deposition in PubMed Central). While this model has indeed resulted in a large proportion of publications from NIH (as well as, for instance, the Wellcome Trust) being OA, where an embargo is involved (such as with NIH) it has the same drawbacks regarding non-immediate access as discussed above for scenario 1. As with scenario 1, it also does not provide incentives for publishers to change their publication model, nor for funders, institutions and researchers to change the reputation-driven publication system.

NB The further benefits discussed in this scenario (early-stage dissemination, real-time feedback, circumventing long publication time scales) are benefits that are associated with preprints. Additional benefits of this model include a demonstrable trace of the scholarly record (e.g. being able to see changes made in an article as the result of peer review and community feedback).

(3) We note here also that more and more reputable publishers are now adding high quality open access publications to their repertoire of journals. In particular, we encourage fully open access journals published by scientific societies. A brief (but by no means exclusive) list of examples of such journals include ACS Central Science, ACS Omega, Chemical Science, RSC Advances, the Royal Society journals Open Biology and Open Science, IUCrJ and eLife, among others. A move to a fully open access landscape is clearly going to become much easier when there are more journals that can guarantee the same level of quality control and sustainability as current reputable subscription journals, as venues to disseminate one’s work. It may be a slower transition, but making this transition in an ecosystem that supports it does not infringe on academic freedom as Plan S does. Clearly, the overall march towards Open Knowledge Practices seems inevitable, as well as desirable, as researcher consciousness about the means of research dissemination, the possibilities, and the important ethical issues surrounding closed science increases. We must be careful to encourage this march in a way that does not replace one problem with another.

The increase in the number of good quality open access venues (from commercial and non-profit publishers, as well as from scholarly societies) is fully in line with what Plan S aims to stimulate. While there are clearly different opinions on the ways in which this development is best stimulated, there appears to be no difference in opinion as to the benefit of having a wide array of qualitatively good full OA publication options. It is encouraging to see that the authors include in their examples journals for multiple disciplines that do not claim to be selective based on perceived impact, but judge research on the basis of soundness (like ACS Omega and Royal Society Open Science), indicating that they do not equate quality with selectivity per se. It should also be noted that Plan S includes the commitment of funders to apply rigorous criteria as to the quality of full OA publication venues, although the exact nature of these criteria remains to be decided. Plan S also wants to cap APCs. Though the level of that cap is as yet unknown, it will probably lie below the highest APCs currently charged by full OA journals. It is interesting to see that the examples given have APCs ranging from 0 to 2500 USD.

Finally, the debate about Open Access, and APC, ignores the Diamond (also known as Platinum) model of OA publication. Diamond publication is a fully sponsored mode of publication, in which neither author nor publisher pays, but rather, the journals are funded by a third party sponsor. An example of Diamond OA is provided by the Beilstein Journals, all publications for which are covered by the non-profit Beilstein Institute in Germany. Similarly, there is no fee for publication in ACS Central Science, and all publication costs are covered by the American Chemical Society. It is important to ensure the moral and ethical integrity of that sponsor. But, when performed in an ethically uncompromised framework, this would be an ideal model for publications by scientific societies, whose journals could then either be sponsored by funders and other donors. In such a framework, rather than simply transferring costs from readers to authors, while allowing questionable journals to flourish and exploit APC, quality control can be ensured by financially supporting high quality not-for-profit publications. Would this not be a much braver step for European and National funders to mandate, than a push for pure Gold OA?

Plan S explicitly does not state a preference for an author-paid APC model. Other forms of pure gold OA, like indeed diamond and platinum OA, are fully in line with Plan S. That diamond OA would not be compliant is thus a misunderstanding. Depending on the implementation, the stated intent of funders to “provide incentives to establish and support full gold OA versions where appropriate” might also take the shape of enabling diamond/platinum models. One possible model for this would be the announced plans for a publication platform financed by the EC that will require no APCs from authors or institutions.

Overall, the four solutions proposed by the authors all represent tried-and-tested approaches that are practiced in various settings, and all provide valuable contributions to progress in open access (or in some cases, free-to-read access) to research articles. Two of them (3 and 4) are, as models, fully in line with Plan S. The other two (1 and 2) facilitate access but fall short of the ambitions of Plan S to not only provide immediate open access to research articles, but also to stimulate a shift in publishing away from a subscription-based journal system. Whether those ambitions and their proposed implementation are deemed too risky, too forceful and/or too limited in geographical scope to be beneficial to research and researchers remains a topic of debate even (or perhaps especially) among proponents of open research practices, which include both the original authors and ourselves.

Bianca Kramer (@MsPhelps) and Jeroen Bosman (@jeroenbosman)
Utrecht University Library

Bianca Kramer is currently also a member of the EC Expert Group ‘Future of Scholarly Publishing and Scholarly Communication’.

 

*The piece is also published as part of a post on the For Better Science blog.

Linking impact factor to ‘open access’ charges creates more inequality in academic publishing

[this piece was first published on May 16, 2018 on the site of Times Higher Education under a CC-BY license]

The prospectus SpringerNature released on April 25* in preparation for its intended stock market listing provides a unique view into what the publisher thinks are the strengths of its business model and where it sees opportunities to exploit them, including its strategy on open access publishing. Whether the ultimate withdrawal of the IPO reflected investors’ doubt about the presented business strategies, or whether SpringerNature’s existing debts were deemed to be too great a risk, the prospectus has nonetheless given the scholarly community an insight into the publisher’s motivations in supporting and facilitating open access.

In the document, aimed at potential shareholders, the company outlines how it stands to profit from APC (article processing charge)-based gold open access in an otherwise traditional publishing system that remains focused on high-impact factor journals. From this perspective, a market with high barriers to entry for new players is a desirable situation. Any calls for transparency of contracts, legislation against exclusive ownership of content by publishers, public discussion on pricing models and a move towards broader assessment criteria – beyond impact factors – are all seen as a threat to the company’s profits. Whether this position also benefits the global research community is a question worth asking.

The open access market is seen by SpringerNature as differentiated by impact factor, making it possible to charge much higher APCs for publishing open access in high impact factor journals. Quite revealing is that on page 99 of the prospectus, SpringerNature aims to exploit the situation to increase prices: “We also aim at increasing APCs by increasing the value we offer to authors through improving the impact factor and reputation of our existing journals.”

First, this goes to show that APCs are paid not just to cover processing costs but to buy standing for a researcher’s article (if accepted). This is not new: other traditional publishers such as Elsevier, and even pure open access publishers such as PLoS and Frontiers, tier their market and ask higher APCs for their more selective journals.

Second, this prospectus section shows that SpringerNature interprets impact factors and journal brands as what makes a journal valuable to authors and justifies high APCs – and not aspects such as quality and speed of peer review, manuscript formatting, or functionality and performance of the publishing platform.

Third, and most striking, is the deliberate strategy to raise APCs by securing and increasing impact factors of journals. SpringerNature admits it depends on impact factor thinking among researchers and seeks to exploit it.

The explicit aim to exploit impact factors and the presumed dependence of researchers on journal reputation is in sharp contrast with SpringerNature (to be precise, BioMed Central, SpringerOpen and Nature Research) having signed the San Francisco Declaration on Research Assessment (DORA). By signing, these SpringerNature organisations agree with the need to “greatly reduce emphasis on the journal impact factor as a promotional tool”, as the declaration states.

Additionally, in their 2016 editorial, “Time to remodel the journal impact factor” the editors of SpringerNature’s flagship journal Nature wrote: “These [impact factor] shortcomings are well known, but that has not prevented scientists, funders and universities from overly relying on impact factors, or publishers (Nature’s included, in the past) from excessively promoting them. As a result, researchers use the impact factor to help them decide which journals to submit to – to an extent that is undermining good science.”

The information revealed through the prospectus now raises the question whether signing DORA and the Nature editorial statements were in effect merely lip service to appease those worried by the toxic effects of impact factor thinking, or whether they have real value and drive policy decisions by journal and publisher leadership. It could be argued that commercial publishers are foremost responsible for their financial bottom line, and that if enough researchers (or their institutions or funders) are willing and able to pay higher APCs for high impact factor journals, then that is a valid business model.

However, scientific publishers do not simply “follow the market”. For better or for worse, their business models influence the way academic research is prioritised, disseminated and evaluated. High APCs make it harder for researchers without substantial funds (eg, researchers from middle- and low-income countries, unaffiliated researchers and citizen scientists) to publish their research (or require a dependency on waivers), and a continued push for publishing in high impact factor journals by publishers, researchers and funders/institutions alike hampers developments towards more rigorous, relevant and equitable research communication.

How do we break out of this? It is promising to see initiatives from publishers and funders/institutions such as registered reports (where a decision to publish is made on the basis of the research proposal and methodology, independent of the results), the TOP guidelines that promote transparency and openness in published research, and moves towards more comprehensive assessment of quality of research by institutions and funders, as highlighted on the DORA website.

This will all help researchers do better research that is accessible and useful to as many people as possible, as might alternative publishing options coming from researchers, funders and institutions. Simply adding an “open access” option to the existing prestige-based journal system at ever increasing costs, however, will only serve to increase the profit margin of traditional publishers without contributing to more fundamental change in the way research is done and evaluated.

Jeroen Bosman (@jeroenbosman) and Bianca Kramer (@MsPhelps)
Utrecht University Library

* The prospectus has since been taken offline. We secured an offline copy for verification purposes, but unfortunately cannot share this copy publicly.

Gates Foundation and AAAS – comments

Last week, Nature News reported on the termination of the open access agreement between the Gates Foundation and AAAS. Under this agreement, which lasted 18 months, the Gates Foundation paid a lump sum to have papers by their funded authors in AAAS journals (including Science) published open access, in compliance with the immediate open access mandate of the Gates Foundation.

In preparation for the Nature News article, its author, Richard van Noorden, asked us for our comments on these developments and used a quote from them in the final article.

For transparency reasons, and because we feel this is an important issue, we share our full comments below.

What would be interesting to know is the reason the agreement was not renewed – whether it was due to an inability to reach an agreement over costs for this specific arrangement, or whether it signifies a wish by either AAAS or the Gates foundation to switch gears.

There are two main reasons why we are glad this agreement has been discontinued:

  • The Gates Foundation is very likely paying exorbitant APCs per paper given the lump sum and the number of papers made OA, simply to be able to publish OA and gain the reputation these journals convey, while leaving everything else as it is, especially the reward system.
  • The agreement set a bad example by showing that everything can be had if you just throw enough money at it, while many researchers, institutions and countries are struggling to provide immediate open access at current prices.

It has probably done next to nothing among the glam journal publishers to raise awareness of the importance of switching to open access, and it hasn’t paved the way for researchers and institutions lacking the type of resources that the Gates Foundation has.

Recently, we have been seeing more activity from funders indicating a desire to better control the costs associated with funding open access publications. Wellcome is reviewing its open access mandate (also in light of the large difference in APCs between hybrid and full gold OA journals), the European Union has been floating the idea of no longer funding hybrid open access publications (in the Impact Assessment (part 2, page 107) for Horizon Europe), and Robert-Jan Smits, Open Access Envoy of the EC, is exploring an agreement with European national funders to require grantees to publish in open access journals, control APC costs and stimulate the flipping of subscription and hybrid journals to full OA (as presented July 11 at ESOF; see the video recording and the accompanying press release by Science Europe).

Perhaps the most interesting aspect of these developments is whether they can be accompanied by a shift in evaluation criteria (by funders, governments and institutions) that will lessen the stranglehold of high-impact journals such as Nature, Science, NEJM and PNAS, a stranglehold that enables them to negotiate agreements such as the one described for AAAS and Gates at such high costs, or to resist making agreements over OA publishing altogether.

So it will be interesting to see whether Gates will continue to enforce its open access policy unchanged despite the termination of the agreement with AAAS/Science. If funders are steadfast in their determination to move OA forward and are able to contribute to a change in evaluation criteria, we might yet see a true watershed moment in scholarly publication.

There is one additional aspect of current initiatives by funders to exert influence over OA publishing and its accompanying costs, and that is of course the funder publishing platforms as implemented by (among others) Gates and Wellcome and as put forward for tender by the EC. These could be seen as a strategic option for funders to stimulate OA under conditions they can control, and indirectly also as a lever to promote more systemic change in OA publishing.

Jeroen Bosman and Bianca Kramer

 
Stringing beads: from tool combinations to workflows

[update 20170820: the interactive online table now includes the 7 most often mentioned ‘other’ tools for each question, next to the 7 preset choices. See also heatmap, values and calculations for this dataset]

With the data from our global survey of scholarly communication tool usage, we want to work towards identifying and characterizing full research workflows (from discovery to assessment).

Previously, we explained the methodology we used to assess which tool combinations occur together in research workflows more often than would be expected by chance. How can the results (heatmap, values and calculations) be used to identify real-life research workflows? Which tools really love each other, and what does that mean for the way researchers (can) work?

Comparing co-occurrences for different tools/platforms
First of all, it is interesting to compare the sets of tools that are specifically used together (or not used together) with different tools/platforms. To make this easier, we have constructed an interactive online table (http://tinyurl.com/toolcombinations, with a colour-blind safe version available at http://tinyurl.com/toolcombinations-cb) that allows anyone to select a specific tool and see those combinations. For instance, comparing tools specifically used by people publishing in journals from open access publishers vs. traditional publishers (Figures 1 and 2) reveals interesting patterns.

For example, while publishing in open access journals is correlated with the use of several repositories and preprint servers (institutional repositories, PubMedCentral and bioRxiv, specifically), publishing in traditional journals is not. The one exception here is sharing publications through ResearchGate, an activity that seems to be positively correlated with publishing regardless of venue.

Another interesting finding is that while both people who publish in open access and traditional journals specifically use the impact factor and Web of Science to measure impact (again, this may be correlated with the activity of publishing, regardless of venue), altmetrics tools/platforms are used specifically by people publishing in open access journals. There is even a negative correlation between the use of Altmetric and ImpactStory and publishing in traditional journals.

Such results can also be interesting for tool/platform providers, as they provide information on the other tools/platforms their users employ. In addition to the data on tools specifically used together, providers could also use absolute numbers on tool usage to identify tools that are popular, but not (yet) specifically used with their own tool/platform. This could identify opportunities to improve interoperability and integration of their own tool with other tools/platforms. All data are of course fully open and available for any party to analyze and use.


Figure 1. Tool combinations – Topical journal (Open Access publisher)


Figure 2. Tool combinations – Topical journal (traditional publisher)

Towards identifying workflows: clusters and cliques
The examples above show that, although we only analyzed combinations of any two tools/platforms so far, these data already bring to light some interesting differences between research workflows. There are several possibilities to extend this analysis from separate tool combinations into groups of tools typifying full research workflows. Two of these possibilities are looking at clusters and cliques, respectively.

1. Clusters: tools occurring in similar workflows
Based on our co-occurrence data, we can look at which tools occur in similar workflows, i.e. have the most tools in common that they are or are not specifically used with. This can be done in R using a clustering analysis script provided by Bastian Greshake (see GitHub repo with code, source data and output). When run with our co-occurrence data, the script basically sorts the original heatmap with green and red cells by placing tools that have a similar pattern of correlation with other tools closer together (Figure 3). The tree structure on both sides of the diagram indicates the hierarchy of tools that are most similar in this respect.


Figure 3. Cluster analysis of tool usage across workflows (click on image for larger version). Blue squares A and B indicate clusters highlighted in Figure 4. A color-blind safe version of this figure can be found here.

Although the similarities (indicated by the length of the branches in the hierarchy tree, with shorter lengths signifying closer resemblance) are not that strong, some clusters can still be identified. For example, one cluster contains popular, mostly traditional tools (Figure 4A) and another cluster contains mostly innovative/experimental tools that apparently occur in similar workflows together (Figure 4B).


Figure 4. Two examples of clusters of tools (both clusters are highlighted in blue in Figure 3).
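
As an illustration (and not the original script by Bastian Greshake linked above), a minimal sketch of this kind of clustering in R could look as follows, assuming a symmetric tools-by-tools matrix coded as +1 (used together more often than expected), -1 (less often) and 0 (no significant relation):

# Minimal sketch, not the original analysis script: cluster tools by the
# similarity of their co-occurrence patterns and draw a reordered heatmap.
cluster_tools <- function(cooc) {
  d  <- dist(cooc)                       # distance between co-occurrence profiles
  hc <- hclust(d, method = "complete")   # hierarchical clustering of the tools
  heatmap(cooc,                          # heatmap reordered by the clustering
          Rowv = as.dendrogram(hc), Colv = as.dendrogram(hc),
          scale = "none", col = c("red", "white", "green"))
  hc                                     # return the hierarchy (dendrogram)
}

# Toy example with five hypothetical tools:
set.seed(1)
m <- matrix(sample(c(-1, 0, 1), 25, replace = TRUE), 5, 5,
            dimnames = list(paste0("tool", 1:5), paste0("tool", 1:5)))
m[lower.tri(m)] <- t(m)[lower.tri(m)]    # make the matrix symmetric
cluster_tools(m)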

2. Cliques: tools that are linked together as a group
Another approach to defining workflows is to identify groups of tools that are all specifically used with *all* other tools in that group. In network theory, such groups are called ‘cliques’. Luckily, there is a good R library (igraph) for identifying cliques from co-occurrence data. Using this library (see GitHub repo with code, source data and output) we found that the largest cliques in our set of tools consist of 17 tools. We identified 8 of these cliques, which are partially overlapping. In total, there are over 3000 ‘maximal cliques’ (cliques that cannot be enlarged) in our dataset of 119 preset tools, varying in size from 3 to 17 tools. So there is lots to analyze!

An example of one of the largest cliques is shown in Figure 5. This example shows a workflow with mostly modern and innovative tools, with an emphasis on open science (collaborative writing, sharing data, publishing open access, measuring broader impact with altmetrics tools), but surprisingly, these tools are apparently also all used together with the more traditional ResearcherID. A hypothetical explanation might be that this represents the workflow of a subset of people actively aware of and involved in scholarly communication, who started using ResearcherID when there was not much else, still have that, but now combine it with many other, more modern tools.


Figure 5. Example of a clique: tools that all specifically co-occur with each other
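
For those who want to try this themselves, a minimal sketch using igraph in R (not a copy of the script in the GitHub repo above) could look like this, assuming a symmetric 0/1 matrix that marks which tool pairs are used together significantly more often than expected by chance:

# Minimal sketch, assuming 'sig' is a symmetric 0/1 matrix (tools x tools)
# with 1 where a pair of tools is co-used more often than expected by chance.
library(igraph)

# Toy 0/1 matrix for four hypothetical tools (replace with the real matrix):
sig <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0), 4, 4, byrow = TRUE,
              dimnames = rep(list(paste0("tool", 1:4)), 2))

g <- graph_from_adjacency_matrix(sig, mode = "undirected", diag = FALSE)

clique_num(g)                  # size of the largest clique(s)
largest_cliques(g)             # all cliques of that maximal size
count_max_cliques(g, min = 3)  # number of maximal cliques with at least 3 tools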

Clusters and cliques: not the same
It’s important to realize the difference between the two approaches described above. While the clustering algorithm considers similarity in patterns of co-occurrence between tools, the clique approach identifies closely linked groups of tools that can, however, each also co-occur with other tools in workflows.

In other words, tools/platform that are clustered together occur in similar workflows, but do not necessarily all specifically occur together (see the presence of white and red squares in Figure 4A,B). Conversely, tools that do all specifically occur together, and thus form a clique, can appear in different clusters, as each can have a different pattern of co-occurrences with other tools (compare Figures 3/5).

In addition, it is worth noting that these approaches to identifying workflows are based on statistical analysis of aggregated data – thus, clusters or cliques do not necessarily have an exact match with individual workflows of survey respondents. In other words, we are not describing actual observed workflows, but inferring patterns from observed strong correlations between pairs of tools/platforms.

Characterizing workflows further – next steps
Our current analyses of tool combinations and workflows are based on survey answers from all participants, for the 119 preset tools in our survey. We would like to extend these analyses to include the tools most often mentioned by participants as ‘others’. We also want to focus on differences and similarities between workflows of specific subgroups (e.g. different disciplines, research roles and/or countries). The demographic variables in our public dataset (on Zenodo or Kaggle) allow for such breakdowns, but it would require coding an R script to generate the co-occurrence probabilities for different subgroups (a minimal sketch is included below). And finally, we can add variables to the tools, for instance classifying which tools support open research practices and which don’t. This then allows us to investigate to what extent full Open Science workflows are not only theoretically possible, but already put into practice by researchers.
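
Purely as an illustration of what such a subgroup script could look like (the column names used here are hypothetical placeholders, not the actual field names in the public dataset), a minimal sketch in R, reusing the hypergeometric test explained previously:

# Hypothetical sketch; 'discipline' and the per-tool logical columns are
# placeholder names, not the actual column names of the public dataset.
# For a given subgroup, test whether tools A and B are co-used more often
# than expected by chance (hypergeometric test, see the methodology post).
cooccurrence_p <- function(df, tool_a, tool_b) {
  answered <- !is.na(df[[tool_a]]) & !is.na(df[[tool_b]])   # answered both questions
  d <- sum(answered)                                        # population size
  a <- sum(df[[tool_a]][answered])                          # users of tool A
  b <- sum(df[[tool_b]][answered])                          # users of tool B
  c <- sum(df[[tool_a]][answered] & df[[tool_b]][answered]) # users of both A and B
  phyper(c - 1, b, d - b, a, lower.tail = FALSE)            # P(co-use >= c)
}

# e.g. restrict to one (hypothetical) subgroup before testing:
# cooccurrence_p(subset(survey, discipline == "Life Sciences"),
#                "uses_github", "uses_zenodo")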

See also our short video, added below:

header image: Turquoise Beads, Circe Denyer, CC0, PublicDomainPictures.net

Academic social networks – the Swiss Army Knives of scholarly communication

On December 7, 2016, at the STM Innovations Seminar we gave a presentation (available from Figshare) on academic social networks. For this, we looked at the functionalities and usage of three of the major networks (ResearchGate, Mendeley and Academia.edu) and also offered some thoughts on the values and choices at play both in offering and using such platforms.

Functionalities of academic social networks
Academic social networks support activities across the research cycle, from getting job suggestions, sharing and reading full-text papers to following use of your research output within the system. We looked at detailed functionalities offered by ResearchGate, Mendeley and Academia.edu (Appendix 1) and mapped these against seven phases of the research workflow (Figure 1).

In total, we identified 170 functionalities, of which 17 were shared by all three platforms. The largest overlap between ResearchGate and Academia lies in functionalities for discovery and publication (a.o. sharing of papers), while for outreach and assessment, these two platforms have many functionalities that do not overlap. Some examples of unique functionalities include publication sessions (time-limited feedback sessions on one of your full-text papers) and making metrics public or private in Academia, and Q&As, ‘enhanced’ full-text views and downloads and the possibility to add additional resources to publications in ResearchGate. Mendeley is the only platform offering reference management and specific functionality for data storage, according to FAIR principles. A detailed list of all functionalities identified can be found in Appendix 1.


Figure 1. Overlap of functionalities of ResearchGate, Mendeley and Academia in seven phases of the research cycle

Within the seven phases of the research cycle depicted above, we identified 31 core research activities. If the functionalities of ResearchGate, Mendeley and Academia are mapped against these 31 activities (Figure 2), it becomes apparent that Mendeley offers the most complete support for discovery, while ResearchGate supports archiving/sharing of the widest spectrum of research output. All three platforms support outreach and assessment activities, including impact metrics.


Figure 2. Mapping of functionalities of ResearchGate, Mendeley and Academia against 31 activities across the research workflow

What’s missing?
Despite offering 170 distinct functionalities between them, there are still important functionalities that are missing from the three major academic social networks. For a large part, these center around integration with other platforms and services:

  • Connect to ORCID (only in Mendeley), import from ORCID
  • Show third party altmetrics
  • Export your publication list (only in Mendeley)
  • Automatically show and use clickable DOIs (only in Mendeley)
  • Automatically link to research output/object versions at initial publication platforms (only in Mendeley)

In addition, some research activities are underserved by the three major platforms. Most notably among these are activities in the analysis phase, where functionality to share notebooks and protocols might be a useful addition, as would text mining of full-text publications on the platform. And while Mendeley offers extensive reference management options, support for collaborative writing is currently not available on any of the three platforms.

If you build it, will they come?
Providers of academic social networks clearly aim to offer researchers a broad range of functionalities to support their research workflow. But which of these functionalities are used by which researchers? For that, we looked at the data of 15K researchers from our recent survey on scholarly communication tool usage. Firstly, looking at the question on which researcher profiles people use (Figure 3), it is apparent that of the preselected options, ResearchGate is the most popular. This is despite the fact that overall, Academia.edu reports a much higher number of accounts (46M compared to 11M for ResearchGate). One possible explanation for this discrepancy could be a high number of lapsed or passive accounts on Academia.edu – possibly set up by students.


Figure 3. Survey question and responses (researchers only) on use of researcher profiles. For interactive version see http://dashboard101innovations.silk.co/page/Profiles

Looking a bit more closely at the use of ResearchGate and Academia in different disciplines (Figure 4), ResearchGate proves to be dominant in the ‘hard’ sciences, while Academia is more popular in Arts & Humanities and, to a lesser extent, in Social Sciences and Economics. Whether this is due to the specific functionalities the platforms offer, the effect of what one’s peers are using, or even the names of the platforms (with researchers from certain disciplines identifying more with the term ‘Research’ than ‘Academia’ or vice versa) is up for debate.


Figure 4. Percentage of researchers in a given discipline that indicate using ResearchGate and/or Academia (survey data)

If they come, what do they do?
Our survey results also give some indication as to what researchers are using academic social networks for. We had ResearchGate and Mendeley as preset answer options in a number of questions about different research activities, allowing a quantitative comparison of the use of these platforms for these specific activities (Figure 5). These results show that of these activities, ResearchGate is most often used as a researcher profile, followed by its use for getting access to publications and for sharing publications, respectively. Mendeley was included as a preset answer option for different activities; of these, it is most often used for reference management, followed by reading/viewing/annotating and searching for literature/data. The results also show that for each activity it was presented as a preset option for, ResearchGate is used most often by postdocs, while Mendeley is predominantly used by PhD students. Please note that these results do not allow a direct comparison between ResearchGate and Mendeley, except for the fourth activity in both charts: getting alerts/recommendations.


Figure 5. Percentage of researchers using ResearchGate / Mendeley for selected research activities (survey data)

In addition to choosing tools/platforms presented as preset options, survey respondents could also indicate any other tools they use for a specific activity. This allows us to check for which other activities people use any of the academic social networks, and to plot these against the activities these platforms offer functionalities for. The results are shown in Figure 6 and indicate that, in addition to activities supported by the respective platforms, people also carry out activities on social networks for which there are no dedicated functionalities. Some examples are using Academia and ResearchGate for reference management, and sharing all kinds of research outputs, including formats not specifically supported by the respective networks. Some people even indicate using Mendeley for analysis – we would love to find out what type of research they are carrying out!

For much more and alternative data on use of these platforms’ functionalities please read the analyses by Ortega (2016), based on scraping millions of pages in these systems.


Figure 6. Research activities people report using ResearchGate, Mendeley and/or Academia for (survey data)

Good, open or efficient? Choices for platform builders and researchers
Academic social networks are built for and used by many researchers for many different activities. But what kind of scholarly communication do they support? At Force11, the Scholarly Communications Working Group (of which we both are steering committee members) has been working on formulating principles for scholarly communication that encourage open, equitable, sustainable, and research- and culture-led (as opposed to technology- and business-model-led) scholarship.

This requires, among other things, that research objects and all information about them can be freely shared among different platforms, and not be locked into any one platform. While Mendeley has an API they claim is fully open, both ResearchGate and Academia are essentially closed systems. For example, all metrics remain inside the system (though Academia offers an export to csv that we could not get working) and by uploading full text to ResearchGate you grant them the right to change your PDFs (e.g. by adding links to cited articles that are also in ResearchGate).

There are platforms that operate from a different perspective, allowing a more open flow of research objects. Some examples are the Open Science Framework, F1000 (with F1000 Workspace), ScienceOpen, Humanities Commons and GitHub (with some geared more towards specific disciplines). Not all of these platforms support the same activities as ResearchGate and Academia (Figure 7), and there are marked differences in the level of support for activities: sharing a bit of code through ResearchGate is almost incomparable to the full range of options for this at GitHub. All these platforms present alternatives for researchers wanting to conduct and share their research in a truly open manner.


Figure 7. Alternative platforms that support research in multiple phases of the research cycle

Reading list
Some additional readings on academic social networks and their use:

Appendix 1
List of functionalities within ResearchGate, Mendeley and Academia (per 20161204). A live, updated version of this table can be found here: http://tinyurl.com/ACMERGfunctions.


Appendix 1. Detailed functionalities of ResearchGate, Mendeley and Academia per 20161204. Live, updated version at http://tinyurl.com/ACMERGfunctions

Tools that love to be together

[updates in brackets below]
[see also follow-up post: Stringing beads: from tool combinations to workflows]

Our survey data analyses so far have focused on tool usage for specific research activities (e.g. GitHub and others: data sharing, Who is using altmetrics tools, The number games). As a next step, we want to explore which tool combinations occur together in research workflows more often than would be expected by chance. This will also facilitate identification of full research workflows, and subsequent empirical testing of our hypothetical workflows against reality.

Checking which tools occur together more often than expected by chance is not as simple as looking at which tools are most often mentioned together. For example, even if two tools are not used by many people, they might still occur together in people’s workflows more often than expected based on their relatively low overall usage. Conversely, take two tools that are each used by many people: stochastically, a sizable proportion of those people will be shown to use both of them, but this might still be due to chance alone.

Thus, to determine whether the number of people that use two tools together is significantly higher than can be expected by chance, we have to look at the expected co-use of these tools given the number of people that use either of them. This can be compared to the classic example in statistics of taking colored balls out of an urn without replacement: if an urn contains 100 balls (= the population) of which 60 are red (= people in that population who use tool A), and from these 100 balls a sample of 10 balls is taken (= people in the population who use tool B), how many of these 10 balls would be red (=people who use both tool A and B)? This will vary with each try, of course, but when you repeat the experiment many times, the most frequently occurring number of red balls in the sample will be 6. The stochastic distribution in this situation is the hypergeometric distribution.


Figure 1. Source: Memrise

For any possible number x of red balls in the sample (i.e. 0-10), the probability of result x occurring at any given try can be calculated with the hypergeometric probability function. The cumulative hypergeometric probability function gives the probability that the number of red balls in the sample is x or higher. This probability is the p-value of the hypergeometric test (identical to the one-tailed Fisher test), and can be used to assess whether an observed result (e.g. 9 red balls in the sample) is significantly higher than expected by chance. In a single experiment as described above, a p-value of less than 0.05 is commonly considered significant.

In our example, the probability of getting at least 9 red balls in the sample is 0.039 (Figure 2). Going back to our survey data, this translates to the probability that in a population of 100 people, of which 60 people use tool A and 10 people use tool B, 9 or more people use both tools.


Figure 2. Example of hypergeometric probability calculated using GeneProf.
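
The same number can be reproduced in R with the built-in hypergeometric distribution function (just a cross-check of the calculation in Figure 2, using base R only):

# Probability of drawing at least 9 red balls when taking 10 balls from an
# urn with 60 red and 40 non-red balls (upper tail of the hypergeometric):
phyper(9 - 1,            # P(X >= 9) = 1 - P(X <= 8)
       m = 60,           # red balls in the urn (people using tool A)
       n = 100 - 60,     # non-red balls
       k = 10,           # sample size (people using tool B)
       lower.tail = FALSE)
# approx. 0.0385, i.e. the 0.039 mentioned above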

In applying the hypergeometric test to our survey data, some additional considerations come into play.

Population size
First, for each combination of two tools, what should be taken as the total population size (i.e. the 100 balls/100 people in the example above)? It might seem intuitive that this population is the total number of respondents (20,663 for the survey as a whole). However, it is actually better to use only the number of respondents who answered both survey questions in which tools A and B occurred as answer options.

People who didn’t answer both questions cannot possibly have indicated using both tools A and B. In addition, the probability that at least x people are found to use tools A and B together is lower in a large total population than in a small one. This means that the larger the population, the smaller the number of respondents using both tools needs to be for that number to be considered significant. Thus, excluding people that did not answer both questions (and thereby looking at a smaller population) sets the bar higher for two tools to be considered preferentially used together.

Choosing the p-value threshold
The other consideration in applying the hypergeometric test to our survey data is what p-value to use as a cut-off point for significance. As said above, in a single experiment, a result with a p-value lower than 0.05 is commonly considered significant. However, with multiple comparisons (in this case: when a large number of tool combinations is tested in the same dataset), keeping the same p-value will result in an increased number of false-positive results (in this case: tools incorrectly identified as preferentially used together).

The reason is that a p-value of 0.05 indicates there is a 5% chance that the observed result is due to chance. With many observations, there will inevitably be more results that may seem positive, but are in reality due to chance.

One possible solution to this problem is to divide the p-value threshold by the number of tests carried out simultaneously. This is called the Bonferroni correction. In our case, where we looked at 119 tools (7 preset answer options for 17 survey questions) and thus at 7,021 unique tool combinations, this results in a p-value threshold of 0.0000071.

Finally, when we not only want to look at tools used more often together than expected by chance, but also at tools used less often together than expected, we are performing a 2-tailed, rather than a 1-tailed test. This means we need to halve the p-value used to determine significance, resulting in a p-value threshold of 0.0000036.
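
Spelled out (here in R, though any calculator will do), these thresholds are:

n_tools        <- 7 * 17                 # 7 preset answer options x 17 questions = 119 tools
n_combinations <- choose(n_tools, 2)     # 7021 unique tool combinations
p_bonferroni   <- 0.05 / n_combinations  # ~0.0000071 (Bonferroni-corrected, 1-tailed)
p_two_tailed   <- p_bonferroni / 2       # ~0.0000036 (threshold used for the 2-tailed test)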

Ready, set, …
Having made the decisions above, we are now ready to apply the hypergeometric test to our survey data. For this, we need to know for each tool combination (e.g. tool A and B, mentioned as answer options in survey questions X and Y, respectively):

a) the number of people that indicate using tool A
b) the number of people that indicate using tool B
c) the number of people that indicate using both tool A and B
d) the number of people that answered both survey questions X and Y (i.e. indicated using at least one tool (including ‘others’) for activity X and one for activity Y).

These numbers were extracted from the cleaned survey data either by filtering in Excel (a, b (12 MB), d (7 MB)) or through an R script (c, written by Roel Hogervorst during the Mozilla Science Sprint).

The cumulative probability function was calculated in Excel (values and calculations) using the following formulas:

=1-HYPGEOM.DIST((c-1),a,b,d,TRUE)
(to check for tool combinations used together more often than expected by chance)

and
=HYPGEOM.DIST(c,a,b,d,TRUE)
(to check for tool combinations used together less often than expected by chance)


Figure 3. Source: Twitter
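
For those who prefer to stay out of Excel (Figure 3), the two formulas above translate into R as follows; this is an equivalent sketch, not the script we actually used:

# a = users of tool A, b = users of tool B, c = users of both,
# d = respondents who answered both survey questions X and Y.
p_more_often <- function(a, b, c, d) {
  phyper(c - 1, b, d - b, a, lower.tail = FALSE)  # = 1 - HYPGEOM.DIST(c-1, a, b, d, TRUE)
}
p_less_often <- function(a, b, c, d) {
  phyper(c, b, d - b, a, lower.tail = TRUE)       # = HYPGEOM.DIST(c, a, b, d, TRUE)
}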

Bonferroni correction was applied to the resulting p-values as described above and conditional formatting was used to color the cells. All cells with a p-value less than 0.0000036 were colored green or red, for tools used more or less often together than expected by chance, respectively.

The results were combined into a heatmap with green-, red- and non-colored cells (Fig 4), which can also be found as the first tab in the Excel files (values & calculations).

[Update 20170820: we now also have made the extended heatmap for all preset answer options and the 7 most often mentioned ‘others’ per survey question (Excel files: values & calculations)]


Figure 4. Heatmap of tool combinations used together more (green) or less (red) often than expected by chance (click on the image for a larger, zoomable version).

Pretty colors! Now what?
While this post focused on methodological aspects of identifying relevant tool combinations, in future posts we will show how the results can be used to identify real-life research workflows. Which tools really love each other, and what does that mean for the way researchers (can) work?

Many thanks to Bastian Greshake for his helpful advice and reading of a draft version of this blogpost. All errors in assumptions and execution of the statistics remain ours, of course 😉