Data are out. Start analyzing. But beware.

Now that our data set on research tool usage is out and the graphical dashboard has been shared, let the analysis begin! We hope people around the world will find the data interesting and useful.

If you are going to do in-depth analyses, make sure to read our article on the survey background and methods. It helps you understand the type of sampling we used and the resulting response distributions. It also explains the differences between the raw and cleaned data sets.

For more user-friendly insights, you can use the graphical dashboard made in Silk. It is easy to use, but still allows for quite sophisticated filtering and even supports filtering answers to one question by the answers given to another question. Please be patient with Silk: it crunches a lot of data and may sometimes need a few seconds to render the charts.

Example chart that also shows filter options in the dashboard

When looking at the charts and when carrying out your analyses, please note two things.

First, whatever you are going to do, make sure to account for the fundamental difference between results from preset answers (entered by simply clicking an image) and those from specifications of other tools used (entered by typing the tool names manually). The latter are quite probably an underestimate and thus cannot be readily compared with the former. [Update 20160501: This is inherent to the differences between open and closed questions, of which ease of answering is one aspect. Specifications of ‘others’ can be seen as an open question.] This is why we present them separately in the dashboard. Integrated lists of these two types of results, if made at all, should be accompanied by the necessary caveats.

Frequency distribution of 7 preset answers (dark blue) and the first 7 ‘other’ tools (light blue) per survey question

Second, basic statistics tells us that when you apply filters, the absolute numbers can in some cases become so low as to render the results unfit for any generalization. Conversely, when not filtering, please note that usage patterns will vary by research role, field, country, etc. Also, our sample was self-selected and thus not necessarily representative.

Now that we are aware of these two limitations, nothing stops you (and us) from diving in.

Our own priorities, time permitting, are to look at which tools are used together across research activities and why, at concentration ratios of tools used for the various research activities, and at combining these usage data with data on the tools themselves, such as age, origin and business model. More generally, we want to investigate what tool usage says about the way researchers shape their workflows: do they choose tools to make their work more efficient, open and/or reproducible? We also plan a more qualitative analysis of the thousands of answers people gave to the question of what they see as the most important development in scholarly communication.

By the way, we’d love to get your feedback and learn what you are using these data for, whether that is research, evaluation, planning of services or something else entirely. Just email us, or leave a reply here or on any open commenting/peer review platform!

4,000 survey responses – geographical distribution and the need for translation

Last week we silently passed the 4,000 responses mark on our survey. With the summer season waning it seems a good moment to look at where we stand. The survey has been running for 15 weeks, with another 23 weeks to go. We’re glad to have 4,000 responses, but they are not nearly enough to allow for detailed analyses, e.g. by field and country. We would like to see that number double or triple before the survey ends on February 10, 2016. And what is perhaps more important: we would like to see a more or less even global distribution.

A self-selected, non-probability sample such as the one we work with is bound to contain many biases, due to uneven distribution and uptake across groups and countries. Survey uptake levels in a given country are probably affected by:

  • (Effect of) distribution and promotion actions
  • Propensity of people in a certain country to take surveys
  • Degree to which a survey on research tools is considered relevant or interesting
  • Ability of target groups to understand the survey, largely due to differences in (foreign) language proficiency

Response levels per 100 billion US$ GDP as of August 22, 2015; weighted average = 5.1

This map shows the geographical variation in uptake of our survey. To make numbers comparable across countries, we use relative response counts. Ideally we would relate responses to the number of researchers in each country, but those figures are not available for most countries. Instead we use 2013/2014 GDP (World Bank data) as a proxy, on the expectation that countries with larger economies have more active researchers.
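As a quick sketch of how such a relative response level can be computed (the country in the example is hypothetical; the actual figures come from our survey data and the World Bank):

```python
def response_level(n_responses: float, gdp_usd: float) -> float:
    """Survey responses per 100 billion US$ of GDP."""
    return n_responses / (gdp_usd / 100e9)

# Hypothetical country: 250 responses, 5 trillion US$ GDP
print(response_level(250, 5e12))  # 5.0
```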

The map shows response levels at or above average (green) in many countries in Europe, Oceania and Canada. Uptake in Russia, Latin America and South Asia is below average (orange/yellow). Despite many responses from the US, that country is still slightly below average at 4.53. Levels in many countries in East Asia, the Arab world and Africa are very low (red) or even zero (white).

As said, many factors come into play, but it seems obvious that translation into a few world languages would help increase response levels outside Europe and the Anglo-Saxon countries. To find out which languages matter most for us, we calculated, for each language area, the number of responses needed to bring its below-average countries up to the average, relative to their GDP:

Language               Responses needed to reach average
Chinese, simplified    484
Japanese               170
Arabic                 124
Spanish                104
Portuguese              82
French                  68
Korean                  59
Russian                 56
Bahasa Indonesia        43
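The figures above follow from a simple gap calculation; here is a minimal sketch with a hypothetical language area and made-up country figures, using the weighted average of 5.1 responses per 100 billion US$ GDP from the map:

```python
AVERAGE_LEVEL = 5.1  # weighted average responses per 100 billion US$ GDP

def responses_needed(current_responses: float, gdp_usd: float,
                     target_level: float = AVERAGE_LEVEL) -> float:
    """Extra responses needed to lift one country to the target level."""
    gap = target_level * (gdp_usd / 100e9) - current_responses
    return max(gap, 0.0)  # countries at or above average need none

# Hypothetical language area of two below-average countries
countries = [
    (30, 2e12),   # 30 responses, 2 trillion US$ GDP
    (10, 5e11),   # 10 responses, 0.5 trillion US$ GDP
]
total = sum(responses_needed(r, g) for r, g in countries)
print(round(total, 1))
```

Countries already at or above the average contribute zero, so the total only reflects the shortfall of the below-average countries in the language area.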

This means that we are now working towards having the survey and some other texts translated into …

  • simplified Chinese
  • Japanese
  • Arabic
  • Spanish
  • French
  • Russian

while we hope to increase uptake in Brazil, Korea and Indonesia by partnering with local institutions to distribute the English version of the survey.

We are looking for support in reviewing, testing and distributing the translations in these six languages. If you have any ideas or contacts that might be helpful for that, please let us know!