All posts by Danny Kingsley

Me, Myself and Data – David Marshall

For Love Data Week (12th-16th February 2018) we are featuring data-related people. Today we talk to David Marshall, FutureLib Project Coordinator, University Library.

Telling Stories with Data

Let’s start with an easy one. What kind of data do you work with and what do you do with it?

I deal primarily with qualitative data collected through working with people, gathered in a number of different ways. Research methods tend to include things such as in-depth interviews, observation and shadowing, diary studies, as well as various remote data capture mechanisms. I am usually looking for a mix of attitudinal and behavioural data; comparing what people say with what they actually do. I use the insights and findings arrived at through the analysis of this data to make recommendations for service design and delivery on behalf of Cambridge University libraries.

Tell us how you think you can use data to make a difference in your field.

I am an advocate for the importance of using research and research data to understand the wider lives of people who use a product or service. This has long been an established principle in service design and delivery in the commercial sector, and libraries in UK Higher Education are learning to adopt this in order to tailor their services to the approaches, goals, needs and behaviours of their users. The data I work with often highlights aspects of the study and research lives of Cambridge students and academic staff which it would be difficult to fully uncover and explore through more ‘traditional’, quantitative methods, such as usage statistics and surveys. This in-depth, qualitative study of people provides valuable insights which can be used to inform the development of services and working practices that affect those people.

My ‘field’ is working within and for University of Cambridge library services; slightly oddly I am often conducting research, with researchers as the subjects of that research, with the aim of developing services that support research!

How do you talk about your data to someone outside of academia?

I’m going to turn this one on its head as, although I work within academia, I’m not involved in what would typically be described as academic research. I tend to refer to what I do as design research, i.e. with the end goal of using the data gathered and insights arrived at to inform service design and delivery. I often talk of ‘stealing’ methods from academic disciplines and areas such as anthropology and ethnography, and from the commercial design world. This can involve immersive research techniques such as ethnographic observation, or quick, easily-deployed techniques such as card sorting exercises and ad-hoc interviews. In terms of the data itself, I often talk of patterns emerging and insights developing. Immersing myself in the data over the course of its collection, through activities and tasks such as transcription, and again through the analysis process helps things to ‘take root’ and for these patterns and insights to become more clear.

Connected Conversations

What data-related challenges do you have to deal with in your research environment?

Collecting the data in the first place is one of the biggest challenges for me. To do my work I need data from real people, leading real, busy lives. Finding and connecting with the people I need to work with is a constant challenge. Happily, the need to go to people where they are and work around their schedules in fact leads to better data; I would much rather talk to people about their studies, research, or other aspects of their lives when they are in the middle of doing those things! Another challenge is finding the right tool for the job; over the years I have been lucky enough to work with and learn from people who have extensive experience in a wide variety of research techniques, but it can still be tricky to match the appropriate method/s to a specific question or area of study.

To add a more data-related challenge to the list of data-related challenges…: I deal with a lot of personal data, not just names and demographic information but a large amount of qualitative data gathered from individuals about their lives; their goals, motivations, points of frustration, and so on. This leads to challenges in terms of how data is collected, stored and used, even how it is considered during the analysis process.

How do you think these challenges might be overcome?

For my first points: the old Carnegie Hall adage… practice, practice, practice! Relationship building and communication is a huge part of what I do day-to-day; each time I need to find research participants it becomes a little easier due to the continuous work done in this regard over the years I have been working in my role.

For the latter: I think appropriate awareness is part of the battle. Working with research data, particularly that gained from working with people, demands high levels of awareness and an emphasis on reflection, and so it should! It is important to see qualitative data in context, for many reasons, and to be constantly aware of the ethical implications of its analysis and use.

If you were in charge what data-related rule would you introduce?

That every person I’m interested in finding out more about needs to supply me with it tout suite, please and thank you. No, that might be going too far…!

Without being specific, anything which increases the transparency of what will happen to data after it has been gathered is a good thing. I rarely struggle to get people to consent to participating in research once I have found and approached them, and am as transparent as I can be about why I need their data, what I’m planning to do with it, and where it will end up. Maybe I’m blessed by the context within which I work, and might be slightly naïve, but I can’t help but think that on any scale and in any circumstance this emphasis on transparency might be quite a useful thing.

We are Data

Tell us about your happiest data moment.

Around two years ago we (Futurelib) finished the data gathering phase of a project, Protolib, looking at the design of physical study spaces. We had prototyped different study spaces based on the findings of a collaborative design process conducted with Cambridge students and researchers. We conducted hours (as in 300+ hours…) of observation in these prototype spaces, and gathered data in various other ways, such as interviews with people leaving the spaces, feedback walls, comment cards and questionnaires. The first thing we did as researchers after this was to brainstorm the insights we had arrived at from this work. To see themes and ideas emerging so quickly, and to see them backed up and added to by the research data, was amazingly fulfilling. This is what ‘sold’ me on the value of ethnographic techniques; we had immersed ourselves so fully in the environments under study that we understood them to an extent which I would not have previously thought possible.

What advice do you have for someone who is just embarking on a career in your field?

Want to learn. Get interested in people; who they are, how they think and what they do. I don’t much like the idea of the cold, disinterested researcher. Whilst being aware of your own potential biases, and biases based on what you learn and uncover, care about the people you are working with and try to empathise as far as is possible with what is important to them. If you don’t like talking to people and finding out about the way they work, this is possibly not quite the right job for you. Of course, there are areas of research in which disinterestedness is probably a very valuable characteristic; I just don’t think this applies to what I do.

What do you think the future of research data looks like?

Speaking about the context within which I work day-to-day, I think the future looks bright! Libraries and HE institutions are becoming increasingly interested in finding out more about the people their services support. In my area of work, usage statistics, quantitative survey mechanisms and other similar methods will always provide the broad strokes, and this is great. It is, however, absolutely not where gathering data should stop. I cannot over-emphasise the value of qualitative approaches and qualitative data in providing actionable evidence for service design.

In terms of the future of research data more generally, I don’t feel too qualified to comment… I would tentatively assume and hope that data will become more accessible, less owned, more malleable, and through this invite more discussion, criticism and conversation.

There is A LOT of data out there about all sorts of things and it is being collected all the time. Does anything frighten you about data?

From a personal standpoint, potentially losing focus. It is almost the modus operandi of what I do to collect as much data as possible about the lives of people studying and working at the University of Cambridge, so I do feel that I collect ‘A LOT’. I sometimes wonder about the nature of the data I gather, as I’m keen to emphasise with participants that I’m interested in all aspects of the ways in which they work, and more widely, the ways in which they live. This does, on occasion, lead to people sharing quite personal aspects of their lives. There are obvious concerns around how this data is handled and used, but, as mentioned previously, I feel that an appropriate level of awareness and diligence in this regard is a good starting point for working with this kind of data in a sensible, conscientious way.

Published 16 February 2018
Written by David Marshall
Creative Commons License

Me, Myself and Data – Keren Limor-Waisberg

For Love Data Week (12th-16th February 2018) we are featuring data-related people. Today we talk to Keren Limor-Waisberg, Founder and CEO of the Scientific Literacy Tool, who advocates for open access, citizen science and scientific literacy.

Telling Stories with Data

Let’s start with an easy one. What kind of data do you work with and what do you do with it?

I help people from all walks of life access, understand, and/or use scientific data and literature concerning their scientific topics of interest.

Tell us how you think you can use data to make a difference in your field.

A scientifically literate society is a society in which people are empowered with knowledge they can use to achieve their different goals. As we look at data and understand it, we acquire skills that are essential for both our personal and our societal development.

How do you talk about your data to someone outside of academia?

When I talk about data with someone outside academia, I will take the time to define any new terminology and make sure we understand each other.

Connected Conversations

What data-related challenges do you have to deal with in your research environment?

The main data-related challenge non-academics have is access to data. Many datasets are simply not accessible to the public. Sadly, some datasets are not even available to other academics.

Once access is achieved, people will struggle with different formats, lack of metadata, different units and/or the lack of tools to process and analyse the data.

How do you think these challenges might be overcome?

The challenge of open data, making data accessible, is currently being addressed by many countries. The European Commission’s Directorate-General for Communications Networks, Content and Technology (DG CNECT), for example, is formulating directives that aim to open up and help reuse publicly funded research data in Europe.

Different organisations are now developing tools and packages that will help people work with datasets.

If you were in charge what data-related rule would you introduce?

As a citizen of the World, I advocate for open access, citizen science and scientific literacy so as to promote the understanding that knowledge empowers both individuals and societies to develop and prosper. To make this progress, I think we need to agree on common ethical guidelines – from the right of access to the right of use of publicly funded data.

We are Data

Tell us about your happiest data moment.

My happiest data moment was during my PhD. I calculated the performance of some viral elements using different tests. I had a lot of data and it took a while for the scripts to run. It was nerve-racking. I can still remember sitting there listening to the screeching sounds of the computer. And then one by one I got the results, and they all confirmed my hypothesis. It was great. It was a small piece of scientific knowledge, but I was the first person in the world to know about it.

What advice do you have for someone who is just embarking on a career in your field?

For someone embarking on a career in the field of promoting scientific literacy, I would recommend being very patient. It is a slow process and there are many obstacles, but in the end it is a very rewarding profession.

What do you think the future of research data looks like?

I think that in the near future we will have much more publicly funded research data accessible. We will see more and more tools emerging to handle this data. More and more people will use this data and tools to make their statements, to dispute ideas, to create products and services, to entertain, or perhaps just to enjoy finding something new.

There is A LOT of data out there about all sorts of things and it is being collected all the time. Does anything frighten you about data?

I think it is important to make sure privacy and identities are protected when data is collected and shared.

Published 15 February 2018
Written by Keren Limor-Waisberg
@TheLiteracyTool, @OpenResCam

Me, Myself and Data – Melissa Scarpate

For Love Data Week (12th-16th February 2018) we are featuring data-related people. Today we talk to Melissa Scarpate, Research Associate in the Faculty of Education, PEDAL Centre.

Telling Stories with Data

Let’s start with an easy one. What kind of data do you work with and what do you do with it?

I work with large longitudinal data sets and large cross-cultural data. I really enjoy running latent growth models with the longitudinal data to assess changes over time in my variables of interest (primarily child/adolescent self-regulation and parenting). I use the cross-cultural data to test for differences or similarities in parenting and adolescent developmental outcomes.

Tell us how you think you can use data to make a difference in your field.

By using large data sets that either have many time points or have many different countries and cultures represented, I am able to assess relationships between study variables in an impactful way. For instance, if I find that parental monitoring predicts lower levels of adolescent anxiety in 13,000 adolescents across 10 countries then I feel this information has a larger impact on families in a more global way than using a local data set with a small sample size.

How do you talk about your data to someone outside of academia?

Very carefully! I eliminate jargon and speak about the data in broad, general terms rather than getting into details. I would rather the person come away from our conversation with an understanding of what I do, what I have found in my research, and how this impacts society than to impress them with fancy words and statistics.

Connected Conversations

What data-related challenges do you have to deal with in your research environment?

I work in an office with others and it is important for each of us to keep the data confidential and out of sight of one another.

How do you think these challenges might be overcome?

Screen protectors, earplugs in the case of video coding, not printing any data, etc.

If you were in charge what data-related rule would you introduce?

If I were in charge of all data then I would create a rule that all data could be shared in an easy and collaborative way whilst maintaining study participants’ anonymity.

We are Data

Your happiest data moment?

When I finally got my latent growth model to run!

What advice do you have for someone who is just embarking on a career in your field?

Take as many classes in data management, methods, and statistics as you can, and while in graduate school get experience in these areas with researchers who have excellent skills and training in them.

What do you think the future of research data looks like?

Open, transparent, simplified with data visualisation techniques, and impactful.

There is A LOT of data out there about all sorts of things and it is being collected all the time. Does anything frighten you about data?

Technological advances such as my phone being able to predict where I am driving before I leave and my Echo Dot/Alexa picking up all of my conversations make me nervous. The benefits, so far, outweigh the potential negatives (at least in my experience).

Published 14 February 2018
Written by Melissa Scarpate

Me, Myself and Data – Kirsten Lamb

For Love Data Week (12th-16th February 2018) we are featuring data-related people. Today we talk to Kirsten Lamb, Deputy Librarian, Department of Engineering.

Telling Stories with Data

Let’s start with an easy one. What kind of data do you work with and what do you do with it?

I work with bibliometric data for the most part. I’m not very systematic and tend to be rather intuitive about how I gather and work with it in order to tease out insights. Because I don’t usually have a specific question to get out of the data I tend to just explore to find out what is interesting about a body of literature.

Tell us how you think you can use data to make a difference in your field.

The idea that librarians can help define a research landscape and identify gaps is relatively new. I like to think that by learning to work with bibliometric data I can help researchers better engage with information professionals and give librarians the confidence to use their skills in the research context.

How do you talk about your data to someone outside of academia?

I tell them that by looking at patterns in publishing, researchers can see where trends and gaps are, and can exploit those patterns to have a larger impact. But I also make sure to point out the fact that basing insights off of metrics is flawed. You have to understand what each metric is and isn’t measuring. None of the metrics is an indication of the quality of an individual piece of research and there’s no replacement for critical analysis to determine that.

Connected Conversations

What data-related challenges do you have to deal with in your research environment?

First, I don’t have a background in statistics or programming, so there’s a limit to how complex an analysis I can do. Second, the metrics themselves are limited, so communicating the value of the information embedded in the data is a challenge. Third, a lot of bibliometric software is based on use of a particular database’s API so it’s difficult to combine results from different databases to give a broader picture.

How do you think these challenges might be overcome?

All of them would be helped if I could learn to programme! Collaborating with someone who knew how to do the actual analysis bit would be great because that way I could provide the insights and figure out exactly what I wanted to measure and they could make it happen.

If you were in charge what data-related rule would you introduce?

People who write software that does data analysis would make it more user-friendly for people who don’t know how to code. Basically there’d be a WYSIWYG/Microsoft Excel-style programme for doing bibliometric analyses and generating beautiful graphics based on it that didn’t require any coding.

We are Data

Tell us about your happiest data moment.

I was pleased when I discovered that Web of Science does a lot of the analysis I wanted to do but thought I could only do if I had InCites or similar. As much as I like knowing what’s being measured and having an intimate knowledge of the data, sometimes it’s nice to just be able to click a few buttons and get a nice graph!

What advice do you have for someone who is just embarking on a career in your field?

I’d want to tell them that they don’t need to be a maths or programming whiz to do it, but I’m not yet convinced of that myself! I think the main thing is not to think of some metrics as good and some as bad. They’re all just tools and you need to know what they do in order to pick which ones you want to use. Always look under the bonnet!

What do you think the future of research data looks like?

While I’d love for it to be open, interoperable, integrated and well-indexed, I’m not sure that’s going to happen any time soon. Each time someone develops a new standard to rule them all, it just gets added to the growing list of standards.

There is A LOT of data out there about all sorts of things and it is being collected all the time. Does anything frighten you about data?

Yes. I don’t think that as a species we’ve really figured out what it means to live in a data-rich ecosystem, and I mean that both metaphorically and literally. The rate at which data is growing is currently unsustainable from the perspectives of preservation, legislation, interpretation and energy use. While I’m definitely uncomfortable with how much certain companies know about me, I’m more concerned with the fact that collecting and managing that amount of data about everyone and everything is bad for the planet and we haven’t figured out how to make sure it’s safe. We need legislation and curation to catch up with technology instead of lagging about a decade behind.

Published 13 February 2018
Written by Kirsten Lamb

Me, Myself and Data – Dr Sudhakaran Prabakaran

For Love Data Week (12th-16th February 2018) we are featuring data-related people. Today we talk to Dr Sudhakaran Prabakaran, Lecturer/ Group leader, Department of Genetics.

Telling Stories with Data

Let’s start with an easy one. What kind of data do you work with and what do you do with it?

We use population-level sequencing datasets from TCGA, mutation datasets from COSMIC, ClinVar and HGMD, and curated databases from other labs. We use discarded datasets, negative datasets, already published datasets, anything and everything. We develop and use structural genomics, mathematical modelling and machine learning tools to analyse mutations that map to noncoding regions of the human genome.

Tell us how you think you can use data to make a difference in your field.

We live on these datasets. Biological data is going to exceed 2.5 Exabytes in the next two years, and the bottleneck is the analysis of these datasets. Our job is to find patterns in these datasets. Rare variants and driver mutations become significant and identifiable only when we look for them in a population context.

How do you talk about your data to someone outside of academia?

For us it is not difficult. The datasets we are using are generated and curated by governmental and international consortiums. They have done the bulk of publicity. For example, the TCGA dataset has all kinds of data from thousands of cancer patients and is curated by the NIH. The power of this data is for all to see. I just say we try to aid in cancer diagnosis by crawling through these datasets to find patterns.

Connected Conversations

What data-related challenges do you have to deal with in your research environment?

We are happy with the publicly available datasets. Our problem starts with the datasets we collect. How to store them, analyse them, and make them available for everyone to use are the questions we are trying to answer all the time.

How do you think these challenges might be overcome?

I am an ardent proponent of cloud-storage and computation. I believe that is the future. I am also aware that some countries are concerned with data migration outside their geographical boundaries.

If you were in charge what data-related rule would you introduce?

I am not going to make up anything new. Past US Presidents have made laws requiring that any data generated with public funds be made available.

Governmental organisations should demystify cloud-based storage and computation processes. People are unduly worried. People are giving away more personal data wilfully on Facebook, Twitter and Instagram than through genome sequences collected by public consortiums.

Tell us about your happiest data moment.

It is not one moment, it is a series of moments up until now. I can run a viable research program with no startup money or funds just by scavenging through publicly available datasets.

What advice do you have for someone who is just embarking on a career in your field?

Learn machine learning and cloud computing.

What do you think the future of research data looks like?

Lots more data analysis than data generation.

There is A LOT of data out there about all sorts of things and it is being collected all the time. Does anything frighten you about data?

I am in fact excited. I believe we need to train more data scientists. We are in good times. Data is becoming truly democratic!

Published 12 February 2018
Written by Dr Sudhakaran Prabakaran

The Open Research Pilot – one year in

As we close in on the halfway point for the Open Research Pilot between the University of Cambridge and the Wellcome Trust, how are things going?

Well, in many ways exactly as expected. The primary issues that we are facing are a lack of sustainable support for infrastructure and a lack of reward and incentive to work openly. None of this is news, and while no new issues are being surfaced by the Open Research Pilot, having the dialogue is helping the participating researchers exchange ideas and the Wellcome Trust develop new services and policies.

This was the take-home message from the second full meeting of the Open Research Pilot participants, held in London at the Wellcome Trust on 13 September. Given that one of the main goals of the project is to learn what the barriers and incentives are for open research, and to share these findings with others interested in the subject to inform policy development, it seems the Pilot is on track.

This blog summarises the discussions and ideas that arose at the event. A full write-up and the presentations are now available in the Open Research Pilot collection in Apollo, the University repository. Notes from the pilot kick-off meeting held in January in Cambridge are also available.

As a memory jogger, information about the Pilot and the different people and research projects involved is available. A blog describing the kick-off meeting is also online as part of our new Open Adventures blog platform.

Outcomes of the meeting

Main issues raised:

  • Today a successful researcher and a good researcher are not necessarily the same thing
  • Time is a big issue. It takes time to annotate data sets so that others can use them
  • The current incentive and reward structure is a barrier to change
  • The ethos needs to change with regard to the need to publish in particular journals
  • There is a need to re-define what is valuable and this may need to be defined at the discipline level
  • There is a mismatch between the reliance the research community have on certain resources and the availability of funding for long-term sustainability
  • There is a need for dedicated staff to manage sharing of research at the institute or department level

Possible solutions:

  • The new publishing platform, Wellcome Open Research, is cheaper and faster than traditional publishing outlets
  • There are movements towards international approaches to collectively funding scientific infrastructure
  • The participants have found having access to library colleagues through this Open Research Pilot Project has been useful for figuring out where to put their research data – this does raise questions about future library services
  • We need strong leadership to drive the change to Open Research and, given the risk-averse nature of institutions, change needs to be led by funders


Summary of the discussions

Wellcome Open Research update

Robert Kiley and David Carr gave a progress report on what had been happening in open science at the Wellcome Trust. They gave an update on the new Wellcome Trust open publishing platform, Wellcome Open Research.

When Wellcome Open Research was launched, ‘success’ was defined as 25-30 publications in a year. However, in less than a year, more than 100 items have been published on this platform from a broad range of institutions. While half the publications are research articles, the rest are other output types such as data notes, software studies and protocols. This is important, given the new requirement of the Wellcome Trust to share all research outputs.

The platform is relatively popular as well. A comparison of the volume of Wellcome Trust funded publications across the range of publications showed that Wellcome Open Research was the fourth most used, after Scientific Reports, PLoS ONE and Nature Communications.

This is significant because the average cost of publication on the Wellcome platform is of the order of £700, which is significantly lower than the average cost of the other named publications (generally around £2000). It is not just cheaper, it is faster as well. Robert described an example where an item was submitted, reviewed, approved, published and made discoverable, and requests for the data were then received, all within a three-week period.

Open data, Funding and Sustainability

Recently, as part of this Pilot, the OSC published a series of blogs discussing the problem of supporting infrastructure, from the researcher perspective, the funder perspective, and that of the university library. The group discussed the serious problem that infrastructure is funded at grant level while the data is used by the whole community. Funders do not necessarily fund ongoing infrastructure, which is in competition with requests to fund new ideas. Another question is whether it is even the funder’s job to provide long-term sustainability.

The point was raised that there is a mismatch between a reliance on certain resources on the one hand and a reluctance to fund their long-term sustainability on the other. For example, when arXiv (an e-print service, operated by Cornell University) asked the physics community to provide support, the community thought that it was the library’s responsibility to provide funding, not theirs. Similarly, Canadian Health Research rely heavily on GenBank, but do not contribute to the costs of this resource.

This problem is recognised internationally and there are some attempts to address it. Earlier this year there was a meeting of several major funding organisations, from which a strong consensus emerged that core data resources for the life sciences should be supported through a coordinated international effort(s) that better ensure long-term sustainability and that appropriately align funding with scientific impact. There is also some work to build a stable and sustainable infrastructure for biological information across Europe.

Support for Open Research activities

One question posed to the researchers in the group was: what support from their institutions and funders would they want to make their data more accessible? It was commented that time was a big issue. For example, it takes time to annotate data sets so that others can use them. One group said that they could write protocols and a series of articles to put on Wellcome Open Research, which would be a good thing, but it would take the team a long time.

There appears to be a need for dedicated staff to manage sharing of research at the institute or department level.  It was commented that having had access to library colleagues through this Open Research Pilot Project has been useful for figuring out where to put their research data. An action was taken for the library component of the group to think about what support is being provided in this context (and into the future).

Open Research and Culture

In the current climate it is easy to identify a successful researcher.  A successful researcher has prizes, publishes in particular journals with high impact factors and has grants and funding. But a successful researcher and a good researcher are not necessarily the same thing. One of the blockers for a future Open Research environment seems to be the research community itself.  For example, the current incentive and reward structure is a barrier to change and there is a need to re-define what is valuable and this may need to be defined at the discipline level.

Some suggestions that arose in the discussion were:

  • a data re-use prize by Wellcome Trust
  • only provide grants or Fellowships to institutions or departments that have signed or support the Declaration on Research Assessment (DORA)
  • travel fellowship awards for good Open Research practices (noting that credit should be given to the individual winning the award and not the head of the laboratory who was the recipient of the original grant)

By ‘fighting on different fronts’, slowly the research environment might change. We need strong leadership to drive the change to Open Research, and the leadership needs to come from funders and institutions for the researchers to align themselves with working openly. But institutions are very risk averse with activities that could jeopardise funding, so change needs to be led by the funders.

The hybrid question

While open access could be a vehicle for improving open research, the route to achieve this is debatable. The group asked whether funders could insist on green open access or only pay for truly open access journals. If payment for articles in hybrid journals, for example, was stopped, the money saved could be used to invest in other aspects of open research.

An alternative option discussed was whether the value given for APCs could be limited, say to $1000. But this would be very difficult to implement: as an indicator, the SCOAP3 project took five years to get off the ground.

The question arose: if Wellcome Trust stopped paying for open access in hybrid journals, would researchers stop applying for funding? The feeling was no. But researchers perceive that when applying for grants, their record for publications in particular journals is very important.  The ethos needs to change with regard to the need to publish in particular journals.

Next steps

The Office of Scholarly Communication is coordinating an “In Conversation” event on 5 December to give researchers the opportunity to talk to Wellcome Trust representatives about their Policy on data, software and materials management and sharing.

We are also looking to find evidence that data is reused.

The Wellcome Trust will use the group as a scoping group for its proposal to build a repository, and will share the draft requirements with the group.


Published 10 November 2017
Written by Dr Danny Kingsley, based on notes by Dr Debbie Hansen
Creative Commons License

RDN Open Research Panel: the researcher’s perspective

By Jennifer Harris and Tim Fulton

The 4th Research Data Network event organised by JISC took place over two very rainy days at the end of June at the University of York. It was a packed schedule of talks, technology demonstrations and interactive sessions covering a full gamut of topics related to Research Data Management and Open Data. Rounding off the event was a panel discussion organised by the Office of Scholarly Communication at the University of Cambridge on the subject of Open Research and in particular the Open Research pilot project that is currently being run here in conjunction with the Wellcome Trust.

The Open Research pilot project is formed of a number of research groups from across a range of disciplines, all of whom are intending to share not only complete articles but also single data sets or unexplained, yet reproducible, results. The pilot intends to look at the benefits and barriers of open research for the researchers and what support libraries can provide to facilitate open research.

Sitting on the panel were:

  • David Carr from the Wellcome Trust, offering his opinions on Open Data from the funder’s perspective
  • Lauren Cadwallader from the Office of Scholarly Communication, discussing how academic support staff can deal with the demands of Open Data (and recalcitrant academics)
  • Tim Fulton, a researcher in Cambridge University’s Department of Genetics and participant in the open research pilot
  • Jennifer Harris, a researcher from the UCL/Birkbeck Centre for Planetary Sciences at Birkbeck University of London.

Leading the discussion was Marta Teperek, formerly of the University of Cambridge but now working as a Data Steward at TU Delft.

After a brief introduction from each of the panelists it was over to the audience for questions. Questions were solicited from the room in two ways: the traditional ‘stick your hand up and someone will bring you a microphone’ method, and a newer medium that enabled the shyer members of the audience to still participate without having to identify themselves or speak up in front of approximately 100 people. The range of expertise and experience on the panel was reflected in the questions, with topics of discussion ranging from how to fund Open Science (should it be included in block grants?) to what the panelists find most frustrating about the current methods of publishing non-traditional outputs including data, and how best to persuade academics who are wary of data sharing that making their research open would be a good thing.

As two of the only people at the conference currently employed as full-time academic researchers, we found this an interesting and thought-provoking session, and we were both glad to have had the chance to contribute to it.

Views from an experienced Open Researcher – Jennifer Harris

As most of the audience were academic research data managers or related professions I definitely felt I was there to give a view from the other side, and to demonstrate to them that there are some researchers out there who do care about Open Research and are willing to work with the RDM community to make it a reality.  As the only member of the panel not involved in the Cambridge Open Research Project my contributions were also more general, offering a perspective from a field not involved in the pilot but one that does have a lot of experience with open research in the form of open (and free) data and citizen science.

Open Research is a multi-faceted issue and it was clear that everyone in the room had a good understanding of this and the complications that inevitably arise when attempting to promote it to a community of academics who already have a heavy load of pressures and demands on their time.  As a researcher currently employed on a postdoctoral contract I can get a little defensive when I hear members of the academic support community complain about researchers not engaging with their efforts to promote a new service or perform some new task that they’re requesting.  The task in question is sometimes only a small one such as uploading a paper to an institutional repository post-publication.  But these small tasks can sometimes be the final straw when it comes to managing your workload.

It was refreshing therefore to be in a room of people who mostly seemed to get that and genuinely wanted to understand how they could best get across to researchers that what they’re requesting when they push open research may involve an increase in workload but that it will (or at least should) be something that will pay off in the longer term.  A lot of the discussion kept coming back to the fact that what is ultimately required is an entire change of culture and that’s something that everyone will have to be involved in, from publishers to support staff to researchers at all levels (but most importantly at the top).

Views from a Newcomer – Tim Fulton

Over the two day meeting I was encouraged by speaking to so many RDM community members who understood that open research policy and platforms are only as good as the engagement of researchers and the research community. In talking to delegates it became clear to me that there were two main concerns with the project: getting researchers involved, and maintaining funding to ensure sustainability of the repositories. Hearing that the Wellcome Trust are in discussion about the sustainability of their project is heartening, if currently lacking in detail; at a pilot stage, however, this is not unexpected.

It was also encouraging that the RDM community and pilot scheme organisers are keen on understanding what the current obstacles to sharing data are – most commonly mentioned were time and the fear of being ‘scooped’ by someone else using your data. Our laboratory intends to publish data following the publication of a pre-print article, thereby securing our ownership of our conclusions without overly delaying the sharing of the data. We also intend to publish detailed methods to explain how our data was generated, addressing the repeatedly mentioned frustration with traditional journals of poor method description and non-reproducibility of results.

This project is a cornerstone in changing the attitude of researchers to data: moving from the current culture of hoarding data to one where data is a community possession. Whilst this may take time and there will be issues along the way, I foresee this project creating a better research environment. We have begun the process of preparing data for publication at the time of collection rather than as a subsequent publication step. The hope is that these small changes to daily working practice should make the general research community more efficient and fruitful going forwards, which is good for researchers, funders and the general public.

Published 23 October 2017
By Jennifer Harris and Tim Fulton
Creative Commons License

Sustaining long-term access to open research resources – a university library perspective

Originally published 11 September 2017, Written by Dave Gerrard

In the third in a series of three blog posts, Dave Gerrard, a Technical Specialist Fellow from the Polonsky-Foundation-funded Digital Preservation at Oxford and Cambridge project, describes how he thinks university libraries might contribute to ensuring access to Open Research for the longer-term.  The series began with Open Resources, who should pay, and continued with Sustaining open research resources – a funder perspective.

Blog post in a nutshell

This blog post works from the position that the user-bases for Open Research repositories in specific scientific domains are often very different to those of institutional repositories managed by university libraries.

It discusses how in the digital era we could deal with the differences between those user-bases more effectively. The upshot might be an approach to the management of Open Research that requires both types of repository to work alongside each other, with differing responsibilities, at least while the Open Research in question is still active.

And, while this proposed method of working together wouldn’t clarify ‘who is going to pay’ entirely, it at least clarifies who might be responsible for finding funding for each aspect of the task of maintaining access in the long-term.

Designating a repository’s user community for the long-term

Let’s start with some definitions. One of the core models in Digital Preservation, the International Standard Open Archival Information System Reference Model (or OAIS) defines ‘the long term’ as:

“A period of time long enough for there to be concern about the impacts of changing technologies, including support for new media and data formats, and of a changing Designated Community, on the information being held in an OAIS. This period extends into the indefinite future.”

This leads us to two further important concepts defined by the OAIS:

“Designated Communities” are “an identified group of potential Consumers who should be able to understand a particular set of information”, i.e. the set of information collected by the ‘archival information system’.

A “Representation Information Network” is the tool that allows the communities to explore the metadata which describes the core information collected. This metadata will consist of:

  • descriptions of the data contained in the repository,
  • metadata about the software used to work with that data,
  • the formats in which the data are stored and related to each other, and so forth.

In the example of the Virtual Fly Brain Platform repository discussed in the first post in this series, the Designated Community appears to be: “… neurobiologists [who want] to explore the detailed neuroanatomy, neuron connectivity and gene expression of Drosophila melanogaster.” And one of the key pieces of Representation Information, namely “how everything in the repository relates to everything else”, is based upon a complex ontology of fly anatomy.

It is easy to conclude, therefore, that you really do need to be a neurobiologist to use the repository: it is fundamentally, deeply and unashamedly confusing to anyone else that might try to use it.

Tending towards a general audience

The concept of Designated Communities is one that, in my opinion, the OAIS Reference Model never adequately gets to grips with. For instance, the OAIS Model suggests including explanatory information in specialist repositories to make the content understandable to the general community.

Long term access within this definition thus implies designing repositories for Designated Communities consisting of what my co-Polonsky-Fellow Lee Pretlove describes as: “all of humanity, plus robots”. The deluge of additional information that would need to be added to support this totally general resource would render it unusable; to aim at everybody is effectively aiming at nobody. And, crucially, “nobody” is precisely who is most likely to fund a “specialist repository for everyone”, too.

History provides a solution

One way out of this impasse is to think about currently existing repositories of scientific information from more than 100 years ago. We maintain a fine example at Cambridge: The Darwin Correspondence Project, though it can’t be compared directly to Virtual Fly Brain. The former doesn’t contain specialist scientific information like that held by the latter – it holds letters, notebooks, diary entries etc – ‘personal papers’ in other words. These types of materials are what university archives tend to collect.

Repositories like Darwin Correspondence don’t have “all of humanity, plus robots” Designated Communities, either. They’re aimed at historians of science, and those researching the time period when the science was conducted. Such communities tend more towards the general than ‘neurobiologists’, but are still specialised enough to enable production and management of workable, usable, logical archives.

We don’t have to wait for the professor to die any more

So we have two quite different types of repository. There’s the ‘ultra-specialised’ Open Research repository for the Designated Community of researchers in the related domain, and then there’s the more general institutional ‘special collection’ repository containing materials that provide context to the science, such as correspondence between scientists, notebooks (which are becoming fully electronic), and rough ‘back of the envelope’ ideas. Sitting somewhere between the two are publications – the specialist repository might host early drafts and work in progress, while the institutional repository contains finished, published work. And the institutional repository might also collect enough data to support these publications, too, like our own Apollo Repository does.

The way digital disrupts this relationship is quite simple: a scientist needs access to her ‘personal papers’ while she’s still working, so, in the old days (i.e. more than 25 years ago) the archive couldn’t take these while she was still active, and would often have to wait for the professor to retire, or even die, before such items could be donated. However, now everything is digital, the prof can both keep her “papers” locally and deposit them at the same time. The library special collection doesn’t need to wait for the professor to die to get their hands on the context of her work. Or indeed, wait for her to become a professor.

Key issues this disruption raises

If we accept that specialist Open Research repositories are where researchers carry out their work, and that the institutional repository’s role is to collect contextual material to help us understand that work further down the line, then what questions does this raise about how those managing these repositories might work together?

How will the relationship between archivists and researchers change?

The move to digital methods of working will change the relationships between scientists and archivists.  Institutional repository staff will become increasingly obliged to forge relationships with scientists earlier in their careers. Of course, the archivists will need to work out which current research activity is likely to resonate most in future. Collection policies might have to be more closely in step with funding trends, for instance? Perhaps the university archivist of the digital future might spend a little more time hanging round the research office?

How will scientists’ behaviour have to change?

A further outcome of being able to donate digitally is that scientists become more responsible for managing their personal digital materials well, so that it’s easier to donate them as they go along. This has been well highlighted by another of the Polonsky Fellows, Sarah Mason at the Bodleian Libraries, who has delivered personal digital archiving training to staff at Oxford, in part based on advice from the Digital Preservation Coalition. The good news here is that such behaviour actually helps people keep their ongoing work neat and tidy, too.

How can we tell when the switch between Designated Communities occurs?

Is it the case that there is a ‘switch-over’ between the two types of Designated Community described above? Does the ‘research lifecycle’ actually include a phase where the active science in a particular domain starts to die down, but the historical interest in that domain starts to increase? I expect that this might be the case, even though it’s not in any of the lifecycle models I’ve seen, which mostly seem to model research as either continuing on a level perpetually, or stopping instantly. But such a phase is likely to vary greatly even between quite closely-related scientific domains. Variables such as the methods and technologies used to conduct the science, what impact the particular scientific domain has upon the public, to what degree theories within the domain conflict, indeed a plethora of factors, are likely to influence the answer.

How might two archives working side-by-side help manage digital obsolescence?

Not having access to the kit needed to work with scientific data in future is one of the biggest threats to genuine ‘long-term’ access to Open Research, but one that I think really does fall to the university to mitigate. Active scientists using a dedicated, domain specific repository are by default going to be able to deal with the material in that repository: if one team deposits some material that others don’t have the technology to use, then they will as a matter of course sort that out amongst themselves at the time, and they shouldn’t have to concern themselves with what people will do 100 years later.

However, university repositories do have more of a responsibility to history, and a daunting responsibility it is. There is some good news here, though… For a start, universities have a good deal of purchasing power they can bring to bear upon equipment vendors, in order to insist, for example, that they produce hardware and software that creates data in formats that can be preserved easily, and to grant software licenses in perpetuity for preservation purposes.

What’s more fundamental, though, is that the very contextual materials I’ve argued that university special collections should be collecting from scientists ‘as they go along’ are the precise materials science historians of the future will use to work out how to use such “ancient” technology.

Who pays?

The final, but perhaps most pressing question, is ‘who pays for all this’? Well – I believe that managing long-term access to Open Research in two active repositories working together, with two distinct Designated Communities, might at least make things a little clearer. Funding specialist Open Research repositories should be the responsibility of funders in that domain, but they shouldn’t have to worry about long-term access to those resources. As long as the science is active enough that it’s getting funded, then a proportion of that funding should go to the repositories that science needs to support it. The exact proportion should depend upon the value the repository brings, which might be calculated using factors such as how much the repository is used, how much time using it saves, what researchers’ time is worth, how many Research Excellence Framework brownie points (or similar) come about as a result of collaborations enabled by that repository, and so on.
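
To make the idea of a value-based proportion a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `repository_value` and `funding_share`, the choice of factors, the figures and the 5% ceiling are all hypothetical, and no funder prescribes this formula. It simply estimates the researcher time a repository saves, prices that time, and expresses the result as a capped share of a grant. Harder-to-quantify factors, such as Research Excellence Framework credit, are left out.

```python
def repository_value(uses_per_year, hours_saved_per_use, hourly_cost):
    """Estimate a repository's yearly value as researcher time saved (hypothetical model)."""
    return uses_per_year * hours_saved_per_use * hourly_cost


def funding_share(value, grant_total, cap=0.05):
    """Return the fraction of a grant to pass to the repository, capped at `cap` (an assumed ceiling)."""
    if grant_total <= 0:
        raise ValueError("grant_total must be positive")
    return min(value / grant_total, cap)


# Illustrative numbers only, not drawn from any real repository or grant.
value = repository_value(uses_per_year=200, hours_saved_per_use=2, hourly_cost=40)
share = funding_share(value, grant_total=1_000_000)
print(f"Estimated value: {value}, share of grant: {share:.1%}")
```

The cap is there only to show that a funder would probably want an upper bound however the value is estimated; the right ceiling, if any, would be a policy decision.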

On the other hand, I believe that university / institutional repositories need to find quite separate funding for their archivists to start building relationships with those same scientists, and working with them to both collect the context surrounding their science as they go along, and prepare for the time when the specialist repository needs to be mothballed. With such contextual materials in place, there don’t seem to be too many insurmountable technical reasons why, when it’s acknowledged that the “switch from one Designated Community to another” has reached the requisite tipping point, the university / institutional repository couldn’t archive the whole of the specialist research repository, describe it sensibly using the contextual material they have collected from the relevant scientists as they’ve gone along, and then store it cheaply on a low-energy medium (i.e. tape, currently). It would then be “available” to those science historians that really wanted to have a go at understanding it in future, based on what they could piece together about it from all the contextual information held by the university in a more immediately accessible state.

Hence the earlier the institutional repository can start forging relationships with researchers, the better. But it’s something for the institutional archive to worry about, and get the funding for, not the researcher.

Creative Commons License

Sustaining open research resources – a funder perspective

Originally published 26 July 2017, Written by David Carr, Wellcome Trust

This is the second in a series of three blog posts which set out the perspectives of researchers, funders and universities on support for open resources. The first was Open Resources, who should pay? In this post, David Carr from the Open Research team at the Wellcome Trust provides the view of a research funder on the challenges of developing and sustaining the key infrastructures needed to enable open research.

As a global research foundation, Wellcome is dedicated to ensuring that the outputs of the research we fund – including articles, data, software and materials – can be accessed and used in ways that maximise the benefits to health and society.  For many years, we have been a passionate advocate of open access to publications and data sharing.

I am part of a new team at Wellcome which is seeking to build upon the leadership role we have taken in enabling access to research outputs.  Our key priorities include:

  • developing novel platforms and tools to support researchers in sharing their research – such as the Wellcome Open Research publishing platform which we launched last year;
  • supporting pioneering projects, tools and experiments in open research, building on the Open Science Prize which we supported with the NIH and Howard Hughes Medical Institute;
  • developing our policies and practices as a funder to support and incentivise open research.

We are delighted to be working with the Office of Scholarly Communication on the Open Research Pilot Project, where we will work with four Wellcome-funded research groups at Cambridge to support them in making their research outputs open.  The pilot will explore the opportunities and challenges, and how platforms such as Wellcome Open Research can facilitate output sharing.

Realising the long-term value of research outputs will depend critically upon developing the infrastructures to preserve, access, combine and re-use outputs for as long as their value persists.  At present, many disciplines lack recognised community repositories and, where they do exist, many cannot rely on stable long-term funding.  How are we as a funder thinking about this issue?

Meeting the costs of outputs sharing

In July 2017, Wellcome published a new policy on managing and sharing data, software and materials.  This replaced our long-standing policy on data management and sharing – extending our requirements for research data to also cover original software and materials (such as antibodies, cell lines and reagents).  Rather than ask for a data management plan, applicants are now asked to provide an outputs management plan setting out how they will maximise the value of their research outputs more broadly.

Wellcome commits to meet the costs of these plans as an integral part of the grant, and provides guidance on the costs that funding applicants should consider. We recognise, however, that many research outputs will continue to have value long after the funding period comes to an end. Further, while it is not appropriate to make all research data open indefinitely, researchers are expected to retain data underlying publications for at least ten years (a requirement which was recently formalised in the UK Concordat on Open Research Data). We must accept that preserving and making these outputs available into the future carries an ongoing cost.

Some disciplines have existing subject-area repositories which store, curate and provide access to data and other outputs on behalf of the communities they serve. Our expectation, made more explicit in our new policy, is that researchers should deposit their outputs in these repositories wherever they exist. If no recognised subject-area repository is available, we encourage researchers to consider using generalist repositories – such as Dryad, FigShare and Zenodo – or if not, to use institutional repositories. Looking ahead, we may consider developing an orphan repository to house Wellcome-funded research data which has no other obvious home.

Recognising the key importance of this infrastructure, Wellcome provides significant grant funding to repositories, databases and other community resources. As of July 2016, Wellcome had active grants totalling £80 million to support major data resources. We have also invested many millions more in major cohort and longitudinal studies, such as UK Biobank and ALSPAC. We provide such support through our Biomedical Resource and Technology Development scheme, and have provided additional major awards over the years to support key resources, such as PDB-Europe, Ensembl and the Open Microscopy Environment.

While our funding for these resources is not open-ended and is subject to review, we have been conscious for some time that the reliance of key community resources on grant funding (typically of three to five years’ duration) can create significant challenges, hindering their ability to plan for the long-term and retain staff. As we develop our work on Open Research, we are keen to explore ways in which we might adapt our approach to help put key infrastructures on a more sustainable footing, but this is a far from straightforward challenge.

Gaining the perspectives of resource providers

In order to better understand the issues, we did some initial work earlier this year to canvass the views of those we support. We conducted semi-structured interviews with leaders of 10 resources in receipt of Wellcome funding – six database and software resources, three cohort resources and one materials stock centre – to explore their current funding, long-term sustainability plans and thoughts on the wider funding and policy landscape.

We gathered a wealth of insights through these conversations, and several key themes emerged:

  • All of the resources were clear that they would continue to be dependent on support from Wellcome and/or other funders for the long-term.
  • While cohort studies (which provide managed access to data) can operate cost recovery models to transfer some of the cost of accessing data onto users, such models were not appropriate for data and software resources that commit to open and unrestricted access.
  • Several resources had additional revenue-generation routes – including collaborations with commercial entities – and these had delivered benefits in enhancing their resources. However, the level of income was usually relatively modest in terms of the total cost of sustaining the resource. Commitments to openness could also limit the extent to which such arrangements were feasible.
  • Diversification of funding sources can give greater assurance and reduce reliance on single funders, but can bring an additional burden. There was felt to be a need for better coordination between funders where they co-fund resources. Europe PMC, which has 27 partner funders but is managed through a single grant, is a model which could be considered.
  • Several of the resources were actively engaged in collaborations with other resources internationally that house related data – it was felt that funders could help further facilitate such partnerships.

We are considering how Wellcome might develop its funding approaches in light of these findings. As an initial outcome, we plan to develop guidance for our funded researchers on key issues to consider in relation to sustainability. We are already working actively with other funders to facilitate co-funding and make decisions as streamlined as possible, and wish to explore how we might join forces in the future in developing our broader approaches for funding open resources.

Coordinating our efforts

There is growing recognition of the crucial need for funders and the wider research community to work together to develop and sustain research data infrastructure. As the first blog in this series highlighted, the scientific enterprise is global and this is an issue which must be addressed at an international level.

In the life sciences, the ELIXIR and US BD2K initiatives have sought to develop coordinated approaches for supporting key resources and, more recently, the European Open Science Cloud initiative has developed a bold vision for a cloud-based infrastructure to store, share and re-use data across borders and disciplines.

Building on this momentum, the Human Frontier Science Program convened an international workshop last November to bring together data resources and major funders in the life sciences. This resulted in a call for action (reported in Nature) to coordinate efforts to ensure long-term sustainability of key resources, whilst supporting resources in providing access at no charge to users. The group proposed an international mechanism to prioritise core data resources of global importance, building on the work undertaken by ELIXIR to define criteria for such resources. It was proposed that national funders could potentially then contribute a set proportion of their overall funding (with initial proposals suggesting around 1.5 to 2 per cent) to support these core data resources.
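
As a rough illustration of what a set proportion could mean in practice, the arithmetic might look like the Python sketch below. The funder names and budgets are made up, and the function `core_resource_contribution` is hypothetical; only the 1.5 to 2 per cent range comes from the initial proposals mentioned above.

```python
def core_resource_contribution(total_budget, low=0.015, high=0.02):
    """Return the (low, high) range a funder would contribute to core data resources."""
    return total_budget * low, total_budget * high


# Hypothetical annual budgets, in millions of pounds.
budgets = {"Funder A": 900, "Funder B": 400}

for name, budget in budgets.items():
    lower, upper = core_resource_contribution(budget)
    print(f"{name}: between {lower:.1f}m and {upper:.1f}m per year")
```

Even with invented figures, the sketch shows why a fixed percentage is attractive: each funder's contribution scales with its own budget, so no single funder carries the burden alone.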

Grasping the nettle

Public and charitable funders are acutely aware that many of the core repositories and resources needed to make research outputs discoverable and useable will continue to rely on our long-term funding support. There is a clear realisation that a reliance on traditional competitive grant funding is not the ideal route through which to support these key resources in a sustainable manner.

But no one yet has a perfect solution and no funder will take on this burden alone. Aligning global funders and developing joint funding models of the type described above will be far from straightforward, but hopefully we can work towards a more coordinated international approach. If we are to realise the incredible potential of open research, it’s a challenge we must address.

Originally published 26 July 2017
Written by David Carr, Wellcome Trust

Creative Commons License

Open Resources: Who Should Pay?

Originally published 23 June 2017 Written by Dr Lauren Cadwallader

This blog is the first in a series of three which considers the perspectives of researchers, funders and universities in relation to the support for open resources, coordinated and written by Dr Lauren Cadwallader. This post asks the question: What is the responsibility of national funders to research resources that are internationally important?

In January 2017 the Office of Scholarly Communication and Wellcome Trust started an Open Research Pilot Project to try to understand how we could help our researchers work more openly and what barriers they faced in making their work open. A common theme among the groups that we are working with is the sustainability of open resources.

The Virtual Fly Brain Example

Take, for example, the Connectomics group I am working with. They investigate the connections of neurons in fly brains (Drosophila). They produce a lot of data and are committed to sharing this openly. They share their data via the Virtual Fly Brain platform (VFB).

This platform was set up in 2009 by a group of researchers in Cambridge and Edinburgh; some of the VFB team are now also involved in the Connectomics group so there is a close relationship between these projects. The platform was created as a domain-specific location to curate existing data, taken from the literature, on Drosophila neurons and for curating and sharing new data produced by researchers working in this area.

Initially it was set up thanks to a grant from the Biotechnology and Biological Sciences Research Council (BBSRC). After an initial three-year grant, the BBSRC declined to fund the database further. One likely reason for this is that the BBSRC resources scheme explicitly favours resources with a large number of UK users. The number of UK researchers who use Drosophila brain image data is relatively small (<10 labs), whereas the number of international researchers who use this data is relatively large, with an estimated 200 labs working on this type of data in other parts of the world.

Subsequently, the Wellcome Trust stepped in with funding for a further three years, due to end in September 2017. Currently it is uncertain whether or not they will fund it in the future. By now, almost eight years after its creation, VFB has become the go-to source for openly available data on Drosophila brain information and images integrated into a queryable platform. No other resource like it exists and no other research group is making moves to curate Drosophila neurobiology data openly. The VFB case raises interesting and important questions about how resources are funded and the future of domain-specific open infrastructures.

The status quo

On the one hand, funders like the Wellcome Trust, Research Councils UK and National Institutes of Health (NIH) are encouraging researchers to use domain-specific repositories for data sharing. Yet on the other, they are acknowledging that the current approaches to funding these resources are not necessarily sustainable.

A recent review on building and sustaining data infrastructures commissioned by the Wellcome Trust acknowledges that in light of the FAIR principles “it is clear that data is best made available through repositories where aggregation can add most value”, which is arguably in a domain-specific repository. Use of domain-specific repositories allows data to be aggregated with similar data recorded using the same metadata fields.

It is also clear that publishers can influence where data is deposited, with publishers such as Nature Publishing Group, PLOS and F1000 all recommending subject-specific repositories as the first choice place for deposition. If no subject-specific repository is available then unstructured repositories, such as Dryad or figshare, are often recommended instead, which complicates infrastructure needs and therefore provisions.

The economic model for supporting data infrastructures is something the Wellcome Trust are considering, with reports recently published by other funding agencies (here, here and here). The Wellcome Trust’s commissioned review noted that project-based funding for data infrastructures is not sustainable in the long term.

However, historically funders have encouraged, and still encourage, the use of domain-specific resources, which have been born from project-based funding because of a lack of provision elsewhere. This has created a complex situation: researchers created domain-specific data infrastructures using their project funding; these have become the subject norm; funders encourage their use, but now don’t have the mechanisms to be able to pledge sustained long-term funding.

National interests?

What is the responsibility of national funders to research resources that are internationally important? Academic research is collaborative. It crosses borders and utilises shared knowledge regardless of where it was generated and this is acknowledged by funders who see the benefits of collaboration. Yet, the strategic goals of funders, such as the BBSRC, are often focused on the national level when it comes to relevance and importance.

On the one hand it is understandable that funders concentrate on national interests – taxpayers’ money goes into the funder’s coffers and therefore they have a responsibility to those taxpayers to ensure that the money is spent on research that benefits the nation.

But one could argue that international collaboration is in the national interest. The US-based NIH funds resources that are of international importance, including most of the model organism databases and genomic resources, such as the Gene Expression Omnibus. These are highly used by US researchers, so one could argue that NIH are acting in the national interest, but they are open to researchers all over the world and therefore constitute a resource of international importance.

The Wellcome Trust do have a global outlook when it comes to funding, with 21% of their total spend (2015-6) going to projects outside of the UK. Yet the VFB resource is still vulnerable despite being an internationally important resource.

One of the motivations for the Connectomics group to participate in the Open Research Pilot is to open a dialogue with the Wellcome Trust about these issues. The Wellcome Trust are committed to strategically investing in Open Research and encourage the use of domain-specific resources. The Connectomics group are interested in how this strategic investment will translate into actual funding decisions now and into the future.

Issues on which researchers would like clarification

All the researchers who are part of the Open Research Pilot have had the opportunity to contribute to questions on the sustainability of open resources. Posts on the funder’s and University’s perspectives will be published as parts 2 and 3 of this blog.

  1. What do you think is the responsibility of national funders towards research resources that are of more international benefit than national?
  2. How do you think the funding landscape will react to the move towards open research in terms of supporting the sustainability of resources used for curating and sharing data?
  3. Researchers are asked to share their data in domain specific resources if they are available. There are 1598 discipline specific repositories listed on and each one needs to be supported. How big does a research community need to be to expect support?
  4. What percentage of financial support should be focussed on resources versus primary research?
  5. If funders are reluctant to pay for domain-specific resources, is there a need to move to a researcher-pays model for data sharing rather than centrally funding resources in some circumstances? Why? How do they envisage this being paid for?
  6. How can we harmonise the approach to sustainable open resources across a global research community? Should we move to centralised infrastructures like the European Open Science Cloud?
  7. More generally, how can funders and employers help to incentivise open research (carrot or stick)?
  8. Wellcome often tries to act in a way to bring about change (e.g. open access publishing): Do they envisage that the long term funding of open research (10-20 years from now) will be very different from the situation over e.g. the next 5 years?


Creative Commons License