Me, Myself and Data – Dr Sudhakaran Prabakaran

For Love Data Week (12th-16th February 2018) we are featuring data-related people. Today we talk to Dr Sudhakaran Prabakaran, Lecturer/ Group leader, Department of Genetics.

Telling Stories with Data

Let’s start with an easy one. What kind of data do you work with and what do you do with it?

We use population-level sequencing datasets from TCGA, mutation datasets from COSMIC, ClinVar and HGMD, and curated databases from other labs. We use discarded datasets, negative datasets, already published datasets, anything and everything. We develop and use structural genomics, mathematical modelling and machine learning tools to analyse mutations that map to noncoding regions of the human genome.

Tell us how you think you can use data to make a difference in your field.

We live on these datasets. Biological data is going to exceed 2.5 Exabytes in the next two years, and the bottleneck is the analysis of these datasets. Our job is to find patterns in these datasets. Rare variants and driver mutations become significant and identifiable only when we look for them in a population context.

How do you talk about your data to someone outside of academia?

For us it is not difficult. The datasets we are using are generated and curated by governmental and international consortiums. They have done the bulk of publicity. For example, the TCGA dataset has all kinds of data from thousands of cancer patients and is curated by the NIH. The power of this data is for all to see. I just say we try to aid in cancer diagnosis by crawling through these datasets to find patterns.

Connected Conversations

What data-related challenges do you have to deal with in your research environment?

We are happy with the publicly available datasets. Our problem starts with the datasets we collect. How to store and analyse them, and how to make them available for everyone to use, are the questions we are trying to answer all the time.

How do you think these challenges might be overcome?

I am an ardent proponent of cloud-storage and computation. I believe that is the future. I am also aware that some countries are concerned with data migration outside their geographical boundaries.

If you were in charge what data-related rule would you introduce?

I am not going to make up anything new. Past US Presidents have made laws requiring that any data generated with public funds be made available.

Governmental organisations should demystify cloud based storage and computation processes. People are unduly worried. People are giving away more personal data wilfully on Facebook, Twitter, Instagram than through genome sequences collected by public consortiums.

Tell us about your happiest data moment.

It is not one moment, it is a series of moments up until now. I can run a viable research program with no startup money or funds just by scavenging through publicly available datasets.

What advice do you have for someone who is just embarking on a career in your field?

Learn machine learning and cloud computing.

What do you think the future of research data looks like?

A lot more data analysis than data generation.

There is A LOT of data out there about all sorts of things and it is being collected all the time. Does anything frighten you about data?

I am in fact excited. I believe we need to train more data scientists. We are in good times. Data is becoming truly democratic!

Published 12 February 2018
Written by Dr Sudhakaran Prabakaran
Creative Commons License

The Open Research Pilot – one year in

As we close in on the halfway point for the Open Research Pilot between the University of Cambridge and the Wellcome Trust, how are things going?

Well, in many ways exactly as expected. The primary issues that we are facing are a lack of sustainable support for infrastructure and a lack of reward and incentive to work openly. None of this is news, and while no new issues are being surfaced by the Open Research Pilot, having the dialogue is helping the participating researchers exchange ideas and the Wellcome Trust develop new services and policies.

This was the take home message from the second full meeting of the Open Research Pilot participants held in London at the Wellcome Trust on 13 September. Given one of the main goals of the project is to learn what the barriers and incentives are for open research and to share these findings with others interested in the subject to inform policy development, it seems the Pilot is on track.

This blog summarises the discussions and ideas that arose at the event. A full write-up and the presentations are now available in the Open Research Pilot collection in Apollo, the University repository. The notes are also available from the pilot kick-off meeting held in January in Cambridge.

As a memory jogger, information about the Pilot and the different people and research projects involved is available. A blog describing the kick-off meeting is also online as part of our new Open Adventures blog platform.

Outcomes of the meeting

Main issues raised:

  • Today a successful researcher and a good researcher are not necessarily the same thing
  • Time is a big issue. It takes time to annotate data sets so that they are usable by others
  • The current incentive and reward structure is a barrier to change
  • The ethos needs to change with regard to the need to publish in particular journals
  • There is a need to re-define what is valuable and this may need to be defined at the discipline level
  • There is a mismatch between the reliance the research community have on certain resources and the availability of funding for long-term sustainability
  • There is a need for dedicated staff to manage sharing of research at the institute or department level

Possible solutions:

  • The new publishing platform, Wellcome Open Research is cheaper and faster than traditional publishing outlets
  • There are movements towards international approaches to collectively funding scientific infrastructure
  • The participants have found having access to library colleagues through this Open Research Pilot Project has been useful for figuring out where to put their research data – this does raise questions about future library services
  • We need strong leadership to drive the change to Open Research and, given the risk-averse nature of institutions, change needs to be led by funders


Summary of the discussions

Wellcome Open Research update

Robert Kiley and David Carr gave a progress report on what had been happening in open science at the Wellcome Trust. They gave an update on the new Wellcome Trust open publishing platform, Wellcome Open Research.

When Wellcome Open Research was launched, ‘success’ was defined as 25-30 publications in a year. However in less than a year, more than 100 items have been published on this platform from a broad range of institutions. While half the publications are research articles, the rest are other output types such as data notes, software studies and protocols. This is important, given the new requirement of the Wellcome Trust to share all research outputs.

The platform is relatively popular as well. A comparison of the volume of Wellcome Trust-funded publications across publication venues showed that Wellcome Open Research was the fourth most used, after Scientific Reports, PLoS ONE and Nature Communications.

This is significant because the average cost of publication on the Wellcome platform is of the order of £700, which is significantly lower than the average cost of the other named publications (generally around £2000). It is not just cheaper, it is faster as well. Robert described an example where an item was submitted, reviewed, approved, published and made discoverable, and requests for the data were then received, all within a three-week period.

Open data, Funding and Sustainability

Recently, as part of this Pilot, the OSC published a series of blogs discussing the problem of supporting infrastructure, from the researcher perspective, the funder perspective, and that of the university library. The group discussed the serious problem of infrastructure being funded at the grant level while the data is used by the whole community. Funders do not necessarily fund ongoing infrastructure, which is in competition with new requests to fund new ideas. Another question is whether it is even the funder’s job to provide long-term sustainability.

The point was raised that there is a mismatch between a reliance on certain resources on the one hand and a reluctance to fund them for long-term sustainability on the other. For example, when arXiv (an e-print service, operated by Cornell University) asked the physics community to provide support, the community thought that it was the library’s responsibility to provide funding, not theirs. Similarly, Canadian Health Research heavily rely on GenBank, but do not contribute to the costs of this resource.

This problem is recognised internationally and there are some attempts to address it. Earlier this year there was a meeting of several major funding organisations, from which a strong consensus emerged that core data resources for the life sciences should be supported through a coordinated international effort(s) that better ensure long-term sustainability and that appropriately align funding with scientific impact. There is also some work to build a stable and sustainable infrastructure for biological information across Europe.

Support for Open Research activities

One question posed to the researchers in the group was: what support from their institutions and funders would they want to make their data more accessible? It was commented that time was a big issue. For example, it takes time to annotate data sets so that they are usable by others. One group said that they could write protocols and a series of articles to put on Wellcome Open Research, which would be a good thing, but it would take the team a long time.

There appears to be a need for dedicated staff to manage sharing of research at the institute or department level.  It was commented that having had access to library colleagues through this Open Research Pilot Project has been useful for figuring out where to put their research data. An action was taken for the library component of the group to think about what support is being provided in this context (and into the future).

Open Research and Culture

In the current climate it is easy to identify a successful researcher.  A successful researcher has prizes, publishes in particular journals with high impact factors and has grants and funding. But a successful researcher and a good researcher are not necessarily the same thing. One of the blockers for a future Open Research environment seems to be the research community itself.  For example, the current incentive and reward structure is a barrier to change and there is a need to re-define what is valuable and this may need to be defined at the discipline level.

Some suggestions that arose in the discussion were:

  • a data re-use prize by Wellcome Trust
  • only provide grants or Fellowships to institutions or departments that have signed or support the Declaration on Research Assessment (DORA)
  • travel fellowship awards for good Open Research practices (noting that credit should be given to the individual winning the award and not the head of the laboratory who was the recipient of the original grant)

By ‘fighting on different fronts’, slowly the research environment might change. We need strong leadership to drive the change to Open Research, and the leadership needs to come from funders and institutions for the researchers to align themselves with working openly. But institutions are very risk-averse about activities that could jeopardise funding, so change needs to be led by the funders.

The hybrid question

While open access could be a vehicle for improving open research, the route to achieve this is debatable. The group asked whether funders could insist on green open access or only pay for truly open access journals. If payment for articles in hybrid journals, for example, was stopped, the money saved could be used to invest in other aspects of open research.

An alternative option discussed was whether the amount paid for APCs could be limited, say to $1000. But this would be very difficult to implement; as an indicator, the SCOAP3 project took five years to get off the ground.

The question arose: if Wellcome Trust stopped paying for open access in hybrid journals, would researchers stop applying for funding? The feeling was no. But researchers perceive that when applying for grants, their record for publications in particular journals is very important.  The ethos needs to change with regard to the need to publish in particular journals.

Next steps

The Office of Scholarly Communication is coordinating an “In Conversation” event on 5 December to give researchers the opportunity to talk to Wellcome Trust representatives about their Policy on data, software and materials management and sharing.

We are also looking to find evidence that data is reused.

The Wellcome Trust will use the group as a scoping group for its proposal to build a repository, by sharing the draft requirements with the group.


Published 10 November 2017
Written by Dr Danny Kingsley, based on notes by Dr Debbie Hansen

RDN Open Research Panel: the researcher’s perspective

By Jennifer Harris and Tim Fulton

The 4th Research Data Network event organised by JISC took place over 2 very rainy days at the end of June at the University of York. It was a packed schedule of talks, technology demonstrations and interactive sessions covering a full gamut of topics related to Research Data Management and Open Data. Rounding off the event was a panel discussion organised by the Office of Scholarly Communication at the University of Cambridge on the subject of Open Research and in particular the Open Research pilot project that is currently being run here in conjunction with the Wellcome Trust.

The Open Research pilot project is formed of a number of research groups from across a range of disciplines all of whom are  intending to share not only complete articles but also single data sets or unexplained, yet reproducible, results. The pilot intends to look at the benefits and barriers of open research for the researchers and what support libraries can provide to facilitate open research.

Sitting on the panel were:

  • David Carr from the Wellcome Trust, offering his opinions on Open Data from the funder’s perspective
  • Lauren Cadwallader from the Office of Scholarly Communication, discussing how academic support staff can deal with the demands of Open Data (and recalcitrant academics)
  • Tim Fulton, a researcher in Cambridge University’s Department of Genetics and participant in the Open Research pilot
  • Jennifer Harris, a researcher from the UCL/Birkbeck Centre for Planetary Sciences at Birkbeck University of London.

Leading the discussion was Marta Teperek, formerly of the University of Cambridge but now working as a Data Steward at TU Delft.

After a brief introduction from each of the panelists it was over to the audience for questions. Questions were solicited from the room in two ways: the traditional ‘stick your hand up and someone will bring you a microphone’ method, and a newer medium that enabled the shyer members of the audience to participate without having to identify themselves or speak up in front of approximately 100 people. The range of expertise and experience on the panel was reflected in the questions, with topics of discussion ranging from how to fund Open Science (should it be included in block grants?), to what the panelists find most frustrating about the current methods of publishing non-traditional outputs including data, to how best to persuade academics who are wary of data sharing that making their research open would be a good thing.

As two of the only people at the conference currently employed as full-time academic researchers, we found this an interesting and thought-provoking session and were both glad to have had the chance to contribute to it.

Views from an experienced Open Researcher – Jennifer Harris

As most of the audience were academic research data managers or in related professions, I definitely felt I was there to give a view from the other side, and to demonstrate to them that there are some researchers out there who do care about Open Research and are willing to work with the RDM community to make it a reality. As the only member of the panel not involved in the Cambridge Open Research Project, my contributions were also more general, offering a perspective from a field not involved in the pilot but one that does have a lot of experience with open research in the form of open (and free) data and citizen science.

Open Research is a multi-faceted issue and it was clear that everyone in the room had a good understanding of this and the complications that inevitably arise when attempting to promote it to a community of academics who already have a heavy load of pressures and demands on their time.  As a researcher currently employed on a postdoctoral contract I can get a little defensive when I hear members of the academic support community complain about researchers not engaging with their efforts to promote a new service or perform some new task that they’re requesting.  The task in question is sometimes only a small one such as uploading a paper to an institutional repository post-publication.  But these small tasks can sometimes be the final straw when it comes to managing your workload.

It was refreshing therefore to be in a room of people who mostly seemed to get that and genuinely wanted to understand how they could best get across to researchers that what they’re requesting when they push open research may involve an increase in workload but that it will (or at least should) be something that will pay off in the longer term.  A lot of the discussion kept coming back to the fact that what is ultimately required is an entire change of culture and that’s something that everyone will have to be involved in, from publishers to support staff to researchers at all levels (but most importantly at the top).

Views from a Newcomer – Tim Fulton

Over the two-day meeting I was encouraged to speak to so many RDM community members who understood that open research policy and platforms are only as good as the engagement of researchers and the research community. In talking to delegates it became clear to me that there were two main concerns with the project: getting researchers involved, and maintaining funding to ensure sustainability of the repositories. Hearing that the Wellcome Trust are in discussion about the sustainability of their project is heartening, if currently lacking in detail; however, at a pilot stage this is not unexpected.

It was also encouraging that the RDM community and pilot scheme organisers are keen on understanding what the current obstacles to sharing data are – most commonly mentioned were time and the fear of being ‘scooped’ by someone else using your data. Our laboratory intends to publish data following the publication of a pre-print article, thereby securing our ownership of our conclusions without overly delaying the sharing of the data. We also intend to publish detailed methods to explain how our data was generated, addressing the repeatedly mentioned frustration with traditional journals of poor method description and non-reproducibility of results.

This project is a cornerstone in changing the attitude of researchers to data: changing the current culture of hoarding data to one where data is a community possession. Whilst this may take time and there will be issues along the way, I foresee this project creating a better research environment. We have begun the process of preparing data for publication at the time of collection rather than as a subsequent publication step. The hope is that these small changes to daily working practice should make the general research community more efficient and fruitful going forwards, which is good for researchers, funders and the general public.

Published 23 October 2017
By Jennifer Harris and Tim Fulton

Sustaining long-term access to open research resources – a university library perspective

Originally published 11 September 2017, Written by Dave Gerrard

In the third in a series of three blog posts, Dave Gerrard, a Technical Specialist Fellow from the Polonsky-Foundation-funded Digital Preservation at Oxford and Cambridge project, describes how he thinks university libraries might contribute to ensuring access to Open Research for the longer-term.  The series began with Open Resources, who should pay, and continued with Sustaining open research resources – a funder perspective.

Blog post in a nutshell

This blog post works from the position that the user-bases for Open Research repositories in specific scientific domains are often very different to those of institutional repositories managed by university libraries.

It discusses how in the digital era we could deal with the differences between those user-bases more effectively. The upshot might be an approach to the management of Open Research that requires both types of repository to work alongside each other, with differing responsibilities, at least while the Open Research in question is still active.

And, while this proposed method of working together wouldn’t clarify ‘who is going to pay’ entirely, it at least clarifies who might be responsible for finding funding for each aspect of the task of maintaining access in the long-term.

Designating a repository’s user community for the long-term

Let’s start with some definitions. One of the core models in Digital Preservation, the International Standard Open Archival Information System Reference Model (or OAIS) defines ‘the long term’ as:

“A period of time long enough for there to be concern about the impacts of changing technologies, including support for new media and data formats, and of a changing Designated Community, on the information being held in an OAIS. This period extends into the indefinite future.”

This leads us to two further important concepts defined by the OAIS:

“Designated Communities” are “an identified group of potential Consumers who should be able to understand a particular set of information”, i.e. the set of information collected by the ‘archival information system’.

A “Representation Information Network” is the tool that allows the communities to explore the metadata which describes the core information collected. This metadata will consist of:

  • descriptions of the data contained in the repository,
  • metadata about the software used to work with that data,
  • the formats in which the data are stored and related to each other, and so forth.

In the example of the Virtual Fly Brain Platform repository discussed in the first post in this series, the Designated Community appears to be: “… neurobiologists [who want] to explore the detailed neuroanatomy, neuron connectivity and gene expression of Drosophila melanogaster.” And one of the key pieces of Representation Information, namely “how everything in the repository relates to everything else”, is based upon a complex ontology of fly anatomy.

It is easy to conclude, therefore, that you really do need to be a neurobiologist to use the repository: it is fundamentally, deeply and unashamedly confusing to anyone else that might try to use it.

Tending towards a general audience

The concept of Designated Communities is one that, in my opinion, the OAIS Reference Model never adequately gets to grips with. For instance, the OAIS Model suggests including explanatory information in specialist repositories to make the content understandable to the general community.

Long term access within this definition thus implies designing repositories for Designated Communities consisting of what my co-Polonsky-Fellow Lee Pretlove describes as: “all of humanity, plus robots”. The deluge of additional information that would need to be added to support this totally general resource would render it unusable; to aim at everybody is effectively aiming at nobody. And, crucially, “nobody” is precisely who is most likely to fund a “specialist repository for everyone”, too.

History provides a solution

One way out of this impasse is to think about currently existing repositories of scientific information from more than 100 years ago. We maintain a fine example at Cambridge: The Darwin Correspondence Project, though it can’t be compared directly to Virtual Fly Brain. The former doesn’t contain specialist scientific information like that held by the latter – it holds letters, notebooks, diary entries etc – ‘personal papers’ in other words. These types of materials are what university archives tend to collect.

Repositories like Darwin Correspondence don’t have “all of humanity, plus robots” Designated Communities, either. They’re aimed at historians of science, and those researching the time period when the science was conducted. Such communities tend more towards the general than ‘neurobiologists’, but are still specialised enough to enable production and management of workable, usable, logical archives.

We don’t have to wait for the professor to die any more

So we have two quite different types of repository. There’s the ‘ultra-specialised’ Open Research repository for the Designated Community of researchers in the related domain, and then there’s the more general institutional ‘special collection’ repository containing materials that provide context to the science, such as correspondence between scientists, notebooks (which are becoming fully electronic), and rough ‘back of the envelope’ ideas. Sitting somewhere between the two are publications – the specialist repository might host early drafts and work in progress, while the institutional repository contains finished, published work. And the institutional repository might also collect enough data to support these publications, too, like our own Apollo Repository does.

The way digital disrupts this relationship is quite simple: a scientist needs access to her ‘personal papers’ while she’s still working, so, in the old days (i.e. more than 25 years ago) the archive couldn’t take these while she was still active, and would often have to wait for the professor to retire, or even die, before such items could be donated. However, now that everything is digital, the prof can both keep her “papers” locally and deposit them at the same time. The library special collection doesn’t need to wait for the professor to die to get their hands on the context of her work. Or indeed, wait for her to become a professor.

Key issues this disruption raises

If we accept that specialist Open Research repositories are where researchers carry out their work, and that the institutional repository’s role is to collect contextual material to help us understand that work further down the line, then what questions does this raise about how those managing these repositories might work together?

How will the relationship between archivists and researchers change?

The move to digital methods of working will change the relationships between scientists and archivists.  Institutional repository staff will become increasingly obliged to forge relationships with scientists earlier in their careers. Of course, the archivists will need to work out which current research activity is likely to resonate most in future. Collection policies might have to be more closely in step with funding trends, for instance? Perhaps the university archivist of the digital future might spend a little more time hanging round the research office?

How will scientists’ behaviour have to change?

A further outcome of being able to donate digitally is that scientists become more responsible for managing their personal digital materials well, so that it’s easier to donate them as they go along. This has been well highlighted by another of the Polonsky Fellows, Sarah Mason at the Bodleian Libraries, who has delivered personal digital archiving training to staff at Oxford, in part based on advice from the Digital Preservation Coalition. The good news here is that such behaviour actually helps people keep their ongoing work neat and tidy, too.

How can we tell when the switch between Designated Communities occurs?

Is it the case that there is a ‘switch-over’ between the two types of Designated Community described above? Does the ‘research lifecycle’ actually include a phase where the active science in a particular domain starts to die down, but the historical interest in that domain starts to increase? I expect that this might be the case, even though it’s not in any of the lifecycle models I’ve seen, which mostly seem to model research as either continuing on a level perpetually, or stopping instantly. But such a phase is likely to vary greatly even between quite closely-related scientific domains. Variables such as the methods and technologies used to conduct the science, what impact the particular scientific domain has upon the public, to what degree theories within the domain conflict, indeed a plethora of factors, are likely to influence the answer.

How might two archives working side-by-side help manage digital obsolescence?

Not having access to the kit needed to work with scientific data in future is one of the biggest threats to genuine ‘long-term’ access to Open Research, but one that I think really does fall to the university to mitigate. Active scientists using a dedicated, domain-specific repository are by default going to be able to deal with the material in that repository: if one team deposits some material that others don’t have the technology to use, then they will as a matter of course sort that out amongst themselves at the time, and they shouldn’t have to concern themselves with what people will do 100 years later.

However, university repositories do have more of a responsibility to history, and a daunting responsibility it is. There is some good news here, though… For a start, universities have a good deal of purchasing power they can bring to bear upon equipment vendors, in order to insist, for example, that they produce hardware and software that creates data in formats that can be preserved easily, and to grant software licenses in perpetuity for preservation purposes.

What’s more fundamental, though, is that the very contextual materials I’ve argued that university special collections should be collecting from scientists ‘as they go along’ are the precise materials science historians of the future will use to work out how to use such “ancient” technology.

Who pays?

The final, but perhaps most pressing question, is ‘who pays for all this’? Well – I believe that managing long-term access to Open Research in two active repositories working together, with two distinct Designated Communities, might at least make things a little clearer. Funding specialist Open Research repositories should be the responsibility of funders in that domain, but they shouldn’t have to worry about long-term access to those resources. As long as the science is active enough that it’s getting funded, then a proportion of that funding should go to the repositories that science needs to support it. The exact proportion should depend upon the value the repository brings, which might be calculated using factors such as how much the repository is used, how much time using it saves, what researchers’ time is worth, how many Research Excellence Framework brownie points (or similar) come about as a result of collaborations enabled by that repository, etc.

On the other hand, I believe that university / institutional repositories need to find quite separate funding for their archivists to start building relationships with those same scientists, and working with them to both collect the context surrounding their science as they go along, and prepare for the time when the specialist repository needs to be mothballed. With such contextual materials in place, there don’t seem to be too many insurmountable technical reasons why, when it’s acknowledged that the “switch from one Designated Community to another” has reached the requisite tipping point, the university / institutional repository couldn’t archive the whole of the specialist research repository, describe it sensibly using the contextual material they have collected from the relevant scientists as they’ve gone along, and then store it cheaply on a low-energy medium (i.e. tape, currently). It would then be “available” to those science historians that really wanted to have a go at understanding it in future, based on what they could piece together about it from all the contextual information held by the university in a more immediately accessible state.

Hence the earlier the institutional repository can start forging relationships with researchers, the better. But it’s something for the institutional archive to worry about, and get the funding for, not the researcher.

Originally published 11 September 2017
Written by Dave Gerrard

Creative Commons License

Sustaining open research resources – a funder perspective

Originally published 26 July 2017, Written by David Carr, Wellcome Trust

This is the second in a series of three blog posts which set out the perspectives of researchers, funders and universities on support for open resources. The first was Open Resources, who should pay? In this post, David Carr from the Open Research team at the Wellcome Trust provides the view of a research funder on the challenges of developing and sustaining the key infrastructures needed to enable open research.

As a global research foundation, Wellcome is dedicated to ensuring that the outputs of the research we fund – including articles, data, software and materials – can be accessed and used in ways that maximise the benefits to health and society.  For many years, we have been a passionate advocate of open access to publications and data sharing.

I am part of a new team at Wellcome which is seeking to build upon the leadership role we have taken in enabling access to research outputs.  Our key priorities include:

  • developing novel platforms and tools to support researchers in sharing their research – such as the Wellcome Open Research publishing platform which we launched last year;
  • supporting pioneering projects, tools and experiments in open research, building on the Open Science Prize, a partnership with the NIH and Howard Hughes Medical Institute;
  • developing our policies and practices as a funder to support and incentivise open research.

We are delighted to be working with the Office of Scholarly Communication on the Open Research Pilot Project, where we will work with four Wellcome-funded research groups at Cambridge to support them in making their research outputs open.  The pilot will explore the opportunities and challenges, and how platforms such as Wellcome Open Research can facilitate output sharing.

Realising the long-term value of research outputs will depend critically upon developing the infrastructures to preserve, access, combine and re-use outputs for as long as their value persists.  At present, many disciplines lack recognised community repositories and, where they do exist, many cannot rely on stable long-term funding.  How are we as a funder thinking about this issue?

Meeting the costs of outputs sharing

In July 2017, Wellcome published a new policy on managing and sharing data, software and materials.  This replaced our long-standing policy on data management and sharing – extending our requirements for research data to also cover original software and materials (such as antibodies, cell lines and reagents).  Rather than ask for a data management plan, applicants are now asked to provide an outputs management plan setting out how they will maximise the value of their research outputs more broadly.

Wellcome commits to meet the costs of these plans as an integral part of the grant, and provides guidance on the costs that funding applicants should consider.  We recognise, however, that many research outputs will continue to have value long after the funding period comes to an end.  Further, while it is not appropriate to make all research data open indefinitely, researchers are expected to retain data underlying publications for at least ten years (a requirement which was recently formalised in the UK Concordat on Open Research Data).  We must accept that preserving and making these outputs available into the future carries an ongoing cost.

Some disciplines have existing subject-area repositories which store, curate and provide access to data and other outputs on behalf of the communities they serve.  Our expectation, made more explicit in our new policy, is that researchers should deposit their outputs in these repositories wherever they exist.  If no recognised subject-area repository is available, we encourage researchers to consider using generalist repositories – such as Dryad, FigShare and Zenodo – or if not, to use institutional repositories.  Looking ahead, we may consider developing an orphan repository to house Wellcome-funded research data which has no other obvious home.

Recognising the key importance of this infrastructure, Wellcome provides significant grant funding to repositories, databases and other community resources.  As of July 2016, Wellcome had active grants totalling £80 million to support major data resources.  We have also invested many millions more in major cohort and longitudinal studies, such as UK Biobank and ALSPAC.  We provide such support through our Biomedical Resource and Technology Development scheme, and have provided additional major awards over the years to support key resources, such as PDB-Europe, Ensembl and the Open Microscopy Environment.

While our funding for these resources is not open-ended and is subject to review, we have been conscious for some time that the reliance of key community resources on grant funding (typically of three to five years’ duration) can create significant challenges, hindering their ability to plan for the long-term and retain staff.  As we develop our work on Open Research, we are keen to explore ways in which we can adapt our approach to help put key infrastructures on a more sustainable footing, but this is a far from straightforward challenge.

Gaining the perspectives of resource providers

In order to better understand the issues, we did some initial work earlier this year to canvass the views of those we support.  We conducted semi-structured interviews with leaders of 10 resources in receipt of Wellcome funding – six database and software resources, three cohort resources and one materials stock centre – to explore their current funding, long-term sustainability plans and thoughts on the wider funding and policy landscape.

We gathered a wealth of insights through these conversations, and several key themes emerged:

  • All of the resources were clear that they would continue to be dependent on support from Wellcome and/or other funders for the long-term.
  • While cohort studies (which provide managed access to data) can operate cost recovery models to transfer some of the cost of accessing data onto users, such models were not appropriate for data and software resources that commit to open and unrestricted access.
  • Several resources had additional revenue-generation routes – including collaborations with commercial entities – and these had delivered benefits in enhancing their resources.  However, the level of income was usually relatively modest in terms of the total cost of sustaining the resource. Commitments to openness could also limit the extent to which such arrangements were feasible.
  • Diversification of funding sources can give greater assurance and reduce reliance on single funders, but can bring an additional burden.  There was felt to be a need for better coordination between funders where they co-fund resources.  Europe PMC, which has 27 partner funders but is managed through a single grant, is a model which could be considered.
  • Several of the resources were actively engaged in collaborations with other resources internationally that house related data – it was felt that funders could help further facilitate such partnerships.

We are considering how Wellcome might develop its funding approaches in light of these findings.  As an initial outcome, we plan to develop guidance for our funded researchers on key issues to consider in relation to sustainability.  We are already working actively with other funders to facilitate co-funding and make decisions as streamlined as possible, and wish to explore how we can join forces in the future in developing our broader approaches for funding open resources.

Coordinating our efforts

There is growing recognition of the crucial need for funders and the wider research community to work together to develop and sustain research data infrastructure.  As the first blog in this series highlighted, the scientific enterprise is global and this is an issue which must be addressed at an international level.

In the life sciences, the ELIXIR and US BD2K initiatives have sought to develop coordinated approaches for supporting key resources and, more recently, the European Open Science Cloud initiative has developed a bold vision for a cloud-based infrastructure to store, share and re-use data across borders and disciplines.

Building on this momentum, the Human Frontiers Science Programme convened an international workshop last November to bring together data resources and major funders in the life sciences.  This resulted in a call for action (reported in Nature) to coordinate efforts to ensure long-term sustainability of key resources, whilst supporting resources in providing access at no charge to users.  The group proposed an international mechanism to prioritise core data resources of global importance, building on the work undertaken by ELIXIR to define criteria for such resources.  It was proposed that national funders could potentially then contribute a set proportion of their overall funding (with initial proposals suggesting around 1.5 to 2 per cent) to support these core data resources.

Grasping the nettle

Public and charitable funders are acutely aware that many of the core repositories and resources needed to make research outputs discoverable and useable will continue to rely on our long-term funding support.  There is clear realisation that a reliance on traditional competitive grant funding is not the ideal route through which to support these key resources in a sustainable manner.

But no one yet has a perfect solution and no funder will take on this burden alone.  Aligning global funders and developing joint funding models of the type described above will be far from straightforward, but hopefully we can work towards a more coordinated international approach.  If we are to realise the incredible potential of open research, it’s a challenge we must address.

Open Resources: Who Should Pay?

Originally published 23 June 2017 Written by Dr Lauren Cadwallader

This blog is the first in a series of three which considers the perspectives of researchers, funders and universities in relation to the support for open resources, coordinated and written by Dr Lauren Cadwallader. This post asks the question: What is the responsibility of national funders to research resources that are internationally important?

In January 2017 the Office of Scholarly Communication and Wellcome Trust started an Open Research Pilot Project to try to understand how we could help our researchers work more openly and what barriers they faced with making their work open. A common theme among the groups that we are working with is the sustainability of open resources.

The Virtual Fly Brain Example

Let’s take the Connectomics group I am working with for example. They investigate the connections of neurons in fly brains (Drosophila). They produce a lot of data and are committed to sharing this openly. They share their data via the Virtual Fly Brain platform (VFB).

This platform was set up in 2009 by a group of researchers in Cambridge and Edinburgh; some of the VFB team are now also involved in the Connectomics group so there is a close relationship between these projects. The platform was created as a domain-specific location to curate existing data, taken from the literature, on Drosophila neurons and for curating and sharing new data produced by researchers working in this area.

Initially it was set up thanks to a grant from the Biotechnology and Biological Sciences Research Council (BBSRC). After an initial three-year grant, the BBSRC declined to fund the database further. One likely reason for this is that the BBSRC resources scheme explicitly favours resources with a large number of UK users. The number of UK researchers who use Drosophila brain image data is relatively small (<10 labs), whereas the number of international researchers who use this data is relatively large, with an estimated 200 labs working on this type of data in other parts of the world.

Subsequently, the Wellcome Trust stepped in with funding for a further three years, due to end in September 2017. Currently it is uncertain whether or not they will fund it in the future. By now, almost eight years after its creation, VFB has become the go-to source for openly available data on Drosophila brain information and images integrated into a queryable platform. No other resource like it exists and no other research group is making moves to curate Drosophila neurobiology data openly. The VFB case raises interesting and important questions about how resources are funded and the future of domain specific open infrastructures.

The status quo

On the one hand funders like the Wellcome Trust, Research Councils UK and National Institutes of Health (NIH) are encouraging researchers to use domain specific repositories for data sharing. Yet on the other, they are acknowledging that the current approaches for these resources are not necessarily sustainable.

A recent review on building and sustaining data infrastructures commissioned by the Wellcome Trust acknowledges that in light of the FAIR principles “it is clear that data is best made available through repositories where aggregation can add most value”, which is arguably in a domain-specific repository. Use of domain-specific repositories allows data to be aggregated with similar data recorded using the same metadata fields.

It is also clear that publishers can influence where data is deposited, with publishers such as Nature Publishing Group, PLOS and F1000 all recommending subject-specific repositories as the first choice place for deposition. If no subject-specific repository is available then unstructured repositories, such as Dryad or figshare, are often recommended instead, which complicates infrastructure needs and therefore provisions.

The economic model for supporting data infrastructures is something the Wellcome Trust are considering, with reports recently published by other funding agencies (here, here and here). The Wellcome Trust’s commissioned review noted that project-based funding for data infrastructures is not sustainable in the long term.

However, historically funders have encouraged, and still encourage, the use of domain specific resources, which have been born from project-based funding because of a lack of provision elsewhere. This has created a complex situation – researchers created domain specific data infrastructures using their project funding; these have become the subject norm; funders encourage their use, but now don’t have the mechanisms to be able to pledge sustained long-term funding.

National interests?

What is the responsibility of national funders to research resources that are internationally important? Academic research is collaborative. It crosses borders and utilises shared knowledge regardless of where it was generated and this is acknowledged by funders who see the benefits of collaboration. Yet, the strategic goals of funders, such as the BBSRC, are often focused on the national level when it comes to relevance and importance.

On the one hand it is understandable that funders concentrate on national interests – taxpayers’ money goes into the funder’s coffers and therefore they have a responsibility to those taxpayers to ensure that the money is spent on research that benefits the nation.

But, one could argue that international collaboration is in the national interest. The US-based NIH funds resources that are of international importance, including most of the model organism databases and genomic resources, such as the Gene Expression Omnibus. These are highly used by US researchers so one could argue that NIH are acting in the national interest but they are open to researchers all over the world and therefore constitute a resource of international importance.

Wellcome Trust do have a global outlook when it comes to funding, with 21% of their total spend (2015-6) going to projects outside of the UK. Yet, the VFB resource is still vulnerable despite being an internationally important resource.

One of the motivations for the Connectomics group to participate in the Open Research Pilot is to open a dialogue with the Wellcome Trust about these issues. The Wellcome Trust are committed to strategically investing in Open Research and encourage the use of domain-specific resources. The Connectomics group are interested in how this strategic investment will translate into actual funding decisions now and into the future.

Issues on which researchers would like clarification

All the researchers who are part of the Open Research Pilot have had the opportunity to contribute to questions on open resources sustainability. Posts on the funder’s and University’s perspective will be published as parts 2 and 3 of this blog.

  1. What do you think is the responsibility of national funders towards research resources that are of more international benefit than national?
  2. How do you think the funding landscape will react to the move towards open research in terms of supporting the sustainability of resources used for curating and sharing data?
  3. Researchers are asked to share their data in domain specific resources if they are available. There are 1598 discipline specific repositories listed on and each one needs to be supported. How big does a research community need to be to expect support?
  4. What percentage of financial support should be focussed on resources versus primary research?
  5. If funders are reluctant to pay for domain specific resources, is there a need to move to a researcher pays model for data sharing rather than centrally funding resources in some circumstances? Why? How do they envisage this being paid for?
  6. How can we harmonise the approach to sustainable open resources across a global research community? Should we move to centralised infrastructures like the European Open Science Cloud?
  7. More generally how can funders and employers help to incentivise open research (carrot or stick?)
  8. Wellcome often tries to act in a way to bring about change (e.g. open access publishing): Do they envisage that the long term funding of open research (10-20 years from now) will be very different from the situation over e.g. the next 5 years?

Reflections on Open Research – a PI’s perspective

Originally published 22 June 2017 Written by Dr Marta Teperek

As part of the Open Research Pilot Project, Marta Teperek met with Dr David Savage and asked him several questions about his own views and motivations for Open Research. This led to a very inspiring conversation and great reflections on Open Research from the Principal Investigator’s perspective. The main points that came out of the discussion were:

  • Lack of reproducibility raises questions about scientific rigour, integrity and relevance of work in general
  • Being open is to work in a team and be collaborative
  • Open Research will benefit science as a whole, and not the careers of individuals
  • Peer review remains a critical aspect of the scientific process
  • Nowadays, global collaboration and information exchange is possible, making the data really robust
  • Funders should emphasise the importance of research integrity and scientific rigour

This conversation is reported below in the original interview format.

Motivations for doing Open Research

Marta: To start, could you tell me why you are keen on Open Research and why did you decide to get involved in the Open Research Pilot Project?

David: Sure, but before we start I wanted to stress that when I make comments about science, these are very general comments and they don’t apply to anyone in particular.

So my general feeling is that I am very concerned and disappointed about the lack of research reproducibility in science. Lack of reproducibility raises questions about scientific rigour, integrity and relevance of work in general. Therefore, I am really keen on exploring ways of addressing these failings of science and I want to make a contribution to solving these problems. Additionally, I am aware that I am not perfect either and I want to learn how I can improve my own practice.

Were there any particular experiences which made you realise the importance of Open Research?

This is just the general experience of reading and also reviewing far too many papers where I thought that the quality of underlying data was poor, or authors were exaggerating their claims without supporting evidence. There is too much hype around, and the general awareness about the number of papers published in high impact journals which cannot be reproduced makes the move to more transparent and open approaches necessary.

Do we need additional rewards for working openly?

How do you think Open Research could benefit academic careers?

I am not sure if Open Research could or should benefit academic careers – this should not be the goal of Open Research. The goal is to improve the quality of science and therefore the benefit of science to the public. Open Research will benefit science as a whole, and not the careers of individuals. Science has become very egotistical and badge-accumulating. We should be investigating things which we find interesting. We should not be motivated by the prize. We should be motivated by the questions.

In science we have far too many people who behave like bankers. Publishing seems to be the currency for them and thus they are sloppy and lack the necessary rigour just because they want to publish as fast as they can.

In my opinion it is the responsibility of every researcher to the profession to try to produce data which is robust. It is fine to make honest mistakes. But it is not acceptable to be sloppy or fraudulent, or not to read enough literature. These are simply not good enough excuses. I’m not claiming to be perfect. But I want to constantly improve myself and my research practice.

Barriers to greater openness in research

What obstacles may be preventing researchers from making their research openly available?

The obvious one is competition for funding, which creates the need to publish in high impact factor journals and consequently leads to the fear of being scooped. And that’s a difficult one to work around. That’s the reason why I do not make everything we do in my research group openly available. However, looking at this from society’s perspective, everything should be made openly available, and as soon as possible for the sake of greater benefit to mankind. So balance needs to be found.

Do you think that some researchers might want to make their research open, but might not know how to do it, or might not have the appropriate skills to do it?

Definitely. Researchers need to know about the best ways of making their research open. I am currently trying to work out how to make my own project’s website more open and accessible to others and what are the best ways of achieving this. So yes, awareness of tools and awareness of resources available is necessary, as well as training about working reproducibly and openly. In my opinion, Cambridge has a responsibility to be transparent and open about its processes.

Role of peer-review in improving the quality of research

What frustrates you most about the current scholarly communication systems?

Some people get frustrated with the business model of some of the major publishers. I do not have a problem with it, although I do support the idea of pre-print services, such as bioRxiv. Some researchers get frustrated about the long peer-review process. I am used to the fact that peer-review is long, and I accept it because I do not want fraudulent papers to be published. However, flawed peer review, such as biased peer-review or lack of rigorous peer review, is not acceptable and it is a problem.

So how to improve the peer-review process?

I think that peer-reviewers need to have greater awareness of the need for greater rigour. I was recently asked to peer review an article. The journal had dedicated guidance for peer reviewers. However, the guidance did not contain any information about suitability to undertake the peer-reviewing work. Peer-reviewer guidance documents need to address questions like: Do you really know what the paper is about? Do you know the discipline well enough? Are there any conflicts of interest? Would you have the time to properly peer-review the work? Peer-review needs to be done properly.

What do you think about the idea of journals employing professional peer-reviewers, who could be experts in their respective fields and could perform unbiased, high quality peer-review?

This sounds very reasonable, as long as professional peer-reviewers stay up to date with science. Though this would of course cost money!

I suppose publishers have enough money to pay for this. Have you heard of open peer-review and what do you think about it?

I think it is fine, but it might be subject to cronyism. I suspect that most people will be more likely to agree for their reviews to be made open as long as they make a recommendation for the paper to be accepted.

I recently reviewed a paper of a senior person and I rejected it. But if I made my review open, it would pose a risk to me – what if the author of the paper I rejected was the reviewer of my future grant application? Would they still assess my grant application objectively? What if people start reviewing each other’s papers and start treating peer-review as a mechanism to exchange favours?

The future of Open Research is in your hands

Who or what inspires you and makes you optimistic about the future of Open Research?

In Cambridge and at the Wellcome Trust there are many researchers who care about the quality of science. These researchers inspire me. These are very clever people, who work hard and make important discoveries.

I am also inspired by teamwork and collaboration. In Big Data and in human genetics in particular, people are working collectively. Human genetics and epidemiology are excellent examples of disciplines where 10-20 years ago studies were too small to allow researchers to make significant and reproducible conclusions. Nowadays, global collaboration and information exchange is possible, making the data really robust. As a result, human genetics is delivering really important observations.

To me, part of being open is to work in a team and be collaborative.

If you had a magic wand and if you could get one thing changed to get more people share and open up their research, what would it be?

Not sure… I suppose I am still looking for it! Maybe I will find one during the Open Research Pilot Project. Seriously speaking, I do not believe that a single thing could make a difference. It is the little things that matter. For example, on my side I am trying to make my own lab and institute more aware of reproducibility issues and ensure that I can make a difference in my own environment.

So as a Group Leader, how do you ensure that researchers in your own group are rigorous in their approach?

First, I really make them aware of the importance of reproducible research and of scientific rigour. I am also making a lot of effort to ensure that my colleagues are up to date with literature. I ask them if they read important literature and if they are unable to answer I ask them to do their homework. I am also imposing rigorous standards for experiments. In my lab people repeat the key experiments, or those which are particularly surprising, in a blind fashion. It takes a lot of time and extra resources, but it is important not to be too quick and to validate findings before making claims.

I am also ensuring that my people are motivated. For example, even though everyone helps each other in my group, all PhD students have direct access to me and we have regular discussions about their work. It is important that your group is of a manageable size; otherwise, as a group leader, you will not know all your people and you will not be able to have regular discussions about their work.

How do you identify people who care about reproducible research when making hiring decisions?

I ask all prospective applicants to make a short presentation about their previous work. During their presentation I ask them to tell me exactly what their research question was and how confident they were about their discovery. I am looking for evidence of rigorous methodology, but also for honesty and for people who are not overselling their findings.

In addition, I ask about their career goals. If they tell me that their career goal is to publish in Nature, or have two papers in Science, I count this against them. Instead, I favour applicants who are question-driven, who want to make progress in understanding how things work.

Role of funding bodies in promoting Open Research

Do you think that funders could play a role in promoting Open Research?

Funders could definitely contribute to this. The Wellcome Trust is a particularly notable example of a funding body keen on Open Research. The Trust is currently looking into the best ways to make Open Research the norm. Through various projects such as the Open Research Pilot, the Trust helps researchers like myself to learn best practice on reproducible research, and also to understand the benefits of sharing expertise to improve skills across the research community.

Do you think funder policies to mandate more openness could help?

Potentially. However, policies on Open Access to publications are easy to mandate and relatively easy to interpret and implement. It is much more difficult for Open Research. What does Open Research mean exactly? The right scope and definitions would be key. What should be made open? How? The Wellcome Trust is already doing a lot of work on making important research results available, and human genomic data in particular. But making your proteomic and genomic data publicly available is slightly different from ensuring that your experiments are rigorous and your results honest. So in my opinion, funders should emphasise the importance of research integrity and scientific rigour.

To close our discussion, what do you hope to achieve through your participation in the Open Research Pilot Project?

I want to improve my own lab’s transparency. I want to make sure that we are rigorous and that our research is reproducible. So I want to learn. At the same time I wish to contribute to increased research integrity in science overall.


Marta Teperek would like to thank SPARC EUROPE and Dr Joyce Heckman for interviewing her for the Open Data Champions programme – many of the questions asked by Marta in the interview with Dr David Savage originate from inspiring, open questions prepared by SPARC EUROPE.

Open at scale: sharing images in the Open Research Pilot

Originally published 8 May 2017 Written by Rosie Higman and Dr Ben Steventon

Dr Ben Steventon is one of the participants in the Open Research Pilot. He is working with the Office of Scholarly Communication to make his research process more open and here reports on some of the major challenges he perceives at the beginning of the project.

The Steventon Group is a new group established last year which looks at embryonic development, in particular focusing on the zebrafish. To investigate problems in this area the group uses time-lapse imaging and tracks cells in 3D visualisations which presents many challenges when it comes to data sharing, which they hope to address through the Wellcome Trust Open Research Project. Whilst the difficulties that this group are facing are specific to a particular type of research, they highlight some common challenges across open research: sharing large files, dealing with proprietary software and joining up the different outputs of a group.

Sharing imaging data

The data created by time-lapse imaging and cell tracking is frequently on a scale that presents a technical, as well as financial, challenge. The raw data consists of several terabytes of film which is then compressed for analysis into 500GB files. These compressed files are of a high enough quality that they can be used for analysis but they are still not small enough that they can be easily shared. In addition the group also generates spreadsheets of tracking data, which can be easily shared but are meaningless without the original imaging files and specific software to allow the two pieces of data to be connected. One solution which we are considering is the Image Data Resource, which is working to make imaging datasets in the life sciences, which have not previously been shareable due to their size, available to the scientific community to re-use.

Making it usable

The software used in this type of research is a major barrier to making the group’s work reproducible. The Imaris software the group uses costs thousands of pounds, so anything shared in its proprietary formats is only accessible to an extremely small group of researchers at wealthier institutions, which is in direct opposition to the principles of Open Research. It is possible to use Fiji, an open source alternative, to recreate tracking with the imaging files and tracking spreadsheets; however, the data annotation originally performed in Imaris will be lost when the images are not saved in the proprietary formats.

An additional problem in such analyses is the sharing of protocols that detail the methodologies applied, from the preparation of the samples all the way through data generation and analysis. This is a common problem with standard peer-review journals that are often limited in the space available for the description of methods. The group are exploring new ways to communicate their research protocols and have created an article for the Journal of Visualised Experiments, but these are time consuming to create and so are not always possible. Open peer-review platforms potentially offer a solution to sharing detailed protocols in a more rapid manner, as do specialist platforms such as Wellcome Open Research and

Increasing efficiency by increasing openness

Whilst the file size and proprietary software in this type of research present some barriers to sharing, there are also opportunities through sharing to improve practice across the community. Currently there are several different software packages being used for visualisation and tracking. Sharing more imaging data would therefore allow groups to try out different types of images on different tools and make better purchasing decisions with their grant money. Furthermore, there is great frustration in this area that lots of people are working on different algorithms for different datasets, so greater sharing of these algorithms could reduce the amount of time wasted creating algorithms when it might be possible to adapt a pre-existing one.

Shifting models of scholarly communication

As we move towards a model of greater openness, research groups are facing a new difficulty in working out how best to present their myriad outputs. The Steventon group intends to publish data (in some form), protocols and a preprint at the same time as submitting their papers to a traditional journal. This will make their work more reproducible, and it also allows researchers who are interested in different aspects of their work to access the bits that interest them. These outputs will link to one another, through citations, but this relies on close reading of the different outputs and checking references. The Steventon group would like to make the links between the different aspects of their work more obvious and browsable, so the context is clear to anyone interested in the lab’s work. As the research of the group is so visual it would be appropriate to represent the different aspects of their work in a more appealing form than a list of links.

The Steventon lab is attempting to link and contextualise their work through their website, and it is possible to cross-reference resources in many repositories (including Cambridge’s Apollo), but they would like there to be a more sustainable solution. They work in areas with crossovers to other disciplines – some people may be interested in their methodologies, others the particular species they work on, and others still the particular developmental processes they are researching. There are opportunities here for openness to increase the discoverability of interdisciplinary research and we will be exploring this, as well as the issues around sharing images and proprietary software, as part of the Open Research Pilot.

Creative Commons License

Open Research Project, first thoughts

Originally blogged 14 February 2017 Written by Dr Laurent Gatto 

I am proud to be one of the participants in the Wellcome Trust Open Research Project. The call was initially opened in December 2016 and was pitched like this:
Are you in favour of more transparency in research? Are you concerned about research reproducibility? Would you like to get better recognition and credit for all outputs of your research process? Would you like to open up your research and make it more available to others? If you responded ‘yes’ to any of these questions, we would like to invite you to participate in the Open Research Pilot Project, organised jointly by the Open Research team at the Wellcome Trust and the Office of Scholarly Communication at the University of Cambridge.

This of course sounded like a great initiative for me and I promptly filed an application.

We had our kick-off meeting on the 27th January, with the aim of getting to know each other and somehow define/clarify some of the objectives of the project. This post summarises my take on it.

Here’s how I introduced myself.

Who are you?

Laurent Gatto, Senior Research Associate in the Department of Biochemistry, physically located in Systems Biology and the Maths Department. SSI fellow and Software/Data Carpentry instructor, and generally involved in the Open community in Cambridge, such as OpenConCam and the Data Champions initiative.

What is your research about and what kind of data does your research generate?

My area of research is computational biology, with special focus on high-throughput proteomics and integration of different data and annotations. I use raw data produced by third parties, in particular the Cambridge Centre for Proteomics (mass spectrometry data), and produce processed/annotated/interactive data and a lot of software.

What motivated you to participate in the Pilot?

Improve openness/transparency (and hence reproducibility/rigour) in my research and communication, and participate in improving openness (and hence reproducibility/rigour) more widely.

What kind of outputs are you planning to share? Do you foresee any difficulties in sharing?

My direct outputs are systematically shared openly early on: open source software (before publication), pre-prints, improved data (as data packages). Difficulties, if any, generally stem from collaborators less willing to share early and openly.

A personal take on the project

It is a long project, 2 years, and hence a rather ambitious one, of a unique kind. Hence, we will have to define its overall goals as we go. The continued involvement of the participants over time will play a major role in the project’s success.

What are attainable goals?

It is important to note that there is no funding for the participants. We are driven by a desire to be open, to benefit from being open and from the visibility that we can gain through the project, and by the prospect that the Wellcome Trust will learn from our experience and implement any lessons learnt. We get to interact with each other and with research support librarians, who will help us throughout the duration of the project. We also commit to sharing research outputs beyond traditional publications and to engaging with the Project, by participating in Project meetings and contributing to Project publications.

A lot of our initial discussions centred around rewards for open research or, actually, the lack thereof and the perceived associated risks. Indeed, the traditional academic reward system and the competitiveness in research leave little room for reproducibility and openness. It is, I believe, the hope of all participants that this project will benefit us, in some form or another.

A critical point that is missing is the academic promotion of open research and open researchers, as a way to promote a more rigorous and sound research process and tackle the reproducibility crisis. What should the incentives be? How do we make sure that the next generation of academics genuinely value openness and transparency as a foundation of rigorous research?

Some desired outputs

Ideally, I would like the Wellcome Trust’s famous Research investigator awards to be de facto Open research investigator awards. There’s currently a split (opposition?) between doing research and supporting open science when doing research. In every grant I have written, I had to demonstrate that the team had a track record, or was in a good position to successfully pursue the proposed project. Well, how about demonstrating a track record in being good at opening and sharing science outputs? Every researcher submitting a grant should convincingly demonstrate that they are, have been and/or will be proactive open researchers and will openly disseminate all their outputs. If we lead by example in the frame of this Open Research Project, this is something that the Wellcome Trust could take away from it.

Unfortunately, it is a fact that open science is not on the agenda of many (most?) more senior researchers: they are not in a position to be open, nor is open science a priority for them at all. I find it particularly disheartening that many senior academics (i.e. those that will sit on the panel deciding if I’m worth my next job) consider time invested in open science and its promotion as time taken away from actually doing research. A bit like time for outreach and promotion of science to the wider public is sometimes looked down on, as not being the real stuff.

Another desire is that this project will enable us to influence funders, such as the Wellcome Trust, of course, but also more widely the research councils.

As a concrete example, I would like all grants that are accepted to be openly published beyond the daft layman summary. Published grants after acceptance should include the data management plan, the pathway to impact, possibly more, and these could then be used to assess to what extent the project delivered as promised.

This serves at least two purposes. First, it is a way to promote transparency and accountability towards the funder, scientific community and public. Second, it is a great resource for early career researchers. Unless there is specific support in place, writing a first grant is not an easy job, especially given the multitude of documents to prepare in addition to the scientific case for support. And even for more experienced researchers, it can’t harm to explore different approaches to grant writing.

Another concrete output is the requirement for a dedicated software management plan for each grant that involves any software development. I certainly consider my software to be equivalent to data and document it as such in my DMPs, but there seems to be a need for clarification.

I believe that I do a pretty decent job in conducting open science: pre-prints, open access, release data, … In the frame of this project, I shall do a better job at promoting open science for its own sake.

I also hope that by bringing some of my projects under the umbrella of the Open Research Project, I will benefit from a broader dissemination that will, directly or indirectly, be beneficial for my career (see the importance of benefits and rewards above).

Next steps

It is important to make the most out of this unique opportunity. We need to create momentum, define ambitious goals, and work hard to reach them. But I also think that it is important to get as much input as possible from the community. Nothing beats collective intelligence for such open-ended projects, in particular for open projects.

So please, do not hesitate to comment, discuss on twitter or elsewhere, or email me directly if you have ideas you would like to promote and/or discuss.

Creative Commons License

Piloting more openness at the University of Cambridge

Marta Teperek describes a new pilot project being undertaken at the University of Cambridge with the aim of finding out what can be done to bring about more open research practices. Originally published 10 February 2017

The problem: reproducibility crisis and lack of transparency in research

There have been a lot of discussions recently about the reproducibility crisis and about the growing distrust among the public in the quality of research. As illustrated in our ‘Case for Open Research’ series of blog posts, one of the main reasons for this is that researchers are currently rewarded for the number of papers they publish in high impact factor journals, and not necessarily for the quality of work that they are doing. University of Cambridge researchers have clearly indicated that the lack of incentives is one of the main reasons that discourages them from adopting a more open research practice.

Joining forces with the Wellcome Trust

As a way of addressing this issue, we approached the Open Research team at the Wellcome Trust. The Wellcome Trust are natural allies, as they have always promoted the virtues of greater openness to their researchers. As one of the first funding bodies to introduce policies on open access and on data management and sharing, they have been leaders in this area, so it seemed natural to speak to them about this.

The Wellcome Trust has started to make moves in proactively supporting open research beyond enforcing their open access compliance requirements. To promote immediate and transparent research sharing, they recently launched Wellcome Open Research. This publication platform allows researchers to submit articles about virtually any research output and get published in a matter of days, with author-led open peer review occurring afterwards. The Wellcome Trust is now considering making open research one of their strategic priorities.

From our discussions, we quickly realised that we have a lot of shared interests, and joining forces to tackle the reluctance to adopt open research practices made a lot of sense. The final result was the idea to launch the Open Research Pilot Project at Cambridge.


The Open Research Pilot – understanding the barriers to “openness”

We conceive the project as being a two-year experiment, which would allow us to gain an understanding of what is needed for researchers to share their research results and get credit for all outputs of the research process. These include non-positive results, protocols, source code, presentations and other research outputs beyond the remit of traditional publications.

The project aims to understand the barriers preventing researchers from sharing their research outputs, including resource and time implications, as well as what the incentives are. The Project aims to utilise the Wellcome Open Research, together with other channels, to share these outputs.

An invitation to take part in the pilot was sent to all researchers at Cambridge funded by the Wellcome Trust. Participating researchers had to commit to sharing of research outputs beyond traditional publications and to engage with the project, by participating in project meetings and contributing case studies about their experience of taking part in the pilot.

Recruiting researchers

One of our biggest questions was how willing people would be to participate in the pilot. We have not offered any incentive other than encouraging researchers to contribute to the greater good. Support is offered by the Wellcome Trust and Cambridge Open Research team members for those who have agreed to be part of this experiment, but no financial aid to prospective participants has been offered. We thought that, regardless of the outcome, inviting researchers would be a good exercise to go through: if no one applied, we would have learnt that doing ‘the right thing’ was not a good enough motivator.

To our surprise, we received several fantastic applications from individual researchers and research groups who demonstrated a great interest and willingness to practice open research. We initially planned to work with two research groups, but given the quality of applications received and passion for Open Research expressed by the applicants, we decided to extend the scope of the project to four research groups. We have selected researchers doing different types of research, with the aim of learning about distinct problems in sharing that are experienced in diverse research disciplines:

  • Dr Laurent Gatto carries out research in computational biology, with a special focus on proteomics data. From his participation, we aim to find out how to effectively share the research data and the code needed to reproduce them.
  • Dr David Savage researches the molecular pathogenesis of the consequences of obesity. Through his participation, we aim to identify what the problems are with sharing data coming from human participants.
  • Dr Benjamin Steventon is a developmental biologist generating and analysing large-scale imaging datasets. We intend to find out from his participation if there are image repositories that allow the sharing of large image datasets in a re-usable way.
  • Dr Marta Costa and Dr Greg Jefferis (and others) are researchers leading the work on two collaborative projects: Connectomics and Virtual Fly Brain, which will create interactive tools to interrogate Drosophila neural network connections. By being part of this pilot they will inform us of the issues with sharing complex interactive datasets, and help answer questions around the long-term preservation of complex digital objects.

Identifying researchers

So what motivated these researchers to apply for the project? We asked this question at the application stage and were positively surprised by the altruistic answers that we received. Our researchers were largely driven by a desire to improve the research process. We have seen responses like:

  • “Openness in research, from data and software to publication, is a central pillar of good research.”
  • “I am very concerned (disappointed as a scientist) by the current wave of ‘unreproducible’ and/or ‘irrelevant’ research, and am very passionate about contributing to improving scientific endeavour in this regard.”
  • “I am very enthusiastic about exploiting new ways of sharing my research output beyond the established peer-review journal system.”
  • “I believe that sharing research outputs fully, including data and code are essential to accelerate research, and I have benefitted from it in my own research.”

In summary, researchers expressed a great desire to contribute to a cultural change. They wanted to change the way in which research is disseminated and to increase research transparency and reproducibility.

Let’s get to work

We all met – the researchers, Wellcome Trust and Cambridge open research teams – in January to officially start the two-year project. Each research group was appointed a facilitator – a dedicated member of the Cambridge open research team to support researchers throughout the project. Research groups will meet with their facilitators on a monthly basis in order to discuss shareable research outputs and to decide on best ways to disseminate them. Every six months all project members will meet to discuss barriers to sharing outputs that have been identified through the pilot and to assess the progress of the Project.

One of the main goals of the project is to learn what the barriers and incentives are for open research and to share these findings with others interested in the subject to inform policy development. Therefore, we will be regularly publishing blog posts with case studies describing what we have discovered while working together. There will also be an update from each research group every six months.

We will also be publicly sharing all main outputs of the Project. To date, we have shared the archived call for participants and the presentation from the kick off meeting on 27 January.

We are all extremely excited about beginning this openness and we suggest that anyone interested in the open research practice watches this space – you can keep up to date on all aspects of this project here.

A version of this blog has also been posted on the University of Cambridge Office of Scholarly Communication’s blog “Unlocking Research”.