Open Research Pilot case studies: sharing all research outputs and future sustainability of data repositories

In this final blog from the researchers involved in the Open Research Pilot, the Jefferis group discuss their participation. During this time, their open research interests have focused particularly on how to share all outputs from the research process, and on the sustainability of data repositories into the future.


Dr Greg Jefferis believes that sharing research outputs fully – including data and code – is essential to accelerate research, and he has himself benefited from others sharing unpublished research outputs during his career. He hopes to create a standard in the field of neural circuits / connectomics that echoes the very high standards of sharing amongst Drosophila researchers in the past. He, together with Matthias Landgraf (Department of Zoology), recently started a new group funded by the Wellcome Trust, focusing on Drosophila connectomics. This project is an international collaboration, and will generate outputs besides publications, e.g. neuronal skeletons and analysis code, that will also have great significance for over 50 labs working in this area. With this in mind, they are collaborating with Virtual Fly Brain (VFB), a Wellcome Trust-funded web resource that curates and disseminates Drosophila neuroscience data, to make their research freely accessible. They already plan to release and share key data via VFB on publication, and already use practices that fit with the type of data they use and produce.

They joined the Open Research Pilot to look for ways to share additional data and interim results with collaborators, and to start a discussion on what resources are needed to maintain data repositories that are functional long-term and accessible to all.


As part of their effort to identify ways to easily share results with collaborators, and ultimately with their wider community, the Jefferis group also took part in the Electronic Lab Notebook (ELN) trial run by the University. Unfortunately, they did not find an ELN that worked better than the tools they already use. The trial process was, however, helpful in raising awareness in the group of the available ELNs, and in identifying the features and functionality that are the most important for the group members. Greg and his group will continue to follow the development of ELNs, as they might become relevant to them in the future.

The group became more aware of the support that the Office of Scholarly Communication (OSC) provides or can facilitate for researchers. Dr Marta Costa (the Project Lead for the group) and Greg felt that their group would benefit from some dedicated training on using GitHub, as this is an important tool in the group’s research practices. The OSC doesn’t run this training but was able to ask for help from the Data Champion community that they facilitate. Data Champions are volunteers from around the University who want to help foster good data management practices. Three members were happy to collaborate to provide the Jefferis group with the training and support they needed. This benefited the group members but also proved interesting for the OSC, who were able to understand more about how or where researchers were looking for support, as Marta and Greg were initially unaware of the Data Champions and the support they could offer.

Both Greg and Marta found it fruitful to engage with the Wellcome Open Research team. Specifically, they found it useful to discuss how open research (including preparing, sharing and managing it) should be funded; this issue is very relevant to the subject-specific data repositories they use, such as VFB. As part of this discussion, Greg and Marta, along with Dr Lauren Cadwallader and Dr Dave Gerrard from Cambridge University Libraries and David Carr from the Wellcome Open Research Team, authored a series of three blog posts on this issue, each from a different point of view: resource, institution and funder.

Greg and Marta also found the pilot project thought-provoking because they were exposed to other research groups and their needs around open research. They found it interesting to see how the demands and practices for open research need to serve various types of data and outputs.


Even prior to the pilot, Greg and Marta believed that open research should be the norm. Being involved in the project has, however, reinforced their view that researchers need more support to engage with and implement open research as standard practice. This support should take the form of funding, training and/or infrastructure, and should focus not only on the end point of a research project, but also on developing awareness and implementing an open research culture for new and existing students and staff.

From a research group point of view, Greg and Marta think it would be helpful to have training (online and/or in person) available for new staff and students. This training would need to be specific to the discipline or to the group. The Jefferis group, for example, needs training on writing code that can be shared and reused (similar to the training session organised during the pilot), and on data management.

From the point of view of resources such as VFB, the issue that still needs addressing is funding. There are currently no mechanisms in the UK or worldwide for the long-term support of resources that are used by an international community. In fact, some of the genomic resources funded in the US by NHGRI have recently seen their funding significantly reduced. Although the integration and curation of data that is integral to the work of groups such as this one increases data reuse and accelerates research, there is no current funding mechanism that recognises the added value of these resources. To add to the complexity of this issue, the users of these resources are international – not bound by country borders, but by research subject.


Published 28 February 2019

As told by the Jefferis group to the Open Research Pilot Research Support Team.

Open Research Pilot case studies: sharing image data

January 2017 saw the launch of the Open Research Pilot Project. This two-year initiative comprised four volunteer University of Cambridge research groups, University Research Support, and Wellcome Trust’s Open Research Team. The aim of the project was to understand what is needed for researchers to make openly available, and be rewarded for, all outputs of the research process (e.g. negative results, protocols and source code, alongside traditional publications).

In the first of a series of blogs, Dr Ben Steventon talks to the research support team for the pilot about his group’s involvement with the project. His particular open research interests in this project have been the how, where and what of sharing very large image data files.


Dr Ben Steventon applied to join the project because, while considerable advances have been made in the open sharing of sequencing data over the last ten years, the same cannot be said for the sharing of image data. He thinks that this is due to a few practical reasons that make it very time-consuming for researchers to get their imaging data ready for uploading to repositories.

Firstly, there is an increasing number of different imaging modalities available to researchers. While this is undoubtedly a very good thing for research, it does mean that it is very difficult to think of a standardized way to annotate and describe each image dataset. In itself, the fact that different labs will have different microscopy images of the same samples, or different samples imaged with the same microscope, is a good motivation for sharing image data in the first place. So much time is spent trying out different imaging set-ups at the start of a particular experiment, and being able to access the trial-and-error periods of other research groups would be a major advantage to research. The process of going back over all the imaging data relating to a particular project or publication and annotating it in a way that is ready to share is a very time-intensive task. Perhaps the way forward would be to integrate appropriate annotation with data collection, so that the data are annotated as they are generated and ready to deposit at the click of a button. But how does one predict the specific repository at the start of a project that may last several years? In many cases there are no guarantees that data format and annotation templates will remain standard throughout this time.

A second barrier to data sharing is the overall size of imaging datasets, particularly now that image acquisition speed is increasing dramatically with new technologies such as light-sheet imaging. A single raw image data-set could exceed several terabytes. While some repositories are willing to take data of such size, it is not at all clear that researchers would be interested in starting with this level of the data. Many would be happy to start with already-processed imaging datasets, or even directly with the feature extracted and analyzed data that come from them. How do we go about sharing data at multiple levels of analysis, when different repositories would be interested in only one or two of them? There has to be some centralized place for interested researchers to go to, and from there follow links to the various other places that host the data. Should such a website be hosted at the level of an individual lab, imaging facility, or should it be community based?

Ben saw the project as a chance to learn more from people within Cambridge University Libraries about the local resources available, and how they might be able to support the specific challenges outlined above. He was also interested to hear more from Wellcome Trust about new ways to share research data in a more general sense, in the hope that this would sharpen up his understanding of the challenges facing the open sharing of imaging data and how these might be overcome.


Through the interactions with the library, Ben was introduced to a number of different repositories in the local area that might be able to host the data being generated in the lab. These interactions have focused his attention on thinking about a) how to build an integrated website that will allow researchers to link out to these various repositories and b) how to use tools such as electronic lab books to keep track of experimental and imaging information so that the required data annotation can be streamlined at the point of data deposition. Through interactions with Wellcome Trust, he has learnt of their researcher enhancement scheme for open research funding, and crucially about the specific things that they are looking for in such proposals. He feels in a much better position to take on the challenges outlined above and thinks that his team can make progress from this point on.

Recently, Ben has published a paper in Development that in part rests on results gained from a modified light-sheet microscope in the Cambridge Advanced Imaging Centre. Keen to share this data with a broader audience, he made use of both the Image Data Repository (IDR) to share the raw data and a Wellcome Open Research Data Note to describe the data itself directly, taking it out of the context of the research article in Development. Making use of the Wellcome Open Research platform to publish data notes alongside resources such as IDR promises to be an effective way of gaining increased feedback and audience for the data, as a series of additional reviewers are invited to peer-review the manuscript openly online. One difficulty of this approach was in coordinating the acceptance and publication of the data note and data with the research article. However, Development was very helpful in this respect as it was willing to add in the data sharing descriptions and citations as a correction post-publication.

The biggest surprise to Ben has been the degree of interest from Cambridge University Libraries and funders such as Wellcome Trust in trying to support Open Research in a broad sense. He observes that it seems to be a priority area for both groups, with researchers actually lagging behind a little in thinking about the problems associated with making this happen for the specific aspects relating to their research: clearly a lot more needs to be done to get these groups of people interacting more.

Over the course of the project, Ben has become much more aware of the specific problems involved in the open sharing of his research. He realises that the issues are going to be very field-specific, but at the same time there will always be shared aspects between different fields. One positive aspect of the pilot was having researchers from these different fields together in the same room.


The principal thing Ben needs to get going is a person in the lab with the appropriate skills and the time to focus on developing the framework required for a sustainable system to share the team’s research outputs. Fortunately, this has now been provided by enhancement funding from Wellcome Trust. The specific issues of locating repositories and obtaining the appropriate support have been addressed through his interactions with Cambridge University Libraries staff. He very much hopes that this relationship can continue.

Furthermore, he is very interested in working with the Cambridge Advanced Imaging Centre in the hope that the framework that is developed in his lab can be expanded to other users of the facility. Having Open Research practices at the level of initial data acquisition and processing could be a very interesting way to move forward.

As told to the Open Research Pilot Research Support Team