Category Archives: Love Data Week

Open Research Pilot case studies: promoting greater openness in research

In the second blog in our series marking the end of the Open Research Pilot (a two-year initiative involving University of Cambridge research groups, University Research Support, and Wellcome Trust’s Open Research Team), Dr Laurent Gatto tells us about his group’s involvement with the project. His particular interest during this time has been how to influence the research community in general towards greater openness.

START OF PROJECT

Dr Gatto initially applied to be part of the project so he could learn about other researchers’ views on open research, and to contribute to and promote open research. In particular, he thought that participating in a project initiated by the Office of Scholarly Communication (OSC) at the University of Cambridge and the Wellcome Trust seemed a good opportunity to influence the UK research environment towards greater openness. His greatest hope was to promote – directly or indirectly – greater openness in research as widely as possible, thanks to the reputation of the project organisers.

PROJECT IN PROGRESS

While some participating groups may have progressed towards greater openness individually, it was Dr Gatto’s hope to achieve a wider impact, beyond those around the table. However, for him, the project has been arguably too contained for that and so he is unclear about what has been achieved overall.

All participants had interesting inputs and some were already well versed in open research. He thought that the project could therefore have been much more ambitious, making use of this collective wisdom and experience by being open and collaborative: for example, by asking for input from the community at large, opening up the discussion channels and, when specific questions arose, asking experienced members of the open community for advice.

LOOKING AHEAD

Dr Gatto suggests that there are two types of support that researchers need:

Firstly, technical support, helping researchers to discover and use open research platforms. In a minority of cases, new platforms might need to be developed (for really massive data for example, or for distributed computing requirements), but for the vast majority of researchers, reasonable technical solutions and support are readily available on-line. Local, in-person support is helpful for providing a point of contact for face-to-face training, and for redirecting researchers to the right resources.

The second type of support needed should come from the institutions – senior academics, funders, etc. – to support researchers in being open and to help them succeed by being open. For example, funders are in a position to redefine priorities in research by promoting and funding researchers who demonstrate open and reproducible research. This type of support is something that has been generally missing in Dr Gatto’s experience; he believes the current priorities of senior management do not support the provision of adequate rewards for open researchers. This is the kind of support he would have needed as a researcher in Cambridge.

Dr Gatto welcomes the publication of peer review reports (signed or anonymous) and the promotion of pre-prints (including open, public review and discussion of pre-prints) as important current advances. He likes the Wellcome Trust’s recent call for Open Research Projects and the publication of all proposals. He thinks that such efforts promote open research throughout the community, across senior and early career researchers and students, demonstrating that openness is no longer just an afterthought, but is becoming the default practice. He believes these measures will drive researchers to explore how to implement their research openly and explore technical solutions. Finally, he believes that educating under- and post-graduates about open research, either explicitly, or as part of other courses, should also lead to greater openness.

As told to the Open Research Pilot Research Support Team

Me, Myself and Data – David Marshall

For Love Data Week (12th-16th February 2018) we are featuring data-related people. Today we talk to David Marshall, Futurelib Project Coordinator, University Library.

Telling Stories with Data

Let’s start with an easy one. What kind of data do you work with and what do you do with it?

I deal primarily with qualitative data collected through working with people, gathered in a number of different ways. Research methods tend to include things such as in-depth interviews, observation and shadowing, diary studies, as well as various remote data capture mechanisms. I am usually looking for a mix of attitudinal and behavioural data; comparing what people say with what they actually do. I use the insights and findings arrived at through the analysis of this data to make recommendations for service design and delivery on behalf of Cambridge University libraries.

Tell us how you think you can use data to make a difference in your field.

I am an advocate for the importance of using research and research data to understand the wider lives of people who use a product or service. This has long been an established principle in service design and delivery in the commercial sector, and libraries in UK Higher Education are learning to adopt this in order to tailor their services to the approaches, goals, needs and behaviours of their users. The data I work with often highlights aspects of the study and research lives of Cambridge students and academic staff which it would be difficult to fully uncover and explore through more ‘traditional’, quantitative methods, such as usage statistics and surveys. This in-depth, qualitative study of people provides valuable insights which can be used to inform the development of services and working practices that affect those people.

My ‘field’ is working within and for University of Cambridge library services; slightly oddly I am often conducting research, with researchers as the subjects of that research, with the aim of developing services that support research!

How do you talk about your data to someone outside of academia?

I’m going to turn this one on its head as, although I work within academia, I’m not involved in what would typically be described as academic research. I tend to refer to what I do as design research, i.e. with the end goal of using the data gathered and insights arrived at to inform service design and delivery. I often talk of ‘stealing’ methods from academic disciplines and areas such as anthropology and ethnography, and from the commercial design world. This can involve immersive research techniques such as ethnographic observation, or quick, easily-deployed techniques such as card sorting exercises and ad-hoc interviews. In terms of the data itself, I often talk of patterns emerging and insights developing. Immersing myself in the data over the course of its collection, through activities and tasks such as transcription, and again through the analysis process helps things to ‘take root’ and for these patterns and insights to become more clear.

Connected Conversations

What data-related challenges do you have to deal with in your research environment?

Collecting the data in the first place is one of the biggest challenges for me. To do my work I need data from real people, leading real, busy lives. Finding and connecting with the people I need to work with is a constant challenge. Happily, the need to go to people where they are and work around their schedules in fact leads to better data; I would much rather talk to people about their studies, research, or other aspects of their lives when they are in the middle of doing those things! Another challenge is finding the right tool for the job; over the years I have been lucky enough to work with and learn from people who have extensive experience in a wide variety of research techniques, but it can still be tricky to match the appropriate method/s to a specific question or area of study.

To add a more data-related challenge to the list of data-related challenges…: I deal with a lot of personal data, not just names and demographic information but a large amount of qualitative data gathered from individuals about their lives; their goals, motivations, points of frustration, and so on. This leads to challenges in terms of how data is collected, stored and used, even how it is considered during the analysis process.

How do you think these challenges might be overcome?

For my first points: the old Carnegie Hall adage… practice, practice, practice! Relationship building and communication are a huge part of what I do day-to-day; each time I need to find research participants it becomes a little easier due to the continuous work done in this regard over the years I have been working in my role.

For the latter: I think appropriate awareness is part of the battle. Working with research data, particularly that gained from working with people, demands high levels of awareness and an emphasis on reflection, and so it should! It is important to see qualitative data in context, for many reasons, and to be constantly aware of the ethical implications of its analysis and use.

If you were in charge what data-related rule would you introduce?

That every person I’m interested in finding out more about needs to supply me with it tout de suite, please and thank you. No, that might be going too far…!

Without being specific, anything which increases the transparency of what will happen to data after it has been gathered is a good thing. I rarely struggle to get people to consent to participating in research once I have found and approached them, and am as transparent as I can be about why I need their data, what I’m planning to do with it, and where it will end up. Maybe I’m blessed by the context within which I work, and might be slightly naïve, but I can’t help but think that on any scale and in any circumstance this emphasis on transparency might be quite a useful thing.

We are Data

Tell us about your happiest data moment.

Around two years ago we (Futurelib) finished the data gathering phase of a project, Protolib, looking at the design of physical study spaces. We had prototyped different study spaces based on the findings of a collaborative design process conducted with Cambridge students and researchers. We conducted hours (as in 300+ hours…) of observation in these prototype spaces, and gathered data in various other ways, such as interviews with people leaving the spaces, feedback walls, comment cards and questionnaires. The first thing we did as researchers after this was to brainstorm the insights we had arrived at from this work. To see themes and ideas emerging so quickly, and to see them backed up and added to by the research data, was amazingly fulfilling. This is what ‘sold’ me on the value of ethnographic techniques; we had immersed ourselves so fully in the environments under study that we understood them to an extent which I would not have previously thought possible.

What advice do you have for someone who is just embarking on a career in your field?

Want to learn. Get interested in people; who they are, how they think and what they do. I don’t much like the idea of the cold, disinterested researcher. Whilst being aware of your own potential biases, and biases based on what you learn and uncover, care about the people you are working with and try to empathise as far as is possible with what is important to them. If you don’t like talking to people and finding out about the way they work, this is possibly not quite the right job for you. Of course, there are areas of research in which disinterestedness is probably a very valuable characteristic; I just don’t think this applies to what I do.

What do you think the future of research data looks like?

Speaking about the context within which I work day-to-day, I think the future looks bright! Libraries and HE institutions are becoming increasingly interested in finding out more about the people their services support. In my area of work, usage statistics, quantitative survey mechanisms and other similar methods will always provide the broad strokes, and this is great. It is, however, absolutely not where gathering data should stop. I cannot over-emphasise the value of qualitative approaches and qualitative data in providing actionable evidence for service design.

In terms of the future of research data more generally, I don’t feel too qualified to comment… I would tentatively assume and hope that data will become more accessible, less owned, more malleable, and through this invite more discussion, criticism and conversation.

There is A LOT of data out there about all sorts of things and it is being collected all the time. Does anything frighten you about data?

From a personal standpoint, potentially losing focus. It is almost the modus operandi of what I do to collect as much data as possible about the lives of people studying and working at the University of Cambridge, so I do feel that I collect ‘A LOT’. I sometimes wonder about the nature of the data I gather, as I’m keen to emphasise with participants that I’m interested in all aspects of the ways in which they work, and more widely, the ways in which they live. This does, on occasion, lead to people sharing quite personal aspects of their lives. There are obvious concerns around how this data is handled and used, but, as mentioned previously, I feel that an appropriate level of awareness and diligence in this regard is a good starting point for working with this kind of data in a sensible, conscientious way.

Published 16 February 2018
Written by David Marshall
@futurelib
Creative Commons License

Me, Myself and Data – Keren Limor-Waisberg

For Love Data Week (12th-16th February 2018) we are featuring data-related people. Today we talk to Keren Limor-Waisberg, Founder and CEO of the Scientific Literacy Tool, who advocates for open access, citizen science and scientific literacy.

Telling Stories with Data

Let’s start with an easy one. What kind of data do you work with and what do you do with it?

I help people from all walks of life access, understand, and/or use scientific data and literature concerning their scientific topics of interest.

Tell us how you think you can use data to make a difference in your field.

A scientifically literate society is a society in which people are empowered with knowledge they can use to achieve their different goals. As we look at data and understand it, we acquire skills that are essential for both our personal and our societal development.

How do you talk about your data to someone outside of academia?

When I talk about data with someone outside academia, I will take the time to define any new terminology and make sure we understand each other.

Connected Conversations

What data-related challenges do you have to deal with in your research environment?

The main data-related challenge non-academics have is access to data. Many datasets are simply not accessible to the public. Sadly, some datasets are not even available to other academics.

Once access is achieved, people will struggle with different formats, lack of metadata, different units and/or the lack of tools to process and analyse the data.

How do you think these challenges might be overcome?

The challenge of open data, making data accessible, is currently being addressed by many countries. The European Commission’s Directorate-General for Communications Networks, Content and Technology (DG CNECT), for example, is formulating directives that aim to open up and help reuse publicly funded research data in Europe.

Different organisations are now developing tools and packages that will help people work with datasets.

If you were in charge what data-related rule would you introduce?

As a citizen of the World, I advocate for open access, citizen science and scientific literacy so as to promote the understanding that knowledge empowers both individuals and societies to develop and prosper. To make this progress, I think we need to agree on common ethical guidelines – from the right of access to the right of use of publicly funded data.

We are Data

Tell us about your happiest data moment.

My happiest data moment was during my PhD. I calculated the performance of some viral elements using different tests. I had a lot of data and it took a while for the scripts to run. It was nerve-racking. I can still remember sitting there listening to the screeching sounds of the computer. And then one by one I got the results, and they all confirmed my hypothesis. It was great. It was a small piece of scientific knowledge, but I was the first person in the world to know about it.

What advice do you have for someone who is just embarking on a career in your field?

For someone embarking on a career in the field of promoting scientific literacy, I would recommend being very patient. It is a slow process and there are many obstacles, but in the end it is a very rewarding profession.

What do you think the future of research data looks like?

I think that in the near future we will have much more publicly funded research data accessible. We will see more and more tools emerging to handle this data. More and more people will use this data and these tools to make their statements, to dispute ideas, to create products and services, to entertain, or perhaps just to enjoy finding something new.

There is A LOT of data out there about all sorts of things and it is being collected all the time. Does anything frighten you about data?

I think it is important to make sure privacy and identities are protected when data is collected and shared.

Published 15 February 2018
Written by Keren Limor-Waisberg
@TheLiteracyTool, @OpenResCam
Creative Commons License

Me, Myself and Data – Melissa Scarpate

For Love Data Week (12th-16th February 2018) we are featuring data-related people. Today we talk to Melissa Scarpate, Research Associate in the Faculty of Education, PEDAL Centre.

Telling Stories with Data

Let’s start with an easy one. What kind of data do you work with and what do you do with it?

I work with large longitudinal data sets and large cross-cultural data. I really enjoy running latent growth models with the longitudinal data to assess changes over time in my variables of interest (primarily child/adolescent self-regulation and parenting). I use the cross-cultural data to test for differences or similarities in parenting and adolescent developmental outcomes.

Tell us how you think you can use data to make a difference in your field.

By using large data sets that either have many time points or have many different countries and cultures represented, I am able to assess relationships between study variables in an impactful way. For instance, if I find that parental monitoring predicts lower levels of adolescent anxiety in 13,000 adolescents across 10 countries then I feel this information has a larger impact on families in a more global way than using a local data set with a small sample size.

How do you talk about your data to someone outside of academia?

Very carefully! I eliminate jargon and speak about the data in broad, general terms rather than getting into details. I would rather the person come away from our conversation with an understanding of what I do, what I have found in my research, and how this impacts society than to impress them with fancy words and statistics.

Connected Conversations

What data-related challenges do you have to deal with in your research environment?

I work in an office with others and it is important for each of us to keep the data confidential and out of sight of one another.

How do you think these challenges might be overcome?

Screen protectors, earplugs in the case of video coding, not printing any data, etc.

If you were in charge what data-related rule would you introduce?

If I were in charge of all data then I would create a rule that all data could be shared in an easy and collaborative way whilst maintaining study participants’ anonymity.

We are Data

Your happiest data moment?

When I finally got my latent growth model to run!

What advice do you have for someone who is just embarking on a career in your field?

Take as many classes in data management, methods, and statistics as you can and, while in graduate school, get experience in these concepts with researchers who have excellent skills and training in these areas.

What do you think the future of research data looks like?

Open, transparent, simplified with data visualisation techniques, and impactful.

There is A LOT of data out there about all sorts of things and it is being collected all the time. Does anything frighten you about data?

Technological advances such as my phone being able to predict where I am driving before I leave and my Echo Dot/Alexa picking up all of my conversations make me nervous. The benefits, so far, outweigh the potential negatives (at least in my experience).

Published 14 February 2018
Written by Melissa Scarpate
Creative Commons License

Me, Myself and Data – Kirsten Lamb

For Love Data Week (12th-16th February 2018) we are featuring data-related people. Today we talk to Kirsten Lamb, Deputy Librarian, Department of Engineering.

Telling Stories with Data

Let’s start with an easy one. What kind of data do you work with and what do you do with it?

I work with bibliometric data for the most part. I’m not very systematic and tend to be rather intuitive about how I gather and work with it in order to tease out insights. Because I don’t usually have a specific question to get out of the data I tend to just explore to find out what is interesting about a body of literature.

Tell us how you think you can use data to make a difference in your field.

The idea that librarians can help define a research landscape and identify gaps is relatively new. I like to think that by learning to work with bibliometric data I can help researchers better engage with information professionals and give librarians the confidence to use their skills in the research context.

How do you talk about your data to someone outside of academia?

I tell them that by looking at patterns in publishing researchers can see where trends and gaps are, as well as exploiting those patterns to have a larger impact. But I also make sure to point out the fact that basing insights off of metrics is flawed. You have to understand what each metric is and isn’t measuring. None of the metrics are an indication of the quality of an individual piece of research and there’s no replacement for critical analysis to determine that.

Connected Conversations

What data-related challenges do you have to deal with in your research environment?

First, I don’t have a background in statistics or programming, so there’s a limit to how complex an analysis I can do. Second, the metrics themselves are limited, so communicating the value of the information embedded in the data is a challenge. Third, a lot of bibliometric software is based on use of a particular database’s API so it’s difficult to combine results from different databases to give a broader picture.

How do you think these challenges might be overcome?

All of them would be helped if I could learn to programme! Collaborating with someone who knew how to do the actual analysis bit would be great because that way I could provide the insights and figure out exactly what I wanted to measure and they could make it happen.

If you were in charge what data-related rule would you introduce?

People who write software that does data analysis would make it more user-friendly for people who don’t know how to code. Basically there’d be a WYSIWYG/Microsoft Excel-style programme for doing bibliometric analyses and generating beautiful graphics based on it that didn’t require any coding.

We are Data

Tell us about your happiest data moment.

I was pleased when I discovered that Web of Science does a lot of the analysis I wanted to do but thought I could only do if I had InCites or similar. As much as I like knowing what’s being measured and having an intimate knowledge of the data, sometimes it’s nice to just be able to click a few buttons and get a nice graph!

What advice do you have for someone who is just embarking on a career in your field?

I’d want to tell them that they don’t need to be a maths or programming whiz to do it, but I’m not yet convinced of that myself! I think the main thing is not to think of some metrics as good and some as bad. They’re all just tools and you need to know what they do in order to pick which ones you want to use. Always look under the bonnet!

What do you think the future of research data looks like?

While I’d love for it to be open, interoperable, integrated and well-indexed, I’m not sure that’s going to happen any time soon. Each time someone develops a new standard to rule them all, it just gets added to the growing list of standards.

There is A LOT of data out there about all sorts of things and it is being collected all the time. Does anything frighten you about data?

Yes. I don’t think that as a species we’ve really figured out what it means to live in a data-rich ecosystem, and I mean that both metaphorically and literally. The rate at which data is growing is currently unsustainable from the perspectives of preservation, legislation, interpretation and energy use. While I’m definitely uncomfortable with how much certain companies know about me, I’m more concerned with the fact that collecting and managing that amount of data about everyone and everything is bad for the planet and we haven’t figured out how to make sure it’s safe. We need legislation and curation to catch up with technology instead of lagging about a decade behind.

Published 13 February 2018
Written by Kirsten Lamb
@library_sphinx
Creative Commons License

Me, Myself and Data – Dr Sudhakaran Prabakaran

For Love Data Week (12th-16th February 2018) we are featuring data-related people. Today we talk to Dr Sudhakaran Prabakaran, Lecturer/Group Leader, Department of Genetics.

Telling Stories with Data

Let’s start with an easy one. What kind of data do you work with and what do you do with it?

We use population-level sequencing datasets from TCGA, mutation datasets from COSMIC, ClinVar and HGMD, and curated databases from other labs. We use discarded datasets, negative datasets, already published datasets, anything and everything. We develop and use structural genomics, mathematical modelling and machine learning tools to analyse mutations that map to noncoding regions of the human genome.

Tell us how you think you can use data to make a difference in your field.

We live on these datasets. Biological data is going to exceed 2.5 exabytes in the next two years, and the bottleneck is the analysis of these datasets. Our job is to find patterns in these datasets. Rare variants and driver mutations become significant and identifiable only when we look for them in a population context.

How do you talk about your data to someone outside of academia?

For us it is not difficult. The datasets we are using are generated and curated by governmental and international consortiums. They have done the bulk of publicity. For example, the TCGA dataset has all kinds of data from thousands of cancer patients and is curated by the NIH. The power of this data is for all to see. I just say we try to aid in cancer diagnosis by crawling through these datasets to find patterns.

Connected Conversations

What data-related challenges do you have to deal with in your research environment?

We are happy with the publicly available datasets. Our problem starts with the datasets we collect. How to store, analyse, and make it available for everyone to use are the questions we are trying to answer all the time.

How do you think these challenges might be overcome?

I am an ardent proponent of cloud-storage and computation. I believe that is the future. I am also aware that some countries are concerned with data migration outside their geographical boundaries.

If you were in charge what data-related rule would you introduce?

I am not going to make up anything new. Past US Presidents have made laws requiring that any data generated with public funds be made available.

Governmental organisations should demystify cloud-based storage and computation processes. People are unduly worried. People are giving away more personal data willingly on Facebook, Twitter and Instagram than through genome sequences collected by public consortiums.

Tell us about your happiest data moment.

It is not one moment, it is a series of moments up until now. I can run a viable research program with no startup money or funds just by scavenging through publicly available datasets.

What advice do you have for someone who is just embarking on a career in your field?

Learn machine learning and cloud computing.

What do you think the future of research data looks like?

A lot more data analysis than data generation.

There is A LOT of data out there about all sorts of things and it is being collected all the time. Does anything frighten you about data?

I am in fact excited. I believe we need to train more data scientists. We are in good times. Data is becoming truly democratic!

Published 12 February 2018
Written by Dr Sudhakaran Prabakaran
@wk181
Creative Commons License