Opening Your Research Data – Tips for Researchers
Authors:
Kristiina Väänänen, Project Manager and Senior Researcher, Karelia University of Applied Sciences
Tiina Muhonen, Project Specialist and Junior Researcher, Karelia University of Applied Sciences
Kaisa Varis, Information Specialist, Karelia University of Applied Sciences, Finland
Photo by Anna Nekrashevich on Pexels
Open Research, Development, and Innovation (RDI) means applying open operating models to the RDI activities of research organisations. Forms of openness include, for example, an open operating culture, open publication of research results, opening research data, and using open research methods.
The openness of research data — such as surveys, interview data, or different types of observational data — enhances the research’s quality, transparency, and reliability. Making data accessible allows for its reuse in future studies. However, opening data requires that different aspects of data management procedures are planned carefully before data collection. Key considerations include informed research participant consent, data handling procedures, adhering to legal requirements (such as GDPR within the EU), and complying with the conditions imposed by research funders.
In the INVEST4EXCELLENCE project, one of the goals was to pilot practices that increase openness in research. Alongside open publishing, we explored the practical aspects of opening the data, the key factors that must be considered during research planning and how to share the data effectively. This text provides tips to researchers on how and what kinds of data can be opened and shared.
FAIR Principles for Openness
All research must ensure that collected data is accessible. This means that both people and systems must be able to find the data, understand the information it contains, and reuse, analyse and combine it. This goal is often described by the FAIR principles (Findable, Accessible, Interoperable, Re-usable). When these principles are followed, data is managed securely and assigned permanent identifiers, which allows it to be referenced in other studies. It is essential to incorporate the FAIR principles into your research planning, because doing so simplifies data handling later in the process and improves the data’s accessibility and usability.
Image 1. The FAIR Principles (Findable, Accessible, Interoperable, and Re-usable)
If you are planning to open your research data for the first time, it is important to note that research institutions provide expert consultation and support services to help with different stages of the research process.
Data Management Planning and Implementation
Karelia University of Applied Sciences requires all research and development projects to create a data management plan at the beginning of the project. The data management plan outlines, for example, the types of data that will be handled in the project, the methods for storing the data, and post-project data management procedures. The plan covers the entire life cycle of the data, from collection and processing to publication or archiving for future use. Many Finnish universities recommend using the DMPTuuli tool to create data management plans. It should be noted that some funders require a data management plan to be submitted during the project application stage. When planning research, it is also important to consider whether the data contains sensitive information. If there are ethical, legal, or contractual constraints on opening the data, it should not be made publicly available. In such cases, only descriptive data (metadata) can be published.
During the research planning and implementation phases, it is important to note that only anonymous data can be opened, and that participants must be informed about the storage and potential opening of the data. To obtain research subjects’ consent for reuse, they must be made aware that the data collected in the study may be opened in anonymised form. Participants can be informed, for example, through a research permission letter that is provided to them for approval during the recruitment phase. Increasingly, funders may also require the data to be opened, at least partially, where possible. It is therefore important to consider early in the research planning phase how the data will be stored and handled securely.
Data Handling in Research
The core principle is that the data to be opened should be the original data, which other researchers can reuse. If the data has been processed, such as translated into another language, this must be clearly stated in the metadata. During this phase, the data must be stored in a format that is easily accessible to others. In our INVEST4EXCELLENCE project, we conducted qualitative focus group interviews, transcribed the data, translated it into English, anonymised it, and organised it thematically using spreadsheet software.
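As an illustration of storing thematically organised data in a format that others can open without specific software, the following minimal Python sketch writes a small table to CSV. The column names and rows are hypothetical placeholders, not content from the INVEST4EXCELLENCE dataset, and CSV is only one of several suitable open formats.

```python
import csv
import io

# Hypothetical thematic table; column names and values are illustrative only.
rows = [
    {"theme": "Collaboration", "participant": "P1", "excerpt": "[anonymised excerpt]"},
    {"theme": "Funding", "participant": "P2", "excerpt": "[anonymised excerpt]"},
]


def to_csv(records, fieldnames):
    """Serialise a list of dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows, ["theme", "participant", "excerpt"])
print(csv_text)
```

The resulting text can be saved with a .csv extension and deposited alongside a short description of the columns.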
Ensuring anonymity is the most challenging part of making research data open. The design of the anonymisation process must always be based on the characteristics of the dataset. When planning our anonymisation process, we followed the general guidelines on anonymisation outlined by the Finnish Social Science Data Archive’s Data Management Handbook. A key consideration in this process is determining whether an individual is identifiable in the data directly (e.g., via personal data) or indirectly by combining information from different sources.
In our INVEST4EXCELLENCE study, anonymising direct personal identifiers, such as names, workplaces, positions, or job titles, was relatively straightforward. However, in a small sample focused on a specific professional group within a particular region, individuals could be identified through other means. For example, a participant might mention that they a) work in a sector that is small regionally, b) operate in a specific area, and c) collaborate closely with a particular company. In such cases, the individual might be identifiable to those familiar with the region and industry. Therefore, the anonymisation process must consider how various details affect the research subject’s identifiability.
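The risk of indirect identification can be explored with a simple count of how often each combination of background details occurs. The sketch below is a minimal illustration, assuming the indirect identifiers have been coded into a small table. The field names, values, and the threshold `k` are hypothetical, and this is not the procedure used in the INVEST4EXCELLENCE study. It only flags candidates for closer manual review.

```python
from collections import Counter

# Hypothetical coded background details; not data from the study.
participants = [
    {"id": "P1", "sector": "forestry", "area": "north", "partner": "Company A"},
    {"id": "P2", "sector": "forestry", "area": "north", "partner": "Company A"},
    {"id": "P3", "sector": "tourism", "area": "north", "partner": "Company B"},
    {"id": "P4", "sector": "forestry", "area": "south", "partner": "Company A"},
]

# Details that may identify a person when combined (an assumption for this example).
INDIRECT_IDENTIFIERS = ("sector", "area", "partner")


def flag_rare_combinations(records, keys, k=2):
    """Return the ids of records whose combination of values occurs fewer than k times."""

    def combination(record):
        return tuple(record[key] for key in keys)

    counts = Counter(combination(record) for record in records)
    return [record["id"] for record in records if counts[combination(record)] < k]


risky = flag_rare_combinations(participants, INDIRECT_IDENTIFIERS)
print(risky)
```

Records that are flagged could then be generalised (for example, by replacing a specific area with a broader region) or have the relevant details removed before the data is opened.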
In our INVEST4EXCELLENCE study, the anonymisation process had two stages. First, one researcher designed and implemented the anonymisation. Then, another researcher, familiar with the field studied but not involved in data collection, reviewed the dataset to check for any remaining identifiable information.
In addition to general anonymisation guidelines, international open data repositories may have specific anonymisation guidelines, which should be considered during the planning process.
Choosing a Data Repository and Licensing the Data
There are many national and international repositories available for sharing research data. When choosing a repository, it’s important to consider the following:
- The repository is well-known and trusted by researchers in your field
- The repository assigns a permanent identifier (e.g. DOI, URN, or Signum) to the dataset
- The repository has a reliability certification
- The repository allows you to select license terms that define how the data can be reused
In the INVEST4EXCELLENCE project, we chose to open our research data using the Zenodo repository, which is an international archive for the outputs of EU-funded projects. Other repositories can be found, for example, in the registry of research data repositories, re3data. Karelia University of Applied Sciences also has an organisational database for long-term storage of research data. This database is suitable for datasets that cannot be published openly but can be accessed with special permission. However, it is generally recommended to publish metadata about the dataset even if the data itself cannot be opened.
At this stage, you also define how your shared data can be processed further and under what conditions it can be reused. These usage rights are typically defined using Creative Commons 4.0 licenses (CC licenses). In open science guidelines, the recommended license type is CC-BY-4.0. This license allows the data to be freely modified and shared, as long as the source is credited and any modifications are indicated. This license also permits commercial use. For university researchers, it is worth noting that continuing education services are often considered commercial use, so allowing commercial use through this license type may be appropriate.
Publishing the Data
Once the data has been licensed, it is ready to be published in a repository. In the INVEST4EXCELLENCE study, we used Zenodo, which provides a permanent identifier (DOI) for the data, making it easier for other researchers to find and cite. This permanent identifier allowed us to link the data as supplementary material to our research publication. Additionally, it enabled us to directly cite the parts of the data that we wanted to highlight in the study.
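Before depositing, it can help to check that the descriptive metadata is complete. The following minimal sketch assumes a simple, hypothetical metadata record. The field names are illustrative and do not follow the schema of Zenodo or any other repository, and the DOI shown is a placeholder rather than a real identifier.

```python
# Fields assumed to be needed for a findable, citable dataset (illustrative only).
REQUIRED_FIELDS = ("title", "creators", "description", "license", "identifier")


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical metadata record prepared before publication.
record = {
    "title": "Example focus group dataset (anonymised)",
    "creators": ["Researcher, Example"],
    "description": "Thematically organised excerpts from focus group interviews.",
    "license": "CC-BY-4.0",
    "identifier": None,  # the repository assigns the DOI on publication
    "processing": ["transcribed", "translated into English", "anonymised"],
}

before_publication = missing_fields(record)
print(before_publication)

# After publication, the assigned DOI is added (placeholder value shown here).
record["identifier"] = "10.1234/example-doi"
after_publication = missing_fields(record)
print(after_publication)
```

Recording the processing steps in the same record keeps the statement about transcription, translation, and anonymisation together with the license and identifier.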
By following these steps, researchers can navigate the complexities of opening research data and contribute to the broader goals of open science.
The INVEST4EXCELLENCE project is a research-focused project of the INVEST university alliance that is funded by the Horizon2020 (SwafS) funding program. Karelia University of Applied Sciences has been particularly responsible for providing strategic guidance for the joint planning of RDI activities and developing tools to support the activities of INVEST and strengthen the competence of the staff. The INVEST university alliance is a network of seven European higher education institutions that are guided by a common vision towards a more sustainable and responsible Europe.
References
Creative Commons. CC BY 4.0 Attribution 4.0 international. https://creativecommons.org/licenses/by/4.0/deed.en Accessed 13.09.2024.
Creative Commons Suomi. https://creativecommons.fi/. Accessed 13.09.2024.
DMPTuuli. Data Management Plan. https://www.dmptuuli.fi/. Accessed 13.09.2024.
Fairdata.fi. FAIR Principles. https://www.fairdata.fi/en/about-fairdata/fair-principles/. Accessed 13.09.2024.
Finnish Social Science Data Archive. Anonymisation and Personal Data. https://www.fsd.tuni.fi/en/services/data-management-guidelines/anonymisation-and-identifiers/. Accessed 13.09.2024.
INVEST. Innovations of Regional Sustainability: European University Alliance. https://www.karelia.fi/invest/. Accessed 13.09.2024.
INVEST4EXCELLENCE (2024). Horizon2020 (SwafS) project 101035815. https://www.invest4excellence.eu/. Accessed 13.09.2024.
Karelia University of Applied Sciences. Opas avoimeen TKI-toimintaan Karelia-ammattikorkeakoulussa. https://libguides.karelia.fi/c.php?g=670780&p=4762766. Accessed 13.09.2024.
Office of Data Protection in Finland. What is Personal Data? https://tietosuoja.fi/en/what-is-personal-data. Accessed 13.09.2024.
Re3data.org. Registry of research data repositories. https://www.re3data.org/. Accessed 13.09.2024.
Research Council of Finland. Data Management Plan: Planning data management. https://www.aka.fi/en/research-funding/apply-for-funding/how-to-apply-for-funding/az-index-of-application-guidelines2/data-management-plan/data-management-plan/. Accessed 13.09.2024.
Tiedejatutkimus.fi. Search for Information on Research in Finland. https://research.fi/en/. Accessed 13.09.2024.
Zenodo. https://zenodo.org/. Accessed 13.09.2024.