Bringing discipline specificity to a generic research data management workshop concept
Justine Vandendorpe 1, Birte Lindstädt 1
1 ZB MED – Information Centre for Life Sciences, Cologne, Germany
Abstract
Research funders have increasing requirements for research data management (RDM). Given that research data (RD) is generated in a discipline-specific context, RDM must follow suit. General-purpose RD centres have thus turned to specialist institutions or libraries such as ZB MED to bring discipline-specific content to their general training offer. As a result, we developed a training workshop on RDM in (bio-)medicine for researchers to be conducted jointly with RD centres from German universities with a medical faculty. RD centres cover general aspects of RDM and introduce their local infrastructure and services; ZB MED complements this with (bio-)medicine-specific aspects of RDM, introduces ZB MED services and presents results from the national research data infrastructure (NFDI) consortia we are part of. In this context, open educational resources (OERs) were created to support researchers in handling their data. The details of the workshop outlined in the article and the free licensing enable transferability to other disciplines and NFDI consortia.
Keywords
research data management, RDM, training, national research data infrastructure, NFDI, bio-medicine, medicine, specialist libraries, engagement methods
Introduction
ZB MED – Information Centre for Life Sciences is an infrastructure and research centre for information and data in the life sciences. ZB MED is also the world’s largest library in medicine, health, environment, nutrition and agriculture. ZB MED aims at ensuring national provision of information and literature in these fields for the purpose of practical applications, teaching and research. Researchers at ZB MED conduct applied research (e.g. on semantic technologies, information management [1], and Omics data Analytics [2]) to improve ZB MED’s services and provide support in the life sciences. Last but not least, ZB MED also fosters open access and open data (see e.g. ZB MED Research Data Policy [3], in German only).
Research funders have increasing requirements for research data management (RDM). For instance, Horizon Europe now requires beneficiaries to establish a data management plan (DMP) and to update it regularly [4]. Biernacka et al. identified a gap between funder requirements and the knowledge of researchers [5]. Although generic training materials on RDM can be found (e.g. [6], [7]), discipline-specific ones are more difficult to come across. Yet discipline-specific training materials are essential, as the context in which research data is generated is discipline-specific and, in some cases, varies greatly. For instance, collecting personal health data requires informed consent, access to the data has to be monitored and restricted where necessary, and analysis can require mobile algorithms, i.e. algorithms that are sent over the internet to access and analyse a remote data set (a data set that does not move, virtually or otherwise) [8].
After giving training workshops on RDM in (bio-)medicine on request at three institutions, ZB MED decided to develop a concept that would be suitable as an offer for German universities with a faculty of medicine, health sciences, public health, or similar. These research fields were selected because of the demand, and because of ZB MED's involvement in the National Research Data Infrastructure for Personal Health Data (NFDI4Health), which was being launched at the time of concept development. The aim of the workshop is to support (bio-)medical researchers with training opportunities tailored to their local RDM infrastructure and services. We promoted our concept by email to 35 RD centres. This effort resulted in 17 institutions collaborating with us. As a result, 16 workshops and three lectures were held with these institutions (see e.g. [9]), with a further four workshops and one lecture in preparation (as of August 2023).
Creating a (bio-)medicine-specific RDM training workshop
Developing the underlying concept
To develop our concept, we collaborated with RD centres from German universities with a faculty of medicine, health sciences, public health, or similar (referred to as partners below). Collaborating with such partners enables us to reach researchers and tailor our workshop to the RDM infrastructure and services of their institution.
We contact potential partners by email and then have a first online meeting following a protocol template to ensure we cover all the main topics (i.e. concept, number of participants, registrations, platform for online training, moderation, slides, engagement methods, and feedback). We also contact established partners by email to offer further workshops.
Workshop design
Steps in designing the workshop concept
To develop our workshop concept, we first gathered existing materials from previous workshops and from presentations on specific topics (e.g. on electronic lab notebooks (ELNs)). To adjust these materials, we then determined our target group. We selected the target group at the intersection between those of ZB MED and NFDI4Health, i.e. (bio-)medical researchers at all qualification levels. We then wrote an outline, based on which we created freely reusable materials: a registration template (detailed below), a teaching script (see Appendix A (Attachment 1 [Att. 1])) and slides [10]. As suggested by Biernacka et al. [5], we included diverse activation methods to increase active participation. To test our concept, we gave a first pilot training workshop, after which we gathered feedback. The feedback received after this workshop and subsequent ones is summarised in the section Prospects. Based on this feedback, as well as on researchers’ needs and wishes, we regularly update our teaching materials and rethink our strategy.
During the workshop, partners cover general aspects of RDM and introduce their local infrastructure and services; we complement this with (bio-)medicine-specific aspects of RDM, introduce ZB MED services and present results from the National Research Data Infrastructure (NFDI) consortia we are part of: NFDI for Personal Health Data (NFDI4Health), NFDI for Microbiota Research (NFDI4Microbiota) and FAIR Data Infrastructure for Agrosystems (FAIRagro) (see Figure 1 [Fig. 1]). As FAIRagro was only launched in March 2023, results from this consortium will be presented in the future and for a different target group.
Figure 1: Layers of our training workshop
Outline
The workshop starts with welcoming the participants, introducing ourselves, and explaining the practical details.
We then continue with the fundamental concepts of RDM and associated NFDI4Health results: research data, research data management (RDM), metadata & metadata standards (e.g. NFDI4Health Metadata Schema [11]), FAIR data principles, good scientific practice and policies & guidelines on managing research data (e.g. NFDI4Health Publication Policy [12]).
This part is then followed by the different steps of the research data life cycle (Figure 2 [Fig. 2]) and associated results from NFDI4Health and NFDI4Microbiota, along with ZB MED services. The majority of ZB MED services we present are offered by PUBLISSO, which is the open access publishing portal for life sciences run by ZB MED. For the planning step, we introduce PUBLISSO – RDMO4Life (https://rdmo.publisso.de/), a dedicated version of the Research Data Management Organiser (RDMO) for all research institutions that work in the field of life sciences. For the data collection step, we present the NFDI4Microbiota collection of standard operating procedures (SOPs) and we show PUBLISSO – ELN video tutorials that highlight the benefits, usability and workflows of ELNs [13]. We also present two helpful resources for institutions and researchers that are in the process of selecting and implementing the best ELN for their institution: the PUBLISSO – ELN Guide [14] and the ELN Finder (https://eln-finder.ulb.tu-darmstadt.de/home) (the latter being developed together with the University and State Library Darmstadt). For the data sharing & publishing step, we present the NFDI4Health concept for accessing personal health data. We also introduce PUBLISSO – DOI Service, by which Digital Object Identifiers (DOIs) can be assigned to researchers’ output, and PUBLISSO – Repository for Life Sciences (https://repository.publisso.de/), which publishes researchers’ output. For the data preservation step, we introduce PUBLISSO – Digital Preservation Service (https://www.publisso.de/en/digital-preservation), which archives researchers’ output, ideally forever. Finally, for the data discovery & reuse step, we introduce the Study Hub NFDI4Health COVID-19 (https://covid19.studyhub.nfdi4health.de/) and LIVIVO – Search Portal for Life Sciences (https://www.livivo.de/app), which allow researchers to discover and reuse text publications and datasets.
Figure 2: The research data life cycle and associated ZB MED services
We then close the core content of the workshop by talking about legal issues and associated NFDI4Health results: sensitive personal data (e.g. NFDI4Health data access including DataSHIELD [15] and Personal Health Train clients [16]) and licences (e.g. NFDI4Health Overview CC Licences [17]).
The workshop ends with a Q&A session.
The workshop takes one full day (09:00 to 16:30) with one lunch break and two coffee breaks, two half days (09:00 to 12:30) with two coffee breaks, or four times two hours (09:00 to 11:00) with one coffee break. Most participants prefer the last two options, and we also noticed they engage more when the workshop is split.
Online concept
Because of the COVID-19 pandemic, we developed our workshop concept for an online environment. After the pandemic, we continued with online workshops as trainers find them more flexible and participants prefer them and find them more accessible. In this online environment, we focus on engagement methods that can be used anonymously (e.g. discussions on online whiteboards, instant polls) as we noticed they involve participants more successfully.
Practical implementation of the training workshop
Number of participants
The group is kept small (up to 30 people) to ensure successful learning outcomes for the participants [5]. If there are fewer than 10 registrants, we postpone the workshop, as it requires a lot of preparation. If there are more than 30 registrants, we convert the workshop into a two-hour Introductory Lecture on RDM, ending with an offer of subsequent workshops on specific topics.
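The sizing rule above can be written down as a small decision function. The following Python sketch is only an illustration: the function name `choose_format` and the returned labels are hypothetical, while the thresholds (10 and 30 registrants) are the ones stated above.

```python
def choose_format(n_registrants: int) -> str:
    """Return the format suggested by the sizing rule described above.

    The function name and the returned labels are illustrative
    assumptions; only the thresholds come from the text.
    """
    if n_registrants < 0:
        raise ValueError("number of registrants cannot be negative")
    if n_registrants < 10:
        return "postpone"              # fewer than 10 registrants
    if n_registrants <= 30:
        return "workshop"              # small group, up to 30 people
    return "introductory lecture"      # more than 30 registrants


if __name__ == "__main__":
    for n in (8, 10, 30, 31):
        print(n, "->", choose_format(n))
```

Keeping the thresholds in one place makes it easy to adapt the rule if a partner institution prefers different group sizes.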
Slides
We provide partners with an online neutral slide template and encourage every trainer to directly work in the template to maintain a certain homogeneity. We also provide slides from past workshops and our own full slide deck [10] as examples. The slides are shared with the participants with a CC BY licence after the workshop.
Shared folder
To best prepare for the workshop, we share the following documents with partners in a shared folder of a cloud service: an outline, a teaching script (see Appendix A (Attachment 1 [Att. 1])), a registration template, a slide template, examples of past slide decks, and a list of engagement methods and instant polls for inspiration.
Interaction with researchers
Registration
We leave the responsibility of registration and promotion to partners, but we provide them with a registration template. This template includes questions whose answers allow us to adapt the workshop to the audience:
- What is your field of research? [Free-form answer]
- What is your career stage?
  - Ph.D. candidate
  - Postdoctoral researcher
  - Junior professor
  - Professor
  - Director of an institute
  - Other (please specify)
- Do you generate data in a wet lab? [Yes / No]
- Do you handle big data? [Yes / No]
- Do you handle sensitive data? [Yes / No]
- Do you write scripts, code, and/or software? [Yes / No]
- What are your expectations about the workshop? [Free-form answer]
- Please write any questions you might have about research data management.
At the beginning of the workshop, we reflect on the participants’ expectations and, for topics we are not covering, we refer them to relevant resources. We also adjust our teaching materials regularly based on expectations and questions that often arise. For instance, we added a section on Data Organisation as it was often requested.
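One possible way to turn the registration answers into a quick audience profile is to tally the four yes/no questions. The sketch below assumes the answers have been exported as a list of dictionaries; the field names (`wet_lab`, `big_data`, `sensitive_data`, `writes_code`) and the sample records are invented for illustration and are not part of the registration template.

```python
from collections import Counter

# Hypothetical field names for the four yes/no questions of the template.
YES_NO_FIELDS = ("wet_lab", "big_data", "sensitive_data", "writes_code")


def audience_profile(registrations):
    """Return the share of registrants answering 'yes' per yes/no question."""
    if not registrations:
        return {field: 0.0 for field in YES_NO_FIELDS}
    counts = Counter()
    for answers in registrations:
        for field in YES_NO_FIELDS:
            # A missing answer is treated as 'no' in this sketch.
            if answers.get(field, False):
                counts[field] += 1
    total = len(registrations)
    return {field: counts[field] / total for field in YES_NO_FIELDS}


if __name__ == "__main__":
    # Invented example records, not real registration data.
    sample = [
        {"wet_lab": True, "big_data": False, "sensitive_data": True, "writes_code": False},
        {"wet_lab": False, "big_data": True, "sensitive_data": True, "writes_code": True},
    ]
    for field, share in audience_profile(sample).items():
        print(f"{field}: {share:.0%}")
```

A high share for one question (for example sensitive data) would suggest giving the corresponding section more time; the free-form answers still need to be read manually.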
Engagement methods
During the workshop, we include engagement methods such as Q&As, instant polls, discussions, videos, exercises, or word clouds.
For Q&A, we give participants the opportunity to ask questions before and during the workshop, and we also have a dedicated Q&A session at the end of each day.
Instant polls (see Appendix B (Attachment 2 [Att. 2])) can serve several purposes. They can act as support for the content of the workshop, for instance by asking participants to share their opinion or experience (e.g. “What data formats do you use the most?”). Polls can also be used as comprehension checks, i.e. to test the audience's understanding (e.g. “FAIR data have to be open” [Yes / No]). Finally, polls can help in getting feedback (e.g. “Which section was the most understandable?”).
Discussions can take place with the whole group or in breakout rooms, and either verbally or on online whiteboards. During our workshop, we typically discuss good scientific practice and privacy issues using online whiteboards. These discussions are initiated by raising questions. For more details, see our slide deck [10].
Playing a video during a workshop can also serve several purposes: to reinforce specific points, to drive a discussion, to tell stories, or to re-engage or test the audience. When playing a video, it is good to state the objective right before and to discuss the video afterwards. During our workshop, we typically show videos about DMPs [18] and ELNs [13], mainly to reinforce specific points.
The exercises we include are mainly derived from Engelhardt et al. [6], but we have also developed our own. Table 1 [Tab. 1] presents an exercise on metadata that we created as an example.
Table 1: Description of the exercise on metadata
Post-workshop questionnaire
We agree on how to collect feedback during the first meeting with the partner. As the audience is composed of researchers from the partner’s institution, we encourage them to use their own feedback system. If they do not have one, we use an instant poll (see Appendix B (Attachment 2 [Att. 2])). As a more detailed follow-up of the instant poll, we also developed a feedback questionnaire with an online survey tool, broadly based on the survey from our partner Friedrich Schiller University Jena.
Training staff
We do not have staff dedicated only to the workshops; instead, members of different teams and NFDI consortia (“Data Stewards”) collaborate on them. These people have backgrounds in information science and/or specific disciplines (e.g. biology, geography). Moreover, our staff take part in train-the-trainer workshops such as the ones offered by fdm.nrw [5] or The Carpentries (https://carpentries.org/become-instructor/), the latter being based on Deans for Impact 2015 [19].
Timeline
For first-time cooperation, we start preparing the workshop six to twelve months in advance. This time is required to tailor the workshop to both the partner’s institution (based on their RDM infrastructure and services) and the audience (according to their fields of research and answers to the registration questions). Six to twelve months prior to the workshop, we thus meet the partner for the first time. During this meeting, we find one or more dates for the workshop and agree on multiple points: the outline, the distribution of the sections and time allocation, the moderation, when the slides should be ready, engagement methods, and how to collect feedback. About two weeks before the workshop, we close the registration and check whether we have enough participants. We also have a second meeting with the partner to discuss the workshop in further detail. Seven to ten days before the workshop, the partner sends the practical details to the registrants.
For subsequent workshops with the same partner, the preparation is shortened. Potential participants must still be notified about six weeks in advance to be available for the workshop.
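The week- and day-based offsets of this timeline can be derived from the workshop date with a few lines of Python. The sketch below is an illustration under stated assumptions: the function name, the dictionary keys and the example date are hypothetical, and the first partner meeting (six to twelve months ahead) is left out because it is a range of months rather than a fixed offset.

```python
from datetime import date, timedelta


def preparation_milestones(workshop_day: date) -> dict:
    """Derive approximate milestone dates from the workshop date.

    The offsets follow the timeline described above; the dictionary
    keys are illustrative names, not terms used by the authors.
    """
    return {
        # Potential participants are notified about six weeks ahead.
        "notify_participants": workshop_day - timedelta(weeks=6),
        # Registration closes about two weeks before the workshop.
        "close_registration": workshop_day - timedelta(weeks=2),
        # Practical details are sent seven to ten days before.
        "send_details_earliest": workshop_day - timedelta(days=10),
        "send_details_latest": workshop_day - timedelta(days=7),
    }


if __name__ == "__main__":
    # Hypothetical workshop date, chosen only for the example.
    for name, day in preparation_milestones(date(2024, 3, 14)).items():
        print(f"{name}: {day.isoformat()}")
```

Since the text gives these offsets as approximate ("about"), the computed dates are starting points to agree on with the partner rather than fixed deadlines.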
After the workshop, poll reports and feedback results are anonymised and uploaded to the shared folder. Poll results and online whiteboard figures are added to the slides, slides are merged into a single PDF, and the resulting slides are uploaded to the shared folder and shared with the participants by the partner.
Prospects
Our concept not only brings discipline specificity to RDM, but is also tailored to participants’ institutions by collaborating with their local RD centres which can introduce local infrastructure and services. Our concept is rather well accepted, both by partners and participants. Seven partners renewed their collaboration with us for a second joint training workshop.
With regard to participants, an average of 73.48% of the registrants attend the workshops, even though attendance is not mandatory. We have already trained over 500 researchers. As we encourage partners to use their own feedback survey, it is difficult to statistically compare the feedback results. Additionally, our own feedback poll (see Appendix B (Attachment 2 [Att. 2])) has evolved over time and has recently become a feedback survey. However, some results can be combined, and general trends can be drawn from participants’ feedback.
On a scale from 1 to 5 (with 5 being the highest), participants have an average likelihood of recommending the workshop of 3.9, and they assign it an average helpfulness of 3.49.
Participants enjoy the online format, the concept, the content (particularly how to FAIRify data), the structure, the engagement methods, the numerous resources and tools we present, and having different instructors. Indeed, our workshop is built as a discussion between experts and the participants, rather than an authoritative way of presenting concepts.
On the other hand, participants would prefer to have a longer workshop split over multiple days and with coffee breaks of at least 15 minutes. They would also prefer a shorter section on fundamental concepts, and they wish for improvements to the sections on coding best practices, data repositories and digital preservation. They would also like to see more information on big data, more engagement methods, and more examples.
Overall, we think our workshop can be overwhelming, especially if planned as a full-day event. We thus plan on reducing the number of concepts we present and, per concept, reducing theory in favour of engagement methods (e.g. worksheets, statement slam, keyword strips [5]). To make sure these engagement methods successfully involve participants, we think it is essential to get participants engaged at the very beginning of each workshop day.
Based on our own experience and the feedback results, we have started to offer two-hour Introductory Lectures on RDM, followed by specialised workshops on topics chosen by participants. So far, the topics of DMPs, ELNs and reproducible data analysis have been selected by participants. We have also thought about developing “Bring Your Own Data” workshops or a type of flipped classroom: we would record a video on a specific topic, let people watch it on their own, and organise a follow-up discussion or Q&A session on the topic.
We are planning to extend our concept to agriculture, with a different target group and different requirements. A separate concept is warranted because (bio-)medicine and health sciences have their own special requirements: dedicated and well-used metadata standards and ontologies, less developed data sharing practice in medicine, privacy issues, and big data.
Appendices
Appendix A: Teaching script
Our teaching script can be found as a spreadsheet in Attachment 1 [Att. 1]. The teaching script shows the order of the units, the time expected for each unit, engagement methods and intended learning outcomes (ILOs, broadly based on Engelhardt et al. [6]). The columns “Partner institution” and “ZB MED” indicate the possible speakers, but this is flexible.
Appendix B: Polls
The instant polls used can be found in Attachment 2 [Att. 2]:
- Icebreaker poll
- Polls on metadata, the FAIR data principles, ELNs, data processing and analysis, data sharing, and digital preservation
- Feedback poll
Notes
ORCIDs of the authors
- Justine Vandendorpe: 0000-0002-9421-8582
- Birte Lindstädt: 0000-0002-8251-1597
Acknowledgements
This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 442326535 and 460129525. We thank Petra Kneib for her help with graphic design.
Competing interests
The authors declare that they have no competing interests.
References
[1] ZB MED – Information Centre for Life Sciences. Research in the department Research Management. ZB MED; [Retrieved 2023 Jul 6]. Available from: https://www.zbmed.de/en/research/research-at-zb-med/research-knowledge-management
[2] ZB MED – Information Centre for Life Sciences. Research in the department Data Science and Services. ZB MED; [Retrieved 2023 Jul 6]. Available from: https://www.zbmed.de/en/research/research-at-zb-med/research-data-science-and-services
[3] ZB MED – Information Centre for Life Sciences. Forschungsdatenpolicy von ZB MED – Informationszentrum Lebenswissenschaften. ZB MED; 2020 [Retrieved 2023 Jul 6]. Available from: https://www.zbmed.de/en/about/policies/research-data-policy
[4] Horizon Europe (HORIZON); Euratom Research and Training Programme (EURATOM). General Model Grant Agreement. EIC Accelerator Contract. (HE MGA – Multi & Mono). Version 1.1. 2022 Apr 15. Available from: https://ec.europa.eu/info/funding-tenders/opportunities/docs/2021-2027/common/agr-contr/general-mga_horizon-euratom_en.pdf
[5] Biernacka K, Helbig K, Buchholz P. Adaptable Methods for Training in Research Data Management. Data Science Journal. 2021;20:14. DOI: 10.5334/dsj-2021-014
[6] Engelhardt C, et al. D7.4 How to be FAIR with your data. A teaching and training handbook for higher education institutions (V1.2.1). Zenodo; 2022. DOI: 10.5281/ZENODO.6674301
[7] GitBook Bot, Heller L, datawomanHUB, mcancellieri, Kramer B, Ross-Hellauer T, ilabastida, helenebr, Fernandes P, Tennant J. Open Science Training Handbook. Version 1.1. Zenodo; 2018. DOI: 10.5281/ZENODO.1212538
[8] Organisation for Economic Co-operation and Development (OECD). Why open science is critical to combatting COVID-19. OECD; [Updated 12 May 2020]. Available from: https://www.oecd.org/coronavirus/policy-responses/why-open-science-is-critical-to-combatting-covid-19-cd6ab2f9/
[9] ZB MED – Information Centre for Life Sciences. Research Data Management – Introductory Lecture [Video]. YouTube; 2022 Nov 28. Available from: https://youtu.be/QgZBzfTg37I
[10] Vandendorpe J, Lindstädt B, Shutsko A, Markus K. Online Training Workshop on Research Data Management in (Bio-)Medicine. ZB MED – Information Centre for Life Sciences; 2023. DOI: 10.4126/FRL01-006452660
[11] Abaza H, Shutsko A, Golebiewski M, Klopfenstein SAI, Schmidt CO, Vorisek CN; NFDI4Health; NFDI4Health Task Force COVID-19. Metadata schema of the NFDI4Health and the NFDI4Health Task Force COVID-19 (V3_1). NFDI4Health – Nationale Forschungsdateninfrastruktur für Personenbezogene Gesundheitsdaten; 2023. DOI: 10.4126/FRL01-006450625
[12] Lindstädt B, Shutsko A; ZB MED – Information Centre for Life Sciences; NFDI4Health Consortium; NFDI4Health Task Force COVID-19. Publication Policy of the National Research Data Infrastructure for Personal Health Data (NFDI4Health) and the NFDI4Health Task Force COVID-19. NFDI4Health – Nationale Forschungsdateninfrastruktur für Personenbezogene Gesundheitsdaten; 2022. DOI: 10.4126/FRL01-006431467
[13] ZB MED – Information Centre for Life Sciences. Electronic Lab Notebooks (ELN): Screencast – Infos – und mehr [Video]. YouTube; 2022 Mar 10. Available from: https://www.youtube.com/playlist?list=PLJYlS0FDTMq17tvYMeuI2Ct5XtykRFy0K
[14] ZB MED – Information Centre for Life Sciences, editor. Electronic laboratory notebooks in the context of research data management and good research practice – a guide for the life sciences. Cologne, Germany: PUBLISSO; 2021. DOI: 10.4126/FRL01-006425772
[15] NFDI4Health. Fostering collaborative research environments using DataSHIELD. [Retrieved 2023 Jul 6]. Available from: https://www.nfdi4health.de/en/service/fostering-collaborative-research-environments-using-datashield.html
[16] NFDI4Health. Distributed Privacy-Preserving Data Analysis with the Personal Health Train. [Retrieved 2023 Jul 6]. Available from: https://www.nfdi4health.de/en/service/personal-health-train.html
[17] NFDI4Health. Publish study documents and survey instruments quickly and easily [Flyer]. Available from: https://www.nfdi4health.de/images/PDF/NFDI4Health%20Task%20Force%20COVID-19_flyer_Publish%20study%20documents%20and%20survey%20instruments%20quickly%20and%20easily_DINlang_V2_1.pdf
[18] Research Data Netherlands. The what, why and how of data management planning [Video]. YouTube; 2014 Apr 01. Available from: https://youtu.be/gYDb-GP1CA4
[19] Deans for Impact. The Science of Learning. Austin, TX: Deans for Impact; 2015.
[20] Domingo C, Mejía JE. Long-term immunity against yellow fever in children vaccinated during infancy: a longitudinal cohort study [Dataset]. Version 1.0.1. Zenodo; 2020. DOI: 10.5281/ZENODO.3333025