FAIR metadata and data in research projects
Brief instructions for FAIR metadata and data
For whom?
- Researchers who apply for and/or receive funding.
- Data stewards who assist with the grant application and/or the project or advise the researcher.
When?
- During the preparation of the project: while writing the grant application or, once funding has been awarded, while writing the data management plan (DMP).
- During the realisation of the project.
In which situations?
When a programme or a call for grant applications gives extra instructions for producing FAIR metadata and data.
What does it cover?
- General explanation of ZonMw's approach to producing FAIR metadata and data.
- Specific instructions for the applicant or recipient of funding in the call for grant applications.
General explanation about ZonMw's approach for FAIR metadata and data
The background information states that ZonMw encourages the application of the FAIR principles to data management in a growing number of programmes. Below we provide more information about FAIR metadata in connection with the instructions ZonMw gives for this in its calls for grant applications.
FAIR in brief
By applying the FAIR principles, you make your data (or other sources for research) Findable, Accessible, Interoperable and Reusable. The outcome is that data can be found, understood and used by both people and computers. GO FAIR Foundation (GFF) explains the process of making data FAIR in the ‘three-point FAIRification framework’ (3-PFF).
Metadata schemes for ZonMw projects
Here we consider metadata in greater detail. Amongst other things, metadata describes where the data can be found, the context in which it was produced, the conditions for obtaining access to it and the information needed to be able to use it.
FAIR metadata means that the metadata can be read by a computer; in other words, it is ‘machine-readable’ or ‘machine-actionable’. Here we therefore talk about ‘metadata-for-machines’, abbreviated to M4M. Researchers and/or data stewards make metadata that describe the project and the data files (and any other possible sources for the research) that are used (‘used assets’) and generated (‘produced assets’) in it. These metadata are recorded in metadata schemes.
ZonMw works with three main subjects, each of which has a separate metadata scheme. An overview of the schemes can be found on the M4M resource page of GO FAIR Foundation (GFF). For the user, such a scheme functions as a sort of questionnaire. The answers are in effect the metadata about the data (or other source for research) concerned.
The three metadata schemes for describing the data in a ZonMw project concern:
- M4M Project Admin: the metadata about the project in which the data are used (whether that is existing data or newly produced data). It mainly concerns administrative information about the project and the data producer.
- M4M Project Content: substantive information about the project (research domain, themes, target groups, setting, etc.).
- M4M Dataset (Catalog, Form, Distribution): information about the actual datasets, such as the subjects the data are about and the variables and units in the dataset, as well as information about the place where the dataset is stored, the conditions for access to the data, etc.
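To make the idea of a scheme as a questionnaire more concrete, the sketch below models the three kinds of answers as simple Python records and serialises them to JSON. All class names, field names and values are hypothetical illustrations; they are not the actual elements of the M4M schemes, which are defined on the M4M resource page.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class ProjectAdmin:
    # Hypothetical administrative fields about the project and the data producer.
    project_id: str
    title: str
    data_producer: str


@dataclass
class ProjectContent:
    # Hypothetical substantive fields (research domain, themes, setting).
    project_id: str
    research_domain: str
    themes: list[str] = field(default_factory=list)
    setting: str = ""


@dataclass
class DatasetRecord:
    # Hypothetical dataset fields: subject, variables, storage place, access conditions.
    dataset_id: str
    project_id: str
    subject: str
    variables: list[str] = field(default_factory=list)
    storage_location: str = ""
    access_conditions: str = ""


# The "answers" to the three questionnaires for one made-up project.
admin = ProjectAdmin("project-001", "Example study", "Example University")
content = ProjectContent("project-001", "infectious diseases", ["surveillance"], "hospital")
dataset = DatasetRecord(
    "dataset-001",
    "project-001",
    "hospital admissions",
    variables=["age", "admission_date"],
    storage_location="https://example.org/repository/dataset-001",
    access_conditions="on request",
)

# One plain, machine-readable record that combines the three sets of answers.
record = {
    "admin": asdict(admin),
    "content": asdict(content),
    "datasets": [asdict(dataset)],
}
print(json.dumps(record, indent=2))
```

The shared `project_id` is one simple way to keep a dataset description tied to its project; the real schemes use their own linking mechanism.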
Generic and domain-specific
The metadata schemes M4M Project Admin and M4M Dataset (Catalog, Form, Distribution) are generic; in other words, the metadata elements (the ‘questions’) are the same for every project and every dataset. The metadata scheme M4M Project Content is domain-specific, because researchers and data stewards in the same research domain and/or consortium have agreed among themselves on the metadata elements with which they want to describe their datasets.
For example, ZonMw has commissioned schemes for the subjects COVID-19, infectious diseases and antimicrobial resistance. On the M4M page there are also schemes about other subjects produced at the initiative of other organisations. The metadata schemes are open and can be used by everybody. If necessary, schemes can be expanded.
It is helpful to know a few technical characteristics of these M4M metadata schemes, namely:
- The metadata schemes refer to each other, so that it is always clear which project the information about a dataset belongs to.
- The metadata schemes and the user's answers are human-readable in the questionnaire. ‘Under the bonnet’ of the form, the information is stored in a computer language and linked to ‘persistent identifiers’, so that the computer cannot misunderstand what the user of the metadata scheme means either.
- Schemes work as much as possible with standardised terms, the so-called controlled vocabularies. An extensive overview of these can be found on BioPortal. For the answer to a question in the questionnaire (metadata scheme), a list of terms appears that the user can choose from. That way, the answers from all users can be properly compared and, via a code (persistent identifier), are also understandable for the computer. An example of a standardised list (i.e. controlled vocabulary) is SNOMED, which is used to document and code medical data.
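The role of controlled vocabularies and persistent identifiers can be sketched as follows. The vocabulary, its terms and the `https://example.org/...` identifiers are made up for illustration; a real scheme would use terms and identifiers from an existing vocabulary such as those listed on BioPortal. The function name `encode_answer` is likewise an assumption.

```python
# Hypothetical controlled vocabulary: human-readable labels mapped to
# made-up persistent identifiers.
SETTING_VOCABULARY = {
    "hospital": "https://example.org/vocab/setting/0001",
    "general practice": "https://example.org/vocab/setting/0002",
    "nursing home": "https://example.org/vocab/setting/0003",
}


def encode_answer(label: str, vocabulary: dict[str, str]) -> dict[str, str]:
    """Return an answer as a label for people plus an identifier for machines."""
    key = label.strip().lower()
    if key not in vocabulary:
        # Free-text answers are rejected: only terms from the list are allowed.
        raise ValueError(f"{label!r} is not a term in the controlled vocabulary")
    return {"label": key, "id": vocabulary[key]}


# Two users type the same term slightly differently.
answer_a = encode_answer("Hospital", SETTING_VOCABULARY)
answer_b = encode_answer(" hospital ", SETTING_VOCABULARY)

# Both answers resolve to the same identifier, so they can be compared directly.
same_term = answer_a["id"] == answer_b["id"]
print(same_term)
```

Because every answer carries an identifier instead of free text, answers from different projects can be compared without guessing whether two spellings mean the same thing.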
Readable for both computers and people
The computer can find, understand and analyse the metadata produced with these schemes. So that people without a background in computer science can do the same, Health-RI has produced a catalogue that publishes all metadata that are or will be included in the COVID-19 programme: the COVID-19 data portal of Health-RI.
In 2023, Health-RI is working on the expansion of the data portal so that metadata for other subjects in the health domain can also be made findable and usable.
Access to data
Via the data portal, a request can be sent to the data producer for permission to use the data. Eventually, it will also be possible (under certain conditions) to directly obtain access to the data via the portal. With the help of metadata, the data producer can precisely define who (or which algorithm) may or may not obtain access to the data. At the very least this is important for privacy-sensitive data. Metadata, in principle, contain no sensitive information and can therefore be made public.
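How access conditions recorded in metadata could drive an access decision is sketched below. This is only an illustration under stated assumptions: the `AccessPolicy` fields, the requester identifiers and the decision texts are hypothetical and do not describe how the Health-RI data portal actually works.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    # Hypothetical access metadata set by the data producer.
    approved_requesters: set[str] = field(default_factory=set)
    requests_allowed: bool = True


def access_decision(policy: AccessPolicy, requester_id: str) -> str:
    """Decide what happens when a person or an algorithm asks for the data."""
    if requester_id in policy.approved_requesters:
        return "access granted"
    if policy.requests_allowed:
        # Not yet approved: the data producer is asked for permission.
        return "request forwarded to data producer"
    return "access denied"


policy = AccessPolicy(approved_requesters={"researcher-42", "algorithm-7"})
decisions = {
    who: access_decision(policy, who)
    for who in ("researcher-42", "algorithm-7", "visitor-1")
}
print(decisions)
```

The policy itself contains no data from the dataset, which is why this kind of metadata can be published while the data stay protected.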
Expansion of the approach
In 2020, ZonMw, together with GFF and Health-RI, started to develop the M4M metadata schemes for the COVID-19 programme and to introduce them into the funding procedure. In 2021, metadata schemes were also developed for infectious diseases and antimicrobial resistance (including metadata for describing biobanks). Based on the experiences with these and on further developments by GFF and Health-RI, the approach will be applied to new programmes, and metadata schemes will be developed for all subjects. Other organisations and research funding agencies are also taking the initiative to develop FAIR and domain-specific metadata schemes.
Specific instructions for FAIR metadata and data in a call for grant applications
Explanation for the grant applicant
The call for grant applications states whether producing FAIR (meta)data is required. These activities come on top of the activities for regular data management. Grant applicants can prepare for this as follows:
- Determine whether the data steward has experience with FAIR (meta)data, has attended an M4M workshop and/or has the time to learn more about this subject. If the programme organises FAIR data workshops, the data steward must participate. The workshops include a training component.
- Determine whether metadata schemes for the discipline concerned already exist. Also, determine which standards for recording data are now commonplace in the discipline. This information is required as input for a possible expansion of existing metadata schemes or for the development of new ones.
- Reserve sufficient budget so that (1) the data steward and researcher can participate in activities (e.g. workshops) for the development of metadata schemes and (2) the data steward can help with filling these in. For this aspect and the 'standard' data management together, allow about 3-5% of the project budget.
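The 3-5% guideline can be turned into a quick estimate. The function name and the example budget below are assumptions for illustration only; the percentages come from the guideline above.

```python
def data_management_budget(
    project_budget: float, low: float = 0.03, high: float = 0.05
) -> tuple[float, float]:
    """Return the lower and upper amount to reserve for data management."""
    if project_budget < 0:
        raise ValueError("project_budget must not be negative")
    # Round to cents to avoid floating-point noise in the result.
    return (round(project_budget * low, 2), round(project_budget * high, 2))


# Made-up example: a project budget of 500,000.
lower, upper = data_management_budget(500_000)
print(f"Reserve between {lower:,.2f} and {upper:,.2f}")
```

For a project budget of 500,000, this gives a reservation of between 15,000 and 25,000 for regular data management and the FAIR (meta)data activities combined.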
Extra step toward FAIR data
ZonMw’s FAIR metadata instructions initially concern the production of FAIR metadata that describe the context of the project and the dataset(s) used in the project. A next step in FAIRification could be, for example, to produce metadata for the variables and units in the data file. In a call for grant applications, ZonMw will specify which steps in data FAIRification are needed.
Explanation for the project leader
If the project is awarded funding, the project leader incorporates the planned activities and approach for FAIR (meta)data in the DMP, taking into account any extra instructions in the award letter.
ZonMw informs the project leaders about any meetings that will be held. Once the metadata schemes are available, they must be completed (as a questionnaire) for the project. The project leader must report on the outcome of data FAIRification in the list of key items.
Approach and time estimation
In several ZonMw programmes, FAIR metadata schemes (M4M) have been or will be developed that project leaders must use to describe their projects, datasets and/or other sources for research. Examples are the M4M schemes for COVID-19, infectious diseases, and antimicrobial resistance. These can be found on the M4M resource page. In the guidelines on this page, you can read more about the approach for filling in M4M schemes and estimating the time required.
More information about approaches and resources for making data and metadata FAIR can also be found in the FAIR Cookbook.