1.2 FAIR data
The FAIR data principles applied to Soil data
2023-09-15
Findable - Accessible - Interoperable - Reusable
go-fair.org
Findable
- The first step in (re)using data is to find them.
- Metadata and data should be easy to find for both humans and computers.
- Machine-readable metadata are essential for automatic discovery of datasets and services.
- Persistent identification
Persistent identification
Persistent identification, for continued findability
- Consider that a proper id can outlive a project (or organisation)
- Choose the domain and path with care (owned, authoritative, neutral; avoid project or organisation names)
- Set up an identification proxy (doi.org or w3id.org)
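The idea behind an identification proxy can be sketched as a lookup from a stable identifier to the current location of a resource. The sketch below is illustrative only: the identifier, the URLs under example.org, and the function names `resolve` and `move` are hypothetical, and real proxies such as doi.org or w3id.org work with HTTP redirects rather than an in-memory table.

```python
# Hypothetical redirect table: persistent id -> current location.
# All identifiers and URLs below are made-up placeholders.
redirects = {
    "soil/profiles-2023": "https://data.example.org/projects/soilmap/profiles.csv",
}


def resolve(persistent_id: str) -> str:
    """Return the current location registered for a persistent id."""
    try:
        return redirects[persistent_id]
    except KeyError:
        raise LookupError(f"unknown identifier: {persistent_id}") from None


def move(persistent_id: str, new_location: str) -> None:
    """Point an existing id at a new location; the id itself never changes."""
    if persistent_id not in redirects:
        raise LookupError(f"unknown identifier: {persistent_id}")
    redirects[persistent_id] = new_location


# The project ends and the file moves to an archive.
# Anyone who cited the id still reaches the resource.
move("soil/profiles-2023", "https://archive.example.org/soil/profiles-2023.csv")
print(resolve("soil/profiles-2023"))
```

Because only the table entry changes, the identifier can outlive the project or organisation that first published the resource.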
Catalogue
- Records are brought into a catalogue, where they can be searched and assessed
- Catalogues can exchange records to increase discoverability
- Catalogues can cross borders between communities by transforming metadata to relevant community standards and protocols
Search engines
- Search engines crawl the content of catalogues
- If a catalogue supports schema.org annotations, the content can also be extracted in a structured way
- Example
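As an illustration of a schema.org annotation, the following sketch turns a catalogue record into a JSON-LD `Dataset` description embedded in a script element. The record, its field names, and the helper names `to_schema_org` and `to_script_tag` are assumptions made for this example; only the schema.org property names (`name`, `description`, `identifier`, `license`, `keywords`) come from the schema.org vocabulary.

```python
import json

# Hypothetical catalogue record; every value is a placeholder.
record = {
    "title": "Example soil profile dataset",
    "abstract": "Placeholder description of a soil dataset.",
    "identifier": "https://example.org/id/soil-profiles",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["soil", "soil profile"],
}


def to_schema_org(rec: dict) -> dict:
    """Map a simple catalogue record to a schema.org Dataset description."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": rec["title"],
        "description": rec["abstract"],
        "identifier": rec["identifier"],
        "license": rec["license"],
        "keywords": rec["keywords"],
    }


def to_script_tag(rec: dict) -> str:
    """Wrap the JSON-LD in a script element for the record's HTML page."""
    body = json.dumps(to_schema_org(rec), indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_script_tag(record))
```

A catalogue would place this element in the HTML page of each record, so that a crawler can read the structured description alongside the human-readable page.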
Accessible
- (Meta)data are retrievable by their identifier using a standardised communications protocol
- Metadata are accessible, even when the data are no longer available
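Retrieval by identifier over a standardised protocol usually means plain HTTP. A minimal sketch, assuming the identifier is an HTTP URL and that the server supports content negotiation (not every server does): the URL is a placeholder and `build_metadata_request` is a hypothetical helper. The request is only prepared here, not sent.

```python
from urllib.request import Request


def build_metadata_request(identifier_url: str,
                           media_type: str = "application/ld+json") -> Request:
    """Prepare (but do not send) an HTTP GET asking for metadata in a given format."""
    return Request(identifier_url, headers={"Accept": media_type}, method="GET")


# Placeholder identifier; replace with a real persistent URL before sending.
req = build_metadata_request("https://example.org/id/soil-profiles")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual retrieval.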
Persistence
- Move the resource to a shared environment (backup)
- Consider a URL strategy
- Use a facade identifier (DOI)
Data lifecycle
- Consider upfront when to remove a resource (after 10 years?)
- What happens to the URI of a resource which is archived?
- Metadata should stay available, even if the data are no longer available
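One way to keep metadata available after the data are gone is to keep the record and mark it as withdrawn, sometimes called a tombstone. The sketch below assumes a very small record structure; the `Record` class, its fields, and the `archive` function are hypothetical and not taken from any particular repository software.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Record:
    identifier: str           # persistent identifier, never changes
    title: str
    data_url: Optional[str]   # None once the data have been removed
    status: str = "available"


def archive(record: Record) -> Record:
    """Withdraw the data but keep the metadata record (a 'tombstone')."""
    return replace(record, data_url=None, status="withdrawn")


# Placeholder values for illustration.
rec = Record(
    identifier="https://example.org/id/soil-profiles",
    title="Example soil profile dataset",
    data_url="https://data.example.org/soil-profiles.csv",
)
tombstone = archive(rec)
print(tombstone)
```

The identifier still resolves to a description of what the resource was, which answers the question of what happens to the URI of an archived resource.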
Repository software
- WebDAV (or web server software)
- Zenodo, Dataverse
- Document Management Systems (DMS)
- Cloud storage (Google Drive, Dropbox, Amazon, SharePoint)
Interoperable
- (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
- (Meta)data use vocabularies that follow FAIR principles
- (Meta)data include qualified references to other (meta)data
Adopt common vocabularies
Adopting a standardised model enables aggregation of data.
- Relational models
- UML/GML models
- Semantic web ontologies
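Why a shared model enables aggregation can be shown with a small sketch: two sources that use different field names for the same property are renamed to common field names and can then be combined. The source data, the field names, and the `harmonise` function are all hypothetical. A real harmonisation would also need to align units, methods, and code lists.

```python
# Two hypothetical sources describing the same property with different field names.
source_a = [{"site": "A1", "ph_h2o": 6.5}]
source_b = [{"plot_id": "B7", "pH_water": 7.1}]

# Hypothetical mappings from each local schema to the shared field names.
mapping_a = {"site": "site_id", "ph_h2o": "ph_water"}
mapping_b = {"plot_id": "site_id", "pH_water": "ph_water"}


def harmonise(rows, mapping):
    """Rename the fields of each row according to the mapping."""
    return [{mapping[key]: value for key, value in row.items()} for row in rows]


# Once both sources share one model they can be aggregated directly.
combined = harmonise(source_a, mapping_a) + harmonise(source_b, mapping_b)
mean_ph = sum(row["ph_water"] for row in combined) / len(combined)
print(combined)
print(round(mean_ph, 2))
```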
Relevant vocabularies
- ISO 28258 / INSPIRE / GLOSIS Web Ontology
- AGROVOC
- WRB / FAO Soil Classification Guidelines
Reusable
- (Meta)data are richly described with a plurality of accurate and relevant attributes
- (Meta)data are released with a clear and accessible data usage license
- (Meta)data are associated with detailed provenance
- (Meta)data meet domain-relevant community standards
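The reusability criteria above can be turned into a simple check on a metadata record. This is a rough sketch only: the field names in `REQUIRED_FIELDS` and the function `missing_reuse_fields` are assumptions for illustration, and a real check would follow the metadata standard of the community concerned.

```python
# Hypothetical field names standing in for license, provenance and description.
REQUIRED_FIELDS = ("description", "license", "provenance")


def missing_reuse_fields(metadata: dict) -> list:
    """List the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# Placeholder record without provenance information.
record = {
    "description": "Placeholder description of a soil dataset.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}
print(missing_reuse_fields(record))
```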