Research Ideas and Outcomes (Oct 2022)
A portal for indexing distributed FAIR digital objects for catalysis research
Abstract
A research object (RO) is defined as a semantically rich aggregation of (potentially distributed) resources that provides a layer of structure on top of information delivered as linked data (Bechhofer et al. 2013, Soiland-Reyes et al. 2022). An RO provides a container for the aggregation of resources that are produced and consumed by common services and are shareable within and across organisational boundaries. This work sees research digital objects as composites which may consist of objects hosted in different repositories.
In catalysis research, the characterisation of a sample may require analysing experimental data obtained from an instrument, data from a computer model, and/or comparing to data from a specialized database. Additionally, data may need to be reduced and cleaned before analysis, resulting in intermediate data. In this scenario, the composite research object comprises all these data objects and their corresponding metadata. UK Catalysis Hub (UKCH) researchers perform these tasks as part of their day-to-day work. However, most of the time they need to manually collect, catalogue, and preserve all these data assets.
The UKCH aims to support researchers with tools and services for the management and processing of data, through the development of the Catalysis Data Infrastructure (CDI; Nieva de la Hidalga et al. 2022a) and the Catalysis Research Workbench (CRW; Nieva de la Hidalga et al. 2022b). This work is integrated in the context of the Physical Sciences Data Infrastructure (PSDI; Coles and Knight 2022). The PSDI aims to provide a layer that enables transparent access to existing resources whilst ensuring that each remains dedicated to its specific application. The intention is to explore the concept of the composite research digital object and the services required to facilitate both human and programmatic interactions with those objects to browse, review, retrieve, and use digital objects in the context of the research produced by UKCH scientists.
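The idea of a composite research object that aggregates data assets held in different repositories can be pictured with a small data structure. The following is only an illustrative sketch, not the CDI data model: the class names, the `role` vocabulary, and every identifier and repository name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataAsset:
    """One data object that stays in its home repository (hypothetical model)."""
    identifier: str  # persistent identifier or URL of the asset
    repository: str  # name of the repository that hosts it
    role: str        # e.g. "experimental", "simulated", "reference", "intermediate"


@dataclass
class CompositeResearchObject:
    """Aggregates references to assets; it does not copy the data itself."""
    title: str
    assets: list[DataAsset] = field(default_factory=list)

    def add(self, asset: DataAsset) -> None:
        self.assets.append(asset)

    def repositories(self) -> set[str]:
        """Repositories that must be reachable to retrieve the whole object."""
        return {asset.repository for asset in self.assets}

    def by_role(self, role: str) -> list[DataAsset]:
        """Assets playing a given role, in the order they were added."""
        return [asset for asset in self.assets if asset.role == role]


# Hypothetical characterisation of a sample: instrument data, model output,
# and the intermediate data produced by reduction and cleaning.
sample = CompositeResearchObject(title="Characterisation of sample A")
sample.add(DataAsset("https://example.org/data/raw-001", "facility-archive", "experimental"))
sample.add(DataAsset("https://example.org/data/model-001", "hpc-store", "simulated"))
sample.add(DataAsset("https://example.org/data/reduced-001", "facility-archive", "intermediate"))
```

Keeping only identifiers in the composite, rather than copies of the data, mirrors the distributed nature of the objects described above.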
The CDI will act as a thematic portal presenting data managed through the PSDI and serve as an example for the development of similar portals targeting specific research domains.
In this case, the CDI is in the process of being redesigned with a semantic metadata model. The basic ontologies being considered for this model are: DCAT (Albertoni et al. 2022) to encode the metadata of digital objects; PROV-O (Belhajjame et al. 2013) to track the generation of digital objects; SPAR (Peroni and Shotton 2018) to encode publications data; SCHOLIX (Burton et al. 2017) to encode the links between publications and data objects; FOAF (Brickley and Miller 2014) to encode researcher information; the Organization Ontology (ORG; Reynolds 2014) to encode institution information; EXPO (Soldatova et al. 2006) to encode experiment information; and various domain-specific ontologies for adding metadata about experiments, for instance CHEBI (Hastings et al. 2011), CHEMINF (Hastings et al. 2015), and FIX (Chebi-Administrators 2005).
The implementation of the CDI using these ontologies will provide a roadmap for the integration of FAIR data object repositories with a service infrastructure which supports reproducibility, reuse of data, reuse of processing tools, and implementation of advanced processing tools.
The integration of the CDI and CRW with existing and new infrastructures will further support the work of catalysis scientists. In this context, a researcher can access the CDI to look for publications, see if there are data objects linked to them, and then look for processing tools which can be used to reproduce the results. An experiment for an early use case demonstrated the feasibility of reproducing published results using data and metadata linked to existing publications (Nieva de la Hidalga et al. 2022b). In the experiment, papers citing processable data were used to retrieve, process, and reproduce published results with no need for contacting the authors. Fig. 1 presents a view of the experiment performed.
The current practices of publishing catalysis research data can be seen as aligned to the FAIR data principles; for instance, Fig. 1 above can also be seen as Fig. 2. Reproducing results required several human-centered activities, partly due to the encoding of the metadata as text documents. The challenge is to accelerate and automate these processes. It is important to highlight the role of cataloguing interfaces, such as the CDI, containing DO crates with only metadata and links to the different data assets that constitute the composite digital objects. The users of these interfaces will in turn rely on transparent services which do not require them to manually track the location and formats of the data assets they want to retrieve and use.
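A metadata-only catalogue entry of the kind described above could be sketched as a JSON-LD style record. The property names below are standard DCAT, Dublin Core, PROV-O, and FOAF terms, but the record layout, the helper functions `make_entry` and `asset_links`, and all identifiers and URLs are assumptions made for illustration; the source does not describe the actual CDI encoding.

```python
import json

# Namespace prefixes for the vocabularies used in this sketch.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def make_entry(identifier, title, creator_name, download_urls, derived_from=None):
    """Build a metadata-only record: it links to the data but never embeds it."""
    entry = {
        "@context": CONTEXT,
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dcterms:title": title,
        "dcterms:creator": {"@type": "foaf:Person", "foaf:name": creator_name},
        "dcat:distribution": [
            {"@type": "dcat:Distribution", "dcat:downloadURL": {"@id": url}}
            for url in download_urls
        ],
    }
    if derived_from is not None:
        # Provenance link from intermediate data back to the data it came from.
        entry["prov:wasDerivedFrom"] = {"@id": derived_from}
    return entry


def asset_links(entry):
    """Return the download locations so a service can fetch the assets."""
    return [dist["dcat:downloadURL"]["@id"] for dist in entry["dcat:distribution"]]


# Hypothetical entries: raw instrument data and the reduced data derived from it.
raw = make_entry(
    "https://example.org/cdi/objects/raw-001",
    "Raw spectra for sample A",
    "A. Researcher",
    ["https://example.org/repo-a/raw-001.dat"],
)
reduced = make_entry(
    "https://example.org/cdi/objects/reduced-001",
    "Reduced spectra for sample A",
    "A. Researcher",
    ["https://example.org/repo-b/reduced-001.csv"],
    derived_from=raw["@id"],
)

print(json.dumps(reduced, indent=2))
```

Because the record carries machine-readable links and provenance rather than free text, a service could call something like `asset_links` to locate the data without the user tracking where each asset lives.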
Keywords