The interest group (IG) provides a forum for HPC centers and users working towards simplified re-use and FAIR handling of Earth System Science data on large computing systems. Topics include, but are not limited to, proper metadata annotation and federated data analysis.
There is an ever-increasing demand for High-Performance Computing (HPC) infrastructure to address geoscientific questions in different domains. Common examples are the weather and climate forecasting and remote sensing communities, which need to cope with the generation, management and analysis of a rapidly increasing volume of high-resolution multidimensional data sets. However, distributing such large data sets, which are produced and hosted at HPC facilities, and enabling their community-driven reuse in line with the FAIR principles still poses a considerable challenge.
The overall scope of this interest group is to increase and simplify the re-use of Earth System Science (ESS) data that are hosted and derived on HPC systems. It aims to open up a forum for HPC centers and for users of HPC infrastructure across different task areas of the NFDI4Earth consortium to discuss and find solutions for relevant topics in the FAIRification of ESS data from HPC infrastructure. With the FAIR principles in mind, we define two focus areas for the interest group:
1) “Metadata, Interoperability and Reproducibility”
We will help to devise and harmonize methods for the automatic enrichment of ESS data on HPC systems with a sufficient and standardized set of metadata, following NFDI4Earth and global standards. Metadata schemas will be recommended such that simulation data will be increasingly reproducible (by sufficient description of the execution environment and methodology) and interoperable (by sufficient description of data formats, etc.).
2) “Federated access, Findability and Accessibility”
To make practical use of FAIR data, easy access to data storage and analysis facilities in the HPC centers is a must. Working with national and international initiatives beyond NFDI4Earth, we aim to make efficient “in-place” ESS data analysis possible by providing access to geographically distributed computing systems (e.g. via federated identities). Where necessary, data collection from different systems has to be facilitated. Orchestrated, distributed ESS data analysis will be the logical next step.
The participants of this group, being mostly HPC providers or users, see themselves as a bridge to other NFDI consortia tackling similar issues (e.g. NFDI4Ing TA “Doris”, and prospectively NFDIxCS) as well as to common NFDI activities on the topic. The IG's activities and outcomes will be communicated to and embedded in Measure 3.3 in the context of Research Data Commons, i.e., in action 1 on “Connecting to NFDI cross-cutting topics and to other NFDI Consortia”.
We will operate the interest group by organizing monthly virtual meetings, discussing the outlined topics in an iterative manner. In addition, physical meetings and workshops will be co-organized with regular NFDI4Earth events (such as the NFDI4Earth conferences, plenary meetings, etc.). Initially, the two focus areas mentioned above will be handled in detail, but additional topics of interest may be included depending on the interest and input of the participants.
Illustrating material & further links
N4E-Konferenz_Block-III_SIG-1_HPC_Earth_Kurtz (slides of 1st N4E conference Nov-2020)
https://doi.org/10.5281/zenodo.6565404 (1st concept paper, May 2022)
Current agenda & next meetings
Regular meetings: every second Monday of the month at 9:00 am
HPC-NFDI4Earth-Workshop (Online): Federated and FAIR Data in HPC, November 10, 2022, 2:00 pm - 5:30 pm