NFDI4Earth Incubator Lab

Brief Description

The Incubator Lab fosters novel data science developments for ESS in dedicated, focused projects. The objective of this task is to steer the exploration of new, potentially relevant building blocks to be included in NFDI4Earth and related NFDIs. Examples are tools for automatic metadata extraction and annotation, semantic mapping and harmonization, machine learning, data fusion, visualization, and interaction. The Incubator Lab also serves as a forum where novel requirements can be formulated and trends presented as part of a user consultation process; in this way, scouting for new trends and opportunities is achieved. The forum will materialize in annual meetings of NFDI4Earth-Experiment, where achievements will be presented (e.g. from Lab projects but also from Pilots) and demands will be formulated (e.g. by the participants) that trigger new ideas and potential projects. The results of the projects as well as the consultation process will be continuously monitored, evaluated and updated, resulting in a living document that describes current and future trends and records their implementation. The measure lead oversees and monitors that compliance rules concerning software and infrastructural developments are fulfilled, while at the same time encouraging innovative blue-sky developments.


If you are interested in current or future incubator projects, please contact the coordination office.
For contact persons of specific projects see descriptions below.

Bucket Sampling for Earth System Data Cubes 

Domain: Geophysics and Geodesy 
Contact: Josefine Umlauft, ScaDS.AI – Center for Scalable Data Analytics and Artificial Intelligence
Email: josefine.umlauft@uni-leipzig.de
Cooperators: Anja Neumann, Enrico Lohmann, Daniel Obraczka and Tobias Jagla, ScaDS.AI
Duration: 6 months

Sampling remotely sensed time series data for training a machine learning model is not trivial due to their inherent characteristics, such as uneven data distribution in space and time, auto-correlation effects between data points in close spatio-temporal vicinity, and high data volumes that need to be handled efficiently. Building on previously developed basic machine learning tools for remotely sensed data in the form of an Earth System Data Cube (ESDC), we will introduce and implement a new bucket sampling strategy that accounts for these special characteristics of geo-spatial data. The result will be developed and provided as an open-source software wrapper.
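
The following minimal Python sketch illustrates the general bucket idea only, not the project's actual implementation: whole spatio-temporal buckets (coarse latitude/longitude bins per quarter) are assigned to either the training or the test side, so that auto-correlated neighbours do not leak across the split. Bin sizes, coordinate names and the synthetic cube are assumptions chosen for this example.

import numpy as np
import pandas as pd
import xarray as xr

def bucket_split(cube, lat_bin=10.0, lon_bin=10.0, test_fraction=0.2, seed=0):
    """Split an ESDC-like DataArray into train/test sets by spatio-temporal buckets."""
    df = cube.to_dataframe(name="value").dropna().reset_index()
    # Coarse bucket keys: floor coordinates to the bin size, aggregate time to quarters.
    df["lat_b"] = (df["lat"] // lat_bin).astype(int)
    df["lon_b"] = (df["lon"] // lon_bin).astype(int)
    df["time_b"] = df["time"].dt.to_period("Q").astype(str)
    bucket_id = df.groupby(["lat_b", "lon_b", "time_b"]).ngroup()
    # Sample whole buckets for the test set instead of individual points.
    rng = np.random.default_rng(seed)
    ids = bucket_id.unique()
    test_ids = rng.choice(ids, size=max(1, int(len(ids) * test_fraction)), replace=False)
    is_test = bucket_id.isin(test_ids)
    return df[~is_test], df[is_test]

# Illustrative use with a small synthetic cube.
cube = xr.DataArray(
    np.random.rand(24, 18, 36),
    coords={"time": pd.date_range("2020-01-01", periods=24, freq="MS"),
            "lat": np.linspace(-85, 85, 18),
            "lon": np.linspace(-175, 175, 36)},
    dims=("time", "lat", "lon"),
)
train, test = bucket_split(cube)
print(len(train), len(test))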

GitLab
Digitization of analogue records in time (DO IT!)

Domain: Water Research
Contact: Hannes Müller-Thomy, TU Braunschweig, LWI-Department of Hydrology and River Basin Management
Email: h.mueller-thomy@tu-braunschweig.de
Duration: 6 months

Thousands of years of analogue records exist in the earth system sciences, but they remain an untapped treasure in the age of digitization. Digitizing these data needs manpower, but not necessarily manpower with a scientific background. Breaking the digitization down into an easily applicable smartphone application enables the involvement of citizen scientists and thus overcomes the manpower bottleneck. The app will be applicable to all kinds of analogue data and hence useful for numerous scientific fields.

GitLab
Steam ScrAIber

Domain: Mineralogy, Petrology and Geochemistry
Contact: Artem Leichter, Institute of Mineralogy, Leibniz University Hannover
Email: leichter@ikg.uni-hannover.de
Cooperators: Renat Almeev and Francois Holtz, Institute of Mineralogy, Leibniz University Hannover
Duration: 6 months

A central obstacle to the widespread use of new methods (e.g., machine learning, artificial intelligence) is of a purely practical nature: the lack of practical know-how and experience in the use of the relevant programming languages, frameworks and libraries. The programming landscape is so diverse that even among computer scientists the knowledge of different frameworks and programming languages is unevenly distributed, depending on the tasks at hand. For specialists from other fields (e.g., Earth System Science), for whom programming is primarily a means to an end and domain-specific content must be prioritized, the corresponding programming know-how is even more sporadically distributed. To address this problem, we propose the use of Large Language Model (LLM) assistance systems. These systems have proven to be very successful in automatic code generation. Combined with a secure environment in which the automatically generated code can be executed directly, they allow users to become developers and significantly reduce the programming skills required.
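
To make the idea concrete, the minimal Python sketch below shows the generate-then-execute loop such an assistance system builds on. Here generate_code() is only a hypothetical placeholder for the LLM backend, and a subprocess with a time limit inside a throw-away directory merely stands in for a proper sandbox; a production service would add real isolation.

import subprocess
import sys
import tempfile
from pathlib import Path

def generate_code(prompt):
    """Hypothetical placeholder: a real system would query an LLM here."""
    return 'print("Hello from generated code")'

def run_generated(prompt, timeout=10):
    code = generate_code(prompt)
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code)
        # Execute in a separate interpreter with a hard time limit, inside a
        # throw-away working directory.
        result = subprocess.run([sys.executable, str(script)], cwd=workdir,
                                capture_output=True, text=True, timeout=timeout)
        return result.stdout or result.stderr

print(run_generated("Plot the major element composition of my melt inclusion data"))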

GitLab
Prompt accessed GeoQAmap for Earth Data

Domain: Geophysics, Geodesy
Contact: Yu Feng, Chair of Cartography and Visual Analytics, Technical University of Munich
Email: y.feng@tum.de
Cooperators: Guohui Xiao and Liqiu Meng, Norwegian University of Science and Technology and University of Bergen
Duration: 6 months

In Earth System Science (ESS), geodata presents notable challenges due to its diverse standards generated by different agencies or individuals, making it difficult to access and query the data in an integrated manner. This heterogeneity requires significant effort from researchers to access, integrate, and effectively utilize the data. Moreover, users, especially beginners, may often encounter difficulties when interacting with the data through SQL or SPARQL queries. To tackle this, the project proposes utilizing Virtual Knowledge Graphs (VKGs) and Large Language Models (LLMs) as a solution. By leveraging VKGs and LLMs, the project aims to develop a system that enables users to access ESS-related data through natural language queries, facilitating integrated access and reducing the complexity of querying certain geo-entities. The project's ultimate goal is to provide researchers in ESS with an efficient and user-friendly approach to accessing and exploring heterogeneous geodata, empowering them to conduct data-driven studies and gain valuable insights for ESS research.
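
As an illustration of the intended query flow (not the project's implementation), the sketch below lets a placeholder translate() function stand in for the LLM that maps a natural language question to SPARQL and sends the result to a Virtual Knowledge Graph SPARQL endpoint. The endpoint URL, ontology terms and the example query are assumptions made for this sketch.

import requests

ENDPOINT = "https://example.org/vkg/sparql"  # hypothetical VKG SPARQL endpoint

def translate(question):
    """Placeholder for the LLM step that turns a question into SPARQL."""
    # A real system would prompt an LLM with the VKG schema; this stub returns
    # a fixed, purely illustrative GeoSPARQL query.
    return """
        PREFIX geo: <http://www.opengis.net/ont/geosparql#>
        SELECT ?river ?wkt WHERE {
          ?river a <https://example.org/ontology#River> ;
                 geo:hasGeometry/geo:asWKT ?wkt .
        } LIMIT 10
    """

def ask(question):
    response = requests.get(
        ENDPOINT,
        params={"query": translate(question)},
        headers={"Accept": "application/sparql-results+json"},
    )
    response.raise_for_status()
    return response.json()["results"]["bindings"]

for row in ask("Which rivers are in the dataset, and where do they run?"):
    print(row["river"]["value"], row["wkt"]["value"])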

GitLab
Web-based, user-driven climate impact data extraction

Domain:  Water Research
Contact: Jochen Klar, Potsdam Institute for Climate Impact Research (Leibniz)
Email: jochen.klar@pik-potsdam.de
Duration: 6 months

The ISIMIP Repository holds the world's largest collection of global climate impact model data. However, both the format (NetCDF3) and the file sizes represent a major barrier for many users. We propose to build an innovative web-based service that allows users to extract, process and download subsets of the data. User-defined extraction, chained processing and data interaction through scripts and interactive Jupyter notebooks will greatly widen the user base. Users can initiate processing tasks in the cloud and download the resulting files in different formats. The code will be released as open-source software and, as the application is not tied to ISIMIP or the ISIMIP conventions, can be adapted for similar archives of NetCDF files.
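
For illustration, the following Python sketch shows the kind of subset extraction such a service would perform on a single file using xarray. The file name, variable name and coordinate ranges are assumptions for this example, not ISIMIP specifics.

import xarray as xr

# Hypothetical input file and variable; real ISIMIP files follow the ISIMIP
# naming conventions.
ds = xr.open_dataset("impact_model_output.nc")

subset = ds["discharge"].sel(
    lat=slice(55, 47),          # note: many global grids store latitude descending
    lon=slice(5, 15),
    time=slice("1991-01-01", "2000-12-31"),
)

# Re-export in formats that are easier to consume than NetCDF3.
subset.to_netcdf("discharge_subset.nc", format="NETCDF4")
subset.to_dataframe().to_csv("discharge_subset.csv")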

GitLab

