NFDI4Earth Incubator Lab

Brief Description

The Incubator Lab fosters novel data science developments for Earth System Sciences (ESS) in dedicated, focused projects. The objective of this task is to steer the exploration of new, potentially relevant building blocks to be included in NFDI4Earth and related NFDIs. Examples are tools for automatic metadata extraction and annotation, semantic mapping and harmonization, machine learning, data fusion, visualization, and interaction. The Incubator Lab also serves as a forum where novel requirements can be formulated and trends presented as part of a user consultation process; in this way it scouts for new trends and opportunities. The forum materializes in annual NFDI4Earth-Experiment meetings, where achievements are presented (e.g. from Lab projects as well as from Pilots) and demands are formulated (e.g. by the participants), triggering new ideas and potential projects. The results of the projects and of the consultation process are continuously monitored, evaluated and updated, resulting in a living document that describes current and future trends and records their implementation. The measure lead oversees and monitors compliance with the rules for software and infrastructure developments while at the same time encouraging innovative blue-sky developments.


Incubator Projects 2025
In the third round, five incubator projects were selected out of 17 submissions; they will run from June 2025 to November 2025.

ELaborate Particle Analysis from Satellite Observations - EL PASO

Domain: Astrophysics
Contact: Bernhard Haas, GFZ German Research Centre for Geosciences, Potsdam
Email: bhaas@gfz-potsdam.de
Duration: 4 months

We propose a new Python package for processing satellite data of particle measurements in Earth’s magnetosphere. While most particle measurements of the radiation belts are openly available, they still must be processed (which may include time binning, calculation of equatorial pitch angles and invariants, conversion of flux to phase space density, etc.) before they can be used in publications. Currently, each research group processes the data individually without a standard procedure, making published data challenging to reproduce. An open, easy-to-use package would help new students handle the data and reproduce previous results.
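
As a minimal illustration of the kind of processing step such a package would standardize, the following Python sketch converts differential particle flux to phase space density via the relativistic energy-momentum relation. The function name, input units, and example values are assumptions for this illustration, not part of the EL PASO package, and unit conversion factors for a specific phase space density convention are deliberately omitted.

import numpy as np

ELECTRON_REST_MASS_MEV = 0.511  # electron rest energy m_e c^2 in MeV

def flux_to_phase_space_density(flux, kinetic_energy_mev):
    """Convert differential flux j to phase space density f = j / p^2.

    (pc)^2 = E_k * (E_k + 2 m c^2) from the relativistic energy-momentum
    relation; flux is assumed in 1/(cm^2 s sr MeV), energy in MeV.
    """
    pc_squared = kinetic_energy_mev * (kinetic_energy_mev + 2.0 * ELECTRON_REST_MASS_MEV)
    return np.asarray(flux) / pc_squared

# Example: 1 MeV electrons with a hypothetical flux of 1e4 / (cm^2 s sr MeV)
psd = flux_to_phase_space_density(1.0e4, 1.0)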

Proposal 
PhenoMapping: A participatory visual tool for curation and verification of historical phenological data

Domain: Atmospheric Science, Oceanography and Climate Research
Contact: Yu Feng, TU München; Lina Hörl, Bavarian State Archives (GDA)
Email: y.feng@tum.de
Duration: 6 months

Phenological data, especially long-term data, is crucial for understanding the impact of climate change. However, due to the lack of digitization, comprehensive phenological data from before the 20th century is highly incomplete, hindering long-term climate and ecological research and leaving this area underrepresented in Earth System Science (ESS). The innovation of this project lies in developing an interactive tool called PhenoMapping, which helps digitize and geocode historical records and matches scientists’ demands with historians’ capacity through an online platform. It encourages volunteers’ contributions and also allows public users to explore phenological trends from decades and centuries ago. Using an archive collection from 1856 with 7,000 phenological observations as an example, we will demonstrate the value of this tool. Expected outputs include web-based data visualization and transcription tools, along with the acknowledgement of data contributions to existing phenological databases. The expertise of both teams, GDA in handling historical documents and TUM in geospatial data visualization, will support the project’s goal of bridging historical and modern phenological data for ESS research.
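
To illustrate one building block of such a workflow, the sketch below geocodes a transcribed historical place name with the geopy library; the example record, field names, and coordinates are assumptions for illustration, not part of the PhenoMapping tool.

from geopy.geocoders import Nominatim

# Hypothetical transcribed record from a 19th-century phenological register
record = {
    "place": "Augsburg, Bayern",
    "species": "Syringa vulgaris",   # common lilac
    "event": "first flowering",
    "date": "1856-05-02",
}

geolocator = Nominatim(user_agent="phenomapping-example")
location = geolocator.geocode(record["place"])
if location is not None:
    record["lat"], record["lon"] = location.latitude, location.longitude

print(record)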

Proposal 
Visual Question-Answering for Thematic Maps

Domain: Geography
Contact: Eftychia Koukouraki, University of Münster
Email: eftychia.koukouraki@uni-muenster.de
Duration: 6 months

Visual question answering (QA) helps users interpret complex visual information, making it easier and faster to gain insights from maps and geospatial data in a variety of contexts. This project aims to create an open dataset tailored to thematic map-based QA systems, accompanied by a baseline model to demonstrate its usage. By compiling map images annotated with question-answer pairs, the dataset will enable Artificial Intelligence (AI) models to extract and interpret geographic information from maps. The deliverables will include a curated dataset, a baseline model, documentation, and an evaluation report, all of which will be released under a permissive license to support further research on the topic.
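
A minimal sketch of what a single annotated record in such a dataset could look like; the schema, field names, and example values below are assumptions for illustration, not the project's published format.

from dataclasses import dataclass

@dataclass
class MapQARecord:
    image_path: str   # path to the thematic map image
    map_type: str     # e.g. "choropleth", "proportional symbol"
    question: str
    answer: str

# Hypothetical example pairing one map image with one question-answer pair
record = MapQARecord(
    image_path="maps/unemployment_de_2023.png",
    map_type="choropleth",
    question="Which federal state shows the highest unemployment rate?",
    answer="Bremen",
)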

Proposal 
LLM-enabled I-ADOPT Variable Extraction using Semantics

Domain: Atmospheric Science, Oceanography and Climate Research
Contact: Christof Lorenz, Karlsruhe Institute of Technology
Email: Christof.Lorenz@kit.edu
Cooperators: Barbara Magagna (GO FAIR Foundation, Leiden, Netherlands); Arvin Rastegar, Christof Lorenz, Christian Chwala (Karlsruhe Institute of Technology, Institute for Meteorology and Climate Research, Garmisch-Partenkirchen, Germany)
Duration: 6 months

Researchers annotate data with keywords describing the physical properties that are observed or modelled. To ensure findability and interoperability of this metadata, the keywords should be machine-readable and adhere to standardized vocabularies or ontologies. The I-ADOPT framework provides guidelines for expressing such keywords in alignment with the FAIR principles; however, decomposing commonly used terms into atomic I-ADOPT components remains a highly manual task requiring both semantic and domain expertise. In response, we propose an LLM-based workflow to generate FAIR-compliant descriptions of variables that align with the I-ADOPT framework.
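
A minimal sketch of such a workflow, assuming the OpenAI Python client as a stand-in for whichever LLM backend the project adopts; the prompt wording, model name, and returned JSON keys are illustrative assumptions, although the component names (Property, ObjectOfInterest, Matrix, ContextObject, Constraint) follow the published I-ADOPT framework.

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any LLM backend could be substituted

PROMPT = (
    "Decompose the following variable name into I-ADOPT components "
    "(Property, ObjectOfInterest, Matrix, ContextObject, Constraint). "
    "Return JSON with those keys, using null where a component does not apply.\n"
    "Variable: {term}"
)

def decompose_variable(term: str) -> dict:
    # Ask the model for a structured JSON decomposition of the variable term
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT.format(term=term)}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

# Example: a commonly used climate variable keyword
print(decompose_variable("sea surface temperature"))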

Proposal 
Mapping Research in Earth System Sciences - MaRESS

Domain: Physical Geography
Contact: Marco Otto, Technische Universität Berlin
Email: marco.otto@tu-berlin.de
Duration: 6 months

The project aims to develop a web application, MaRESS (Mapping Research in Earth System Sciences), designed to map research data from peer-reviewed literature to help researchers identify thematic research gaps and geographic knowledge voids. MaRESS will support researchers in formulating targeted questions and objectives within Earth System Sciences (ESS), advancing scientific understanding across ESS by providing a structured framework to build specific knowledge bases. Using a modular design, MaRESS will integrate geographic data (“geographic mapping”), an open-access reference management tool for knowledge organization (“semantic mapping”), support for data integration (“data mapping”), and AI-assisted categorization. These components will enhance data accessibility and information management for all research areas within ESS. The software will be portable and deployable using containerization (e.g., Docker or LXC) and will include comprehensive documentation, supporting FAIR data principles and facilitating open access. Initially, MaRESS will be applied to an existing knowledge base on High Mountain Wetlands, with potential for global expansion as additional regions and datasets are incorporated.

Proposal 
Bucket Sampling for Earth System Data Cubes 

Domain: Geophysics and Geodesy 
Contact: Josefine Umlauft, ScaDS.AI Center for Scalable Data Analytics and Artificial Intelligence
Email: josefine.umlauft@uni-leipzig.de
Cooperators: Anja Neumann, Enrico Lohmann, Daniel Obraczka, Tobias Jagla, ScaDS.AI
Duration: 6 months

Sampling remotely sensed time series data for training a machine learning model is not trivial due to their inherent characteristics: uneven data distribution in space and time, auto-correlation effects between data points in close spatio-temporal vicinity, and high data volumes that need to be handled efficiently. Building on previously developed basic machine learning tools for remotely sensed data in the form of Earth System Data Cubes (ESDC), we will introduce and implement a new bucket sampling strategy that accounts for these special characteristics of geospatial data. We will develop and provide open-source software (a wrapper).
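
To illustrate the general idea behind such a strategy (not the project's actual implementation), the sketch below groups samples into coarse spatio-temporal buckets and assigns whole buckets to either the training or the test split, so that strongly auto-correlated neighbours do not end up on both sides of the split; the grid resolutions and variable names are assumptions.

import numpy as np

def assign_buckets(lat, lon, time_days, lat_step=5.0, lon_step=5.0, time_step=30.0):
    """Map each sample to a coarse (lat, lon, time) bucket id."""
    lat_bin = np.floor(np.asarray(lat) / lat_step)
    lon_bin = np.floor(np.asarray(lon) / lon_step)
    t_bin = np.floor(np.asarray(time_days) / time_step)
    # Encode the three bin indices into a single hashable bucket id
    return [f"{a:.0f}_{b:.0f}_{c:.0f}" for a, b, c in zip(lat_bin, lon_bin, t_bin)]

def bucket_train_test_split(bucket_ids, test_fraction=0.2, seed=0):
    """Assign whole buckets (not individual samples) to train or test."""
    rng = np.random.default_rng(seed)
    unique = np.unique(bucket_ids)
    test_buckets = set(rng.choice(unique, size=int(len(unique) * test_fraction), replace=False))
    is_test = np.array([b in test_buckets for b in bucket_ids])
    return ~is_test, is_test  # boolean masks for train and test samples

Splitting by bucket rather than by individual sample is what reduces information leakage between training and test data caused by spatio-temporal auto-correlation.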

GitLab
Digitization of analogue records in time (DO IT!)

Domain: Water Research
Contact: Hannes Müller-Thomy, TU Braunschweig, LWI-Department of Hydrology and River Basin Management
Email: h.mueller-thomy@tu-braunschweig.de
Duration: 6 months

Thousands of years of analogue records exist in earth system sciences, but they remain a largely untapped treasure in the age of digitization. The digitization of these data needs manpower, but not necessarily manpower with a scientific background. Breaking the digitization down into an easily usable smartphone application enables the involvement of citizen scientists to overcome the manpower bottleneck. This app can be applied to all kinds of analogue data and is hence useful for numerous scientific fields.

GitLab
Steam ScrAIber

Domain: Mineralogy, Petrology and Geochemistry
Contact: Artem Leichter, Institute of Mineralogy, Leibniz University Hannover
Email: leichter@ikg.uni-hannover.de
Cooperators: Renat Almeev and Francois Holtz, Institute of Mineralogy, Leibniz University Hannover
Duration: 6 months

A central obstacle to the widespread use of new methods (e.g., machine learning and artificial intelligence) is of a purely practical nature: the lack of practical know-how and experience in the use of the relevant programming languages, frameworks and libraries. The programming landscape is so diverse that even among computer scientists, knowledge of different frameworks and programming languages is unevenly distributed, depending on the tasks at hand. For specialists from other fields (e.g., Earth System Science), for whom programming is primarily a means to an end and domain-specific content must be prioritized, the corresponding programming know-how is even more sparsely distributed. To address this problem, we propose the use of Large Language Model (LLM) assistance systems. These systems have proven to be very successful in automatic code generation. Combined with a secure environment where the generated code can be executed directly, they allow users to become developers and significantly reduce the programming skills required.
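
As a rough illustration of the execution side of such an assistant (deliberately simplified; a production system would need real sandboxing, e.g. containers or restricted runtimes), the sketch below runs generated Python code in a separate interpreter process with a time limit and captures its output. All names and the example snippet are assumptions, not part of Steam ScrAIber.

import subprocess
import sys

def run_generated_code(code: str, timeout_s: float = 10.0) -> str:
    """Execute generated Python code in a separate interpreter process.

    Raises subprocess.TimeoutExpired if the code runs longer than timeout_s.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if result.returncode != 0:
        return "Error:\n" + result.stderr
    return result.stdout

# Example with a harmless, hypothetical snippet "generated" by an assistant
print(run_generated_code("print(sum(range(10)))"))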

GitLab
Prompt accessed GeoQAmap for Earth Data

Domain: Geophysics, Geodesy
Contact: Yu Feng, Chair of Cartography and Visual Analytics, Technical University of Munich
Email: y.feng@tum.de
Cooperators: Guohui Xiao and Liqiu Men, Norwegian University of Science and Technology and University of Bergen
Duration: 6 months

In Earth System Science (ESS), geodata presents notable challenges due to its diverse standards generated by different agencies or individuals, making it difficult to access and query the data in an integrated manner. This heterogeneity requires significant effort from researchers to access, integrate, and effectively utilize the data. Moreover, users, especially beginners, may often encounter difficulties when interacting with the data through SQL or SPARQL queries. To tackle this, the project proposes utilizing Virtual Knowledge Graphs (VKGs) and Large Language Models (LLMs) as a solution. By leveraging VKGs and LLMs, the project aims to develop a system that enables users to access ESS-related data through natural language queries, facilitating integrated access and reducing the complexity of querying certain geo-entities. The project's ultimate goal is to provide researchers in ESS with an efficient and user-friendly approach to accessing and exploring heterogeneous geodata, empowering them to conduct data-driven studies and gain valuable insights for ESS research.
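
A minimal sketch of the query path such a system might use: an LLM drafts a SPARQL query from a natural-language question, and the query is executed against a SPARQL endpoint via the SPARQLWrapper library. The endpoint URL, prompt wording, model name, and example question are assumptions for illustration, not the project's implementation.

from openai import OpenAI
from SPARQLWrapper import SPARQLWrapper, JSON

client = OpenAI()  # any LLM backend could be substituted

def question_to_sparql(question: str) -> str:
    """Ask the LLM to draft a SPARQL query for a natural-language question."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Write a single SPARQL SELECT query (no explanation) answering: " + question,
        }],
    )
    return response.choices[0].message.content

def run_query(endpoint_url: str, query: str):
    # Execute the drafted query against a (virtual) knowledge graph endpoint
    sparql = SPARQLWrapper(endpoint_url)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()

# Hypothetical usage
query = question_to_sparql("Which gauging stations on the Elbe report daily discharge?")
results = run_query("http://localhost:8080/sparql", query)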

GitLab
Web-based, user-driven climate impact data extraction

Domain: Water Research
Contact: Jochen Klar, Potsdam Institute for Climate Impact Research (Leibniz)
Email: jochen.klar@pik-potsdam.de
Duration: 6 months

The ISIMIP Repository holds the world's largest collection of global climate impact model data. However, both the format (NetCDF3) and the file sizes represent a major barrier for many users. We propose to build an innovative web-based service that allows users to extract, process and download subsets of the data. User-defined extraction, chained processing and data interaction through scripts and interactive Jupyter notebooks will greatly widen the user base. Users can initiate processing tasks in the cloud and download the resulting files in different formats. The code will be released as open-source software and, as the application is not tied to ISIMIP or the ISIMIP conventions, can be adapted for similar archives of NetCDF files.
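
For context, the kind of subsetting such a service would perform server-side can be expressed with the xarray library; the file name, variable name, and coordinate ranges below are assumptions for illustration, not part of the proposed service.

import xarray as xr

# Hypothetical ISIMIP-style NetCDF file with dimensions (time, lat, lon)
ds = xr.open_dataset("discharge_global_monthly.nc")

# Extract a regional, temporal subset of one variable
subset = ds["dis"].sel(
    lat=slice(55.0, 47.0),   # slice order must match the latitude axis direction
    lon=slice(5.0, 15.0),
    time=slice("1981-01-01", "2010-12-31"),
)

# Write the subset out in a more accessible format, e.g. CSV via a dataframe
subset.to_dataframe().to_csv("discharge_subset.csv")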

GitLab
