Changes between Version 13 and Version 14 of ImplementationConceptOnline


Ignore:
Timestamp:
May 17, 2025, 12:31:25 PM (19 hours ago)
Author:
lnerger
Comment:

--

  • ImplementationConceptOnline

    v13 v14  
    1919Here we describe the extensions of the model code for the online mode of PDAF. The online mode offers two implementation variants. The first one, called ''fully-parallel'', assumes that you have a sufficient number of processes when running the data assimilation program so that all ensemble states can be propagated concurrently. The parallelism allows for a simplified implementation. The second implementation variant, called ''flexible'', allows the assimilation program to be run such that a model task (a set of processors running one model integration) can propagate several ensemble states successively. This implementation variant is a bit more complicated, because one has to ensure that the model can jump back in time.
    2020
    21 If the data assimilation can be run with a sufficient number of processors to use the 'fully-parallel' variant, we recommend to use it. With PDAF Version 1.10 particular interface routines `PDAF_assimilate_X` have been introduced to support this implementation style.
     21If the data assimilation can be run with a sufficient number of processors to use the ''fully-parallel'' variant, we recommend using it. Here, we will first focus on the ''fully-parallel'' variant. More information on the ''flexible'' variant is provided further below.
    2222
    23 The assimilation system is built by adding calls to PDAF-routines to the general part of the model code. As only minimal changes to the model code are required, we refer to this as "attaching" PDAF to the model.
     23The assimilation system is built by adding subroutine calls to the general part of the model code. In these routines one can define variables for PDAF, use the PDAF module, and call the PDAF core subroutines. Usually, only single-line subroutine calls are inserted into the model code. As only minimal changes to the model code are required, we refer to this as "attaching" PDAF to the model.
    2424
    25 The general concept is depicted in figure 1. The left hand side shows a typical abstract structure of a numerical model. When the program is executed, the following steps are performed:
     25The general concept is depicted in figure 1. The left hand side shows a typical abstract structure of a model. When the program is executed, the typical steps done in the program are the following:
    2626 1. The model is initialized: arrays for the model fields are allocated and filled with initial fields. Thus, the model grid is built up virtually in the program.
    2727 2. After the initialization the time stepping loop is performed. Here the model fields are propagated through time.
    28  3. When the integration of the model fields is completed after a defined number of time steps, various post-processing operations are performed. Then the program stops.
     28 3. When the integration of the model fields is completed after a defined number of time steps, various post-processing operations can be performed. This usually includes file output, e.g. writing restart files. Afterwards, the program stops.
    2929
    30 [[Image(//pics/da_extension2x.png)]]
    31 [[BR]]'''Figure 1:''' (left) Generic structure of a model code, (center) extension for ''fully-parallel'' data assimilation system with PDAF, (right) extension for ''flexible'' data assimilation system with PDAF.
     30[[Image(//pics/DAextension_PDAF3.png)]]
     31[[BR]]'''Figure 1:''' (left) Generic structure of a model code, (right) modified structure for ''fully-parallel'' data assimilation system with PDAF. The figure assumes that the model is parallelized, such that it initializes its parallelization in the step `initial parallelization`. If the model is not parallelized, this step does not exist.
    3232
    3333
    3434'''Extensions for the fully-parallel assimilation system'''[[BR]]
    3535The center of Figure 1 shows the extensions required for the ''fully-parallel'' assimilation system (marked yellow):
    36  * `init_parallel_pdaf`: This routine is inserted close to the start of the model code. If the model itself is parallelized the correct location is directly after the initialization of the parallelization in the model code. `init_parallel_pdaf` creates the parallel environment that allows several time stepping loops ("model tasks") to be performed at the same time.
    37  * `init_pdaf`: This routine is added after the initialization part of the model. In `init_pdaf`, parameters for PDAF can be defined and then the core initialization routine `PDAF_init` is called. This core routine also initializes the array of ensemble states. Subsequently, the PDAF-core routine `PDAF_get_state` is called. This routine initializes model fields from the array of ensemble states, initializes the number of time steps that have to be computed, and ensures that the ensemble integration is performed correctly.
    38  * `assimilate_pdaf`: This routine is added to the model code at the end of the time stepping loop (just before the ''END DO'' in a Fortran program). The routine declares the names of user-supplied routines and calls a filter-specific PDAF-core routine `PDAF_assimilate_X` (with, e.g., X=estkf, for the ESTKF). This routine has to be called at the end of each time step. It counts whether all time steps of the current forecast phase have been computed. If this is not the case, the program continues integrating the model. If the forecast phase is completed, the analysis step (i.e. the actual assimilation of the observations) is computed. Subsequently, the next forecast phase is initialized by writing the analysis state vector into the model fields and setting the number of time steps in the next forecast phase. (Please note: The routines `PDAF_assimilate_X` have been introduced with Version 1.10 of PDAF.)
     36 * **init_parallel_pdaf**: This subroutine is inserted close to the start of the model code. If the model itself is parallelized, the correct location is directly after the initialization of the parallelization in the model code. `init_parallel_pdaf` creates the parallel environment that allows the time stepping of several models ("model tasks") to be computed at the same time.
     37 * **init_pdaf**: This subroutine is added after the initialization part of the model, just before the time stepping loop. In this subroutine, one defines parameters for PDAF and then calls the core initialization routine `PDAF_init`. This core routine also initializes the array of ensemble states using a user-provided call-back routine. Subsequently, the PDAF-core routine `PDAF_init_forecast` is called (in implementations of PDAF before version 3.0, this routine was called `PDAF_get_state`). This routine initializes model fields from the array of ensemble states using a call-back routine. In addition, it returns the number of time steps that have to be computed in the following forecast phase.
     38 * **assimilate_pdaf**: This routine is added to the model code at the end of the time stepping loop (usually just before the ''END DO'' in a Fortran program). The routine declares the names of user-supplied subroutines and calls the PDAF-core routine `PDAF3_assimilate` (in implementations of PDAF before version 3.0, this routine was called `PDAFomi_assimilate_X` with, e.g., X=`local`, for local filters). This routine has to be called at the end of each time step. It counts whether all time steps of the current forecast phase have been computed. If this is not the case, the program continues integrating the model. If the forecast phase is completed, the analysis step (i.e. the actual assimilation of the observations) is computed. Subsequently, the next forecast phase is initialized by writing the analysis state vector into the model fields and setting the number of time steps in the next forecast phase.
    3939
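
The call order described above for the ''fully-parallel'' variant can be illustrated with a small toy mock-up. This is only a sketch of the control flow, written in Python for brevity: the class `ToyAssimilation`, the functions `model_step` and `run_model`, the parameter `steps_per_forecast`, and the placeholder analysis are all assumptions made for illustration and are not part of the PDAF API. A real implementation would call the PDAF routines from the Fortran model code.

```python
# Toy mock-up of the fully-parallel call order; none of this is the PDAF API.

class ToyAssimilation:
    """Stand-in for the step counting that the text attributes to PDAF."""

    def __init__(self, steps_per_forecast):
        self.steps_per_forecast = steps_per_forecast  # hypothetical parameter
        self.steps_done = 0
        self.analyses = 0

    def assimilate(self, state):
        """Called once per time step, in the place of assimilate_pdaf."""
        self.steps_done += 1
        if self.steps_done < self.steps_per_forecast:
            return state      # forecast phase not finished: keep integrating
        self.steps_done = 0   # forecast phase complete: start the next one
        self.analyses += 1
        return self.analysis(state)

    @staticmethod
    def analysis(state):
        # Placeholder "analysis": pull the state halfway towards a fixed
        # observation of 0.0. A real filter would use the whole ensemble.
        return 0.5 * state


def model_step(state):
    """Hypothetical model dynamics: advance the state by one time step."""
    return state + 1.0


def run_model(total_steps, steps_per_forecast):
    state = 0.0                               # model initialization
    da = ToyAssimilation(steps_per_forecast)  # plays the role of init_pdaf
    for _ in range(total_steps):              # time stepping loop
        state = model_step(state)
        state = da.assimilate(state)          # last call before the loop ends
    return state, da.analyses
```

The point of the sketch is that the time stepping loop of the model stays unchanged apart from the single call at its end; the decision whether to continue the forecast or to compute an analysis is taken inside that call.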
    40 
    41 '''Extensions for the flexible assimilation system'''[[BR]]
    42 The right hand side of Figure 1 shows the extensions required for the ''flexible'' assimilation system (marked yellow):
    43  * `init_parallel_pdaf`: This routine is inserted close to the start of the model code. If the model itself is parallelized the correct location is directly after the initialization of the parallelization in the model code. `init_parallel_pdaf` creates the parallel environment that allows several time stepping loops ("model tasks") to be performed at the same time.
    44  * `init_pdaf`: This routine is added after the initialization part of the model. In `init_pdaf`, parameters for PDAF can be defined and then the core initialization routine `PDAF_init` is called. This core routine also initializes the array of ensemble states.
    45  * Ensemble loop: In order to allow for the integration of the state ensemble an unconditional loop is added around the time stepping loop of the model. This loop allows the time stepping loop to be computed multiple times during the model integration. PDAF provides an exit-flag for this loop. (This external loop can be avoided with the ''fully-parallel'' implementation variant.)
    46  * `get_state_pdaf`: Inside the ensemble loop, a call to the interface routine is added to the code. In this routine the names of user-supplied routines are declared and the PDAF-core routine `PDAF_get_state` is called. This routine initializes model fields from the array of ensemble states, initializes the number of time steps that have to be computed, and ensures that the ensemble integration is performed correctly.
    47  * `put_state_pdaf`: At the end of the external loop, the call to the interface routine `put_state_pdaf` is added to the model code. The routine declares the names of user-supplied routines and calls a PDAF-core routine that is specific for each filter. E.g. for the ESTKF, the routine `PDAF_put_state_estkf` is called. This routine writes the propagated model fields back into a state vector of the ensemble array. It also checks whether the ensemble integration is complete. If not, the next ensemble member will be integrated. If the ensemble integration is complete, the analysis step (i.e. the actual assimilation of the observations) is computed.
    4840
    4941With the implementation strategy of PDAF, calls to three to four routines have to be added to the model code. In case of the flexible implementation variant also the external loop has to be added. While this looks like a large change in figure 1, it actually affects only the general part of the model code. In addition, the source code of the numerical model will be much longer than the additions for the data assimilation system. Please note that the calls to the routines `PDAF_get_state`, `PDAF_put_state_X`, `PDAF_assimilate_X` (X chosen according to the filter) could also be added directly in the model code. However, using the interface routines (`get_state_pdaf` and `put_state_pdaf`) reduces the additions to the model code.
     
    6961[[Image(//pics/parallelization.png)]]
    7062[[BR]]'''Figure 3:''' Two-level parallelization of PDAF: During the forecast phase several model tasks can be performed concurrently, while each model task can itself be parallelized. In the analysis step, the parallelized filter included in PDAF is applied.
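
The two levels in Figure 3 can be made concrete with a small sketch that groups process ranks into model tasks. It assumes, purely for illustration, that the processes are split into equally sized contiguous blocks; the function `split_ranks` and its arguments are hypothetical and do not describe how PDAF actually assigns processes.

```python
# Toy illustration of two-level parallelization; not the PDAF implementation.

def split_ranks(n_procs, n_tasks):
    """Group process ranks 0..n_procs-1 into n_tasks model tasks.

    Assumption for this sketch: every model task gets the same number of
    processes, assigned as contiguous blocks of ranks.
    """
    if n_tasks <= 0 or n_procs % n_tasks != 0:
        raise ValueError("n_procs must be a positive multiple of n_tasks")
    per_task = n_procs // n_tasks
    return [list(range(t * per_task, (t + 1) * per_task))
            for t in range(n_tasks)]
```

With six processes and three model tasks, each task would integrate one ensemble state on two processes, so three states are propagated concurrently.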
     63
     64
     65== The flexible parallelization variant ==
     66
     67
     68[[Image(//pics/da_extension2x.png)]]
     69[[BR]]'''Figure 1:''' (left) Generic structure of a model code, (center) extension for ''fully-parallel'' data assimilation system with PDAF, (right) extension for ''flexible'' data assimilation system with PDAF.
     70
     71'''Extensions for the flexible assimilation system'''[[BR]]
     72The right hand side of Figure 1 shows the extensions required for the ''flexible'' assimilation system (marked yellow):
     73 * `init_parallel_pdaf`: This routine is inserted close to the start of the model code. If the model itself is parallelized, the correct location is directly after the initialization of the parallelization in the model code. `init_parallel_pdaf` creates the parallel environment that allows several time stepping loops ("model tasks") to be performed at the same time.
     74 * `init_pdaf`: This routine is added after the initialization part of the model. In `init_pdaf`, parameters for PDAF can be defined and then the core initialization routine `PDAF_init` is called. This core routine also initializes the array of ensemble states.
     75 * Ensemble loop: In order to allow for the integration of the state ensemble, an unconditional loop is added around the time stepping loop of the model. This loop allows the time stepping loop to be computed multiple times during the model integration. PDAF provides an exit-flag for this loop. (This external loop can be avoided with the ''fully-parallel'' implementation variant.)
     76 * `get_state_pdaf`: Inside the ensemble loop, a call to the interface routine is added to the code. In this routine the names of user-supplied routines are declared and the PDAF-core routine `PDAF_get_state` is called. This routine initializes model fields from the array of ensemble states, initializes the number of time steps that have to be computed, and ensures that the ensemble integration is performed correctly.
     77 * `put_state_pdaf`: At the end of the external loop, the call to the interface routine `put_state_pdaf` is added to the model code. The routine declares the names of user-supplied routines and calls a PDAF-core routine that is specific for each filter. E.g. for the ESTKF, the routine `PDAF_put_state_estkf` is called. This routine writes the propagated model fields back into a state vector of the ensemble array. It also checks whether the ensemble integration is complete. If not, the next ensemble member will be integrated. If the ensemble integration is complete, the analysis step (i.e. the actual assimilation of the observations) is computed.
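
The interplay of the ensemble loop, `get_state_pdaf` and `put_state_pdaf` can be illustrated with a toy mock-up in which a single model task propagates all ensemble members one after another. This is only a sketch of the control flow, written in Python for brevity: the class `ToyEnsemble`, the function `run_flexible`, the parameters `steps_per_forecast` and `n_cycles`, and the placeholder analysis are assumptions made for illustration and are not part of the PDAF API.

```python
# Toy mock-up of the flexible call order; none of this is the PDAF API.

class ToyEnsemble:
    """Stand-in for the ensemble array and the bookkeeping around it."""

    def __init__(self, members, steps_per_forecast, n_cycles):
        self.members = list(members)
        self.steps_per_forecast = steps_per_forecast  # hypothetical parameter
        self.cycles_left = n_cycles                   # hypothetical parameter
        self.current = 0
        self.analyses = 0

    def get_state(self):
        """Return (exit_flag, n_steps, state) for the next integration."""
        if self.cycles_left == 0:
            return True, 0, None
        return False, self.steps_per_forecast, self.members[self.current]

    def put_state(self, state):
        """Store a propagated state; run the analysis after the last member."""
        self.members[self.current] = state
        self.current += 1
        if self.current == len(self.members):
            mean = sum(self.members) / len(self.members)
            # Placeholder "analysis": halve each member's distance to the mean.
            self.members = [mean + 0.5 * (m - mean) for m in self.members]
            self.current = 0
            self.cycles_left -= 1
            self.analyses += 1


def run_flexible(members, steps_per_forecast, n_cycles):
    ens = ToyEnsemble(members, steps_per_forecast, n_cycles)
    while True:                       # ensemble loop around the time stepping
        exit_flag, n_steps, state = ens.get_state()
        if exit_flag:                 # exit-flag ends the ensemble loop
            break
        for _ in range(n_steps):      # time stepping loop, restarted per member
            state = state + 1.0       # hypothetical model dynamics
        ens.put_state(state)
    return ens.members, ens.analyses
```

Because the time stepping loop is re-entered for every member, the model has to restart from the beginning of the forecast phase each time. This is the "jump back in time" that makes the ''flexible'' variant more involved than the ''fully-parallel'' one.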