= Implementation Concept of the Online Mode = 

{{{
#!html
<div class="wiki-toc">
<h4>Implementation Concept</h4>
<ol>
<li><a href="GeneralImplementationConcept">General Concept</a></li>
<li>Online Mode</li>
<li><a href="ImplementationConceptOffline">Offline Mode</a></li>
</ol>
</div>
}}}

[[PageOutline(2-3,Contents of this page)]]

== Online mode: Attaching PDAF to a model ==

Here we describe the extensions of the model code for the online mode of PDAF. The online mode offers two implementation variants. The first one, called ''fully-parallel'', assumes that you have a sufficient number of processes when running the data assimilation program so that all ensemble states can be propagated concurrently. This parallelism allows for a simplified implementation. The second implementation variant, called ''flexible'', allows you to run the assimilation program such that a model task (a set of processes running one model integration) propagates several ensemble states successively. This implementation variant is a bit more complicated, because one has to ensure that the model can jump back in time.

If the data assimilation can be run with a sufficient number of processes to use the ''fully-parallel'' variant, we recommend using it. Here, we first focus on the ''fully-parallel'' variant. More information on the ''flexible'' variant is provided further below.

The assimilation system is built by adding subroutine calls to the general part of the model code. In these routines, one can define variables for PDAF, include the PDAF module with a `use` statement, and call the PDAF core subroutines. Usually, only single-line subroutine calls are inserted into the model code. As only minimal changes to the model code are required, we refer to this as "attaching" PDAF to the model.

The general concept is depicted in Figure 1. The left-hand side shows a typical abstract structure of a model. When the program is executed, the typical steps performed in the program are the following:
 1. The model is initialized. Arrays for the model fields are allocated and filled with initial fields. In this way, the model grid is built up virtually in the program.
 2. After the initialization, the time stepping loop is performed. Here the model fields are propagated through time.
 3. When the integration of the model fields is completed after a defined number of time steps, various post-processing operations can be performed. This usually includes file output, e.g. writing restart files. Afterwards, the program stops.
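
For orientation, the three steps above can be sketched as a toy program. This is a minimal, hypothetical Python stand-in (the function names and the decay dynamics are invented for illustration); a real model would typically be written in Fortran and hold its fields in modules.

{{{
#!python
# Toy stand-in for a generic model program (hypothetical names, not PDAF).


def initialize_model():
    """Step 1: allocate the model field and fill it with initial values."""
    return {"field": [1.0, 2.0, 3.0], "step": 0}


def time_step(model):
    """Advance the model field by one time step (toy decay dynamics)."""
    model["field"] = [0.9 * x for x in model["field"]]
    model["step"] += 1


def postprocess(model):
    """Step 3: e.g. write restart files; here we only return a summary."""
    return {"final_step": model["step"], "field": list(model["field"])}


def run_model(nsteps):
    model = initialize_model()   # 1. initialization
    for _ in range(nsteps):      # 2. time stepping loop
        time_step(model)
    return postprocess(model)    # 3. post-processing


result = run_model(nsteps=4)
}}}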

[[Image(//pics/DAextension_PDAF3.png)]]
[[BR]]'''Figure 1:''' (left) Generic structure of a model code, (right) modified structure for the ''fully-parallel'' data assimilation system with PDAF. The figure assumes that the model is parallelized, such that it initializes its parallelization in the step `initial parallelization`. If the model is not parallelized, this step does not exist.


'''Extensions for the fully-parallel assimilation system'''[[BR]]
The right-hand side of Figure 1 shows the extensions required for the ''fully-parallel'' assimilation system (marked yellow):
 * **init_parallel_pdaf**: This subroutine is inserted close to the start of the model code. If the model itself is parallelized, the correct location is directly after the initialization of the parallelization in the model code. `init_parallel_pdaf` creates the parallel environment that allows the time stepping of several model instances ("model tasks") to be computed at the same time.
 * **init_pdaf**: This subroutine is added after the initialization part of the model, just before the time stepping loop. In this subroutine, one defines parameters for PDAF and then calls the core initialization routine `PDAF_init`. This core routine also initializes the array of ensemble states using a user-provided call-back routine. Subsequently, the PDAF-core routine `PDAF_init_forecast` is called (in implementations for PDAF versions before 3.0, this routine was called `PDAF_get_state`). This routine initializes the model fields from the array of ensemble states using a call-back routine. In addition, it returns the number of time steps that have to be computed in the following forecast phase.
 * **assimilate_pdaf**: This routine is added to the model code at the end of the time stepping loop (usually just before the ''END DO'' in a Fortran program). The routine declares the names of user-supplied subroutines and calls the PDAF-core routine `PDAF3_assimilate`. (In implementations for PDAF versions before 3.0, method-specific routines named `PDAFomi_assimilate_X` were used instead, e.g. `PDAFomi_assimilate_local` for local filters.) This routine has to be called at the end of each time step. It counts whether all time steps of the current forecast phase have been computed. If this is not the case, the program continues integrating the model. If the forecast phase is completed, the analysis step (i.e. the actual assimilation of the observations) is computed. Subsequently, the next forecast phase is initialized by writing the analysis state vector into the model fields and setting the number of time steps in the next forecast phase.
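
The control flow described by these extensions can be illustrated with a single-process simulation. The sketch below is hypothetical: `ToyAssimilation`, `init_forecast`, `assimilate` and the nudging "analysis" are invented stand-ins for the roles of `init_pdaf`, `PDAF_init_forecast` and `assimilate_pdaf`, not PDAF's Fortran interface. The model tasks, which in a real run are separate MPI process groups, are simulated here by the entries of a list. The key point is that the model's time stepping loop stays unchanged and only gains one call at the end of each step.

{{{
#!python
# Single-process sketch of the fully-parallel control flow (hypothetical names).
# Each entry of `fields` stands for the model fields of one model task.


def time_step(member):
    """Toy model dynamics for one ensemble member."""
    return [0.9 * x for x in member]


def toy_analysis(ensemble, obs, weight=0.5):
    """Toy analysis: nudge each member toward a scalar observation."""
    return [[x + weight * (obs - x) for x in member] for member in ensemble]


class ToyAssimilation:
    """Bookkeeping that the text above attributes to the PDAF core routines."""

    def __init__(self, ensemble, nsteps_forecast, observations):
        self.ensemble = [list(m) for m in ensemble]  # array of ensemble states
        self.nsteps_forecast = nsteps_forecast
        self.observations = list(observations)       # one value per analysis
        self.steps_done = 0
        self.n_analyses = 0

    def init_forecast(self):
        """Role of init_forecast: fill model fields, return forecast length."""
        return [list(m) for m in self.ensemble], self.nsteps_forecast

    def assimilate(self, fields):
        """Role of the assimilate call made at the end of every time step."""
        self.steps_done += 1
        if self.steps_done < self.nsteps_forecast:
            return fields                          # forecast phase continues
        obs = self.observations[self.n_analyses]
        self.ensemble = toy_analysis(fields, obs)  # analysis step
        self.n_analyses += 1
        self.steps_done = 0                        # start next forecast phase
        return [list(m) for m in self.ensemble]    # analysis -> model fields


def run_fully_parallel(ensemble, nsteps_forecast, observations, total_steps):
    da = ToyAssimilation(ensemble, nsteps_forecast, observations)  # init_pdaf
    fields, _ = da.init_forecast()
    for _ in range(total_steps):                 # unchanged model time loop
        fields = [time_step(m) for m in fields]  # each task advances its member
        fields = da.assimilate(fields)           # added at end of each step
    return da, fields


da, fields = run_fully_parallel([[1.0], [3.0]], nsteps_forecast=2,
                                observations=[0.0, 0.0], total_steps=4)
}}}

With two members, forecasts of two time steps and four model time steps in total, this run computes two analysis steps.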

With the implementation strategy of PDAF, calls to four routines are added to the model code. These are usually only single lines of code and the changes only affect the general part of the model code.  

== Important aspects of the implementation concept ==

 * The implementation concept of PDAF attaches the data assimilation functionality to the model. With this approach, the data assimilation program is run analogously to a normal model run, but with additional processors and additional options for the data assimilation.
 * The implementation with PDAF does not require that the time stepping part of the model is implemented as a subroutine. Instead, calls to subroutines that control the ensemble integration are added to the model code before and after the code parts performing the time stepping. This minimizes the changes to the model code.
 * In the ''fully-parallel'' mode described here, we use as many model tasks as ensemble members. Thus, the model always moves forward in time.
 * PDAF follows the concept of 'pulling' information at the time when it is needed. All model-specific operations, like the initialization of the array of ensemble states in `PDAF_init`, are performed by user-supplied routines, which are called through the standard interface of PDAF as call-back routines. Details on the interface and the required routines are given on the pages describing the implementation steps. The concept of the call-back routines is depicted in Figure 2.
 * The assimilation system is controlled by the user-supplied routines that are called through PDAF as call-back routines. With this strategy, the assimilative model program is essentially driven by the model part of the program. Thus, the model is not a sub-component of the assimilation system; instead, the implementation with PDAF results in a model extended for data assimilation.
 * The user-supplied call-back routines can be implemented in the context of the model, analogously to the model code. For example, if the model is implemented using Fortran modules (or even common blocks), these can be used to implement the user-supplied routines, too. This simplifies the implementation of the user-supplied routines for users who know the particularities of their model.
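
The call-back ('pull') concept can be sketched in a few lines. In this hypothetical Python example, the dictionary `model_fields` plays the role of model data held in a Fortran module. `collect_state` and `distribute_state` are user-supplied call-backs, and `core_update` stands in for a core routine that never touches model data directly. All names and the toy update are illustrative assumptions, not PDAF's actual interface.

{{{
#!python
# Sketch of the call-back ("pull") concept with hypothetical names.
# `model_fields` stands in for model data held in a Fortran module.

model_fields = {"temp": [10.0, 11.0], "salt": [35.0, 34.5]}


def collect_state(state_dim):
    """User-supplied call-back: copy model fields into a state vector."""
    state = model_fields["temp"] + model_fields["salt"]
    if len(state) != state_dim:
        raise ValueError("state dimension does not match the model fields")
    return state


def distribute_state(state):
    """User-supplied call-back: write a state vector into the model fields."""
    n = len(model_fields["temp"])
    model_fields["temp"] = state[:n]
    model_fields["salt"] = state[n:]


def core_update(state_dim, collect, distribute, increment):
    """Stand-in for a core routine: it only calls the call-backs it was given,
    so it needs no knowledge of how the model stores its fields."""
    state = collect(state_dim)               # pull information when needed
    state = [x + increment for x in state]   # toy update instead of a filter
    distribute(state)                        # hand the result back to the model
    return state


updated = core_update(4, collect_state, distribute_state, 1.0)
}}}

Because the call-backs live next to the model data, only they need to know the layout of the model fields. The core routine works purely with state vectors.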

[[Image(//pics/PDAF_callback_online_v3.png)]]
[[BR]]'''Figure 2:''' Use of user-supplied call-back functions in PDAF. The call-back routines are called by PDAF and can use model information that is provided by Fortran modules.




== Parallelization of the data assimilation program ==

PDAF adds the possibility to perform parallel ensemble forecasts, even for models that do not use parallelization by themselves. The structure of the parallelized data assimilation program is displayed in Figure 3. In the forecast phase of the data assimilation application, several model state integrations can be performed at the same time by several model tasks. If the numerical model is itself parallelized, the parallel ensemble forecast adds a second level of parallelization. For the analysis step, in which the filter combines the ensemble of model states with the observations, PDAF provides several parallelized filter algorithms. If the model uses domain decomposition for the parallelization, the same decomposition is typically used in the filter. Before the analysis step, all ensemble members are gathered by the processes that compute the filter analysis. After the analysis step, the ensemble members are distributed to all model tasks to enable the next parallel ensemble forecast. These operations are performed within PDAF, so that a user can directly benefit from the second level of parallelization. For the required extension of the parallelization configuration of the model, a fully implemented template routine is provided with PDAF. The adaptation of the parallelization is described in the [ImplementationGuide Implementation Guide].
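
To illustrate the two levels of parallelization, the following hypothetical sketch assigns each process of the whole program to a model task and to a rank within that task, using a simple blocked layout. With MPI, such a split would typically be realized with `MPI_Comm_split`, where `task_id` acts as the color. The layout and the assumption that the processes of the first model task also compute the analysis are illustrative choices only. They are not a description of the template routine shipped with PDAF.

{{{
#!python
def split_into_model_tasks(npes_world, n_modeltasks):
    """Hypothetical blocked layout: return (rank, task_id, rank_in_task,
    is_filter_pe) for every process of the whole program."""
    if npes_world % n_modeltasks != 0:
        raise ValueError("this sketch assumes equally sized model tasks")
    npes_per_task = npes_world // n_modeltasks
    layout = []
    for rank in range(npes_world):
        task_id = rank // npes_per_task       # plays the role of an MPI 'color'
        rank_in_task = rank % npes_per_task   # rank (subdomain) inside the task
        is_filter_pe = task_id == 0           # assumption: task 0 runs the filter
        layout.append((rank, task_id, rank_in_task, is_filter_pe))
    return layout


# 8 processes, 4 model tasks with 2 processes (subdomains) each.
layout = split_into_model_tasks(npes_world=8, n_modeltasks=4)
}}}

Here, process 5 belongs to model task 2 and computes the second subdomain of that task's model integration, while processes 0 and 1 would additionally compute the analysis.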

[[Image(//pics/parallelization.png)]]
[[BR]]'''Figure 3:''' Two-level parallelization of PDAF: During the forecast phase, several model tasks can be performed concurrently, while each model can be parallelized by itself. In the analysis step, one of the parallelized filters included in PDAF is applied.


== The flexible parallelization mode ==

The ''flexible'' parallelization mode allows you to run the assimilation program such that a model task (a set of processes running one model integration) can propagate several ensemble states successively. This approach allows using a smaller number of processes compared to the ''fully-parallel'' mode.

Implementing the ''flexible'' mode requires additional changes to the model code. These are shown in Figure 4. In particular, an external loop has to be added. This change only affects the general part of the model code. In contrast to the newer implementation variant, the ''flexible'' mode uses a combination of calls to `PDAF_get_state` and `put_state_pdaf` (which calls `PDAFomi_put_state_X` or `PDAF_put_state_X` for a specific DA method 'X'). Here, `put_state_pdaf` is called after the full integration of an ensemble member state. The called routine `PDAFomi_put_state_X` then counts the number of ensemble members for which the forecast is complete. If all members have been integrated, the analysis step is executed to compute the assimilation update. This structure does not allow additional operations to be performed during the time stepping, like applying incremental analysis updates. In contrast, the [wiki:ImplementationConceptOnline recommended implementation introduced with PDAF V3.0] performs a call to PDAF at each time step.

The ''flexible'' parallelization requires that the model can jump back in time. Jumping back in time is required if the number of model tasks used to evolve the ensemble states is smaller than the number of ensemble members. In this case, a model task has to integrate more than one model state and has to jump back in time after the integration of each ensemble member.
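
The need to jump back in time can be made concrete with a small hypothetical sketch. `members_per_task` spreads the ensemble as evenly as possible over fewer model tasks. This is only one possible distribution and not necessarily the one PDAF uses. `integrate_successively` shows one model task that resets its model time to the start of the forecast before integrating each of its members.

{{{
#!python
def members_per_task(dim_ens, n_modeltasks):
    """Hypothetical distribution: spread the members as evenly as possible."""
    base, extra = divmod(dim_ens, n_modeltasks)
    return [base + (1 if task < extra else 0) for task in range(n_modeltasks)]


def integrate_successively(members, t_start, nsteps, dt):
    """One model task integrates several members one after another.
    Before each member, the model time is reset ('jump back') to t_start."""
    results = []
    for state in members:
        t = t_start                  # jump back to the start of the forecast
        for _ in range(nsteps):
            state = 0.9 * state      # toy model dynamics
            t += dt
        results.append((t, state))
    return results


counts = members_per_task(dim_ens=10, n_modeltasks=4)
results = integrate_successively([1.0, 2.0], t_start=0.0, nsteps=3, dt=0.5)
}}}

With 10 members and 4 model tasks, two tasks integrate 3 members and two tasks integrate 2 members. Every member's forecast ends at the same model time.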

[[Image(//pics/DAextension_flexible_PDAF2.png)]]
[[BR]]'''Figure 4:'''  Extension for ''flexible'' data assimilation system.

'''Extensions for the flexible assimilation system'''[[BR]]
Figure 4 shows the extensions required for the ''flexible'' assimilation system (marked yellow):
 * `init_parallel_pdaf`: This routine is inserted close to the start of the model code. If the model itself is parallelized, the correct location is directly after the initialization of the parallelization in the model code. `init_parallel_pdaf` creates the parallel environment that allows several time stepping loops ("model tasks") to be performed at the same time.
 * `init_pdaf`: This routine is added after the initialization part of the model. In `init_pdaf`, parameters for PDAF can be defined and then the core initialization routine `PDAF_init` is called. This core routine also initializes the array of ensemble states.
 * Ensemble loop: In order to allow for the integration of the state ensemble, an unconditional loop is added around the time stepping loop of the model. This loop allows the time stepping loop to be computed multiple times to integrate all ensemble states. PDAF provides an exit flag for this loop. (This external loop is avoided with the ''fully-parallel'' implementation variant.)
 * `get_state_pdaf`: Inside the ensemble loop, a call to this interface routine is added to the code. In this routine, the names of user-supplied routines are declared and the PDAF-core routine `PDAF_get_state` is called. This routine initializes the model fields from the array of ensemble states, sets the number of time steps that have to be computed, and ensures that the ensemble integration is performed correctly.
 * `put_state_pdaf`: At the end of the external loop, the call to the interface routine `put_state_pdaf` is added to the model code. The routine declares the names of user-supplied routines and calls a PDAF-core routine that is specific for each filter, e.g. `PDAF_put_state_estkf` for the ESTKF. This routine writes the propagated model fields back into a state vector of the ensemble array. It also checks whether the ensemble integration is complete. If not, the next ensemble member is integrated. If the ensemble integration is complete, the analysis step (i.e. the actual assimilation of the observations) is computed.
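
The structure of the ''flexible'' variant, with the unconditional ensemble loop, the exit flag and the get/put pair, can be sketched as a single-process simulation with one model task. `ToyFlexibleDA`, its `get_state`/`put_state` methods and the nudging analysis are hypothetical stand-ins for the roles of `get_state_pdaf` and `put_state_pdaf`, not PDAF's interface. Model time is not tracked here: because each member starts from its stored state, the jump back in time is implicit.

{{{
#!python
class ToyFlexibleDA:
    """Hypothetical bookkeeping for the flexible mode with one model task."""

    def __init__(self, ensemble, nsteps, n_cycles, obs=0.0):
        self.ensemble = list(ensemble)   # array of ensemble states (scalars)
        self.nsteps = nsteps
        self.cycles_left = n_cycles
        self.obs = obs
        self.member = 0                  # index of the next member to integrate
        self.n_analyses = 0

    def get_state(self):
        """Return (model field, number of time steps, exit flag)."""
        if self.cycles_left == 0:
            return None, 0, True
        return self.ensemble[self.member], self.nsteps, False

    def put_state(self, field):
        """Store a forecast member; analyse once all members are done."""
        self.ensemble[self.member] = field
        self.member += 1
        if self.member == len(self.ensemble):       # ensemble forecast complete
            self.ensemble = [x + 0.5 * (self.obs - x) for x in self.ensemble]
            self.n_analyses += 1
            self.cycles_left -= 1
            self.member = 0


def run_flexible(da):
    while True:                               # unconditional ensemble loop
        field, nsteps, doexit = da.get_state()
        if doexit:
            break
        for _ in range(nsteps):               # original model time stepping
            field = 0.9 * field
        da.put_state(field)
    return da


da = run_flexible(ToyFlexibleDA([1.0, 2.0, 3.0], nsteps=2, n_cycles=2))
}}}

Compared with the fully-parallel sketch, the analysis here is triggered by counting completed members rather than time steps. For this reason, no operations can be inserted between individual time steps across the ensemble.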