Changes between Version 2 and Version 3 of ImplementationConceptOffline

Timestamp:: May 18, 2011, 2:30:28 PM (15 years ago)
Author:: lnerger
Comment:: --

ImplementationConceptOffline

-              v2
+              v3
 In the forecast phase, the user has to run the numerical model as many times as there are ensemble members. Each model forecast has to be initialized with the state fields of one ensemble member. At the end of each forecast integration, the forecast fields are written into the regular output files of the model. Within the offline mode of PDAF, we leave it to the user to control the integrations. As these are regular model integrations, it should be easiest to use the same scripts that are used for a free model integration without data assimilation. However, the user has to take care that the output files from each ensemble member are stored separately.
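The forecast phase described above can be sketched as follows. This is only an illustration under stated assumptions: `run_member_forecast` is a hypothetical toy stand-in for a regular model run, and the directory layout `member_001/forecast.txt` is an assumed naming scheme, not something prescribed by PDAF.

```python
import tempfile
from pathlib import Path


def run_member_forecast(initial_state, n_steps):
    """Hypothetical toy model standing in for one regular model integration."""
    state = list(initial_state)
    for _ in range(n_steps):
        state = [0.5 * x + 1.0 for x in state]
    return state


def run_forecast_phase(ensemble, n_steps, output_root):
    """Run one forecast per ensemble member and store each output separately."""
    output_files = []
    for member_index, initial_state in enumerate(ensemble, start=1):
        forecast = run_member_forecast(initial_state, n_steps)
        # One directory per member keeps the output files apart (assumed layout)
        member_dir = Path(output_root) / f"member_{member_index:03d}"
        member_dir.mkdir(parents=True, exist_ok=True)
        out_file = member_dir / "forecast.txt"
        out_file.write_text("\n".join(repr(x) for x in forecast) + "\n")
        output_files.append(out_file)
    return output_files


if __name__ == "__main__":
    ensemble = [[0.0, 1.0], [0.5, 1.5], [1.0, 2.0]]
    with tempfile.TemporaryDirectory() as root:
        for path in run_forecast_phase(ensemble, n_steps=2, output_root=root):
            print(path.relative_to(root))
```

In practice the loop body would call the existing run script of the model; the only point of the sketch is that each member gets its own initial state and its own output location.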
 The assimilation program is a simplified implementation of what is required for the online mode. In particular no explicit linkage to the model code and no forecast phase are required. The general structure of the program for the offline assimilation program is depicted in figure 1.
+The assimilation program is a simplified implementation of what is required when PDAF is used in its online mode. In particular, no explicit linkage to the model code and no forecast phase are required. The general structure of the offline assimilation program is depicted in Figure 1.
 [[Image(//pics/da_extension.png)]]
 [[BR]]'''Figure 1:''' (left) Generic structure of a model code, (right) extension for data assimilation with PDAF
+[[Image(//pics/PDAF_offline.png)]]
+[[BR]]'''Figure 1:''' Structure of the assimilation program with PDAF in offline mode.
+The right hand side of Figure 1 shows the extensions required for the assimilation system (marked yellow):
+ * Close to the start of the model code, the routine `init_parallel_pdaf` is added to the code. If the model itself is parallelized, the correct location is directly after the initialization of the parallelization in the model code. `init_parallel_pdaf` creates the parallel environment that allows several time stepping loops to be performed at the same time.
+ * After the initialization part of the model, a routine `init_pdaf` is added. In this routine, parameters for PDAF can be defined and then the core initialization routine `PDAF_init` is called. This core routine also initializes the array of ensemble states.
+ * In order to allow for the integration of the state ensemble, an unconditional loop is added around the time stepping loop of the model. This allows the time stepping loop to be computed multiple times during the model integration. PDAF provides an exit flag for this loop. (There are some conditions under which this external loop is not required. Some notes on this are given further below.)
+ * Inside the external loop, the PDAF core routine `PDAF_get_state` is added to the code. This routine initializes the model fields from the array of ensemble states, initializes the number of time steps that have to be computed, and ensures that the ensemble integration is performed correctly.
+ * At the end of the external loop, the PDAF core routine `PDAF_put_state` is added to the model code. This routine writes the propagated model fields back into a state vector of the ensemble array. It also checks whether the ensemble integration is complete. If not, the next ensemble member will be integrated. If the ensemble integration is complete, the analysis step (i.e. the actual assimilation of the observations) is computed.
+With the implementation strategy of PDAF, four routines and the external loop have to be added to the model code. While this looks like a large change in Figure 1, it only affects the general part of the model code. In addition, the source code of the numerical model will be much longer than the additions for the data assimilation system.
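The control flow of the extended model code can be illustrated with a small sketch. It is a toy under stated assumptions: the class `EnsembleController` and its methods `get_state` and `put_state` are hypothetical stand-ins that only mimic the roles of `PDAF_get_state` and `PDAF_put_state`; they do not reproduce the argument lists of the PDAF routines, and the analysis step is an empty placeholder.

```python
class EnsembleController:
    """Toy stand-in for the ensemble bookkeeping (hypothetical, not PDAF's API)."""

    def __init__(self, ensemble, steps_per_forecast, n_cycles):
        self.ensemble = [list(member) for member in ensemble]
        self.steps_per_forecast = steps_per_forecast
        self.cycles_left = n_cycles
        self.member = 0
        self.analysis_count = 0

    def get_state(self):
        """Return (number of time steps, state to integrate, exit flag)."""
        if self.cycles_left == 0:
            return 0, None, True
        return self.steps_per_forecast, list(self.ensemble[self.member]), False

    def put_state(self, state):
        """Store a propagated state; run the analysis once all members are done."""
        self.ensemble[self.member] = list(state)
        self.member += 1
        if self.member == len(self.ensemble):
            self.analysis_step()
            self.member = 0
            self.cycles_left -= 1

    def analysis_step(self):
        # Placeholder: a real filter would assimilate observations here
        self.analysis_count += 1


def model_time_step(state):
    """Hypothetical toy model: one time step adds 1 to every state element."""
    return [x + 1.0 for x in state]


def run(controller):
    # The initialization routines would have been called before this point
    while True:  # external loop around the time stepping loop
        n_steps, state, exit_flag = controller.get_state()
        if exit_flag:
            break
        for _ in range(n_steps):  # the model's own time stepping loop
            state = model_time_step(state)
        controller.put_state(state)
    return controller.ensemble


if __name__ == "__main__":
    controller = EnsembleController([[0.0], [10.0]], steps_per_forecast=3, n_cycles=2)
    print(run(controller), controller.analysis_count)
```

Here a single model task integrates all members one after the other, which corresponds to the case with fewer model tasks than ensemble members.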
+The structure of the assimilation program is the following:
+ * At the beginning of the program, the routine `init_parallel_pdaf` is executed. `init_parallel_pdaf` creates the parallel environment for PDAF. In the offline mode, it is possible to execute the assimilation program on a single processor, even if the model is parallelized.
+ * Subsequently, a routine `init_pdaf` is executed. In this routine, parameters for PDAF can be defined and then the core initialization routine `PDAF_init` is called. This core routine also initializes the array of ensemble states. In the offline mode, this means that the ensemble is read from the output files of the model.
+ * Finally, the PDAF core routine `PDAF_put_state` is executed. Because the assimilation program performs no ensemble integration in the offline mode, this routine directly computes the analysis step (i.e. the actual assimilation of the observations). In a user-supplied routine called by `PDAF_put_state`, the ensemble of analysis states is written into restart files. The next forecast phase then consists of direct model integrations initialized from these files.
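The sequence of steps in the offline assimilation program can be sketched as below. All names are hypothetical and chosen for illustration only; in particular, `placeholder_analysis` is not a real filter but a stand-in that moves each member halfway toward the observation, and the plain-text file format (one number per line) is an assumption.

```python
import tempfile
from pathlib import Path


def read_state(path):
    """Read one state vector from a model output file (assumed text format)."""
    return [float(token) for token in Path(path).read_text().split()]


def write_state(path, state):
    """Write one state vector into a restart file (assumed text format)."""
    Path(path).write_text("\n".join(repr(x) for x in state) + "\n")


def init_ensemble(forecast_files):
    """Initialization: read the forecast ensemble from the model output files."""
    return [read_state(path) for path in forecast_files]


def placeholder_analysis(ensemble, observation, weight=0.5):
    """Stand-in for the analysis step; NOT an actual ensemble filter."""
    return [
        [x + weight * (y - x) for x, y in zip(member, observation)]
        for member in ensemble
    ]


def assimilate_offline(forecast_files, observation, restart_files):
    """Read forecasts, compute the analysis, write restart files."""
    ensemble = init_ensemble(forecast_files)
    analysis = placeholder_analysis(ensemble, observation)
    for path, member in zip(restart_files, analysis):
        write_state(path, member)
    return analysis


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        forecasts = [Path(root) / f"forecast_{i}.txt" for i in (1, 2)]
        restarts = [Path(root) / f"restart_{i}.txt" for i in (1, 2)]
        write_state(forecasts[0], [0.0, 2.0])
        write_state(forecasts[1], [4.0, 6.0])
        print(assimilate_offline(forecasts, [2.0, 2.0], restarts))
```

Note that no model code appears in this program: the only coupling to the model is through the files it reads and writes.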
 == Important aspects of the implementation concept ==
+ * The implementation concept of PDAF does not require that the time stepping part of the model is implemented as a subroutine. Instead, calls to subroutines that control the ensemble integration are added to the model code before and after the code parts performing the time stepping. If the time stepping part is implemented as a subroutine, this subroutine can be called in between the additional routines.
+ * Depending on the parallelization, there can be cases in which the model has to jump back in time and cases in which the time always moves forward:
+  * Jumping back in time will be required if the number of model tasks used to evolve the ensemble states is smaller than the number of ensemble members. In this case, a model task has to integrate more than one model state and will have to jump back in time after the integration of each ensemble member.
+  * If there are as many model tasks as ensemble members, the model time always moves forward. In this case, one can also implement PDAF without the external ensemble loop. That is, one can add calls to `PDAF_get_state` and `PDAF_put_state` directly into the code of the model's time stepping loop. This strategy might also be called for if a model uses staggered loops (like a loop over minutes inside a loop over hours).
+ * With the offline mode of PDAF, no direct coupling between PDAF and the model code is required. The exchange of information between the model and the assimilation program is performed solely through the output and restart files of the model. It requires that the user implements routines to read the model fields from the forecast files. In addition, routines are necessary that write the analysis state ensemble into restart files of the model.
  * Model-specific operations like the initialization of the array of ensemble states in `PDAF_init` are actually performed by user-supplied routines. These routines are called through the standard interface of `PDAF`. Details on the interface and the required routines are given on the pages describing the implementation steps.
- * The assimilation system is controlled by the user-supplied routines that are called through PDAF.  With this strategy, the assimilation program is essentially driven by the model part of the program. Thus, logically the model is not a sub-component of the assimilation system, but the implementation with PDAF results in a model extended for data assimilation.
  * The user-supplied routines can be implemented analogously to the model code. For example, if the model is implemented using Fortran common blocks or modules, these can be used to implement the user-supplied routines, too. This simplifies the implementation of the user-supplied routines for users who know the particularities of their model.
+ * With regard to the parallelization, the assimilation program can be run on a single processor, i.e. without parallelization. The variables for the parallelization still have to be initialized by a call to `init_parallel_pdaf`. However, one does not need to compile with an MPI library; it is sufficient to use the dummy implementation of the MPI routines that is supplied with PDAF. If the model is parallelized, one needs to ensure that the model fields are read correctly. In particular, if each process of the parallel model writes a separate file, one has to read these files sequentially in order to initialize an ensemble array holding the global state information.
+ * For large-scale models, it can be useful to execute the assimilation program with parallelization. Following the domain decomposition of the model is probably the easiest strategy for this. In this case, the decomposition information from the model has to be read into the assimilation program in order to initialize the state dimension of the sub-domains as well as the coordinates for each sub-domain.
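Reading the files of a parallel model into one global state vector, as mentioned in the list above, might look like the following sketch. The function names and the text file format (one number per line, one file per model process) are assumptions made for illustration. The offsets and local dimensions returned here are the kind of decomposition information that a parallel assimilation program would need per sub-domain.

```python
import tempfile
from pathlib import Path


def read_subdomain(path):
    """Read the values written by one model process (assumed text format)."""
    return [float(token) for token in Path(path).read_text().split()]


def read_global_state(subdomain_files):
    """Concatenate the sub-domain files, in process order, into a global state.

    Returns the global state, the offset of each sub-domain within it,
    and the local state dimension of each sub-domain.
    """
    global_state, offsets, local_dims = [], [], []
    for path in subdomain_files:
        local = read_subdomain(path)
        offsets.append(len(global_state))
        local_dims.append(len(local))
        global_state.extend(local)
    return global_state, offsets, local_dims


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        files = [Path(root) / "proc_0.txt", Path(root) / "proc_1.txt"]
        files[0].write_text("1.0\n2.0\n")
        files[1].write_text("3.0\n")
        print(read_global_state(files))
```

Calling `read_global_state` once per ensemble member would fill the ensemble array member by member on a single processor.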