wiki:AddFilterAlgorithm

Version 47 (modified by lnerger, 2 weeks ago) (diff)

--

Adding an Assimilation Method to PDAF

PDAF provides an internal interface that makes it easy to add another assimilation method. Here we describe the implementation strategy and internal structure of PDAF valid for version 2.0 and later. In this text, we assume that the reader is already familiar with PDAF to the extend that it is known how PDAF is connected to a model as is described in the Implementation Guide.

The internal structure of PDAF is organized into a generic part providing the infrastructure to perform ensemble forecasts and filter analysis steps. This generic part is independent of the particular filter algorithm and only distinguishes between ensemble based filters (all filters except SEEK) and mode based filters (currently only SEEK, which integrates r modes plus one central model state). The specific routines for a DA method are called through an internal interface.

In PDAF, each DA algorithm consists of 3 mandatory routines plus 2 optional routines. All routines are described below. They are called through the internal interface of PDAF, except for the "PDAF_assimilate" or "PDAF_put_state" routine ('PDAF_assimilate_X' or PDAF_put_state_X where X is the name of the selected DA method), which is directly called in the model code.

PDAF's Internal Interface

Before explaining the filter-specific routines and the calling interface of each routine, we provide an overview of the internal interface routines of PDAF. The structure of the internal interface of PDAF is depicted in Figure 1 (For the filter-specific routines, 'X' is the name of the filter algorithm). Shown are only the routines that are relevant for the implementation of a new filter method grouped by type. To add a filter algorithm, new filter-specific routines (right column of Fig. 1) need to be implemented. These routines are registered in PDAF by modifying the internal interface routines in the middle column of Fig. 1.

/pics/internal_interface_omi.png
Figure 1: Structure of the internal interface of PDAF. There are 4 interface routines (middle column) that connect the generic part with filter-specific routines. For each filter there are 5 filter-specific routines (right column). The three routines marked in blue are called inside the model code, while the routines marked in yellow are internal routines of PDAF.

The separate routines are the following: The routine PDAF_init calls

PDAF_init_filters Interface routine to PDAF_X_init.
PDAF_X_init performs the filter-specific initialization of parameters and calls the user-supplied routine that initializes the initial ensemble of model states.
PDAF_alloc_filters Interface routine to PDAF_X_alloc.
PDAF_X_alloc allocates the filter-specific arrays.
PDAF_options_filters interface routine to PDAF_X_options.
PDAF_X_options is an optional routine. Its purpose is to display an overview of available options for the filter algorithm.

The routine PDAF_print_info only includes the interface to PDAF_X_memtime

  • PDAF_X_memtime displays information on the execution duration of the different parts of the assimilation process as well as information on the amount of allocated memory. This functionality is optional.

The routine PDAF_put_state_X or PDAF_assimilate_X are called directly from the model code. There is a separate routine for each filter, mainly because of the fact that different user-supplied routines may be needed for the analysis step of the filter. However, the operations performed directly in PDAF_put_state_X are widely generic and the filter-specific analysis step is typically implemented as another subroutine. The standard implementation calls

  • PDAF_X_update
    • This routine contains the actual assimilation or analysis step of the filter algorithm.

When PDAF_init is called, the filter algorithm is chosen by its ID number. Internally to PDAF, each filter is identified by a string that is defined in PDAF_init_filters. The interface routines have a very simple structure. In general, they select the filter-specific routine based on the string identifying the filters. When a filter algorithm is added, a line for the corresponding filter-specific routine has to be inserted to each of the interface routines. One can also remove a filter from PDAF by deleting the corresponding lines form the internal interface routines.

Internal dimensions

PDAF internally stores the dimensions of the assimilation system. The dimensions are declared in the Fortran module PDAF_mod_filter. Important are the following dimensions:

dim_p The size of the state vector (with parallelization the size of the local state vector for the current process)
dim_ens The overall size of the ensemble
dim_ens_l If the ensemble integration is distributed over several ensemble tasks, this variable stores the size of the sub-ensemble handled by the current process. (dim_ens_l equals dim_ens if no parallelization or if only a single model task is used.)
rank The maximum rank of the ensemble covariance matrix. In almost all cases, it is dim_ens-1.
dim_eof For mode based filters (currently only SEEK), this is the number of modes used in the state covariance matrix.

Internal arrays

Several internal arrays are allocated when PDAF is initialized. These arrays are decraed in PDAF_mod_filter. hey are allocated in PDAF_X_alloc (see below for details) and remain allocated throughout the assimilation process. For the processes that compute the analysis (those with filterpe=.true.) the following arrays are defined:

Array Dimension Comment
state dim_p State vector. Used in all filters.
eofV dim_p x dim_ens Ensemble array. Used in all filters.
eofU dim_ens-1 x dim_ens-1 (SEEK, SEIK, ESTKF)
dim_ens x dim_ens (ETKF)
Eigenvalue matrix U from P=VUVT (SEEK, SEIK) or transform matrix A (ETKF, ESTKF). Not used in EnKF.
state_inc dim_p state increment vector. Only allocated if incremental analysis updates are used

For the processes that only compute model forecasts but are not involved in the analysis step (i.e. filterpe=.false.), only one array is defined:

Array Dimension Comment
eofV dim_p x dim_ens_l Ensemble array on non-filter processes. Used in all filters.

Filter-specific routines

When a filter algorithm is added, the following filter routines have to be implemented and inserted to each interface routines described above.

  • PDAF_X_init
  • PDAF_X_alloc
  • PDAF_X_options (optional)
  • PDAF_X_memtime (optional)

In addition, the routines

  • PDAF_put_state_X and PDAF_assimilate_X

have to be implemented that are called directly in the model code.

We recommend to base on the routines of an existing filter, as most of the routines can be easily adapted to a new filter method.

PDAF_X_init

The routine PDAF_X_init performs the initialization of filter-specific parameters. In addition, it prints information about the configuration.

The interface is as follows:

  SUBROUTINE PDAF_X_init(subtype, param_int, dim_pint, param_real, dim_preal, &
                                 ensemblefilter, fixedbasis, verbose, outflag)

with the following arguments:

  • subtype: The subtype index of the filter algorithm [integer, input].
  • param_int: The array of integer parameters [integer(dim_pint), input].
  • dim_pint: The number of parameters in param_int [integer, input].
  • param_real: The array of real parameters [real(dim_preal), input].
  • dim_preal: The number of parameters in param_real [integer, input]
  • ensemblefilter: Flag, whether the filter is an ensemble filter or a mode-based filter [logical, output].
  • fixedbasis: Flag, whether the chosen subtype is a filter with fixed ensemble, such that only the ensemble mean is integrated by the model [logical, output].
  • verbose: Verbosity flag [integer, input]. Valid are the values provided to PDAF_init.
  • outflag: Error flag [integer, output]

The required operations are to initialize the PDAF-internal parameter variables from the provided values of subtype, param_int, and param_real. In the addition, the logical flags ensemblefilter and fixedbasis have to be set. The existing implementations also include some screen output about the configuration.

Please note:

  • The routine should check, whether the provided value of subtype is a valid choice. If this is not the case, the error flag should be set to 2.
  • Only parameters from param_int and param_real up to the value dim_pint and dim_preal should be considered in the initialization. The array may be bigger, but the user defined which parameters are to be used be setting the values of dim_pint and dim_preal.
  • The error flag outflag is initial set to 0.
  • The internal parameters are declared and stored in the Fortran module PDAF_mod_filter. If a new filter algorithm requires additional parameters, their declaration should be added to the module.

PDAF_X_alloc

The routine PDAF_X_alloc allocates arrays for the data assimilation, like the ensemble array and a state vector. The success of the allocation is checked.

The interface is as follows:

  SUBROUTINE PDAF_X_alloc(subtype, outflag)

with the following arguments:

  • subtype: The subtype index of the filter algorithm [integer, input].
  • outflag: Error flag [integer, input/output]

All arrays that need to be allocated are declared in the Fortran module PDAF_mod_filter. Here, also the dimensions of the arrays are declared. For the allocation of arrays, one has to distinguish between processes that compute the analysis step and those that only participate in the ensemble forecast.

For the processes that compute the analysis (those with filterpe=.true.) it is mandatory to allocate the following two arrays:

  • state: The state vector of size dim_p.
  • eofV: This is the ensemble matrix in all ensemble-based filters. For SEEK it is the matrix holding eigenvectors. eofV has size (dim_p, dim_ens).

Depending on the filter algorithm some of the following arrays also need to be allocated:

  • eofU: This is the eigenvalue matrix U used in the SEIK and SEEK filters (here, its size is (rank,rank)). For ETKF, it is the matrix A of size (dim_ens,dim_ens). The array only needs to be allocated if the algorithm uses such a matrix. (For EnKF, which does not use this matrix, it is allocated with size (1,1).)
  • state_inc: The increment to the state vector computed in the analysis step. It only needs to be allocated in this routine, if incremental analysis updating is implemented. Otherwise, it is sufficient to allocate and deallocate state_inc in the routine performing the analysis step. The size of state_inc is dim_p.
  • bias: If the filter algorithm is implemented with bias correction, the vector bias with size dim_bias_p is allocated.

Processes that only participate in the computation of the ensemble forecast, but are not involved in computing the analysis step, operate only on a sub-ensemble. Accordingly, an ensemble array for this sub-ensemble has to be allocated. This is:

  • eofV: This is the ensemble matrix in all ensemble-based filters. For SEEK it is the matrix holding eigenvectors. For the processes with filterpe=.false., eofV has size (dim_p, dim_ens_l).

PDAF_X_options

The optional routine PDAF_X_options displays information on the available options for the filter algorithm.

The interface is as follows:

  SUBROUTINE PDAF_X_options()

It has no arguments!

The following display is recommended:

  • Available subtypes (At least '0' for standard implementation; '5' for offline mode)
  • Parameters used from the parameter arrays param_int and param_real.

PDAF_X_memtime

The optional routine PDAF_X_memtime displays information about allocated memory and the execution time of different parts of the filter algorithm.

The interface is as follows:

  SUBROUTINE PDAF_X_memtime(printtype)

with the following argument:

  • printtype: The type of the output to be done [integer, input]. For the filter algorithms that are included in the PDAF source code package the following choices are implemented:
    • 1: Display general timers
    • 2: Display allocated memory
    • 3: Display detailed timers

The timing operation are implement using the module PDAF_timer, which provides the function PDAF_timeit. Memory allocation is computed using PDAF_memcount, which is provided by the module PDAF_memcounting.

PDAFomi_put_state_X / PDAFomi_assimilate_X

One of these routines is directly inserted into the model code, if the online mode of PDAF is used. The text on the implementation of the analysis step in the Implementation Guide explains the interface for the algorithms that are included in the PDAF package. Apart from the usual integer status flag, the interface contains the names of the user-supplied routines that are required for the analysis step. Usually, the minimum set of routines are:

  • A routine to write model fields back into the ensemble state array (U_collect_state).
  • A routine that determines the size of the observation vector (U_init_dim_obs).
  • A routine that contains the implementation of the observation operator (U_obs_op).
  • The pre- and post-step routine in which the forecast and analysis ensembles can be analyzed (U_prepoststep).

Additional routines are possible depending on the requirements of the filter algorithm. When one implements a new filter, one should check, if the filter is compatible with the existing local or global filters. In this case, support for the new filter can be added into one of the existing routines (PDAFomi_assimilate_global/PDAFomi_assimilate_local or likewise PDAFomi_put_state_global/PDAFomi_put_state_local). For working without OMI one can check if one of the existing routines can be reused. This will facilitate the switching between different filter methods.

With regard to the operations, PDAFomi_put_state_X and PDAF_put_state_X prepare for the actual analysis step, that is called inside these routines as a separate routine. The required operations are:

  • Write model fields back into the ensemble array (by calling U_collect_state)
  • Increment the counter for the integrated ensemble members (named counter and provided by the module PDAF_mod_filter.
  • Check, if the ensemble integration is completed (in that case, it is member = local_dim_ens + 1). If not, exit the PDAF_put_state_X.
  • If the ensemble integration is completed, the following operations are required:
    • If more than one model task is used: Collect the sub-ensembles from all model tasks onto the processes that perform the analysis step. This operation is done by the subroutine PDAF_gather_ens.
    • Call the routine that computes the analysis step for the chosen filter algorithm (typically named PDAF_X_update).
    • Reset the control variables for the ensemble forecast (initevol=1, member=1, step=step_obs+1).

In general, the put_state routines of all ensemble-based filters are quite general structures. For the implementation of a new filter one should be able to base on an existing routine, e.g. that of for the ETKF. Then, one has to adapt the interface for the required user-supplied routines of the new filter. In addition, the call of the routine PDAF_X_update holding the analysis step has to be revised (name of the routine, required user-supplied routines).

The routine PDAF_assimilate_X is mainy an interface routine to PDAF_put_state_X. It counts the time steps and calls PDAF_put_state_X when the forecast phase is complete.