Version 3 (modified by 4 years ago) (diff) | ,
---|

# Ensemble Generation

## Overview

To start ensemble-based data assimilation one has to generate an ensemble of model states that represents both the state estimate (provided by the ensemble mean state) and the uncertainty of the state estimate, given by the ensemble spread. In case of ensemble Kalman filters, the uncertainty is estimate by the sample covariance matrix of the ensemble, i.e. the covariance matrix of the deviations of the ensemble states from the ensemble mean state.

There are different possibilities to generate an ensemble. With PDAF we currently provide support for one sampling method, the co-called second-order exact sampling from a model trajectory. This method was introduced by Pham et al. (see e.g. Pham, Monthly Weather Review 129 (2001) 1194-1207). The method is as follows:

- Run the model with which the data assimilation will be performed and write a trajectory of model snapshots into files.
- In a separate program read the model snapshots. Compute the temporal mean state and subtract it from all snapshots. Now compute the singular value decomposition of the resulting matrix. This is known as the EOF-decomposition. It provides the singular vectors (EOFS, empirical orthogonal functions) and singular values (EOF coefficients). The number of EOFs is usually
**r**if the trajectory contains**r+1**model snapshots. Store the singular vectors and singular values in a file. - Now perform the second-order exact sampling: This is usually in the ensemble-initialization routine, e.g.
`init_ens_pdaf.F90`

. Here, read the singular values and vectors into arrays. Multiply each singular vector with the corresponding singular value. Multiply the resulting matrix by a random matrix that preserves the mean and the covriances. This particular matrix can be generated using Householder reflections. PDAF provides the routine`PDAF_seik_omega`

to generate this matrix. Now multiply the result with the square-root of the ensemble size minus one. The resulting matrix holds ensemble perturbations. Thus the sum over each row is very. Finally add a desired central (mean) state to the matrix. This could e.g. by the model state at the time at which the data assimilation will be started. The ensemble size will be the number of singular vectors (**r**) plus one. Thus, to control the ensemble size (**m**), one can just read in the**m-1**leading singular vectors (those with the largest singular values) and perform the second-order exact sampling with these vectors.

The described sampling method is efficient because it uses the long time variability of the model dynamics to estimate the uncertainty. The trajectory and hence the full set of singular vectors hold the full information on the variability. If a large number of singular vectors is stored in step 2 into a file, one can freely choose the ensemble size up to **r+1**. Using the leading EOFs provides a systematic approximation of the model variability. Thus, a small ensemble will contain the leading patterns of model variability, which have the largest amplitudes. Increasing the ensemble will systematically add finer patterns.

## Ensemble generation with PDAF

In order to simplify the ensemble generation with second-order exact sampling, PDAF provides two routines:

### PDAF_eofcovar

For step 2, the routine PDAF_eofcovar helps to compute the singular value decomposition. For this task, one provides the routine with the matrix holding the model snapshots. One can also let the routine compute the temporal mean state, or one can compute and subtract the mean state before (this is particularly useful if one does not want to subtract the mean state over the full model trajectory). The routine computes the singular values decomposition and returns the singular vectors and values.

Using `PDAF_eofcovar`

, one has to read in the trajectory. Then one calls the routine, and finally one stores the singular vectors and values into a file. This procedure is demonstrated for the Lorenz-96 model in `testsuite/src/lorenz96/tools/generate_covar.F90`

.

For the documentation of the calling interface of `PDAF_eofcovar`

see the detail page about the routine.

### PDAF_sampleens

For step 3, the routine PDAF_sampleens performs the second-order exact sampling. For this task, one provides the routine the singular vectors and values from step 2 and a central state for the ensemble mean. The routine will then generate and return the ensemble to the program that called `PDAF_sampleens`

.

The ensemble generation is usually done when a data assimilation sequence is initialized. One read the singular vectors and values from the file generate in step 2. In addition one read the central state of the ensemble. then one calls `PDAF_sampleens`

and obtains an ensmeble for the data assimilation.