Version 1 (modified by 4 years ago) (diff) | ,
---|

# Ensemble Generation

To start ensemble-based data assimilation one has to generate an ensemble of model states that represents both the state estimate (provided by the ensemble mean state) and the uncertainty of the state estimate, given by the ensemble spread. In case of ensemble Kalman filters, the uncertainty is estimate by the sample covariance matrix of the ensemble, i.e. the covariance matrix of the deviations of the ensemble states from the ensemble mean state.

There are different possibilities to generate an ensemble. With PDAF we currently provide support for one sampling method, the co-called second-order exact sampling from a model trajectory. This method was introduced by Pham et al. (see e.g. Pham, Monthly Weather Review 129 (2001) 1194-1207). The method is as follows:

- Run the model with which the data assimilation will be performed and write a trajectory of model snapshots into files.
- In a separate program read the model snapshots. Compute the temporal mean state and subtract it from all snapshots. Now compute the singular value decomposition of the resulting matrix. This is known as the EOF-decomposition. It provides the singular vectors (EOFS, empirical orthogonal functions) and singular values (EOF coefficients). The number of EOFs is usually
**r**if the trajectory contains**r+1**model snapshots. Store the singular vectors and singular values in a file. - Now perform the second-order exact sampling: This is usually in the ensemble-initialization routine, e.g.
`init_ens_pdaf.F90`

. Here, read the singular values and vectors into arrays. Multiply each singular vector with the corresponding singular value. Multiply the resulting matrix by a random matrix that preserves the mean and the covriances. This particular matrix can be generated using Householder reflections. PDAF provides the routine`PDAF_seik_omega`

to generate this matrix. Now multiply the result with the square-root of the ensemble size minus one. The resulting matrix holds ensemble perturbations. Thus the sum over each row is very. Finally add a desired central (mean) state to the matrix. This could e.g. by the model state at the time at which the data assimilation will be started. The ensemble size will be the number of singular vectors (**r**) plus one. Thus, to control the ensemble size (**m**), one can just read in the**m-1**leading singular vectors (those with the largest singular values) and perform the second-order exact sampling with these vectors.

The described sampling method is efficient because it uses the long time variability of the model dynamics to estimate the uncertainty. The trajectory and hence the full set of singular vectors hold the full information on the variability. If a large number of singular vectors is stored in step 2 into a file, one can freely choose the ensemble size up to **r+1**. Using the leading EOFs provides a systematic approximation of the model variability. Thus, a small ensemble will contain the leading patterns of model variability, which have the largest amplitudes. Increasing the ensemble will systematically add finer patterns.