Changes between Version 3 and Version 4 of EnsembleGeneration

Dec 17, 2016, 9:45:32 PM (4 years ago)



  • EnsembleGeneration

    v3 v4  
    11= Ensemble Generation =
    35== Overview ==
    3032The ensemble generation is usually done when a data assimilation sequence is initialized. One read the singular vectors and values from the file generate in step 2. In addition one read the central state of the ensemble. then one calls `PDAF_sampleens`  and obtains an ensmeble for the data assimilation.
     34There are two variants to use `PDAF_sampleens`:
     351. One can use the routine in the data assimilation program in the call-back routine that initializes the ensemble array for PDAF (usually `init_ens_pdaf`). Thus one starts the data assimilation program. when `PDAF_init` is executed, it calls `init_ens_pdaf`. Here one can use `PDAF_sampleens` to generate the ensemble from the stored singular values and vectors and directly store it in the ensemble array provided by PDAF. Thus, one can directly start the data assimilation sequence. This variant is demonstrated for the Lorenz-96 model in `testsuite/src/lorenz96/init_ens_eof.F90`.
     361. One can use a separate program that reads the singular values and vectors, and a central ensemble state and then calls `PDAF_sampleens` to generate the ensemble. Then one can store the ensemble state in ensemble restart files. This variant can be useful for both the online- and offline modes of PDAF: [[BR]]In the online mode, one would implement `init_ens_pdaf` as a routine reading the ensemble states from files. This is useful, e.g., for the case of a large scale data assimilation when the execution time for the forecasts and the analysis steps is so long that one needs to checkpoint and restart the data assimilation. In this case the checkpointing could write ensemble restart files in the same format as the ensemble generation program does. Thus, one can use `init_ens_pdaf` to read ensemble restart files either at the very beginning of a data assimilation sequence or when the assimilation program is restarted. [[BR]] In the offline-mode the data assimilation program only contains the analysis step, while the forecasts are computed by the model program. Thus, the model program just reads model restart files, but does not execute `init_ens_pdaf`. In this case one would let the program that generates the ensemble used `PDAF_sampleens` write restart files for the model. The model will use these files to initialize the model fields so that the forecast for an ensemble state can be computed. At the end of a forecast phase, the model writes new restart files. The restart files for each ensemble state are then read by the data assimialtion program to compute the analysis step.
     38== Variants and options ==
     40=== Mean state ===
     42The routine `PDAF_eofcovar` has the option to compute and subtract the temporal mean state before computing the singular value decomposition. Alternatively, one can subtract the mean before calling `PDAF_eofcovar`. The first variant is only useful if the temporal mean state over the full model trajectory should be subtracted. In many cases this would not be a good choice. For example if one runs an ocean model over one year the state trajectory will contain the seasonal cycle. However, the seasonal cycle is usually not an uncertainty (If one starts the data assimilation e.g. in January, one knows the typical conditions for January). In this case it will be useful to compute a running mean state, e.g. over 2-3 months and subtract this mean state from the snapshots. This computation is case-specific and can only be done before calling `PDAF_eofcovar`.
     44=== Multivariate normalization ===
     46The multivariate normalization normalizes the different variability of different model variables. The routine `PDAF_eofcovar` allows to activate a multivariate normalization. The normalization is motivated by the fact that different model variables can show very different variability. For example, the temperature field in a global ocean model varies much more than the salinity or the velocities. Without multivariante normalization, the leading singular vectors can contain mainly information from those variables that show the largest variability. thus, if the ensemble is then generated from the leading singular vectors, the variability of some model variables might be underrepresented. The multivariate normalization avoid this case, because it normalizes the variability of each model variable to one before computing the singular value decomposition. Thus, the singular value decomposition treats each model fields equivalently. The model fields in the resulting singular vectors are then re-scaled to their original variability by `PDAF_eofcovar`. Due to the normalization, the leading singular vectors will contain contributions from all different model variables. thus, the different fields should be better represented in the ensemble states.
     48=== Central ensemble state ===
     50Using second-order exact sampling one can freely choose the central ensemble state  (i.e. the ensemble mean), which is usually used as the state estimate in ensemble Kalman filters. Usually, one should try to select the best state estimate one has available. A long-term mean state would be a very uninformed state. However, if one wants to access the data assimilation influence, also the long-time mean, e.g. the mean state over the whole trajectory in step 2 above could be a useful choice (This is e.g. used in the example of the Lorenz-96 model. However this model is chaotic without seasonality, so that this state does in deed represent the best estimate).
     53=== Scaling the initial ensemble ===
     55using the model variability does not guarantee that the estimated uncertainty is realistic. While the sampling provides estimates of the covariances that result from the model dynamics, the variances might not have the expected amplitude. One can check this by comparing the estimated variances (i.e. the diagonal of the ensmeble sample error covariance matrix, which is computed in the examples like the Lorenz-96 model) with the deviation of the ensemble mean from the observations. Often, one can plot both quantities like model fields (e.g. if ocean surface temperature is observed). Both quantities will be different, but if they are on the same order of magnitude the uncertainty estimate is usually realistic. If the uncertainty is overestimated, one can reduce the ensemble spread (by multiplying the ensemble perturbations by a factor between 0 and 1). Likewise one can increase the ensmeble spread, if it underestimates the difference between the ensemble mean and the observations.