Data Assimilation Diagnostics
The data assimilation diagnostics can be used to assess the quality of the ensemble. The first set of routines were ported from the tools developed by the SANGOMA project, in which the PDAF developers were involved. With PDAF V3.0 we extended the set of diagnostics.
This page lists the available diagnostics. Further diagnostics related to the assimilation observations can be computed using the PDAF-OMI Observation Diagnostics that were also introduce with PDAF V3.0.
PDAF_diag_ensmean
This routine computes the ensemble mean state vector.
For a documentation see the detail page on PDAF_diag_ensmean.
PDAF_diag_stddev
This routine computes the mean ensemble standard deviation as the square root of the spatial mean variance. This value is a common diagnostic of the ensemble spread, which is often computed in prepoststep_pdaf
.
There are two variants:
- PDAF_diag_stddev: This routine computes the standard deviation taking into account the parlalelization. Thus, if the ensemble array is decomposed in PDAF, the global standard deviation will be computed. The routine uses the parallelization as initialized by the call to
PDAF_init
. - PDAF_diag_stddev_nompi: This routine does not used parallelization and computes the standard deviation for the provided ensemble array. If this is for a sub-domain in case of a parallelized model, the resulting value is only valid for this sub-domain. This routine does not require that PDAF was initialized by a call to
PDAF_init
.
PDAF_diag_variance
This routine computes the variance field related to the ensemble spread of a provided ensemble array. The routine also return the mean ensemble standard deviation as the square root of the spatial mean variance, analogous to PDAF_diag_stddev
.
There are two variants:
- PDAF_diag_variance: This routine takes parallelization into account. Thus, it computes the variance field for the provided ensemble array. If the ensemble is decomposed by parallelization in PDAF, the global standard deviation will be computed. The routine uses the parallelization as initialized by the call to
PDAF_init
. - PDAF_diag_variance_nompi: This routines does not use parallelization. It computes the variance field and the ensmeble mean standard deviation for the provived ensemble array. If this is for a sub-domain in case of a parallelized model, the resulting mean standard deviation is only valid for this sub-domain. This routine does not require that PDAF was initialized by a call to
PDAF_init
.
PDAF_diag_rmsd
This routine computes root mean square difference (RMSD) beween two input vectors. These could be, for ex the ensemble mean state and a reference state.
there are two variants:
- PDAF_diag_rmsd: This routine takes parallelization. It assumes that both input vectors are decomposed according to the parallelization intiialized by the call to
PDAF_init
. It then computes the RMSD for the global vectors. - PDAF_diag_rmsd_nompi: The routine does not use parallelization. It only computes the RMSD for the provided input vectors. If these are decomposed by the parallelization the RMSD is only computed for the sub-domain. This routine does not require that PDAF was initialized by a call to
PDAF_init
.
Note: For computing the RMSD between a observation and the observed ensemble, one can use the routine PDAFomi_diag_rmsd provided by the PDAF-OMI observation diagnostics module.
PDAF_diag_effsample
This routine compute the effective sample size as used in particle filters. The effective sample size is define as the inverse of the sum of the squared particle weights: n_eff = 1 / sum[(w_i)2]. The effective sample size can range between one - if a single particle has the maximum weight and all other particles have zero weight - and the actual sample size - if all samples have the same weight.
For a documention on PDAF_diag_effsample
see the detail page on PDAF_diag_effsample. The routine is used in the NETF and LNETF filter methods of PDAF.
PDAF_diag_histogram
Rank histograms are frequently used to assess the distribution of an ensemble around an observation or, in twin experiments, the true state. The histograms use bins computed form the ensemble distribution and count how frequent e.g. the observation falls into which bin. A flat histogram typically indicates a good ensemble. A concave (U-shaped) histogram indicates too little ensemble spread, while a convex histogram is obtained when the ensemble spread is too large. Further, a sloped histogram indicates bias. (A discussion on the interpretation of rank histograms can be found in Hamill, Monthly Weather Review, 129 (2001) 550-560)
For a documention on PDAF_diag_histogram
see the detail page on PDAF_diag_histogram. The routine is used in the Lorenz-96 model example in models/lorenz96/compute_truerms.F90
.
PDAF_diag_ensstats
Ensemble Kalman filters assume that the ensemble is Gaussian distributed. In this case the distribution is symmetric and only the first and second moments of the distribution (the mean and standard deviation) are non-zero. The routine PDAF_diag_ensstats
allows a data assimilation program to check the values of the third (skewness) and fourth (kurtosis) moment of the distribution. As there are different definition of the kurtosis, please note that PDAF uses the definition used by Lawson and Hansen, Mon. Wea. Rev. 132 (2004) 1966.
For a documention on PDAF_diag_ensstats
see the detail page on PDAF_diag_ensstats. The routine is used in the Lorenz-96 model example in models/lorenz96/compute_truerms.F90
.
PDAF_diag_CRPS
This routine computes the continuous ranked probability score (CRPS) and its decomposition in reliability and resolution (Hersbach, Weather and Forecasting, 2000).
For use within the PDAF environment, i.e. after calling PDAF_initz
, there is the routine:
This routine is used in the Lorenz-63 model example in models/lorenz63/compute_truerms.F90
.
There are also routines that can be used outside the PDAF environment, i.e. they don't need that PDAF_init
was called before:
- PDAF_diag_CRPS_nompi: This routine computes CRPS and related quantities without parallelization
- PDAF_diag_CRPS_mpi: This routine computes CRPS and related quantities utilizing parallelization to obtain global scores for parallized models.
PDAF_diag_compute_moments
This routine computes the mean, the unbiased variance, the unbiased skewness, and the unbiased excess kurtosis from an ensemble. All moments are stored in a single array. One can select the highest order that should be computed. With parallelized models, the moments are computed for the process-local state.
The routine does not use parallelization and can also be used if PDAF was not initialized by calling PDAF_init
.
For a documentation see the page on PDAF_diag_compute_moments.
PDAF_diag_reliability_budget
This routines computes the reliability budget of an ensemble as proposed by Rodwell, M. J., Lang, S. T. K., Ingleby, N. B., Bormann, N., Holm, E., Rabier, F., ... & Yamaguchi, M. (2016). Reliability in ensemble data assimilation. Quarterly Journal of the Royal Meteorological Society, 142(694), 443-454.
The reliability budget derives a balance relationship that decomposes the
departure between the ensemble mean and observations:
Depar2 = Bias2 + EnsVar + ObsUnc2 + Residual |
under the assumption of a perfectly reliable ensemble. When the residual term is not small, one can identify the sources of problematic ensemble representation of the uncertainty by looking at each terms. One of the benefits of the reliability budget diagnostics is that it can identify spatially local issues in ensemble perturbations.
For a documentation see the page on PDAF_diag_reliability_budget.