= Data Assimilation Diagnostics = [[PageOutline]] The data assimilation diagnostics can be used to assess the quality of the ensemble. The first set of routines were ported from the tools developed by the SANGOMA project, in which the PDAF developers were involved. With PDAF V3.0 we extended the set of diagnostics. This page lists the available diagnostics. Further diagnostics related to the assimilation observations can be computed using the [wiki:OMI_observation_diagnostics_PDAF3 PDAF-OMI Observation Diagnostics] that were also introduce with PDAF V3.0. == PDAF_diag_ensmean == This routine computes the ensemble mean state vector. For a documentation see the [wiki:PDAF_diag_ensmean detail page on PDAF_diag_ensmean]. == PDAF_diag_stddev == This routine computes the mean ensemble standard deviation as the square root of the spatial mean variance. This value is a common diagnostic of the ensemble spread, which is often computed in `prepoststep_pdaf`. There are two variants: * [wiki:PDAF_diag_stddev]: This routine computes the standard deviation taking into account the parlalelization. Thus, if the ensemble array is decomposed in PDAF, the global standard deviation will be computed. The routine uses the parallelization as initialized by the call to `PDAF_init`. * [wiki:PDAF_diag_stddev_nompi]: This routine does not used parallelization and computes the standard deviation for the provided ensemble array. If this is for a sub-domain in case of a parallelized model, the resulting value is only valid for this sub-domain. This routine does not require that PDAF was initialized by a call to `PDAF_init`. == PDAF_diag_variance == This routine computes the variance field related to the ensemble spread of a provided ensemble array. The routine also return the mean ensemble standard deviation as the square root of the spatial mean variance, analogous to `PDAF_diag_stddev`. There are two variants: * [wiki:PDAF_diag_variance]: This routine takes parallelization into account. Thus, it computes the variance field for the provided ensemble array. If the ensemble is decomposed by parallelization in PDAF, the global standard deviation will be computed. The routine uses the parallelization as initialized by the call to `PDAF_init`. * [wiki:PDAF_diag_variance_nompi]: This routines does not use parallelization. It computes the variance field and the ensmeble mean standard deviation for the provived ensemble array. If this is for a sub-domain in case of a parallelized model, the resulting mean standard deviation is only valid for this sub-domain. This routine does not require that PDAF was initialized by a call to `PDAF_init`. == PDAF_diag_rmsd == This routine computes root mean square difference (RMSD) beween two input vectors. These could be, for ex the ensemble mean state and a reference state. there are two variants: * [wiki:PDAF_diag_rmsd]: This routine takes parallelization. It assumes that both input vectors are decomposed according to the parallelization intiialized by the call to `PDAF_init`. It then computes the RMSD for the global vectors. * [wiki:PDAF_diag_rmsd_nompi]: The routine does not use parallelization. It only computes the RMSD for the provided input vectors. If these are decomposed by the parallelization the RMSD is only computed for the sub-domain. This routine does not require that PDAF was initialized by a call to `PDAF_init`. Note: For computing the RMSD between a observation and the observed ensemble, one can use the routine [wiki:OMI_observation_diagnostics_PDAF3 PDAFomi_diag_rmsd] provided by the PDAF-OMI observation diagnostics module. == PDAF_diag_effsample == This routine compute the effective sample size as used in particle filters. The effective sample size is define as the inverse of the sum of the squared particle weights: '''n_eff = 1 / sum[(w_i)^2^]'''. The effective sample size can range between one - if a single particle has the maximum weight and all other particles have zero weight - and the actual sample size - if all samples have the same weight. For a documention on `PDAF_diag_effsample` see the [wiki:PDAF_diag_effsample detail page on PDAF_diag_effsample]. The routine is used in the NETF and LNETF filter methods of PDAF. == PDAF_diag_histogram == Rank histograms are frequently used to assess the distribution of an ensemble around an observation or, in twin experiments, the true state. The histograms use bins computed form the ensemble distribution and count how frequent e.g. the observation falls into which bin. A flat histogram typically indicates a good ensemble. A concave (U-shaped) histogram indicates too little ensemble spread, while a convex histogram is obtained when the ensemble spread is too large. Further, a sloped histogram indicates bias. (A discussion on the interpretation of rank histograms can be found in Hamill, Monthly Weather Review, 129 (2001) 550-560) For a documention on `PDAF_diag_histogram` see the [wiki:PDAF_diag_histogram detail page on PDAF_diag_histogram]. The routine is used in the Lorenz-96 model example in `models/lorenz96/compute_truerms.F90`. == PDAF_diag_ensstats == Ensemble Kalman filters assume that the ensemble is Gaussian distributed. In this case the distribution is symmetric and only the first and second moments of the distribution (the mean and standard deviation) are non-zero. The routine `PDAF_diag_ensstats` allows a data assimilation program to check the values of the third (skewness) and fourth (kurtosis) moment of the distribution. As there are different definition of the kurtosis, please note that PDAF uses the definition used by Lawson and Hansen, Mon. Wea. Rev. 132 (2004) 1966. For a documention on `PDAF_diag_ensstats` see the [wiki:PDAF_diag_ensstats detail page on PDAF_diag_ensstats]. The routine is used in the Lorenz-96 model example in `models/lorenz96/compute_truerms.F90`. == PDAF_diag_CRPS == This routine computes the continuous ranked probability score (CRPS) and its decomposition in reliability and resolution (Hersbach, Weather and Forecasting, 2000). For use within the PDAF environment, i.e. after calling `PDAF_initz`, there is the routine: * [wiki:PDAF_diag_crps] This routine is used in the Lorenz-63 model example in `models/lorenz63/compute_truerms.F90`. There are also routines that can be used outside the PDAF environment, i.e. they don't need that `PDAF_init` was called before: * [wiki:PDAF_diag_CRPS_nompi]: This routine computes CRPS and related quantities without parallelization * [wiki:PDAF_diag_CRPS_mpi]: This routine computes CRPS and related quantities utilizing parallelization to obtain global scores for parallized models. == PDAF_diag_compute_moments == This routine computes the mean, the unbiased variance, the unbiased skewness, and the unbiased excess kurtosis from an ensemble. All moments are stored in a single array. One can select the highest order that should be computed. With parallelized models, the moments are computed for the process-local state. The routine does not use parallelization and can also be used if PDAF was not initialized by calling `PDAF_init`. For a documentation see the [wiki:PDAF_diag_compute_moments page on PDAF_diag_compute_moments]. == PDAF_diag_reliability_budget == This routines computes the reliability budget of an ensemble as proposed by Rodwell, M. J., Lang, S. T. K., Ingleby, N. B., Bormann, N., Holm, E., Rabier, F., ... & Yamaguchi, M. (2016). Reliability in ensemble data assimilation. Quarterly Journal of the Royal Meteorological Society, 142(694), 443-454. The reliability budget derives a balance relationship that decomposes the departure between the ensemble mean and observations:[[BR]] || Depar^2^ = Bias^2^ + !EnsVar + !ObsUnc^2^ + Residual||[[BR]] under the assumption of a perfectly reliable ensemble. When the residual term is not small, one can identify the sources of problematic ensemble representation of the uncertainty by looking at each terms. One of the benefits of the reliability budget diagnostics is that it can identify spatially local issues in ensemble perturbations. For a documentation see the [wiki:PDAF_diag_reliability_budget page on PDAF_diag_reliability_budget].