# Hands-on Example: Building an Assimilation System with PDAF¶

In this practical, we will build a data assimilation system using pyPDAF. pyPDAF is a Python interface to PDAF (Parallel Data Assimilation Framework). The framework is mainly designed for ensemble data assimilation systems with high-dimensional complex weather and climae models. It was used for both research and operational purposes. The Python interface allows for the use of PDAF with Python models.

As per its name, the framework is designed with the aim to implement efficient parallelised data assimilation system. In this practical, to illustrate the workflow of PDAF, we provide a step-by-step tutorial on designing a DA system without using parallelisation.

## Install pyPDAF¶

Before discussing the DA system, let us install pyPDAF first. If you are familiar with Python, you might have the package manager `conda`

installed. This can be obtained using `anaconda`

or `miniconda`

. This functionality can be used on Google Colab as well.

### On local computer:¶

In your terminal or anaconda prompt, run `conda create -n pypdaf -c yumengch -c conda-forge pypdaf jupyter`

.

You can then open this notebook using the command `jupyter notebook`

### On Google Colab (skip this section when you're not using Google Colab):¶

The following step will install `conda`

on the Google Colab.

```
!pip install -q condacolab
import condacolab
condacolab.install()
```

Now, we can install pyPDAF using conda

```
%%capture
# installation of pyPDAF
!conda install -c yumengch -c conda-forge pypdaf
```

To provide a better view of PDAF output, we have to use wurlitzer

```
%%capture
# wurlitzer is a package that allows us to see PDAF output
!pip install wurlitzer
```

## A short introduction to the example 2D model¶

We need to first get the example data:

```
%%bash
git init; git remote add -f origin -t PDAF_V2.1 https://github.com/PDAF/PDAF.git
git sparse-checkout init
git sparse-checkout set "tutorial/inputs_online"
git pull
mv tutorial/inputs_online inputs_online
```

- Spatial Domain: Two dimensional domain grid domain with $(nx \times ny) = (36 \times 18)$ grid points
- Total steps: we will run this model by
`nsteps = 18`

time steps

With this information, we can define the model variable `field`

.

```
import numpy as np
# define the array for model field
nsteps = 18 # total time steps
nx = 36 # 36 columns
ny = 18 # 18 rows
# initial condition + 18 time steps, 18 rows and 36 columns
field = np.zeros((nsteps + 1, ny, nx))
```

The initial condition of the model is a sine wave in the diagonal direction:

```
"""read the initial condition of the model field"""
field[0] = np.loadtxt('inputs_online/true_initial.txt')
```

The model shifts the sine wave along the y-axis by one grid point per time step

```
def step(field):
"""Roll array elements of i-th time step along a the first axis."""
return np.roll(field, shift=1, axis=-2)
```

### How does the model look like?¶

We can visualise the model evolution:

```
import matplotlib.pyplot as plt
from matplotlib import animation
from IPython.display import HTML
fig = plt.figure('animation')
ax = fig.add_subplot(111)
pc = ax.pcolormesh(field[0], cmap='coolwarm', vmin=-1, vmax=1)
fig.colorbar(pc, ax=ax)
def draw_model(i):
"""Draw each model step
"""
# run the model
field[i+1] = step(field[i])
pc.set_array(field[i+1])
ax.set_title(f'Model step {i+1}')
return pc,
# make an animation
anim = animation.FuncAnimation(fig, draw_model, frames=nsteps, interval=1000, blit=True)
plt.close(fig)
HTML(anim.to_html5_video())
```

## Observations¶

- Select a set of observations at 28 grid points
- Observations are sampled from a Gaussian distribution with a standard deviation of 0.5
- full 2D field, -999 marks ‘no data’
- One observation file at each time step
- They're stored in 'obs_step*.txt'

```
# define observation array
obs = np.ma.zeros((nsteps, ny, nx))
fig = plt.figure('animationObs')
ax = fig.add_subplot(111)
pc = ax.pcolormesh(obs[0], cmap='coolwarm', vmin=-1, vmax=1)
fig.colorbar(pc, ax=ax)
def draw_model(i):
"""Draw obs. at each model step
"""
obs[i] = np.loadtxt(f'inputs_online/obs_step{i+1}.txt')
obs[i] = np.ma.masked_where(np.isclose(obs[i], -999.), obs[i])
pc.set_array(obs[i])
ax.set_title(f'Observation at step {i+1}')
return pc,
# make an animation
anim = animation.FuncAnimation(fig, draw_model, frames=18, interval=1000, blit=True)
plt.close(fig)
HTML(anim.to_html5_video())
```

### Let's check the observation error¶

Is it close to the standard deviation of 0.5?

```
# calculate the root mean squared obs. err in time
err = np.sqrt(np.sum((field[1:] - obs)**2, axis=0)/nsteps)
# plot the observation error
fig = plt.figure('R')
ax = fig.add_subplot(111)
pc = ax.pcolormesh(err, cmap='Blues', vmin=0., vmax=1.)
ax.set_title(f'Spatial averaged obs. err is {np.round(np.mean(err), 3)}')
fig.colorbar(pc, ax=ax)
```

<matplotlib.colorbar.Colorbar at 0x736f7466d5d0>

## Set up a data assimilation system using pyPDAF¶

We first import necessary packages. In this simple example, we only need pyPDAF, numpy and mpi4py. We will not use parallisation in this example, but we still need mpi4py for running PDAF.

```
import pyPDAF.PDAF as PDAF
import mpi4py.MPI as MPI
import numpy as np
# from wurlitzer import pipes
```

### Initialise PDAF¶

The initialisation of PDAF tells PDAF the our choice of data assimilation algorithms, ensemble size, inflation factor, and the dimension of the state vector. In this example, for the sake of simplicity, we do not use any localisation and no inflation is applied.

The **standard error space transform Kalman filter (ESTKF)** is used with **9** ensemble members. We will estimate the state of every model grid point, which gives us a state vector with the size of nx × ny = 36 × 18 = 648

```
# using error space transform Kalman filter (ESTKF)
filtertype = 6
# standard form
subtype = 0
# dimension of the state vector
# if model is parallelised, this is the dimension of state vector on each process
dim_state_p = nx*ny
# number of ensemble members
dim_ens = 9
# forget factor
forget_factor = 1.0
```

In addition to the basic information, PDAF also asks for an **initial ensemble**. This information is given by the user-supplied function. These functions have fixed interface. Therefore

- the input arguments and return variables should not be changed.

The initial ensemble is read from given text files here. In real applications, we may need to use algorithms to generate perturbations to create an initial ensemble. For the sake of conciseness, documentation of the input arguments and return variable of this function can be found in pyPDAF documentation and PDAF documentation

```
def init_ens_pdaf(filtertype, dim_p, dim_ens, state_p, uinv, ens_p, status_pdaf):
"""Here, only ens_p variable matters while dim_p and dim_ens defines the
size of the variables. uinv, state_p are not used in this example.
status_pdaf is used to handle errors which we will not do it in this example.
"""
for i in range(dim_ens):
ens_p[:, i] = np.loadtxt(f'inputs_online/ens_{i+1}.txt').ravel()
return state_p, uinv, ens_p, status_pdaf
```

With these information, we can call PDAF function `PDAF.init`

to initialise the DA system

```
# this gives the verbose level of the PDAF, here we use 3 which is very verbose
screen = 3
# current step of the model which is 0
current_step = 0
# with pipes() as (out, err):
_, _, status = PDAF.init(filtertype, subtype, current_step,
np.array([dim_state_p, dim_ens]),
np.array([forget_factor, ]),
MPI.COMM_WORLD.py2f(), MPI.COMM_WORLD.py2f(),
MPI.COMM_WORLD.py2f(),
task_id=1, n_modeltasks=1, in_filterpe=True,
py__init_ens_pdaf=init_ens_pdaf,
in_screen=screen)
# print PDAF screen output
#print (out.read())
```

PDAF ++++++++++++++++++++++++++++++++++++++++++++++++++++++ PDAF +++ PDAF +++ PDAF +++ Parallel Data Assimilation Framework +++ PDAF +++ +++ PDAF +++ Version 2.1 +++ PDAF +++ +++ PDAF +++ Please cite +++ PDAF +++ L. Nerger and W. Hiller, Computers and +++ PDAF +++ Geosciences, 2013, 55, 110-118, +++ PDAF +++ doi:10.1016/j.cageo.2012.03.026 +++ PDAF +++ when publishing work resulting from using PDAF +++ PDAF ++++++++++++++++++++++++++++++++++++++++++++++++++++++ PDAF: Initialize filter PDAF ++++++++++++++++++++++++++++++++++++++++++++++++++++++ PDAF +++ Error Subspace Transform Kalman Filter (ESTKF) +++ PDAF +++ +++ PDAF +++ Nerger et al., Mon. Wea. Rev. 140 (2012) 2335 +++ PDAF +++ doi:10.1175/MWR-D-11-00102.1 +++ PDAF ++++++++++++++++++++++++++++++++++++++++++++++++++++++ PDAF ESTKF configuration PDAF filter sub-type = 0 PDAF --> Standard ESTKF with ensemble integration PDAF --> Deterministic ensemble transformation PDAF --> Use fixed forgetting factor: 1.00 PDAF --> ensemble size: 9 PDAF: FILTER-PE - model task: 1 ; mype(w)= 0 ; mype(m)= 0 ; mype(f)= 0 ; npes(f)= 1 PDAF: model task 1 mype(m) 0 ; local ensemble size= 9 PDAF: PE(filter) 0 ; local displacements for ensemble= 0 PDAF: Initialize Parallelization PDAF Parallelization - Filter on model PEs: PDAF Total number of PEs: 1 PDAF Number of parallel model tasks: 1 PDAF PEs for Filter: 1 PDAF # PEs per ensemble task and local ensemble sizes: PDAF Task 1 PDAF #PEs 1 PDAF N 9 PDAF: estkf_alloc - allocate eofV of size 9 on pe(f) 0 PDAF: Call routine for ensemble initialization PDAF: Initialization completed PDAF --- duration of PDAF initialization: 0.005 s

### Distribution of the ensemble from PDAF¶

After PDAF initialisation, PDAF should distribute the ensemble back to the model for the following forecast. Three user-supplied functions are needed:

- PDAF should distribute the initial ensemble to the model for following model forecasts (
`distribute_state`

) - The ensemble might need some processing before being distributed to the model (
`initial_process`

) - PDAF will need to know how many steps the forecast will take. In other words, when do we do the next data assimilation analysis. That is, when will the next observation arrive? (
`next_observation`

)

In this example, we define a class of `PDAF_distributor`

for these user-supplied functions.

```
class PDAF_distributor:
def __init__(self, nx, ny, dim_ens):
# counter for the i-th ensemble member when distribute
self.i_ens_pdaf = 0
# define the model field based on the ensemble
self.nx, self.ny = nx, ny
self.field = np.zeros((dim_ens, ny, nx))
```

In this non-parallel code, PDAF will distribute ensemble members one by one with `distribute_state`

method. It is useful to keep a counter for the index of currently distributed ensemble member.

```
class PDAF_distributor(PDAF_distributor):
def distribute_state(self, dim_p, state_p):
"""PDAF will distribute state vector (state_p) to model field
"""
self.field[self.i_ens_pdaf] = state_p[:].reshape((self.ny, self.nx))
self.i_ens_pdaf += 1
return state_p
def reset_ens_index(self):
"""reset ensemble index to 0
"""
self.i_ens_pdaf = 0
```

In the initial-processing, we only show screen output of root mean squared error based on sampled variance.

```
class PDAF_distributor(PDAF_distributor):
def initial_process(self, step, dim_p, dim_ens, dim_ens_p, dim_obs_p, state_p, uinv, ens_p, flag):
"""initial processing of the ensemble before it is distributed to model fields
"""
print (f'RMS error according to sampled variance: {np.sqrt(np.mean(np.var(ens_p, axis=1)))}')
return state_p, uinv, ens_p
```

When obtaining the initial ensemble, users also need to provide information about when do we do the next analysis based on the arrival of the new observations. In our case, we have observations for each time step.

```
class PDAF_distributor(PDAF_distributor):
def next_observation(self, stepnow, nsteps, doexit, time):
# next observation will arrive at `nsteps' steps
nsteps = 2
# doexit = 0 means that PDAF will continue to distribute state
# to model for further integrations
doexit = 0
# model time is not used here as we only use steps to define the time
return nsteps, doexit, time
```

Here, we call pyPDAF's `get_state`

function where it will also execute our user-supplied functions.

```
status = 0
distributor = PDAF_distributor(nx, ny, dim_ens)
distributor.reset_ens_index()
steps_for = 0
doexit = 0
# loop over all dimensions
for i in range(dim_ens):
# with pipes() as (out, err):
steps_for, time, doexit, status = PDAF.get_state(steps_for, doexit,
distributor.next_observation,
distributor.distribute_state,
distributor.initial_process,
status)
# print (out.read())
# put model variable in distributor back to model
field = distributor.field
```

PDAF ---------------------------------------------------------------- PDAF +++++ ASSIMILATION +++++ PDAF ---------------------------------------------------------------- PDAF Call pre-post routine at initial time RMS error according to sampled variance: 0.5434035532444028 PDAF --- duration of prestep: 0.001 s PDAF Forecast ------------------------------------------------------- PDAF Evolve state ensemble

The `get_state`

function returns `steps_for`

variable, which is the number of forecast time steps before the next analysis. This is equal to the `nsteps`

defined in `next_observation`

function.

### Forward data assimilation system¶

Data assimilation combines the model forecast and the observations. Hence, the data assimilation system does two things

- model forecast
- in this example, the
`step`

function is used as we defined previously - data assimilation, where PDAF carries out the following operations based on user-supplied functions
- collecting state vector from model variables
- collecting observations
- distributing analysis back to model variables

At each analysis step, PDAF must collect the new forecast from the model and handle observations. To ensure the flexibility of the framework, these information depends on the user-supplied functions. As observations and models are always case-specific.

Here, we define two classes, `PDAF_collector`

and `Obs`

. The `PDAF_collector`

obtains the model forecast and the `Obs`

will use the `Observation Module Infrastructure`

in PDAF, a scheme to ease the difficulty in handling observations.

```
class PDAF_collector:
def __init__(self, nx, ny, field):
# counter for the i-th ensemble member when distribute
self.i_ens_pdaf = 0
self.nx = nx
self.ny = ny
# define the model field based on the ensemble
self.field = field
class Obs:
def __init__(self, i_obs):
# i_obs-th observations in the system starting from 1
self.i_obs = i_obs
```

#### Collecting forecast¶

Before going into the details of observation handling, we first get functions that collects the model forecast (`collect_state_pdaf`

) and preprocess the ensemble before the data assimilation. In the pre-processing step, we only calculate the forecast error and save the forecast ensemble. After the completion of analysis, the analysis will be postprocessed before being distributed to the model. For example, this could be dealing with bounded variables, or some transformation of the state vector. The pre- and post- processing are handled in (`preprocess`

), where argument `step`

determines whether it is pre-processing or post-processing. `step`

< 0 represents pre-processing while `step`

> 0 represents post-processing.

```
class PDAF_collector(PDAF_collector):
def collect_state(self, dim_p, state_p):
"""PDAF will collect state vector (state_p) from model field
"""
state_p[:] = self.field[self.i_ens_pdaf].ravel()
self.i_ens_pdaf += 1
return state_p
def reset_ens_index(self):
"""reset ensemble index to 0
"""
self.i_ens_pdaf = 0
def preprocess(self, step, dim_ens, ens_p):
"""preprocessing of the ensemble before it is used by DA algorithms
"""
print (f'Forecast RMS error according to sampled variance: {np.sqrt(np.mean(np.var(ens_p, axis=1)))}')
for i in range(dim_ens):
np.savetxt(f'outputs/ens_{i+1}_step{-step}_for.txt' , ens_p[:, i].reshape((self.ny, self.nx)) )
def postprocess(self, step, dim_ens, ens_p):
"""initial processing of the ensemble before it is distributed to model fields
"""
print (f'Analysis RMS error according to sampled variance: {np.sqrt(np.mean(np.var(ens_p, axis=1)))}')
for i in range(dim_ens):
np.savetxt(f'outputs/ens_{i+1}_step{step}_ana.txt' , ens_p[:, i].reshape((self.ny, self.nx)) )
def prepostprocess(self, step, dim_p, dim_ens, dim_ens_p, dim_obs_p, state_p, uinv, ens_p, flag):
if step < 0:
self.preprocess(step, dim_ens, ens_p)
else:
self.postprocess(step, dim_ens, ens_p)
return state_p, uinv, ens_p
```

#### Handling observations¶

Another essential ingredient of data assimilation is observation. Here, user-supplied functions give all information about the observations to PDAF. We use the OMI scheme in PDAF to handle observations. Without any localisations, only two user-supplied functions are required with the OMI scheme.

Before we use the OMI scheme, we need to provide the number of observation types by `PDAF.omi_init`

. Here, we use only one type of observations.

```
PDAF.omi_init(1)
```

In this very simple example, it may look a bit verbose, but it can be useful for more complex systems.

The OMI has four mandatory properties:

`doassim`

: whether this observation is assimilated. If`doassim = 1`

, this observation will be assimilated. If`doassim = 0`

, it will not be assimilated.`disttype`

: In localisation, how do we calculate the distance between grid points? e.g., Cartesian, geographic, or great circle distance on a sphere. We do not use localisation in this example, but we still have to provide this option.`ncoord`

: Number of coordinates used for computation in localisation. In our example, as we have a 2D domain, the number should be 2.`id_obs_p`

: Indices of observed field in state vector. This is a 2D array that should have the same length as the observector vector for each dimension. If the observations do not need interpolation (e.g., observations are co-located with model grid points), the first dimension is 1. In this case, if the i-th observation is at the j-th element of the state vector, the i-th element of`id_obs_p`

is`j`

. If interpolation is needed, each dimension is the adjacent model grid points.

In the PDAF, these properties can be given to derived type `obs_f`

. In the pyPDAF, setter functions are provided. PDAFomi collects the observation vector, error variance, and the spatial coordinate of the observations in the `PDAF.omi_gather_obs`

function. This function returns the dimension of the full observation vector if observation reading are performed in parallel. Therefore, we put this function call in the user-supplied function, `init_dim_obs`

.

```
class Obs(Obs):
def init_dim(self, step, dim_obs):
# We always assimilate the observation
PDAF.omi_set_doassim(self.i_obs, 1)
# Type of distance computation to use for localization
# It is mandatory for OMI even if we don't use localisation
PDAF.omi_set_disttype(self.i_obs, 0)
# Number of coordinates use for distance computation
PDAF.omi_set_ncoord(self.i_obs, 2)
# read observations
obs = np.loadtxt(f'inputs_online/obs_step{step}.txt')
# get the dimension of the model grid
ny, nx = obs.shape
# flatten the observations
obs = obs.ravel()
# a mask for observed gridpoints
condition = np.logical_not(np.isclose(obs, -999))
# observation vector
y = obs[condition]
# The relationship between observation and state vector
# we only have 28 osbervations and each observation corresponds to
# the grid point of one element in the state vector
# id_obs_p gives the indices of observed field in state vector
# the id starts from 1
id_obs_p = np.zeros((1, len(y)))
id_obs_p[0] = np.arange(1, len(obs) + 1, dtype=int)[condition]
PDAF.omi_set_id_obs_p(self.i_obs, id_obs_p)
# inverse of observation variance
ivar_obs_p = 1./0.5/0.5*np.ones_like(y)
# coordinate of each observations
ocoord_p = np.zeros((2, len(y)))
ocoord_p[0] = np.tile(np.arange(nx), ny)[condition]
ocoord_p[1] = np.repeat(np.arange(ny), nx)[condition]
# not being used here, only used for localisation
local_range = 0.
dim_obs = PDAF.omi_gather_obs(self.i_obs, y,
ivar_obs_p, ocoord_p, local_range)
return dim_obs
```

The other user-supplied function in this example will be the observation operator (`obs_op`

). In this simple example, the state vector in the observation space can be conviently obtained by the OMI scheme using the information provided in the `init_dim_obs`

function.

```
class Obs(Obs):
def op(self, step, dim_p, dim_obs_p, state_p, ostate):
"""observation operator
"""
return PDAF.omi_obs_op_gridpoint(self.i_obs, state_p, ostate)
```

#### Forward loop¶

Now, we can write code for the forward DA system:

```
!mkdir -p outputs
```

```
# create directory to
current_step = 0
# full DA system integration loop
while current_step < nsteps:
# model integration
for _ in range(steps_for):
field = step(field)
current_step += 1
# PDAF does assimilation
collector = PDAF_collector(nx, ny, field)
obs = Obs(1)
collector.reset_ens_index()
for i in range(dim_ens):
# with pipes() as (out, err):
status = PDAF.omi_put_state_global(collector.collect_state,
obs.init_dim, obs.op,
collector.prepostprocess)
# print (out.read())
# PDAF distribute analysis back to model
distributor = PDAF_distributor(nx, ny, dim_ens)
distributor.reset_ens_index()
for i in range(dim_ens):
# with pipes() as (out, err):
# here, the distributor does not call init_process function at all
steps_for, time, doexit, status = PDAF.get_state(steps_for, doexit,
distributor.next_observation,
distributor.distribute_state,
distributor.initial_process,
status)
# print (out.read())
field = distributor.field
```

Forecast RMS error according to sampled variance: 0.5434035532444028 PDAF --- duration of forecast phase: 0.180 s PDAF Call pre-post routine after forecast; step 2 PDAF --- duration of prestep: 0.011 s PDAF Analysis ------------------------------------------------------- PDAF 2 Assimilating observations - ESTKF PDAFomi --- Use process-local observations for global filters PDAFomi --- Number of full observations 28 PDAF --- PE-domain 0 dimension of observation vector 28 PDAF Perform ensemble transformation PDAF --- use symmetric square-root of A PDAF --- Compute deterministic Omega PDAF --- Ensemble update: use blocking with size 200 PDAF --- update duration: 0.001 s PDAF Call pre-post routine after analysis step Analysis RMS error according to sampled variance: 0.12016145640351501 PDAF --- duration of poststep: 0.008 s PDAF Forecast ------------------------------------------------------- PDAF Evolve state ensemble PDAF --- duration of forecast phase: 0.001 s PDAF Call pre-post routine after forecast; step 4 Forecast RMS error according to sampled variance: 0.12016145640351501 PDAF --- duration of prestep: 0.010 s PDAF Analysis ------------------------------------------------------- PDAF 4 Assimilating observations - ESTKF PDAFomi --- Use process-local observations for global filters PDAFomi --- Number of full observations 28 PDAF --- PE-domain 0 dimension of observation vector 28 PDAF Perform ensemble transformation PDAF --- use symmetric square-root of A Analysis RMS error according to sampled variance: 0.08690759785000984 PDAF --- Compute deterministic Omega PDAF --- Ensemble update: use blocking with size 200 PDAF --- update duration: 0.001 s PDAF Call pre-post routine after analysis step PDAF --- duration of poststep: 0.009 s PDAF Forecast ------------------------------------------------------- PDAF Evolve state ensemble PDAF --- duration of forecast phase: 0.001 s PDAF Call pre-post routine after forecast; step 6 Forecast RMS error according to sampled variance: 0.08690759785000983 PDAF --- duration of prestep: 0.007 s PDAF Analysis ------------------------------------------------------- PDAF 6 Assimilating observations - ESTKF PDAFomi --- Use process-local observations for global filters PDAFomi --- Number of full observations 28 PDAF --- PE-domain 0 dimension of observation vector 28 PDAF Perform ensemble transformation PDAF --- use symmetric square-root of A PDAF --- Compute deterministic Omega PDAF --- Ensemble update: use blocking with size 200 PDAF --- update duration: 0.001 s PDAF Call pre-post routine after analysis step Analysis RMS error according to sampled variance: 0.0715325428457624 PDAF --- duration of poststep: 0.007 s PDAF Forecast ------------------------------------------------------- PDAF Evolve state ensemble PDAF --- duration of forecast phase: 0.000 s PDAF Call pre-post routine after forecast; step 8 Forecast RMS error according to sampled variance: 0.0715325428457624 PDAF --- duration of prestep: 0.007 s PDAF Analysis ------------------------------------------------------- PDAF 8 Assimilating observations - ESTKF PDAFomi --- Use process-local observations for global filters PDAFomi --- Number of full observations 28 PDAF --- PE-domain 0 dimension of observation vector 28 PDAF Perform ensemble transformation PDAF --- use symmetric square-root of A PDAF --- Compute deterministic Omega PDAF --- Ensemble update: use blocking with size 200 PDAF --- update duration: 0.001 s PDAF Call pre-post routine after analysis step Analysis RMS error according to sampled variance: 0.062203994612229155 PDAF --- duration of poststep: 0.006 s PDAF Forecast ------------------------------------------------------- PDAF Evolve state ensemble PDAF --- duration of forecast phase: 0.000 s PDAF Call pre-post routine after forecast; step 10 Forecast RMS error according to sampled variance: 0.062203994612229155 PDAF --- duration of prestep: 0.007 s PDAF Analysis ------------------------------------------------------- PDAF 10 Assimilating observations - ESTKF PDAFomi --- Use process-local observations for global filters PDAFomi --- Number of full observations 28 PDAF --- PE-domain 0 dimension of observation vector 28 PDAF Perform ensemble transformation PDAF --- use symmetric square-root of A PDAF --- Compute deterministic Omega PDAF --- Ensemble update: use blocking with size 200 PDAF --- update duration: 0.002 s PDAF Call pre-post routine after analysis step Analysis RMS error according to sampled variance: 0.05577468320086351 PDAF --- duration of poststep: 0.007 s PDAF Forecast ------------------------------------------------------- PDAF Evolve state ensemble PDAF --- duration of forecast phase: 0.000 s PDAF Call pre-post routine after forecast; step 12 Forecast RMS error according to sampled variance: 0.055774683200863526 PDAF --- duration of prestep: 0.008 s PDAF Analysis ------------------------------------------------------- PDAF 12 Assimilating observations - ESTKF PDAFomi --- Use process-local observations for global filters PDAFomi --- Number of full observations 28 PDAF --- PE-domain 0 dimension of observation vector 28 PDAF Perform ensemble transformation PDAF --- use symmetric square-root of A PDAF --- Compute deterministic Omega PDAF --- Ensemble update: use blocking with size 200 PDAF --- update duration: 0.001 s PDAF Call pre-post routine after analysis step Analysis RMS error according to sampled variance: 0.05099940102485418 PDAF --- duration of poststep: 0.007 s PDAF Forecast ------------------------------------------------------- PDAF Evolve state ensemble PDAF --- duration of forecast phase: 0.001 s PDAF Call pre-post routine after forecast; step 14 Forecast RMS error according to sampled variance: 0.05099940102485418 PDAF --- duration of prestep: 0.008 s PDAF Analysis ------------------------------------------------------- PDAF 14 Assimilating observations - ESTKF PDAFomi --- Use process-local observations for global filters PDAFomi --- Number of full observations 28 PDAF --- PE-domain 0 dimension of observation vector 28 PDAF Perform ensemble transformation PDAF --- use symmetric square-root of A PDAF --- Compute deterministic Omega PDAF --- Ensemble update: use blocking with size 200 PDAF --- update duration: 0.001 s PDAF Call pre-post routine after analysis step Analysis RMS error according to sampled variance: 0.04727280481992752 PDAF --- duration of poststep: 0.006 s PDAF Forecast ------------------------------------------------------- PDAF Evolve state ensemble PDAF --- duration of forecast phase: 0.000 s PDAF Call pre-post routine after forecast; step 16 Forecast RMS error according to sampled variance: 0.04727280481992752 PDAF --- duration of prestep: 0.007 s PDAF Analysis ------------------------------------------------------- PDAF 16 Assimilating observations - ESTKF PDAFomi --- Use process-local observations for global filters PDAFomi --- Number of full observations 28 PDAF --- PE-domain 0 dimension of observation vector 28 PDAF Perform ensemble transformation PDAF --- use symmetric square-root of A PDAF --- Compute deterministic Omega PDAF --- Ensemble update: use blocking with size 200 PDAF --- update duration: 0.000 s PDAF Call pre-post routine after analysis step Analysis RMS error according to sampled variance: 0.0442597907902762 PDAF --- duration of poststep: 0.007 s PDAF Forecast ------------------------------------------------------- PDAF Evolve state ensemble PDAF --- duration of forecast phase: 0.000 s PDAF Call pre-post routine after forecast; step 18 Forecast RMS error according to sampled variance: 0.0442597907902762 PDAF --- duration of prestep: 0.006 s PDAF Analysis ------------------------------------------------------- PDAF 18 Assimilating observations - ESTKF PDAFomi --- Use process-local observations for global filters PDAFomi --- Number of full observations 28 PDAF --- PE-domain 0 dimension of observation vector 28 PDAF Perform ensemble transformation PDAF --- use symmetric square-root of A PDAF --- Compute deterministic Omega PDAF --- Ensemble update: use blocking with size 200 PDAF --- update duration: 0.001 s PDAF Call pre-post routine after analysis step Analysis RMS error according to sampled variance: 0.041757942674833924 PDAF --- duration of poststep: 0.007 s PDAF Forecast ------------------------------------------------------- PDAF Evolve state ensemble

### Does analysis look better than forecast?¶

```
import matplotlib.pyplot as plt
from matplotlib import animation
import numpy as np
from IPython.display import HTML
dim_ens=9
ny, nx = 18, 36
# define diagnotics and model fields
spread = {'fcst': np.zeros(9), 'ana': np.zeros(9)}
RMSE = {'fcst': np.zeros(9), 'ana': np.zeros(9)}
field = {'truth': np.zeros((ny, nx)),
'fcst': np.zeros((dim_ens, ny, nx)),
'ana': np.zeros((dim_ens, ny, nx))
}
for key in spread:
spread[key][:] = np.nan
RMSE[key][:] = np.nan
# time
time = np.arange(2, 20, 2)
# get figure
fig = plt.figure('err')
w, h = fig.get_size_inches()
fig.set_size_inches(2*w, 2*h)
# define the time series plot
ax = fig.add_subplot(212)
ax.set_title('Time series of the ensemble spread and RMSE')
ax.set_ylim([0., 1.2])
ax.set_xlim([time[0] - 1, time[-1] + 1])
lines = []
for key, c in zip(spread, ['k', 'r']):
line, = ax.plot(time, spread[key], color=c, linestyle='dashed',label=f'{key} spread')
lines.append(line)
line, = ax.plot(time, RMSE[key], color=c, linestyle='solid',label=f'{key} RMSE')
lines.append(line)
ax.legend()
# define pcolormesh plots
ax = {'fcst': fig.add_subplot(221), 'ana': fig.add_subplot(222),}
pc = dict()
for key in ax:
pc[key] = ax[key].pcolormesh(field[key].mean(0) - field['truth'],
cmap='coolwarm', vmin=-.06, vmax=.06)
fig.colorbar(pc[key], ax=ax[key])
def draw_error(i):
"""Draw error at each analysis time step
"""
field['truth'] = np.loadtxt(f'inputs_online/true_step{i}.txt')
for j in range(1, dim_ens + 1):
field['fcst'][j-1] = np.loadtxt(f'outputs/ens_{j}_step{i}_for.txt')
field['ana'][j-1] = np.loadtxt(f'outputs/ens_{j}_step{i}_ana.txt')
for j, key in enumerate(spread):
spread[key][i//2 - 1] = field[key].std(0).mean()
RMSE[key][i//2 - 1] = np.sqrt(np.mean((field[key].mean(0) - field['truth'])**2))
lines[2*j].set_ydata(spread[key])
lines[2*j + 1].set_ydata(RMSE[key])
for key in ax:
ax[key].set_title(f'{key} error ({np.round(RMSE[key][i//2 - 1], 3)})')
pc[key].set_array(field[key].mean(0) - field['truth'])
return pc['fcst'], pc['ana'], *lines
# make an animation
anim = animation.FuncAnimation(fig, draw_error, frames=time, interval=1000, blit=True)
plt.close(fig)
HTML(anim.to_html5_video())
```

```
```