= First Steps with PDAF =

[[PageOutline(2-3,Contents of this page)]]

Once you have downloaded PDAF, a good starting point is to run a
tutorial example. The directory `tutorial/` contains files for tutorials
that demonstrate the implementation and application of PDAF with a simple 2-dimensional example.

Here we describe the steps for a Linux-based computer.

== First Test Case - A Single Analysis Step == 

In this test case, a data assimilation program is used to read an ensemble of model fields from files and compute an analysis step. This is the so-called ''offline coupled mode'' of PDAF. A full description of this test case is provided in the [wiki:PdafTutorial implementation tutorial for the offline mode of PDAF].

=== Compiling ===

We recommend starting with the tutorial `offline_2D_serial`, which performs a
single analysis step on an ensemble of 2D model fields without any
parallelization.

Change to the tutorial directory with
{{{
cd tutorial/offline_2D_serial
}}}
and run
{{{
make PDAF_ARCH=linux_gfortran
}}}
This compiles the assimilation program including PDAF. The compilation
should work on computers running Linux. If the compilation fails, please
see the section [#CompilationProblems Compilation Problems] below.

=== Running ===

Having compiled the program, you can just run it by executing
{{{
./PDAF_offline
}}}
The program reads ensemble files and a file holding the observations from the directory `tutorial/inputs_offline/`. Then, it computes a single analysis step of the ensemble Kalman filter ESTKF. Running the program should only take seconds. The program generates the output files
* `state_ana.txt` (the analysis state)
* `ens_01_ana.txt` to `ens_09_ana.txt` (the analysis ensemble). 

The screen output shows the progress of the program. For example, the ensemble standard deviation before and after the analysis step and final timing and memory information are shown. Lines starting with 'PDAF' are outputs from the core part of PDAF; other lines are from the user routines.
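The ensemble spread can also be inspected from the output files. The following is a minimal sketch, assuming the analysis ensemble files are plain-text 2D arrays readable with `np.loadtxt`, as in the plotting example below. The helper name `ensemble_stats` is hypothetical, and the quantity PDAF prints on the screen may be defined differently.

```python
import glob
import numpy as np

def ensemble_stats(members):
    """Return the ensemble mean field and the RMS of the sample standard deviation."""
    ens = np.stack(members)            # shape: (dim_ens, ny, nx)
    mean = ens.mean(axis=0)
    var = ens.var(axis=0, ddof=1)      # sample variance at each grid point
    return mean, float(np.sqrt(var.mean()))

if __name__ == "__main__":
    # Run in the directory that holds the analysis ensemble files
    files = sorted(glob.glob("ens_*_ana.txt"))
    if len(files) > 1:
        mean, spread = ensemble_stats([np.loadtxt(f) for f in files])
        print(f"{len(files)} members, RMS ensemble standard deviation: {spread:.4f}")
```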

=== Plotting ===

You can plot the analysis state with Python, for example
{{{
import numpy as np
import matplotlib.pyplot as plt

file = 'state_ana.txt'

field = np.loadtxt(file)

plt.pcolor(field)
plt.show()
}}}

Analogously you can plot the observations (`obs.txt`) and the true state (`true.txt`) from which the observations have been generated. These files are in the input directory. In the observation file, only 28 grid points are observed, while non-observed grid points have the value -999.0. To get a meaningful plot, you can specify the color limits by
{{{
plt.clim([-1.5, 1.5])
}}}
before showing the plot. 
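As an alternative to fixing the color limits, the non-observed grid points can be masked before plotting. This is a sketch only: the helper names are hypothetical, and the default path assumes the script is run from `tutorial/offline_2D_serial`.

```python
import numpy as np

def mask_missing(field, missing=-999.0):
    """Mask grid points that carry the missing-value flag."""
    return np.ma.masked_values(field, missing)

def plot_observations(path="../inputs_offline/obs.txt"):
    # Imported here so that mask_missing also works without matplotlib
    import matplotlib.pyplot as plt
    obs = mask_missing(np.loadtxt(path))
    plt.pcolor(obs)        # masked cells are left blank
    plt.colorbar()
    plt.show()
```

Calling `plot_observations()` then shows only the observed grid points, and `mask_missing(...).count()` gives the number of observed points.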


=== Assimilation Options ===

There are various options you can set on the command line to modify the assimilation.

For example you can run
{{{
./PDAF_offline -filtertype 7
}}}
to apply the localized filter LESTKF instead of the global ESTKF.
Without further settings, the localization radius is set to 0.0 so that only
the observed grid points are changed by the assimilation.
You can further set the localization cut-off radius with
{{{
./PDAF_offline -filtertype 7 -cradius 5.0
}}}
Now, the LESTKF is used with a localization radius of 5 grid points. This
localization uses a constant weight of the observation. So you will
see steps in the analysis fields around each observation. To add a
tapering so that observations get less influence for increasing distance,
use
{{{
./PDAF_offline -filtertype 7 -cradius 5.0 -locweight 2
}}}
Now, the filter is applied with the 5th-order polynomial function by
Gaspari and Cohn (1999). As a result you get a smoothly varying analysis field.
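To see how this weight decreases with distance, the function can be evaluated directly. The sketch below implements the 5th-order piecewise polynomial of Gaspari and Cohn (1999). It assumes that the weight reaches zero at the cut-off radius, so the length scale of the function is set to half of `cradius`; how PDAF maps `-cradius` onto the function internally is not shown here, and the function name is hypothetical.

```python
import numpy as np

def gaspari_cohn(dist, cradius):
    """5th-order piecewise polynomial weight, zero at and beyond cradius (cradius > 0)."""
    c = cradius / 2.0                                  # assumption: support radius equals cradius
    z = np.abs(np.atleast_1d(np.asarray(dist, dtype=float))) / c
    w = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)
    zi, zo = z[inner], z[outer]
    w[inner] = -0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3 - (5.0 / 3.0) * zi**2 + 1.0
    w[outer] = (zo**5 / 12.0 - 0.5 * zo**4 + 0.625 * zo**3 + (5.0 / 3.0) * zo**2
                - 5.0 * zo + 4.0 - 2.0 / (3.0 * zo))
    return w

# Weights at distances of 0 to 5 grid points for a cut-off radius of 5
print(gaspari_cohn(np.arange(6), 5.0))
```

The weight is 1 at the observation location and falls smoothly to 0 at the cut-off radius, in contrast to the constant weight, which drops abruptly at the radius.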
You can also change the ensemble size, e.g. running
{{{
./PDAF_offline -dim_ens 5
}}}
to run with an ensemble of 5 model states. (For this test case we only
prepared 9 ensemble files, so only dim_ens<=9 is possible here. Please note that such an ensemble size is usually too small for real cases.)
The standard deviation (RMS error) of the observations is set to 0.5 in the program. To change it to, e.g., 2.0, you can run
{{{
./PDAF_offline -rms_obs 2.0
}}}
Also the inflation can be specified on the command line. PDAF uses the so-called forgetting factor, which is a positive value <=1.0 (the ensemble variance is inflated by the inverse of the forgetting factor). One can specify the forgetting factor as
{{{
./PDAF_offline -forget 0.9
}}}
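The effect of the forgetting factor on the ensemble variance can be illustrated with a few lines of Python. This is only a sketch on synthetic data: PDAF applies the forgetting factor inside the filter algorithm, whereas the hypothetical helper `inflate` below simply rescales the ensemble perturbations.

```python
import numpy as np

def inflate(ens, forget):
    """Scale ensemble perturbations so that the variance grows by the factor 1/forget."""
    if not 0.0 < forget <= 1.0:
        raise ValueError("forget must satisfy 0 < forget <= 1")
    mean = ens.mean(axis=0, keepdims=True)
    return mean + (ens - mean) / np.sqrt(forget)   # the ensemble mean is unchanged

rng = np.random.default_rng(0)
ens = rng.normal(size=(9, 4))      # synthetic: 9 members, 4 state variables
ratio = inflate(ens, 0.9).var(axis=0, ddof=1) / ens.var(axis=0, ddof=1)
print(ratio)                       # every entry is close to 1/0.9
```

A forgetting factor of 1.0 leaves the ensemble unchanged, while smaller values increase the ensemble spread.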
All the different options can be combined. For a complete list of possible options, see the file `mod_assimilation.F90`, which is the source code file in which the default values of the options are declared and explained.




== Second Test Case - A Sequence of Analysis Steps ==

As a second test case, we recommend the tutorial `online_2D_serialmodel`.
This case is again a simple 2D model field, but now coupled to PDAF with
time stepping. This is the so-called ''online coupling'' of PDAF, in which the model code is augmented with data assimilation functionality provided by PDAF. A full description of this test case is provided in the [wiki:PdafTutorial implementation tutorial for the online mode of PDAF].

=== Compiling ===

Change to the tutorial directory with
{{{
cd ../online_2D_serialmodel
}}}
and run
{{{
make model_pdaf PDAF_ARCH=linux_gfortran_openmpi
}}}
This compiles the assimilation program including PDAF. The compilation
should work on computers running Linux, but it requires that OpenMPI
is installed. If it is not installed, please install the package of your Linux distribution that provides it. If the compilation still fails,
please see the section [#CompilationProblems Compilation Problems] below.

=== Running ===

Having compiled the program, you can just run it by executing
{{{
mpirun -np 9 ./model_pdaf -dim_ens 9
}}}
The program computes a sequence of 9 analysis steps with a forecast phase of 2 time steps (thus, two model time steps are computed between subsequent analysis steps). The initial ensemble files are read from the directory `tutorial/inputs_online/`, where the observation files are also stored. The assimilation uses the
ensemble Kalman filter ESTKF. The run should not take more than a few seconds.
The program generates the output files
* `state_stepX_ana.txt` (the analysis state at time step X)
* `ens_01_stepX_ana.txt` to `ens_09_stepX_ana.txt` (the analysis ensemble at time step X)
* `ens_01_stepX_for.txt` to `ens_09_stepX_for.txt` (the forecast ensemble at time step X)

=== Plotting ===

You can plot the analysis fields, as well as the observations, true fields,
and the initial state estimate, as described for the first test case. Note that
the files for the observations and true fields are stored in the directory
`inputs_online/`.

You can plot using Python with, for example
{{{
import numpy as np
import matplotlib.pyplot as plt

file = 'state_step10_ana.txt'

field = np.loadtxt(file)

plt.pcolor(field)
plt.show()
}}}

Analogously, you can also plot the initial model field `state_ini.txt`, the true state (`true_stepX.txt`), or observations (`obs_stepX.txt`) with X the time step.
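Beyond plotting, the analysis states can be compared with the true fields at each analysis time. The following sketch computes the RMS error per step. It assumes that the step label in `true_stepX.txt` is written exactly as in `state_stepX_ana.txt`, and that the script is run from `tutorial/online_2D_serialmodel`; the function names are hypothetical.

```python
import glob
import os
import re
import numpy as np

def rmse(a, b):
    """Root-mean-square difference between two fields of equal shape."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

def rmse_per_step(ana_dir=".", true_dir="../inputs_online"):
    """Map each step label found in ana_dir to the RMS error of the analysis state."""
    errors = {}
    for path in sorted(glob.glob(os.path.join(ana_dir, "state_step*_ana.txt"))):
        match = re.search(r"state_step(\d+)_ana\.txt$", path)
        if match is None:
            continue
        step = match.group(1)      # keep the label exactly as written in the file name
        truth = os.path.join(true_dir, f"true_step{step}.txt")
        if os.path.exists(truth):  # skip steps without a matching true field
            errors[step] = rmse(np.loadtxt(path), np.loadtxt(truth))
    return errors

if __name__ == "__main__":
    for step, err in rmse_per_step().items():
        print(f"step {step}: RMS error {err:.4f}")
```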


=== Assimilation Options ===

The same options as for the first test case can be used here, too. 

In addition, one can specify the
forecast length (number of time steps between two analysis steps) by
{{{
mpirun -np 9 ./model_pdaf -dim_ens 9 -delt_obs 6
}}}
To change the ensemble size to 6 states, you can use for example
{{{
mpirun -np 6 ./model_pdaf -dim_ens 6 -filtertype 7 -cradius 3.0
}}}
This command also selects the LESTKF with a localization radius of 3 grid points.
(Please note: The value given for `-np` must always be equal to the value given for `-dim_ens`.
For this test case we only
prepared 9 initial ensemble files, so only dim_ens<=9 is possible here.)
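When running several experiments from a script, the two constraints above can be enforced while the command line is built. This is a minimal sketch with a hypothetical helper `online_command`; it only assembles the command from the options shown on this page and does not run it.

```python
import shlex

MAX_ENS = 9  # number of initial ensemble files prepared for this test case

def online_command(dim_ens, **options):
    """Build the mpirun command line with -np equal to -dim_ens."""
    if not 1 <= dim_ens <= MAX_ENS:
        raise ValueError(f"dim_ens must be between 1 and {MAX_ENS}")
    cmd = ["mpirun", "-np", str(dim_ens), "./model_pdaf", "-dim_ens", str(dim_ens)]
    for name, value in options.items():
        cmd += [f"-{name}", str(value)]    # e.g. filtertype=7 becomes "-filtertype 7"
    return cmd

print(shlex.join(online_command(6, filtertype=7, cradius=3.0)))
```

The returned list can be passed to `subprocess.run` to execute the experiment.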





== Compilation Problems ==

For the compilation, you need `make`. This should be installed on any computer running Linux, Unix or MacOS.
If it is missing, please install `make` before compiling.

The compilation might fail with an error mentioning `blas` or `lapack`.
These are libraries for matrix computations, which are used by PDAF for
performance reasons. Both libraries are usually installed on Linux
computers. If these libraries are missing, please install them from the
Linux packages of your Linux distribution. Then compile the tutorial
example again.

For compilation on computers other than standard Linux or with a
compiler other than gfortran, the directory `make.arch/` provides
include files for the compilation. To find a suitable include file,
look into the directory `make.arch/`. There are files for
compilation on different computers with different variants of the parallelization library MPI. If you don't find a suitable include file, you can copy
an existing file and edit it for your needs. To specify the include file
for the compilation, set it when running make as
{{{
make PDAF_ARCH=FILENAME_WITHOUT_.h
}}}
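To list the values that can be passed as `PDAF_ARCH`, the include files can be enumerated without their `.h` suffix. The sketch below uses a hypothetical helper; the default path `make.arch` is an assumption and has to be adjusted to the location of this directory relative to where the script is run.

```python
from pathlib import Path

def available_archs(arch_dir="make.arch"):
    """Return the include-file names in arch_dir without the .h suffix."""
    return sorted(p.stem for p in Path(arch_dir).glob("*.h"))

if __name__ == "__main__":
    for name in available_archs():
        print(f"make PDAF_ARCH={name}")
```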

On MacOS, gfortran is usually not installed. You can install it, e.g., using Homebrew (brew.sh). Likewise, you can install OpenMPI. Setting `PDAF_ARCH=osx_gfortran_openmpi` should then work to compile the test cases.


== Next steps ==

Having done your first experiments with PDAF, possible next steps can be the following:

=== Applying data assimilation with PDAF to the Lorenz-96 model ===

The implementation of [wiki:Lorenz_96_model PDAF with the Lorenz-96 model] is a fully-featured example of PDAF coupled to this small model. In addition to the Lorenz-96 model and the assimilation user routines, the example provides tools for generating synthetic observations and a covariance matrix for ensemble generation, as well as plotting scripts. We used this implementation for different publications in which we studied the behavior of different assimilation methods.

=== Applying data assimilation with PDAF to the Lorenz-63 model ===

The implementation of [wiki:Lorenz_63_model PDAF with the Lorenz-63 model] is a fully-featured example of PDAF coupled to this small model. In addition to the Lorenz-63 model and the assimilation user routines, the example provides tools for generating synthetic observations and a covariance matrix for ensemble generation, as well as plotting scripts. The model only has 3 state variables, so it is too small for localization, but it is a good example for applying the particle filter. We used this implementation for our publication on the hybrid nonlinear-Kalman filter, LKNETF.

=== Implementing PDAF with your model ===

If you plan to couple PDAF with your model, we recommend studying the [wiki:PdafTutorial PDAF implementation tutorials]. They provide a step-by-step explanation of the implementation steps. Further, please see the description of the [wiki:GeneralImplementationConcept Implementation Concept of PDAF]. PDAF also provides tools to [wiki:EnsembleGeneration generate an ensemble].

Beyond the more applied tutorials, there are also the Implementation Guides for [wiki:OnlineImplementationGuide_PDAF3 online coupled] and [wiki:OfflineImplementationGuide_PDAF3 offline coupled] data assimilation systems.

=== Model Couplings ===

Coupling routines (also called 'model bindings') for different models, and related user routines for the data assimilation with PDAF, are available in different repositories at [https://github.com/PDAF].

Current model bindings include the MITgcm general circulation model, the NEMO ocean model, the AWI climate model, the FESOM ocean model, and WRF. See the page [wiki:ModelsConnectedToPDAF List of models connected to PDAF].

=== Full Overview of the PDAF Code Package ===

For an overview of everything contained in the PDAF package, please see the [wiki:SoftwarePackage full description of the PDAF code package].