First Steps with PDAF
When you have downloaded PDAF, a good starting point is to run a tutorial example. The directory tutorial/ contains files for tutorials demonstrating the implementation and application of PDAF with a simple 2-dimensional example.
First Test Case - A Single Analysis Step
In this test case, a data assimilation program is used to read an ensemble of model fields from files and compute an analysis step. This is the so-called offline coupling mode of PDAF. A full description of this test case is provided in the implementation tutorial for the offline mode of PDAF.
Compiling
We recommend starting with the tutorial offline_2D_serial, which computes a single analysis step on an ensemble of 2D model fields without any parallelization.
Change to the tutorial directory with
cd tutorial/offline_2D_serial
and run
make PDAF_ARCH=linux_gfortran
This compiles the assimilation program including PDAF. The compilation should work on computers running Linux. If the compilation fails, please see the section Compilation Problems below.
Running
Having compiled the program, you can just run it by executing
./PDAF_offline
The program reads the ensemble files and a file holding the observations from the directory tutorial/inputs_offline/. Then it computes a single analysis step of the ensemble Kalman filter ESTKF. Running the program should not take more than a second. The program generates the output files
state_ana.txt (the analysis state)
and
ens_01_ana.txt to ens_09_ana.txt (the analysis ensemble).
The screen output shows the progress of the program. For example, it shows the ensemble standard deviation before and after the analysis step, as well as final timing and memory information. Lines starting with 'PDAF' are output by PDAF itself; all other lines come from the user routines.
Plotting
To plot, e.g., the analysis field, you can use Matlab or Octave and run
load state_ana.txt
pcolor(state_ana)
You can also plot the initial ensemble mean field by
cd ../inputs_offline
load state_ini.txt
pcolor(state_ini)
Analogously, you can plot the observations (obs.txt) and the true state (true.txt) from which the observations have been generated. In the observation file, only 28 grid points are observed, while non-observed grid points have the value -999.0. To get a meaningful plot, you can specify the limits for the color map by
set(gca,'clim',[-2 2])
Alternatively, you can plot using Python with, e.g.,
import numpy as np
import matplotlib.pyplot as plt
file = 'state_ana.txt'
field = np.loadtxt(file)
plt.pcolor(field)
plt.show()
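When plotting the observation file with Python, the -999.0 flags can be masked rather than hidden by the color limits. The following is a minimal sketch, assuming numpy and matplotlib are installed. The helper names mask_missing, load_observations and plot_observations are our own illustration and not part of PDAF.

```python
import numpy as np

MISSING_VALUE = -999.0  # flag for non-observed grid points, as stated above


def mask_missing(field, missing=MISSING_VALUE):
    """Return a masked array in which the non-observed grid points are hidden."""
    return np.ma.masked_values(np.asarray(field, dtype=float), missing)


def load_observations(path, missing=MISSING_VALUE):
    """Read an observation file such as obs.txt and mask the missing values."""
    return mask_missing(np.loadtxt(path), missing)


def plot_observations(obs, vmin=-2.0, vmax=2.0):
    """Plot a masked observation field with fixed color limits."""
    import matplotlib.pyplot as plt  # imported here so that loading works without it

    plt.pcolor(obs, vmin=vmin, vmax=vmax)  # masked cells are left blank
    plt.colorbar()
    plt.show()
```

Run from tutorial/offline_2D_serial, a call like plot_observations(load_observations('../inputs_offline/obs.txt')) should then show only the observed grid points.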
Assimilation Options
There are various options you can set to modify the assimilation. For example, you can run
./PDAF_offline -filtertype 7
With this setting, the localized filter LESTKF is used instead of the global ESTKF. Without further settings, the localization radius is set to 0 so that only the observed grid points are changed by the assimilation. You can further set the localization radius with
./PDAF_offline -filtertype 7 -local_range 5
Now the LESTKF is used with a localization radius of 5 grid points. This localization still applies a constant weight to the observations within the radius, so you will see steps in the analysis fields around each observation. To add a tapering so that observations get less influence with increasing distance, use
./PDAF_offline -filtertype 7 -local_range 5 -locweight 2
Now, the filter is applied with the 5th-order polynomial function by Gaspari and Cohn. As a result you get a smoothly changing analysis field. You can also change the ensemble size, e.g. running
./PDAF_offline -dim_ens 5
to run with an ensemble of 5 model states. (For this test case we only prepared 9 ensemble files, so only dim_ens<=9 is possible here.) The standard deviation (RMS error) of the observations is set to 0.5 in the program. To change it to, e.g., 2.0, you would run
./PDAF_offline -rms_obs 2.0
The inflation can also be specified on the command line. PDAF uses the so-called forgetting factor, which is a positive value <=1 (the ensemble variance is inflated by the inverse of the forgetting factor). One can specify the forgetting factor as
./PDAF_offline -forget 0.9
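The effect of the forgetting factor on the ensemble variance can be illustrated outside of PDAF. The following sketch is not PDAF's implementation; it only assumes the statement above that the variance is inflated by the inverse of the forgetting factor, and applies it to a made-up ensemble of values at a single grid point. The function name inflate_ensemble is hypothetical.

```python
from statistics import mean, variance


def inflate_ensemble(values, forget):
    """Scale the ensemble perturbations so that the variance grows by 1/forget."""
    if not 0.0 < forget <= 1.0:
        raise ValueError("forgetting factor must satisfy 0 < forget <= 1")
    ens_mean = mean(values)
    scale = forget ** -0.5  # perturbations scale with the square root of the variance factor
    return [ens_mean + scale * (v - ens_mean) for v in values]


ensemble = [0.8, 1.1, 0.9, 1.3, 0.9]  # made-up values at one grid point
inflated = inflate_ensemble(ensemble, forget=0.9)
print(variance(inflated) / variance(ensemble))  # close to 1/0.9
```

The ensemble mean is unchanged; only the spread around it grows. With forget=1.0 no inflation is applied.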
All the different options can be combined. For a complete list of possible options, see the file init_pdaf_offline.F90, which is the source code file in which the default values of the options are specified.
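To see why the tapering selected with -locweight 2 removes the steps in the analysis field, one can compare a constant weight with the 5th-order piecewise polynomial of Gaspari and Cohn. The sketch below is for illustration only. It assumes that the weight reaches zero exactly at the localization radius; how PDAF maps -local_range onto the support of the function is not described on this page. The function names are our own.

```python
def constant_weight(distance, radius):
    """Weight 1 inside the localization radius, 0 outside (a step function)."""
    return 1.0 if abs(distance) <= radius else 0.0


def gaspari_cohn(distance, radius):
    """5th-order piecewise polynomial of Gaspari and Cohn.

    Assumption: the weight drops to zero at `radius`, i.e. the length
    scale of the polynomial is radius / 2.
    """
    if radius <= 0.0:
        return 1.0 if distance == 0.0 else 0.0
    r = abs(distance) / (radius / 2.0)
    if r <= 1.0:
        w = -0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3 - (5.0 / 3.0) * r**2 + 1.0
    elif r < 2.0:
        w = (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3 + (5.0 / 3.0) * r**2
             - 5.0 * r + 4.0 - 2.0 / (3.0 * r))
    else:
        w = 0.0
    return max(w, 0.0)  # guard against tiny negative round-off near the edge


# Compare both weights for a localization radius of 5 grid points
for d in range(7):
    print(d, constant_weight(d, 5.0), round(gaspari_cohn(d, 5.0), 4))
```

The constant weight jumps from 1 to 0 at the radius, while the polynomial weight decreases smoothly from 1 at the observation location to 0 at the radius.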
Second Test Case - A Sequence of Analysis Steps
As a second test case, we recommend to look at the tutorial online_2D_serialmodel. This case is again a simple 2D model field, but now coupled to PDAF with time stepping. This is the so-called online-coupling of PDAF, in which the model code is augmented with data assimilation functionality provided by PDAF. A full description of this test case is provided in the implementation tutorial for the online mode of PDAF.
Compiling
Change to the tutorial directory with
cd ../online_2D_serialmodel
and run
make cleanall
and then
make model_pdaf PDAF_ARCH=linux_gfortran_openmpi
This compiles the assimilation program including PDAF. The compilation should work for computers running Linux, but it requires that OpenMPI is installed on the computer. If it's not installed, please install it using the Linux package providing it. If the compilation still fails, please see below the section Compilation Problems.
Running
Having compiled the program, you can just run it by executing
mpirun -np 9 ./model_pdaf -dim_ens 9
The program computes a sequence of 9 analysis steps with two model time steps between subsequent analysis steps. The initial ensemble is read from the directory tutorial/inputs_online/, where the observation files are also stored. The assimilation uses the ensemble Kalman filter ESTKF. The run should not take more than a few seconds. The program generates the output files
state_stepX_ana.txt (the analysis state at time step X),
ens_01_stepX_ana.txt to ens_09_stepX_ana.txt (the analysis ensemble at time step X),
and
ens_01_stepX_for.txt to ens_09_stepX_for.txt (the forecast ensemble at time step X).
Plotting
You can plot the analysis fields, but also the observations, true fields and the initial state estimate as described in the first test case. However, the files for the observations and true fields are stored in the directory inputs_online/.
To plot the analysis field at time step 10, you can do
load state_step10_ana.txt
pcolor(state_step10_ana)
You can also plot the initial model field by
cd ../inputs_online
load state_ini.txt
pcolor(state_ini)
The directory inputs_online/ also contains files for the true state at time steps 1 to 18. For example, you can plot the true state at time step 15 with
load true_step15.txt
pcolor(true_step15)
Analogously, you can plot the observations (obs_stepX.txt) at time step X. In the observation file, observation gaps are indicated by the value -999.0. So to get a meaningful plot, you can specify the limits for the color map by
set(gca,'clim',[-2 2])
Assimilation Options
The same options as for the first test case can be used here, too. In addition, one can specify the forecast length (the number of time steps between two analysis steps) by
mpirun -np 9 ./model_pdaf -dim_ens 9 -delt_obs 6
To change the ensemble size to 6 states, you can use for example
mpirun -np 6 ./model_pdaf -dim_ens 6 -filtertype 5 -local_range 3
which chooses to run the LETKF with a localization radius of 3 grid points.
(Please note: The value given for -np must always be equal to the value given for -dim_ens. For this test case we only prepared 9 initial ensemble files, so only dim_ens<=9 is possible here.)
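If you start the online example from a script, the two values can be tied together so that they cannot diverge. The following sketch builds the command line from a single ensemble size. The function model_pdaf_command and the constant MAX_ENSEMBLE_SIZE are hypothetical helpers for illustration, not part of PDAF; the option names are the ones used above.

```python
MAX_ENSEMBLE_SIZE = 9  # number of initial ensemble files prepared for this test case


def model_pdaf_command(dim_ens, options=None):
    """Return the mpirun argument list with -np tied to -dim_ens."""
    if not 1 <= dim_ens <= MAX_ENSEMBLE_SIZE:
        raise ValueError(f"dim_ens must be between 1 and {MAX_ENSEMBLE_SIZE}")
    cmd = ["mpirun", "-np", str(dim_ens), "./model_pdaf", "-dim_ens", str(dim_ens)]
    # Append further options such as filtertype or local_range in the given order
    for name, value in (options or {}).items():
        cmd += ["-" + name, str(value)]
    return cmd


print(" ".join(model_pdaf_command(6, {"filtertype": 5, "local_range": 3})))
```

The returned list can be passed to subprocess.run(cmd, check=True) when the script is started in the directory tutorial/online_2D_serialmodel.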
Compilation Problems
For the compilation, you need make. It should be installed on any computer running Linux, Unix or OSX. If it is missing, you cannot compile and should install make.
The compilation might fail with an error mentioning blas or lapack. These are libraries for matrix computations, which are used by PDAF for performance reasons. Both libraries are usually installed on Linux computers. If these libraries are missing, please install them from the packages of your Linux distribution. Then compile the tutorial example again.
For compilation on computers other than standard Linux, or with a compiler other than gfortran, the directory make.arch/ provides include files for the compilation. To check for a suitable include file, look into the directory make.arch/. There are files for compilation on different computers both with and without parallelization (MPI). If you don't find a suitable include file, you can also copy an existing one and edit it for your needs. To specify the include file for the compilation, you just need to set it when running make as
make PDAF_ARCH=FILENAME_WITHOUT_.h
On MacOS, gfortran is usually not installed. You can install it using the ports of Linux software from Fink or MacPorts. Likewise, you can install OpenMPI. Then setting PDAF_ARCH=osx_gfortran or PDAF_ARCH=osx_gfortran_openmpi should work to compile the test cases.
Next steps
Possible next steps from here can be the following:
Implementing PDAF with your model
If you plan to couple PDAF with your model, we recommend studying the PDAF implementation tutorials. The tutorials provide a step-by-step explanation of the implementation. Further, please see the description of the Implementation Concept of PDAF. PDAF also provides tools to generate an ensemble.
Model Bindings
With PDAF 1.13, the PDAF package provides a model binding for the MITgcm general circulation model. This code provides an implementation of PDAF with the MITgcm model for a simple test case. Please look into the directory modelbindings/MITgcm/ of the PDAF package, where a README file describes the use of the model binding code.
Full overview of PDAF code Package
To get an overview of what is contained in the PDAF package, please see the full description of the PDAF code package.