Changes between Initial Version and Version 1 of AdaptParallelizationuntilOnePointSeven


Timestamp:
Feb 6, 2012, 9:13:45 AM (12 years ago)
Author:
lnerger
Comment:

--

  • AdaptParallelizationuntilOnePointSeven

    v1 v1  
     1= Adapting a model's parallelization for PDAF =
     2
     3'''This version of the documentation is valid for PDAF until and including V1.7. For the most recent version see [AdaptParallelization here].'''
     4
     5{{{
     6#!html
     7<div class="wiki-toc">
     8<h4>Implementation Guide (<=V1.7)</h4>
     9<ol><li><a href="ImplementationGuideuntilOnePointSeven">Main page</a></li>
     10<li>Adaptation of the parallelization</li>
     11<li><a href="InitPdafuntilOnePointSeven">Initialization of PDAF</a></li>
     12<li><a href="ModifyModelforEnsembleIntegrationuntilOnePointSeven">Modifications for ensemble integration</a></li>
     13<li><a href="ImplementationofAnalysisStepuntilOnePointSeven">Implementation of the analysis step</a></li>
     14<li><a href="AddingMemoryandTimingInformationuntilOnePointSeven">Memory and timing information</a></li>
     15</ol>
     16</div>
     17}}}
     18
     19[[PageOutline(2-3,Contents of this page)]]
     20
     21== Overview ==
     22
     23Like many numerical models, PDAF uses the MPI standard for the parallelization. In the description below, we assume that the model is parallelized using MPI.
     24
     25PDAF supports a 2-level parallelization: First, the numerical model can be parallelized and can be executed using several processors. Second, several model tasks can be computed in parallel, i.e. a parallel ensemble integration can be performed. This 2-level parallelization has to be initialized before it can be used. The template directory `templates/` contains the file `init_parallel_pdaf.F90`, which can be used as a template for the initialization. The required variables are defined in `mod_parallel.F90`, which is stored in the same directory and can also be used as a template. If the numerical model itself is parallelized, this parallelization has to be adapted and modified for the 2-level parallelization of the data assimilation system generated by adding PDAF to the model. The necessary steps are described below.
     26
     27
     28== Three communicators ==
     29
     30MPI uses so-called 'communicators' to define sets of parallel processes. In order to provide the 2-level parallelism, three communicators need to be initialized that define the processes that are involved in different tasks of the data assimilation system.
     31The required communicators are initialized in the routine `init_parallel_pdaf` and are called
     32 * `COMM_model` - defines the processes that are involved in the model integrations
     33 * `COMM_filter` - defines the processes that perform the filter analysis step
     34 * `COMM_couple` - defines the processes that are involved when data are transferred between the model and the filter
     35
     36The parallel region of an MPI parallel program is initialized by calling `MPI_init`.  By calling `MPI_init`, the communicator `MPI_COMM_WORLD` is initialized. This communicator is pre-defined by MPI to contain all processes of the MPI-parallel program. Often it is sufficient to conduct all parallel communication using only `MPI_COMM_WORLD`. Thus, numerical models often use only this communicator to control all communication. However, as `MPI_COMM_WORLD` contains all processes of the program, this approach will not allow for parallel model tasks. In order to allow parallel model tasks, it is required to replace `MPI_COMM_WORLD` by an alternative communicator that is split for the model tasks. We will denote this communicator `COMM_model`. If a model code already uses a communicator distinct from `MPI_COMM_WORLD`, it should be possible to use that communicator.
     37
     38== Using COMM_model ==
     39
     40Frequently the parallelization is initialized in the model by the lines:
     41{{{
     42      CALL MPI_Init(ierr)
     43      CALL MPI_Comm_Rank(MPI_COMM_WORLD, rank, ierr)
     44      CALL MPI_Comm_Size(MPI_COMM_WORLD, size, ierr)
     45}}}
     46(The call to `MPI_Init` is mandatory, while the second and third lines are optional.) If the model itself is not parallelized, the MPI initialization will not be present. Please see the section '[#Non-parallelmodels Non-parallel models]' below for this case.
     47
     48Subsequently, one can define `COMM_model` by adding
     49{{{
     50      COMM_model = MPI_COMM_WORLD
     51}}}
     52In addition, the variable `COMM_model` has to be declared such that all routines using the communicator can access it. The parallelization variables of the model are frequently held in a module. In this case, it is easiest to add `COMM_model` as an integer variable to this module. (The example declares `COMM_model` and other parallelization-related variables in `mod_parallel.F90`.)
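
As a minimal sketch, assuming the model keeps its parallelization variables in a module, the declaration could look as follows. The module name `model_parallel_mod` is a hypothetical placeholder; `rank` and `size` are the names from the initialization example above and may differ in a real model.
{{{
MODULE model_parallel_mod      ! hypothetical name of the model's own module
  IMPLICIT NONE
  SAVE

  INTEGER :: rank              ! Rank of the calling process
  INTEGER :: size              ! Number of processes
  INTEGER :: COMM_model        ! Communicator for all model communication
  INTEGER :: ierr              ! MPI error flag
END MODULE model_parallel_mod
}}}
Every routine that performs MPI communication can then access `COMM_model` through a `USE` statement for this module.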
     53
     54Having defined the communicator `COMM_model`, the communicator `MPI_COMM_WORLD` has to be replaced by `COMM_model` in all routines that perform MPI communication, except in calls to `MPI_init`, `MPI_finalize`, and `MPI_abort`.
     55The changes described so far must not influence the execution of the model itself. Thus, after these changes, one should ensure that the model compiles and runs correctly.
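
The replacement can be illustrated with a small stand-alone program. This is only a sketch under stated assumptions: the program name, the variable `nprocs` (used instead of `size` to avoid hiding the Fortran intrinsic) and the summed quantity are invented for the illustration. Only `MPI_Init` and `MPI_Finalize` are left without a communicator argument; the other MPI calls use `COMM_model`.
{{{
PROGRAM comm_model_demo        ! hypothetical example program
  USE mpi
  IMPLICIT NONE

  INTEGER :: COMM_model        ! Communicator used for all model communication
  INTEGER :: rank, nprocs, ierr
  DOUBLE PRECISION :: sum_local, sum_global

  CALL MPI_Init(ierr)
  COMM_model = MPI_COMM_WORLD  ! At this stage identical to MPI_COMM_WORLD

  ! All further calls use COMM_model instead of MPI_COMM_WORLD
  CALL MPI_Comm_rank(COMM_model, rank, ierr)
  CALL MPI_Comm_size(COMM_model, nprocs, ierr)

  ! Sum a process-local value over the processes in COMM_model
  sum_local = DBLE(rank + 1)
  CALL MPI_Allreduce(sum_local, sum_global, 1, MPI_DOUBLE_PRECISION, &
       MPI_SUM, COMM_model, ierr)

  IF (rank == 0) WRITE (*,*) 'Sum over', nprocs, 'processes:', sum_global

  CALL MPI_Finalize(ierr)
END PROGRAM comm_model_demo
}}}
As long as `COMM_model` equals `MPI_COMM_WORLD`, the program behaves exactly as it would with `MPI_COMM_WORLD` in every call. The sum is restricted to a single model task only once `COMM_model` is redefined.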
     56
     57== Initializing the communicators ==
     58
     59Having replaced `MPI_COMM_WORLD` by `COMM_model`, the model integration can be split into parallel model tasks. For this, the communicator `COMM_model` has to be redefined. This is performed by the routine `init_parallel_pdaf`, which is supplied with the PDAF package. The call to this routine should be added to the model, usually directly after the initialization of the parallelization described above.
     60The routine `init_parallel_pdaf` also defines the communicators `COMM_filter` and `COMM_couple` that were described above. The provided routine `init_parallel_pdaf` is a template implementation. Thus, it has to be adjusted for the model under consideration. In particular, one needs to ensure that the routine can access the variable `COMM_model` as well as `rank` and `size` (see the initialization example above; these variables might have different names in a model). If the model defines these variables in a module, a `USE` statement can be added to `init_parallel_pdaf`, as is already done for `mod_parallel`.
     61
     62The routine `init_parallel_pdaf` splits the communicator `MPI_COMM_WORLD` and (re-)defines `COMM_model`. If multiple parallel model tasks are used, by setting `n_modeltasks` to a value above 1, `COMM_model` will actually be a set of communicators with one for each model task. In addition, the variables `npes_world` and `mype_world` are defined. If the model uses different names for these quantities, like `rank` and `size`, the model-specific variables should be re-initialized at the end of `init_parallel_pdaf`.
     63The routine defines several more variables that are declared and held in the module `mod_parallel`. It can be useful to use this module with the model code as some of these variables are required when the initialization routine of PDAF (`PDAF_init`) is called.
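
The idea behind the splitting can be sketched with `MPI_Comm_split`. The program below is a simplified illustration and not the code of the template routine. It assumes that the number of processes is a multiple of `n_modeltasks`, that consecutive world ranks form one model task, and that the processes of the first model task perform the filter analysis. The program name and the variables `npes_per_task` and `color_filter` are hypothetical.
{{{
PROGRAM split_demo             ! hypothetical example program
  USE mpi
  IMPLICIT NONE

  INTEGER, PARAMETER :: n_modeltasks = 2   ! Assumed number of model tasks
  INTEGER :: mype_world, npes_world        ! Rank and size in MPI_COMM_WORLD
  INTEGER :: COMM_model, COMM_filter, COMM_couple
  INTEGER :: mype_model, npes_model        ! Rank and size in COMM_model
  INTEGER :: npes_per_task, task_id, color_filter, ierr
  LOGICAL :: filterpe

  CALL MPI_Init(ierr)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, mype_world, ierr)
  CALL MPI_Comm_size(MPI_COMM_WORLD, npes_world, ierr)

  ! Simplification of this sketch: only equally sized model tasks
  IF (MOD(npes_world, n_modeltasks) /= 0) THEN
     IF (mype_world == 0) WRITE (*,*) 'npes_world must be a multiple of n_modeltasks'
     CALL MPI_Abort(MPI_COMM_WORLD, 1, ierr)
  END IF
  npes_per_task = npes_world / n_modeltasks

  ! Index of the model task, ranging from 1 to n_modeltasks
  task_id = mype_world / npes_per_task + 1

  ! COMM_model: one communicator per model task
  CALL MPI_Comm_split(MPI_COMM_WORLD, task_id, mype_world, COMM_model, ierr)
  CALL MPI_Comm_rank(COMM_model, mype_model, ierr)
  CALL MPI_Comm_size(COMM_model, npes_model, ierr)

  ! COMM_filter: here the processes of model task 1 (assumption of this sketch);
  ! all other processes receive MPI_COMM_NULL
  filterpe = (task_id == 1)
  color_filter = MPI_UNDEFINED
  IF (filterpe) color_filter = 1
  CALL MPI_Comm_split(MPI_COMM_WORLD, color_filter, mype_world, COMM_filter, ierr)

  ! COMM_couple: processes that have the same rank within their model task
  CALL MPI_Comm_split(MPI_COMM_WORLD, mype_model, mype_world, COMM_couple, ierr)

  WRITE (*,'(a,i3,a,i3,a,i3,a,l2)') 'world rank', mype_world, '  task', task_id, &
       '  model rank', mype_model, '  filterpe', filterpe

  CALL MPI_Finalize(ierr)
END PROGRAM split_demo
}}}
Started with 4 processes, this sketch forms two model tasks of two processes each. Every process then finds its rank within its own model task in `mype_model`.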
     64
     65== Arguments of `init_parallel_pdaf` ==
     66
     67The routine `init_parallel_pdaf` has two arguments, which are the following:
     68{{{
     69SUBROUTINE init_parallel_pdaf(dim_ens, screen)
     70}}}
     71 * `dim_ens`: An integer defining the ensemble size. This allows checking the consistency of the ensemble size with the number of processes of the program. If the ensemble size is specified after the call to `init_parallel_pdaf` (as in the example), it is recommended to set this argument to 0. In this case no consistency check is performed.
     72 * `screen`: An integer defining whether information output is written to the screen (i.e. standard output). The following choices are available:
     73  * 0: quiet mode - no information is displayed.
     74  * 1: Display standard information about the configuration of the processes (recommended)
     75  * 2: Display detailed information for debugging
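
As an illustration, a call that skips the consistency check and displays the standard information could look like the fragment below. It is meant to be placed in the model code after the MPI initialization. Variables instead of literal constants are passed here so that the call does not depend on the intent declared for the arguments in the adjusted routine.
{{{
      ! Declarations (in the specification part of the calling routine)
      INTEGER :: dim_ens   ! Ensemble size used for the consistency check
      INTEGER :: screen    ! Verbosity of the screen output

      dim_ens = 0          ! 0: ensemble size is set later, no consistency check
      screen  = 1          ! 1: standard information (recommended)
      CALL init_parallel_pdaf(dim_ens, screen)
}}}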
     76
     77
     78== Compiling the extended program ==
     79
     80This completes the adaptation of the parallelization. The compilation of the model has to be adjusted for the added files holding the routine `init_parallel_pdaf` and the module `mod_parallel`. One can test the extension by running the compiled model. It should run as it did without these changes, because `mod_parallel` defines by default that a single model task is executed (`n_modeltasks=1`). If `screen` is set to 1 in the call to `init_parallel_pdaf`, the standard output should include lines like
     81{{{
     82 PDAF: Initializing communicators
     83
     84                  PE configuration:
     85   world   filter     model        couple     filterPE
     86   rank     rank   task   rank   task   rank    T/F
     87  ----------------------------------------------------------
     88     0       0      1      0      1      0       T
     89     1       1      1      1      2      0       T
     90     2       2      1      2      3      0       T
     91     3       3      1      3      4      0       T
     92}}}
     93These lines show the configuration of the communicators. This example was executed using 4 processes and `n_modeltasks=1`.
     94
     95
     96To test parallel model tasks one has to set the variable `n_modeltasks` to a value larger than one. Now, the model will execute parallel model tasks. This can result in the following effects:
     97 * The standard screen output of the model can be shown multiple times. This is due to the fact that often the process with `rank=0` performs screen output. By splitting the communicator `COMM_model`, there will be as many processes with rank 0 as there are model tasks.
     98 * Each model task might write file output. This can lead to the case that several processes try to generate the same file or try to write into the same file. In the extreme case this can result in a program crash. For this reason, it might be useful to restrict the file output to a single model task. This can be implemented using the variable `task_id`, which is initialized by `init_parallel_pdaf` and holds the index of the model task ranging from 1 to `n_modeltasks`. (For the ensemble assimilation, it can be useful to switch off the regular file output of the model completely. As each model task holds only a single member of the ensemble, this output might not be useful. In this case, the file output for the state estimate and perhaps all ensemble members should be done in the pre/poststep routine of the assimilation system.)
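
Restricting the output to one model task could be sketched as follows. The routines `model_output_step` and `write_model_output` are hypothetical stand-ins for the model's own output code, and the sketch assumes that `task_id` is accessible through the module `mod_parallel`.
{{{
SUBROUTINE model_output_step()          ! hypothetical wrapper routine
  USE mod_parallel, ONLY: task_id       ! assumes task_id is held in mod_parallel
  IMPLICIT NONE

  ! Only the first model task writes the regular model output
  IF (task_id == 1) THEN
     CALL write_model_output()          ! hypothetical output routine of the model
  END IF
END SUBROUTINE model_output_step
}}}
The same condition can be combined with the usual check for rank 0 so that the screen output appears only once.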
     99
     100== Non-parallel models ==
     101
     102If the numerical model is not parallelized (i.e. serial), there are two possibilities: The data assimilation system can be used without parallelization (serial), or parallel model tasks can be used in which each model task uses a single process. Both variants are described below.
     103
     104=== Serial assimilation system ===
     105
     106The data assimilation program can be compiled for serial processing without linking a real MPI library. Since the PDAF code contains calls to MPI functions, the file `nullmpi.F90`, which is available in the directory `templates`, should be compiled and linked. An example of this is given in `make.arch/linux_gfortran.h`. `nullmpi.F90` provides the functionality of the MPI functions for the case that only a single process is used and hence no real communication is performed.
     107
     108Even without parallelization, the call to `init_parallel_pdaf` described above is still required. The routine will simply initialize the parallelization variables for a single-process case.
     109
     110=== Adding parallelization to a serial model ===
     111
     112In order to use parallel model tasks with a model that is not parallelized, the procedure is generally as described for the fully parallel case. However, one has to add the general initialization of MPI to the model code (or to `init_parallel_pdaf`). The following lines
     113{{{
     114      CALL MPI_Init(ierr)
     115      CALL MPI_Comm_Rank(MPI_COMM_WORLD, mype_world, ierr)
     116      CALL MPI_Comm_Size(MPI_COMM_WORLD, npes_world, ierr)
     117      COMM_model = MPI_COMM_WORLD
     118}}}
     119together with the `USE` statement for `mod_parallel` should be added. Subsequently, the call to `init_parallel_pdaf` has to be inserted at the beginning of the model code. At the end of the program one should insert
     120{{{
     121    CALL  MPI_Barrier(MPI_COMM_WORLD,MPIerr)
     122    CALL  MPI_Finalize(MPIerr)
     123}}}
     124The module `mod_parallel.F90` from the template directory provides subroutines for the initialization and finalization of MPI. Thus, if this module is used, there is no need to explicitly add the calls to the MPI functions, but one can simply add
     125{{{
     126    CALL init_parallel()
     127}}}
     128at the beginning of the program. This has to be followed by
     129{{{
     130    CALL init_parallel_pdaf(dim_ens, screen)
     131}}}
     132to initialize the variables for the parallelization of PDAF. At the end of the program one should then insert
     133{{{
     134    CALL finalize_parallel()
     135}}}
     136in the source code.
     137
     138If the program is executed with these extensions using multiple model tasks, the issues discussed in '[#Compilingtheextendedprogram Compiling the extended program]' can occur. Thus, one has to take care about which processes perform output to the screen or to files.