Changes between Version 3 and Version 4 of AdaptParallelization


Timestamp: Aug 23, 2010, 12:01:36 PM
Author: lnerger
Comment:

--

Legend: Unmodified | Added | Removed | Modified
  • AdaptParallelization

    v3 v4  
    33Like many numerical models, PDAF uses the MPI standard for the parallelization. In the description below, we assume that the model is parallelized using MPI.
    44
    5 PDAF supports a 2-level parallelization: First, the numerical model can be parallelized and can be executed using several processors. Second, several model tasks can be computed in parallel, i.e. a parallel ensemble integration can be performed. This 2-level parallelization has to be initialized before it can be used. The example in `testsuite/dummymodel_1D/` includes the file `init_parallel_pdaf.F90` that can be used as a template for the initialization. The required variables are defined in `testsuite/main/mod_parallel.F90`. If the numerical model itself is parallelized, this parallelization has to be adapted and modified for the 2-level parallelization of the data assimilation system generated by adding PDAF to the model. The necessary steps are described below.
     5PDAF supports a 2-level parallelization: First, the numerical model can be parallelized and can be executed using several processors. Second, several model tasks can be computed in parallel, i.e. a parallel ensemble integration can be performed. This 2-level parallelization has to be initialized before it can be used. The example in `testsuite/dummymodel_1D/` includes the file `init_parallel_pdaf.F90` that can be used as a template for the initialization. The required variables are defined in `testsuite/main/mod_parallel.F90`, which can also be used as a template. If the numerical model itself is parallelized, this parallelization has to be adapted and modified for the 2-level parallelization of the data assimilation system generated by adding PDAF to the model. The necessary steps are described below.
    66
    77
     
    1414 * `COMM_couple` - defines the processes that are involved when data are transferred between the model and the filter
    1515
    16 The parallel region of an MPI parallel program is initialized by calling `MPI_init`.  By calling `MPI_init`, the communicator `MPI_COMM_WORLD` is initialized. This communicator is pre-defined by MPI to contain all processes of the MPI-parallel program. Often it is sufficient to conduct all parallel communication using only `MPI_COMM_WORLD`. Thus, numerical models often use only this communicator to control all communication. However, as `MPI_COMM_WORLD` contains all processes of the program, this approach will not allow for parallel model tasks. In order to allow parallel model tasks, it is required to replace `MPI_COMM_WORLD` by an alternative communicator that is split for the model tasks. We will denote this communicator `COMM_model`.
     16The parallel region of an MPI-parallel program is initialized by calling `MPI_init`. This call also initializes the communicator `MPI_COMM_WORLD`, which is pre-defined by MPI to contain all processes of the program. Often it is sufficient to conduct all parallel communication using only `MPI_COMM_WORLD`, and numerical models frequently use only this communicator to control all communication. However, as `MPI_COMM_WORLD` contains all processes of the program, this approach does not allow for parallel model tasks. To allow parallel model tasks, `MPI_COMM_WORLD` has to be replaced by an alternative communicator that is split for the model tasks. We will denote this communicator `COMM_model`. If a model code already uses a communicator distinct from `MPI_COMM_WORLD`, it should be possible to use that communicator.
    1717
    1818== Using COMM_model ==
     
    2727Subsequently, one can define `COMM_model` by adding
    2828{{{
    29       COMM_MODEL = MPI_COMM_WORLD
     29      COMM_model = MPI_COMM_WORLD
    3030}}}
    31 In addition, the variable COMM_MODEL has to be declared in a way such that all routines using the communicator can access it. The parallelization variables of the model are frequently hold in a module. In this case, it is easiest to add COMM_MODEL as an integer variable here. (If the model itself is not parallelized, one has to add the call to MPI_Init ot the program, e.g. to the routine init_parallel_pdaf, if the parallelization features of PDAF should be used.)
    32 Having defined the communicator COMM_model, the communicator MPI_COMM_WORLD has to be replaced by COMM_MODEL in all routines that perform MPI communication.
    33 These changes should not influence the model itself. Thus, after these changes, one should ensure that the model compiles and runs correctly.
     31In addition, the variable `COMM_model` has to be declared such that all routines that use the communicator can access it. The parallelization variables of a model are frequently held in a module; in this case, it is easiest to add `COMM_model` as an integer variable there. (The example declares `COMM_model` and the other parallelization-related variables in `mod_parallel.F90`.)
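
As a minimal sketch, a module holding this declaration could look as follows (only `COMM_model` is required by the step described here; the other variable names are illustrative and follow the template `mod_parallel.F90`):
{{{
MODULE mod_parallel
  IMPLICIT NONE
  SAVE

  INTEGER :: COMM_model   ! Communicator for the model integrations
  INTEGER :: mype_model   ! Rank of the calling process in COMM_model
  INTEGER :: npes_model   ! Number of processes in COMM_model
END MODULE mod_parallel
}}}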
    3432
    35 Having replaced MPI_COMM_WORLD by COMM_MODEL enables to split the model integration into parallel model tasks. For this, the communicator COMM_MODEL has to be redefined. This is performed by the routine init_parallel_init, which is supplied with the PDAF package. The routine should be added to the model usually directly after the initialization of the parallelization described above.
    36 The routine init_parallel_pdaf also defines a communicator COMM_FILTER, which includes the processes that perform the analysis step of the filter, and COMM_couple, which couples the processes in COMM_FILTER with those in COMM_MODEL. The provided routine init_paralllel_init is a template implementation. Thus, it has to be adjusted for the model under consideration. In particular one needs to ensure that the routine knows the variables COMM_MODEL as well as 'rank' and 'size' (See the initialization example above. These variables might have different names in a model). This can be done by adding a USE statement in init_parallel_pdaf, if the model defines these variables in a module.
    37 The routine init_parallel_pdaf splits the communicator MPI_COMM_WORLD and defines a set of communicator COMM_MODEL. In addition, the variables npes_world and mype_world are defined. If the model uses different names for these quentities, like 'rank' and 'size', the model-internal variables should be re-initialized at the end of init_parallel_pdaf.
    38 The routine defined several more variables. In the example implementation these are declared and held in the model mod_parallel. It would be useful to use this module with the model code as some of these variables are required when the initialization routine of PDAF (PDAF_init) is called.
     33Having defined the communicator `COMM_model`, the communicator `MPI_COMM_WORLD` has to be replaced by `COMM_model` in all routines that perform MPI communication, except in calls to `MPI_init`, `MPI_finalize`, and `MPI_abort`.
     34The changes described so far must not affect the execution of the model itself. Thus, after making these changes, one should verify that the model still compiles and runs correctly.
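
For illustration of this replacement, a hypothetical reduction call in the model code would change as follows (the buffer and error variables are only placeholders):
{{{
! Before: communication over all processes of the program
      CALL MPI_Allreduce(sum_local, sum_global, 1, MPI_DOUBLE_PRECISION, &
           MPI_SUM, MPI_COMM_WORLD, ierr)

! After: communication restricted to the processes of one model task
      CALL MPI_Allreduce(sum_local, sum_global, 1, MPI_DOUBLE_PRECISION, &
           MPI_SUM, COMM_model, ierr)
}}}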
    3935
    40 This completes the adaptation of the parallelization. The compilation of the model has to be adjusted for the added files holding the routine init_parallel_pdaf and the model mod_parallel. One can test the extension by running the compiled model. It should run as without these changes, as the mod_parallel defines by default that a single model task is executed. To test parallel model tasks one has to set the variable n_modeltasks to a value larger than one. Now, the model will execute parallel model tasks. This can result in the following effects:
    41 - The standard screen output of the model can by shown multiple times. This is due to the fact that often the process with rank=0 perform screen output. By splitting the communicator COMM_MODEL, there will be as many processes with rank 0 as there are model tasks. In addition, each model task might write file output. This can lead to the case that several processes try to generate the same file or try to write into the same file. In the extreme case this can result in a program crash. For this resong, it might be useful to restrict the file output to a single model task. This might be implemented using the variable 'task_id' tha tis initialized by init_parallel_pdaf and hold the index of the model task ranging from 1 to n_modeltasks. (For the ensemble assimilation, it can be useful to switch off the regular file output of the model completely. As each model tasks holds only a single member of the ensemble, this output might not be useful. In this case,  the file output for the state estimate and perhaps all ensemble members should be done in the pre/poststep routine of the assimilation system.)
     36== Initializing the communicators ==
     37
     38Replacing `MPI_COMM_WORLD` by `COMM_model` makes it possible to split the model integration into parallel model tasks. For this, the communicator `COMM_model` has to be redefined. This is performed by the routine `init_parallel_pdaf`, which is supplied with the PDAF package. The call to this routine should be added to the model, usually directly after the initialization of the parallelization described above.
     39The routine `init_parallel_pdaf` also defines the communicators `COMM_filter` and `COMM_couple` that were described above. The provided routine `init_parallel_pdaf` is a template implementation and thus has to be adjusted for the model under consideration. In particular, one needs to ensure that the routine can access the variables `COMM_model` as well as `rank` and `size` (see the initialization example above; these variables might have different names in a model). If the model defines these variables in a module, a USE statement can be added to `init_parallel_pdaf`, as is already done for `mod_parallel`.
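
A minimal sketch of the placement in the model's driver program, assuming the argument list `init_parallel_pdaf(dim_ens, screen)` of the template (the actual interface of the provided routine should be checked):
{{{
      CALL MPI_Init(ierr)

! ... the model's own initialization of rank, size, and COMM_model ...

! Split COMM_model into model tasks; define COMM_filter and COMM_couple.
! The argument list (dim_ens=0, screen) is assumed from the template.
      CALL init_parallel_pdaf(0, screen)
}}}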
     40
     41The routine `init_parallel_pdaf` splits the communicator `MPI_COMM_WORLD` and (re-)defines `COMM_model`. If multiple parallel model tasks are used, by setting `n_modeltasks` to a value above 1, `COMM_model` will actually be a set of communicators with one for each model task. In addition, the variables `npes_world` and `mype_world` are defined. If the model uses different names for these quantities, like `rank` and `size`, the model-specific variables should be re-initialized at the end of `init_parallel_pdaf`.
     42The routine defines several more variables that are declared and held in the module `mod_parallel`. It can be useful to use this module with the model code as some of these variables are required when the initialization routine of PDAF (`PDAF_init`) is called.
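
For example, if the model stores the rank and size in variables named `rank` and `size` (the names here are only placeholders), they could be refreshed at the end of `init_parallel_pdaf` like this:
{{{
! At the end of init_parallel_pdaf: refresh the model's own variables
! for the task-local communicator COMM_model
      CALL MPI_Comm_rank(COMM_model, rank, ierr)
      CALL MPI_Comm_size(COMM_model, size, ierr)
}}}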
     43
     44== Compiling the extended program ==
     45
     46This completes the adaptation of the parallelization. The compilation of the model has to be adjusted to include the added files holding the routine `init_parallel_pdaf` and the module `mod_parallel`. One can test the extension by running the compiled model. It should run as it did without these changes, because `mod_parallel` defines by default that a single model task is executed (`n_modeltasks=1`).
     47
     48To test parallel model tasks, one has to set the variable `n_modeltasks` to a value larger than one. The model will then execute the model tasks in parallel, which can result in the following effects:
     49 * The standard screen output of the model can be shown multiple times. This is because typically the process with `rank=0` performs the screen output; after splitting the communicator `COMM_model`, there are as many processes with rank 0 as there are model tasks.
     50 * Each model task might write file output. This can lead to several processes trying to generate or write into the same file, which in the extreme case can crash the program. For this reason, it might be useful to restrict the file output to a single model task, as in the sketch below. This can be implemented using the variable `task_id`, which is initialized by `init_parallel_pdaf` and holds the index of the model task, ranging from 1 to `n_modeltasks`. (For the ensemble assimilation, it can be useful to switch off the regular file output of the model completely. As each model task holds only a single member of the ensemble, this output might not be useful. In this case, the file output for the state estimate and perhaps all ensemble members should be done in the pre/poststep routine of the assimilation system.)
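
A minimal sketch of such a restriction (`write_model_output` is only a placeholder for the model's own output calls):
{{{
! task_id (from the module mod_parallel) holds the index of the model task;
! write_model_output is a placeholder for the model's own output routine.
      IF (task_id == 1) THEN
         CALL write_model_output()
      END IF
}}}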
     51
     52== Implementation for non-parallel models ==