Changes between Version 36 and Version 37 of AdaptParallelization


Timestamp: May 20, 2025, 4:56:30 PM
Author: lnerger

 * If no, then take note of the name of the communicator variable (we assume it's `COMM_mymodel`)
3. Insert the call `init_parallel_pdaf` directly after `MPI_init`.
4. Adapt `init_parallel_pdaf` so that at the end of this routine you set `COMM_mymodel=COMM_model`. Potentially, also set the model's rank and size variables to `mype_model` and `npes_model`, respectively.
5. The number of model tasks in the variable `n_modeltasks` is required by `init_parallel_pdaf` to perform the communicator splitting. In the tutorial code we added command-line parsing to set the variable (it parses for `dim_ens`). One could also read the value from a configuration file.

=== Details on adapting a parallelized model ===

Any program parallelized with MPI will need to call `MPI_Init` for the initialization of MPI. Frequently, the parallelization of a model is initialized in the model by the lines:
{{{
      CALL MPI_Init(ierr)
      CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      CALL MPI_Comm_size(MPI_COMM_WORLD, size, ierr)
}}}
Here, the call to `MPI_init` is mandatory, while the two other lines are optional but common. The call to `MPI_init` initializes the parallel region of an MPI-parallel program. It also initializes the communicator `MPI_COMM_WORLD`, which is pre-defined by MPI to contain all processes of the MPI-parallel program.

In the model code, we have to find the place where `MPI_init` is called to check how the parallelization is set up. In particular, we have to check whether the parallelization is ready to be split into model tasks.
For this, one has to check whether `MPI_COMM_WORLD` is used, e.g. by looking at the calls to `MPI_Comm_rank` and `MPI_Comm_size`, or at MPI communication calls in the code (e.g. `MPI_Send`, `MPI_Recv`, `MPI_Barrier`).
* If the model uses `MPI_COMM_WORLD`, we have to replace it by a user-defined communicator, e.g. `COMM_mymodel`. One has to declare this variable in a module and initialize it as `COMM_mymodel=MPI_COMM_WORLD` (see the sketch below). This change must not influence the execution of the model. It can be useful to do a test run to check this.
* If the model uses a different communicator, one should take note of its name (below we refer to it as `COMM_mymodel`). We will then overwrite it to be able to represent the ensemble of model tasks.
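
As an illustration of the first case (replacing `MPI_COMM_WORLD`), the declaration and initialization could look like the following sketch; the module name `mod_parallel_model` is only an example and the variable names have to match those of the actual model:
{{{
      MODULE mod_parallel_model    ! Example module holding the model's parallelization variables
        IMPLICIT NONE
        INTEGER :: COMM_mymodel    ! Communicator used by the model instead of MPI_COMM_WORLD
        INTEGER :: rank, size      ! Rank of a process and number of processes
      END MODULE mod_parallel_model
}}}
and in the model initialization:
{{{
      CALL MPI_Init(ierr)
      COMM_mymodel = MPI_COMM_WORLD
      CALL MPI_Comm_rank(COMM_mymodel, rank, ierr)
      CALL MPI_Comm_size(COMM_mymodel, size, ierr)
}}}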

Now, `COMM_mymodel` will be replaced by the communicator that represents the ensemble of model tasks. For this we
* Adapt `init_parallel_pdaf` so that at the end of this routine we set `COMM_mymodel=COMM_model`. Potentially, also set the model's rank and size variables to `mype_model` and `npes_model`, respectively (see the sketch below).
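
A minimal sketch of this adaptation, assuming the model holds `COMM_mymodel`, `rank`, and `size` in a module as in the example above, could be:
{{{
      ! At the end of init_parallel_pdaf (the model's parallelization
      ! module must be USE'd at the top of the routine):
      COMM_mymodel = COMM_model
      rank = mype_model
      size = npes_model
}}}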

Finally, we have to ensure that the number of model tasks is correctly set. In the template and tutorial codes, the number of model tasks is specified by the variable `n_modeltasks`. It has to be set before the operations on the communicators are done in `init_parallel_pdaf`. In the tutorial code we added command-line parsing to set the variable (it parses for `dim_ens`). One could also read the value from a configuration file.
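
For example, instead of the command-line parsing, `n_modeltasks` could be read from a small namelist file; the file name `assimilation.nml` and the namelist group used here are only illustrative:
{{{
      ! n_modeltasks as declared in mod_parallel_pdaf
      NAMELIST /ensemble_nml/ n_modeltasks

      n_modeltasks = 1          ! Default: a single model task
      OPEN (20, file='assimilation.nml', status='old', action='read')
      READ (20, nml=ensemble_nml)
      CLOSE (20)
}}}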

If the program is executed with these extensions using multiple model tasks, the issues discussed in '[#Compilingtheextendedprogram Compiling the extended program]' can occur. Thus, one has to take care about which processes perform output to the screen or to files.


== Adaptations for a serial model ==

If the numerical model is not parallelized (i.e. serial), we need to add the parallelization for the ensemble. We follow here the approach used in the tutorial code `tutorial/online_2D_serialmodel`. The files `mod_parallel_pdaf.F90`, `init_parallel_pdaf.F90`, `finalize_pdaf.F90` (and `parser_mpi.F90`) can be used directly. Note that `init_parallel_pdaf` uses command-line parsing (from `parser_mpi.F90`) to read the number of model tasks (`n_modeltasks`) from the command line (specifying `-ens N_MODELTASKS`, where N_MODELTASKS should be the ensemble size). One can replace this, e.g., by reading from a configuration file.

In the tutorial code, the parallelization is simply initialized by adding the line
{{{
  CALL init_parallel_pdaf(0, 1)
}}}
into the source code of the main program. This is done at the very beginning of the functional code part.
The initialization of MPI itself (by a call to `MPI_Init`) is included in `init_parallel_pdaf`.

The finalization of MPI is included in `finalize_pdaf.F90` by a call to `finalize_parallel()`. The line
{{{
  CALL finalize_pdaf()
}}}
should be inserted at the end of the main program.

With these changes the model is ready to perform an ensemble simulation.
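
Put together, the extended main program of a serial model could look like the following sketch (the program name and the placeholder comments are only illustrative):
{{{
      PROGRAM main_program
        IMPLICIT NONE

        ! Initialize MPI and the PDAF communicators
        ! (MPI_Init is called inside init_parallel_pdaf)
        CALL init_parallel_pdaf(0, 1)

        ! ... original serial model initialization and time stepping ...

        ! Finalize PDAF and MPI
        CALL finalize_pdaf()
      END PROGRAM main_program
}}}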

If the program is executed with these extensions using multiple model tasks, the issues discussed in '[#Compilingtheextendedprogram Compiling the extended program]' can occur. Thus, one has to take care about which processes perform output to the screen or to files.


== Further Information ==

=== Communicators created by `init_parallel_pdaf` ===

MPI uses so-called 'communicators' to define groups of parallel processes. These groups can then conveniently exchange information. In order to provide the 2-level parallelism for PDAF, three communicators are initialized that define the processes that are involved in different tasks of the data assimilation system.
The required communicators are initialized in the routine `init_parallel_pdaf`. They are called
 * `COMM_model` - defines the groups of processes that are involved in the model integrations (one group for each model task)
 * `COMM_filter` - defines the group of processes that perform the filter analysis step
 * `COMM_couple` - defines the groups of processes that are involved when data are transferred between the model and the filter. This is used to distribute and collect ensemble states.
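
As a purely illustrative sketch of the splitting mechanism (the actual splitting is performed inside the provided routine `init_parallel_pdaf`; the integer variables used here follow the naming conventions of `mod_parallel_pdaf`), model-task communicators could be created with `MPI_Comm_split` like this:
{{{
      ! Group the processes of MPI_COMM_WORLD into n_modeltasks model tasks
      CALL MPI_Comm_rank(MPI_COMM_WORLD, mype_world, ierr)
      CALL MPI_Comm_size(MPI_COMM_WORLD, npes_world, ierr)

      ! Assign each process to one model task (tasks of equal size assumed)
      task_id = mype_world / (npes_world / n_modeltasks) + 1

      ! All processes with the same task_id form one COMM_model group
      CALL MPI_Comm_split(MPI_COMM_WORLD, task_id, mype_world, COMM_model, ierr)
      CALL MPI_Comm_rank(COMM_model, mype_model, ierr)
      CALL MPI_Comm_size(COMM_model, npes_model, ierr)
}}}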

[[Image(//pics/communicators_PDAFonline.png)]]
     
=== Arguments of `init_parallel_pdaf` ===

The routine `init_parallel_pdaf` has two arguments, which are the following:
     

=== Compiling and testing the extended program ===

To compile the model with the adapted parallelization, one needs to ensure that the additional files (`init_parallel_pdaf.F90`, `mod_parallel_pdaf.F90`, and potentially `parser_mpi.F90`) are included in the compilation.

If a serial model is used, one needs to adapt the compilation to compile for MPI-parallelization. Usually, this is done with a compiler wrapper like 'mpif90', which should be part of the MPI installation (e.g. OpenMPI). If you compiled a PDAF tutorial case before, you can check the compile settings of the tutorial.
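
For illustration, a compilation could then look roughly like the following; the flags, the object list, and the name of the PDAF library (assumed here to be `libpdaf-d.a`) are placeholders that depend on the actual build setup:
{{{
mpif90 -O2 -c mod_parallel_pdaf.F90 parser_mpi.F90 init_parallel_pdaf.F90
mpif90 -O2 -o model_pdaf <model and PDAF interface objects> \
    mod_parallel_pdaf.o parser_mpi.o init_parallel_pdaf.o \
    -L<PDAF library directory> -lpdaf-d
}}}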

One can test the extension by running the compiled model. It should run as it did without these changes, because `mod_parallel_pdaf` defines by default that a single model task is executed (`n_modeltasks=1`). If `screen` is set to 1 in the call to `init_parallel_pdaf`, the standard output should include lines like
{{{
 Initialize communicators for assimilation with PDAF
     
     3       3      1      3      4      0       T
}}}
These lines show the configuration of the communicators. This example was executed using 4 processes and `n_modeltasks=1`, i.e. `mpirun -np 4 ./model_pdaf -dim_ens 4` in `tutorial/online_2D_parallelmodel`. (In this case, the variables `npes_filter` and `npes_model` will have a value of 4.)
     
Using multiple model tasks can result in the following effects:
 * The standard screen output of the model can be shown multiple times. For a serial model this is the case since all processes can now write output. For a parallel model, this is due to the fact that often the process with `rank=0` performs the screen output. By splitting the communicator `COMM_model`, there will be as many processes with rank 0 as there are model tasks. Here one can adapt the code to use `mype_world==0` from `mod_parallel_pdaf`.
 * Each model task might write file output. This can lead to the case that several processes try to generate the same file or try to write into the same file. In the extreme case this can result in a program crash. For this reason, it might be useful to restrict the file output to a single model task. This can be implemented using the variable `task_id`, which is initialized by `init_parallel_pdaf` and holds the index of the model task ranging from 1 to `n_modeltasks` (see the sketch after this list).
  * For the ensemble assimilation, it can be useful to switch off the regular file output of the model completely. As each model task holds only a single member of the ensemble, this output might not be useful. In this case, the file output for the state estimate and perhaps all ensemble members should be done in the pre/poststep routine (`prepoststep_pdaf`) of the assimilation system.
  * An alternative approach can be to run each model task in a separate directory. This needs an adapted setup. For example, for the tutorial in `tutorial/online_2D_parallelmodel`:
   * create sub-directories `x1`, `x2`, `x3`, `x4`
   * copy `model_pdaf` into each of these directories
   * for ensemble size 4 and each model task using 2 processes, we can now run
{{{
mpirun -np 2 ./x1/model_pdaf -dim_ens 4 : \
-np 2 ./x2/model_pdaf -dim_ens 4 : \
-np 2 ./x3/model_pdaf -dim_ens 4 : \
-np 2 ./x4/model_pdaf -dim_ens 4
}}}
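
As a minimal sketch of restricting file output to a single model task, as discussed in the list above, one could guard the model's output routine using the variables from `mod_parallel_pdaf`:
{{{
      ! In the model's output routine:
      USE mod_parallel_pdaf, ONLY: task_id, mype_model

      ! Write output files only on the first model task and only
      ! from the first process of this task
      IF (task_id == 1 .AND. mype_model == 0) THEN
         ! ... open files and write the model output ...
      END IF
}}}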