Changes between Initial Version and Version 1 of OnlineAddingMemoryandTimingInformation_PDAF3


Timestamp:
May 25, 2025, 5:33:28 PM
Author:
lnerger
= Adding memory and timing information =

{{{
#!html
<div class="wiki-toc">
<h4>Online Mode: Implementation Guide</h4>
<ol><li><a href="OnlineImplementationGuide_PDAF3">Main page</a></li>
<li><a href="OnlineAdaptParallelization_PDAF3">Adapting the parallelization</a></li>
<li><a href="OnlineInitPdaf_PDAF3">Initializing PDAF</a></li>
<li><a href="OnlineModifyModelforEnsembleIntegration_PDAF3">Modifications for ensemble integration</a></li>
<li><a href="ImplementationofAnalysisStep_PDAF3">Implementing the analysis step</a></li>
<li>Memory and timing information</li>
</ol>
</div>
}}}

[[PageOutline(2-3,Contents of this page)]]

== Overview ==

PDAF provides functions to display the memory required by the arrays allocated inside PDAF. In addition, information about the execution time of different parts of PDAF can be displayed. This information is obtained by calling the routine `PDAF_print_info`.

The calls described here are implemented in `finalize_pdaf.F90` in the template and tutorial codes. One can use these routines directly without changes.

== Displaying memory information ==

Information about the memory that PDAF allocates for its arrays can be obtained by inserting the following line into the program:
{{{
  IF (mype_world==0) CALL PDAF_print_info(10)
}}}
The routine displays the following information:
 * Memory required for the ensemble array, the state vector, and the matrix '''Ainv'''
 * Memory required by the analysis step
 * Memory required to perform the ensemble transformation

The output will look like this:
{{{
  PDAF                       PDAF Memory overview
  PDAF          ---------------------------------------------
  PDAF                     Allocated memory  (MiB)
  PDAF              state and A:      0.598 MiB (persistent)
  PDAF           ensemble array:      0.641 MiB (persistent)
  PDAF            analysis step:     16.425 MiB (temporary)
}}}

This memory information shows only the memory required by a single filter process. In the example codes, this is the process with `mype_world=0`. One can also display the overall allocated memory by adding
{{{
  CALL PDAF_print_info(11)
}}}
to the routine `finalize_pdaf`.
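
As a rough cross-check of such numbers, the size of an array of double-precision reals follows from its dimensions: an array with `dim_p` rows (local state dimension) and `dim_ens` columns (ensemble size) occupies `dim_p * dim_ens * 8` bytes, and 1 MiB is 1048576 bytes. The following Fortran module is a minimal sketch of this arithmetic, assuming 8-byte reals. The module and function names (`mem_estimate`, `array_mib`) are hypothetical and not part of PDAF, and PDAF's own accounting may include further arrays.
{{{
#!fortran
! Hypothetical helper (not part of PDAF): memory of a 2D array of 8-byte reals
MODULE mem_estimate
  USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: INT64, REAL64
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: array_mib

CONTAINS

  ! Memory in MiB (1 MiB = 1048576 bytes) of an array with nrows x ncols
  ! REAL(REAL64) elements, e.g. nrows = dim_p and ncols = dim_ens
  PURE FUNCTION array_mib(nrows, ncols) RESULT(mib)
    INTEGER, INTENT(IN) :: nrows, ncols
    REAL(REAL64) :: mib
    INTEGER(INT64) :: nbytes

    ! A 64-bit byte count avoids integer overflow for large arrays;
    ! each REAL(REAL64) value occupies 8 bytes
    nbytes = INT(nrows, INT64) * INT(ncols, INT64) * 8_INT64
    mib = REAL(nbytes, REAL64) / 1048576.0_REAL64
  END FUNCTION array_mib

END MODULE mem_estimate
}}}
For example, `array_mib(10000, 8)` yields about 0.61 MiB, which would be the size of the ensemble array for a hypothetical local state dimension of 10000 and an ensemble of 8 members.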

== Displaying timing information ==

Timing information can be displayed by adding
{{{
  CALL PDAF_print_info(1)
}}}
to the code. This provides output like
{{{
  PDAF                     PDAF Timing information
  PDAF          ---------------------------------------------
  PDAF                  Initialize PDAF:      0.078 s
  PDAF                Ensemble forecast:      0.003 s
  PDAF                  LESTKF analysis:     25.183 s
  PDAF                      Prepoststep:      0.017 s
}}}

More detailed output is obtained with
{{{
  IF (mype_world==0) CALL PDAF_print_info(3)
}}}
which displays the timing information for each of the call-back routines. For example, for the LESTKF this might look like:
{{{
PDAF            PDAF Timing information - call-back routines
PDAF        ----------------------------------------------------
PDAF          Initialize PDAF:                     1.552 s
PDAF            init_ens_pdaf:                       1.526 s
PDAF          Ensemble forecast:               23847.693 s
PDAF            MPI communication in PDAF:         666.890 s
PDAF            distribute_state_pdaf:               2.153 s
PDAF            collect_state_pdaf:                  0.427 s
PDAF          LESTKF analysis:                   191.429 s
PDAF            PDAF-internal operations:          157.618 s
PDAF            OMI-internal routines:               1.524 s
PDAF            init_n_domains_pdaf:                 0.000 s
PDAF            init_dim_l_pdaf:                     0.127 s
PDAF            g2l_state_pdaf:                      5.190 s
PDAF            l2g_state_pdaf:                      3.087 s
PDAF            Time in OMI observation module routines
PDAF              init_dim_obs_pdafomi:              8.880 s
PDAF              obs_op_pdafomi:                    3.913 s
PDAF              init_dim_obs_l_pdafomi:           10.750 s
PDAF          prepoststep_pdaf:                 9422.757 s
}}}
This example is from one of our real data assimilation applications; in this run, 13 analysis steps were performed. Most of the time is spent in the ensemble forecast. The second largest amount of time is spent in `prepoststep_pdaf`, mainly because large output files are written in parallel in the binary netCDF file format.
The analysis steps (line `LESTKF analysis`) took only 191.429 s in total. Most of this time was spent on computations inside PDAF (line `PDAF-internal operations`, 157.618 s), while `init_dim_obs_l_pdafomi` (the search for observations within the localization cut-off radius, 10.75 s) and `init_dim_obs_pdafomi` (the initialization of observation information, 8.88 s) also took some time.
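
To put these numbers into perspective, the share of each top-level entry of the timing output can be computed. The following Fortran module is a minimal sketch, assuming that the four top-level entries (`Initialize PDAF`, `Ensemble forecast`, `LESTKF analysis`, `prepoststep_pdaf`) do not overlap in time. The names `timing_shares` and `percent_shares` are hypothetical and not part of PDAF.
{{{
#!fortran
! Hypothetical helper (not part of PDAF): percentage share of each timing entry
MODULE timing_shares
  USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: REAL64
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: percent_shares

CONTAINS

  ! Returns 100 * t(i) / SUM(t) for each entry; assumes that the entries
  ! do not overlap in time and that SUM(t) > 0
  PURE FUNCTION percent_shares(t) RESULT(pct)
    REAL(REAL64), INTENT(IN) :: t(:)
    REAL(REAL64) :: pct(SIZE(t))

    pct = 100.0_REAL64 * t / SUM(t)
  END FUNCTION percent_shares

END MODULE timing_shares
}}}
With the times 1.552 s, 23847.693 s, 191.429 s, and 9422.757 s from the example, the ensemble forecast accounts for about 71.3% and `prepoststep_pdaf` for about 28.2% of the total, while all 13 analysis steps together take about 0.6%, i.e. roughly 14.7 s per analysis step.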

If a significant amount of time is spent in one or several of the call-back routines, this indicates which routines might have potential for optimization.
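
Such candidates are easy to spot by sorting the entries of a timing report by time. The following Fortran module is a minimal sketch that returns the indices of a list of times in order of decreasing time. The names `timing_rank` and `rank_descending` are hypothetical and not part of PDAF; the times have to be copied from the output of `PDAF_print_info`.
{{{
#!fortran
! Hypothetical helper (not part of PDAF): ranks timing entries, e.g. the
! call-back routines listed by PDAF_print_info(3), from slowest to fastest
MODULE timing_rank
  USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: REAL64
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: rank_descending

CONTAINS

  ! Returns the indices of t ordered by decreasing time (insertion sort,
  ! which is adequate for the few entries of a timing report)
  PURE FUNCTION rank_descending(t) RESULT(idx)
    REAL(REAL64), INTENT(IN) :: t(:)
    INTEGER :: idx(SIZE(t))
    INTEGER :: i, j, key

    idx = [(i, i = 1, SIZE(t))]
    DO i = 2, SIZE(t)
      key = idx(i)
      j = i - 1
      ! Shift entries with a smaller time one position to the right
      DO WHILE (j >= 1)
        IF (t(idx(j)) >= t(key)) EXIT
        idx(j + 1) = idx(j)
        j = j - 1
      END DO
      idx(j + 1) = key
    END DO
  END FUNCTION rank_descending

END MODULE timing_rank
}}}
Applied to the nine entries listed below `LESTKF analysis` in the example output (from `PDAF-internal operations` to `init_dim_obs_l_pdafomi`), the ranking starts with `PDAF-internal operations`, followed by `init_dim_obs_l_pdafomi`, `init_dim_obs_pdafomi`, and `g2l_state_pdaf`.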

More detailed information about the time spent in different parts of the filter algorithm itself can be obtained using a value of 4 or 5 in the call to `PDAF_print_info`. With the condition `IF (mype_world==0)`, only the times of the first filter process are displayed; the times of the other processes should be similar. Without this condition, each process writes its own timing information.
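
When each process writes its own timing information, one way to check whether the times are indeed similar is to compare the largest time of an entry with its mean over all processes. The following Fortran module is a minimal sketch of such a check. The names `timing_balance` and `imbalance_ratio` are hypothetical and not part of PDAF, and the per-process times have to be collected from the output separately.
{{{
#!fortran
! Hypothetical helper (not part of PDAF): compares the time of one timing
! entry across processes, e.g. collected from the output of all processes
MODULE timing_balance
  USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: REAL64
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: imbalance_ratio

CONTAINS

  ! Ratio of the maximum to the mean time over all processes;
  ! 1 means perfectly balanced, assumes SUM(t) > 0
  PURE FUNCTION imbalance_ratio(t) RESULT(ratio)
    REAL(REAL64), INTENT(IN) :: t(:)
    REAL(REAL64) :: ratio

    ratio = MAXVAL(t) / (SUM(t) / REAL(SIZE(t), REAL64))
  END FUNCTION imbalance_ratio

END MODULE timing_balance
}}}
A ratio close to 1 means that the processes take similar time. For hypothetical times of 10 s, 10 s, 10 s, and 30 s on four processes, the ratio is 2, i.e. the slowest process takes twice the mean time.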