
Adding memory and timing information

Overview

PDAF provides functions to display the memory required by the arrays allocated inside PDAF. In addition, information about the execution time of different parts of PDAF can be displayed. This information can be obtained by calling the routine PDAF_print_info.

The calls described here are implemented in finalize_pdaf.F90 in the template and tutorial codes. One can use these calls directly without changes.

Displaying memory information

Information about the memory allocated by PDAF for its arrays can be obtained by inserting the following line into the program:

  IF (mype_world==0) CALL PDAF_print_info(10)

The routine displays the following information:

  • Memory required for the ensemble array, state vector, and matrix Ainv
  • Memory required by the analysis step
  • Memory required to perform the ensemble transformation

The output will look like this:

  PDAF                       PDAF Memory overview
  PDAF          ---------------------------------------------
  PDAF                     Allocated memory  (MiB)
  PDAF              state and A:      0.598 MiB (persistent)
  PDAF           ensemble array:      0.641 MiB (persistent)
  PDAF            analysis step:     16.425 MiB (temporary)

This memory information shows only the memory required by a single filter process. In the example codes, this is the process with mype_world=0. One can also display the overall allocated memory by adding

  CALL PDAF_print_info(11)

to the routine finalize_pdaf.
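To cross-check the "ensemble array" line of this output before a run, one can estimate the size of a double-precision array directly. The following is a minimal sketch, assuming that the ensemble array holds dim_p × dim_ens double-precision values on each process and that 1 MiB = 2^20 bytes. The module mem_estimate and the function array_mib are hypothetical names, not part of PDAF, and the value reported by PDAF_print_info may differ, for example if PDAF allocates further arrays.

```fortran
! Hypothetical helper (not part of PDAF): rough memory estimate for a
! double-precision array, e.g. an ensemble array of size dim_p x dim_ens.
! Assumption: 8-byte reals and 1 MiB = 2**20 bytes.
MODULE mem_estimate

  IMPLICIT NONE

  PRIVATE
  PUBLIC :: array_mib

  ! Double-precision kind, assumed to match the kind of the ensemble array
  INTEGER, PARAMETER :: dp = KIND(1.0D0)
  ! 64-bit integer kind, used to avoid overflow for large state vectors
  INTEGER, PARAMETER :: i8 = SELECTED_INT_KIND(18)

CONTAINS

  ! Return the size in MiB of a REAL(dp) array with n1 x n2 elements
  PURE FUNCTION array_mib(n1, n2) RESULT(mib)
    INTEGER, INTENT(in) :: n1, n2   ! Array dimensions, e.g. dim_p and dim_ens
    REAL(dp) :: mib

    INTEGER(i8) :: nbytes           ! Total number of bytes

    ! STORAGE_SIZE returns the size of one element in bits
    nbytes = INT(n1, i8) * INT(n2, i8) * INT(STORAGE_SIZE(1.0_dp) / 8, i8)
    mib = REAL(nbytes, dp) / 1048576.0_dp
  END FUNCTION array_mib

END MODULE mem_estimate
```

For example, array_mib(100000, 40), i.e. a hypothetical local state dimension of 100000 and 40 ensemble members, gives about 30.5 MiB.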

Displaying timing information

Timing information can be displayed by adding

  CALL PDAF_print_info(1)

to the code. This produces output like

  PDAF                     PDAF Timing information
  PDAF          ---------------------------------------------
  PDAF                  Initialize PDAF:      0.078 s
  PDAF                Ensemble forecast:      0.003 s
  PDAF                  LESTKF analysis:     25.183 s
  PDAF                      Prepoststep:      0.017 s

More detailed output is obtained with

  IF (mype_world==0) CALL PDAF_print_info(3)

which displays the timing information for each of the call-back routines. For example, for the LESTKF this might look like:

  PDAF            PDAF Timing information - call-back routines
  PDAF        ----------------------------------------------------
  PDAF          Initialize PDAF:                     1.552 s
  PDAF            init_ens_pdaf:                       1.526 s
  PDAF          Ensemble forecast:               23847.693 s
  PDAF            MPI communication in PDAF:         666.890 s
  PDAF            distribute_state_pdaf:               2.153 s
  PDAF            collect_state_pdaf:                  0.427 s
  PDAF          LESTKF analysis:                   191.429 s
  PDAF            PDAF-internal operations:          157.618 s
  PDAF            OMI-internal routines:               1.524 s
  PDAF            init_n_domains_pdaf:                 0.000 s
  PDAF            init_dim_l_pdaf:                     0.127 s
  PDAF            g2l_state_pdaf:                      5.190 s
  PDAF            l2g_state_pdaf:                      3.087 s
  PDAF            Time in OMI observation module routines
  PDAF              init_dim_obs_pdafomi:              8.880 s
  PDAF              obs_op_pdafomi:                    3.913 s
  PDAF              init_dim_obs_l_pdafomi:           10.750 s
  PDAF          prepoststep_pdaf:                 9422.757 s

This example is from one of our real data assimilation applications, in which 13 analysis steps were performed during the run. Most of the time is spent in the ensemble forecast. The second-largest amount of time is spent in prepoststep_pdaf, mainly due to writing large output files in parallel in the binary netCDF file format. The analysis steps (line LESTKF analysis) took only 191.429 s in total. Most of this time was spent on computations inside PDAF (line PDAF-internal operations, 157.618 s), while init_dim_obs_l_pdafomi (the search for observations within the localization cut-off radius, 10.75 s) and init_dim_obs_pdafomi (the initialization of observation information, 8.88 s) also took some time.

If significant time is spent in one or several of the call-back routines, this indicates which routines might have potential for optimization.
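To judge which part of a run is worth optimizing, it can help to express the top-level timings as shares of the total run time. The following is a minimal sketch of this arithmetic. It assumes that the top-level timers (in the example above: Initialize PDAF, Ensemble forecast, LESTKF analysis, prepoststep_pdaf) do not overlap and together cover the run; the module timing_share and its routines are hypothetical helpers, not part of PDAF.

```fortran
! Hypothetical helper (not part of PDAF): share of the total run time for
! each top-level timer printed by PDAF_print_info(3).
! Assumption: the timers passed in do not overlap.
MODULE timing_share

  IMPLICIT NONE

  PRIVATE
  PUBLIC :: dp, time_fractions, print_shares

  INTEGER, PARAMETER :: dp = KIND(1.0D0)

CONTAINS

  ! Return t(i) / SUM(t) for each timer
  PURE FUNCTION time_fractions(t) RESULT(frac)
    REAL(dp), INTENT(in) :: t(:)     ! Top-level timings in seconds
    REAL(dp) :: frac(SIZE(t))

    frac = t / SUM(t)
  END FUNCTION time_fractions

  ! Print each timer with its time and its share of the total in percent
  SUBROUTINE print_shares(names, t)
    CHARACTER(len=*), INTENT(in) :: names(:)   ! Timer labels
    REAL(dp), INTENT(in) :: t(:)               ! Timings in seconds

    REAL(dp) :: frac(SIZE(t))
    INTEGER :: i

    frac = time_fractions(t)
    DO i = 1, SIZE(t)
       WRITE (*, '(A20, F12.3, A, F7.2, A)') names(i), t(i), ' s', &
            100.0_dp * frac(i), ' %'
    END DO
  END SUBROUTINE print_shares

END MODULE timing_share
```

With the four top-level values from the example above (1.552 s, 23847.693 s, 191.429 s and 9422.757 s), the ensemble forecast accounts for about 71.3 % of the time, prepoststep_pdaf for about 28.2 %, and the LESTKF analysis for less than 0.6 % (about 14.7 s per analysis step for the 13 steps).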

More detailed information on the time spent in different parts of the filter algorithm itself can be obtained by using a value of 4 or 5 in the call to PDAF_print_info. Only the timing of the first filter process is displayed; however, the times for the individual processes should be similar. If one performs the call without IF (mype_world==0), each process writes its own timing information.

Last modified on May 25, 2025, 5:33:28 PM