Version 2 (modified by lnerger, 4 years ago)

PDAF_print_info

This page documents the routine PDAF_print_info of PDAF.

This routine displays memory and timing information measured by PDAF. Usually, it is called by only one process: the process with rank 0 in the filter communicator (COMM_filter). In most cases, this is also the process with rank 0 in MPI_COMM_WORLD. However, each process can also call the routine, e.g. to obtain the timing information for each process.

Displaying memory information

Information about the memory that PDAF allocates for its arrays can be obtained by inserting the following line into the program:

  CALL PDAF_print_info(2)

The routine displays the following information:

  • Memory required for the ensemble array, state vector, and transform matrix
  • Memory required by the analysis step
  • Memory required to perform the ensemble transformation

The output will look like this:

                       PDAF Memory overview
          ---------------------------------------------
                     Allocated memory  (MB)
              state and U:   0.59617 MB (persistent)
           ensemble array:   0.64087 MB (persistent)
            analysis step:   6.05578 MB (temporary)
               resampling:   2.81129 MB (temporary)

Currently, only the memory allocated by the first of the filter processes is displayed. Thus, the total required memory is approximately the displayed memory multiplied by the number of processes in COMM_filter.
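As a worked example, assume (hypothetically) that COMM_filter contains 4 processes and that each of them allocates about as much memory as the first process in the output above. The persistent memory of the whole filter would then be estimated as follows, where n_filter is an illustrative symbol for the number of filter processes:

```latex
\begin{aligned}
M_{\text{persistent, per process}} &= 0.59617\,\text{MB} + 0.64087\,\text{MB} = 1.23704\,\text{MB} \\
M_{\text{persistent, total}} &\approx n_{\text{filter}} \cdot M_{\text{persistent, per process}}
  = 4 \times 1.23704\,\text{MB} = 4.94816\,\text{MB}
\end{aligned}
```

The temporary amounts listed for the analysis step and the resampling can be scaled in the same way to estimate their contribution while the analysis is computed.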

Displaying timing information

Timing information can be displayed by adding

  CALL PDAF_print_info(X)

to the code, where X is the timer level to be shown. The available choices are:

  • X=1: Basic timers
  • X=3: Timers showing the time spent in the different call-back routines (this variant was added in PDAF 1.15)
  • X=4: More detailed timers about parts of the filter algorithm (before PDAF 1.15, this was timer level 3)
  • X=5: Very detailed timers about various operations in the filter algorithm (before PDAF 1.15, this was timer level 4)

For X=1, the output will look like this:

  PDAF                     PDAF Timing information
  PDAF          ---------------------------------------------
  PDAF                  Initialize PDAF:      0.078 s
  PDAF                Ensemble forecast:      0.003 s
  PDAF                  LESTKF analysis:     25.183 s
  PDAF                      Prepoststep:      0.017 s

We recommend using X=3 to optimize the user-supplied call-back routines. The output will look like this:

  PDAF            PDAF Timing information - call-back routines
  PDAF        ----------------------------------------------------
  PDAF          Initialize PDAF:                     0.078 s
  PDAF            init_ens_pdaf:                       0.077 s
  PDAF          Ensemble forecast:                   0.003 s
  PDAF            MPI communication in PDAF:           0.000 s
  PDAF            distribute_state_pdaf:               0.001 s
  PDAF            collect_state_pdaf:                  0.000 s
  PDAF          LESTKF analysis:                    25.183 s
  PDAF            PDAF-internal operations:           24.762 s
  PDAF            init_n_domains_pdaf:                 0.000 s
  PDAF            init_dim_obs_f_pdaf:                 0.000 s
  PDAF            obs_op_f_pdaf:                       0.003 s
  PDAF            init_dim_l_pdaf:                     0.001 s
  PDAF            init_dim_obs_l_pdaf:                 0.001 s
  PDAF            g2l_state_pdaf:                      0.001 s
  PDAF            g2l_obs_pdaf:                        0.021 s
  PDAF            init_obs_l_pdaf:                     0.000 s
  PDAF            prodRinvA_l_pdaf:                    0.006 s
  PDAF            l2g_state_pdaf:                      0.014 s
  PDAF          prepoststep_pdaf:                    0.017 s

This output shows in which routine most of the time is spent. If it is a call-back routine, you can check whether this routine can be optimized to speed up the computation. If most of the time is spent inside PDAF, as in this example, optimizing the call-back routines will not noticeably reduce the run time.
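Using the numbers from the example output above, the share of the LESTKF analysis time spent in PDAF-internal operations, and the time of the call-back routines listed under the analysis, can be worked out as follows:

```latex
\begin{aligned}
\frac{t_{\text{PDAF-internal}}}{t_{\text{analysis}}} &= \frac{24.762\,\text{s}}{25.183\,\text{s}} \approx 0.983 \\
t_{\text{call-backs}} &= (0.000 + 0.000 + 0.003 + 0.001 + 0.001 + 0.001 + 0.021 + 0.000 + 0.006 + 0.014)\,\text{s}
  = 0.047\,\text{s} \\
\frac{t_{\text{call-backs}}}{t_{\text{analysis}}} &= \frac{0.047\,\text{s}}{25.183\,\text{s}} \approx 0.0019
\end{aligned}
```

Even if all call-back routines of the analysis took no time at all, the analysis time would drop by less than 0.2 %. The sub-timers do not add up exactly to the analysis time (24.762 s + 0.047 s = 24.809 s), so these ratios are only indicative.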