PDAF_print_info
This page documents the routine PDAF_print_info of PDAF.
This routine displays memory and timing information measured by PDAF. Usually, the routine is called by only one process: the process with rank 0 in the filter communicator (COMM_filter), which in most cases is also the process with rank 0 in MPI_COMM_WORLD. However, every process can also call the routine, for example to obtain the timing information for each process.
Displaying memory information
Information about the memory that PDAF requires for its allocated arrays can be obtained by inserting the following line into the program:
CALL PDAF_print_info(10)
(The value 10 is valid since PDAF V2.1; for older versions, use 2.)
The routine displays the following information:
- Memory required for the ensemble array, state vector, and transform matrix
- Memory required by the analysis step
- Memory required to perform the ensemble transformation
The output will look like this:
PDAF Memory overview
---------------------------------------------
Allocated memory  (MB)
state and U:       0.59617 MB (persistent)
ensemble array:    0.64087 MB (persistent)
analysis step:     6.05578 MB (temporary)
resampling:        2.81129 MB (temporary)
Currently, only the memory required by the first filter process (rank 0 in COMM_filter) is displayed. Thus, the total required memory is approximately the displayed memory multiplied by the number of processes in COMM_filter.
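As a worked example of this estimate, take the values from the memory overview above and assume, for illustration only, that COMM_filter contains n_filter = 8 processes:

```latex
\begin{aligned}
M_{\mathrm{total}} &\approx n_{\mathrm{filter}} \, M_{\mathrm{displayed}} \\
M_{\mathrm{ens,total}} &\approx 8 \times 0.64087\ \mathrm{MB} = 5.12696\ \mathrm{MB} \\
M_{\mathrm{all,total}} &\lesssim 8 \times (0.59617 + 0.64087 + 6.05578 + 2.81129)\ \mathrm{MB}
  = 8 \times 10.10411\ \mathrm{MB} = 80.83288\ \mathrm{MB}
\end{aligned}
```

Adding all four entries gives a conservative bound, since the temporary memory of the analysis step and of the resampling is not necessarily allocated at the same time.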
Displaying timing information
Timing information can be displayed by adding
CALL PDAF_print_info(X)
to the code, where X is the timer level to be shown. The available choices are:
- X=1: Basic timers
- X=3: Timers showing the time spent in the different call-back routines (this variant was added with PDAF 1.15)
- X=4: More detailed timers about parts of the filter algorithm (before PDAF 1.15, this was timer level 3)
- X=5: Very detailed timers about various operations in the filter algorithm (before PDAF 1.15, this was timer level 4)
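The memory and timing calls described on this page are typically placed at the end of a run. The following is a minimal sketch of how they might be combined, under stated assumptions: the module and subroutine names are hypothetical, and `mype_filter` is assumed to hold the rank of the process in COMM_filter as set up by the user's own parallelization code. Only `PDAF_print_info` itself comes from PDAF. The timer levels follow the numbering valid since PDAF 1.15, and the memory argument 10 requires PDAF V2.1 or later.

```fortran
! Hypothetical helper module (not part of PDAF): prints PDAF's timing
! and memory summaries once, from rank 0 of the filter communicator.
MODULE pdaf_diagnostics_sketch
  IMPLICIT NONE
CONTAINS

  ! Intended to be called once at the end of the run, after the last analysis step.
  SUBROUTINE print_pdaf_diagnostics(mype_filter, timer_level)
    INTEGER, INTENT(in) :: mype_filter  ! rank of this process in COMM_filter (assumed user variable)
    INTEGER, INTENT(in) :: timer_level  ! 1, 3, 4 or 5 (numbering valid since PDAF 1.15)

    ! Only rank 0 in COMM_filter prints, so each summary appears once
    IF (mype_filter == 0) THEN
      CALL PDAF_print_info(timer_level)  ! timing summary at the chosen level
      CALL PDAF_print_info(10)           ! memory overview (PDAF >= V2.1; older versions use 2)
    END IF
  END SUBROUTINE print_pdaf_diagnostics

END MODULE pdaf_diagnostics_sketch
```

For example, `CALL print_pdaf_diagnostics(mype_filter, 3)` would print the call-back timers together with the memory overview. To obtain the timing information of every process instead, the call `PDAF_print_info(timer_level)` can be moved outside the rank check.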
For X=1, the output will look like this:
PDAF            PDAF Timing information
PDAF        ---------------------------------------------
PDAF          Initialize PDAF:            0.078 s
PDAF          Ensemble forecast:          0.003 s
PDAF          LESTKF analysis:           25.183 s
PDAF          Prepoststep:                0.017 s
We recommend using X=3 when optimizing the user routines. With PDAF-OMI, the output will look like this:
PDAF            PDAF Timing information - call-back routines
PDAF        ----------------------------------------------------
PDAF          Initialize PDAF:                        0.078 s
PDAF            init_ens_pdaf:                        0.077 s
PDAF          Ensemble forecast:                      0.003 s
PDAF            MPI communication in PDAF:            0.000 s
PDAF            distribute_state_pdaf:                0.001 s
PDAF            collect_state_pdaf:                   0.000 s
PDAF          LESTKF analysis:                       25.183 s
PDAF            PDAF-internal operations:            24.762 s
PDAF            OMI-internal routines:                0.001 s
PDAF            init_n_domains_pdaf:                  0.000 s
PDAF            init_dim_l_pdaf:                      0.001 s
PDAF            g2l_state_pdaf:                       0.001 s
PDAF            l2g_state_pdaf:                       0.014 s
PDAF            Time in OMI observation module routines
PDAF              init_dim_obs_pdafomi:               0.001 s
PDAF              obs_op_pdafomi:                     0.003 s
PDAF              init_dim_obs_l_pdafomi:             0.002 s
PDAF          prepoststep_pdaf:                       0.017 s
This output shows in which routine most of the time is spent. If it is a call-back routine, you can check whether this routine can be optimized to speed up the computation. If most of the time is spent inside PDAF, as in this example, optimizing the call-back routines cannot reduce the run time significantly.
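For the LESTKF example above, the share of the analysis time spent in PDAF-internal operations follows directly from the listed timers:

```latex
\frac{t_{\mathrm{internal}}}{t_{\mathrm{analysis}}}
  = \frac{24.762\ \mathrm{s}}{25.183\ \mathrm{s}} \approx 0.983,
\qquad
t_{\mathrm{analysis}} - t_{\mathrm{internal}} = 0.421\ \mathrm{s} \approx 1.7\,\%\ \text{of}\ t_{\mathrm{analysis}}
```

Even removing the remaining 0.421 s completely, which includes the call-back routines called during the analysis, would shorten the LESTKF analysis by less than 2 %.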
The OMI-specific output shown above was introduced with PDAF V2.2.1. In older versions of PDAF, or when OMI is not used, more user-provided call-back routines are used. In this case, the output will look like this:
PDAF            PDAF Timing information - call-back routines
PDAF        ----------------------------------------------------
PDAF          Initialize PDAF:                        0.078 s
PDAF            init_ens_pdaf:                        0.077 s
PDAF          Ensemble forecast:                      0.003 s
PDAF            MPI communication in PDAF:            0.000 s
PDAF            distribute_state_pdaf:                0.001 s
PDAF            collect_state_pdaf:                   0.000 s
PDAF          LESTKF analysis:                       25.183 s
PDAF            PDAF-internal operations:            24.762 s
PDAF            init_n_domains_pdaf:                  0.000 s
PDAF            init_dim_obs_f_pdaf:                  0.000 s
PDAF            obs_op_f_pdaf:                        0.003 s
PDAF            init_dim_l_pdaf:                      0.001 s
PDAF            init_dim_obs_l_pdaf:                  0.001 s
PDAF            g2l_state_pdaf:                       0.001 s
PDAF            g2l_obs_pdaf:                         0.021 s
PDAF            init_obs_l_pdaf:                      0.000 s
PDAF            prodRinvA_l_pdaf:                     0.006 s
PDAF            l2g_state_pdaf:                       0.014 s
PDAF          prepoststep_pdaf:                       0.017 s