Changes between Version 1 and Version 2 of AddingMemoryandTimingInformation_PDAF23
- Timestamp:
- Jun 3, 2025, 1:57:45 PM (3 days ago)
AddingMemoryandTimingInformation_PDAF23
@@ -20,12 +20,14 @@
 PDAF provides functions to display the memory required by the arrays allocated inside PDAF. In addition, information about the execution duration of different parts of PDAF can be displayed. This information can be obtained by calling the routine `PDAF_print_info`.
 
+The calls described here are implemented in `finalize_pdaf.F90` in the template and tutorial codes. One can directly use these routines without changes.
+
 == Displaying memory information ==
 
 Information about the memory required by PDAF through allocated arrays can be obtained by inserting into the program the line
 {{{
-CALL PDAF_print_info(2)
+IF (mype_world==0) CALL PDAF_print_info(10)
 }}}
 The function displays the following information
- * Memory required for the ensemble array, state vector, and matrix '''Uinv'''
+ * Memory required for the ensemble array, state vector, and matrix '''Ainv'''
  * Memory required by the analysis step
  * Memory required to perform the ensemble transformation
@@ -41,5 +43,9 @@
 }}}
 
-Currently only the memory required by the first process of the filter processes is displayed. Thus the total required memory should be the displayed memory multiplied by the number of processes in `COMM_filter`.
+This memory information shows only the memory required by a single filter process. In the example codes, this is the process with `mype_world=0`. One can also display the overall allocated memory by adding
+{{{
+CALL PDAF_print_info(11)
+}}}
+to the routine `finalize_pdaf`.
 
 == Displaying timing information ==
@@ -61,35 +67,34 @@
 More detailed output is obtained with
 {{{
-CALL PDAF_print_info(3)
+IF (mype_world==0) CALL PDAF_print_info(3)
 }}}
 which will display timing information of each of the call-back routines. E.g. for the LESTKF this might look like:
 {{{
-PDAF    PDAF Timing information - call-back routines
-PDAF    ----------------------------------------------------
-PDAF    Initialize PDAF:                     2.007 s
-PDAF    init_ens_pdaf:                       2.004 s
-PDAF    Ensemble forecast:                 571.850 s
-PDAF    MPI communication in PDAF:           0.004 s
-PDAF    distribute_state_pdaf:               0.140 s
-PDAF    collect_state_pdaf:                  0.001 s
-PDAF    LESTKF analysis:                    12.654 s
-PDAF    PDAF-internal operations:           10.360 s
-PDAF    init_n_domains_pdaf:                 0.000 s
-PDAF    init_dim_obs_f_pdaf:                 1.091 s
-PDAF    obs_op_f_pdaf:                       0.022 s
-PDAF    init_dim_l_pdaf:                     0.001 s
-PDAF    init_dim_obs_l_pdaf:                 1.136 s
-PDAF    g2l_state_pdaf:                      0.003 s
-PDAF    g2l_obs_pdaf:                        0.007 s
-PDAF    init_obs_l_pdaf:                     0.004 s
-PDAF    prodRinvA_l_pdaf:                    0.023 s
-PDAF    l2g_state_pdaf:                      0.002 s
-PDAF    prepoststep_pdaf:                   91.396 s
+PDAF    PDAF Timing information - call-back routines
+PDAF    ----------------------------------------------------
+PDAF    Initialize PDAF:                     1.552 s
+PDAF    init_ens_pdaf:                       1.526 s
+PDAF    Ensemble forecast:               23847.693 s
+PDAF    MPI communication in PDAF:         666.890 s
+PDAF    distribute_state_pdaf:               2.153 s
+PDAF    collect_state_pdaf:                  0.427 s
+PDAF    LESTKF analysis:                   191.429 s
+PDAF    PDAF-internal operations:          157.618 s
+PDAF    OMI-internal routines:               1.524 s
+PDAF    init_n_domains_pdaf:                 0.000 s
+PDAF    init_dim_l_pdaf:                     0.127 s
+PDAF    g2l_state_pdaf:                      5.190 s
+PDAF    l2g_state_pdaf:                      3.087 s
+PDAF    Time in OMI observation module routines
+PDAF    init_dim_obs_pdafomi:                8.880 s
+PDAF    obs_op_pdafomi:                      3.913 s
+PDAF    init_dim_obs_l_pdafomi:             10.750 s
+PDAF    prepoststep_pdaf:                 9422.757 s
 }}}
-This example is from one of our real data assimilation applications. Most of the time is spent in the ensemble forecast. The second-largest share is spent in `prepoststep_pdaf`, which is mainly due to the writing of large output files.
-The analysis step (line `LESTKF analysis`) took only 12.65 s. Most of this time was spent on computations inside PDAF (line `PDAF-internal operations`, 10.36 s), while `init_dim_obs_f_pdaf` (the initialization of observation information) and `init_dim_obs_l_pdaf` (the search for observations within the localization cut-off radius) also took some time.
+This example is from one of our real data assimilation applications, in which we performed 13 analysis steps in this run. Most of the time is spent in the ensemble forecast. The second-largest share is spent in `prepoststep_pdaf`, which is mainly due to the parallel writing of large output files in the binary netCDF file format.
+The analysis steps (line `LESTKF analysis`) took only 191.429 s. Most of this time was spent on computations inside PDAF (line `PDAF-internal operations`, 157.618 s), while `init_dim_obs_l_pdafomi` (the search for observations within the localization cut-off radius, 10.75 s) and `init_dim_obs_pdafomi` (the initialization of observation information, 8.88 s) also took some time.
 
 If significant time is spent in one or several of the call-back routines, this indicates which routines might have potential for optimization.
 
-More detailed information on the time spent in different parts of the filter algorithm itself can be obtained using a value of 4 or 5 in the call to `PDAF_print_info`. Only the time from the first process of the filter processes is displayed. However, the time for each process should be similar.
+More detailed information on the time spent in different parts of the filter algorithm itself can be obtained using a value of 4 or 5 in the call to `PDAF_print_info`. Only the time from the first process of the filter processes is displayed. However, the time for each process should be similar. If one performs the call without `IF (mype_world==0)`, each process writes its timing information.
 
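
To put the version 2 timing numbers into proportion, the following is a minimal sketch that computes the mean time per analysis step and the share of the analysis time taken by selected entries. It is not part of PDAF. The function name `analysis_breakdown` and the variable names are hypothetical, the values are copied by hand from the example output, and the sketch assumes that the listed entries are contained in the `LESTKF analysis` time and that the 13 analysis steps can be averaged.

```python
# Timings in seconds, copied by hand from the version 2 example output.
ANALYSIS_TOTAL = 191.429   # line "LESTKF analysis"
N_ANALYSIS_STEPS = 13      # number of analysis steps stated for this run

# Selected entries assumed to be part of the analysis time
# (keys are the labels printed in the timing output).
analysis_parts = {
    "PDAF-internal operations": 157.618,
    "init_dim_obs_l_pdafomi": 10.750,
    "init_dim_obs_pdafomi": 8.880,
}


def analysis_breakdown(total, parts, n_steps):
    """Return the mean time per analysis step and each part's share of the total."""
    per_step = total / n_steps
    shares = {name: seconds / total for name, seconds in parts.items()}
    return per_step, shares


per_step, shares = analysis_breakdown(ANALYSIS_TOTAL, analysis_parts, N_ANALYSIS_STEPS)

if __name__ == "__main__":
    print(f"mean time per analysis step: {per_step:.2f} s")
    for name, share in shares.items():
        print(f"{name:28s} {100 * share:5.1f} % of analysis time")
```

With these numbers, the mean is roughly 14.7 s per analysis step, and about 82 % of the analysis time falls under `PDAF-internal operations`. The same comparison can be made for any call-back routine to judge whether it is worth optimizing.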