Version 3 (modified by lnerger, 13 years ago)

Online or Offline Implementation

Here, we discuss whether an online or offline implementation of the data assimilation system should be considered.

Generally, we recommend using the online implementation variant, i.e. the strong coupling to the numerical model. (In fact, all implementations that the developers of PDAF did themselves are online implementations.) This is motivated by the fact that the online implementation is computationally more efficient. In the online implementation, the common memory of a single running executable is used to transfer state information between the model and PDAF. In contrast, the offline implementation uses disk files to transfer the state information between the model executable and the separate assimilation executable containing PDAF. In addition, the offline implementation implies that the full model initialization (start-up) has to be performed each time an ensemble member is integrated by the model. The cost of the model start-up is avoided when the online implementation is used. We cannot state precisely how large the overhead of the offline implementation over the online implementation is. However, in general, for a single forecast/analysis cycle it will be the cost of writing twice as many files as there are ensemble members plus reading as many files as there are ensemble members. In addition, there will be the cost of the repeated model start-up phase (which again includes reading a file holding state information) for each ensemble member.
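
The file and start-up counts described above can be written down as a small bookkeeping sketch. The names `CycleOverhead` and `offline_overhead_per_cycle` are hypothetical and not part of PDAF; the sketch only counts operations and makes no statement about their actual run time, which depends on the model and the file system.

```python
from dataclasses import dataclass


@dataclass
class CycleOverhead:
    """Extra operations of the offline variant in one forecast/analysis cycle."""
    files_written: int   # forecast files (model) plus analysis/restart files (assimilation)
    files_read: int      # forecast files read by the assimilation executable
    model_startups: int  # each start-up also reads one restart/initialization file


def offline_overhead_per_cycle(n_members: int) -> CycleOverhead:
    """Count the additional file operations and model start-ups for one cycle."""
    if n_members < 1:
        raise ValueError("the ensemble needs at least one member")
    return CycleOverhead(
        files_written=2 * n_members,
        files_read=n_members,
        model_startups=n_members,
    )
```

For an ensemble of 8 members this gives 16 written files, 8 read files, and 8 model start-ups per cycle.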

However, there is also a strong advantage in using the offline implementation variant: one does not need to touch the model code. (This is apart from a possible addition of perturbed forcing to simulate model error, which indeed might require an addition to the initialization routines for the forcing.) Instead, the model is called repeatedly, each time provided with an initialization or restart file together with the initial model time and the length of the integration. Then one has to implement reading routines for the assimilation executable. These routines have to initialize the ensemble information in the assimilation program. In addition, one has to implement a routine that writes the analysis states into restart/initialization files for the model. There are also some user-supplied routines that are not required in the case of the offline implementation. In particular, these are U_distribute_state, U_collect_state, and U_next_observation. However, the implementation of these routines should not pose a challenge.
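
The offline workflow described above can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in: `run_model` mimics one separate model run with toy dynamics, `run_assimilation` mimics the assimilation executable with a simple placeholder update (it is not a filter provided by PDAF), and the JSON file names are chosen only for illustration.

```python
import json
from pathlib import Path


def run_model(restart_file: Path, forecast_file: Path, n_steps: int) -> None:
    """Stand-in for one model run: read the restart file, integrate, write the forecast."""
    state = json.loads(restart_file.read_text())
    for _ in range(n_steps):
        state = [0.5 * x + 1.0 for x in state]  # toy dynamics, illustrative only
    forecast_file.write_text(json.dumps(state))


def run_assimilation(forecast_files, restart_files, observation, weight=0.5):
    """Stand-in for the assimilation executable: read ensemble, update, write restarts."""
    # Reading routine: initialize the ensemble from the forecast files.
    ensemble = [json.loads(f.read_text()) for f in forecast_files]
    # Placeholder analysis step: nudge every member toward the observation.
    analysis = [[x + weight * (observation - x) for x in member] for member in ensemble]
    # Writing routine: store the analysis states as restart files for the model.
    for f, member in zip(restart_files, analysis):
        f.write_text(json.dumps(member))


def offline_cycle(workdir: Path, n_members: int, n_steps: int, observation: float) -> None:
    """One forecast/analysis cycle; the model code itself is never modified."""
    restarts = [workdir / f"restart_{i:03d}.json" for i in range(n_members)]
    forecasts = [workdir / f"forecast_{i:03d}.json" for i in range(n_members)]
    for restart, forecast in zip(restarts, forecasts):
        run_model(restart, forecast, n_steps)  # one full model start-up per member
    run_assimilation(forecasts, restarts, observation)
```

In a real system, `run_model` would launch the unmodified model executable and `run_assimilation` would launch the separate program containing PDAF; all coupling between the two happens through the files.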

Avoiding the need to touch the model code generally leads to a working assimilation system more quickly. When the model code is modified for the online implementation, one has to take care that all transfers of information between the model and the data assimilation system are consistent. In addition, some arrays, apart from the model fields in the state vector, might need to be re-initialized before a new ensemble state is integrated. Overall, this should not be a problem if the person who performs the implementation knows the model well or is in close contact with a person who has this experience. If you don't really know the model code, it can be difficult to implement the online variant.

When the assimilation system uses a large-scale model, there are further considerations about online and offline systems. In the online mode, the whole ensemble has to be run in a single job using a rather large number of processors. In the offline mode, the integration of each ensemble member state can be performed as a separate run with a smaller number of processors. When these runs are completed, the assimilation executable is run. Here, the number of processors can be set rather independently of the number used for the model runs. (This will depend on the file output.) In this respect, the question is how to obtain the best throughput of the assimilation system. As large-scale models are typically run on computers using a batch or scheduling system, one has to take its configuration into account: Is it more likely that a single large job is executed, or that a larger number of smaller jobs is executed? If the batch system allows only a very limited number of concurrent jobs for a single user, then the ensemble integration will become essentially serial. (However, there can be possibilities to execute several parallel model integrations within a single batch job.)
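
The effect of a limit on concurrent jobs can be estimated with a rough sketch. It assumes that all member runs take the same time and start as soon as a job slot is free, and it ignores queue waiting times entirely; the function name `forecast_waves` is hypothetical.

```python
import math


def forecast_waves(n_members: int, max_concurrent_jobs: int) -> int:
    """Number of successive batches of model runs needed to integrate the ensemble.

    With one member per batch job, only `max_concurrent_jobs` members can be
    integrated at the same time, so the forecast phase takes this many times
    the duration of a single model run.
    """
    if n_members < 1 or max_concurrent_jobs < 1:
        raise ValueError("both arguments must be positive")
    return math.ceil(n_members / max_concurrent_jobs)
```

With 16 members and a limit of 4 concurrent jobs, the forecast phase takes about 4 model-run durations; with a limit of 1, it takes 16, which is the essentially serial case mentioned above.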

Actually, the choice between an offline and an online implementation is not a final one. In fact, one can first implement an offline system and later move on to an online system. This is possible because the observation-related routines (observation operator, initialization of the array holding the observations) perform the same operations in online and offline systems. However, in an offline system all information about the model fields, like the location of individual grid points, is read from the model files. In contrast, in an online implementation this information is usually taken from some array of the model code. Spatial information about the observations is read from files in both implementation variants. In addition, one needs routines to write state estimates into files in both implementation variants. Overall, the changes to the observation-related routines should be rather limited when moving from an offline to an online implementation. Thus, the major work will be the adaptation of the parallelization and the extension of the model code for the ensemble integration.
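
The following sketch illustrates why the observation-related code can largely be reused: only the source of the grid information differs between the two variants. It assumes a one-dimensional grid and a nearest-grid-point observation operator purely for illustration; the function names and the plain-text grid file format are hypothetical and not part of PDAF.

```python
from pathlib import Path


def nearest_point_obs_operator(state, grid_coords, obs_coords):
    """Map a state vector to observation space via the nearest grid point."""
    observed = []
    for oc in obs_coords:
        idx = min(range(len(grid_coords)), key=lambda i: abs(grid_coords[i] - oc))
        observed.append(state[idx])
    return observed


def grid_from_file(path: Path):
    """Offline variant: grid point locations are read from a model file."""
    return [float(token) for token in Path(path).read_text().split()]


def grid_from_model(model_grid_array):
    """Online variant: grid point locations are taken from an array of the model code."""
    return list(model_grid_array)
```

In both variants the same `nearest_point_obs_operator` is called; moving from offline to online only replaces `grid_from_file` by `grid_from_model`.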