I am evaluating DDS as a solution for a military application, and we are investigating how failover might work. I understand the simple case: when a DataWriter (DW) stops publishing, the reader fails over to the writer with the next-highest ownership strength. But what happens if the original DW comes back online for a few seconds, the reader goes back to reading from it, and then the DW stops publishing again? This cycle may repeat.
Take a 'system time provider' as an example, where all systems synchronise to the same time. The primary DW provides the time and all nodes are synchronised with it. The primary DW develops a fault and stops writing data, so all other nodes fail over to the secondary system time provider. The secondary DW's clock differs from the primary's by 400 ms, so the nodes' clocks shift by 400 ms. The primary DW then comes back online; all nodes pick up the primary DW's time again and shift back by 400 ms. The fault is intermittent and keeps recurring. How, in DDS, can you determine that the primary DW is flaky and have all nodes fail over to the secondary regardless of the primary's reappearance? And if the primary becomes stable again, what mechanism would you use in your data model to deal with that?
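
To make the scenario concrete, here is a minimal Python sketch of the flapping I am describing. It is not DDS API code: `Writer`, `select_owner`, `simulate`, and the strength values 10 and 5 are hypothetical names and numbers chosen for illustration, and the selection rule (the highest-strength live writer always wins) is my assumption of how the readers would behave.

```python
from dataclasses import dataclass


@dataclass
class Writer:
    name: str
    strength: int          # hypothetical ownership strength
    clock_offset_ms: int   # offset from a hypothetical reference clock
    alive: bool = True


def select_owner(writers):
    """Return the highest-strength writer that is currently alive, or None."""
    alive = [w for w in writers if w.alive]
    return max(alive, key=lambda w: w.strength) if alive else None


def simulate(primary_alive_pattern):
    """Return the clock shift (ms) a node sees at each change of owner."""
    primary = Writer("primary", strength=10, clock_offset_ms=0)
    secondary = Writer("secondary", strength=5, clock_offset_ms=400)
    shifts = []
    previous = None
    for alive in primary_alive_pattern:
        primary.alive = alive
        # The secondary is always alive here, so an owner always exists.
        owner = select_owner([primary, secondary])
        if previous is not None and owner.name != previous.name:
            shifts.append(owner.clock_offset_ms - previous.clock_offset_ms)
        previous = owner
    return shifts


# Primary is up, fails, returns, fails, returns.
print(simulate([True, False, True, False, True]))
```

Every time the primary drops out or reappears, the node's clock jumps by 400 ms in one direction or the other, which is exactly the behaviour I want to avoid.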