Durability issue - old samples not retrieved
Agent Smith posted on Monday, May 24, 2010 - 10:38 am
I have tried to implement the Durability QoS on both OpenSplice and OpenDDS. While I succeeded with OpenDDS, I have had mixed results with OpenSplice. Hence I am posting the issue here for OpenSplice:

I have set the durability-related QoS policies (topic and datawriter on the publisher side, topic and datareader on the subscriber side), but when I run the subscriber after the publisher has already transmitted some samples, the old samples are not retrieved. I have been tinkering with these policies for the last two weeks and have gone through the ospl error and info logs. I did manage to get it working about two weeks back, but since then no luck. I am attaching the source of the application; I am doing this on Linux.

Steps to build: run the makefile to build it, then run the pub and sub in different terminals.
Attachment: durabilityexample.zip (3.6 k)
Reinier Torenbeek posted on Monday, May 24, 2010 - 02:31 pm
Hello,

I did not try running the examples, but I did the following in your Subscriber.cpp:

reader->wait_for_historical_data(wait);
reader->set_listener(reader_listener,DDS::ANY_STATUS);

The first line will return after all historical data has been delivered to the reader. If you attach the listener after that, it will not trigger on the historical data. Listeners are event-based and only trigger at the moment that new data is delivered, not after it has been delivered.

Instead of attaching a listener, you could call the take() method right from the main.
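For example (just a sketch: I am assuming your type is called HelloWorld, so the generated sequence type would be HelloWorldSeq, and that reader has been narrowed to the type-specific HelloWorldDataReader):

DDS::Duration_t wait = { 10, 0 }; // wait at most 10 seconds for historical data
reader->wait_for_historical_data(wait);

HelloWorldSeq data;
DDS::SampleInfoSeq info;
// take whatever the reader holds, which now includes the historical samples
DDS::ReturnCode_t status = reader->take(data, info, DDS::LENGTH_UNLIMITED,
    DDS::ANY_SAMPLE_STATE, DDS::ANY_VIEW_STATE, DDS::ANY_INSTANCE_STATE);
if (status == DDS::RETCODE_OK) {
    for (DDS::ULong i = 0; i < data.length(); i++) {
        // process data[i] here
    }
    reader->return_loan(data, info); // hand the loaned buffers back
}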

By the way, I noticed that you have some strange choices in your QoS settings. Durability is not intended to be combined with Best Effort reliability.
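For a transient late-joiner you would typically use something like this on the datareader QoS (sketch):

dr_qos.durability.kind = DDS::TRANSIENT_DURABILITY_QOS;
dr_qos.reliability.kind = DDS::RELIABLE_RELIABILITY_QOS;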

Hope this helps,
Reinier
Reinier Torenbeek posted on Monday, May 24, 2010 - 02:33 pm
Sorry, I made an unfortunate typo in my previous response:

"I did not try running the examples, but I did NOTICE the following in your
Subscriber.cpp"

Reinier
Agent Smith posted on Tuesday, May 25, 2010 - 05:49 am
Thanks for the response, Reinier.
I had already tried those suggestions. Anyway, I have modified the files accordingly, and still no luck.
Attachment: durabilityexample.zip (3.6 k)

I restarted the ospl daemon and noticed something suspicious about the durability service initialisation:

========================================================================================
Report : INFO
Date : Tue May 25 10:08:58 2010
Description : service 'durability': MemoryLocking disabled
Node : opendds1
Process : durability (9979)
Thread : main thread b7c368d0
Internals : V4.3/lockPages/u_service.c/163/0/676194272


I am attaching the ospl-info.log file. I am not able to understand the meaning of this message.
Attachment: ospl-info.log (9.1 k)
Hans van 't Hag posted on Tuesday, May 25, 2010 - 10:22 am
Hi,

The INFO report is, as it says, not an error/warning, just information about an optional setting that allows memory pages of services to be locked in memory, which in some circumstances can yield better determinism because pages can no longer be swapped out by the OS.

As this forum is not a product-specific forum, could you perhaps use our OpenSplice DDS developer mailing lists for direct interaction with our developer community? You can register here: http://www.opensplice.org/cgi-bin/twiki/view/Community/MailingLists#developer_opensplice_org

Regards,
Hans
Hans van 't Hag posted on Tuesday, May 25, 2010 - 10:49 am
Hi,

I took a quick look at your publisher code (without running it yet) and noticed that you don't create the topic with the properly configured topic_qos structure (the one that has durability set to TRANSIENT); instead you create it with TOPIC_QOS_DEFAULT, which implies VOLATILE durability.
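In other words, something along these lines ('sample' is the topic name from your log; type_name and the listener-mask constant may differ in your code):

DDS::TopicQos topic_qos;
participant->get_default_topic_qos(topic_qos);
topic_qos.durability.kind = DDS::TRANSIENT_DURABILITY_QOS;
DDS::Topic_var topic = participant->create_topic(
    "sample",      // topic name
    type_name,     // the registered type name
    topic_qos,     // the configured QoS instead of TOPIC_QOS_DEFAULT
    NULL,          // no listener
    DDS::ANY_STATUS);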

Regards,
Hans
Agent Smith posted on Tuesday, May 25, 2010 - 11:51 am
Hi Hans,
Thanks for your response.
Actually I tried to post this issue in the OpenSplice Google Group, but since I could not upload the attachment, I decided to go for this forum.

My bad luck that the code I had attached had the topic QoS as default. I have changed it, and now the subscriber throws a segmentation fault. I tried to debug the subscriber and found the following:

Starting program: /opt/PrismTech/OpenSpliceDDS/HDE/x86.linux2.6/examples/dcps/standalone/C++/durabilityexample/exec/sub
[Thread debugging using libthread_db enabled]
[New Thread 0xb7b756c0 (LWP 12387)]
[New Thread 0xb7b74b90 (LWP 12388)]
[New Thread 0xb7ef5b90 (LWP 12389)]
[New Thread 0xb7773b90 (LWP 12390)]
[New Thread 0xb7762b90 (LWP 12391)]
[New Thread 0xb7751b90 (LWP 12392)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7751b90 (LWP 12392)]
0x08056c55 in HelloWorldListener::on_data_available (this=0x83295e0,
reader=0x0) at Subscriber.cpp:46
46 ANY_SAMPLE_STATE, ANY_VIEW_STATE, ANY_INSTANCE_STATE);

As can be seen, the reader is NULL.

The ospl-info.log is also attached for reference.

Attachment: durabilityexample.zip (3.5 k)
Attachment: ospl-info.log (9.1 k)
Agent Smith posted on Tuesday, May 25, 2010 - 12:11 pm
Further to the above issue: if I restart the ospl daemon, the sub runs fine once. After that, the segmentation fault happens almost every time. However, sometimes it runs fine and even shows some old samples.
I do see the following in ospl-error.log:
========================================================================================
Report : ERROR
Date : Tue May 25 16:24:44 2010
Description : Create kernel entity failed. For Topic: <sample>
Node : opendds1
Process : 12891
Thread : main thread b7b666c0
Internals : V4.3/u_topicNew/u_topic.c/106/0/370762659
========================================================================================
Report : ERROR
Date : Tue May 25 16:24:44 2010
Description : Invalid TopicDescription
Node : opendds1
Process : 12891
Thread : main thread b7b666c0
Internals : V4.3/CCPP/ccpp_Subscriber_impl.cpp/58/0/370944279

Thanks again for all your responses.
Agent Smith posted on Tuesday, May 25, 2010 - 12:14 pm
I also see the following in ospl-info.log when the segmentation fault happens:
========================================================================================
Report : INFO
Date : Tue May 25 16:33:39 2010
Description : service 'durability': MemoryLocking disabled
Node : opendds1
Process : durability (13380)
Thread : main thread b7c458d0
Internals : V4.3/lockPages/u_service.c/163/0/319718080
========================================================================================
Report : INFO
Date : Tue May 25 16:33:39 2010
Description : Durability identification is: 284218336
Node : opendds1
Process : durability (13380)
Thread : main thread b7c458d0
Internals : V4.3/DurabilityService/d_durability.c/419/0/463010799
========================================================================================
Report : WARNING
Date : Tue May 25 16:39:28 2010
Description : Forcing user-layer detach while still referenced 1 time.
Node : opendds1
Process : 13506
Thread : main thread b7b8f6c0
Internals : V4.3/u_userExit/u_user.c/71/0/799065952
========================================================================================
Report : WARNING
Date : Tue May 25 16:39:28 2010
Description : User layer not initialized
Node : opendds1
Process : 13506
Thread : b7b8eb90
Internals : V4.3/u_userDetach/u_user.c/185/1/799723675

Can someone explain the meaning of these messages?
Hans van 't Hag posted on Tuesday, May 25, 2010 - 06:03 pm
Hi,

Very strange: it works fine with 5.1, but with 4.3 I had to uncomment the 'cout' in the liveliness-changed callback to make it work.

Can you try that 'fix' too?
Agent Smith posted on Wednesday, May 26, 2010 - 08:18 am
Thanks, Hans, for running the example. Indeed I am using version 4.3 (I will try it out on 5.1 soon).
As I mentioned in my earlier post about the segmentation fault, I handle it by returning from the on_data_available callback if the DataReader object is passed in as NULL. Why the reader object comes in as NULL is probably a different question.
That way I have got my Transient durability to work as expected.
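The work-around in my listener (the one from the gdb trace above) looks roughly like this:

void HelloWorldListener::on_data_available(DDS::DataReader_ptr reader)
{
    // work-around: the callback sometimes fires with a NULL reader,
    // so bail out instead of dereferencing it
    if (reader == NULL) {
        return;
    }
    // ...the normal narrow/take/process logic follows here...
}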
However, in the case of persistent durability, I can see the XML file in the storage directory (sample.xml in my case) getting updated while the publisher is running. But as soon as the publisher terminates, the size of the sample.xml file drops back to 57 bytes. After running the publisher again, the samples from the past session are not retrieved.
I am once again attaching my application. I was forced to comment out the durability service QoS settings of the topics on both sub and pub, as otherwise the pub would crash due to a failure in creating the topic.
Attachment: durabilityexample.zip (3.6 k)
Hans van 't Hag posted on Wednesday, May 26, 2010 - 02:30 pm
Hi,

You've run into a very common pitfall w.r.t. the default value of the autodispose_unregistered_instances QoS policy. This policy is by default set to TRUE, which implies that when your application terminates (and implicitly unregisters its instances), the transient and/or persistent data built up for those instances is automatically disposed and subsequently cleaned up (also from the persistent store).

So in your writer you should add the following:
dw_qos.writer_data_lifecycle.autodispose_unregistered_instances = FALSE;

Now you can safely terminate your writer without impacting the availability of non-volatile data for late-joiners.
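In context, the writer creation could look something like this (a sketch; copy_from_topic_qos pulls the durability setting over from your configured topic_qos):

DDS::DataWriterQos dw_qos;
publisher->get_default_datawriter_qos(dw_qos);
publisher->copy_from_topic_qos(dw_qos, topic_qos); // inherit durability from the topic
dw_qos.writer_data_lifecycle.autodispose_unregistered_instances = FALSE;
DDS::DataWriter_var writer = publisher->create_datawriter(
    topic, dw_qos, NULL, DDS::ANY_STATUS);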

Regards,
Hans
Agent Smith posted on Wednesday, May 26, 2010 - 03:40 pm
Thanks, Hans, once again: that solved the problem. However, I have the feeling the ospl daemon also caches some data, because even after deleting the persistent data store (sample.xml) the subscriber was still getting past data. Once the ospl daemon was restarted, the old samples were gone, so presumably it caches them?
Hans van 't Hag posted on Wednesday, May 26, 2010 - 04:10 pm
You should view persistent data as a subset of transient data: when you remove the persistent store, that only impacts the ability to re-inject that data during a subsequent system restart; it won't impact the availability of that data while the system is still running.
Robert Tseng posted on Tuesday, July 13, 2010 - 11:31 pm
Hi Hans,

I am having a similar issue, using OpenSplice DDS as well.

I have tried with datawriter autodispose = off, setting my topic with keep_all_history or keep_last with a history depth, then setting it as the default topic QoS and having the datawriter QoS copied from the topic QoS, either implicitly through the macro or explicitly through the helper functions.

On the datareader side, under any of these configurations, I am only able to receive the last message.

My environment has 1 pub / 1 sub. The pub typically exits after writing, and the sub joins after all messages have been written. I have also tried putting the pub into an infinite sleep after publishing, but with the same result every time.

After 2 days I am at my wits' end. I can post my code snippet if it helps.
Hans van 't Hag posted on Wednesday, July 14, 2010 - 12:35 pm
Hi Robert,

It is probably most efficient if you post your question with code on either our new product-specific forum (http://forums.opensplice.org) or via our developers mailing list, which you can post to via http://www.opensplice.org/mailman/listinfo/developer

Perhaps you're confusing the (datareader) HISTORY QoS policy with the topic DURABILITY QoS. Whereas the history drives the ability of the datareader to store a number of arrived samples for each 'instance', i.e. unique key value (instead of overwriting old data with new data for each instance), it's the DURABILITY QoS that, when set to TRANSIENT or PERSISTENT, instructs the middleware to retain such non-volatile data for delivery to late-joining applications (datareaders). When you mention the 'autodispose' setting (of FALSE), this suggests that you're interested in 'durability' rather than 'history'.
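To make the distinction concrete (sketch):

// HISTORY (datareader): how many arrived samples are kept per instance
dr_qos.history.kind = DDS::KEEP_LAST_HISTORY_QOS;
dr_qos.history.depth = 100; // instead of the default depth of 1

// DURABILITY (topic): whether the middleware retains data for late-joiners
topic_qos.durability.kind = DDS::TRANSIENT_DURABILITY_QOS;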

Regards,
Hans
Robert Tseng posted on Wednesday, July 14, 2010 - 11:20 pm
Hans,


I finally got it working last night. As it turns out, I needed both, because I believe the default history depth is 1 on the OpenSplice DDS datareader.

So on the publisher side I have TRANSIENT, KEEP_ALL_HISTORY_QOS and max_samples_per_instance set in the Topic QoS, and the DataWriter QoS has autodispose "off".

On the subscriber, the Topic QoS is matched with the publisher's, but the datareader has a new history depth. (This did the trick.)
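For the record, the working combination boils down to this (a sketch; the depth and limit values are just the ones I happened to pick):

// topic QoS, matched on both pub and sub sides
topic_qos.durability.kind = DDS::TRANSIENT_DURABILITY_QOS;
topic_qos.history.kind = DDS::KEEP_ALL_HISTORY_QOS;
topic_qos.resource_limits.max_samples_per_instance = 1000;

// datawriter QoS copied from the topic QoS, plus:
dw_qos.writer_data_lifecycle.autodispose_unregistered_instances = FALSE;

// datareader QoS: a history depth deeper than the default of 1
dr_qos.history.kind = DDS::KEEP_LAST_HISTORY_QOS;
dr_qos.history.depth = 1000;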

I was definitely confused by the topic's durability_service QoS, which has a history_depth field like the datareader's history QoS. I was seeing the last message being saved but couldn't retrieve prior ones despite the durability service depth being set to 1000.

If you know of any online resource on good design patterns for DDS messaging systems, let me know. I am trying to figure out a scheme where one publisher multicasts to multiple subscribers, but as soon as the last sub gets the message, the middleware no longer needs to keep it around.


Thank you!!