I have some questions regarding keyed data, publishers and subscribers. I have seen keyed-data examples where a data writer must register as a provider of a specific flavor of keyed data; for instance, sensor data with a sensor id as the key. The DDS interface supports the notion of a publisher registered as a provider of a specific flavor (instance) of sensor data. I have seen examples where subscribers register for *all* instances of sensor data, i.e. the Topic only; however, I have yet to find an example where a subscriber can register as a consumer of a specific instance of sensor data.
My questions are:
1. Does the DDS specification support the notion of a subscriber's 'registration' as a consumer of a specific instance of keyed data?
2. If so, how is this implemented? Specifically, if there exists a publisher of sensor data with key = sensor_id = 5, and a subscriber of sensor data with sensor_id = 7, will there be data transfer between these two entities that is somehow filtered at the subscriber end? Or is connectivity never established between these entities?
3. What does this look like in code?
Also, a writer is allowed to register more than one instance: it can register as many instances as it likes, as long as each instance has its own unique key value. This way a sensor can distinguish between the different 'objects' that it detects.
Since a subscriber is not aware of the information available at the publisher side (e.g. how many instances there are, or what the identity of each individual instance is), it normally subscribes to the topic itself. By doing so, it receives all available instances from all available writers for that topic. By examining these samples, it knows what information is available and can then focus on its region of interest, for example by using queries.
Back to your question: if a Reader does have a priori knowledge about instance identities and wants to make sure that only specific instances are received, it can use a ContentFilter for that purpose. This filter is just an SQL expression applied to a Topic that specifies the values of the key fields for the instances that you want to receive. In your example the filter expression could be: "sensor_id = 7". All other instances will be blocked out by the Reader.
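To make the mechanics concrete, here is a minimal, self-contained C++ sketch, independent of any DDS vendor API (the SensorSample type and function names are invented for illustration): a filter expression like "sensor_id = 7" amounts to a per-sample predicate on the key field, evaluated before samples are delivered to the application.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sample type: 'sensor_id' plays the role of the key field.
struct SensorSample {
    int sensor_id;   // key field
    double value;
};

// A content filter like "sensor_id = 7" boils down to a predicate that is
// evaluated per sample; non-matching instances never reach the application.
bool matchesFilter(const SensorSample& s, int wantedId) {
    return s.sensor_id == wantedId;
}

// Apply the filter to a batch of incoming samples, as a filtering Reader would.
std::vector<SensorSample> filterSamples(const std::vector<SensorSample>& in,
                                        int wantedId) {
    std::vector<SensorSample> out;
    for (const auto& s : in)
        if (matchesFilter(s, wantedId)) out.push_back(s);
    return out;
}
```

The point of the sketch is only that the filter is a property of the *reader side's interest*, not of the writer's registration of instances.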
It depends on the middleware product whether this content filter is executed on the reader side or on the writer side: either way, the selected mechanism is fully transparent to the application and the resulting behaviour should be the same (although filtering on the Writer side may save you some network bandwidth).
Basically what you need to do is create what is called a ContentFilteredTopic: a sort of filtered view of an existing Topic. Like a normal Topic, a ContentFilteredTopic has a unique name by which it is identified, but it may only be applied to Readers, not to Writers. Also, a ContentFilteredTopic is local to a DomainParticipant: its existence will not be communicated to other nodes.
Using a ContentFilter looks a little bit like the C++ code snippet below. There we assume you have a DomainParticipant 'dp' and a topic proxy 't' to the topic named "sensorData", for which you want to filter on sensor_id = 7. The resulting ContentFilteredTopic we will name "sensorDataNr7":
// Create a Subscriber with default QoS and no Listener.
sub = dp->create_subscriber(SUBSCRIBER_QOS_DEFAULT, NULL, 0);
if (sub == NULL) exit(-1); // error

// Create the ContentFilteredTopic on top of topic 't', passing the filter
// expression and an (empty) sequence of expression parameters.
cft = dp->create_contentfilteredtopic("sensorDataNr7", t, "sensor_id = 7", StringSeq());
if (cft == NULL) exit(-1); // error

// Create a DataReader for the filtered topic, using the QoS settings
// from the Topic and no Listener.
sd7Reader = sub->create_datareader(cft, DATAREADER_QOS_USE_TOPIC_QOS, NULL, 0);
if (sd7Reader == NULL) exit(-1); // error
Once this reader has been successfully created, you can cast it to its proper type (as with an ordinary Reader), and from that moment on it will behave like an ordinary Reader, except that it only receives samples that match your SQL filter.
Hi Bryan,

The Data-Distribution Service offers a set of content-based subscription mechanisms on topics in the global data space. These subscription schemes use a subset of the SQL language, allowing software engineers to express interest in specific data of a topic, i.e. specific values, instances, etc. The Data-Distribution Service provides these as ContentFilteredTopic and MultiTopic subscriptions, which contain SQL filter expressions. Whenever a datareader for such a subscription possesses data matching the filter expression, its corresponding listener is notified. Following the notification, the application can then choose to consume the data from the associated datareader.

However, concerning detecting and consuming specific instances of a topic, read and query conditions can also be constructed and used to conditionally consume samples of a topic. Read and Query Conditions, which also use a subset of the SQL language, are constructed after subscription and used to conditionally consume data according to their filter expressions. They can therefore easily be used to construct conditions on top of existing subscriptions which, for example, express interest in new instances of a topic, specific instances of a topic, or merely specific values, and to conditionally consume the matching data. Note, however, that read and query conditions require that software engineers manually demultiplex them using what is called a waitset, to assess whether their filter expressions have evaluated to true.
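As a rough, self-contained illustration of that demultiplexing idea (the Sample and Condition types below are invented for this sketch, not part of the DDS API): read/query conditions can be modeled as named predicates, and a waitset-style loop then discovers which of the attached conditions a new sample has triggered.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Invented stand-in for a DDS sample of a keyed sensor topic.
struct Sample {
    int sensor_id;
    double value;
};

// Invented stand-in for a ReadCondition/QueryCondition: a name paired with a
// predicate over the sample, mirroring an SQL-like filter expression.
struct Condition {
    std::string name;
    std::function<bool(const Sample&)> predicate;
};

// WaitSet-style demultiplexing: given a new sample, return the names of all
// attached conditions whose filter expressions evaluate to true for it.
std::vector<std::string> triggeredConditions(const std::vector<Condition>& conds,
                                             const Sample& s) {
    std::vector<std::string> names;
    for (const auto& c : conds)
        if (c.predicate(s)) names.push_back(c.name);
    return names;
}
```

The application then reads from the datareader associated with whichever condition fired, which is the manual step the post above refers to.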
Thanks for your quick responses. If I understand correctly, you both suggest that all data of a certain topic, published from all publishers regardless of key, gets routed to every subscriber of that topic, but that the functionality I am looking for can be implemented using the Content Filter mechanism. Unfortunately, this was not the answer I was hoping for, as it sounds like data is flooded everywhere and filtered at the destination. In bandwidth-limited applications, this is unacceptable. If content filtering is the only way to get this functionality, then I guess I either don't fully understand the issue, or overestimated the utility of keyed data.
> If content filtering is the only way to get this functionality, then I guess I either don't fully understand the issue, or overestimated the utility of keyed data.
Please note that filtering on the source side might sound very attractive from a network bandwidth point of view, but it also has its disadvantages. The publisher will have to administrate the different filter expressions that are defined by the subscribers, and it will also have to evaluate those filter expressions for every sample. This model is, in general, not scalable with respect to the number of filtering subscribers. And will the data then be sent to the interested subscribers using a point-to-point mechanism? Only in specific situations might you benefit from filtering at the source over filtering at the destination.
Depending on the DDS implementation you are using, other ways of achieving the filtering functionality you are asking for might be available. For example, OpenSplice supports a scalable filtering solution using DDS Partitions. The Partition QoS is an expression that can be attached at the Publisher and Subscriber levels. A Partition is like a logical namespace for Topics. Publishers and Subscribers will communicate only when they are attached to the same Partition. This is a simpler model that does allow for source-side filtering: OpenSplice supports mapping the DDS Partition onto IP multicast addresses. That way, the IP multicast mechanism executes the actual filtering. This requires some a priori knowledge of who will be interested in which partition, but it does give you something similar to the filtering that you are asking for -- if I understood your question properly.
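The partition-matching rule itself is easy to state: communication happens only when the publisher's and subscriber's partition name sets share at least one entry. A self-contained sketch of just that rule (ignoring the wildcard matching that real DDS partitions also support):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Sketch of the Partition QoS matching rule: a Publisher and a Subscriber
// communicate only when their partition name sets overlap. Real DDS also
// allows wildcard characters in partition names, which is omitted here.
bool partitionsMatch(const std::vector<std::string>& pubPartitions,
                     const std::vector<std::string>& subPartitions) {
    for (const auto& p : pubPartitions)
        if (std::find(subPartitions.begin(), subPartitions.end(), p)
                != subPartitions.end())
            return true;  // at least one shared partition name
    return false;
}
```

Because this decision is made per endpoint rather than per sample, the middleware can map it onto network-level mechanisms such as multicast groups, which is what makes it cheaper than per-sample content filtering.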
>>Erik wrote: "Since a subscriber is not aware of the information available at the publisher side ..."
If all data of a certain Topic is sent everywhere and is filtered at the destination, then what benefit does the ability on the publisher side to specify that I am a producer of a specific instance of keyed data provide? It doesn't buy you anything if the data is filtered on the subscriber side regardless.
It was my understanding that a data pathway is only established if the QoS parameters of producer and consumer match. Subscribers do not need to be aware of the publishers' QoS, only that the specific QoS requirements in which they are interested are met. How is this different from keys?
Thanks for the response.
It's just that most of the articles I have read conclude that the keyed data feature of DDS is the major benefit/distinction over "pure" messaging frameworks such as JMS. In all reality, it seems like keys are a way to implement application-independent data filtering, and not a mechanism to reduce inter-component network connectivity (and therefore network bandwidth) as was implied.
Hi Bryan,

One should consider the fact that the Data-Distribution Service is fundamentally based on the Publish/Subscribe paradigm and is not a messaging framework. You refer to Producers and Consumers, which is justifiable; however, Producers and Consumers are in many ways dissimilar from Publishers and Subscribers. Publishers and Subscribers, in the Publish/Subscribe paradigm, are considered anonymous participants, which are unaware of each other's presence and only coupled by the topic. Data is therefore not owned by the interacting parties (neither publishers nor subscribers), as data is considered to belong to the global data space, where publishers and subscribers merely use datawriters and datareaders to modify and/or retrieve topic-related data (instances of the topic) from the global data space. Topics and instances of topics therefore exist in the global data space and not at the publishing or subscribing ends. Data should therefore not be considered as sent from point A to point B, and should instead be considered as updated in the global data space. This is what makes the Data-Distribution Service so unique compared to similar middleware. In some aspects we can relate the Data-Distribution Service to the shared-spaces paradigm and middleware such as LINDA and JavaSpaces, rather than to messaging-based middleware. However, DDS is considered real-time, and the quality of service supported by the middleware justifies this fact.

Concerning the key aspect of topics: it is in fact distinctive from "pure" messaging frameworks. Consider the global data space as a decentralized relational database, where topics are similar to tables, keys are similar to primary keys, and instances of a topic are similar to rows of a table. Strongly typed datawriters and datareaders are therefore used to access strongly typed topics (tables) in the global data space (database), similar to the way data access layers are used to access relational databases.
Given this perspective of the Data-Distribution Service, it's clear to see that data is visible to and managed by the middleware, whereas messaging-based middleware, such as the Java Message Service, treats data as completely opaque payloads merely used to transport data from point A to point B. As Erik mentions, quality of service policies can be used to impose non-functional requirements on topics in the global data space. The Partition policy would therefore contribute to reducing network bandwidth usage, as updates committed by datawriters will only be detected by datareaders within the same partition. The TimeBasedFilter policy can additionally be used to contribute to efficient bandwidth usage. This is, however, my brief perspective on publishers, subscribers and topics in the Data-Distribution Service. Feel free to correct me if I am wrong :-).
> It was my understanding that a data pathway is only established if the Qos parameters of producer and consumer are matched.
Yes, that is correct. There is only communication between two endpoints when their QoS settings are 'compatible'.
> How is this different than keys?
Keys are not used to determine whether or not communication takes place between two endpoints. Keys are used to identify instances; different data values with the same key value represent successive values for the same instance, while different data values with different key values represent different instances.
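To illustrate that rule with a self-contained sketch (the SensorSample type and InstanceStore class are invented for illustration, independent of any DDS API): the middleware can be thought of as keeping one slot of latest state per key value, so writes with an existing key update one instance, while writes with a new key create a new one.

```cpp
#include <cassert>
#include <map>

// Invented sample type: 'sensor_id' is the key field.
struct SensorSample {
    int sensor_id;
    double value;
};

// Samples with the same key value are successive updates of one instance;
// samples with different key values are different instances. The store keeps
// the latest state per instance, the way the middleware manages instances.
class InstanceStore {
    std::map<int, double> latest_;  // key value -> latest state
public:
    void write(const SensorSample& s) { latest_[s.sensor_id] = s.value; }
    int instanceCount() const { return static_cast<int>(latest_.size()); }
    double latest(int key) const { return latest_.at(key); }
};
```

Nothing in this bookkeeping says anything about which readers and writers are connected: that is decided by topic and QoS matching, exactly as described above.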
It looks like your vision of the purpose of key fields is purely based on determining where information needs to go (and especially where it does NOT need to go). However, in the context of the DCPS specification, key fields have no particular relationship with the QoS settings of Readers and Writers, and therefore do not take part in determining compatibility, and therefore connectivity, between Readers and Writers.
I get the feeling that you want to use keys as an alternative way to determine connectivity between Readers and Writers, hence your assumed one-key-per-writer limitation. However, the DDS specification provides other mechanisms for that purpose.
As Reinier already explained, the Partition QoS is a mechanism that is specifically designed for determining connectivity between Publishers and Subscribers, and middleware implementations may highly optimize their network traffic based on these settings.
Another approach for your problem is to create separate topics for the separate pathways you have in mind. So instead of using a unique key value to connect Readers to Writers, you could use unique topics for that purpose. In DCPS it is allowed to create more than one topic for the same datatype. Each topic may then even have its own QoS settings, regardless of the other topics for the same datatype. By subscribing to a specific Topic you connect to a specific Writer.
However, in each of these approaches you reduce DCPS to some sort of message queue, since you will not be using keys as a way to distinguish between different instances of the same datatype. The big benefit of DCPS over message queues is that it already manages the instances for you, so that you can look at the available information as some sort of relational database where the latest state of each instance is made available to you, without the risk of recent instances pushing out other instances, which may happen when all instances are collected in a one-dimensional FIFO-based queue, as traditional message queues do.
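That difference can be sketched in a few lines of self-contained C++ (both classes are invented for illustration): in a bounded FIFO, a burst of updates to one instance can evict the only sample of another, while a keyed store holds the latest state per instance no matter how uneven the update rates are.

```cpp
#include <cassert>
#include <deque>
#include <map>

struct Sample {
    int key;
    double value;
};

// A traditional bounded FIFO queue: once full, the oldest sample is pushed
// out, regardless of which instance it belongs to.
class BoundedQueue {
    std::deque<Sample> q_;
    std::size_t cap_;
public:
    explicit BoundedQueue(std::size_t cap) : cap_(cap) {}
    void push(const Sample& s) {
        if (q_.size() == cap_) q_.pop_front();  // oldest sample is lost
        q_.push_back(s);
    }
    bool hasKey(int key) const {
        for (const auto& s : q_)
            if (s.key == key) return true;
        return false;
    }
};

// The DCPS-style alternative: one slot per instance, so a burst of updates to
// one instance can never evict the latest state of another.
class KeyedStore {
    std::map<int, double> latest_;
public:
    void push(const Sample& s) { latest_[s.key] = s.value; }
    bool hasKey(int key) const { return latest_.count(key) > 0; }
};
```

Pushing one sample for instance 1 followed by three rapid updates for instance 2 into a capacity-3 FIFO loses instance 1 entirely, while the keyed store still holds it.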
Can you shed some light on the type of application you are trying to build? Does your information model have a need for distinguishing between different instances of the same type? (For example different 'tracks' on a Radar plot, different StockQuotes in a stock trading system, etc.)
You wrote:
>> It looks like your vision on the purpose of key-fields is purely based on determining where information needs to go to (and especially where it does NOT need to go to). However, in the context of the DCPS specification, keyfields have no particular relationship with QoS settings of Readers and Writers, and therefore do not take part in determining compatibility and therefore connectivity between Readers and Writers.
I am sorry for the confusion. This was in direct response to your quote "Since a subscriber is not aware of the information available at the publisher side ...". I did not mean to imply that keys are associated with QoS. I was drawing an analogy between the subscriber not knowing about the instances being published and the subscriber not knowing the QoS under which the instance is published.
Here is my reasoning. Topic (and QoS) matching governs connectivity between DDS entities. Topics are defined by a name and a type. Within the type definition is the specification of one or more keys. Therefore, in my opinion, keys are part of the definition of a topic. If topics govern connectivity and keys are part of the definition of a topic, one would think keys would play a role in entity connectivity.
I really needed to nail down the specifics as my group was considering a topic/data design based on the above assumption which would allow us to cleanly overlay our interface onto a DDS implementation. I will investigate Partitioning as an alternative. Thanks to everybody for their time.
posted on Thursday, September 10, 2009 - 05:14 pm
I understand that it's an implementation detail whether the filtering is on the sending or receiving side but does this not have implications for DDSI?
posted on Monday, May 31, 2010 - 08:57 am
I'm developing a simulator based on the publish/subscribe paradigm. I am still confused about the relationship between publisher and subscriber: how can they be connected over the same topic? What I have understood is that the publisher writes data into a global shared space (periodically), and the subscriber also periodically accesses the same global shared space and retrieves data without knowing the publisher which has written it. Is that right?

Another question about QoS: is it possible for a subscriber to specify deadline, lifespan and transport priority all at once? In this case I suppose that publisher and subscriber are executing periodically. I consider the same QoS for the publisher and then try to check compatibility between them. Thanks for your help.
posted on Thursday, October 21, 2010 - 10:28 pm
Like Bryan, I have been trying to use keyed topics and instances to organize one generator of samples and N processors of those samples. Thanks to finding this discussion I can now stop wasting my time and get on with some other approach.
However, I would like to make a claim about the DDS model from a naive point of view to get some feedback.
The ability to subscribe to individual instances of a topic (if it were provided) would not, by itself, change the model from publish/subscribe to a messaging model. If a topic is (in the sense mentioned above) a relational table in the global data space then an instance is just a selection (projection?) of a table. There is no requirement that only one publisher and only one subscriber use an instance any more than would be so for a full topic. Instances (if they existed in the global data space) would function as "subtopics".