SAGA Message API Extension

Hi Folx, here is the updated draft of the SAGA Message API, in preparation for OGF-19. It is also available in CVS, as usual. We would of course be happy to get feedback on the list, so don't feel obliged to hold back your comments until OGF-19 :-) Cheers, Andre. -- "So much time, so little to do..." -- Garfield

Hi Andre,

In Chapter 2, the paragraphs explaining the message transfer requirements are a bit confusing.

1) It says it supports multicast (which is inherently unreliable). I'm sure you mean to say a "message bus" (sort of like the Linux DBus concept), which would not specifically call out a particular network standard (multicast is more specific than a "message bus").

2) It says the message must be received completely and correctly, or not at all. This leaves some aspects of the reliability uncertain. For instance:
a) Does this mean it must guarantee delivery of a message? Or only that a delivered message does not contain errors?
b) What about if the message arrives more than once? (e.g. redundant copies of messages can occur in practice for a number of unreliable or semi-reliable messaging protocols.)
c) It says this document will not address things at a protocol level, but I think these issues are semantic and therefore must be addressed by the API.
d) I think there are a number of attributes that users should be able to supply or query when opening a message service connection. Like the JMS (Java Message Service), we should be able to specify whether this is a point-to-point (message queue) or a publish-subscribe (message bus) like interface. The API should not require any work to support both, since point-to-point is a sub-category of the message bus, but it should be an attribute of the message interface that a user can force/specify in the opening of the connection. Another thing to support is specification or query of the message service reliability (something that deserves at least a subsection to define). That is, the document should define classes of reliability (just as done with other XML-based messaging APIs, which even define intermediate cases for unreliable messaging that guarantee message arrival, but do not ensure that duplicate messages will not arrive). The API should allow a user to specify these semantic attributes, and should not allow a connection to be built if the underlying protocol for the message connection cannot meet those attributes.

-john

Hi John! Quoting [John Shalf] (Jan 16 2007):
Hi Andre, In Chapter 2, the paragraphs explaining the message transfer requirements are a bit confusing. 1) it says it supports multicast (which is inherently unreliable). I'm sure you mean to say a "message bus" (sort of like the Linux DBus concept), which would not specifically call out a particular network standard (multicast is more specific than a "message bus").
You are right: using 'multicast' here is probably misleading. Thilo made a similar comment, so I'll change it to "message bus" or so.
2) It says the message must be received completely and correctly, or not at all. This leaves some aspects of the reliability uncertain. For instance a) does this mean it must guarantee delivery of a message? Or only that a delivered message does not contain errors?
It probably needs a more verbose explanation. The current idea is:
- the implementation MUST ensure that the messages are complete and error free;
- whether message delivery is guaranteed or not is up to the implementation, and up to the used protocol. The implementation MUST document that aspect, and the URL (scheme...) allows one to choose between different reliability modes.
I wanted to avoid a 'reliability' attribute on the endpoint, as I don't see a reasonably easy way to switch reliability in mid stream. It is most likely that changes in reliability policy would require different protocols (only few protocols will be able to serve both ends), and would hence need a new connection setup anyway. Not sure if that is reasonable though. Anyway, that would leave us with specification of reliability on connection setup (i.e. endpoint construction) - right now that is done via the scheme part of the URL. Would a flag be more appropriate/explicit? Probably...
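For illustration, the two alternatives mentioned here (reliability selected via the URL scheme vs. an explicit flag at endpoint construction) might look roughly like this in application code. This is only a sketch in the style of the draft's examples; the scheme name, the constructor signature and the flag are hypothetical and not fixed by the draft:

  // (a) reliability implied by the URL scheme (hypothetical scheme name):
  saga::endpoint ep_a ("reliable-msg://remote.host.net:1234/");

  // (b) reliability requested via an explicit flag (hypothetical signature):
  saga::endpoint ep_b ("msg://remote.host.net:1234/", saga::message::Reliable);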
b) what about if the message arrives more than once? (eg. redundant copies of messages can occur in practice for a number of unreliable or semi-reliable messaging protocols).
Good point. IMHO we should specify that messages arrive 'at most once' (unreliable) or 'exactly once' (reliable). I don't think that we need to support additional modes, nor the complete spectrum of (un)reliable transmission modes. Or?
c) It says this document will not address things at a protocol level, but I think these issues are semantic and therefore must be addressed by the API.
IC, good point. Well, flags on the endpoint construction would solve that.
d) I think there are a number of attributes that users should be able to supply or query when opening a message service connection. Like the JMS (Java Message Service), we should be able to specify whether this is a point-to-point (message queue) or a publish-subscribe (message bus) like interface. The API should not require any work to support both since point-to-point is a sub-category of the message bus, but it should be an attribute of the message interface that a user can force/specify in the opening of the connection.
Hmm, point to point could be simply established on application level, by just setting up a single connection. Uhm, problem: serve() can only be started, not stopped. An int argument to serve(), giving the number of clients to wait for, would solve that.

  // point to point:
  saga::message  msg;
  saga::endpoint ep;

  ep.serve (1);      // allow one client
  ep.recv  (&msg);   // expect one message
  // done

  // publish/subscribe:
  saga::message  msg;
  saga::endpoint ep;

  ep.serve ();       // allow n clients

  while ( 1 )
  {
    ep.recv (&msg);  // expect m messages
  }

Does that make sense?
Another thing to support is specification or query of the message service reliability (something that deserves at least a subsection to define). That is, the document should define classes of reliability (just as done with other XML-based messaging APIs that even define intermediate cases for unreliable messaging that guarantee message arrival, but do not ensure that duplicate messages will not arrive). The API should allow a user to specify these semantic attributes and should not allow a connection to be built if the underlying protocol for the message connection cannot meet those attributes.
Again, that could be done by using flags on the ep creation. I guess that a 'more reliable' connection than requested would be possible? E.g.,

  saga::endpoint ep (saga::message::Unreliable);

could actually set up an unreliable or a reliable connection, as both would fulfill the user requirement? Thanks, Andre.

Hi John, attached is an updated version with some changes addressing your comments. The respective parts in the intro are marked as ***NEW***. The changes in the IDL and 'Details' section affect the endpoint constructor and the serve() method only. Would these changes address your points? Thanks, Andre.

On Jan 17, 2007, at 4:23 AM, Andre Merzky wrote:
Hi John!
Quoting [John Shalf] (Jan 16 2007):
Hi Andre, In Chapter 2, the paragraphs explaining the message transfer requirements are a bit confusing. 1) it says it supports multicast (which is inherently unreliable). I'm sure you mean to say a "message bus" (sort of like the Linux DBus concept), which would not specifically call out a particular network standard (multicast is more specific than a "message bus").
You are right: using 'multicast' here is probably misleading. Thilo made a similar comment, so I'll change it to "message bus" or so.
2) It says the message must be received completely and correctly, or not at all. This leaves some aspects of the reliability uncertain. For instance a) does this mean it must guarantee delivery of a message? Or only that a delivered message does not contain errors?
It probably needs a more verbose explanation. The current idea is:
- the implementation MUST ensure that the messages are complete and error free
- if message delivery is guaranteed or not is up to the implementation, and up to the used protocol. The implementation MUST document that aspect, and the URL (scheme...) allows to choose between different reliability modes.
I agree. I think the reliability semantics should be a property of the channel (and hence the underlying protocol) and not something to switch at runtime. However, I think the clients that use the connection should be able to query what the channel properties are (or define the minimum requirements and throw an exception if they cannot be met). This is just a little introspection (not really deep or fancy coding underneath).
I wanted to avoid a 'reliability' attribute on the endpoint, as I don't see a reasonable easy way to switch reliability in mid stream. It is most likely that changes in reliability policy would require different protocols (only few protocols will be able to serve both ends), and would hence need a new connection setup anyway. Not sure if that is reasonable though.
Anyway, that would leave us with specification of reliability on connection setup (i.e. endpoint construction) - right now that is done via the scheme part of the URL. Would a flag be more appropriate/explicit? Probably...
b) what about if the message arrives more than once? (eg. redundant copies of messages can occur in practice for a number of unreliable or semi-reliable messaging protocols).
Good point. IMHO we should specify that messages arrive 'AT MOST ONCE' (unreliable) or EXACTLY ONCE (reliable). I don't think that we need to support additional modes, nor the complete spectrum of (un/)reliable transmission modes. Or?
I cannot remember the document, but there was a spec for defining reliability of another kind of messaging bus. It had:

unreliable: the message may or may not arrive (typical UDP).

semi-reliable: the message must arrive (it will be retransmitted if not acked). The source keeps resending until it sees the ack. It is possible that the ack is lost, in which case the message may arrive at the client twice (so this is: the message will arrive at least once, and possibly more than once). This protocol is useful since the message destination does not need to maintain complex message state... it just acks messages that arrive (very simple).

reliable: the message must arrive, and it will only arrive once.
c) It says this document will not address things at a protocol level, but I think these issues are semantic and therefore must be addressed by the API.
IC, good point. Well, flags on the endpoint construction would solve that.
d) I think there are a number of attributes that users should be able to supply or query when opening a message service connection. Like the JMS (Java Message Service), we should be able to specify whether this is a point-to-point (message queue) or a publish-subscribe (message bus) like interface. The API should not require any work to support both since point-to-point is a sub-category of the message bus, but it should be an attribute of the message interface that a user can force/specify in the opening of the connection.
Hmm, point to point could be simply established on application level, by just setting up a single connection.
Uhm, problem: serve() can only be started, not stopped. An int argument to serve(), giving the number of clients to wait for, would solve that.
  // point to point:
  saga::message  msg;
  saga::endpoint ep;

  ep.serve (1);      // allow one client
  ep.recv  (&msg);   // expect one message
  // done
  // publish/subscribe:
  saga::message  msg;
  saga::endpoint ep;

  ep.serve ();       // allow n clients

  while ( 1 )
  {
    ep.recv (&msg);  // expect m messages
  }
Does that make sense?
That does address one of the cases (not exactly the one I was thinking of). Also it should allow -1 to specify *any-number-of-clients* at the underlying protocol's discretion. But now for a semantic thicket....

Case 1: I connect to a named destination and I am joined together with everyone else who has connected to that port. In this case, any message I send will be broadcast to everyone else and vice versa (message bus). This is the equivalent of a publish-subscribe service.

Case 2: I connect to a port and I own the service. This appears to be the case you are setting up above, as it only allows one client to join the service.

Case 3: I connect to the port and get a sub-process that does not share the messages with the other clients that have connected to that port (like an HTTP server). I don't quite see how that kind of point-to-point message service is supported.

I think we need an attribute for the message port that says whether it is a bus or a point-to-point. In addition, I like the idea of setting the message queue length (as you have above). I was not thinking of that, but I can see the value of creating a first-come-first-served message port as well (if that is not too complicated).
Another thing to support is specification or query of the message service reliability (something that deserves at least a subsection to define. That is, the document should define classes of reliability (just as done with other XML-based messaging APIs that even define intermediate cases for unreliable messaging that guarantee message arrival, but do not ensure that duplicate messages will not arrive). The API should allow a user to specify these semantic attributes and will not allow a connection to be built if the underlying protocol for the message connection cannot meet those attributes.
Again, that could be done by using flags on the ep creation. I guess that a 'more reliable' connection than requested would be possible? E.g.,
saga::endpoint ep (saga::message::Unreliable);
could actually set up an unreliable or a reliable connection, as both would fulfill the user requirement?
Yes, I think it is sufficient to define these things at the setup of the endpoint (not something you would change at runtime). -john

Quoting [John Shalf] (Jan 17 2007):
I agree. I think the reliability semantics should be a property of the channel (and hence underlying protocol) and not something to switch at runtime. However, I think the clients that use the connection should be able to query what the channel properties are (or define the minimum requirements and throw an exception if they cannot be met). This is just a little introspection (not really a deep or fancy coding underneath).
I agree - that can safely be done with a read-only attribute.
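The kind of introspection John asks for could look roughly like the following, assuming the endpoint exposes the usual SAGA attribute interface. The attribute name and its values are purely illustrative (the draft defines the normative spelling), and the fragment omits the surrounding program:

  // query the granted channel property and refuse to run with weaker
  // semantics than the application needs (hypothetical attribute name/values)
  saga::endpoint ep ("msg://remote.host.net:1234/");

  std::string reliability = ep.get_attribute ("Reliability");

  if ( reliability != "Reliable" )
  {
    // minimum requirement not met - bail out instead of running silently
    throw std::runtime_error ("endpoint does not provide reliable delivery");
  }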
I wanted to avoid a 'reliability' attribute on the endpoint, as I don't see a reasonable easy way to switch reliability in mid stream. It is most likely that changes in reliability policy would require different protocols (only few protocols will be able to serve both ends), and would hence need a new connection setup anyway. Not sure if that is reasonable though.
Anyway, that would leave us with specification of reliability on connection setup (i.e. endpoint construction) - right now that is done via the scheme part of the URL. Would a flag be more appropriate/explicit? Probably...
b) what about if the message arrives more than once? (eg. redundant copies of messages can occur in practice for a number of unreliable or semi-reliable messaging protocols).
Good point. IMHO we should specify that messages arrive 'AT MOST ONCE' (unreliable) or EXACTLY ONCE (reliable). I don't think that we need to support additional modes, nor the complete spectrum of (un/)reliable transmission modes. Or?
I cannot remember the document, but there was a spec for defining reliability of another kind of messaging bus. It had unreliable: message may or may not arrive (typical UDP). semi-reliable: message must arrive (will be retransmitted if not acked). Source keeps resending until it sees the ack. It is possible that the ack is lost, in which case, the message may arrive at the client twice (so this is a Message will arrive at least once, and possibly more than once.) This protocol is useful since the message destination does not need to maintain complex message state... it just acks messages that arrive (very simple). reliable: the message must arrive and it will only arrive once.
I added to the spec:

  The available reliability levels are:

  \up
  \begin{tabbing}
   XXXXXXXXXX \= \kill
   |Unreliable|:   \> messages MAY (or may not) reach the remote clients.\\[0.3em]
   |Atomic|:       \> |Unreliable|, but a message received by one client is\\
                   \> guaranteed to (MUST) arrive at all clients.\\[0.3em]
   |SemiReliable|: \> messages are guaranteed to (MUST) arrive at all\\
                   \> clients, but may arrive more than once.\\[0.3em]
   |Reliable|:     \> all messages are guaranteed to (MUST) arrive at\\
                   \> all clients.\\[0.3em]
  \end{tabbing}

That is similar to your list, but adds the 'Atomic' one. Not sure if that is useful (sounds like it) or easily implementable (??), but I found it in a list of reliability modes for message transmissions which looked sensible, and it seems to be close to what you propose.

[ about point-to-point ]
That does address one of the cases (not exactly the one I was thinking of). Also should allow -1 to specify *any-number-of-clients* at the underlying protocol's discretion.
Yes, definitely, that's in the spec already.
But now for a semantic thicket....
Case 1: I connect to a named destination and I am joined together with everyone else who has connected to that port. In this case, any message I send will be broadcast to everyone else and vice versa (message bus). This is the equivalent of a publish-subscribe service. Case 2: I connect to a port and I own the service. This appears to be the case you are setting up above as it only allows one client to join the service. Case 3: I connect to the port and get a sub-process that does not share the messages with the other clients that have connected to that port (like an HTTP server). I don't quite see how that kind of point-to-point message service is supported.
I think we need an attribute for the message port that says whether it is a bus or a point-to-point. In addition, I like the idea of setting the message queue length (as you have above). I was not thinking of that, but I can see the value of creating a first-come-first-served message port as well (if that is not too complicated).
Got you (I think) :-) Yes, that makes sense. I added a 'topology' enum and attribute, which is handled similarly to the 'reliability' property (static over connection/endpoint lifetime, inspection via ReadOnly attribute). New draft is attached, with new sections marked. It would be great if you could review the paragraphs about connection topology. Thanks! Andre.
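As a rough illustration of how the combination of topology and reliability might surface for the application, here is a sketch in the style of the draft's examples. The enum values, the constructor signature and the attribute names are hypothetical; the draft's IDL and attribute tables are authoritative:

  // request topology and reliability at construction time, then inspect
  // the values actually granted via the ReadOnly attributes
  saga::endpoint ep ("msg://remote.host.net:1234/",
                     saga::message::Bus,            // or e.g. PointToPoint
                     saga::message::SemiReliable);

  std::cout << "topology   : " << ep.get_attribute ("Topology")    << std::endl;
  std::cout << "reliability: " << ep.get_attribute ("Reliability") << std::endl;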

On Jan 18, 2007, at 1:23 AM, Andre Merzky wrote:
|Unreliable|:   \> messages MAY (or may not) reach the remote clients.\\[0.3em]
|Atomic|:       \> |Unreliable|, but a message received by one client is\\
                \> guaranteed to (MUST) arrive at all clients.\\[0.3em]
|SemiReliable|: \> messages are guaranteed to (MUST) arrive at all\\
                \> clients, but may arrive more than once.\\[0.3em]
|Reliable|:     \> all messages are guaranteed to (MUST) arrive at\\
                \> all clients.\\[0.3em]
\end{tabbing}
That is similar to your list, but adds the 'Atomic' one. Not sure if that is useful (sounds like it) or easily implementable (??), but I found it in a list of reliability modes for message transmissions which looked sensible, and it seems to be close to what you propose.
That sounds good to me.
Got you (I think) :-) Yes, that makes sense. I added a 'topology' enum and attribute, which is handled similarly to the 'reliability' property (static over connection/endpoint lifetime, inspection via ReadOnly attribute).
New draft is attached, with new sections marked. It would be great if you could review the paragraphs about connection topology.
Looks good to me! I think you have addressed all of my concerns. -john

Hi Andre,

Thanks for writing this up. I think it looks very good, but I still have a few quick comments on the latest version.

1) Motivation - this could perhaps be made a bit stronger: a) to break out of the stream semantics (byte-oriented, strict ordering and reliability), and b) to remove from the application programmer the burden of dealing with arbitrarily sized entities such as the Ethernet packet size.

2) I see ordering is enforced - could that be an option?

3) Community: it would be nice if some of the active transport protocol developers would give their input. I would think the UDT team (Robert Grossman and Yunhong Gu) and the EVL team (Venkatram Vishwanath) should take a look at this. Perhaps the XIO team as well. They may all be already involved, but just in case.

Andrei

On Jan 18, 2007, at 10:36 AM, Andrei Hutanu wrote:
Hi Andre,
Thanks for writing this up. I think it looks very good but I still have just a few quick comments to the latest version..
1) Motivation - can make a bit stronger perhaps : a)to break out of the stream semantics (byte-oriented, strict ordering and reliability) and b)to remove the burden from the application programmer of dealing with arbitrary-sized entities such as Ethernet packet size.
2) I see ordering is enforced, could that be an option?
I think ordering is *not* enforced, but I do wonder if it should be an option or a channel property (certainly semi-reliable will likely result in some reordering, whereas a TCP channel would enforce ordering of the messages, for instance). This is a controversial topic in the HPC message passing community (whether message ordering is a good or bad thing to enforce at the hardware level).
3) community : it would be nice if some of the active transport protocol developers would give their input. I would think the UDT team : Robert Grossman and Yunhong Gu and the EVL team : Venkatram Vishwanath should take a look at this. Perhaps the XIO team as well. They may all be already involved but just in case.
That's actually a pretty good suggestion. Perhaps it's not worth holding up the current draft submission, but getting the lower level transport people on board is pretty powerful.

Hi,
2) I see ordering is enforced, could that be an option?
I think ordering is *not* enforced, but I do wonder if it should be an option or a channel property (certainly semi-reliable will likely result in some reordering, whereas a TCP channel would enforce ordering of the messages, for instance).
This is a controversial topic in the HPC message passing community (whether message ordering is a good or bad thing to enforce at the hardware level).
I was thinking the same (no strong feelings for either an option or a property), but the text tells otherwise. In 2.1, Introduction:

  "In contrast, this message API extension guarantees that message blocks of arbitrary size are delivered in order, and intact, without the need for additional application level coordination or synchronization."

and then in 2.1.7, Reliability, Correctness and Ordering:

  "The order of sent messages MUST be preserved by the implementation. Global ordering is, however, not guaranteed to be preserved: Assume three endpoints A, B and C, all connected to each other. If A sends two messages [a1, a2], in this order, it is guaranteed that both B and C receive the messages in this order [a1, a2]. If, however, A sends a message [a1] and then B sends a message [b1], C may receive the messages in either order, [a1, b1] or [b1, a1]."

Andrei

Hi John, Andrei,

you are right: getting some feedback from the transport level folx is certainly a good idea. The API draft won't go into public comment for another month or so (at least), and then it will stay in public comment for another 2 months or longer - that should give us enough time to contact them.

About ordering: the text Andrei cited is in the spec because ordering is, as of now, not an attribute of the connection or endpoint - so the spec tries to nail it down. It says "MUST be ordered, but no global ordering is required" because I thought that this covers the majority of use cases. I don't think there are use cases which require global ordering - or at least not enough to justify a requirement for global ordering. What is your opinion? Also, that's really difficult to implement in Grids IMHO.

Use cases which do not require ordering should be happy with order preserving connections, too. The question now is: does the benefit of un-ordered implementations (simpler, smaller footprint) justify an attribute on API level? Or are there use cases which require non-ordered delivery for other reasons?

Cheers, Andre.

Use cases which do not require ordering should be happy with order preserving connections, too. The question now is: does the benefit of un-ordered implementations (simpler, smaller footprint) justify an attribute on API level?
In my opinion yes.
Or are there use cases which require non-ordered delivery for other reasons?
Andrei

On Jan 18, 2007, at 12:18 PM, Andre Merzky wrote:
Hi John, Andrei,
you are right: getting some feedback from the transport level folx is certainly a good idea. The API draft won't go into public comment for another month or so (at least), and then it will stay in public comment for another 2 months or longer - that should give us enough time to contact them.
About ordering: the text Andrei cited is in the spec because ordering is, as of now, not an attribute of the connection or endpoint - so the spec tries to nail it down. It says "MUST be ordered, but no global ordering is required" because I thought that this covers the majority of use cases.
I don't think there are use cases which require global ordering - or at least not enough to justify a requirement for global ordering. What is your opinion? Also, that's really difficult to implement in Grids IMHO.
Well, as I mentioned before, global ordering is actually a hot topic for debate among folks who are doing the low-level one-sided messaging interfaces (GA/ARMCI vs. UPC/GASNet). The issue with enforcing global ordering is that it limits opportunities for performance optimization and requires a lot more complexity (SW and HW) and software overhead at the endpoints to ensure the ordering is enforced. However, global ordering makes it much easier to send messages that express fences or barriers. As you can imagine, not enforcing ordering (particularly for the message bus case) is a *lot* easier to implement, but makes the concepts of fences and simultaneity of events more complicated (starts to look like General Relativity brain teasers). If we want to steer clear of this nasty debate, it seems we should be able to query the ordering enforcement (or request it if available) offered by the underlying protocol.

Hello all,

I jump into the conversation a bit later than expected. It seems that Andre always chooses a time in space-time that is inconvenient for me :p

Here is some food:

A. Typos

I cannot access the CVS, so I post some of my stuff here.
1. Copyright at 2006? Should it not be 2007? See page 1 and 23.
2. Most notes of the detailed spec should be "Notes: - see notes *on* memory management."
3. Section 2.1, "[ibidem]"???
4. Section 2.1.3, paragraph 2: "A message sent by *an* endpoint".
5. Page 17: "Format: serve" instead of "Format: connect".
There might be more, but this is just what caught my eye.

B. API

1. I would call the method "receive()" and not "recv()"; in section 2.1.3 you have it right ;)
2. The "test()" method should be called "available()", because "test" sounds more like test code than API code. The method should be non-blocking and return -1 if no message is available. The timeout should be ignored IMO.
3. Abusive usage of the NotImplemented exception: most methods are mandatory anyway, so cleaning out the spec would be nice.
4. The metric RemoteDisConnect should be RemoteDisconnect.
5. The "msg" class should be called "message". Andre, you are too C (hacker) centric :p Make it simple for us folx, with full names.

C. Others

Here are several more complex things.

1. State machine
Why do you not include the failed states? Depending on how the message API is handled, the failure might be permanent and the object remains in the given state. We can imagine that the dropped state can be recovered from for some time: you have two attributes, attempts and delay; the object stays in the dropped state until it can reconnect. If the number of attempts is consumed, the state is permanent; otherwise it will sleep (delay) for some time and try a new connection. I have this problem with the NAREGI project; for the moment I only allow one more attempt when the connection is dropped or fails (time out, others). After that a fatal exception is thrown. This might be an optional implementation, but it will ensure a resilient implementation. If the failure is permanent, it must be handled at application level. Of course, the two attributes can be modified anytime during the lifetime of the object.

2. Silently Discarded & Reliability
In section 2.1.2, second paragraph! Gasp, no, in an (un)reliable system this cannot be! You should at least return a flag with the "send()" in a synchronous connection, telling whether the message was sent, possibly acked. In the asynchronous case it would be a monitorable field, like a ticket for the "send()" call or the task that does the job. Whatever Reliability, Correctness and Order is applied, there should be a way to tell the application (possibly as an optional implementation) that the message was sent. If the underlying API does not support reliability, the application can implement one. In a reliable system this should not happen anyway.

3. Correctness
I would rather see this as an attribute you can set. If the message is received, I am not always interested in knowing whether the content is correct; just getting it is enough. The overhead for correctness depends on the implementation, but in most cases this means extra CPU usage that I don't want. Think of MPEG streams: if we lose some part or some is corrupted, I don't care; we go to the next frame. It would be even more powerful if you could set correctness for a given direction specifically.

4. Order
Same as correctness. In most cases I don't care about order; I sometimes prefer to have control over it at the application level. This dramatically simplifies the underlying implementation, and handling it at application level can be rather easy. In the NAREGI project I have such a mechanism: I don't care about the order for 95% of the messages, just some must be ordered. It would be even more powerful if you could set ordering for a given direction specifically.

5. Opaque messages
In the NAREGI project we have already implemented a messaging API. Actually we have several (Socket, File, SOAP, and HTTP), and all use the same message container; the API for each implementation isn't unified yet. We use opaque message content: when an application component wants to send a piece of something, it just drops in its content. That can be any object. When the message moves down to the messaging API, it passes through filters that transform it into byte arrays. The same is true for incoming messages: filters transform the content when it moves up in the application. The message class has an extra method called "getBytes()" which ensures that a byte array is returned. If the content cannot be serialized to a byte array, then nothing happens and null is returned. "getSize()" returns the size of the usable block of the array. "getData()" might return an array, but that is not guaranteed. In addition, the message container has an ID flag to hint to the receiving application how the filters must be applied. It has other attributes as well, but of no interest so far for SAGA.

6. Memory Management
In general, for the sending part I totally agree: the management is up to the application. For the receiving part I am not entirely satisfied. When the management is done by the implementation I am fine, there is not much to say. But in the case of application management I have issues, mainly concerning efficient memory usage of the message buckets (see the sketch after this message). With message-intensive applications, the message container should be handled by the application to avoid memory burns/leaks. This is the case for the NAREGI project. For example, the application creates a pool of buckets (messages) of, let's say, 512 bytes each. More than 90% of the incoming messages are smaller than 512 bytes, so we have no trouble with the buckets, since the array capacity is always larger than the size of the message. Now, what happens when we have a larger message and we did not do the 'available' test, because we want to block the current thread in the receive? In the current proposal my message gets truncated. That is not good. In our current implementation, the array is changed by reallocation. The message container has two fields: the size of the actual message, and the capacity of the current array buffer. In the next version of the API for NAREGI I will add an offset, in case we want to use larger array blocks (typically from network buffers) and allow sub-blocks to be used. In a C implementation this is not necessary, because you can just shift the pointer; in Java you cannot.

7. Serve time out and shutdown
I think that we should allow, as an optional implementation or usage, the serve method to time out. This will either stop the internal thread or, for example with a TCP/IP messaging solution, set the SO_TIMEOUT flag. We need this in order to check what is happening with processes. In some cases the job submission tool reports the process as running, but no IO connection is active; a time out might help us release the hook and do something different before making another attempt. We should also be able to properly stop the service with a "disconnect" or "shutdown", in case the close doesn't do it. That means that the server must stop and all the connections handled by the class must be terminated. That is important for us to free the underlying memory and objects allocated. Typically, we should be able to close the server sockets in a proper way.

8. Endpoint reuse
The current semantics do not allow reusing the same object after a close. It might be handy to support a reconnect with an open if the implementation does not support reconnection. If the application must handle a stable connection, it must create a new object each time, which might be a memory burden. If the URL is not modified and the states are correct, I don't see why we cannot open a connection again after a close.

OK. That is all for now. I might have other things popping up.

-- Best regards, Pascal Kleijer
HPC Marketing Promotion Division, NEC Corporation
1-10, Nisshin-cho, Fuchu, Tokyo, 183-8501, Japan.
Tel: +81-(0)42/333.6389 Fax: +81-(0)42/333.6382
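To make the buffer handling in point C.6 above concrete, here is a minimal sketch of the application-managed bucket Pascal describes, rendered in C++ for consistency with the other examples in this thread. The field names and the reallocation policy are illustrative only and not part of the draft API (error handling omitted):

  #include <cstdlib>

  // an application-managed receive bucket: 'capacity' is what the
  // application allocated, 'size' is what the last message actually used
  struct bucket
  {
    char*       data;
    std::size_t capacity;   // e.g. 512 bytes, covers more than 90% of messages
    std::size_t size;       // set by the receive call
    std::size_t offset;     // optional: start of a sub-block in a larger buffer
  };

  // grow the bucket if an incoming message is larger than its capacity,
  // instead of silently truncating the message
  void ensure_capacity (bucket& b, std::size_t needed)
  {
    if ( needed > b.capacity )
    {
      b.data     = static_cast <char*> (std::realloc (b.data, needed));
      b.capacity = needed;
    }
  }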

Quoting [Pascal Kleijer] (Jan 18 2007):
Hello all,
I jump into the conversation a bit later then expected. It seems that Andre always choose a time in space-time that is inconvenient for me :p
Here is some food:
A. Typos
I cannot access the CVS so I post some of my stuff here.
1. Copyright at 2006? Should it not be 2007? See page 1 and 23.
2. Most notes of the detailed spec should be "Notes: - see notes *on* memory management."
3. Section 2.1, "[ibidem]"???
4. Section 2.1.3, paragraph 2: "A message sent by *an* endpoint".
5. Page 17: "Format: serve" instead of "Format: connect".
Perfect, thanks, will fix. I'll check the CVS access - I seem to remember that you should have a login... 'ibidem' basically means 'same as the last reference'. I wanted to avoid referencing the core spec again and again. Not sure if it is commonly used though...
There might be more, but this is just what caught my eye.
B. API
1. I would call the method "receive()" and not "recv()"; in section 2.1.3 you have it right ;)
Yes, I had 'receive' first, but then the examples don't line up so well:

  ep.send (msg);
  ep.recv (msg);

Isn't that nice? :-P Also, send/recv is the same as the POSIX send/recv. But you are probably right: verbosity may be better here. Let's flip a coin at the OGF session (i.e. let's have a 30 min discussion and a random vote about it)? :-) I'm fine either way, really...
2. The "test()" method should be called "available()" because it sounds more like test code then API code. The method should be non-blocking and return -1 if no message is available. The timeout should be ignored IMO.
Agree, 'test' is misleading. But possibly 'check' (shorter, but also explicit)? Reason for the timeout: see later. That allows the check to be blocking...
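For illustration, the method under discussion might be used roughly like this. The name 'check', the timeout semantics and the return convention (message size, or -1 if nothing is available, as Pascal suggested) are all still open; this is only a sketch:

  saga::message  msg;
  saga::endpoint ep ("msg://remote.host.net:1234/");

  if ( ep.check (0.0) >= 0 )   // non-blocking probe: is a message available?
  {
    ep.recv (&msg);
  }

  if ( ep.check (10.0) >= 0 )  // blocking probe: wait up to 10 seconds
  {
    ep.recv (&msg);
  }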
3. Abusive usage of the NotImplemented exception: most methods are mandatory anyway, so cleaning out the spec would be nice.
NotImplemented is on all calls, because the whole package might be NotImplemented. We changed that in the Core spec too, lately, and list exceptions like NotImplemented and NoSuccess basically everywhere... The statement from the Core spec intro still holds though: "The NotImplemented exception MUST, however, be used only in necessary cases, for example if an underlying Grid middleware does not provide some capability, and if this capability can also not be emulated. The implementation MUST carefully document and motivate the use of the NotImplemented exception."
4. Metric RemoteDisConnect should be RemoteDisconnect.
Agree.
5. The "msg" class should be called "message"
Kind of agree - I like the shorter name, but again you are probably right, and verbosity is better.
Andre, you are too C (hacker) centric :p Make it simple for us folx with full names.
I actually wonder that nobody complained about 'endpoint' - that is the one I have some issues with, because it's so generic - who knows what other communication schemes we are going to add which also have endpoints. 'message_endpoint' would be much better, but is really too long. Any suggestion? msg_ep? har har har :-P
C. Others
Here are several more complex things.
1. State machine
Why do you not include the failed states? Depending on how the message API is handled, the failure might be permanent and the object remains in the given state. We can imagine that a dropped connection can be recovered for some time: you have two attributes, attempts and delay; the object stays in the dropped state until it can reconnect. If the number of attempts is exhausted the state is permanent; otherwise it will sleep (delay) for some time and try a new connection.
Very good point! We had Failed (and Dropped) earlier. The problem is: when do you enter Failed? Assume you have an endpoint with three open connections, and a send() succeeds on two connections, but fails on the third. If the EP is 'Unreliable', that's not an error anyway. If communication is 'Reliable', it's certainly an error, and should get flagged - but should the two good sessions be dropped as well (which would happen when we go to Failed, I assume, otherwise the state is meaningless)? States could be assigned to the connections as well, but we don't have any handle on them as of now, and introducing those would REALLY complicate the API (I tried). Any idea?
I have this problem with the NAREGI project; for the moment I only allow one more attempt when the connection is dropped or fails (timeout, other errors). After that a fatal exception is thrown.
Ugh, that SHOULD be hidden in the implementation I think, no need to expose that on API level IMHO. Or?
This might be an optional implementation, but it will ensure a resilient implementation. If the failure is permanent it must be handled at the application level. Of course the two attributes can be modified at any time during the lifetime of the object.
Yes, I agree, it might help to stabilize the implementation on the protocol/stream level, but what can the application do with that information, really? Drop the connection and reconnect? That would mean closing the endpoint and setting up the whole thing again. Much better to handle reconnects on the implementation level, and to tell the application when that fails -- the app can then still do the same (close/restart).
2. Silently Discarded & Reliability In section 2.1.2 second paragraph! Gasp no, in a (un)reliable system this cannot be! You should at least return a flag with the "send()" in synchronous connection telling the message was send, ev. acked. In asynchronous it will be a monitorable field, like a ticked for the "send()" call or the task that does the job.
Hmm....
Whatever Reliability, Correctness and Order is applied, there should be a way to tell the application (possibly as an optional implementation) that the message was sent. If the underlying API does not support reliability, the application can implement its own. In a reliable system this should not happen anyway.
IMHO, if the application needs reliability, it should use a reliable implementation of the message API - the idea is to avoid application level management of the message transport. Anyway, I think you make a good point, an ACK might be useful for more than app level reliability. But what do you return? Again, assume an EP is connected to three clients, mode is unreliable, and two messages get through - what do you report? 0.6666? ;-) 2? That is of no use if the app has to check how many clients are connected - that cannot be done atomically, as it's a separate operation from the send (get_receivers()). So, return true if at least one message got through? What is the use of that? If it's not reliable, it does not matter anyway. If it's reliable, an error will be flagged anyway if even one client fails... So: you may be right, but I think the semantics are non-trivial.
3. Correctness
I would rather see this as an attribute you can set. If the message is received, I am not always interested in knowing whether the content is correct; just getting it is enough. The overhead for correctness depends on the implementation, but in most cases this means extra CPU usage that I don't want. Think of MPEG streams: if we lose some part or some of it is corrupted, I don't care; we go to the next frame. It would be even more powerful if you could set correctness for a given direction specifically.
Good point. We are getting more and more flags though... Anyway, I'll add it, but we should discuss if we can collapse the (now four) options to the EP constructor. :-(
4. Order
Same as correctness. In most cases I don't care about order; I sometimes prefer to have control over it at the application level. This dramatically simplifies the underlying implementation, and handling it at the application level can be rather easy. In the NAREGI project I have such a mechanism: I don't care about the order for 95% of the messages, just some must be ordered. It would be even more powerful if you could set ordering for a given direction specifically.
That is in the spec by now.
5. Opaque messages
In the NAREGI project we have already implemented a messaging API. Actually we have several (Socket, File, SOAP, and HTTP), and all use the same message container. The API for each implementation isn't unified yet. We do use opaque message content, because when an application component wants to send a piece of something it just drops in its content. That can be any object. When the message moves down to the messaging API it passes through filters that transform it into byte arrays. The same is true for incoming messages: filters transform the content as it moves up in the application.
The message class has an extra method called "getBytes()" which is guaranteed to return a byte array. If the content cannot be serialized to a byte array, nothing happens and null is returned. "getSize()" returns the size of the usable block of the array. "getData()" might return an array, but that is not guaranteed. In addition, the message container has an ID flag to hint the receiving application at how the filters must be applied. It also has other attributes, but those are of no interest for SAGA so far.
See my answer to Werner's comments: I certainly agree that support for data typing and conversion is handy - I only doubt that it's easy to agree on a model here. I am not sure if you share Werner's opinion that data type support should be independent of a data model - do you? My (personal) opinion still is that msg (or message ;-) should be a byte buffer only, moving all type conversion, packaging etc. to the application level, and later working on other message types with better data support, which can then easily be mapped onto the original (unstructured) message. That way we can go in more than one direction in the future, and have something to start with very quickly. Let's discuss this at OGF again - I am very open to other options. But I really would like to keep the message API simple, for now at least.
6. Memory Management
In general, for the sending part I totally agree: the management is up to the application. For the receiving part I am not entirely satisfied. When the management is done by the implementation I am fine, there is not much to say. But in the case of application-side management I have issues, mainly concerning efficient memory usage of the message buckets. With message-intensive applications the message container should be handled by the application to avoid memory burn/leaks. This is the case for the NAREGI project.
For example, the application creates a pool of buckets (messages) of, let's say, 512 bytes each. More than 90% of the incoming messages are smaller than 512 bytes, so we have no trouble with the buckets, since the array capacity is always larger than the size of the message. Now what happens when we get a larger message and we didn't do the availability test, because we want to block the current thread in the receive? In the current proposal my message gets truncated. That is not good. In our current implementation, the implementation changes the array by reallocating it.
The message container has two fields: the size of the actual message, and the capacity of the current array buffer. In the next version of the API for NAREGI I will add an offset, in case we want to use larger array blocks (typically from network buffers) and allow sub-blocks to be used. In a C implementation this is not necessary because you can just shift the pointer; Java cannot.
Do I understand that correctly?
- application allocates 512 bytes
- message arrives, and is larger than 512 bytes
- the implementation re-allocates the memory
- the application later frees the memory
I am sure that works, but I would be hesitant to introduce semantics which distribute memory management across different layers - I would think that this can give serious trouble for several languages. Also, I would expect you to do a blocking test() (check()) to wait for the message? (see earlier)
7. Serve timeout and shutdown
I think that we should allow, as an optional implementation or usage, the serve method to time out. This would either stop the internal thread or, for example with a TCP/IP messaging solution, set the SO_TIMEOUT flag. We need this in order to check what is happening with processes. In some cases the job submission tool reports the process as running but no I/O connection is active; a timeout might help us release the hook and do something different before making another attempt.
serve() by now allows an int parameter to specify how many clients it should serve (i.e. wait for). That is close to what you say, I think, though it does not specify a timeout in seconds, but availability for <n> clients. Also, I certainly agree that more sophisticated management of both the serve() and the individual connections is possible - it would, however, imply a more complicated API. Having said that: yes, a timeout would be possible on send. I could not find SO_TIMEOUT? (I am offline though, will recheck...)
We should also be able to properly stop the service with a "disconnect" or "shutdown", in case close() doesn't do it. That means that the server must stop and all the connections handled by the class must be terminated. That is important for us to free the underlying memory and the allocated objects. Typically we should be able to close the server sockets in a proper way.
close is supposed to do exactly that:
- close
  Purpose: close the endpoint, and release all resources
8. Endpoint reuse
The current semantics do not allow reusing the same object after a close. It might be handy to support a reconnect via open(), in case the implementation does not support automatic reconnection. Otherwise, if the application must maintain a stable connection, it has to create a new object each time, which might be a memory burden. If the URL is not modified and the states are correct, I don't see why we cannot open a connection again after a close.
I am not sure about that. You would not have a final state anymore. Not sure if that is a problem. Also, as close releases all resources anyway, an open() would imply a similar overhead to creating a new object, wouldn't it? (minus allocating the object itself, though).
OK, that is all for now. I might have other things popping up.
Thanks a lot! I'm sure we can converge on all these points. It seems we have to take care of which use cases we really want to cover - that seems to be the main point in your mail, and in the other comments.
Cheers, Andre.
-- "So much time, so little to do..." -- Garfield

Andre Merzky wrote:
Quoting [Pascal Kleijer] (Jan 18 2007):
From: Pascal Kleijer <k-pasukaru@ap.jp.nec.com> To: Andre Merzky <andre@merzky.net>, SAGA RG <saga-rg@ogf.org> CC: John Shalf <JShalf@lbl.gov>, Werner Benger <benger@zib.de> Subject: Re: [SAGA-RG] SAGA Message API Extension
Hello all,
I jump into the conversation a bit later than expected. It seems that Andre always chooses a time in space-time that is inconvenient for me :p
Here is some food:
A. Typos
I cannot access the CVS so I post some of my stuff here.
1. Copyright at 2006? Should it not be 2007? See page 1 and 23
2. Most notes of the detailed spec should be "Notes: - see notes *on* memory management."
3. Section 2.1, "[ibidem]" ???
4. Section 2.1.3, paragraph 2 "A message sent by *an* endpoint"
5. Page 17 "Format: serve" instead of "Format: connect"
Perfect, thanks, will fix. I'll check the CVS access - I seem to remember that you should have a login...
Well I have a login, the problem is that I am behind a proxy gate, so I can only access CVS if it is configured to support web access (HTTP or HTTPS). This is not the case I think.
ibidem: basically means: same as the last reference. I wanted to avoid referencing the core spec again and again. Not sure if this is commonly used though...
I am not so familiar with this one, I tend to directly add the reference again. If the reference is just a paragraph away I don't add the reference at all.
There might be more, but this is just what caught my eye.
B. API
1. I would call the method "receive()" and not "recv()"; in section 2.1.3 you have it right ;)
Yes, I had 'receive' first, but then the examples don't line up so well:
ep.send (msg); ep.recv (msg);
Isn't that nice? :-P Also, send/recv is the same as the POSIX send/recv.
POSIX was made by C hackers, so no need to emulate them; let's do things properly from the start. Obfuscated code (http://www.ioccc.org/) should not be part of a "Simple API". Sorry to hammer on this one, but I correct my programmers on this all the time: Portability, Clarity, Simplicity, Re-usability, etc. As for long names, who cares - I never write a full name; my IDE takes care of selecting the right method after 2 to 4 characters typed.
But you are probably right: verbosity may be better here. Let's flip a coin at the OGF session (i.e. let's have a 30 min discussion and a random vote about it)? :-) I'm fine either way, really...
Not really verbosity IMO. Just clarity for a reader that enters the spec for the first time.
2. The "test()" method should be called "available()" because it sounds more like test code then API code. The method should be non-blocking and return -1 if no message is available. The timeout should be ignored IMO.
Agree, test is misleading. But possibly 'check'? (shorter but also explicit)?
Reason for timeout: see later. It allows test to be blocking...
Hmm, "check" is to vague. You need a method that looks for a possible message and returns information on the message. Basically it returns two values: a boolean to tell their is something and a message size.
3. Abusive use of the NotImplemented exception. Most methods are mandatory anyway; cleaning up the spec would be nice.
NotImplemented is on all calls, because the whole package might be NotImplemented. We changed that in the Core spec too, lately, and now list exceptions like NotImplemented and NoSuccess basically everywhere...
The statement from the Core spec intro still holds though:
"The NotImplemented exception MUST, however, be used only in necessary cases, for example if an underlying Grid middleware does not provide some capability, and if this capability can also not be emulated. The implementa- tion MUST carefully document and motivate the use of the NotImplemented exception."
OK, then quoting that verbatim in the document is necessary. Just telling people to read another document will not help. I haven't read the Core API spec document for some time, so I forgot about that point.
4. Metric RemoteDisConnect should be RemoteDisconnect.
Agree.
5. The "msg" class should be called "message"
Kind of agree - I like the shorter name, but again you are probably right, and verbosity is better.
Andre, you are too C (hacker) centric :p Make it simple for us folx with full names.
I actually wonder that nobody complained about 'endpoint' - that is the one I have some issues with, because it's so generic - who knows what other communication schemes we are going to add which also have endpoints. 'message_endpoint' would be much better, but is really too long. Any suggestion? msg_ep? har har har :-P
Well, I was ready to say something, but other matters were more important. Also, in WS we have the EPR, which is basically a pointer container and not a manager as here. We could think about MessageBus :)
C. Others
Here are several more complex things.
1. State machine
Why do you not include the failed states? Depending on how the message API is handled, the failure might be permanent and the object remains in the given state. We can imagine that a dropped connection can be recovered for some time: you have two attributes, attempts and delay; the object stays in the dropped state until it can reconnect. If the number of attempts is exhausted the state is permanent; otherwise it will sleep (delay) for some time and try a new connection.
Very good point! We had Failed (and Dropped) earlier. The problem is: when do you enter Failed? Assume you have an endpoint with three open connections, and a send() succeeds on two connections, but fails on the third. If the EP is 'Unreliable', that's not an error anyway. If communication is 'Reliable', it's certainly an error, and should get flagged - but should the two good sessions be dropped as well (which would happen when we go to Failed, I assume, otherwise the state is meaningless)?
States could be assigned to the connections as well, but we don't have any handle on them as of now, and introducing those would REALLY complicate the API (I tried).
Any idea?
I think the whole issue is due to the support of multicast/message bus. If one endpoint handled a single peer, the API would be easier. We can always add an aggregation class that supports more than one connection, based on single endpoints. In that case we would simplify the API and allow extensions. The "serve" method should then be taken out of the endpoint and delegated to an endpoint factory; in a TCP/IP implementation this could be a ServerSocket returning Sockets.
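[A rough sketch of the split Pascal proposes: a factory/listener object hands out single-connection endpoints, analogous to ServerSocket returning Sockets. All class and method names here are hypothetical, not from the draft spec:]

  // the 'listener' plays the ServerSocket role, the 'endpoint' wraps exactly one connection
  saga::message::listener lsn (url);           // hypothetical factory class, 'url' is the contact point
  saga::message::endpoint ep = lsn.serve ();   // blocks until one client connects

  // per-connection state (e.g. Connected, Dropped, Failed) is now unambiguous,
  // and the success or failure of a send() refers to exactly one peer
  ep.send (msg);                               // 'msg' is some previously prepared message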
I have this problem with the NAREGI project; for the moment I only allow one more attempt when the connection is dropped or fails (timeout, other errors). After that a fatal exception is thrown.
Ugh, that SHOULD be hidden in the implementation I think, no need to expose that on API level IMHO. Or?
It is hidden now in the implementation, but I want to make some of the controls visible for advanced tweaking of the system. By default it would remain as is.
This might be an optional implementation, but it will ensure a resilient implementation. If the failure is permanent it must be handled at the application level. Of course the two attributes can be modified at any time during the lifetime of the object.
Yes, I agree, it might help to stabilize the implementation on the protocol/stream level, but what can the application do with that information, really? Drop the connection and reconnect? That would mean closing the endpoint and setting up the whole thing again. Much better to handle reconnects on the implementation level, and to tell the application when that fails -- the app can then still do the same (close/restart).
The application might be unable to handle the problem; it must then pass the problem on to the user. In the NAREGI project we do this: in case of a failure the application cannot recover from, it tells the user about the problem. The user then has to decide what to do.
2. Silently Discarded & Reliability
In section 2.1.2, second paragraph! Gasp, no, in an (un)reliable system this cannot be! You should at least return a flag with the "send()" in a synchronous connection, telling whether the message was sent, and possibly acked. In asynchronous mode it would be a monitorable field, like a ticket for the "send()" call or for the task that does the job.
Hmm....
Whatever Reliability, Correctness and Order is applied, there should be a way to tell the application (possibly as an optional implementation) that the message was sent. If the underlying API does not support reliability, the application can implement its own. In a reliable system this should not happen anyway.
IMHO, if the application needs reliability, it should use a reliable implementation of the message API - the idea is to avoid application level management of the message transport.
Anyway, I think you make a good point, an ACK might be useful for more than app level reliability. But what do you return? Again, assume an EP is connected to three clients, mode is unreliable, and two messages get through - what do you report? 0.6666? ;-) 2? That is of no use if the app has to check how many clients are connected - that cannot be done atomically, as it's a separate operation from the send (get_receivers()). So, return true if at least one message got through? What is the use of that? If it's not reliable, it does not matter anyway. If it's reliable, an error will be flagged anyway if even one client fails...
So: you may be right, but I think the semantics is non-trivial.
See above my response about pooled endpoint connections.
3. Correctness
I would rather see this as an attribute you can set. If the message is received, I am not always interested in knowing whether the content is correct; just getting it is enough. The overhead for correctness depends on the implementation, but in most cases this means extra CPU usage that I don't want. Think of MPEG streams: if we lose some part or some of it is corrupted, I don't care; we go to the next frame. It would be even more powerful if you could set correctness for a given direction specifically.
Good point. We are getting more and more flags though... Anyway, I'll add it, but we should discuss if we can collapse the (now four) options to the EP constructor. :-(
Hmm, it is not necessary to add them to the constructor. It could be an attribute-set mechanism: only when you start to serve the endpoint would it tell the user that it cannot, if some features are not supported.
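[A sketch of such an attribute-set mechanism, assuming the endpoint implements the SAGA attributes interface (set_attribute/get_attribute); the attribute names and values are invented for illustration:]

  saga::message::endpoint ep (url);                 // 'url' is the contact point
  ep.set_attribute ("Reliability", "Unreliable");   // hypothetical attribute names and values
  ep.set_attribute ("Correctness", "False");
  ep.set_attribute ("Order",       "False");
  ep.serve ();   // only here would the implementation complain (e.g. NotImplemented)
                 // if it cannot honour the requested combination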
4. Order
Same as correctness. In most cases I don't care about order; I sometimes prefer to have control over it at the application level. This dramatically simplifies the underlying implementation, and handling it at the application level can be rather easy. In the NAREGI project I have such a mechanism: I don't care about the order for 95% of the messages, just some must be ordered. It would be even more powerful if you could set ordering for a given direction specifically.
That is in the spec by now.
5. Opaque messages
In the NAREGI project we have already implemented a messaging API. Actually we have several (Socket, File, SOAP, and HTTP), and all use the same message container. The API for each implementation isn't unified yet. We do use opaque message content, because when an application component wants to send a piece of something it just drops in its content. That can be any object. When the message moves down to the messaging API it passes through filters that transform it into byte arrays. The same is true for incoming messages: filters transform the content as it moves up in the application.
The message class has an extra method called "getBytes()" which is guaranteed to return a byte array. If the content cannot be serialized to a byte array, nothing happens and null is returned. "getSize()" returns the size of the usable block of the array. "getData()" might return an array, but that is not guaranteed. In addition, the message container has an ID flag to hint the receiving application at how the filters must be applied. It also has other attributes, but those are of no interest for SAGA so far.
See my answer to Werner's comments: I certainly agree that support for data typing and conversion is handy - I only doubt that it's easy to agree on a model here. I am not sure if you share Werner's opinion that data type support should be independent of a data model - do you?
My (personal) opinion still is that msg (or message ;-) should be a byte buffer only, moving all type conversion, packaging etc. to the application level, and later working on other message types with better data support, which can then easily be mapped onto the original (unstructured) message. That way we can go in more than one direction in the future, and have something to start with very quickly.
Let's discuss this at OGF again - I am very open to other options. But I really would like to keep the message API simple, for now at least.
Indeed, this can be handled by filters at the application level. For the moment, in the NAREGI project the message itself can handle raw byte arrays and strings (which can be translated into bytes directly). Other formats are up to the filters you add above the stack.
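[A small C++ sketch of the application-level packing described above: an int, five floats and another int shipped as one opaque message. The saga::message constructor and the send() signature are assumptions, and a real filter would also deal with padding and endianness:]

  // needs #include <vector>, <cstring>, <cstdint>
  struct record { int32_t a; float f[5]; int32_t b; };
  record r = { 1, { 0.1f, 0.2f, 0.3f, 0.4f, 0.5f }, 2 };

  std::vector <char> buf (sizeof (record));
  std::memcpy (buf.data (), &r, sizeof (record));   // application-level 'filter' / serialization

  saga::message msg (buf.data (), buf.size ());     // assumed ctor: wrap an opaque byte buffer
  ep.send (msg);                                    // shipped as one entity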
6. Memory Management
In general, for the sending part I totally agree: the management is up to the application. For the receiving part I am not entirely satisfied. When the management is done by the implementation I am fine, there is not much to say. But in the case of application-side management I have issues, mainly concerning efficient memory usage of the message buckets. With message-intensive applications the message container should be handled by the application to avoid memory burn/leaks. This is the case for the NAREGI project.
For example, the application creates a pool of buckets (messages) of, let's say, 512 bytes each. More than 90% of the incoming messages are smaller than 512 bytes, so we have no trouble with the buckets, since the array capacity is always larger than the size of the message. Now what happens when we get a larger message and we didn't do the availability test, because we want to block the current thread in the receive? In the current proposal my message gets truncated. That is not good. In our current implementation, the implementation changes the array by reallocating it.
The message container has two fields: the size of the actual message, and the capacity of the current array buffer. In the next version of the API for NAREGI I will add an offset, in case we want to use larger array blocks (typically from network buffers) and allow sub-blocks to be used. In a C implementation this is not necessary because you can just shift the pointer; Java cannot.
Do I understand that correctly?
- application allocates 512 bytes
- message arrives, and is larger than 512 bytes
- the implementation re-allocates the memory
- the application later frees the memory
I am sure that works, but I would be hesitant to introduce semantics which distribute memory management across different layers - I would think that this can give serious trouble for several languages.
Also, I would expect you to do a blocking test() (check()) to wait for the message? (see earlier)
Yep, I see the point now. I am also too Java-oriented, with this nice GC doing the work for you.
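[A sketch of how an application-managed bucket could avoid truncation, assuming the blocking probe discussed above (negative timeout meaning 'block until a message is pending') and an application-supplied receive buffer; all names and signatures are illustrative:]

  // needs #include <vector>
  std::vector <char> bucket (512);        // pre-allocated application bucket
  long size = ep.check (-1.0);            // assumed: blocks until a message is pending, returns its size
  if (size > (long) bucket.size ())
    bucket.resize (size);                 // grow in the application, not in the implementation
  ep.recv (bucket.data (), size);         // no truncation, no implementation-side allocation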
7. Serve timeout and shutdown
I think that we should allow, as an optional implementation or usage, the serve method to time out. This would either stop the internal thread or, for example with a TCP/IP messaging solution, set the SO_TIMEOUT flag. We need this in order to check what is happening with processes. In some cases the job submission tool reports the process as running but no I/O connection is active; a timeout might help us release the hook and do something different before making another attempt.
serve() by now allows an int parameter to specify how many clients it should serve (i.e. wait for). That is close to what you say, I think, though it does not specify a timeout in seconds, but availability for <n> clients.
Also, I certainly agree that more sophisticated management of both the serve() and the individual connections is possible - it would, however, imply a more complicated API.
Having said that: yes, a timeout would be possible on send.
I could not find SO_TIMEOUT? (I am offline though, will recheck...)
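[For reference: SO_TIMEOUT is the Java socket option (Socket.setSoTimeout()); the closest POSIX counterpart is the SO_RCVTIMEO socket option. A sketch of a timed serve(), assuming a hypothetical (n_clients, timeout_seconds) overload and that a timeout surfaces as the Timeout error condition of the core spec:]

  try
  {
    ep.serve (1, 30.0);              // hypothetical: wait for one client, give up after 30 seconds
  }
  catch (saga::exception const &)    // assumed: carries the Timeout error from the core spec
  {
    // no client showed up - release the hook, then retry or report upward
  }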
We should also be able to properly stop the service with a "disconnect" or "shutdown", in case close() doesn't do it. That means that the server must stop and all the connections handled by the class must be terminated. That is important for us to free the underlying memory and the allocated objects. Typically we should be able to close the server sockets in a proper way.
close is supposed to do exactly that:
- close
  Purpose: close the endpoint, and release all resources
8. Endpoint reuse
The current semantics do not allow reusing the same object after a close. It might be handy to support a reconnect via open(), in case the implementation does not support automatic reconnection. Otherwise, if the application must maintain a stable connection, it has to create a new object each time, which might be a memory burden. If the URL is not modified and the states are correct, I don't see why we cannot open a connection again after a close.
I am not sure about that. You would not have a final state anymore. Not sure if that is a problem. Also, as close releases all resources anyway, an open() would imply a similar overhead to creating a new object, wouldn't it? (minus allocating the object itself, though).
OK, that is all for now. I might have other things popping up.
Thanks a lot! I'm sure we can converge on all these points. It seems we have to take care of which use cases we really want to cover - that seems to be the main point in your mail, and in the other comments.
Cheers, Andre.
-- "So much time, so little to do..." -- Garfield
-- Best regards, Pascal Kleijer ---------------------------------------------------------------- HPC Marketing Promotion Division, NEC Corporation 1-10, Nisshin-cho, Fuchu, Tokyo, 183-8501, Japan. Tel: +81-(0)42/333.6389 Fax: +81-(0)42/333.6382

Well I have a login, the problem is that I am behind a proxy gate, so I can only access CVS if it is configured to support web access (HTTP or HTTPS). This is not the case I think.
If this CVS is setup to use ssh (and most of them are nowadays) you may want to check a nice tool named 'corkscrew'. It allows you to reach ssh through a http proxy - it's not perfect but for CVS access should suffice. hth Konrad

Hmm, not a nice tool to set up. You need Cygwin, compilation and plenty of setup. I am the lazy guy - if a feature is more than 3 clicks away...
I might try a VPN through an environment which is not protected and then try CVS access. That might be easier.
I am eagerly waiting for Eclipse to support an HTTP proxy for accessing a CVS server :)
Thanks anyway!
Konrad Karczewski wrote:
Well I have a login, the problem is that I am behind a proxy gate, so I can only access CVS if it is configured to support web access (HTTP or HTTPS). This is not the case I think.
If this CVS is setup to use ssh (and most of them are nowadays) you may want to check a nice tool named 'corkscrew'. It allows you to reach ssh through a http proxy - it's not perfect but for CVS access should suffice.
hth
Konrad
-- Best regards, Pascal Kleijer ---------------------------------------------------------------- HPC Marketing Promotion Division, NEC Corporation 1-10, Nisshin-cho, Fuchu, Tokyo, 183-8501, Japan. Tel: +81-(0)42/333.6389 Fax: +81-(0)42/333.6382

Well, under Linux it's less than 3 clicks ;) I suppose that there are similar tools for other OSes - I never needed them, so I can't tell.
K
On Mon, 22 Jan 2007, Pascal Kleijer wrote:
Hmm, not a nice tool to set up. You need Cygwin, compilation and plenty of setup. I am the lazy guy - if a feature is more than 3 clicks away...
I might try a VPN through an environment which is not protected and then try CVS access. That might be easier.
I am eagerly waiting for Eclipse to support an HTTP proxy for accessing a CVS server :)
Thanks anyway!
Konrad Karczewski wrote:
Well I have a login, the problem is that I am behind a proxy gate, so I can only access CVS if it is configured to support web access (HTTP or HTTPS). This is not the case I think.
If this CVS is setup to use ssh (and most of them are nowadays) you may want to check a nice tool named 'corkscrew'. It allows you to reach ssh through a http proxy - it's not perfect but for CVS access should suffice.
hth
Konrad
--
Best regards, Pascal Kleijer
---------------------------------------------------------------- HPC Marketing Promotion Division, NEC Corporation 1-10, Nisshin-cho, Fuchu, Tokyo, 183-8501, Japan. Tel: +81-(0)42/333.6389 Fax: +81-(0)42/333.6382

Andre,
Where does this messaging API sit with things such as WS-Notification or WS-Eventing, and WS-Reliability or WS-ReliableMessaging? Can these underpin the API?
Steven
-- ---------------------------------------------------------------- Dr Steven Newhouse Mob:+44(0)7920489420 Tel:+44(0)23 80598789 Director, Open Middleware Infrastructure Institute-UK (OMII-UK) c/o Suite 6005, Faraday Building (B21), Highfield Campus, University of Southampton, Highfield, Southampton, SO17 1BJ, UK

Hi Steven, Quoting [Steven Newhouse] (Jan 19 2007):
Andre,
Where does this messaging API sit with things such as WS-Notification or WS-Eventing, and WS-Reliability or WS-ReliableMessaging? Can these underpin the API?
In terms of notification and event management I'd say: yes, that is compatible with WS-Notification, WS-Eventing and WS-Reliability. I don't think, though, that all reliability and ordering modes can be supported with those (I have to recheck, with respect to the latest discussion and changes).
I do not think, though, that the API would be implemented with these, as it focuses on large messages (think megabytes or more) of binary data. I think that's not the target usage scenario for the WS protocols. Or am I missing something?
I must admit I don't know a thing about WS-ReliableMessaging, really - I will check as soon as I am back online. I suspect, though, that the same answer holds. (I don't know much about the other three either, really, so I will re-check them, too ;-)
Thanks for the hint!
Cheers, Andre.
-- "So much time, so little to do..." -- Garfield

Hi Steven,
I found: "In WS-Notification, a message type is represented by an XML Schema global element definition." I did not find such an explicit statement for the other WS definitions, but I think the same holds there. I am not sure if that would exclude large binary messages, but it certainly would make them more difficult. Yes, you can encode them, and then include even larger ASCII messages, but that would be inefficient.
So I think the mechanisms used in WS-Notification and WS-ReliableMessaging are VERY compatible with the SAGA message API, but the target use cases differ; it is possible to implement the SAGA message API on top of the WS standards, but that may not be optimal in terms of performance. OTOH, for use cases with less strict requirements (smaller message footprint, XML messages, no latency limits), the API and the WS specs seem definitely compatible.
Cheers, Andre.
Quoting [Andre Merzky] (Feb 20 2007):
From: Andre Merzky <andre@merzky.net> To: Steven Newhouse <s.newhouse@omii.ac.uk> Cc: Andre Merzky <andre@merzky.net>, SAGA RG <saga-rg@ogf.org> Subject: Re: [SAGA-RG] SAGA Message API Extension
Hi Steven,
Quoting [Steven Newhouse] (Jan 19 2007):
Andre,
Where does this messaging API sit with things such as WS-Notification or WS-Eventing, and WS-Reliability or WS-ReliableMessaging? Can these underpin the API?
In terms of notification and event management I'd say: yes, that is compatible with WS-Notification, WS-Eventing and WS-Reliability. I don't think, though, that all reliability and ordering modes can be supported with those (I have to recheck, with respect to the latest discussion and changes).
I do not think, though, that the API would be implemented with these, as it focuses on large messages (think megabytes or more) of binary data. I think that's not the target usage scenario for the WS protocols. Or am I missing something?
I must admit I don't know a thing about WS-ReliableMessaging, really - I will check as soon as I am back online. I suspect, though, that the same answer holds. (I don't know much about the other three either, really, so I will re-check them, too ;-)
Thanks for the hint!
Cheers, Andre. -- "So much time, so little to do..." -- Garfield

Hi Andre,
here is the updated draft of the SAGA Message API, in preparation for OGF-19. It is also available in CVS, as usual.
We would be happy to get feedback on the list of course, so don't feel oblidged to hold back your comments for OGF-19 :-)
Sorry, it took me some time to dig through the new spec... I had to start implementing the stuff to have comments (I'm perhaps too old to do programming other than in an intuitive way :-P).
The comments I have relate to the API part of the spec.
Frankly, I don't like the design of the 'msg' object; I even think it's not viable to make it a first class SAGA object (derived from saga::object) - is there a use case requiring that?
The msg object in its current design mixes two different paradigms into one object (remember: every kind of bool in an API is a strong hint for a flawed design):
a) saga::msg can be a wrapper object for application memory (size > 0), and
b) saga::msg can be an object managing memory on behalf of the SAGA implementation (size == -1).
I strongly believe that we don't need b), and even if we do, it should be implemented in a separate abstraction.
Why do I think we don't need b)?
Instead of relying on the SAGA implementation to manage the memory for the application (BTW: this is the only instance in the whole spec where we require the implementation to do this), an application always may write (syntax is C++):

  endpoint ep (...);
  ep.connect (...);

  int size = 0;
  if (ep.test (..., &size))
  {
    unsigned char * data = new unsigned char[size];
    ep.recv (data, size);

Or even better:

    ep.recv (buffer (data, size));
    ...
    delete[] data;
  }

i.e. the application always can handle the memory itself, without having to go through too much overhead.
This brings me to the second comment: by implementing the msg/buffer object solely for the purpose of wrapping up application memory, this msg/buffer could be reused for other packages as well: file::read, file::write, stream::read, stream::write, rpc::parameter::buffer - to name a few.
BTW: I silently introduced const_buffer and mutable_buffer objects, combined with a buffer() generator function, in the C++ implementation of the saga::file package to improve memory handling in the API.
Regards Hartmut
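[A sketch of the const_buffer/mutable_buffer split Hartmut mentions (the same pattern Boost.Asio uses); the classes below are illustrative, not his actual saga::file implementation:]

  // needs #include <cstddef>
  class const_buffer                 // read-only view, for send()/write() paths
  {
  public:
    const_buffer (void const * data, std::size_t size) : data_ (data), size_ (size) {}
    void const * data () const { return data_; }
    std::size_t  size () const { return size_; }
  private:
    void const * data_;
    std::size_t  size_;
  };

  class mutable_buffer               // writable view, for recv()/read() paths
  {
  public:
    mutable_buffer (void * data, std::size_t size) : data_ (data), size_ (size) {}
    void *      data () const { return data_; }
    std::size_t size () const { return size_; }
  private:
    void *      data_;
    std::size_t size_;
  };

  // generator overloads, so call sites read as ep.recv (buffer (data, size))
  inline const_buffer   buffer (void const * d, std::size_t s) { return const_buffer (d, s); }
  inline mutable_buffer buffer (void       * d, std::size_t s) { return mutable_buffer (d, s); }

[With such a split, a send() overload would take a const_buffer and a recv() overload a mutable_buffer, so the compiler enforces the transfer direction.]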

Hi Hartmut,
I certainly do understand your unease with implementation-managed memory, and, yes, it's the only place in the spec where we have that at the moment...
The main (and actually only) reason for this is the ability to avoid memcopies. If you think of large messages, you definitely would like to have a way to avoid the copy from implementation memory (where it was received, and is buffered anyway) into application memory.
For streams, that problem is not that bad, as you can delay the low level read until the stream.read() method is really called, and then you can read directly into the application level memory. For messages that does not work, as you do not know the size of the message beforehand. If you call test() (or check() now), the message must (at least partly) already reside in the application memory.
Do you see any other possibility for avoiding the additional copy from implementation memory to application memory?
About your second comment, the reuse of the msg (or message) object in other parts of the API: I am not sure about this, really. The main difference to stream and file byte buffers is the opacity of the message: for files and streams, you can write an int, five floats, another int etc. There are no constraints on when the data are transferred etc., and if there are, you flush the file. For the message API, the message is to be transferred as an entity, but can have internal structure. So you want to pack an int, five floats, and another int into a message, and then ship that as an opaque entity. Without a message object that is awkward. OTOH, a message object for that would be overkill for the file and stream APIs, I think.
However, I think we CAN reuse the message abstraction in several places, e.g. in the advert service, or, as we discussed in other mails, to create specific message types for certain application domains, or with typed data support etc.
Well, these are my 2 cents... :-)
Cheers, Andre.
Quoting [Hartmut Kaiser] (Jan 20 2007):
Hi Andre,
here is the updated draft of the SAGA Message API, in preparation for OGF-19. It is also available in CVS, as usual.
We would be happy to get feedback on the list of course, so don't feel oblidged to hold back your comments for OGF-19 :-)
Sorry, it took me some time to dig through the new spec... I had to start implementing the stuff to have comments (I'm perhaps too old to do programming other than in an intuitive way :-P).
The comments I have relate to the API part of the spec.
Frankly, I don't like the design of the 'msg' object; I even think it's not viable to make it a first class SAGA object (derived from saga::object) - is there a use case requiring that?
The msg object in its current design mixes two different paradigms into one object (remember: every kind of bool in an API is a strong hint for a flawed design):
a) saga::msg can be a wrapper object for application memory (size > 0), and
b) saga::msg can be an object managing memory on behalf of the SAGA implementation (size == -1).
I strongly believe that we don't need b), and even if we do, it should be implemented in a separate abstraction.
Why do I think we don't need b)?
Instead of relying on the SAGA implementation to manage the memory for the application (BTW: this is the only instance in the whole spec, where we require the implementation to do this), an application always may write (syntax is C++):
endpoint ep (...);
ep.connect (...);

int size = 0;
if (ep.test (..., &size))
{
  unsigned char * data = new unsigned char[size];
  ep.recv (data, size);
Or even better:
  ep.recv (buffer (data, size));
  ...
  delete[] data;
}
i.e. The application always can handle the memory itself, without having to go through too much overhead.
This brings me to the second comment: by implementing the msg/buffer object solely for the purpose of wrapping up application memory this msg/buffer could be reused for other packages as well: file::read, file::write, stream::read, stream::write, rpc::parameter::buffer - to name a few.
BTW: I silently introduced const_buffer and mutable_buffer objects, combined with a buffer() generator function, in the C++ implementation of the saga::file package to improve memory handling in the API.
Regards Hartmut
-- "So much time, so little to do..." -- Garfield
participants (9)
- 'Andre Merzky'
- Andre Merzky
- Andrei Hutanu
- Hartmut Kaiser
- John Shalf
- John Shalf
- Konrad Karczewski
- Pascal Kleijer
- Steven Newhouse