Notes from the OGSA ByteIO session at OGF19
Notes from OGSA-ByteIO Session at OGF19, Chapel Hill
====================================================

6 people present

The chairs noted that attendance has dropped off since the Specification and Rendering were published :-)

It was suggested that we attend the Visualisation workshop to make sure we tie off the loose ends of the streaming case.

Steve Pickles suggested that we look at the CDDLM interop experience notes written by Steve Loughran - there may be some useful things to note for the ByteIO interop process.

The group examined the GridFTP issue in Dave Berry's EMS data scenario - GridFTP does not support partial data transfer, which makes it difficult to use ByteIO directly in this case. The three protocols in the ByteIO specification document all assume that you encode the data in the response. However, this is not a requirement of ByteIO: you just need to encode what is required to get the data, which may utilise out-of-band communication.

Mark talked around a scenario with a representative protocol, "Mark's Efficient Transport Protocol". The request specifies: read a bunch of data, IP address, port. The service sets up the data connection and sends back a response, "expect data now". The client then just waits for the data. Michel mentioned that this is the same as his ||http-based ByteIO implementation.

With GridFTP, because of the lack of partial transfer, you could implement something in which the first time you talk to the service, you get sent back the information needed to GridFTP the whole thing (file?), then store it locally and access chunks on the client - you could implement this, but it is a silly way to do it. Basically, GridFTP is targeted at moving the whole chunk of data, whereas ByteIO is targeted at being cleverer about getting just the bytes it needs.
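The out-of-band scenario Mark described ("read a bunch of data, IP address, port", answered by "expect data now") could be sketched roughly as follows. This is only an illustration, not the protocol discussed at the session or anything from the ByteIO specification: the names `service_read`, `client_read` and `DATA` are hypothetical, a plain function call stands in for the request/response message exchange, and both sides run in one process over the loopback interface.

```python
import socket
import threading

# Hypothetical resource held by the service (an assumption for this sketch).
DATA = bytes(range(256)) * 4


def service_read(offset, length, client_host, client_port):
    """Hypothetical service side: push the bytes out of band, return only a notice."""
    chunk = DATA[offset:offset + length]

    def push():
        # The data travels over a separate connection, not inside the response.
        with socket.create_connection((client_host, client_port)) as conn:
            conn.sendall(chunk)

    threading.Thread(target=push, daemon=True).start()
    return "expect data now"


def client_read(offset, length):
    """Hypothetical client side: listen, send the request, then wait for the data."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        listener.listen(1)
        listener.settimeout(5.0)  # do not wait forever if nothing arrives
        host, port = listener.getsockname()

        # Stand-in for the request/response exchange with the service.
        response = service_read(offset, length, host, port)
        if response != "expect data now":
            raise RuntimeError("unexpected response: " + response)

        conn, _ = listener.accept()
        with conn:
            received = bytearray()
            while True:
                part = conn.recv(4096)
                if not part:  # the service closed the data connection
                    break
                received.extend(part)
    return bytes(received)
```

The point of the sketch is only that the response carries no payload: it tells the client what to expect, and the bytes arrive on the separate data connection.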
It was agreed to finish off the Use Case document: there are some more use cases to collect from Mark and from Dave Berry (which may overlap).

A question was asked again about when you use ByteIO and when you use DMI/GridFTP. GridFTP is about transferring large amounts of data. ByteIO is about providing interfaces to tease out interesting information from resources. A GENESIS II/BES-like example: ByteIO to stream the JSDL, GridFTP to stage the data specified in the JSDL.

Comments on the interoperability document from ETSI - we agreed to paste the comments to public comments and answer them - a telcon will be set up to discuss this.

Andre was shown the UVa ByteIO demo. Then he asked:
- is there a C/C++ client?
- SAGA based plugin for ByteIO?
- experiences wrt performance?

Mark answered:
- only implemented the "simple" protocol
- very inefficient; however, previous experience suggests that aggressive caching will make it as good as anything else. So, should be good enough!

Discussion of Experiences Document layout. We agreed the following structure:

1. Description of process (here's how we approached it)
2. Description of the actual interop experiment (here's how we set it up)
   - we will post endpoints publicly rather than having a F2F session
     MD: Java/JAX-WS
     MM: Java/Axis
     AK: Java/Axis
3. Results matrix (a table of check marks)
4. List of problems and issues
5. Comments sections
   - comments from each implementor
   - comments from authors
6. Conclusions
   - Which should be... "we believe that this shows that the specification is a good one"!

--
Neil P Chue Hong            | T: [+44] (0)131 650 5957
Project Manager, EPCC       | F: [+44] (0)131 650 6555
Rm 2409, JCMB, Mayfield Rd. | E: N.ChueHong@epcc.ed.ac.uk
Edinburgh, EH9 3JZ, UK      | W: http://www.epcc.ed.ac.uk
BT MeetMe: http://tinyurl.com/8mwhd - Code: 14712935#
"A film is like a battleground. It's love, hate, action, violence, death - in a word, emotion." - Sam Fuller
Hi Neil,

Quoting [Neil P Chue Hong] (Jan 31 2007):
Notes from OGSA-ByteIO Session at OGF19, Chapel Hill ====================================================
[...]
With GridFTP, because of the lack of partial transfer, you could implement something in which the first time you talk to the service, you get sent back the information needed to GridFTP the whole thing (file?), then store it locally and access chunks on the client - you could implement this, but it is a silly way to do it. Basically, GridFTP is targeted at moving the whole chunk of data, whereas ByteIO is targeted at being cleverer about getting just the bytes it needs.
Maybe I am missing something here, but GridFTP as a protocol _does_ support partial file access, doesn't it? It is not implemented in all GridFTP servers though, AFAIK. The spec says (I may not have the latest version though, will check):

  All implementations of this specification SHOULD implement the following Partial File Transfer ERET module:

    ERET <SP> PFT="<offset>,<length>" <filename>

    offset::= string representation of a positive 64 bit integer
    length::= string representation of a positive 64 bit integer

  Note that the offset specified here is the offset in the file and is not related to the offset specified in the MODE E header, which is the offset in the transfer over the wire.

Cheers, Andre.

--
"So much time, so little to do..." -- Garfield
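To make the quoted ERET syntax concrete, here is a minimal sketch that formats such a command line from an offset and a length. The helper name `eret_pft_command` is hypothetical, and the range check (0 up to 2**64 - 1) is an assumption about what the spec means by "positive 64 bit integer"; the sketch only builds the string and does not talk to any GridFTP server.

```python
def eret_pft_command(filename, offset, length):
    """Format a Partial File Transfer ERET command as quoted from the spec.

    Hypothetical helper: only the command text is produced; line termination
    and sending it over a control connection are out of scope here.
    """
    for name, value in (("offset", offset), ("length", length)):
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(name + " must be an integer")
        # Assumption: "positive 64 bit integer" is read as 0 <= value < 2**64.
        if not 0 <= value < 2 ** 64:
            raise ValueError(name + " does not fit in 64 bits")
    # Offset and length are sent as their decimal string representations.
    return 'ERET PFT="{},{}" {}'.format(offset, length, filename)


# Ask for 4096 bytes starting at byte 1024 of a made-up file name.
print(eret_pft_command("/data/sample.bin", 1024, 4096))
```

As the quoted note says, the offset here is the position in the file, not the MODE E offset of the transfer on the wire.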
Hi Andre,

Andre Merzky wrote:
[...]
All implementations of this specification SHOULD implement the following Partial File Transfer ERET module:
ERET <SP> PFT="<offset>,<length>" <filename>
offset::= string representation of a positive 64 bit integer
length::= string representation of a positive 64 bit integer
Note that the offset specified here is the offset in the file and is not related to the offset specified in the MODE E header, which is the offset in the transfer over the wire.
Saw this for the first time.

But even so, this is an *optional* feature, something we cannot rely on. In HTTP for example, partial data transfer is a MUST, so it is feasible to define a profile for HTTP as each compliant implementation will offer that feature.

What sense does it make to define a rendering for a feature that is optional in the underlying transport protocol?

Cheers, Michel

--
Michel <dot> Drescher <at> uk <dot> fujitsu <dot> com
Fujitsu Laboratories of Europe
+44 20 8606 4834
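For comparison, partial data transfer in HTTP is requested with a `Range` header. The sketch below maps the same offset/length pair onto that header; the helper name `range_header` is hypothetical and no request is actually sent. A server that honours the range answers with status 206 (Partial Content), while one that does not can answer 200 with the whole body, so a client would still want to check the status code.

```python
def range_header(offset, length):
    """Build an HTTP Range header for `length` bytes starting at `offset`.

    Hypothetical helper for illustration; it only constructs the header.
    """
    if offset < 0:
        raise ValueError("offset must not be negative")
    if length <= 0:
        raise ValueError("length must be at least 1")
    # Byte ranges are inclusive on both ends, hence the "- 1".
    last_byte = offset + length - 1
    return {"Range": "bytes={}-{}".format(offset, last_byte)}


# Ask for 4096 bytes starting at byte 1024.
print(range_header(1024, 4096))
```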
Quoting [Michel Drescher] (Feb 10 2007):
Hi Andre,
Andre Merzky wrote:
[...]
All implementations of this specification SHOULD implement the following Partial File Transfer ERET module:
ERET <SP> PFT="<offset>,<length>" <filename>
offset::= string representation of a positive 64 bit integer
length::= string representation of a positive 64 bit integer
Note that the offset specified here is the offset in the file and is not related to the offset specified in the MODE E header, which is the offset in the transfer over the wire.
Saw this for the first time.
But even so, this is an *optional* feature, something we cannot rely on. In HTTP for example, partial data transfer is a MUST, so it is feasible to define a profile for HTTP as each compliant implementation will offer that feature.
What sense does it make to define a rendering for a feature that is optional in the underlying transport protocol?
Yes, I agree, that is a definite problem.

However, if I am not mistaken, all gridftp implementations implement partial file transfer. Also, gridftp has a number of advantages over HTTP IMHO: comes with globus and other middleware, and is nicely (ahem) integrated into the security infrastructure.

Anyway, I am not sure if that justifies spec dependencies...

Cheers, Andre.
Cheers, Michel
-- "So much time, so little to do..." -- Garfield
Hi Andre,

Andre Merzky wrote:
Quoting [Michel Drescher] (Feb 10 2007):
Hi Andre, [...] But even so, this is an *optional* feature, something we cannot rely on. In HTTP for example, partial data transfer is a MUST, so it is feasible to define a profile for HTTP as each compliant implementation will offer that feature.
What sense does it make to define a rendering for a feature that is optional in the underlying transport protocol?
Yes, I agree, that is a definite problem.
However, if I am not mistaken, all gridftp implementations implement partial file transfer. Also, gridftp has a number of advantages over HTTP IMHO: comes with globus and other middleware, and is nicely (ahem) integrated into the security infrastructure.
:-) ByteIO itself is agnostic to security (TBD or orthogonal), so this must not be taken into consideration.

However, ByteIO is considerably different in its use cases and requirements from GridFTP. Hence a different architecture. GridFTP is designed to accomplish file *movement* while ByteIO is designed to solve file (well, data) *access*. As a consequence, GridFTP requests for small amounts of data are simply way too slow when compared to simple HTTP or parallel HTTP.

I don't advocate against a GridFTP rendering - I just do not want to put the "burden" (in terms of time effort) on the ByteIO group itself because of the above mentioned reason(s). The group would be glad to see a rendering contributed from external sources, I suppose.

Cheers, Michel

--
Michel <dot> Drescher <at> uk <dot> fujitsu <dot> com
Fujitsu Laboratories of Europe
+44 20 8606 4834
Good points, I agree.

Cheers, Andre.

Quoting [Michel Drescher] (Feb 15 2007):
Hi Andre,
Andre Merzky wrote:
Quoting [Michel Drescher] (Feb 10 2007):
Hi Andre, [...] But even so, this is an *optional* feature, something we cannot rely on. In HTTP for example, partial data transfer is a MUST, so it is feasible to define a profile for HTTP as each compliant implementation will offer that feature.
What sense does it make to define a rendering for a feature that is optional in the underlying transport protocol?
Yes, I agree, that is a definite problem.
However, if I am not mistaken, all gridftp implementations implement partial file transfer. Also, gridftp has a number of advantages over HTTP IMHO: comes with globus and other middleware, and is nicely (ahem) integrated into the security infrastructure.
:-) ByteIO itself is agnostic to security (TBD or orthogonal), so this must not be taken into consideration.
However, ByteIO is considerably different in its use cases and requirements from GridFTP. Hence a different architecture. GridFTP is designed to accomplish file *movement* while ByteIO is designed to solve file (well, data) *access*. As a consequence, GridFTP requests for small amounts of data are simply way too slow when compared to simple HTTP or parallel HTTP.
I don't advocate against a GridFTP rendering - I just do not want to put the "burden" (in terms of time effort) on the ByteIO group itself because of the above mentioned reason(s). The group would be glad to see a rendering contributed from external sources, I suppose.
Cheers, Michel

--
"So much time, so little to do..." -- Garfield
participants (3): Andre Merzky, Michel Drescher, Neil P Chue Hong