
-- REPOST --

Sorry for the spam, but earlier today the OGF mailing list manager choked on mails sent to OGSA-DMI. Here's a reply to a mail from Allen regarding the DataEPR element we define in OGSA-DMI.

Cheers,
Michel

-------- Original Message --------
Subject: Re: Latest DMI Spec.
Date: Wed, 12 Sep 2007 11:38:23 +0100
From: Michel Drescher <Michel.Drescher@uk.fujitsu.com>
To: Steven Newhouse <Steven.Newhouse@microsoft.com>
CC: Allen Luniewski <luniew@us.ibm.com>, "mario@epcc.ed.ac.uk" <mario@epcc.ed.ac.uk>, OGSA-DMI ML <ogsa-dmi-wg@ogf.org>
References: <OF75CBA36B.23241ED0-ON8825734E.0058DC3A-8825734E.005CB8BA@us.ibm.com> <F75A5628125A9C4EA64981CC4D412B94273558D977@NA-EXMSG-C102.redmond.corp.microsoft.com>

Folks,

this is going to be quite a lengthy mail, but I hope I can make my point clear.

Steven Newhouse wrote:
Philosophical discussion.
4.2.1.2.1 XML Representation. Just an opinion, and not one I will fall on my sword about, but I have never liked, and still do not like, putting all of this information into an EPR. I strongly believe that it is simpler, and more appropriate, to get this information via a call to the source/sink of the transfer via an architected interface.
I’d agree this is one of our weak areas. We need to get this information… but we don’t quite know from where. I would be nervous about defining (and getting implemented) another port type.
For now I think we should keep it weak and abuse the EPR as a mere data container. Later on we can still bang our heads against that other port type. Do we really have to define that port type? Or can we trust implementors to be intelligent and program a DMI system that acts as an intelligent client: one that programmatically analyzes an EPR's metadata, finds a link to the EPR's live WSDL, analyzes that, and acts appropriately?
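As a rough sketch of such an "intelligent client", the Python code below looks inside an EPR's wsa:Metadata for a link to the service's WSDL. It assumes the link is carried in a WSDL 2.0 wsdli:wsdlLocation attribute (a whitespace-separated list of namespace/location pairs); that is only one possible convention and nothing OGSA-DMI defines. The sample EPR, its namespace and its example.org URLs are made up, and `find_wsdl_locations` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# WS-Addressing 1.0 namespace.
WSA_NS = "http://www.w3.org/2005/08/addressing"

# Made-up EPR for illustration only; the namespace and URLs are placeholders.
SAMPLE_EPR = """\
<wsa:EndpointReference xmlns:wsa="http://www.w3.org/2005/08/addressing"
                       xmlns:wsdli="http://www.w3.org/ns/wsdl-instance">
  <wsa:Address>http://example.org/dataservice</wsa:Address>
  <wsa:Metadata wsdli:wsdlLocation="urn:example:dataservice http://example.org/dataservice?wsdl"/>
</wsa:EndpointReference>
"""


def find_wsdl_locations(epr_xml: str) -> list[str]:
    """Return the WSDL locations advertised in an EPR's wsa:Metadata (hypothetical helper)."""
    root = ET.fromstring(epr_xml)
    metadata = root.find(f"{{{WSA_NS}}}Metadata")
    if metadata is None:
        return []
    for name, value in metadata.attrib.items():
        # Match the attribute by local name, whatever namespace it is in.
        if name.rsplit("}", 1)[-1] == "wsdlLocation":
            # Assumed value format: "namespace location namespace location ..."
            return value.split()[1::2]
    return []


if __name__ == "__main__":
    for location in find_wsdl_locations(SAMPLE_EPR):
        print("would fetch and analyse WSDL at", location)
```

A client along these lines would then fetch and inspect the WSDL to decide how to talk to the endpoint, which is the alternative to standardizing a separate port type.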
Later in this section, @dmi:url. I am rather surprised that this information is being carried in a URL. Yes, one can put almost anything into a URL, but it would seem more natural to admit that this information is basically source/sink specific, and thus passing it back as an "any" would seem logical.
It's more protocol-specific in my view, but I see what you mean. If the protocol were (say) SQL, this would be an SQL statement on a particular database table.
The datatype of @dmi:url is already xsd:anyURI, so we are safe there: there is no doubt a URI scheme "sql:" that models an SQL query as a URI. If not, who stops people from creating one? But I still think we should rename that attribute to something more accurate, e.g. "dmi:dataDescriptor".
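To illustrate that an xsd:anyURI value can carry such a data descriptor, here is a minimal Python sketch. The "sql:" scheme, its layout (the whole percent-encoded statement as the scheme-specific part) and the helper names `to_sql_uri` / `from_sql_uri` are assumptions for illustration only, not something OGSA-DMI defines.

```python
from urllib.parse import quote, unquote, urlsplit


def to_sql_uri(statement: str) -> str:
    """Encode an SQL statement as a URI in a hypothetical "sql:" scheme."""
    # safe="" percent-encodes everything except unreserved characters, so
    # spaces, quotes, '?' and '#' cannot break the URI syntax.
    return "sql:" + quote(statement, safe="")


def from_sql_uri(uri: str) -> str:
    """Decode a URI produced by to_sql_uri back into the SQL statement."""
    parts = urlsplit(uri)
    if parts.scheme != "sql":
        raise ValueError(f"not an sql: URI: {uri!r}")
    return unquote(parts.path)


if __name__ == "__main__":
    statement = "SELECT name, size FROM files WHERE owner = 'alice'"
    uri = to_sql_uri(statement)
    print(uri)
    print(from_sql_uri(uri) == statement)
```

The round trip shows that the attribute's type is not the limiting factor; the open question is only what to call the attribute.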
Again later in this section, @dmi:sourceMode. As written, this section could be interpreted to require that a DTI implementation always work in a synchronous mode with the source/sink; that is, do a push (pull) and then wait for an acknowledgement. I do not believe that we want to impose that kind of restriction. Somehow this description should make it clear that a DTI implementation may have many push/pull messages outstanding at any given time.
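A minimal Python asyncio sketch of the behaviour described above: a DTI keeping several pull requests outstanding at once instead of waiting for each acknowledgement in turn. `pull_block` is a hypothetical stand-in for a protocol-specific pull, and the window size `max_outstanding` is an assumed tuning parameter; nothing here is prescribed by OGSA-DMI.

```python
import asyncio


async def pull_block(index: int) -> bytes:
    """Hypothetical stand-in for a single protocol-specific pull from a source."""
    await asyncio.sleep(0.01)  # simulate waiting for the source's reply
    return f"block-{index}".encode()


async def pull_all(block_count: int, max_outstanding: int) -> tuple[list[bytes], int]:
    """Pull all blocks with up to max_outstanding requests in flight.

    Returns the blocks in order plus the peak number of concurrent requests observed.
    """
    window = asyncio.Semaphore(max_outstanding)
    in_flight = 0
    peak = 0

    async def bounded_pull(index: int) -> bytes:
        nonlocal in_flight, peak
        async with window:  # waits only when max_outstanding requests are pending
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await pull_block(index)
            finally:
                in_flight -= 1

    # gather() returns results in submission order even though the pulls overlap.
    blocks = await asyncio.gather(*(bounded_pull(i) for i in range(block_count)))
    return list(blocks), peak


if __name__ == "__main__":
    blocks, peak = asyncio.run(pull_all(block_count=8, max_outstanding=4))
    print(len(blocks), "blocks pulled, peak outstanding requests:", peak)
```

Bounding the window with a semaphore keeps the number of outstanding messages under control without forcing strict request/acknowledge lock-step.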
In this same area, the description of 3rd party transfer is not very clear. Yes, I understand what it is, but since this is a specification, I think this document needs to make it quite clear what is intended.
Michel, you led the definition of this section… care to comment?
Okay, here's the lengthy part, about this section of the specification.

Remember, a DataEPR can be used unchanged as either a "Source DataEPR" or a "Sink DataEPR", but it SHOULD NOT be used as both in the same data transfer request to the DTF (the results are undefined and DMI-implementation specific). Hence a DataEPR provides, for each supported protocol, information on how the data it describes should be accessed when it is used in a particular context (i.e. either as source or as sink).

The following boolean table illustrates which combinations are feasible and can be used to initiate a data transfer (X = data transfer possible, - = data transfer not possible). Note that this table should be viewed in a fixed-width font (e.g. Courier).

    Source DataEPR's      || Sink DataEPR's, @dmi:sinkMode value
    @dmi:sourceMode value ||  pull  |  push  | pull and push | 3rd party
    ----------------------||--------|--------|---------------|----------
    pull                  ||   -    |   X    |       X       |     -
    push                  ||   -    |   -    |       -       |     -
    pull and push         ||   -    |   X    |       X       |     -
    3rd party             ||   -    |   -    |       -       |     X

The table should be read like this:

- "pull" means: the data can only be pulled from the (source | sink) using the given protocol.
- "push" means: the data can only be pushed into the (source | sink) using the given protocol.
- "pull and push" means: the data can either be pulled from or pushed into the (source | sink) using the given protocol.
- "3rd party" means: the given protocol supports true 3rd party data transfers that do not involve an intermediary to cache the data.

The table does not name specific protocols, as these combinations can occur for any protocol supported by the source and/or sink.

It also follows that, except for 3rd party data transfers, the DTI always has to cache the data in transit, i.e. the DTI "pull"s the data from the source and "push"es it to the sink. That is why a transfer is only possible when the source supports "pull" (or "pull and push") and the sink supports "push" (or "pull and push"), or when both sides support "3rd party".

The table does *not* cover the use case where a particular implementation of a data transfer protocol (or an extension to it) enables the DTI to:

  a) instruct the source to push the data to the sink, or
  b) instruct the sink to pull the data from the source.

Those use cases would require the definition of additional values for the @dmi:sourceMode and @dmi:sinkMode attributes. Such scenarios are possible, but I don't know of any data transfer protocol standard that explicitly allows them.
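The rule behind the table can be captured in a few lines. The following Python sketch is illustrative only: the `Mode` enum and the `transfer_possible` function are hypothetical names, not part of OGSA-DMI, and the code simply encodes the rule stated above (the DTI pulls from the source and pushes into the sink, unless both sides support true 3rd party transfer).

```python
from enum import Enum
from itertools import product


class Mode(Enum):
    """Hypothetical Python names for the @dmi:sourceMode / @dmi:sinkMode values."""
    PULL = "pull"
    PUSH = "push"
    PULL_AND_PUSH = "pull and push"
    THIRD_PARTY = "3rd party"


def transfer_possible(source_mode: Mode, sink_mode: Mode) -> bool:
    """Return True if a transfer between a source and a sink with these modes is feasible."""
    # True 3rd party transfer: both ends must support it; no intermediary caches the data.
    if source_mode is Mode.THIRD_PARTY and sink_mode is Mode.THIRD_PARTY:
        return True
    # Otherwise the DTI caches the data in transit, so it must be able to
    # pull from the source and push into the sink.
    can_pull_from_source = source_mode in (Mode.PULL, Mode.PULL_AND_PUSH)
    can_push_into_sink = sink_mode in (Mode.PUSH, Mode.PULL_AND_PUSH)
    return can_pull_from_source and can_push_into_sink


if __name__ == "__main__":
    # Enumerate all 16 source/sink combinations, source mode first.
    for src, snk in product(Mode, repeat=2):
        mark = "X" if transfer_possible(src, snk) else "-"
        print(f"{src.value:>13} -> {snk.value:<13} {mark}")
```

Running it prints all 16 source/sink combinations; exactly five are marked X, matching the table.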
This ominous "ParallelHTTP" protocol I was working on is based on HTTP, but explicitly allows entities such as the DTI to instruct a ParallelHTTP-enabled source or sink to perform the actions described above. However, the work on it has stalled, and I don't know whether it is worth mentioning in OGSA-DMI, or even worth pursuing this programming effort at all.

Instead, I am inclined to add a ByteIO identifier to the list of identified data transfer standards as a common last resort. I know it's slow, but it's on the OGF track, and several implementations are available: UNICORE, GenesisII and EPCC's OGSA-DAI effort. If I am not mistaken, the UNICORE implementation also provides Globus integration.

I hope this makes it a lot clearer what I intend to express with those values.

Cheers,
Michel