[ogsa-dmi-wg] [Fwd: Re: Latest DMI Spec.]

12 Sep 2007

      -- REPOST --

Sorry for the spam, but earlier today, the OGF mailing list manager choked
on mails sent to OGSA-DMI.

Here's a reply to a mail from Allen regarding the DataEPR element we define
in OGSA-DMI.

Cheers,
Michel

-------- Original Message --------
Subject: Re: Latest DMI Spec.
Date: Wed, 12 Sep 2007 11:38:23 +0100
From: Michel Drescher <Michel.Drescher@uk.fujitsu.com>
To: Steven Newhouse <Steven.Newhouse@microsoft.com>
CC: Allen Luniewski <luniew@us.ibm.com>,  "mario@epcc.ed.ac.uk"
<mario@epcc.ed.ac.uk>, OGSA-DMI ML <ogsa-dmi-wg@ogf.org>
References:
<OF75CBA36B.23241ED0-ON8825734E.0058DC3A-8825734E.005CB8BA@us.ibm.com>
<F75A5628125A9C4EA64981CC4D412B94273558D977@NA-EXMSG-C102.redmond.corp.microsoft.com>

Folks,

This is going to be quite a lengthy mail, but I hope I can make my point
clear:

Steven Newhouse wrote:
> ...
>> Philosophical discussion.
>> 4.2.1.2.1 XML Representation. Just an opinion, and not one I will fall on
>> my sword about, but I have never liked, and still do not like, putting all
>> of this information into an EPR. I strongly believe that it is simpler, and
>> more appropriate, to get this information via a call to the source/sink of
>> the transfer via an architected interface.
> I’d agree this is one of our weak areas. We need to get this information…
> but we don’t quite know from where. I would be nervous about defining (and
> getting implemented) another port type.

For now I think we should keep it weak and abuse the EPR as a mere data
container. Later on we can still bang our heads on that other port type. (Do
we really have to define that port type? Or can we trust implementors to be
intelligent and build a DMI system that acts as an intelligent client, one
that can programmatically analyze the EPR metadata, find a link to the EPR's
live WSDL, analyze that, and act appropriately?)
> ...
>> Later in this section, @dmi:url. I am rather surprised that this
>> information is being carried in a URL. Yes, one can put almost anything
>> into a URL, but it would seem more natural to admit that this information
>> is basically source/sink specific and thus passing it back as an "any"
>> would seem logical.
> It's more protocol-specific in my view. But I see what you mean. If the
> protocol was (say) SQL, this would be an SQL statement on a particular
> database table.

The datatype of @dmi:url is already xsd:anyURI, so we are safe there: there
is no doubt some URI scheme "sql:" that models an SQL query as a URI. If not,
who stops people from creating one?
But I still think we should rename that attribute to something more
accurate, e.g. "dmi:dataDescriptor".
> ...
>> Again later in this section, @dmi:sourceMode. As written, this section
>> could be interpreted to require that a DTI implementation always work in a
>> synchronous mode with the source/sink. That is, do a push (pull) and then
>> wait for an acknowledgement. I do not believe that we want to impose that
>> kind of restriction. Somehow this description should make it clear that a
>> DTI implementation may have many push/pull messages outstanding at any
>> given time.
>> In this same area, the description of 3rd party transfer is not very
>> clear. Yes, I understand what it is, but since this is a specification I
>> think that this document needs to make it quite clear what is intended.
> Michel – You led the definition of this section… care to comment?

Okay, here's the lengthy part of this mail:

Remember, an unmodified DataEPR can be used as either a "Source DataEPR" or a
"Sink DataEPR", but it SHOULD NOT be used as both in the same data transfer
request to the DTF (the results are undefined and DMI
implementation-specific).

Hence a DataEPR, for each supported protocol, provides information on how
the data described by this EPR should be accessed when used in a particular
context (i.e. either as source or as sink).
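As a rough illustration of the two points above, here is a minimal Python sketch. It assumes a DataEPR can be modelled as a mapping from protocol name to a pair of (sourceMode, sinkMode) values; the names `DataEPR`, `ProtocolAccess`, `check_request` and the string values "pull", "push", "pullAndPush" and "thirdParty" are hypothetical stand-ins, not the spec's actual XML serialisation. Since the spec only says SHOULD NOT, this sketch simply chooses to reject a request that uses one DataEPR in both roles.

```python
from dataclasses import dataclass, field

# Hypothetical mode values standing in for @dmi:sourceMode / @dmi:sinkMode.
MODES = {"pull", "push", "pullAndPush", "thirdParty"}


@dataclass(frozen=True)
class ProtocolAccess:
    """How one protocol reaches the data, depending on the role it plays."""
    source_mode: str  # used when the DataEPR acts as a Source DataEPR
    sink_mode: str    # used when the DataEPR acts as a Sink DataEPR

    def __post_init__(self):
        for mode in (self.source_mode, self.sink_mode):
            if mode not in MODES:
                raise ValueError(f"unknown mode: {mode!r}")


@dataclass
class DataEPR:
    """Hypothetical in-memory view of a DataEPR: protocol -> access modes."""
    address: str
    protocols: dict = field(default_factory=dict)

    def mode_for(self, protocol: str, role: str) -> str:
        """Return the mode for `protocol` when this EPR is used in `role`."""
        access = self.protocols[protocol]
        if role == "source":
            return access.source_mode
        if role == "sink":
            return access.sink_mode
        raise ValueError(f"role must be 'source' or 'sink', not {role!r}")


def check_request(source: DataEPR, sink: DataEPR) -> None:
    """Reject a request that uses one DataEPR as both source and sink.

    The spec only says SHOULD NOT (results undefined); rejecting is a
    choice made by this sketch.
    """
    if source is sink or source == sink:
        raise ValueError("the same DataEPR must not be both source and sink")
```

For example, an EPR built with `{"http": ProtocolAccess("pull", "push")}` reports "pull" when used as a source and "push" when used as a sink.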

The following boolean table illustrates which combinations are feasible and
can be used to initiate a data transfer. Note that this table should be
viewed in a fixed font (e.g. Courier).

       Source DataEPR's        ||         Sink DataEPR's         ||
     @dmi:sourceMode value     ||       @dmi:sinkMode value      ||
-------------------------------||--------------------------------||   Data
     |      |   pull   |  3rd  ||      |      |   pull   |  3rd  || Transfer
pull | push | and push | party || pull | push | and push | party || possible
-----|------|----------|-------||------|------|----------|-------||----------
  X  |      |          |       ||  X   |      |          |       ||    -
-----|------|----------|-------||------|------|----------|-------||----------
  X  |      |          |       ||      |   X  |          |       ||    X
-----|------|----------|-------||------|------|----------|-------||----------
  X  |      |          |       ||      |      |    X     |       ||    X
-----|------|----------|-------||------|------|----------|-------||----------
  X  |      |          |       ||      |      |          |   X   ||    -
-----|------|----------|-------||------|------|----------|-------||----------
     |  X   |          |       ||  X   |      |          |       ||    -
-----|------|----------|-------||------|------|----------|-------||----------
     |  X   |          |       ||      |   X  |          |       ||    -
-----|------|----------|-------||------|------|----------|-------||----------
     |  X   |          |       ||      |      |    X     |       ||    -
-----|------|----------|-------||------|------|----------|-------||----------
     |  X   |          |       ||      |      |          |   X   ||    -
-----|------|----------|-------||------|------|----------|-------||----------
     |      |    X     |       ||  X   |      |          |       ||    -
-----|------|----------|-------||------|------|----------|-------||----------
     |      |    X     |       ||      |   X  |          |       ||    X
-----|------|----------|-------||------|------|----------|-------||----------
     |      |    X     |       ||      |      |    X     |       ||    X
-----|------|----------|-------||------|------|----------|-------||----------
     |      |    X     |       ||      |      |          |   X   ||    -
-----|------|----------|-------||------|------|----------|-------||----------
     |      |          |   X   ||  X   |      |          |       ||    -
-----|------|----------|-------||------|------|----------|-------||----------
     |      |          |   X   ||      |   X  |          |       ||    -
-----|------|----------|-------||------|------|----------|-------||----------
     |      |          |   X   ||      |      |    X     |       ||    -
-----|------|----------|-------||------|------|----------|-------||----------
     |      |          |   X   ||      |      |          |   X   ||    X

The table should be read as follows.

"pull" means: The data can only be pulled from the (source | sink) using
              the given protocol.
"push" means: The data can only be pushed into the (source | sink) using
              the given protocol.
"pull and push" means: The data can either be pulled from or pushed into
                       the (source | sink) using the given protocol.
"3rd party" means: The given protocol supports true 3rd party data transfers
                   that do not involve an intermediary caching the data.

The table does not specify protocols, as these combinations can occur for
any protocol that is supported by the source and/or sink. It is also clear
that, except for 3rd party data transfers, the DTI always has to cache the
data in transit, i.e. the DTI pulls the data from the source and pushes it
to the sink.
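The feasibility table and the caching rule can be condensed into a small decision function. This is a minimal Python sketch, assuming the same hypothetical string values for @dmi:sourceMode and @dmi:sinkMode ("pull", "push", "pullAndPush", "thirdParty"); the function name `transfer_plan` and its return values are illustrative only. It returns None where the table has "-", "third-party" when both ends support a true 3rd party transfer, and "relay" when the DTI must pull from the source, cache the data, and push it to the sink.

```python
from typing import Optional

# Hypothetical spellings of the four mode values used in the table.
PULL, PUSH, PULL_AND_PUSH, THIRD_PARTY = "pull", "push", "pullAndPush", "thirdParty"
ALL_MODES = (PULL, PUSH, PULL_AND_PUSH, THIRD_PARTY)


def transfer_plan(source_mode: str, sink_mode: str) -> Optional[str]:
    """Map a (source @dmi:sourceMode, sink @dmi:sinkMode) pair to a plan.

    Returns None where the table marks the combination as impossible.
    """
    if source_mode not in ALL_MODES or sink_mode not in ALL_MODES:
        raise ValueError("unknown mode value")
    # Both ends speak a protocol that moves the data directly between them,
    # without an intermediary caching it.
    if source_mode == THIRD_PARTY and sink_mode == THIRD_PARTY:
        return "third-party"
    # Otherwise the DTI sits in the middle: it must be able to pull from the
    # source and push into the sink, caching the data in transit.
    can_pull_from_source = source_mode in (PULL, PULL_AND_PUSH)
    can_push_to_sink = sink_mode in (PUSH, PULL_AND_PUSH)
    if can_pull_from_source and can_push_to_sink:
        return "relay"
    return None


if __name__ == "__main__":
    # Print the 16 rows of the table: "X" = possible, "-" = not possible.
    for src in ALL_MODES:
        for dst in ALL_MODES:
            mark = "-" if transfer_plan(src, dst) is None else "X"
            print(f"{src:>12} -> {dst:<12} {mark}")
```

Keeping the 3rd party case as its own branch mirrors the table: a source that can be pulled from is still of no use to a 3rd party-only sink, because the DTI has no way to push into it.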

The table does *not* cover the use case where a particular implementation of
a data transfer protocol (or an extension to it) enables the DTI to:
a) instruct the source to push the data to the sink, or
b) instruct the sink to pull the data from the source.
Those use cases would require the definition of additional values for the
@dmi:sourceMode and @dmi:sinkMode attributes.

Those scenarios are possible, but I don't know of any data transfer protocol
standard that explicitly allows them.

This ominous "ParallelHTTP" I was working on is based on HTTP but explicitly
allows entities such as the DTI to instruct a ParallelHTTP-enabled source or
sink to perform the described actions. However, the work on this stalled,
and I don't know whether it is worth mentioning it in OGSA-DMI or even worth
pursuing this programming effort at all. Instead, I am inclined to add a
ByteIO identifier as a common last resort to the list of identified data
transfer standards. I know it's slow, but it's on the OGF track, and there
are several implementations available: UNICORE, GenesisII, and EPCC's
OGSA-DAI effort. The UNICORE implementation also provides Globus integration,
if I am not mistaken.

I hope that this makes it a lot clearer what I intend to express with those
values.

Cheers,
Michel