Re: [ogsa-dmi-wg] DMI Factory Concern

23 Feb 2007

      Folks,

here are my thoughts and proposals concerning the current DMI architecture
(actually, this is the result of a consultation with David Snelling).

What follows is a rough list of requirements:
 1) DMI needs to offer reliability to users
 2) DMI needs to enable data movements across Grid middlewares
 3) DMI needs to support more than one protocol
 4) However, DMI does NOT need to offer protocol translation
    (i.e. source talks GridFTP, sink talks HTTP)
 5) Instead, DMI does protocol negotiation between source and sink.
 6) Users need not know which protocol is employed to get the data
    movement job done.
 7) DMI need not technically understand striping, replicas and the like.
 8) DMI needs to understand the semantics of striping, replicas, etc.
    instead.
 9) Once a protocol between source and sink is negotiated, DMI acts
    as the controller instance to that protocol.
There are certainly more requirements, but that's all I can think of for
the moment.

Having said that, I support Allen's concern about how the factory solves
the problem we describe as "protocol negotiation", and his example nicely
illustrates that.

So, DMI needs to find a solution. But how?

One option would be to implement the interface at the source and sink as
proposed in the Data Architecture document. My take on that option is that
it would put too much burden on existing deployments to enforce this
solution. The result would be that DMI would not be implemented/accepted.

An alternative approach to that is as follows.

As per the current draft, the user of DMI obtains two EPRs out of band that
describe the data to be moved, from the source and sink, respectively. We
also know that the DMI Factory needs to sort out which data transport
protocol shall be employed between the source and sink locations. The
question is how.

Most protocols do not have a built-in mechanism that allows for their
automatic detection (that's how I understood Allen's concern). As a
consequence, DMI should define at least information elements and
identifiers that describe supported protocols.

For this proposal I assume DMI defines the following information elements
in DMI namespace:

[source protocol]
The [source protocol] information element identifies a data transport
protocol a node in a Data Grid supports when it acts as a source node in a
DMI governed data movement. Its type is defined as xsd:QName.
When conveyed in a SOAP message this information element is serialised into
XML as follows (Pseudo XML code):
<dmi:SourceProtocol>xsd:QName</dmi:SourceProtocol>

[sink protocol]
The [sink protocol] information element identifies a data transport
protocol a node in a Data Grid supports when it acts as a sink node in a
DMI governed data movement. Its type is defined as xsd:QName.
When conveyed in a SOAP message this information element is serialised into
XML as follows (Pseudo XML code):
<dmi:SinkProtocol>xsd:QName</dmi:SinkProtocol>

We need to define those two different information elements because some
data transport protocols are asymmetric. For example, a node may
support/implement HTTP, but only when acting as a sink (i.e. accepting only
HTTP GET requests). For now I defined their types as xsd:QName but they can
arguably be changed to any other suitable type (e.g. xsd:anyURI as in
OGSA-ByteIO).
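
To make the serialisation a bit more concrete, here is a minimal Python
sketch that builds either element. Note the assumptions: the DMI namespace
URI below is a placeholder (this proposal does not fix one), the helper name
protocol_element is purely illustrative, and the QName is written as plain
prefixed text, so whoever embeds the element must still declare the
"dmi-prot" prefix in scope.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder URI, the proposal only says "DMI namespace".
DMI_NS = "http://example.org/ogsa-dmi"
ET.register_namespace("dmi", DMI_NS)


def protocol_element(role, protocol_qname):
    """Build a dmi:SourceProtocol or dmi:SinkProtocol element.

    role is "source" or "sink"; protocol_qname is a prefixed QName such as
    "dmi-prot:gridftp-v20" and is stored as the element's text content.
    """
    local_name = {"source": "SourceProtocol", "sink": "SinkProtocol"}[role]
    element = ET.Element(f"{{{DMI_NS}}}{local_name}")
    element.text = protocol_qname
    return element


# Two distinct elements let a node advertise asymmetric support,
# e.g. HTTP/1.1 as a sink only.
print(ET.tostring(protocol_element("sink", "dmi-prot:http-v11"),
                  encoding="unicode"))
```

Keeping the role in the element name (rather than in an attribute) mirrors
the two information elements defined above.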

To identify different data transport protocols, I define identifiers for
now as follows:
xmlns:dmi-prot="http://www.ogf.org/ogsa-dmi/2007/02/transport-protocols"
1) GridFTP v2.0 shall be identified by the QName "dmi-prot:gridftp-v20"
2) HTTP/1.1 shall be identified by the QName "dmi-prot:http-v11"
3) FTP shall be identified by the QName "dmi-prot:ftp"
4) Passive FTP, when supported as sink only, shall be identified by the
   QName "dmi-prot:ftp-passive"
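
As a rough illustration, the identifiers above could be kept in a small
lookup table and resolved like this in Python. The function names
(split_qname, expand, is_known) and the prefix table are my own
assumptions for the sketch; only the namespace URI and the four local
names come from the list above.

```python
# Namespace URI and local names as listed above.
DMI_PROT_NS = "http://www.ogf.org/ogsa-dmi/2007/02/transport-protocols"

KNOWN_PROTOCOLS = {
    "gridftp-v20": "GridFTP v2.0",
    "http-v11": "HTTP/1.1",
    "ftp": "FTP",
    "ftp-passive": "Passive FTP (sink only)",
}

# Assumption: the prefix bindings that are in scope where the QName appears.
PREFIXES = {"dmi-prot": DMI_PROT_NS}


def split_qname(prefixed, prefixes=PREFIXES):
    """Split 'prefix:local' into (namespace URI, local name)."""
    prefix, sep, local = prefixed.strip().partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot resolve QName {prefixed!r}")
    return prefixes[prefix], local


def expand(prefixed):
    """Return the QName in '{uri}local' form, independent of the prefix."""
    uri, local = split_qname(prefixed)
    return f"{{{uri}}}{local}"


def is_known(prefixed):
    """True if the QName is one of the four identifiers defined above."""
    uri, local = split_qname(prefixed)
    return uri == DMI_PROT_NS and local in KNOWN_PROTOCOLS


print(expand("dmi-prot:ftp"))
print(is_known("dmi-prot:gridftp-v20"))  # True
```

Comparing the expanded form rather than the prefixed text avoids false
mismatches when two parties bind different prefixes to the same namespace.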

However, we still have not solved the problem of how source and sink should
make this kind of information available, and how a DMI Factory would query
this information.

From here on, I can think of at least two different options for how to
solve this, thanks to the definition of the information elements as
outlined above.

The basic assumption is that, in a Data Grid, sources, sinks and even DMI
(once implemented) may or may *not* be tightly coupled. So our approach
should support both scenarios (or even more if identified).

DMI needs that information from the source and sink, so DMI must
normatively define that source and sink MUST provide a list of [source
protocol]s and [sink protocol]s.
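
Once both lists are available, the negotiation itself can be very simple.
The following Python sketch shows one possible rule, not something the
draft prescribes: pick the first protocol in the source's list that the
sink also lists. Treating list order as preference and matching
identifiers by exact equality are both assumptions of mine; in particular,
whether "dmi-prot:ftp" at the source is compatible with
"dmi-prot:ftp-passive" at the sink is not defined here.

```python
def negotiate(source_protocols, sink_protocols):
    """Return the first [source protocol] that is also a [sink protocol].

    source_protocols: identifiers the source supports when acting as source.
    sink_protocols:   identifiers the sink supports when acting as sink.
    Returns None when there is no common protocol; since DMI does no
    protocol translation, the data movement cannot be set up in that case.
    """
    accepted = set(sink_protocols)
    for protocol in source_protocols:
        if protocol in accepted:
            return protocol
    return None


source = ["dmi-prot:gridftp-v20", "dmi-prot:ftp"]
sink = ["dmi-prot:http-v11", "dmi-prot:ftp"]
print(negotiate(source, sink))  # dmi-prot:ftp
```

The user never sees the chosen identifier; the Factory simply goes on to
act as the controller for whatever protocol comes out.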

Keeping in mind that the DMI functional specification must not make any
assumptions on the message format being used when acquiring that
information, we punt that into the rendering specifications that we will
write in the future:

a) OGSA WSRF Basic Profile 1.0
The WSRF binding specification defines that the source and sink nodes MUST
publish their [source protocol]s and [sink protocol]s as Resource
Properties. A source node MAY also publish [sink protocol]s when it is also
capable of serving as a sink node, and likewise for sink nodes.
For example, the DMI Factory could invoke the rp:GetResourceProperty
operation using the QName "dmi:SourceProtocol" when quizzing the source,
and using the QName "dmi:SinkProtocol" when quizzing the sink, respectively.
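
For illustration only, such a request body could be assembled as in the
Python sketch below. Assumptions to double-check: the DMI namespace URI is
a placeholder, the WS-ResourceProperties namespace URI is the one I
believe WSRF 1.2 uses and should be verified against the spec, and the
WS-Addressing headers that a real WSRF request carries are left out
entirely.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1
# Assumption: WS-ResourceProperties 1.2 namespace, verify against the spec.
WSRF_RP_NS = "http://docs.oasis-open.org/wsrf/rp-2"
# Assumption: placeholder URI, the proposal only says "DMI namespace".
DMI_NS = "http://example.org/ogsa-dmi"


def get_resource_property_request(property_local_name):
    """Return a SOAP envelope asking for one DMI resource property.

    The element content is a QName, so the "dmi" prefix must be declared
    in scope; here it is declared on the envelope.
    """
    return (
        f'<soap:Envelope xmlns:soap="{SOAP_NS}" '
        f'xmlns:rp="{WSRF_RP_NS}" xmlns:dmi="{DMI_NS}">'
        "<soap:Body>"
        f"<rp:GetResourceProperty>dmi:{property_local_name}"
        "</rp:GetResourceProperty>"
        "</soap:Body>"
        "</soap:Envelope>"
    )


# One request per node: SourceProtocol for the source, SinkProtocol for the sink.
request = get_resource_property_request("SourceProtocol")
ET.fromstring(request)  # raises ParseError if the envelope is not well-formed
print(request)
```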

b) WS-I Basic Profile 1.1
The WS-I binding specification defines that the source and sink nodes MUST
publish their [source protocol]s and [sink protocol]s as
b.1) respective lists of protocols in the wsa:Metadata section in the
wsa:EPR describing the data source or data sink, respectively.
For example, a non-normative wsa:EPR for the source may look as follows:

<wsa:EndpointReference>

    <wsa:Address>http://foo.example.org/bar/baz</wsa:Address>

    <wsa:ReferenceParameters>
        [things like striping, replica information, etc.]
    </wsa:ReferenceParameters>

    <wsa:Metadata>
        <dmi:SourceProtocol> dmi-prot:gridftp-v20 </dmi:SourceProtocol>
        <dmi:SourceProtocol> dmi-prot:ftp </dmi:SourceProtocol>
    </wsa:Metadata>

</wsa:EndpointReference>
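
On the Factory side, reading the protocol list out of such an EPR needs no
round trip to the node at all. A minimal Python sketch, assuming the
WS-Addressing 1.0 element names (wsa:Address, wsa:Metadata) and, again, a
placeholder DMI namespace URI; the function name protocols_from_epr is
illustrative only, and the ReferenceParameters are omitted for brevity:

```python
import xml.etree.ElementTree as ET

NS = {
    "wsa": "http://www.w3.org/2005/08/addressing",  # WS-Addressing 1.0
    # Assumption: placeholder URI, the proposal only says "DMI namespace".
    "dmi": "http://example.org/ogsa-dmi",
}

EPR = f"""<wsa:EndpointReference xmlns:wsa="{NS['wsa']}" xmlns:dmi="{NS['dmi']}">
  <wsa:Address>http://foo.example.org/bar/baz</wsa:Address>
  <wsa:Metadata>
    <dmi:SourceProtocol> dmi-prot:gridftp-v20 </dmi:SourceProtocol>
    <dmi:SourceProtocol> dmi-prot:ftp </dmi:SourceProtocol>
  </wsa:Metadata>
</wsa:EndpointReference>"""


def protocols_from_epr(epr_xml, role):
    """List the protocol identifiers an EPR advertises for a role.

    role is "source" or "sink". Surrounding whitespace in the element
    content is stripped; document order is preserved.
    """
    local_name = {"source": "SourceProtocol", "sink": "SinkProtocol"}[role]
    root = ET.fromstring(epr_xml)
    found = root.findall(f"wsa:Metadata/dmi:{local_name}", NS)
    return [e.text.strip() for e in found if e.text and e.text.strip()]


print(protocols_from_epr(EPR, "source"))  # ['dmi-prot:gridftp-v20', 'dmi-prot:ftp']
print(protocols_from_epr(EPR, "sink"))    # []
```

An empty list for the "sink" role simply means this node did not advertise
itself as a sink.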

c) WS-ResourceTransfer (??)
No clue how this could/should be rendered here - WS-RT is evil. ;-)

I think that this leaves enough freedom to the implementors of DMI to
choose which rendering they want to use while specifying enough information
elements and formats so that DMI can accomplish its designated work
effectively.

Of course, all this information must be reflected in the specification,
which of course it isn't at present... :-/

Cheers,
Michel

-- 
Michel <dot> Drescher <at> uk <dot> fujitsu <dot> com
Fujitsu Laboratories of Europe
+44 20 8606 4834

Michel Drescher