
Folks,

here are my thoughts and proposals concerning the current DMI architecture (actually, this is the result of a consultation with David Snelling).

What follows is a requirements list of sorts:

1) DMI needs to offer reliability to users.
2) DMI needs to enable data movements across Grid middlewares.
3) DMI needs to support more than one protocol.
4) However, DMI does NOT need to offer protocol translation (i.e. source talks GridFTP, sink talks HTTP).
5) Instead, DMI does protocol negotiation between source and sink.
6) Users need not know which protocol is employed to get the data movement job done.
7) DMI need not technically understand striping, replicas and the like.
8) Instead, DMI needs to understand the semantics of striping, replicas, etc.
9) Once a protocol between source and sink is negotiated, DMI acts as the controller instance for that protocol.

There are certainly more requirements, but that's all I can think of for the moment.

Having said that, I support Allen's concern about how the factory solves the problem we describe as "protocol negotiation", and his example nicely illustrates it. So, DMI needs to find a solution. But how?

One option would be to implement the interface at the source and sink as proposed in the Data Architecture document. My take on that option is that enforcing it would put too much burden on existing deployments. The result would be that DMI would not be implemented or accepted.

An alternative approach is as follows. As per the current draft, the user of DMI obtains two EPRs out of band that describe the data to be moved, one from the source and one from the sink. We also know that the DMI Factory needs to sort out which data transport protocol shall be employed between the source and sink locations. The question is how. Most protocols do not have a built-in mechanism that allows for their automatic detection (that's how I understood Allen's concern).
As a consequence, DMI should define at least information elements and identifiers that describe supported protocols. For this proposal I assume DMI defines the following information elements in the DMI namespace:

[source protocol]
The [source protocol] information element identifies a data transport protocol that a node in a Data Grid supports when it acts as a source node in a DMI-governed data movement. Its type is defined as xsd:QName. When conveyed in a SOAP message, this information element is serialised into XML as follows (pseudo XML code):

    <dmi:SourceProtocol>xsd:QName</dmi:SourceProtocol>

[sink protocol]
The [sink protocol] information element identifies a data transport protocol that a node in a Data Grid supports when it acts as a sink node in a DMI-governed data movement. Its type is defined as xsd:QName. When conveyed in a SOAP message, this information element is serialised into XML as follows (pseudo XML code):

    <dmi:SinkProtocol>xsd:QName</dmi:SinkProtocol>

We need to define two different information elements because some data transport protocols are asymmetric. For example, a node may support/implement HTTP, but only when acting as a sink (i.e. accepting only HTTP GET requests). For now I have defined their types as xsd:QName, but they can arguably be changed to any other suitable type (e.g. xsd:anyURI as in OGSA-ByteIO).

To identify different data transport protocols, I define identifiers for now as follows:

    xmlns:dmi-prot="http://www.ogf.org/ogsa-dmi/2007/02/transport-protocols"

1) GridFTP v2.0 shall be identified by the QName "dmi-prot:gridftp-v20"
2) HTTP/1.1 shall be identified by the QName "dmi-prot:http-v11"
3) FTP shall be identified by the QName "dmi-prot:ftp"
4) Passive FTP, when supported as sink only, shall be identified by the QName "dmi-prot:ftp-passive"

However, we still have not solved the problem of how source and sink should make this kind of information available, and how a DMI Factory would query it.
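Leaving aside for a moment how the two lists are obtained, the negotiation step itself could be sketched roughly as below. This is only an illustration in Python, not part of any DMI draft: the function names (qname, negotiate) are made up, and it assumes that two protocols match only when their QNames are identical and that the source lists its protocols in order of preference.

```python
# Hypothetical sketch: none of these names come from a DMI specification.
DMI_PROT_NS = "http://www.ogf.org/ogsa-dmi/2007/02/transport-protocols"


def qname(local_name):
    """Expand a dmi-prot local name into '{namespace}local' notation."""
    return "{%s}%s" % (DMI_PROT_NS, local_name)


def negotiate(source_protocols, sink_protocols):
    """Pick the first [source protocol] that also appears as a [sink protocol].

    Assumptions (not from the draft): the source's list is ordered by
    preference, and matching is plain QName equality. No protocol
    translation is attempted; None is returned when nothing matches.
    """
    accepted = set(sink_protocols)
    for protocol in source_protocols:
        if protocol in accepted:
            return protocol
    return None


# Source offers GridFTP v2.0 and FTP; sink accepts HTTP/1.1 and FTP.
source = [qname("gridftp-v20"), qname("ftp")]
sink = [qname("http-v11"), qname("ftp")]
chosen = negotiate(source, sink)  # the FTP QName is the only common entry
```

If no common protocol exists, the sketch simply returns None, which matches requirement 4: DMI reports that the movement cannot be set up rather than translating between protocols.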
From here on, I can think of at least two different options for how to solve this publish-and-query problem, thanks to the definition of the information elements outlined above. The basic assumption is that, in a Data Grid, sources, sinks and even DMI (once implemented) may or may *not* be tightly coupled. So our approach should support both scenarios (or even more, if identified).

DMI needs that information from the source and sink, so DMI must normatively define that source and sink MUST provide a list of [source protocol]s and [sink protocol]s. Keeping in mind that the DMI functional specification must not make any assumptions about the message format used when acquiring that information, we punt that to the rendering specifications that we will write in the future:

a) OGSA WSRF Basic Profile 1.0
The WSRF binding specification defines that the source and sink nodes MUST publish their [source protocol]s and [sink protocol]s as Resource Properties. A source node MAY also publish [sink protocol]s when it is also capable of serving as a sink node, and vice versa for sink nodes. For example, the DMI Factory could invoke the rp:GetResourceProperty operation using the QName "dmi:SourceProtocol" when querying the source, and using the QName "dmi:SinkProtocol" when querying the sink.

b) WS-I Basic Profile 1.1
The WS-I binding specification defines that the source and sink nodes MUST publish their [source protocol]s and [sink protocol]s as
b.1) respective lists of protocols in the wsa:Metadata section of the wsa:EPR describing the data source or data sink, respectively.
For example, a non-normative wsa:EPR for the source may look as follows:

    <wsa:EndpointReference>
      <wsa:Address>http://foo.example.org/bar/baz</wsa:Address>
      <wsa:ReferenceParameters>
        [things like striping, replica information, etc.]
      </wsa:ReferenceParameters>
      <wsa:Metadata>
        <dmi:SourceProtocol>dmi-prot:gridftp-v20</dmi:SourceProtocol>
        <dmi:SourceProtocol>dmi-prot:ftp</dmi:SourceProtocol>
      </wsa:Metadata>
    </wsa:EndpointReference>

c) WS-ResourceTransfer (??)
No clue how this could/should be rendered here - WS-RT is evil. ;-)

I think that this leaves enough freedom to the implementors of DMI to choose which rendering they want to use, while specifying enough information elements and formats so that DMI can accomplish its designated work effectively.

Of course, all this information must be reflected in the specification, which of course it isn't at present... :-/

Cheers,
Michel

--
Michel <dot> Drescher <at> uk <dot> fujitsu <dot> com
Fujitsu Laboratories of Europe
+44 20 8606 4834