Re: [ogsa-dmi-wg] DMI Telcon Minutes

8 Nov 2007

      As we agreed on yesterday's call, we really need to have an answer to the
failure mode where the DTF chooses a protocol that does not work, for
whatever reason, between the source and sink.

I am a strong proponent of option #1 below.  Here is how I see this playing
out.

We add the following fault to the description of the DTI (my apologies for
the poor WSDL and arbitrary choice of pseudo-code but I think that they
make the point):

      The DTI MUST raise the following fault if the protocol being used for
      the transfer cannot be used to transfer data between the source and
      sink:

            FailedProtocol [reason: xsd:any; updatedSource, updatedSink: DEPR]

      The returned value "reason" MAY be used by the DTI to provide details
      about the protocol failure.  The value of "reason" is dependent upon
      the DTI implementation.

      The returned values "updatedSource" and "updatedSink" MUST refer to
      the same source/sink of data as that passed into the DTF that created
      the DTI that has just failed.  These MUST differ from the source/sink
      passed into that DTF only in that the value of the
      "dmi:SupportedProtocol" field has been updated to remove the failing
      protocol, and any other protocols that the DTI has determined will
      also fail.

      The client MAY use "reason" to determine its next action.  The
      following pseudo-code is the core code that a typical client of DMI is
      expected to use:

            source, sink: DEPR;
            retry: BOOLEAN := TRUE;
            << client sets source and sink in an implementation-dependent manner >>
            WHILE (retry) DO {
                  dti: DTI;
                  retry := FALSE;
                  dti := RequestDataTransferInstance (source, sink, ...) [
                        DMI: FailedProtocol (reason: ANY, newSource, newSink: DEPR) => {
                              retry := TRUE;
                              source := newSource;
                              sink := newSink;}];
            };
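
For anyone who prefers something runnable, the loop above might look roughly
like this in Python.  This is only a sketch: the DEPR class, the
FailedProtocol exception and the request_transfer callable are hypothetical
stand-ins of my own, not anything defined by the spec, and I am assuming that
the DTF raises some other error once no protocol is left to match.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet


@dataclass(frozen=True)
class DEPR:
    """Hypothetical stand-in for a Data EPR: an address plus its protocol list."""
    address: str
    supported_protocols: FrozenSet[str]


class FailedProtocol(Exception):
    """Hypothetical Python rendering of the proposed FailedProtocol fault."""

    def __init__(self, reason: Any, updated_source: DEPR, updated_sink: DEPR):
        super().__init__(reason)
        self.reason = reason
        self.updated_source = updated_source
        self.updated_sink = updated_sink


def transfer_with_retry(request_transfer: Callable[[DEPR, DEPR], Any],
                        source: DEPR, sink: DEPR) -> Any:
    """Re-invoke the DTF with the updated DEPRs until FailedProtocol stops."""
    while True:
        try:
            return request_transfer(source, sink)
        except FailedProtocol as fault:
            # The updated DEPRs have the failing protocol removed, so each
            # retry has fewer protocols to choose from.  Assumption: once none
            # is left, the DTF raises a different error and the loop ends.
            source, sink = fault.updated_source, fault.updated_sink
```

Any other exception simply propagates to the caller, which matches the intent
that this fault is just one more failure case for the client to handle.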

Here are the pros and cons of this approach as I see them:
      1. The client is insulated from the logic needed to properly invoke
         the DTF after this failure.  The client simply re-invokes the DTF.
      2. The client recovery code is just another failure case that the
         client needs to handle when using the DTI.
      3. The client recovery code is simple/trivial.
      4. The DTI must have specific code to properly raise this error.  The
         code to modify the source/sink DEPRs is fairly simple.  In any
         event, this becomes something that is written only once by the DTI
         implementation and effectively used many times by the clients of
         DMI who end up using this particular DTI implementation.
      5. We avoid introducing new architectural elements into a DMI
         implementation.
      6. The client is aware of the error and must recover from it.  In an
         ideal solution, we would hide this error situation from the client
         unless there was no protocol that worked.
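
To give a feel for how little the DTI has to do to build the updated DEPRs,
here is a rough Python sketch.  The DEPR class and its supported_protocols
field are hypothetical stand-ins for a real Data EPR and its
"dmi:SupportedProtocol" values; the function names are mine as well.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class DEPR:
    """Hypothetical stand-in for a Data EPR: an address plus its protocol list."""
    address: str
    supported_protocols: FrozenSet[str]


def without_protocols(depr: DEPR, failing: Iterable[str]) -> DEPR:
    """Return a copy of depr that differs only in its supported protocols."""
    remaining = depr.supported_protocols - frozenset(failing)
    return replace(depr, supported_protocols=remaining)


def updated_deprs(source: DEPR, sink: DEPR,
                  failing: Iterable[str]) -> Tuple[DEPR, DEPR]:
    """Build the (updatedSource, updatedSink) pair carried by the fault."""
    failing = frozenset(failing)
    return without_protocols(source, failing), without_protocols(sink, failing)
```

The originals are left untouched and everything except the protocol list is
copied as is, which is the "differ only in" requirement stated above.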

Allen Luniewski
IBM WebSphere Cross Brand Services
IBM Silicon Valley Laboratory
555 Bailey Ave.
San Jose, CA 95141

408-463-2255
408-930-1844 (mobile)

Mario Antonioletti <mario@epcc.ed.ac.uk>
Sent by: ogsa-dmi-wg-bounces@ogf.org
11/08/2007 02:12 AM

To:      OGSA-DMI <ogsa-dmi-wg@ogf.org>
cc:
Subject: [ogsa-dmi-wg] DMI Telcon Minutes

Note: No call next week and the next call is on the 20th at the usual
time (not the 21st!).

OGSA-DMI Telcon - 07/11/07
==========================

Attendees:

                Steve Newhouse, Microsoft
                Mario Antonioletti, EPCC
                Allen Luniewski, IBM
                Michel Drescher, Fujitsu

Agenda:

1.       IPR Notice
2.       Previous Minutes and Action Review
3.       Agenda Bashing
4.       Issues Arising
5.       Spec Progress
6.       AOB

Actions:

[Ravi] Resolve minor comments on the spec.
[Ravi] Come up with a proposal on how to represent data aggregates
         (e.g. multiple files) with Data EPRs to the mailing list.
[Ravi] Come up with a proposal for a multiple retries property to the
         mailing list.
[Steve/Michel] Come up with a proposal for Data EPRs that can deal with
         the three scenarios proposed below.

http://www.ogf.org/pipermail/ogsa-dmi-wg/2007-October/000293.html

+---

Discussion on the Data EPRs and scenarios was postponed until Ravi
becomes available or comments on the suggestions proposed by Michel:

http://www.ogf.org/pipermail/ogsa-dmi-wg/2007-October/000293.html

and Steven:

http://www.ogf.org/pipermail/ogsa-dmi-wg/2007-October/000294.html

Discussion thus proceeded on how to address transfer failures by the
DMI architecture.

DTF transport negotiation is done by protocol matching but, at run
time, other factors may come into play that prevent the transfer from
succeeding, e.g. firewalls. At the moment this means that the DTI will
report the failure to the client but the client has no current way to
communicate this information back to the DTF - the client can talk to
the DTF again to effect the transfer but this will probably lead to
the same protocols being chosen and thus reproduce the same
failure. Allen suggested three possible solutions.

http://www.ogf.org/pipermail/ogsa-dmi-wg/2007-November/000301.html

Neither 1 nor 2 is preferred; 3 requires active negotiation, which is
not in scope for this version of the spec. It is not clear how it could
be known that a transfer will not work unless a small test is performed.
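
The failure mode described above can be illustrated with a small Python
sketch. It assumes, purely for illustration, that protocol matching is a
set intersection with an alphabetical tie-break; the function name and the
protocol names are hypothetical and not taken from the spec.

```python
from typing import FrozenSet, Optional


def match_protocol(source_protocols: FrozenSet[str],
                   sink_protocols: FrozenSet[str]) -> Optional[str]:
    """Pick a protocol both ends advertise, or None if there is none.

    The alphabetical tie-break is an arbitrary assumption made only to keep
    the sketch deterministic.
    """
    common = source_protocols & sink_protocols
    return min(common) if common else None


source = frozenset({"ftp", "http"})
sink = frozenset({"ftp", "http", "scp"})

first_choice = match_protocol(source, sink)
# Suppose a firewall blocks first_choice at run time.  The matcher only sees
# the advertised lists, so asking again with unchanged lists picks it again.
second_choice = match_protocol(source, sink)

# Only removing the failed protocol from the lists changes the outcome.
third_choice = match_protocol(source - {first_choice}, sink - {first_choice})
```

Here first_choice and second_choice are identical, while third_choice
differs because the failed protocol is no longer advertised.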

The following solutions were discussed, all starting from the point
where the data transfer has begun with the DTI but then failed:

1. The DTI returns the failure to the client together with modified
    DEPRs - effectively new ones with the failed protocol removed
    from the list of supported protocols. The client can then use these
    DEPRs to retry the transfer through the DTF.

2. The DTI returns the failed protocol to the client, the client modifies
    the data EPRs (as in 1 by removing the failed transport protocol)
    and resubmits these to the DTF.

3. The client passes the DEPRs to the DTF together with any failure
    messages returned by the DTI, so that the DTF is informed of
    the protocols that do not work (this might mean aggregating
    multiple failure messages, which does not seem desirable).

4. The DTI is able to communicate an outcome of a transfer to the DTF
    which is then informed of what protocols do not work and it may be
    able to act on this.

5. There is a user agent between the DTF and DTI that maintains state
    and is able to apply some re-try policy on behalf of the client.
    This would maintain the clean interfaces and state models already
    in the spec.

Michel is not at all keen on 1 as the minting of the DEPRs should be
done by third parties. Allen not keen on 2 as this is making clients
do stuff that should not really be in their scope. Idea now is that
interested parties will flesh out the use case above (or some other)
that most appeals to them noting the pros and cons. It was felt that
we should not produce a first version of this spec that does not
address this problem.

Another factor addressed in the call was what happens when the DTF can
match more than one protocol - how does it choose which protocol to use?
The fastest? The cheapest? etc. It was thought that for version 1 of the
spec this would not be addressed - i.e. the client does not provide
hints or QoS parameters BUT the DTF should publish what algorithm it
will use to choose a transport protocol when there is more than one
valid choice.
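
As an illustration of what a published selection algorithm could look like,
here is a minimal Python sketch in which the DTF publishes a fixed preference
order and always picks the first matching entry. The ordering, the protocol
names and the function name are hypothetical; nothing here was agreed on the
call.

```python
from typing import Iterable, Optional, Sequence

# Hypothetical preference order that a DTF might publish to its clients.
PUBLISHED_PREFERENCE = ("gridftp", "https", "ftp")


def choose_protocol(common: Iterable[str],
                    preference: Sequence[str] = PUBLISHED_PREFERENCE
                    ) -> Optional[str]:
    """Return the most preferred protocol among the matched ones, if any."""
    candidates = set(common)
    for protocol in preference:
        if protocol in candidates:
            return protocol
    # None of the matched protocols appears in the published order.
    return None
```

A client that knows the published order can predict the choice without
supplying hints or QoS parameters of its own.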

There will be no DMI call next week, due in part to SC07 and other
commitments.

The next call is on the 20th of November at the same time (this is so as
to not clash with the night before Thanksgiving).

+-----------------------------------------------------------------------+
|Mario Antonioletti:EPCC,JCMB,The King's Buildings,Edinburgh EH9 3JZ.   |
|Tel:0131 650 5141|mario@epcc.ed.ac.uk|http://www.epcc.ed.ac.uk/~mario/ |
+-----------------------------------------------------------------------+