Hi again,
consider the following use case for remote I/O: given a
large binary 2D field on a remote host, the client wants to
access a 2D sub-portion of that field. Depending on the
remote file layout, that usually requires more than one read
operation, since the standard read (offset, length) is
agnostic to the 2D layout.
For more complex operations (subsampling, getting a piece of
a JPEG file), the number of remote operations grows very
fast. Latency then strongly discourages that type of remote
I/O.
For that reason, I think that the remote file I/O as
specified by SAGA's strawman will, as is, only be usable
for a limited and trivial set of remote I/O use cases.
There are three (basic) approaches:
A) get the whole thing, and do ops locally
Pro: - one remote op,
- simple logic
- remote side doesn't need to know about file
structure
- easily implementable on application level
Con: - getting the header info of a 1GB data file comes
with, well, some overhead ;-)
B) clustering of calls: do many reads, but send them as a
single request.
Pro: - transparent to application
- efficient
Con: - need to know about dependencies of reads
       (a header read needed to determine the size
       of the field), or include explicit 'flushes'
- need a protocol to support that
- the remote side needs to support that
C) data specific remote ops: send a high level command,
and get exactly what you want.
Pro: - most efficient
Con: - need a protocol to support that
- the remote side needs to support that _specific_
command
The last approach (C) is the one I have the best experience with.
Also, that is what GridFTP as a common file access protocol
supports via ERET/ESTO operations.
I want to propose to include an extension of type C to the
File API of the strawman, which basically maps well to
GridFTP, but should also map to other implementations of
approach C.
That extension would look like:
  void lsEModes (out array<string> emodes );

  void eWrite   (in  string emode,
                 in  string spec,
                 in  string buffer,
                 out long   len_out );

  void eRead    (in  string emode,
                 in  string spec,
                 out string buffer,
                 out long   len_out );

  - hooks for GridFTP-like opaque ERET/ESTO features
  - spec:  string for the pattern as in GridFTP's ESTO/ERET
  - emode: string for the identifier as in GridFTP's ESTO/ERET
Andre Merzky wrote:
I want to propose to include a C-like extension to the File API of the strawman, which basically maps well to GridFTP, but should also map to other implementations of C.
Agreed here.
That extension would look like:
  void lsEModes (out array<string> emodes );

  void eWrite   (in  string emode,
                 in  string spec,
                 in  string buffer,
                 out long   len_out );

  void eRead    (in  string emode,
                 in  string spec,
                 out string buffer,
                 out long   len_out );

  - hooks for GridFTP-like opaque ERET/ESTO features
  - spec:  string for the pattern as in GridFTP's ESTO/ERET
  - emode: string for the identifier as in GridFTP's ESTO/ERET
  EMode:        a specific remote I/O command supported
  lsEModes:     list the EModes available in this implementation
  eRead/eWrite: read/write data according to the emode spec
Example (in perl for brevity):
  my $file   = SAGA::File->new ("http://www.google.com/intl/en/images/logo.gif");
  my @emodes = $file->lsEModes ();

  if ( grep (/^jpeg_block$/, @emodes) ) {
    my ($buff, $len) = $file->eRead ("jpeg_block", "22x4+7+8");
  }
I would discourage support for B, since I do not know of any protocol supporting that approach efficiently, and it also needs approximately the same infrastructure setup as C.
As A is easily implementable on application level, or within any SAGA implementation, there is no need for support on API level -- however, A is insufficient for all but some trivial cases.
This approach is very generic on the API level (that's good), but requires exact agreement on the command syntax used by the client and the server, which may get problematic. If we go this route we will definitely end up specifying at least a minimal command subset to be supported by the eRead/eWrite commands.

I simply fear we'll have the same problems we have with the GAT today. The GAT API is in principle usable in a broad range of use cases based on a generic API. The genericity is ensured by using key/value tables in the API itself, allowing quick adaptation to any concrete need. The problem is the missing specification of these key/value pairs, which makes it difficult to achieve reusability.

Regards Hartmut
Hallo Hartmut, Quoting [Hartmut Kaiser] (Jun 13 2005):
This approach is very generic on the API level (that's good) but requires exact agreement on the used command syntax for the client and the server, which may get problematic. If we go this route we will definitely end up specifying at least a minimal command subset to be supported by the eRead/eWrite commands.
You are right: complexity does not go away magically, but gets moved to the specification of the eModes.

As for a minimal set: I do not think that this is necessary - the eMode is SUPPOSED to be application specific. OTOH, an intuitive example usable for some cases may be helpful. GridFTP's standard ERET example is partial file access (IIRC: filename, offset, length). That is not very useful for SAGA, since that is already covered by the normal read/write operations.
I simply fear we'll have the same problems we have with the GAT today. The GAT API is in principle usable in a broad range of use cases based on a generic API. The genericity is ensured by using key/value tables in the API itself, allowing quick adaptation to any concrete need. The problem is the missing specification of these key/value pairs which makes it difficult to achieve reusability.
I absolutely agree that the problem lies right there: semantic overloading of strings. The situation is somewhat better than in GAT though:

- the preferences in GAT are really generic, and can be used for anything; the eModes have a very limited scope, and are hence much easier to agree on between different implementations

- the mapping to GridFTP is 1:1, and GridFTP is quite commonly used, so there is at least some other instance to be used for agreement on the modes; hence, every implementation of an eMode can be expected to do the same thing - at least there is a good chance for that

However, again: you are right. Semantic overloading of strings is not a nice thing to do, and is here only justified by a lack of obvious alternatives.

Thanks, Andre.
--
+-----------------------------------------------------------------+
| Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science         | mail: merzky@cs.vu.nl       |
| De Boelelaan 1083a                | www:  http://www.merzky.net |
| 1081 HV Amsterdam, Netherlands    |                             |
+-----------------------------------------------------------------+
Hi Andre,

Coincidentally, I'm looking at a very similar thing right now. I'm trying to extend an archive which I've been building here at CCT. In the archive currently, we have netCDF files, for the coastal modelers, which support this kind of subsetting. We also plan to roll out the archive to the physicists, who will want to put their huge HDF5 files in the archive, and then do hyperslabbing on these (essentially some kind of subset, but with a cool name).

I had imagined passing some specification to the archive, represented by attribute/value pairs, along with the LogicalFileName. The service on the other end prepares the data for me, places it in a temporary store, and returns me the URLs to the prepared file. I would then access the file in the normal way.

When your original dataset is 1TB, you have problems. You can't simply prepare the data in the time that it takes to do a call and reply. You need to go asynchronous. With the solution I've gone for, I can simply say "this isn't ready yet, but I'm working on it" rather than returning the URLs. The user can check back later (polling), or I can tell them when it's ready (notification). Then they access the data.

How do you make your proposed eRead operation "go asynchronous" if things would take a long time? Or would the first read just hang until the data was prepared?

Jon.
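The prepare-then-fetch pattern with polling that Jon describes can be sketched in a few lines. This is a hedged illustration only: the class, state names, and URL scheme are invented, not part of any real archive or SAGA API; a background thread stands in for the remote subsetting job.

```python
import threading
import time

class ArchiveStub:
    """Stand-in for the remote archive service (invented API)."""
    def __init__(self):
        self._state = "PREPARING"
        self._url = None

    def prepare(self, logical_name, spec):
        # Pretend the subsetting takes a while; run it in the background.
        def work():
            time.sleep(0.05)                  # hours, for a 1 TB dataset
            self._url = "tmp://store/" + logical_name
            self._state = "READY"
        threading.Thread(target=work).start()

    def poll(self):
        # Either "this isn't ready yet, but I'm working on it",
        # or the URL of the prepared file.
        return (self._state, self._url)

archive = ArchiveStub()
archive.prepare("coastal.nc", {"var": "elevation", "bbox": "22x4+7+8"})

state, url = archive.poll()
while state != "READY":                       # client checks back later
    time.sleep(0.01)
    state, url = archive.poll()
print(url)                                    # then access the data normally
```

Notification instead of polling would replace the loop with a callback, but the service-side state machine stays the same.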
Quoting [Jon MacLaren] (Jun 13 2005):
Hi Andre,
Coincidentally, I'm looking at a very similar thing right now. I'm trying to extend an archive which I've been building here at CCT. In the archive currently, we have netCDF files, for the coastal modelers, which support this kind of subsetting. We also plan to roll out the archive to the physicists, who will want to put their huge HDF5 files in the archive, and then do hyperslabbing on these (essentially some kind of subset, but with a cool name).
Ha, you should talk to Andrei about this! We did HDF5 hyperslabbing via GridFTP ERET once - Andrei implemented that. That is exactly what I was aiming at :-)
I had imagined passing some specification to the archive, represented by attribute/value pairs, along with the LogicalFileName. The service on the other end prepares the data for me, places it in a temporary store, and returns me the URLs to the prepared file. I would then access the file in the normal way.
When your original dataset is 1TB, you have problems. You can't simply prepare the data in the time that it takes to do a call and reply. You need to go asynchronous. With the solution I've gone for, I can simply say "this isn't ready yet, but I'm working on it" rather than returning the URLs. The user can check back later (polling), or I can tell them when it's ready (notification). Then they access the data.
How do you make your proposed eRead operation "go asynchronous" if things would take a long time? Or would the first read just hang until the data was prepared?
Asynchronicity (is that English?) would be provided via the task interface, as before (pseudocode):

  sync:
    File file (url);
    file.read (len, buff, &ret_len);

  async:
    File file (url);
    FileTaskFactory ftf  = file.createTaskFactory ();
    Task            task = ftf.read (len, buff);

    task.run ();

    // do some other stuff here

    task.wait (&ret_len);

There are more methods on the Task interface, for non-blocking checks etc. The task model holds for basically all SAGA objects, so it would also cover the eRead and eWrite calls.

Cheers, Andre.
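The task pattern above can be mimicked in a few lines of Python. This is a hedged sketch with assumed names (Task, run, wait, the state strings), not the actual SAGA task interface; a thread plays the role of the asynchronous operation.

```python
import threading

class Task:
    """Toy version of the strawman's task concept (invented names)."""
    def __init__(self, fn, *args):
        self.state = "Pending"        # Pending / Running / Finished
        self.result = None
        self._fn, self._args = fn, args
        self._thread = None

    def run(self):
        def body():
            self.result = self._fn(*self._args)
            self.state = "Finished"
        self.state = "Running"
        self._thread = threading.Thread(target=body)
        self._thread.start()

    def wait(self):
        self._thread.join()
        return self.result

def read(length):                     # stand-in for file.read
    return b"x" * length

task = Task(read, 4)
task.run()
# ... do some other stuff here ...
data = task.wait()
print(task.state, len(data))
```

A non-blocking check would simply inspect `task.state` instead of calling `wait()`.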
On Jun 13, 2005, at 10:00 AM, Andre Merzky wrote:
<snip> Asynchronicity (is that English?) would be provided via the task interface, as before (pseudocode):
sync: File file (url); file.read (len, buff, &ret_len);
async: File file (url); FileTaskFactory ftf = file.createTaskFactory (); Task task = ftf.read (len, buff);
task.run ();
// do some other stuff here
task.wait (&ret_len);
There are more methods on the Task interface, for non-blocking checks etc. The task model holds for basically all SAGA objects, so it would also cover the eRead and eWrite calls.
But this could take a *long* time, e.g. hours (you have to sort through 1TB of data, which is on a disk). How would a client be able to tell what was going on? Can I distinguish between:

a) the remote service is preparing the data for me
b) the network connection to the service has suddenly slowed down or broken, and the data can't get through?

I think if your API looks like:

1. PrepareData
2. GetData

then people are more likely to expect that the data preparation is going to take a while. I'm not sure that just allowing the first read to take an hour is going to encourage people to build clients that can cope well with this. I'd hit <CTRL-C> if I didn't have a better idea of what was going on.

Jon.
Hi Jon, Quoting [Jon MacLaren] (Jun 13 2005):
From:    Jon MacLaren
To:      Andre Merzky
Cc:      Hartmut Kaiser, 'Simple API for Grid Applications WG'
Subject: Re: [saga-rg] proposal for extended file IO
Date:    Mon, 13 Jun 2005 10:11:57 -0500

On Jun 13, 2005, at 10:00 AM, Andre Merzky wrote:
Asynchronicity (is that English?) would be provided via the task interface, as before (pseudocode): sync: File file (url); file.read (len, buff, &ret_len);
async: File file (url); FileTaskFactory ftf = file.createTaskFactory (); Task task = ftf.read (len, buff);
task.run (); // do some other stuff here task.wait (&ret_len);
There are more methods on the Task interface, for non-blocking checks etc. The task model holds for basically all SAGA objects, so it would also cover the eRead and eWrite calls.
But this could take a *long* time, e.g. hours (you have to sort through 1TB of data, which is on a disk). How would a client be able to tell what was going on?
Yes, that can take a long time. However, the tasks have a state attached; they are either:

  Pending
  Running
  Finished
  Cancelled

That state can be queried, so you know at least if the task is still alive. I could imagine specific tasks giving more detailed state or progress information, but that's not specified in the strawman currently. For example, we have been discussing progress of file transfer: it would be nice if the task told you how much of the file is transferred, or even with what throughput. But that falls more into the domain of monitoring, which was left out of the strawman intentionally, for now.

Is that what you would expect in terms of feedback? If not, can you give an example?
Can I distinguish between:
a) The remote service is preparing the data for me
b) The network connection to the service has suddenly slowed down or broken, and the data can't get through.
I think if your API looks like:
1. PrepareData
2. GetData
I am not sure if that would make much difference: if PrepareData takes some hours, you are back to the original problem, aren't you? Or do I misunderstand something? Also, if your prepared data is large, or the network is slow, the read can still take a long time - same situation again...

Also, you would semantically tie two calls together. For example:

  file.prepare ("hyperslab", "([2,3,4][5,6,7])");
  file.read    (20, buffer, &out);

What does 20 mean? It is specific to the hyperslab; the user has to put the data together into a convenient structure. Alternative:

  file.prepare ("hyperslab", "([2,3,4][5,6,7])");
  file.read    ("hyperslab", "([1,3,4][5,6,7])");
  file.read    ("hyperslab", "([2,3,4][5,6,7])");

  // I know the hs spec is wrong, but YOU know what I mean,
  // right ;-)

Hmm, again, maybe I totally misunderstand you...
then people are more likely to expect that the data preparation is going to take a while.
I'm not sure that just allowing the first read to take an hour is going to encourage people to build clients that can cope well with this. I'd hit <CTRL-C> if I didn't have a better idea of what was going on.
If the first preparation takes an hour...?

Then again, middleware like DataCutter can benefit from preprocessed data (do the indexing beforehand, or create the octree structure beforehand) - that could be done by creating a task beforehand which prepares the data, and then doing the read afterwards. Would that do what you need?

  // warning: Pseudo Pseudo Code...

  Job job ("host_A",
           "/bin/subsample /data/huge_file_A /tmp/small_file_B");

  // wait for job completion

  // read prepared data
  File file ("gridftp://host_A//tmp/small_file_B");
  file.read (100, buffer, &out);

Cheers, Andre.
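The job-then-read decoupling in the pseudocode above can be rendered as a runnable local sketch. Everything here is a stand-in for illustration: the file names are invented, and taking every 16th byte plays the role of the remote subsample job.

```python
import pathlib
import tempfile

tmp = pathlib.Path(tempfile.mkdtemp())

# The large input file (a stand-in for the huge remote dataset).
big = tmp / "huge_file_A"
big.write_bytes(bytes(range(256)) * 64)

# "Job": prepare a small derived file (here, every 16th byte).
small = tmp / "small_file_B"
small.write_bytes(big.read_bytes()[::16])

# ... wait for job completion, then read the prepared data
# with a plain read, as in the pseudocode:
buffer = small.read_bytes()[:100]
print(len(buffer))
```

The point of the pattern is that the expensive preparation is an explicit, separately monitorable step, and the subsequent read is cheap and ordinary.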
But this could take a *long* time, e.g. hours (you have to sort through 1TB of data, which is on a disk). How would a client be able to tell what was going on?
Yes, that can take a long time. However, the tasks have a state attached, they are either:
Pending Running Finished Cancelled
That state can be queried, so you know at least if the task is still alive. I could imagine specific tasks giving more detailed state or progress information, but that's not specified in the strawman currently. For example, we have been discussing progress of file transfer: it would be nice if the task told you how much of the file is transferred, or even with what throughput. But that falls more into the domain of monitoring, which was left out of the strawman intentionally, for now.
Is that what you would expect in terms of feedback? If not, can you give an example?
It's not a question about functionality; more a comment about language design and semantics. You are potentially hiding a large amount of processing behind a file read. I don't find that intuitive. Should I put code around all eReads to allow for this?

With the explicit prepare, I might send a message to a service to do the prepare, then start/queue a batch job once the processing was complete. If I am sitting on a file read for an hour on a supercomputer, it's expensive. That's why I think the decoupling is better.

But I suppose that I could implement the decoupled prepare/read outside of the SAGA API, which is maybe where it belongs. And the API you have is certainly fine for smaller files. Perhaps that is what you are suggesting at the end of your reply....
<snip> If the first preparation takes an hour...?
The again, middleware like data cutter can benefit from preprocessed data (do indexing before, or create octree structure before) - that could be done by creating a task beforehand, which prepares the data, and then do the read afterwards. Would that do what you need?
// warning: Pseudo Pseudo Code... Job job ("host_A", "/bin/subsample /data/huge_file_A /tmp/small_file_B");
// wait for job completion // read prepared data File file ("gridftp://host_A//tmp/small_file_B"); file.read (100, buffer, &out);
I guess we are agreeing... Jon.
Quoting [Jon MacLaren] (Jun 13 2005):
It's not a question about functionality. More a comment about language design, and semantics. You are potentially hiding a large amount of processing behind a file read.
Ok, I see - that is true. And it's intentional. If you do a job submit, you are hiding a lot of stuff as well - even more: the information service gets queried for resources, a broker makes intelligent (ahem) decisions, files get staged, the job gets queued, runs, gets migrated, dies, files get staged back. All you see on the API level is an (admittedly complex) submit, and a simple job status.
I don't find that intuitive. Should I put code around all eReads to allow for this?
With the explicit prepare, I might send a message to a service to so the prepare, then start/queue a batch job once the processing was complete. If I am sitting on a file read for an hour on a supercomputer, it's expensive. That's why I think the decoupling is better.
But I suppose that I could implement the decoupled prepare/read outside of the SAGA API, which is maybe where it belongs.
Dunno really ;-) Well, the design constraints of SAGA are:

- simple, simple, simple: make simple things easy, make difficult things possible, leave out the rest
- only put into SAGA what comes up in GOOD use cases
And the API you have is certainly fine for smaller files.
Some of our use cases include large data access, for example the remote viz ones. So, handling small files only is not good enough :-(

Cheers, Andre.
On Jun 13, 2005, at 7:49 AM, Jon MacLaren wrote:
Hi Andre, Coincidentally, I'm looking at a very similar thing right now. I'm trying to extend an archive which I've been building here at CCT. In the archive currently, we have netCDF files, for the coastal modelers, which support this kind of subsetting. We also plan to roll out the archive to the physicists, who will want to put their huge HDF5 files in the archive, and then do hyperslabbing on these (essentially some kind of subset, but with a cool name).
This is very similar to the SRB-HDF5 archive system that they are developing at SDSC:

  http://hdf.ncsa.uiuc.edu/RFC/hdf5srb/Integrating_HDF5_with_SRB_ag_talk.ppt

It's very interesting that so many groups are converging on this sort of file archiving strategy.
I had imagined passing some specification to the archive, represented by attribute/value pairs, along with the LogicalFileName. The service on the end to prepare the data for me, and places it in a temporary store, and return me the URLs to the prepared file. I would then access the file in the normal way.
When your original dataset is 1TB, you have problems. You can't simply prepare the data in the time that it takes to do a call and reply. You need to go asynchronous. With the solution I've gone for, I can simply say "this isn't ready yet, but I'm working on it" rather than returning the URLs. The user can check back later (polling), or I can tell them when it's ready (notification). Then they access the data.
How do you make your proposed eRead operation "go asynchronous" if things would take a long time? Or would the first read just hang until the data was prepared?
Jon.
On Jun 13, 2005, at 5:38 AM, Andre Merzky wrote:
Hallo Hartmut,
Quoting [Hartmut Kaiser] (Jun 13 2005):
Agreed here.
That extension would look like:
    void lsEModes (out array emodes);

    void eWrite   (in  string emode,
                   in  string spec,
                   in  string buffer,
                   out long   len_out);

    void eRead    (in  string emode,
                   in  string spec,
                   out string buffer,
                   out long   len_out);

- hooks for GridFTP-like opaque ERET/ESTO features
- spec:  string for pattern as in GridFTP's ESTO/ERET
- emode: string for identifier as in GridFTP's ESTO/ERET

EMode:        a specific remote I/O command supported
lsEModes:     list the EModes available in this implementation
eRead/eWrite: read/write data according to the emode spec
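To make the intended semantics concrete, here is a minimal sketch of the proposed extension rendered in Python. This is purely illustrative: the strawman defines no Python binding, and all class and function names here are hypothetical. The toy "partial" emode mirrors GridFTP's standard partial-file-access ERET example (offset, length).

```python
# Hypothetical sketch of the proposed eRead/lsEModes extension.
# All names are illustrative, not part of the strawman.

class EModeFile:
    """A file handle offering GridFTP-ERET/ESTO-style extended reads."""

    def __init__(self, data, emodes):
        self._data = data        # file contents (bytes)
        self._emodes = emodes    # emode name -> handler function

    def ls_emodes(self):
        # lsEModes: list the EModes available in this implementation
        return sorted(self._emodes)

    def e_read(self, emode, spec):
        # eRead: the (emode, spec) strings are opaque to the API and are
        # interpreted entirely by the implementation-side handler.
        if emode not in self._emodes:
            raise ValueError("unsupported emode: %s" % emode)
        buf = self._emodes[emode](self._data, spec)
        return buf, len(buf)


# A toy "partial" emode: spec is "offset,length", as in GridFTP's
# standard partial-file-access ERET example.
def partial(data, spec):
    offset, length = (int(x) for x in spec.split(","))
    return data[offset:offset + length]


f = EModeFile(b"0123456789", {"partial": partial})
assert f.ls_emodes() == ["partial"]
buf, n = f.e_read("partial", "2,4")
assert (buf, n) == (b"2345", 4)
```

The point of the sketch is only that the API surface stays tiny (two calls plus discovery) while all data-model knowledge lives in the emode handler on the remote side.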
Example (in perl for brevity):
    my $file   = SAGA::File->new ("http://www.google.com/intl/en/images/logo.gif");
    my @emodes = $file->lsEModes ();

    if ( grep (/^jpeg_block$/, @emodes) ) {
      my ($buff, $len) = $file->eRead ("jpeg_block", "22x4+7+8");
    }
I would discourage support for B, since I do not know of any protocol supporting that approach efficiently, and it also needs approximately the same infrastructure setup as C.
As A is easily implementable on the application level, or within any SAGA implementation, there is no need for support at the API level -- however, A is insufficient for all but some trivial cases.
This approach is very generic on the API level (that's good) but requires exact agreement on the used command syntax for the client and the server, which may get problematic. If we go this route we will definitely end up specifying at least a minimal command subset to be supported by the eRead/eWrite commands.
You are right: complexity does not go away magically, but gets moved to the specification of the eModes.
As for a minimal set: I do not think that this is necessary - the eMode is SUPPOSED to be application specific. OTOH, an intuitive example usable for some cases may be helpful. GridFTP's standard ERET example is partial file access (IIRC: filename, offset, length). That is not very useful for SAGA, since that is already covered by the normal read/write operations.
I simply fear we'll have the same problems we have with the GAT today. The GAT API is in principle usable in a broad range of use cases based on a generic API. The genericity is ensured by using key/value tables in the API itself, allowing quick adaptation to any concrete need. The problem is the missing specification of these key/value pairs which makes it difficult to achieve reusability.
I absolutely agree that the problem lies right there: semantic overloading of strings. The situation is somewhat better than in GAT though:
- the preferences in GAT are really generic, and can be used for anything. The eModes have a very limited scope, and are hence much easier to agree on between different implementations
- as the mapping to GridFTP is 1:1, and GridFTP is quite commonly used, there is at least some other instance to be used for agreement on the modes. Hence, every implementation of an eMode can be expected to do the same thing. At least there is a good chance for that.
However, again: you are right. Semantic overloading of strings is not a nice thing to do, and is here only justified by a lack of obvious alternatives.
Thanks, Andre.
Regards Hartmut
-- +-----------------------------------------------------------------+ | Andre Merzky | phon: +31 - 20 - 598 - 7759 | | Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 | | Dept. of Computer Science | mail: merzky@cs.vu.nl | | De Boelelaan 1083a | www: http://www.merzky.net | | 1081 HV Amsterdam, Netherlands | | +-----------------------------------------------------------------+
This is an interesting discussion. I don't oppose specifying a mechanism to support high-level operations in SAGA .. but isn't any mechanism you can agree on going to be limited and limiting (Jon M. had a good example, and I can come up with a couple of examples myself), and isn't the preferred way of doing this going to be to use complementary data selection mechanisms (perhaps grid services) + SAGA binary pipes? Isn't data selection too application specific to be included in this? I am personally going to use different mechanisms. Andrei
Quoting [Andrei Hutanu] (Jun 13 2005):
This is an interesting discussion. I don't oppose of specifying a mechanism to support any high-level operations in SAGA .. but isn't any mechanism you can agree on going to be limited and limiting (Jon M. had a good example and I can come up with a couple examples myself)
I would welcome more examples! :-)
and the preferred way of doing this is going to be to use complementary data selection mechanisms (perhaps grid services) + SAGA binary pipes ?
How would that work? Outside of SAGA I see what you mean, but in SAGA we have no (and intend no) generic mechanism to access a Grid Service and to perform generic custom operations.
Isn't data selection too application specific to be included in this?
Ah, but the application specific part is NOT part of the eRead proposal - that merely provides a placeholder for doing that in a clean way! Think URLs: if you open a remote file, the URL is basically a string, and allows you to encapsulate some semantics which are transparent to the API. This is NOT a file, but can be opened as a file: http://www.google.com/search?q=SAGA&btnG=Search+the+Web So a URL is a placeholder for additional semantic information. eRead is similar: the emode and the spec strings are placeholders for the semantic information necessary to perform the read.
Andrei
I am personally going to use different mechanisms
Which ones? ;-) Cheers, Andre.
How would that work? Outside of SAGA I see what you mean, but in SAGA we have no (and intent no) generic mechanism to access a Grid Service, and to perform generic custom operations.
Outside SAGA, of course :)
Isn't data selection too application specific to be included in this?
Ah, but the application specific part is NOT part of the eRead proposal - that merely provides a placeholder for doing that in a clean way!
I was thinking that the data selection operation as described in the proposal is a very specific operation, and that there might be many remote data operations that cannot be covered by this. For example, when I think of remote data selection, I think more in terms of starting a remote job (using SAGA), communicating with the job using specific protocols (perhaps grid services implemented on top of the SAGA streams), and transferring data from the remote job locally using SAGA streams. The job itself is using SAGA to access the file. There's a lot of SAGA involved here, but there is no eRead, and using eRead would be a limiting factor. eRead basically means (start job, send command, receive response, end job). That's perhaps a limiting model for remote data access. My 2 cents, Andrei
Hi Andrei, Quoting [Andrei Hutanu] (Jun 14 2005):
Isn't data selection too application specific to be included in this?
Ah, but the application specific part is NOT part of the eRead proposal - that merely provides a placeholder for doing that in a clean way!
I was thinking that the data selection operation as described in the proposal is a very specific operation and that there might be many remote data operations that cannot be covered by this. For example when I think of remote data selection I think more in terms of starting a remote job (using SAGA), communicating with the job using specific protocols (perhaps grid services implemented on top of the SAGA streams) and transferring data from the remote job locally using SAGA streams. The job itself is using SAGA to access the file.
There's a lot of SAGA involved here but there is no eread and using eread would be a limiting factor. Eread basically means (start job, send command, receive response, end job). That's perhaps a limiting model for remote data access.
I think I understand the scenario you describe, and you are right: eRead is not a good model to implement that; neither is any of the other file I/O operations we have, or that have been proposed. It's a client-server scenario, and streams will do fine. So, if eRead does not fit, and read does not fit, don't use them. So, how is their existence limiting your scenario? I think I am missing the point (seem to do that a lot lately)... Cheers, Andre.
My 2 cents, Andrei
-- +-----------------------------------------------------------------+ | Andre Merzky | phon: +31 - 20 - 598 - 7759 | | Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 | | Dept. of Computer Science | mail: merzky@cs.vu.nl | | De Boelelaan 1083a | www: http://www.merzky.net | | 1081 HV Amsterdam, Netherlands | | +-----------------------------------------------------------------+
You're right .. _I_ will not use it. It will not give me the performance I need. My only concern is that if it gets included, other people might believe that it is a good model and use it :) only to find out later what the problems are. Andrei
Hi
Quoting [Andrei Hutanu] (Jun 14 2005):
You're right .. _I_ will not use it. It will not give me the performance I need. My only concern is that if it gets included, other people might believe that it is a good model and use it :)
:-)
only to find out later what the problems are.
Andrei
Hi Andre,

I think there is a 4th possibility. If each of the I/O operations can be requested asynchronously, then you can get the same net effect as the ERET/ESTO functionality of GridFTP. The advantage of simply embedding that functionality into the higher-level concept of asynchronous calls is that if the underlying library does *not* support the async operations (or some subset of the operations cannot be performed asynchronously), you can always perform the operations synchronously and still be able to present the same interface.

I do not like plan A or B for the reasons you state. I do not like plan C because it is too tightly tied to a specific data transfer system implementation. I would propose a plan D that simply augments the Task interface of SAGA. For example, you could allow the user to fire off a number of async read operations:

    Task handle1 = channel.read ();
    Task handle2 = channel.read ();
    container.addTask (handle1);
    container.addTask (handle2);
    container.waitAll ();

The read operations in this example can be submitted as an eRead operation, or they can be in separate threads, or they can simply be executed synchronously when you call waitAll() (this is in fact how some of the async MPI I/O was done on the first SGI Origin machines... it was meant to look asynchronous, but in fact the calls did not initiate until you did a "Wait" for them).

Anyway, using the task interface provides more degrees of freedom for implementing async I/O than simply supporting the GridFTP way of doing things, and it meshes gracefully with I/O implementations that do *not* offer an underlying async execution model.

The only modification that would be useful to add to the tasking interface is a notion of "readFrom()" and "writeTo()", which allows you to specify the file offset together with the read. Otherwise, the statefulness of the read() call would make the entire "task" interface useless with respect to file I/O.

-john

On Jun 12, 2005, at 11:02 AM, Andre Merzky wrote:
Hi John, Quoting [John Shalf] (Jun 13 2005):
Hi Andre, I think there is a 4th possibility. If each of the I/O operations can be requested asynchronously, then you can get the same net effect as the ERET/ESTO functionality of the GridFTP.
I disagree. You can hide latency to some extent, but your throughput suffers utterly. Imagine Jon's use case (it's a worst case scenario, really): you have a remote HDF5 file, and want a hyperslab. The really worst case is that you want every second data item. Now, if you rely on read as is, you have to send one read request for every single data item you want to read. If you interleave them asynchronously, you get reasonable latency, but your throughput is, well, close to zero. If you want to optimize your buffer size, you have to read more than one data item CONSECUTIVELY. Since the use case says you are interested in every second data item, you effectively have to read ALL data. The same holds if you want every 10th data item - only the ratio gets even worse. So, interleaving works efficiently only for sufficiently _large_ independent read requests (then it's perfect, of course).
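A back-of-envelope calculation illustrates the per-item-read problem Andre describes. All numbers below are made up for illustration (the per-request overhead and link parameters are assumptions, not measurements):

```python
# Illustrative cost model for reading every 2nd item of a remote field.
# All numbers are invented for illustration only.

item_size = 8            # bytes per data item
n_items   = 1_000_000    # items wanted (every 2nd item of the field)
latency   = 0.050        # seconds round trip (WAN-ish)
bandwidth = 10e6         # bytes/second

# One read request per wanted item, fully pipelined: latency is hidden,
# but each request still moves only 8 bytes, so per-request overhead
# (headers, syscalls, seeks) dominates -- modeled here as 64 bytes/request.
overhead  = 64
pipelined = n_items * (item_size + overhead) / bandwidth

# Reading everything consecutively instead: twice the payload, but one
# large streaming transfer with a single round trip.
bulk = (2 * n_items * item_size) / bandwidth + latency

print("per-item pipelined reads: %.1f s" % pipelined)   # 7.2 s
print("read-all-and-discard:     %.1f s" % bulk)        # 1.7 s
```

Under these assumed numbers, even perfectly pipelined per-item reads lose to "read everything and discard half" - which is exactly the point: without data-model knowledge on the remote side, the strided access degenerates into a full transfer anyway.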
The advantage of simply embedding that functionality into the higher-level concept of asynchronous calls is that if the underlying library does *not* support the async operations (or some subset of the operations cannot be performed asynchronously) , you can always perform the operations synchronously and still be able present
I do not like plan A or B for the reasons you state. I do not like Plan C because it is too tightly tied to a specific data transfer system implementation. I would propose a Plan D that simply augments the Task interface of SAGA. For example, if you allowed the user to fire off a number of async read operations Task handle1= channel.read(); Task handle2=channel.read(); container.addTask(handle1); container.addTask(handle2); container.waitAll();
The read operations in this example can be submitted as an eRead
No, it can't efficiently be expressed as an eRead anymore. The implementation sees only a number of reads, but won't be able to recognize any usable pattern (or at least that is REALLY tough). So, the implementation cannot send the request

    filename hyperslab=([2,3,4][3,4,5])

but has to send each read command:

    filename read(2,1), read(4,1), read(6,1) ...

Worst case: you send more than one byte of command for every single byte you request as data (ugh!). (BTW: think HDF5 file driver: that is exactly the problem there: the file driver does not know about semantics anymore, but sees only read, write and seek. Hence it is difficult to efficiently implement a remote HDF5 file driver. Andrei did that, but we had to smuggle semantic information down to the I/O level...) I think there is NO data-model-agnostic way which provides good efficiency for remote data access. I'd be happily convinced otherwise, but that's what I see right now. If (only if; again: please convince me otherwise), so IF one accepts that data model info has to be part of a remote read request, then an eRead thing a la GridFTP is the best (most generic and most simple (!)) solution I know of.
operation or they can be in separate threads, or they can simply be executed synchronously when you call waitAll() (this is in fact how some of the async MPI I/O was done on the first SGI origin machines... it was meant to look asynchronous, but in fact the calls did not initiate until you did a "Wait" for them).
Anyways, using the task interface provide more degrees of freedom for implementing async I/O than simply supporting the GridFTP way of doing things and it meshes gracefully with I/O implementations that do *not* offer an underlying async execution model.
I think the task model and the proposed eRead model are orthogonal. The task model provides you asynchronicity; eRead provides you efficiency (throughput). Also, as a side note: I know about some of the discussions the GridFTP folks had about efficient remote file I/O. They were similar to this one, and the ERET/ESTO model was what they finally agreed on. Cheers, Andre.
The only modification that would be useful to add to the tasking interface is a notion of "readFrom()" and "writeTo()" which allows you to specify the file offset together with the read. Otherwise, the statefulness of the read() call would make the entire "task" interface useless with respect to file I/O.
-john
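John's "plan D" can be sketched in a few lines of Python: offset-carrying, stateless reads (his proposed readFrom()) fired off concurrently through a task container, with concurrent.futures standing in for the SAGA task interface. This is a sketch under assumed names, not a SAGA binding; os.pread supplies the stateless offset+length read (POSIX only).

```python
# Sketch of the "plan D" task-based model: concurrent offset reads.
# concurrent.futures plays the role of the SAGA task container;
# read_from() plays the role of the proposed readFrom() call.

import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def read_from(fd, offset, length):
    # pread is stateless: no shared file pointer, so concurrent
    # "tasks" cannot interfere with each other.
    return os.pread(fd, length, offset)

with tempfile.TemporaryFile() as f:
    f.write(bytes(range(100)))
    f.flush()
    fd = f.fileno()

    # fire off a number of async read operations, then wait for all
    offsets = [0, 10, 20, 30]
    with ThreadPoolExecutor(max_workers=4) as container:
        tasks = [container.submit(read_from, fd, off, 5) for off in offsets]
        chunks = [t.result() for t in tasks]   # waitAll()

assert chunks == [bytes(range(o, o + 5)) for o in offsets]
```

This shows why the statelessness John asks for matters: with a shared file pointer and plain read(), the four tasks above would race on the seek position and the result would be undefined.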
On Jun 13, 2005, at 11:40 AM, Andre Merzky wrote:
Hi John,
Quoting [John Shalf] (Jun 13 2005):
Hi Andre, I think there is a 4th possibility. If each of the I/O operations can be requested asynchronously, then you can get the same net effect as the ERET/ESTO functionality of the GridFTP.
I disagree. You can hide latency to some extent, but your throughput suffers utterly.
If you do full gather-scatter I/O, then this is true (the length of the request equals the size of the data item returned). Even in such a case, as long as the number of outstanding requests matches the bandwidth-delay product of the network channel (as per Little's Law), you still achieve full throughput. However, the e-modes approach is equally bad, because it simply pushes an enormous amount of complexity down a layer into the implementation. I'm not sure which is worse. So the concerns I have are as follows:

1) The negotiation to find out the available eModes seems to require some complex modules to be installed on both the client and the server side of a system. One would hope that you could implement the capabilities you need using a smaller subset of elemental operations - for instance, the stdio readv() and pread() functionality to describe gather/scatter type operations.

2) The implementation looks way too close to one particular data transport implementation. I'm not convinced it is the best thing out there for gather-scatter I/O over a high-latency interface. Again, I'd be interested in seeing the advantages/disadvantages of something related to the POSIX/XPG gather/scatter I/O implementation. They would cover Jon's case.

3) Are the EModes() guaranteed to be stateless? In the JPEG_Block example you provide, it's not clear what the side effects are with regard to the file pointer. If some EModes() have side effects on the file pointer state, whereas others do not, it's going to be impossibly messy.

So my example wasn't very well thought out, but the higher-level point I was trying to make is that I think there are more general ways to encode data layout descriptions for remote patterned or gather-scatter I/O operations than e-modes. The arbitrariness of the modes and their associated string parsing adds a sort of complexity that is a bit daunting at first blush.
Imagine Jons use case (its a worst case scenario really): You have a remote HDF5 file, and want a hyperslab. Really worst case is you want every second data item.
Now, if you rely on read as is, you have to send one read request for every single data item you want to read. If you interleave them asynchronously, you get reasonable latency, but your throughput is, well, close to zero.
If the number of outstanding requests (in terms of bytes) is equal to the bandwidth-delay product of the connection, then you will reach peak. Sadly, the way I posed the solution would die from the excessive overhead of launching the threads.
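The Little's Law sizing John refers to is easy to make concrete. With illustrative link numbers (assumptions, not measurements), the bandwidth-delay product tells you how many bytes, and hence how many small requests, must be outstanding at once to keep the pipe full:

```python
# Little's Law sizing of a request pipeline (illustrative numbers).

bandwidth = 125e6   # bytes/s (~1 Gbit/s link)
rtt       = 0.050   # seconds round trip

# bandwidth-delay product: bytes that must be in flight at any moment
# to keep the channel full.
bdp = bandwidth * rtt          # 6.25 MB

# If each request returns only an 8-byte data item, the number of
# concurrently outstanding requests becomes enormous:
per_request = 8
outstanding = bdp / per_request
print(int(outstanding))        # 781250
```

This cuts both ways in the debate: John is right that full throughput is reachable in principle with enough outstanding requests, and Andre is right that ~780k concurrent 8-byte requests is not a practical request pattern for a client to generate.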
If you want to optimize your buffersize, you have to read more than one data item CONSECUTIVELY. Since the use case says you are interested in every second data item, you effectively have to read ALL data.
You would definitely not want to read them consecutively -- you'd want to read all of the data items you need concurrently (thereby necessitating that the file pointer offset be encoded in each request). I do agree with you that my off-the-cuff proposal for launching one async task per data item is not practical due to the excessive software overhead. However, I don't see why you cannot launch as many concurrent requests as you need to satisfy Little's Law.
Same holds if you want every 10th data item - only the ratio gets even worse. So, interleaving works only efficently for sufficiently _large_ independent read request (then its perfect of course).
That is curious... Interleaving on vector machines is used for precisely the opposite purpose (for hundreds of very small independent read requests). Latency hiding and throughput are intimitely connected. I would expect that all of the read requests for a hyperslab are independent provided the file pointer state is encoded in the request. This is precisely what the readv()/pread() does. Should we find some case that causes problems for a readv/pread model? The hyperslabbing is clearly not one of those cases.
I think the task model and the proposed eRead model are orthogonal. The task model provides you asynchroneousity, the eRead provides you efficiency (throughput).
Pipelining is used to achieve throughput. Pipelining is achieved via concurrent async operations. I agree that launching one task per byte is going to be inefficient, but it is inefficient because of the software overhead of launching a new task (not because async request/response is inefficient). SCSI disk interfaces and DDR DRAMs depend on submitting async requests for data that get fulfilled later (sometimes out-of-order). They are achieving this goal of throughput using a far simpler model than ERET/ESTO. Its worth looking at simpler models for defining deeply pipelined remote gather/scatter operations.
Also, as a side note: I know about some of the dicussions the GridFTP folx had about efficient remote file IO. They have been similar to this one, and the ERET/ESTO model was the finally agreed on.
I'm not sure if the ERET/ESTO solves the problem at hand. The complexity has been pushed to a different layer of the software stack.
Cheers, Andre.
The only modification that would be useful to add to the tasking interface is a notion of "readFrom()" and "writeTo()" which allows you to specify the file offset together with the read. Otherwise, the statefulness of the read() call would make the entire "task" interface useless with respect to file I/O.
-john
On Jun 12, 2005, at 11:02 AM, Andre Merzky wrote:
Hi again,
consider following use case for remote IO. Given a large binary 2D field on a remote host, the client wans to access a 2D sub portion of that field. Dependend on the remote file layout, that requires usually more than one read operation, since the standard read (offset, length) is agnostic to the 2D layout.
For more complex operations (subsampling, get a piece of a jpg file), the number of remote operations grow very fast. Latency then stringly discourages that type of remote IO.
For that reason, I think that the remote file IO as specified by SAGA's Strawman as is will only be usable for a limited and trivial set of remote I/O use cases.
There are three (basic) approaches:
A) get the whole thing, and do ops locally Pro: - one remote op, - simple logic - remote side doesn't need to know about file structure - easily implementable on application level Con: - getting the header info of a 1GB data file comes with, well, some overhead ;-)
B) clustering of calls: do many reads, but send them as a single request. Pro: - transparent to application - efficient Con: - need to know about dependencies of reads (a header read needed to determine size of field), or included explicite 'flushes' - need a protocol to support that - the remote side needs to support that
C) data specific remote ops: send a high level command, and get exactly what you want. Pro: - most efficient Con: - need a protocol to support that - the remote side needs to support that _specific_ command
The last approach (C) is what I have best experiences with. Also, that is what GridFTP as a common file access protocol supports via ERET/ESTO operations.
I want to propose to include a C-like extension to the File API of the strawman, which basically maps well to GridFTP, but should also map to other implementations of C.
That extension would look like:
void lsEModes (out array
emodes ); void eWrite (in string emode, in string spec, in string buffer out long len_out ); void eRead (in string emode, in string spec, out string buffer, out long len_out ); - hooks for gridftp-like opaque ERET/ESTO features - spec: string for pattern as in GridFTP's ESTO/ERET - emode: string for ident. as in GridFTP's ESTO/ERET
EMode: a specific remote I/O command supported lsEModes: list the EModes available in this implementation eRead/eWrite: read/write data according to the emode spec
Example (in perl for brevity):
my $file = SAGA::File new ("http://www.google.com/intl/en/images/logo.gif"); my @emodes = $file->lsEModes ();
if ( grep (/^jpeg_block$/, @emodes) ) { my ($buff, $len) = file.eRead ("jpeg_block", "22x4+7+8"); }
I would discourage support for B, since I do not know any protocoll supporting that approach efficiently, and also it needs approximately the same infrastructure setup as C.
As A is easily implementable on application level, or within any SAGA implementation, there is no need for support on API level -- however, A is insufficient for all but some trivial cases.
Comments welcome :-))
Cheers, Andre.
-- +-----------------------------------------------------------------+ | Andre Merzky | phon: +31 - 20 - 598 - 7759 | | Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 | | Dept. of Computer Science | mail: merzky@cs.vu.nl | | De Boelelaan 1083a | www: http://www.merzky.net | | 1081 HV Amsterdam, Netherlands | | +-----------------------------------------------------------------+
-- +-----------------------------------------------------------------+ | Andre Merzky | phon: +31 - 20 - 598 - 7759 | | Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 | | Dept. of Computer Science | mail: merzky@cs.vu.nl | | De Boelelaan 1083a | www: http://www.merzky.net | | 1081 HV Amsterdam, Netherlands | | +-----------------------------------------------------------------+
Quoting [John Shalf] (Jun 14 2005):
On Jun 13, 2005, at 11:40 AM, Andre Merzky wrote:
Hi John,
Quoting [John Shalf] (Jun 13 2005):
Hi Andre, I think there is a 4th possibility. If each of the I/O operations can be requested asynchronously, then you can get the same net effect as the ERET/ESTO functionality of the GridFTP.
I disagree. You can hide latency to some extent, but your throughput suffers utterly.
If you do a full gather-scatter I/O, then this is true (the length of the request equals the size of the data item returned). Even in such a case, as long as the number of outstanding requests matches the bandwidth-delay product of the network channel (as per Little's Law), you still achieve full throughput. However, the e-modes approach is equally bad because it simply pushes an enormous amount of complexity down a layer into the implementation. I'm not sure which is worse.
:-)
So the concerns I have are as follows: 1) The negotiation to find out the available eModes seems to require some complex modules to be installed on both the client and the server side of a system.
Only potentially complex, but yes, that's right.
One would hope that you could implement the capabilities you need using a smaller subset of elemental operations. For instance the stdio readv() and pread() functionality to describe gather/scatter type operations.
It's just not always possible. For example, eRead would allow you to request a subset of a JPEG image. That cannot be expressed as read operations at all. OTOH, one can argue that such operations allow for even more semantic uncertainty... From the application's point of view it's very useful.
2) The implementation looks way too close to one particular data transport implementation. I'm not convinced it is the best thing out there for gather-scatter I/O over a high-latency interface. Again, I'd be interested in seeing the advantages/disadvantages of something related to the POSIX/XPG gather/scatter I/O implementation. They would cover Jon's case.
It looks close to GridFTP, granted, but the idea _is_ generic. Basically it says: describe your request as a string (== opaque), and you get what you want. I can't see how throughput would really reach peak otherwise - see below.
3) Are the EModes() guaranteed to be stateless? In the JPEG_Block example you provide, its not clear what the side-effects are with regard to the file pointer. If some EModes() have side-effects on the file pointer state, whereas others do not, its going to be impossibly messy.
Yes, emodes are supposed to be stateless. They don't respect and don't move the file pointer. That would be messy indeed (jpeg).
So my example wasn't very well thought out, but the higher-level point I was trying to make is that I think there are more general ways to encode data layout descriptions for remote patterned or gather-scatter I/O operations than e-modes. The arbitraryiness of the modes and their associated string parsing adds a sort of complexity that is a bit daunting at first blush.
Ha :-) When I first learned of ERET, I was afraid that people would start to send large XML-formatted data requests, and found that idea terrible - it sounds like abusing a data access facility as a multi-purpose protocol. OTOH, its ease of use for HDF5 hyperslabs is utterly convincing, I think. The _request_ is a string describing a hyperslab. Whatever intelligence you have in gather-scatter I/O, the request size for a hyperslab can easily match the size of the hyperslab itself, or exceed it (readv needs one offset and one length to read a single byte - if your data are scattered bytewise...).
Imagine Jon's use case (it's a worst-case scenario, really): You have a remote HDF5 file, and want a hyperslab. The really worst case is that you want every second data item.
Now, if you rely on read as is, you have to send one read request for every single data item you want to read. If you interleave them asynchronously, you get reasonable latency, but your throughput is, well, close to zero.
If the number of outstanding requests (in terms of bytes) is equal to the bandwidth-delay product of the connection, then you will reach peak. Sadly, the way I posed the solution would die from the excessive overhead of launching the threads.
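John's point can be illustrated with a rough bandwidth-delay calculation (the link numbers and item size below are made up purely for illustration):

```python
# Illustrative only: link parameters and item size are assumed, not measured.
bandwidth = 100e6 / 8   # 100 Mbit/s link, expressed in bytes per second
rtt = 0.05              # 50 ms round-trip time
bdp = bandwidth * rtt   # bandwidth-delay product: bytes that must be
                        # "in flight" to keep the pipe full (Little's Law)

item_size = 8           # one small data item per request
outstanding = bdp / item_size

print(int(bdp))          # 625000 bytes in flight
print(int(outstanding))  # 78125 concurrent requests needed
```

So reaching peak throughput with tiny items means keeping tens of thousands of requests outstanding at once, which is exactly where the per-request software overhead starts to matter.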
I am sure that does not scale. If a hyperslab describes one megabyte of scattered data (at byte granularity, say: subsample a 3D scalar field for lowres volrendering), then I have 1 million read/write requests on the wire, each one with its protocol overhead, processing overhead, etc. Mathematically you might be right, but in practice that won't do any good, I think. eRead has one small request and one large response. If an implementation thinks that the large response is better split up (UDP blast), then fine - that's possible. The other way around is not possible (or much harder).
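A back-of-the-envelope sketch of Andre's objection (the per-request overhead figure is an assumption, not a measured protocol cost):

```python
# Assumed numbers: each 1-byte read carries a fixed framing/protocol
# overhead on the wire.
payload = 1                # useful bytes per request
overhead = 64              # assumed per-request protocol cost in bytes
n_requests = 1_000_000     # one request per scattered byte

wire_bytes = n_requests * (payload + overhead)
efficiency = (n_requests * payload) / wire_bytes

print(wire_bytes)   # 65000000 bytes moved for 1000000 useful bytes
print(efficiency)   # roughly 0.015: wire efficiency below 2 percent
```

Whatever the exact overhead constant, the useful fraction of the traffic shrinks toward zero as the granularity of the requests approaches a byte.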
If you want to optimize your buffersize, you have to read more than one data item CONSECUTIVELY. Since the use case says you are interested in every second data item, you effectively have to read ALL data.
You would definitely not want to read them consecutively -- you'd want to read all of the data items you need concurrently (thereby necessitating that the file pointer offset be encoded in each request). I do agree with you that my off-the-cuff proposal for launching one async task per data item is not practical due to the excessive software overhead. However, I don't see why you cannot launch as many concurrent requests as you need to satisfy Little's Law.
Little's Law does not really apply, I think. It assumes that the items on the wire are identical, and require the same time. If you read bytewise, that just doesn't apply anymore: the overhead gets larger than the payload. So, the law of course holds, but it's applied to different entities...
Same holds if you want every 10th data item - only the ratio gets even worse. So, interleaving works efficiently only for sufficiently _large_ independent read requests (then it's perfect, of course).
That is curious... Interleaving on vector machines is used for precisely the opposite purpose (for hundreds of very small independent read requests). Latency hiding and throughput are intimately connected.
I would expect that all of the read requests for a hyperslab are independent provided the file pointer state is encoded in the request. This is precisely what the readv()/pread() does.
Should we find some case that causes problems for a readv/pread model? The hyperslabbing is clearly not one of those cases.
It IS! Again, for a hyperslab requesting every other byte from a file, you send two values per requested byte: one offset and one length. Additionally, you have some overhead for the protocol. Additionally, you force the remote side to process the request like this: you cannot use HDF5 for efficient hyperslab I/O, but have to use read/seek. Additionally, the response is equally bloated, because you need to separate the individual response blocks again (that can be avoided by matching the response to the original request, I guess). If you want a more obvious example: a JPEG subset. It's impossible to express in gather-scatter I/O.
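For a concrete sense of the request/payload ratio Andre describes: on a typical 64-bit host a struct iovec is 16 bytes (a pointer plus a length; both sizes are platform assumptions here), so the imbalance is even worse than two-to-one:

```python
# Platform assumption: struct iovec = 8-byte pointer + 8-byte length.
iovec_size = 16
n_items = 500_000            # every other byte of a 1 MB region
request_bytes = n_items * iovec_size   # size of the iovec list itself
data_bytes = n_items * 1               # one byte returned per entry

print(request_bytes // data_bytes)     # 16: the request description is
                                       # 16x larger than the data returned
```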
I think the task model and the proposed eRead model are orthogonal. The task model provides you asynchronicity, the eRead provides you efficiency (throughput).
Pipelining is used to achieve throughput. Pipelining is achieved via concurrent async operations. I agree that launching one task per byte is going to be inefficient, but it is inefficient because of the software overhead of launching a new task (not because async request/response is inefficient). SCSI disk interfaces and DDR DRAMs depend on submitting async requests for data that get fulfilled later (sometimes out-of-order). They achieve this goal of throughput using a far simpler model than ERET/ESTO. It's worth looking at simpler models for defining deeply pipelined remote gather/scatter operations.
Also, as a side note: I know about some of the discussions the GridFTP folks had about efficient remote file I/O. They were similar to this one, and the ERET/ESTO model was what they finally agreed on.
I'm not sure if the ERET/ESTO solves the problem at hand. The complexity has been pushed to a different layer of the software stack.
Yes, right! That's the point: it allows semantic information to be pushed to a level where it can be used efficiently. All other approaches I know strip the semantic information, and boil the request down to generic small ops (such as readv). Really, I do not know _any_ implementation which can do subsampling on remote data efficiently with small ops as requests, instead of a _semantic_ description of the subsampling. Cheers, Andre :-))
Cheers, Andre.
The only modification that would be useful to add to the tasking interface is a notion of "readFrom()" and "writeTo()" which allows you to specify the file offset together with the read. Otherwise, the statefulness of the read() call would make the entire "task" interface useless with respect to file I/O.
-john
On Jun 12, 2005, at 11:02 AM, Andre Merzky wrote:
Hi again,
consider the following use case for remote I/O. Given a large binary 2D field on a remote host, the client wants to access a 2D sub-portion of that field. Depending on the remote file layout, that usually requires more than one read operation, since the standard read (offset, length) is agnostic to the 2D layout.
For more complex operations (subsampling, getting a piece of a JPEG file), the number of remote operations grows very fast. Latency then strongly discourages that type of remote I/O.
For that reason, I think that the remote file I/O as specified by SAGA's Strawman as is will only be usable for a limited and trivial set of remote I/O use cases.
There are three (basic) approaches:
A) get the whole thing, and do ops locally
   Pro: - one remote op
        - simple logic
        - remote side doesn't need to know about file structure
        - easily implementable on application level
   Con: - getting the header info of a 1GB data file comes with, well, some overhead ;-)
B) clustering of calls: do many reads, but send them as a single request.
   Pro: - transparent to application
        - efficient
   Con: - need to know about dependencies of reads (a header read needed to determine the size of a field), or include explicit 'flushes'
        - need a protocol to support that
        - the remote side needs to support that
C) data specific remote ops: send a high level command, and get exactly what you want.
   Pro: - most efficient
   Con: - need a protocol to support that
        - the remote side needs to support that _specific_ command
The last approach (C) is the one I have the best experience with. Also, that is what GridFTP as a common file access protocol supports via ERET/ESTO operations.
I want to propose including an extension following approach C in the File API of the strawman, which basically maps well to GridFTP, but should also map to other implementations of approach C.
That extension would look like:
void lsEModes (out array<string> emodes  );
void eWrite   (in  string emode,
               in  string spec,
               in  string buffer,
               out long   len_out        );
void eRead    (in  string emode,
               in  string spec,
               out string buffer,
               out long   len_out        );

- hooks for GridFTP-like opaque ERET/ESTO features
- spec:  string for pattern as in GridFTP's ESTO/ERET
- emode: string for identifier as in GridFTP's ESTO/ERET
EMode:        a specific remote I/O command supported
lsEModes:     list the EModes available in this implementation
eRead/eWrite: read/write data according to the emode spec
Example (in perl for brevity):
my $file   = SAGA::File->new ("http://www.google.com/intl/en/images/logo.gif");
my @emodes = $file->lsEModes ();

if ( grep { /^jpeg_block$/ } @emodes ) {
  my ($buff, $len) = $file->eRead ("jpeg_block", "22x4+7+8");
}
I would discourage support for B, since I do not know any protocol supporting that approach efficiently, and also it needs approximately the same infrastructure setup as C.
As A is easily implementable on application level, or within any SAGA implementation, there is no need for support on API level -- however, A is insufficient for all but some trivial cases.
Comments welcome :-))
Cheers, Andre.
-- +-----------------------------------------------------------------+ | Andre Merzky | phon: +31 - 20 - 598 - 7759 | | Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 | | Dept. of Computer Science | mail: merzky@cs.vu.nl | | De Boelelaan 1083a | www: http://www.merzky.net | | 1081 HV Amsterdam, Netherlands | | +-----------------------------------------------------------------+
Quoting [John Shalf] (Jun 14 2005):
Should we find some case that causes problems for a readv/pread model? The hyperslabbing is clearly not one of those cases.
Actually, how would you do an HDF5 hyperslab via readv? The only way I see is instrumenting the HDF5 library and writing a readv file driver - but then you would not use SAGA anyway; that is not application level anymore.

If you want to read hyperslabs of an HDF5 file on application level with readv, you would need to mimic the HDF5 lib in order to find the offset of the data set, and would need to know details about the HDF5 file structure and data layout. Compared to that, eRead really is simpler for the application. Here is an example we used for hyperslabbing a 3D scalar field:

  snprintf (pattern1, 255, "(%d, %d, %d, %d)"    , start1, stop1, stride1, reps1);
  snprintf (pattern2, 255, "(%d, %d, %d, %d, %s)", start2, stop2, stride2, reps2, pattern1);
  snprintf (pattern3, 255, "(%d, %d, %d, %d, %s)", start3, stop3, stride3, reps3, pattern2);

  res = file.eRead (pattern3, (char*) buf, buffer_size);

start, stop, stride, reps correspond directly to the HDF5 semantics. So, the semantic info is indeed maintained on application level, and, as you said before, its interpretation is pushed to lower levels. How would that look for readv?

Cheers, Andre.
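For what it's worth, the nested pattern construction sketched above can be reproduced outside C as well; the start/stop/stride/reps values below are invented sample numbers, just to show the concrete string that goes over the wire:

```python
# Build the nested hyperslab pattern string from Andre's example.
# One (start, stop, stride, reps) tuple per dimension; inner patterns
# are embedded in the outer ones.
def dim(start, stop, stride, reps, inner=None):
    if inner is None:
        return "(%d, %d, %d, %d)" % (start, stop, stride, reps)
    return "(%d, %d, %d, %d, %s)" % (start, stop, stride, reps, inner)

pattern1 = dim(0, 100, 2, 1)            # innermost dimension
pattern2 = dim(0, 100, 2, 1, pattern1)  # wraps pattern1
pattern3 = dim(0, 100, 2, 1, pattern2)  # full 3-D request string

print(pattern3)
# (0, 100, 2, 1, (0, 100, 2, 1, (0, 100, 2, 1)))
```

The whole 3-D subsampling request fits in one short string, which is exactly the point of the eRead argument.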
On Jun 14, 2005, at 1:24 AM, Andre Merzky wrote:
Quoting [John Shalf] (Jun 14 2005):
Should we find some case that causes problems for a readv/pread model? The hyperslabbing is clearly not one of those cases. Actually, how would you do an HDF5 hyperslab via readv? The only way I see is instrumenting the HDF5 library and writing a readv file driver - but then you would not use SAGA anyway; that is not application level anymore.
The same problem exists with any of the proposed solutions, including eRead. So I'm not sure if I see the point here.
If you want to read hyperslabs on an HDF5 file on application level with readv, you would need to mimic the HDF5 lib in order to find the offset for the data set, and would need to know details about HDF5 file structure and data layout.
Someone will need to solve the very same problem in order to implement an HDF5-specific eRead interface.
Compared to that, eRead really is simpler for the application. Here is an example we used for hyperslabbing a 3D scalar field:
snprintf (pattern1, 255, "(%d, %d, %d, %d)"    , start1, stop1, stride1, reps1);
snprintf (pattern2, 255, "(%d, %d, %d, %d, %s)", start2, stop2, stride2, reps2, pattern1);
snprintf (pattern3, 255, "(%d, %d, %d, %d, %s)", start3, stop3, stride3, reps3, pattern2);

res = file.eRead (pattern3, (char*) buf, buffer_size);
So you would actually need to embed this in-situ with your HDF5 code? Or would you go through the HDF5 libraries so that you can push that information string down to the driver layer? It's not clear where exactly you place these calls. And when you *do* insert these calls, it requires some understanding of the HDF5 internal file layout. Or are we going to ditch the HDF5 API and use eRead instead? How then do we use eRead to manage all of the other HDF5 features like compression, groups, iteration, etc.? What is the string spec for an HDF5 group iterator using eRead strings? This is why I fail to see the benefits of the eRead interface (it didn't prevent us from mucking with the guts of HDF5 if you want to preserve the HDF5 API, but it also didn't reduce complexity for the user if you are going to replace the HDF5 APIs with these stringy pattern requests).
start, stop, stride, reps correspond directly to the HDF5 semantics. So, the semantic info is indeed maintained on application level, and, as you said before, its interpretation is pushed to lower levels.
It looks like you will end up encoding the entire HDF5 API as eRead pattern strings and pushing it to the other end of a client-server connection. Again, I'm not sure if we made life easier for the remote HDF5 people.
How would that look for readv?
What I was thinking is that developers of HDF5 may have an interest in defining vector or patterned read operations at the VFD layer of their interface. This would enable them to propagate the kind of information you are attempting to encode in eRead strings down to the driver, where vector-read interfaces can take advantage of it for deeper pipelining of high-latency operations. (They could, for instance, use some of the methods that Thorsten was referring to, or they could use vread/vwrite type operations.) So the issue is that 1) if you use eRead to replace the HDF5 API, then we are talking about an enormously complex string-encoding interface; 2) if you use eRead in the VFD, then you have to instrument HDF5 to propagate information about patterned reads down to the driver layer. That is of course the same thing you need if you use vread()/readp() (or any of the interfaces that Thorsten described). So I don't see much of a difference in capability there, except that vread/readp already has the information in a form that you can do I/O with. With eRead, you still have to go through and parse some strings to gain access to the same information about the pattern of reads/writes. So it's not merely that eRead is pushing complexity to a different layer... I don't see where it is reducing complexity.
Cheers, Andre.
Quoting [John Shalf] (Jun 14 2005):
On Jun 14, 2005, at 1:24 AM, Andre Merzky wrote:
Quoting [John Shalf] (Jun 14 2005):
Should we find some case that causes problems for a readv/pread model? The hyperslabbing is clearly not one of those cases. Actually, how would you do an HDF5 hyperslab via readv? The only way I see is instrumenting the HDF5 library and writing a readv file driver - but then you would not use SAGA anyway; that is not application level anymore.
The same problem exists with any of the proposed solutions, including eRead. So I'm not sure if I see the point here.
Hm, sorry that I communicate so badly: but that CAN be solved with eRead - and that's exactly the advantage. We implemented that once in a different lib, and it worked like a charm. The code I included below is from a real client (the call was named iowrap_pread instead of file.eRead, though ;-)
If you want to read hyperslabs on an HDF5 file on application level with readv, you would need to mimic the HDF5 lib in order to find the offset for the data set, and would need to know details about HDF5 file structure and data layout.
Someone will need to solve the very same problem in order to implement an HDF5-specific eRead interface.
Compared to that, eRead really is simpler for the application. Here is an example we used for hyperslabbing a 3D scalar field:
snprintf (pattern1, 255, "(%d, %d, %d, %d)"    , start1, stop1, stride1, reps1);
snprintf (pattern2, 255, "(%d, %d, %d, %d, %s)", start2, stop2, stride2, reps2, pattern1);
snprintf (pattern3, 255, "(%d, %d, %d, %d, %s)", start3, stop3, stride3, reps3, pattern2);

res = file.eRead (pattern3, (char*) buf, buffer_size);
So you would actually need to embed this in-situ with your HDF5 code? Or would you go through the HDF5 libraries so that you can push that information string down to the driver layer? It's not clear where exactly you place these calls.
This call goes into the application! That is supposed to be the SAGA level. The HDF5 lib does not come into play on the local host at all, but only on the remote host - where the eRead request is received, translated into a native HDF5 hyperslab read (the translation is simple), and the resulting data are returned. That is why I think SAGA is a good place for eRead - it IS application level...
And when you *do* insert these calls, it requires some understanding of the HDF5 internal file layout. Or are we going to ditch the HDF5 API and use eRead instead? How then do we use eRead to manage all of the other HDF5 features like compression, groups, iteration etc.??? What is the string spec for an HDF5 group iterator using eRead strings?
Ah, right, now I see why we are running in circles :-) Imagine a remote web service providing access to HDF5 files. A simple version would provide read and write calls only; a more sophisticated version would provide group iterations etc. However, the service would come up with some interface which resembles HDF5 somewhat, but is probably more tailored toward the specific use case. eRead is nothing but a medium to communicate with such a service, and with similar services. It cannot replace HDF5, but can help in _application specific_ usage of a service providing access to an HDF5 file. As you said before: semantics gets pushed down the pipe. That is right: it gets pushed over the wire, to the remote side, and interpreted there. HOW you specify your semantics in an eRead string is up to the service definition and your use case.

  app -> eRead -> wire -> service -> HDF5 -> local VFD -> file
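A minimal sketch of what the service end of that pipeline might look like; the emode name, the spec grammar, and the handler logic are all invented here for illustration, not part of any proposed spec:

```python
# Hypothetical eRead service side: (emode, spec) strings arrive over
# the wire and are dispatched to a handler that owns their semantics.
def handle_hyperslab(spec, data):
    # toy spec grammar: "(start,stop,stride,reps)" for one dimension
    start, stop, stride, reps = map(int, spec.strip("()").split(","))
    return bytes(data[start:stop:stride]) * reps

HANDLERS = {"hdf5_hyperslab": handle_hyperslab}

def eread_service(emode, spec, data):
    # reject emodes this service does not advertise via lsEModes
    if emode not in HANDLERS:
        raise ValueError("unsupported emode: " + emode)
    return HANDLERS[emode](spec, data)

# every second byte of a 10-byte "file":
print(eread_service("hdf5_hyperslab", "(0,10,2,1)", bytes(range(10))))
```

The dispatch table is the service definition: adding a new emode means registering a new handler, exactly the extensibility (and the deployment burden) being debated in this thread.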
This is why I fail to see the benefits of the eRead interface (it didn't prevent us from mucking with the guts of HDF5 if you want to preserve the HDF5 API, but it also didn't reduce complexity for the user if you are going to replace the HDF5 APIs with these stringy pattern requests).
Nope, it's not supposed to replace HDF5. It's also not supposed to replace libjpeg, libtiff, ... - you name it. It does not solve world problems. All it does is provide the ability to push application-specific semantics to the remote side, where they can be efficiently interpreted. The other solutions don't provide that. If you need the HDF5 API, you use the HDF5 API, not SAGA.
start, stop, stride, reps correspond directly to the HDF5 semantics. So, the semantic info is indeed maintained on application level, and, as you said before, its interpretation is pushed to lower levels.
It looks like you will end up encoding the entire HDF5 API as eRead pattern strings and pushing it to the other end of a client-server connection. Again, I'm not sure if we made life easier for the remote HDF5 people.
How would that look for readv?
What I was thinking is that developers of HDF5 may have an interest in defining vector or patterned read operations at the VFD layer of their interface. This would enable them to propagate the kind of information you are attempting to encode in eRead strings down to the driver where vector-read interfaces can take advantage of them for deeper pipelining of high-latency operations. (they could, for instance, use some of the methods that Thorsten was referring to, or they could use vread/vwrite type operations).
So the issue is that 1) if you use eRead to replace the HDF5 API, then we are talking about an enormously complex string-encoding interface; 2) if you use eRead in the VFD, then you have to instrument HDF5 to propagate information about patterned reads down to the driver layer. That is of course the same thing you need if you use vread()/readp() (or any of the interfaces that Thorsten described). So I don't see much of a difference in capability there, except that vread/readp already has the information in a form that you can do I/O with. With eRead, you still have to go through and parse some strings to gain access to the same information about the pattern of reads/writes.
I did not assume that SAGA would be the right thing to use to implement an HDF5 VFD. That is not exactly the application community SAGA is targeting, I think:

  application -> HDF5 -> sagaVFD -> saga -> gridftp (or so) -> file

But I see now, and agree: on the VFD level, eRead does not buy you much compared to its pitfalls (I'm still unsure about vread, but that won't help this discussion ;-)
So its not merely that eRead is pushing complexity to a different layer... I don't see where it is reducing complexity.
Maybe we should move away from HDF5. Assume an application-specific binary file. You want subsampling. Locally you do (having seeked before):

  for ( int x = 0; x < X_MAX / 2; x++ ) {
    for ( int y = 0; y < Y_MAX / 2; y++ ) {
      for ( int z = 0; z < Z_MAX / 2; z++ ) {
        data[x][y][z] = my_file_read (x*2, y*2, z*2);
      }
    }
  }

In SAGA now, that is the same: it would call read and seek so and so often.

SAGA with readv would allow you to do:

  for ( int x = 0; x < X_MAX / 2; x++ ) {
    for ( int y = 0; y < Y_MAX / 2; y++ ) {
      for ( int z = 0; z < Z_MAX / 2; z++ ) {
        iovecs[n].iov_base = ...
        iovecs[n].iov_len  = 1;
        n++;
      }
    }
  }

  file.readv (iovecs, data, n);

SAGA with eRead would allow you to do:

  snprintf (request, 255, "downsample %d %d %d %d", offset, 2, 2, 2);
  file.eRead (request, data, n);

Shorter, but it requires an infrastructure which understands the request (well, for readv you also need a remote counterpart, but that one can be agnostic to semantics...).

readv is more POSIX-like, and more generic. It always works if you are on read level (e.g. the HDF5 VFD layer ;-). eRead is more powerful: it allows application-specific optimization which is not achievable with readv (the size of the iovecs in the read request is double the size of the data returned!).

Cheers, Andre.
On Jun 14, 2005, at 10:45 AM, Andre Merzky wrote:
readv is more POSIX-like, and more generic. It always works if you are on the read level (e.g. the HDF5 VFD layer ;-).
eRead is more powerful: it allows application specific optimizations which are not achievable with readv (the list of iovecs in the read request is twice the size of the data returned!).
OK, I think we have both arrived at the same overall conclusion.

I think eRead would be useful as a way to package an underlying complex service for implementing remote data requests. One must be able to extend the services on both the client and the server side to provide new e-modes to the user that implement these services. The vector read ops (not necessarily readv/pread, but perhaps something similar that describes patterned reads in a compact form) would be useful for other application use cases where we are not permitted (or have no desire) to touch or extend the remote service. I think readv/pread is a bit *too* restrictive, but we should have some similarly compact set of read ops that allow for gather/scatter type remote operations that do not require the service to be installed on both ends (e.g. just client side), in addition to an eRead() interface for access to two-sided services.

So I guess we have a need to do both.

For a suitable vread alternative, it would be useful to have something like

  read_pattern (descriptor, buffer, int nlogicaldims, int logicaldims[], offset[], block[], stride[]);

to specify a patterned operation. The list of iovecs[] can be used for gather operations that cannot be encoded as a regular pattern.

-john
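To make John's read_pattern() sketch concrete: the following is a rough sketch (my names, not a proposed API) of how the 1-D case of such a pattern - start offset, block length, stride, repeat count - expands into the flat (offset, length) requests that a client-side-only implementation would have to issue; the multi-dimensional case nests this per entry of logicaldims[].

```c
#include <assert.h>

/* Hypothetical expansion of a 1-D read pattern into flat (offset,
 * length) requests: starting at 'offset', read 'block' bytes every
 * 'stride' bytes, 'count' times.  Returns the number of requests
 * written into offs_out[]/lens_out[].  Illustrative only - not part
 * of the strawman API. */
int expand_pattern (long offset, long block, long stride, int count,
                    long offs_out[], long lens_out[])
{
    for (int i = 0; i < count; i++) {
        offs_out[i] = offset + (long) i * stride;  /* i-th block start */
        lens_out[i] = block;                       /* fixed block size */
    }
    return count;
}
```

Note how the pattern itself stays compact (four scalars per dimension) while the expanded request list grows with the data - which is exactly why shipping the pattern, not the list, pays off over a high-latency link.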
Quoting [John Shalf] (Jun 14 2005):
OK, I think we have both arrived at the same overall conclusion.
Yes, fortunately :-)
So I guess we have a need to do both.
For a suitable vread alternative, it would be useful to have something like read_pattern(descriptor,buffer,int nlogicaldims,int logicaldims[],offset[],block[],stride[]); to specify a patterned operation. The list of iovecs[] can be used for gather operations that cannot be encoded as a regular pattern.
I agree, see the other mail. Cheers, Andre.
On Monday 13 June 2005 19:28, John Shalf wrote:
Hi Andre, and here are numbers B.2 and E.
B.2:
You could use pitfalls to describe the clustering of reads (google for
"Remote Partial File Access Using Compact Pattern Descriptions"). It is a
compact language for describing regular subsets of files and it should at
least address one of your cons: need a protocol...
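As a minimal sketch of the clustering idea itself (this is not the pitfalls pattern language - just a flat list of (offset, length) regions served in a single round trip; all names are illustrative), the remote side of such a batched request could look like:

```c
#include <assert.h>
#include <string.h>

/* One region of the remote file requested by the client. */
struct region { long offset; long len; };

/* Serve a clustered read request against an in-memory "file" (which
 * stands in for the remote file); gathers all regions back-to-back
 * into 'out'.  Returns total bytes gathered, or -1 if a region falls
 * outside the file.  Illustrative sketch, not a real protocol. */
long list_read (const char *file, long file_len,
                const struct region *regs, int nregs, char *out)
{
    long n = 0;
    for (int i = 0; i < nregs; i++) {
        if (regs[i].offset < 0 || regs[i].offset + regs[i].len > file_len)
            return -1;                       /* region out of bounds */
        memcpy (out + n, file + regs[i].offset, regs[i].len);
        n += regs[i].len;
    }
    return n;
}
```

The whole list crosses the wire once, so the per-operation latency cost is paid a single time - but note that for fine-grained access (e.g. one element per region) the request list itself can outweigh the returned data, which is the con Andre raises against readv-style interfaces.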
E (I already wrote that on the gat-devel-list):
You could submit a process to your archive which extracts the data for you and
registers the result as a new logical file. On the client side you could wrap
it in a nice library hiding the job submission stuff and on the
server/archive side you would prepare some executables for your tasks:
extract_hyperslab_from_hdf5
I think there is a 4th possibility. If each of the I/O operations can be requested asynchronously, then you can get the same net effect as the ERET/ESTO functionality of GridFTP. The advantage of simply embedding that functionality into the higher-level concept of asynchronous calls is that if the underlying library does *not* support the async operations (or some subset of the operations cannot be performed asynchronously), you can always perform the operations synchronously and still be able to present the same interface.
I do not like plan A or B for the reasons you state. I do not like plan C because it is too tightly tied to a specific data transfer system implementation. I would propose a plan D that simply augments the Task interface of SAGA. For example, you could allow the user to fire off a number of async read operations:

  Task handle1 = channel.read();
  Task handle2 = channel.read();
  container.addTask(handle1);
  container.addTask(handle2);
  container.waitAll();
The read operations in this example can be submitted as an eRead operation, or they can run in separate threads, or they can simply be executed synchronously when you call waitAll() (this is in fact how some of the async MPI I/O was done on the first SGI Origin machines... it was meant to look asynchronous, but in fact the calls did not initiate until you did a "Wait" on them).
Anyways, using the task interface provides more degrees of freedom for implementing async I/O than simply supporting the GridFTP way of doing things, and it meshes gracefully with I/O implementations that do *not* offer an underlying async execution model.
The only modification that would be useful to add to the tasking interface is a notion of "readFrom()" and "writeTo()", which allow you to specify the file offset together with the read or write. Otherwise, the statefulness of the read() call would make the entire "task" interface useless with respect to file I/O.
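A sketch of why a stateless readFrom() matters for the task interface: with POSIX pread() the offset travels with the call, so several queued read tasks can share one file handle without racing on the shared file pointer. The rf_task / readFrom / run_two_reads names below are illustrative, not SAGA API; waitAll() here degenerates to joining the threads, and a synchronous fallback would simply call readFrom() in-line instead.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* One pending "task": which handle, where in the file, where to put it. */
struct rf_task { int fd; long off; char *buf; size_t len; };

/* The task body: a stateless positioned read - no lseek(), no shared
 * file-pointer state, hence safe to run concurrently on one fd. */
static void *readFrom (void *arg)
{
    struct rf_task *t = (struct rf_task *) arg;
    ssize_t n = pread (t->fd, t->buf, t->len, t->off);
    return (void *)(long) n;
}

/* Fire two read "tasks", then waitAll() == join both threads.
 * Returns the total number of bytes read, or -1 on open failure. */
long run_two_reads (const char *path, char *a, char *b, size_t len)
{
    int fd = open (path, O_RDONLY);
    if (fd < 0) return -1;
    struct rf_task t1 = { fd, 0,          a, len };
    struct rf_task t2 = { fd, (long) len, b, len };
    pthread_t p1, p2;
    pthread_create (&p1, NULL, readFrom, &t1);
    pthread_create (&p2, NULL, readFrom, &t2);
    void *r1, *r2;
    pthread_join (p1, &r1);
    pthread_join (p2, &r2);
    close (fd);
    return (long) r1 + (long) r2;
}
```

With a stateful read()/seek() pair the two tasks above would corrupt each other's position; that is exactly the problem John's readFrom()/writeTo() proposal avoids.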
-john
On Jun 12, 2005, at 11:02 AM, Andre Merzky wrote:
Hi again,
consider the following use case for remote IO. Given a large binary 2D field on a remote host, the client wants to access a 2D sub-portion of that field. Depending on the remote file layout, that usually requires more than one read operation, since the standard read (offset, length) is agnostic to the 2D layout.
For more complex operations (subsampling, getting a piece of a jpg file), the number of remote operations grows very fast. Latency then strongly discourages that type of remote IO (for example, fetching a 512-row subfield one row at a time over a 50 ms round trip already costs over 25 seconds in latency alone).
For that reason, I think that the remote file IO as currently specified by SAGA's Strawman will only be usable for a limited and trivial set of remote I/O use cases.
There are three (basic) approaches:
A) get the whole thing, and do ops locally
   Pro: - one remote op
        - simple logic
        - remote side doesn't need to know about file structure
        - easily implementable on application level
   Con: - getting the header info of a 1GB data file comes with, well, some overhead ;-)
B) clustering of calls: do many reads, but send them as a single request.
   Pro: - transparent to the application
        - efficient
   Con: - need to know about dependencies between reads (a header read needed to determine the size of a field), or include explicit 'flushes'
        - need a protocol to support that
        - the remote side needs to support that
C) data specific remote ops: send a high level command, and get exactly what you want.
   Pro: - most efficient
   Con: - need a protocol to support that
        - the remote side needs to support that _specific_ command
The last approach (C) is the one I have the best experiences with. Also, that is what GridFTP as a common file access protocol supports via its ERET/ESTO operations.
I want to propose including an extension along the lines of C in the File API of the strawman; it basically maps well to GridFTP, but should also map to other implementations of approach C.
That extension would look like:
  void lsEModes (out array<string> emodes );

  void eWrite   (in  string emode,
                 in  string spec,
                 in  string buffer,
                 out long   len_out );

  void eRead    (in  string emode,
                 in  string spec,
                 out string buffer,
                 out long   len_out );

  - hooks for gridftp-like opaque ERET/ESTO features
  - spec:  string for pattern as in GridFTP's ESTO/ERET
  - emode: string for ident.  as in GridFTP's ESTO/ERET
  EMode:        a specific remote I/O command supported
  lsEModes:     list the EModes available in this implementation
  eRead/eWrite: read/write data according to the emode spec
Example (in perl for brevity):
  my $file   = SAGA::File->new ("http://www.google.com/intl/en/images/logo.gif");
  my @emodes = $file->lsEModes ();

  if ( grep (/^jpeg_block$/, @emodes) ) {
    my ($buff, $len) = $file->eRead ("jpeg_block", "22x4+7+8");
  }
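The "22x4+7+8" spec string is opaque to SAGA itself; reading it as an X-geometry-style "WxH+X+Y" block (my interpretation, not something the proposal fixes), the server-side handler for a jpeg_block emode could start with a parser like this:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parser for a "WxH+X+Y" block spec as in the example
 * above: W columns by H rows, starting at column X, row Y.
 * Returns 1 on success, 0 on a malformed spec. */
int parse_block_spec (const char *spec, int *w, int *h, int *x, int *y)
{
    return sscanf (spec, "%dx%d+%d+%d", w, h, x, y) == 4;
}
```

The point of the emode design is exactly that such interpretation lives entirely on the server side: the API only ships the string through.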
I would discourage support for B, since I do not know any protocol supporting that approach efficiently, and it also needs approximately the same infrastructure setup as C.
As A is easily implementable on the application level, or within any SAGA implementation, there is no need to support it at the API level -- however, A is insufficient for all but some trivial cases.
Comments welcome :-))
Cheers, Andre.
I support this model in the more generic way I proposed. It's a more accurate model because you explicitly submit a job when you are doing (potentially) complex operations, and it allows for better performance. I think John is right when saying that eRead is tricky and hard to use; I also agree with Andre that a simpler model is even more limiting than eRead. Any other opinions? Andrei
E (I already wrote that on the gat-devel-list):
You could submit a process to your archive which extracts the data for you and registers the result as a new logical file. On the client side you could wrap it in a nice library hiding the job submission stuff and on the server/archive side you would prepare some executables for your tasks: extract_hyperslab_from_hdf5
<hyperslab> compress_file ... It shouldn't be that hard to prevent users from executing other executables on the server.
This method is async and you can use the job interface to check for the status of your conversion job.
I think E is a good model for large operations. Basically, you do data preprocessing on request. It is good for cases where you can accept a round trip time of some seconds to minutes. I think it won't help for interactive visualization though... Andre. Quoting [Andrei Hutanu] (Jun 14 2005):
I support this model in the more generic way I proposed. It's a more accurate model because you explicitly submit a job when you are doing (potentially) complex operations, and it allows for better performance.
I think John is right when saying that eRead is tricky and hard to use; I also agree with Andre that a simpler model is even more limiting than eRead. Any other opinions?
Andrei
Ah, by the way: (Grid)RPC is close to that model: just exchange the remote job for a remote procedure :-) Or a remote service request :-)) In fact, I think RPC lies just between eRead as in (C) and remote jobs as in (E). Cheers, Andre. Quoting [Andre Merzky] (Jun 14 2005):
I think E is a good model for large operations. Basically, you do data preprocessing on request. It is good for cases where you can accept a round trip time of some seconds to minutes. I think it won't help for interactive visualization though...
Andre.
On Jun 14, 2005, at 10:12 AM, Andrei Hutanu wrote:
I support this model in the more generic way I proposed. It's a more accurate model because you explicitly submit a job when you are doing (potentially) complex operations, and it allows for better performance.
I think John is right when saying that eread is tricky and hard to use, I also agree with Andre that a simpler model is even more limiting than eread.
I agree with that statement completely. I think pread/readv is a bit *too* simple. However, we should look at some more elemental interfaces for describing patterned read/write operations. I'm quite interested in following up on some of the leads that Thorsten gave us, for instance.
Any other opinions?
Andrei
E (I already wrote that on the gat-devel-list):
You could submit a process to your archive which extracts the data for you and registers the result as a new logical file. On the client side you could wrap it in a nice library hiding the job submission stuff and on the server/archive side you would prepare some executables for your tasks: extract_hyperslab_from_hdf5
<hyperslab> compress_file ... It shouldn't be that hard to prevent users from executing other executables on the server.
This method is async and you can use the job interface to check for the status of your conversion job.
I think there is a 4th possibility. If each of the I/O operations can be requested asynchronously, then you can get the same net effect as the ERET/ESTO functionality of the GridFTP. The advantage of simply embedding that functionality into the higher-level concept of asynchronous calls is that if the underlying library does *not* support the async operations (or some subset of the operations cannot be performed asynchronously) , you can always perform the operations synchronously and still be able present
I do not like plan A or B for the reasons you state. I do not like plan C because it is too tightly tied to a specific data transfer system implementation. I would propose a plan D that simply augments the Task interface of SAGA. For example, you could allow the user to fire off a number of async read operations:

  Task handle1 = channel.read ();
  Task handle2 = channel.read ();

  container.addTask (handle1);
  container.addTask (handle2);
  container.waitAll ();
The read operations in this example can be submitted as an eRead operation or they can be in separate threads, or they can simply be executed synchronously when you call waitAll() (this is in fact how some of the async MPI I/O was done on the first SGI origin machines... it was meant to look asynchronous, but in fact the calls did not initiate until you did a "Wait" for them).
Anyways, using the task interface provides more degrees of freedom for implementing async I/O than simply supporting the GridFTP way of doing things, and it meshes gracefully with I/O implementations that do *not* offer an underlying async execution model.
The only modification that would be useful to add to the tasking interface is a notion of "readFrom()" and "writeTo()", which allow you to specify the file offset together with the read. Otherwise, the statefulness of the read() call would make the entire "task" interface useless with respect to file I/O.
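Sketched in code, plan D with explicit offsets might look like the following. `Channel`, `TaskContainer`, and `read_from` are invented names mirroring the snippet above, and Python's thread pool merely stands in for whatever task machinery a SAGA implementation would use:

```python
# Sketch of "plan D": stateless offset reads fired as tasks and
# collected via a container. All names (Channel, TaskContainer,
# read_from) are hypothetical; only the pattern is from the mail.
import os
from concurrent.futures import ThreadPoolExecutor

class Channel:
    """Wraps a file; read_from() carries an explicit offset, so
    concurrent tasks do not race on a shared file position."""
    def __init__(self, path):
        self.fd = os.open(path, os.O_RDONLY)

    def read_from(self, offset, length):
        return os.pread(self.fd, length, offset)

class TaskContainer:
    def __init__(self):
        self.pool = ThreadPoolExecutor(max_workers=4)
        self.tasks = []

    def add_task(self, fn, *args):
        self.tasks.append(self.pool.submit(fn, *args))

    def wait_all(self):
        # Results come back in submission order, even if the backend
        # ran everything synchronously inside wait_all().
        return [t.result() for t in self.tasks]

if __name__ == "__main__":
    with open("/tmp/demo.dat", "wb") as f:
        f.write(b"0123456789")
    ch = Channel("/tmp/demo.dat")
    c = TaskContainer()
    c.add_task(ch.read_from, 0, 4)   # bytes 0..3
    c.add_task(ch.read_from, 6, 4)   # bytes 6..9
    print(c.wait_all())              # [b'0123', b'6789']
```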
-john
On Jun 12, 2005, at 11:02 AM, Andre Merzky wrote:
Hi again,
consider the following use case for remote I/O. Given a large binary 2D field on a remote host, the client wants to access a 2D sub-portion of that field. Depending on the remote file layout, that usually requires more than one read operation, since the standard read (offset, length) is agnostic of the 2D layout.
For more complex operations (subsampling, getting a piece of a jpg file), the number of remote operations grows very fast. Latency then strongly discourages that type of remote I/O.
For that reason, I think that the remote file I/O as specified by SAGA's Strawman as-is will only be usable for a limited and trivial set of remote I/O use cases.
There are three (basic) approaches:
A) get the whole thing, and do ops locally
   Pro: - one remote op
        - simple logic
        - remote side doesn't need to know about the file structure
        - easily implementable on application level
   Con: - getting the header info of a 1 GB data file comes with, well, some overhead ;-)
B) clustering of calls: do many reads, but send them as a single request.
   Pro: - transparent to the application
        - efficient
   Con: - need to know about dependencies between reads (a header read needed to determine the size of a field), or include explicit 'flushes'
        - need a protocol to support that
        - the remote side needs to support that
C) data-specific remote ops: send a high-level command, and get exactly what you want.
   Pro: - most efficient
   Con: - need a protocol to support that
        - the remote side needs to support that _specific_ command
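The latency argument can be made concrete with some back-of-the-envelope arithmetic. The numbers below are purely illustrative assumptions for a wide-area link, not measurements:

```python
# Illustrative latency model: reading 512 rows of a 2D sub-block,
# comparing one remote read per row (naive use of plain read) against
# one clustered / high-level request (approaches B and C).
rtt = 0.05          # assumed 50 ms round trip
rows = 512          # one read per row of the sub-block
row_bytes = 4096    # bytes per row
bandwidth = 10e6    # assumed 10 MB/s

naive     = rows * (rtt + row_bytes / bandwidth)   # one request per row
clustered = rtt + rows * row_bytes / bandwidth     # one request total

# With these numbers: naive is about 26 s, clustered about 0.26 s;
# nearly all of the naive time is round trips, not data transfer.
print(f"naive: {naive:.1f} s, clustered: {clustered:.2f} s")
```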
The last approach (C) is the one I have the best experience with. Also, that is what GridFTP, as a common file access protocol, supports via its ERET/ESTO operations.
I want to propose to include a C-like extension to the File API of the strawman, which basically maps well to GridFTP, but should also map to other implementations of approach C.
That extension would look like:
void lsEModes (out array<string> emodes  );

void eWrite   (in  string emode,
               in  string spec,
               in  string buffer,
               out long   len_out );

void eRead    (in  string emode,
               in  string spec,
               out string buffer,
               out long   len_out );

- hooks for gridftp-like opaque ERET/ESTO features
- spec:  string for pattern as in GridFTP's ESTO/ERET
- emode: string for ident.  as in GridFTP's ESTO/ERET
EMode:        a specific remote I/O command supported
lsEModes:     lists the EModes available in this implementation
eRead/eWrite: read/write data according to the emode spec
Example (in perl for brevity):
my $file   = new SAGA::File ("http://www.google.com/intl/en/images/logo.gif");
my @emodes = $file->lsEModes ();

if ( grep (/^jpeg_block$/, @emodes) ) {
  my ($buff, $len) = $file->eRead ("jpeg_block", "22x4+7+8");
}
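On the server side, an eRead implementation boils down to a dispatch table from emode names to handlers that parse the spec string. A minimal sketch follows; the registry, the "block" emode, and its handler are all hypothetical, and the `WxH+X+Y` spec syntax just follows the jpeg_block example above:

```python
# Minimal sketch of a server-side emode dispatch for eRead. The emode
# registry and the "block" handler are invented for illustration; the
# spec syntax "WxH+X+Y" follows the jpeg_block example in the mail.
import re

EMODES = {}

def emode(name):
    """Register a handler under an emode name."""
    def register(fn):
        EMODES[name] = fn
        return fn
    return register

@emode("block")
def read_block(data, spec, row_len):
    """Extract a WxH byte block at offset (X, Y) from row-major data."""
    w, h, x, y = map(int, re.match(r"(\d+)x(\d+)\+(\d+)\+(\d+)", spec).groups())
    return b"".join(data[(y + r) * row_len + x : (y + r) * row_len + x + w]
                    for r in range(h))

def e_read(data, emode_name, spec, row_len):
    if emode_name not in EMODES:        # client should check lsEModes first
        raise ValueError(f"unsupported emode: {emode_name}")
    return EMODES[emode_name](data, spec, row_len)

# 4x4 field, rows "abcd", "efgh", "ijkl", "mnop":
field = b"abcdefghijklmnop"
print(e_read(field, "block", "2x2+1+1", row_len=4))  # b'fgjk'
```

lsEModes would then simply return the keys of the registry, so the client can probe for support before issuing the call, exactly as the Perl example does.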
I would discourage support for B, since I do not know of any protocol supporting that approach efficiently, and it also needs approximately the same infrastructure setup as C.
As A is easily implementable on application level, or within any SAGA implementation, there is no need for support at API level -- however, A is insufficient for all but some trivial cases.
Comments welcome :-))
Cheers, Andre.
Quoting [John Shalf] (Jun 14 2005):
I agree with that statement completely. I think pread/readv is a bit *too* simple. However, we should look at some more elemental interfaces for describing patterned read/write operations. I'm quite interested in following up on some of the leads that Thorsten gave us for instance.
If you don't mind, I can give you a short version of the technique Thorsten refers to.

Assume you have binary data which is regularly structured, e.g. an rgb image of resolution x*y. With resolution 6*4 that looks like:

  rgb rgb rgb rgb rgb rgb
  rgb rgb rgb rgb rgb rgb
  rgb rgb rgb rgb rgb rgb
  rgb rgb rgb rgb rgb rgb

Or, as file stream:

  rgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgbrgb

each element (r, g or b) being one byte, say (any fixed element size works the same way).

Now, consider the request for a subsampled and subsetted version of that image: at offset (1,1) you want a 2*2 image with half resolution:

                 11 111 111
    012 345 678 901 234 567
  0 --- --- --- --- --- ---
  1 --- RGB --- RGB --- ---
  2 --- --- --- --- --- ---
  3 --- rgb --- rgb --- ---

(The two header rows number the bytes 0..17 of each line.)

An LS (line segment) can be used to describe a single rgb triplet to be read. For example, the first RGB above is:

  (l,r) = (3,5)

  l: left-most  byte -> 3
  r: right-most byte -> 5

A FALLS (family of line segments) can be used to describe a pattern. The line of RGBs above can be described as:

  (l,r,s,n) = (3,5,6,2)

  l: left-most  byte -> 3
  r: right-most byte -> 5
  s: stride between two consecutive l elements -> 6
  n: number of consecutive line segments -> 2

FALLS can be nested: another parameter is added to the set, which is in turn a FALLS. So the above subsampled, subsetted image would be:

  (1,1,2,2,(3,5,6,2))

That gives a sequence of FALLS, starting at line 1 (not 0), ending at line 1, repeating with stride 2, for 2 times.

You see, that maps pretty well to hyperslabs in HDF5, but fits basically all regularly structured binary data. It obviously does not work for compressed data, unstructured data, etc.

For reference, see: F. Isaila and W. Tichy. Clusterfile: A flexible physical layout parallel file system. Proceedings of IEEE Cluster Computing Conference, October 2001.

Thorsten, Andrei and I implemented that one for remote file access, and called it pread (pattern_read), which indeed worked nicely.
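Spelled out in code, a nested FALLS expands into absolute byte ranges of the flat file stream. The sketch below is my own illustration, not the Clusterfile/pread implementation: a FALLS is a tuple (l, r, s, n) with an optional nested inner FALLS, outer coordinates are in lines, inner ones in bytes within a line, and the line length is passed explicitly:

```python
# Expand a (possibly nested) FALLS into absolute (first, last) byte
# ranges of the flat file stream. Illustrative sketch only; a FALLS is
# (l, r, s, n) or (l, r, s, n, inner), with l/r in lines at the outer
# level and in bytes within a line at the inner level.
def falls_segments(falls, line_len=1):
    if len(falls) == 4:                    # innermost level: byte ranges
        l, r, s, n = falls
        return [(l + i * s, r + i * s) for i in range(n)]
    l, r, s, n, inner = falls              # outer level: line ranges
    segments = []
    for i in range(n):
        for line in range(l + i * s, r + i * s + 1):
            base = line * line_len         # byte offset of this line
            segments += [(base + a, base + b)
                         for a, b in falls_segments(inner)]
    return segments

# The subsampled 6*4 rgb image from above: lines 1 and 3, bytes 3-5
# and 9-11 within each line (18 bytes per line).
print(falls_segments((1, 1, 2, 2, (3, 5, 6, 2)), line_len=18))
# -> [(21, 23), (27, 29), (57, 59), (63, 65)]
```

The resulting segment list is exactly what a server would feed into a gather read before shipping one contiguous reply back to the client.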
Obviously, it's up to taste whether the pattern gets specified as a string or as a recursive data structure...

Cheers, Andre.
Hi List,
I went through the IO thread again, and also had a chat with
John Shalf, and I'd like to summarize the outcome of the
discussion. Please consider that as a joint proposal of
John and me for inclusion in the file IO methods.
Observations:
- normal read/write has severe drawbacks for remote IO, if
used extensively, both sync and async
- external preprocessing of data for read can be accomplished
by spawning preprocessing jobs
- async is well covered by the task model
- there exist various approaches to improve throughput
for IO intensive apps, amongst them:
- (A) gather/scatter (see readv(2))
- (B) FALLS (regular patterns on binary data)
- (C) eRead (see ERET/ESTO in GridFTP)
Remarks:
- the options A, B and C offer increasingly powerful
expressions, but also require increasing coordination
between client and server side.
- A is, being POSIX, well known
- B maps to hyperslabs pretty well, a seemingly common
access pattern
- C maps to GridFTP, a commonly used protocol, very well
Proposal:
- There seem to be advantages to each of A, B and C. Also,
the need for more than simple read seems obvious. Hence
we propose to include A, B and C in the SAGA API.
void readV (in array<ivec> ivec,
out array<string> buffers );
void writeV (in array<ivec> ivec,
in array<string> buffers );
void readP (in pattern pattern,
out string buffer,
out long len_out );
void writeP (in pattern pattern,
in string buffer,
out long len_out );
void lsEModes (out array<string> emodes  );

void readE    (in  string emode,
               in  string spec,
               out string buffer,
               out long   len_out );

void writeE   (in  string emode,
               in  string spec,
               in  string buffer,
               out long   len_out );

We think that adding these 7 calls does not bloat the API
(although it increases the number of file methods
significantly), but will make the API much more usable for
the targeted use cases.

Please comment :-)

Cheers, Andre.
Of course, I like the idea of adding pattern reads to SAGA. ;-)

At the same time I have the feeling that there must be a second document, something like "The Annotated SAGA Reference Manual", a tutorial, or sample apps written in SAGA. On the one hand you should document the ideas behind the API (why did you include readE, ....), and on the other hand you should show how to solve common problems ("see how easy it is to create a module for server-side data processing in SAGA").

Thorsten

On Friday 17 June 2005 21:34, Andre Merzky wrote:
[...]
Ah, am I seeing someone volunteering here? Great! :-D

A.

Quoting [Thorsten Schuett] (Jun 20 2005):
Of course, I like the idea adding pattern reads to saga. ;-)
At the same time I have the feeling that there must be second document. Something like the "The Annotated SAGA Reference Manual", a tutorial or sample apps written in SAGA. On the one hand you should document the ideas behind the API (why did you include readE, .... ) and on the other hand you should show how to solve common problems ("see how easy it is to create a module for server-side data processing in SAGA").
[...]
On Monday 20 June 2005 09:06, Andre Merzky wrote:
Ah, am I seeing someone volunteering here? Great! :-D

Ok, you may use the pattern library from GridLab :-p
Thorsten
:-P Quoting [Thorsten Schuett] (Jun 20 2005):
On Monday 20 June 2005 09:06, Andre Merzky wrote:
Ah, am I seeing someone volonteering here? Great! :-D Ok, you may use the pattern library from GridLab :-p
Thorsten
A.
Quoting [Thorsten Schuett] (Jun 20 2005):
Of course, I like the idea adding pattern reads to saga. ;-)
At the same time I have the feeling that there must be second document. Something like the "The Annotated SAGA Reference Manual", a tutorial or sample apps written in SAGA. On the one hand you should document the ideas behind the API (why did you include readE, .... ) and on the other hand you should show how to solve common problems ("see how easy it is to create a module for server-side data processing in SAGA").
Thorsten
On Friday 17 June 2005 21:34, Andre Merzky wrote:
Hi List,
I went through the IO thread again, and also had a chat with John Shalf, and I'd like to summarize the outcome of the discussion. Please consider that as a joint proposal of John and me for inclusion in the file IO methods.
Observations:
- normal read/write has severe drawbacks on remote IO, if used extensively, both sync and async
- external preprocessing of data for read can be accomplisehd by spawning preprocessing jobs
- async is well covered by the task model
- there exists various approaches to improve throughput for IO intensive apps, amongst them:
- (A) gather/scatter (see readv (2) - (B) FALLS (regular paterns on binary data) - (C) eRead (see ERET/ESTO in gridftp)
Remarks:
- the options A, B and C show increasing powerfull expressions, but also require increasing concertation between client and server side.
- A is, being POSIX, well known
- B maps to hyperslabs pretty well, a seemingly common access pattern
- C maps GridFTP, a commonly used protocol, very well
Proposal:
- There seem advantages to A, B and C. Also, the need for more than simple read seems obvious. Hence we propose to include A, B and C into the SAGA API.
void readV (in array<ivec> ivec, out array<string> buffers ); void writeV (in array<ivec> ivec, in array<string> buffers );
void readP (in pattern pattern, out string buffer, out long len_out ); void writeP (in pattern pattern, in string buffer, out long len_out );
void lsEModes (out array<string> emodes);
void readE  (in string emode, in string spec, out string buffer, out long len_out);
void writeE (in string emode, in string spec, in  string buffer, out long len_out);
We think that adding these 7 calls does not bloat the API (although it increases the number of file methods significantly), but will make the API much more usable for the targeted use cases.
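To illustrate how the B-style pattern calls reduce to A-style vectors, here is a sketch (Python, purely illustrative; the helper name and the row-major layout are our assumptions, not part of the proposed API): a 2D sub-block expands into one (offset, length) pair per row, which a readV-style call could ship in a single round trip.

```python
# Sketch (not the SAGA API): expand a 2D sub-block request into the
# (offset, length) vector a readV-style call could take in one request.
# Assumes a row-major binary field; all names here are illustrative.

def hyperslab_to_ivec(field_width, elem_size, x, y, w, h):
    """Return one (offset, length) pair per requested row of the block."""
    row_bytes = field_width * elem_size
    return [(row * row_bytes + x * elem_size, w * elem_size)
            for row in range(y, y + h)]

# a 4x3 block at (10, 20) in a 1024-wide field of 8-byte elements:
ivec = hyperslab_to_ivec(1024, 8, 10, 20, 4, 3)
# plain read() would need 3 round trips; readV ships all 3 pairs at once
```

The same expansion done server-side is exactly what makes readP cheaper than a sequence of plain reads: the pattern travels once, the pairs never cross the wire.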
Please comment :-)
Cheers, Andre.
Quoting [Andre Merzky] (Jun 12 2005):
Hi again,
consider the following use case for remote IO. Given a large binary 2D field on a remote host, the client wants to access a 2D sub-portion of that field. Depending on the remote file layout, that usually requires more than one read operation, since the standard read (offset, length) is agnostic to the 2D layout.
For more complex operations (subsampling, getting a piece of a jpg file), the number of remote operations grows very fast. Latency then strongly discourages that type of remote IO.
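To make the latency argument concrete, a back-of-the-envelope sketch (Python; the numbers are illustrative assumptions): with a row-major layout, each requested row of a sub-block costs one round trip, so latency alone dominates quickly.

```python
# Back-of-the-envelope sketch: cost of fetching a 2D sub-block with plain
# read(offset, length) calls.  In a row-major layout each requested row is
# one contiguous region, hence one round trip.  Numbers are illustrative.

def naive_read_ops(rows_requested):
    # one (offset, length) read per row of the sub-block
    return rows_requested

def naive_latency_ms(rows_requested, rtt_ms):
    # time spent waiting on round trips alone, ignoring bandwidth
    return naive_read_ops(rows_requested) * rtt_ms

# a 512-row sub-block over a link with 50 ms round-trip time:
print(naive_latency_ms(512, 50))  # 25600 ms of pure latency
```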
For that reason, I think that the remote file IO as currently specified by SAGA's Strawman will only be usable for a limited and trivial set of remote I/O use cases.
There are three (basic) approaches:
A) get the whole thing, and do ops locally
   Pro: - one remote op
        - simple logic
        - remote side doesn't need to know about file structure
        - easily implementable on application level
   Con: - getting the header info of a 1 GB data file comes with, well, some overhead ;-)
B) clustering of calls: do many reads, but send them as a single request.
   Pro: - transparent to application
        - efficient
   Con: - need to know about dependencies between reads (a header read needed to determine the size of a field), or include explicit 'flushes'
        - need a protocol to support that
        - the remote side needs to support that
C) data specific remote ops: send a high level command, and get exactly what you want.
   Pro: - most efficient
   Con: - need a protocol to support that
        - the remote side needs to support that _specific_ command
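To illustrate approach C, a minimal sketch (Python, purely illustrative; the emode names, spec syntax and dispatch table are assumptions, not GridFTP's actual ERET/ESTO wire protocol): the server keeps a table of named commands and applies the one the client names, driven by an opaque spec string.

```python
# Illustrative sketch of approach C: the server holds a dispatch table of
# named commands (emodes) and applies the requested one to the data, with
# the spec string interpreted per emode.  All names here are made up.

EMODES = {
    # "partial": return the byte range start:stop given in the spec
    "partial": lambda data, spec: data[slice(*map(int, spec.split(":")))],
    # "subsample": return every n-th byte, with n given in the spec
    "subsample": lambda data, spec: data[::int(spec)],
}

def e_read(data, emode, spec):
    if emode not in EMODES:
        raise ValueError("emode not supported: " + emode)
    return EMODES[emode](data, spec)

payload = bytes(range(100))
print(len(e_read(payload, "partial", "10:20")))  # 10
print(len(e_read(payload, "subsample", "4")))    # 25
```

The point of the pattern is exactly the Con listed above: the client gets precisely the bytes it wants in one round trip, but only for commands the remote side already knows.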
The last approach (C) is the one I have the best experience with. Also, that is what GridFTP as a common file access protocol supports via ERET/ESTO operations.
I want to propose including a C-like extension to the File API of the strawman, which basically maps well to GridFTP, but should also map to other implementations of approach C.
That extension would look like:
void lsEModes (out array<string> emodes);
void eWrite (in string emode, in string spec, in  string buffer, out long len_out);
void eRead  (in string emode, in string spec, out string buffer, out long len_out);
- hooks for GridFTP-like opaque ERET/ESTO features
- spec: string for pattern as in GridFTP's ESTO/ERET
- emode: string for identifier as in GridFTP's ESTO/ERET
EMode: a specific remote I/O command supported
lsEModes: list the EModes available in this implementation
eRead/eWrite: read/write data according to the emode spec
Example (in perl for brevity):
my $file = SAGA::File->new ("http://www.google.com/intl/en/images/logo.gif");
my @emodes = $file->lsEModes ();
if ( grep (/^jpeg_block$/, @emodes) ) {
  my ($buff, $len) = $file->eRead ("jpeg_block", "22x4+7+8");
}
I would discourage support for B, since I do not know of any protocol that supports that approach efficiently, and it also needs approximately the same infrastructure setup as C.
As A is easily implementable on application level, or within any SAGA implementation, there is no need for support on API level -- however, A is insufficient for all but some trivial cases.
Comments welcome :-))
Cheers, Andre.
-- +-----------------------------------------------------------------+ | Andre Merzky | phon: +31 - 20 - 598 - 7759 | | Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 | | Dept. of Computer Science | mail: merzky@cs.vu.nl | | De Boelelaan 1083a | www: http://www.merzky.net | | 1081 HV Amsterdam, Netherlands | | +-----------------------------------------------------------------+
participants (6)
- Andre Merzky
- Andrei Hutanu
- Hartmut Kaiser
- John Shalf
- Jon MacLaren
- Thorsten Schuett