find in name spaces

29 Jul 2005

      And hi again.

The current strawman API does not contain any means to find
entries in a name space (e.g. to find a specific logical
file in a replica catalog).  The only way to perform such an
operation is currently to walk through the name space 'by
foot', evaluating the output of 'list', the various
inspection methods (isFile, isDir...) and any attributes
attached to the entries.

We left out 'find' until now for the main reason that a good
and mostly complete query language seemed difficult to
define.

However, my opinion is that a _simple_ find can easily be
defined, hence this proposal for a simple find on
namespaces:

1) find on names 

   common find operations on name spaces search for name
   patterns 

   "ls *.tex"
   "ls data/*/summary.dat"

   such operations should be covered by a find.

2) find on meta data

   another common find paradigm is to find entries
   according to some meta data

   "find files named summary.dat"
   "find files owned by Fritz"
   "find files older than 3 days"
   "find files of size 1024 bytes"
   "find files larger than 1 MB"

The second set of find examples implies some knowledge about
attribute semantics, and that is where the complexity of
find operations comes in.  E.g. "find files older than 3
days" needs a comparison relation for dates and times, and a
specification of the metadata structure (how is date/time
represented?).

Part one of the problem is, in my opinion, easily dealt
with, in two ways:

  - allow shell wildcards for all 'name' arguments on the
    name space operations:

    dir.list ("*.tex");
    dir.move ("data/*/summary.dat", "/data/summary/");

  - add a simple find, which runs recursively (removing the
    burden of implementing that on application level over
    and over again)

    dir.find ("data/*.tex");
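
The difference between the two can be sketched in Python.
This is only an illustration, not part of the strawman API:
the in-memory ENTRIES list and the helper names are made up,
and I assume that '*' does not cross '/' boundaries and that
a recursive find matches the name part of the pattern at any
depth below the directory part.

```python
import fnmatch
import posixpath

# Hypothetical in-memory name space: a flat list of entry paths.
ENTRIES = [
    "paper.tex",
    "data/a/summary.dat",
    "data/b/summary.dat",
    "data/notes.tex",
    "data/a/draft.tex",
]


def match_path(pattern, path):
    """Shell-style match, component by component ('*' never crosses '/')."""
    p_parts = pattern.split("/")
    e_parts = path.split("/")
    return len(p_parts) == len(e_parts) and all(
        fnmatch.fnmatchcase(e, p) for p, e in zip(p_parts, e_parts)
    )


def list_entries(pattern, entries=ENTRIES):
    """Non-recursive: the pattern must match the whole path."""
    return sorted(e for e in entries if match_path(pattern, e))


def find(pattern, entries=ENTRIES):
    """Recursive (assumed semantics): match the name part of the pattern
    at any depth below the directory part of the pattern."""
    dir_pat, name_pat = posixpath.split(pattern)
    hits = []
    for e in entries:
        e_dir, e_name = posixpath.split(e)
        if not fnmatch.fnmatchcase(e_name, name_pat):
            continue
        # Accept the entry if some leading part of its directory
        # matches the directory part of the pattern.
        parts = e_dir.split("/") if e_dir else []
        prefixes = ["/".join(parts[:i]) for i in range(len(parts) + 1)]
        if any(match_path(dir_pat, p) for p in prefixes):
            hits.append(e)
    return sorted(hits)
```

With these assumptions, list_entries ("*.tex") only sees
"paper.tex", while find ("data/*.tex") also descends into
"data/a/".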

Part two of the problem is, as said, more difficult.
However, I think it can be coherently dealt with, in a way
which is consistent with the evolution of SAGA.

The strawman API includes the AttributeSet interface.  By
simply adding that interface to the NameSpace, and hence
adding attributes to name space entries, a find can run over
these attributes.

More complex find semantics (dates, owners etc.) can evolve
with more complex attribute types, which are planned to get
defined in the future anyway.  

BTW: That argument can also be made in reverse: attributes
     will be more useful if there is a find which can operate 
     on them.

BTW: And that same argument can be made for ACLs, which are
     still to be defined in SAGA.

As attributes are currently just key/value pairs of strings,
the name space would look like:

  interface SAGA::NSEntry : extends-all SAGA::Attributes { }
  interface SAGA::NSDir   : extends-all SAGA::NSEntry { 

    find (in  array<string,  1>  query, 
          out array<NSEntry, 1>  matches);
  }

The query would be defined as a set of key/value pairs, such
as:

  "name=data/*.tex"
  "tag=useful"
  "enabled=" (meaning: key must be present, with any value)

Note that this does not allow search for "size>2MB" or
"date>2days" etc. - again, such queries need to wait until
the attribute interface becomes more sophisticated.
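
To make the intended matching a bit more concrete, here is a
minimal Python sketch.  It is an assumption-laden
illustration, not a proposed implementation: entries are
modelled as plain dicts of string attributes (with 'name' as
just another attribute), all query terms have to match (AND),
values are compared as shell wildcards, and the helper names
are made up.  Note that fnmatch lets '*' match '/' as well,
which is a simplification.

```python
import fnmatch

# Hypothetical entries: each one is a dict of string attributes.
SAMPLE = [
    {"name": "data/notes.tex", "tag": "useful", "enabled": "yes"},
    {"name": "data/plot.dat", "tag": "useful"},
    {"name": "paper.tex", "tag": "draft", "enabled": ""},
]


def parse_query(query):
    """Turn ["key=value", ...] into a dict; an empty value means 'any value'."""
    parsed = {}
    for term in query:
        key, sep, value = term.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed query term: {term!r}")
        parsed[key] = value
    return parsed


def matches(attributes, query):
    """True if every query term is satisfied by the entry's attributes."""
    for key, pattern in parse_query(query).items():
        if key not in attributes:
            return False  # the key must be present
        if pattern and not fnmatch.fnmatchcase(attributes[key], pattern):
            return False  # the value must match the wildcard pattern
    return True


def find(entries, query):
    """Return all entries whose attributes satisfy the query."""
    return [e for e in entries if matches(e, query)]
```

For example, find (SAMPLE, ["name=data/*.tex", "tag=useful"])
returns only the "data/notes.tex" entry, and
find (SAMPLE, ["enabled="]) returns every entry which has an
'enabled' key, whatever its value.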

I would be happy to hear feedback on the list.  

 - does it make sense at all?
 - is it too simple to be of any use?
 - is it too complex to fit into SAGA?

Cheers, Andre.

PS.: That proposal does not imply that any name space
     instance has to implement arbitrary persistent 
     meta data!
     Attributes can be read only (file size), and the 
     set of attributes supported can be limited by the 
     class specification (we do the very same for the 
     JobDefinition).

-- 
+-----------------------------------------------------------------+
| Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science         | mail: merzky@cs.vu.nl       |
| De Boelelaan 1083a                | www:  http://www.merzky.net |
| 1081 HV Amsterdam, Netherlands    |                             |
+-----------------------------------------------------------------+
