[SAGA-RG] URLs and wildcards (was: More confusion)

29 Nov 2007

Ceriel and I have been discussing this issue and have come up with a proposal
for a solution.

Two observations (Thilo only) up front:

1. Wildcards are ONLY applicable to the methods copy, link, move, and remove
   in class ns_directory, and to nothing else in the whole name space package.

2. In ns_directory, method list has a parameter pattern, while method find
   has name_pattern. Both should be called "pattern", since they refer to the
   same kind of thing.

Further thoughts about URLs and wildcards:

3. In ns_directory, list and find with their "pattern" parameter actually
   refer to pathnames, relative to the current working directory (CWD).
   We should say that explicitly in the spec.

4. URLs, according to the RFC, do NOT provide wildcards for files.
   (Non-)options:

   a) Add specific wildcards (like '*') to the URLs we use.
      This would not be conformant to the RFC, so these would no longer be
      URLs.
   b) "Use" the query mechanism of HTTP to express wildcards for files.
      While possible "in theory", this would be far from obvious, so it would
      NOT be anything "simple" to use. (Remember the "S" in SAGA.)
   c) Bring wildcard characters into URLs via %-escape sequences.
      Same argument as for the query: non-intuitive, not simple for the user.

   Summary: we MUST NOT introduce file wildcards to URLs.
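
To make non-option (c) concrete, here is a minimal sketch in Python, using
only the standard urllib.parse module (nothing SAGA-specific). The host name
and path are made up for illustration.

```python
from urllib.parse import quote, unquote

pattern = "*.doc"

# Percent-encode the wildcard character so it can be embedded in a URL.
escaped = quote(pattern)

# Hypothetical URL; host and directory are placeholders, not from the spec.
url = "sftp://host.example/data/" + escaped

# Decoding recovers the original string, but nothing in the URL itself
# says whether the '*' is meant as a wildcard or as a literal file name.
decoded = unquote(escaped)
```

The user would have to write "%2A.doc" to mean "*.doc", and after decoding
the string is indistinguishable from a file literally named "*.doc", which is
the ambiguity behind the "non-intuitive" argument.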

This leaves us (IOHO - Ceriel and me) with two possible options for wildcards
for namespace entries (as expressed for operations on ns_directories):

A. Have an additional method expand that takes a string parameter describing
   a pathname, relative to the CWD, (possibly) containing POSIX-style shell
   wildcards.
   expand() has an output parameter, an array of URLs, the expansion.

   In addition to expand(), we add versions of the methods 
   copy, link, move, and remove from ns_directory that accept arrays of URLs
   instead of single URLs. (If we do not add these versions, we force the
   users to resort to bulk execution of tasks for a simple thing like
   "remove *.doc")

B. Add versions of the methods copy, link, move, and remove from ns_directory
   that accept a string parameter describing a pathname, relative to the CWD,
   (possibly) containing POSIX-style shell wildcards.
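
As a rough illustration of how the two options differ for the caller, here is
a toy sketch in Python. ToyDirectory, expand, remove_urls, and remove_pattern
are hypothetical names on a flat in-memory stand-in; this is not the SAGA API,
and Python's fnmatch is only an approximation of POSIX shell wildcards.

```python
from fnmatch import fnmatchcase


class ToyDirectory:
    """Flat in-memory stand-in for ns_directory (hypothetical, not SAGA)."""

    def __init__(self, base_url, names):
        self.base_url = base_url.rstrip("/") + "/"
        self.names = set(names)

    # Option A, step 1: expand a wildcard pathname into a list of URLs.
    def expand(self, pattern):
        return [self.base_url + n
                for n in sorted(self.names) if fnmatchcase(n, pattern)]

    # Option A, step 2: a version of remove that accepts an array of URLs.
    def remove_urls(self, urls):
        for url in urls:
            if url.startswith(self.base_url):
                self.names.discard(url[len(self.base_url):])

    # Option B: a version of remove that accepts the wildcard string itself.
    def remove_pattern(self, pattern):
        self.names = {n for n in self.names if not fnmatchcase(n, pattern)}


files = ["a.doc", "b.doc", "notes.txt"]

# Option A: two calls, with a list of URLs passed between them.
dir_a = ToyDirectory("file://localhost/home/user", files)
expanded = dir_a.expand("*.doc")
dir_a.remove_urls(expanded)

# Option B: one call, equivalent to "remove *.doc".
dir_b = ToyDirectory("file://localhost/home/user", files)
dir_b.remove_pattern("*.doc")
```

Both variants leave only notes.txt behind; the difference is purely in how
many methods exist and how many steps the user has to take.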

Comparing both options, Ceriel and I are in favour of B.
It comes with fewer methods and a simpler, more obvious-to-use interface.

A is a very indirect solution where a user first has to build a list of URLs
from a wildcard string, and then has to pass this list of URLs to, e.g., copy.
With B, the user can directly pass the wildcard string to, e.g., copy.
The "trick" is that the string is restricted in its expressiveness, namely to
pathnames relative to the CWD.
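
One way such a restriction could be checked is sketched below in Python. The
function name and the exact rules (no "://", no leading "/") are assumptions
made for illustration; the proposal does not define them, and it says nothing
yet about cases such as ".." components.

```python
def is_relative_wildcard_path(s: str) -> bool:
    """Hypothetical check: accept only pathnames relative to the CWD."""
    if not s:
        return False      # empty string names nothing
    if "://" in s:
        return False      # looks like a URL, not a pathname
    if s.startswith("/"):
        return False      # absolute pathname, not relative to the CWD
    return True           # relative pathname, wildcards allowed


accepted = [p for p in ["*.doc", "sub/*.txt", "/tmp/*.doc",
                        "file://localhost/tmp/*.doc"]
            if is_relative_wildcard_path(p)]
```

Under these assumed rules, only "*.doc" and "sub/*.txt" would be accepted by
the wildcard versions of copy, link, move, and remove; anything URL-shaped
would have to go through the existing URL-based versions.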

Any opinions on the proposal to implement option B?

Thilo
-- 
Thilo Kielmann                                 http://www.cs.vu.nl/~kielmann/