Re: [SAGA-RG] URLs and wildcards

3 Dec 2007


      Hi all,

Andre Merzky wrote:
...
Hi Thilo, all,
Quoting [Thilo Kielmann] (Nov 29 2007):
...
Ceriel and I have been chatting about this issue, producing a proposal for
a solution.
Two observations (Thilo only) up front:
1. wildcards are ONLY applicable to the methods copy, link, move, and remove
   in class ns_directory, and to nothing else in the whole name space package.
And to permissions_allow / permissions_deny.
Agreed.
...
And in list and find of course, but those take strings, not
URLs, no problem here.  Here we can (and should) leave the
full wildcards IMHO.
Agreed.
...
...
2. In ns_directory, method list has a parameter pattern, while method find
   has name_pattern. These should both be "pattern". They refer to the same
   kind of thing.
Right.  The parameter for find is called name_pattern to
distinguish it from the additional attrib_pattern pattern in
the overloaded find method in the replica package...  But
yes, they are the same thing.  So, if you want to have the
same parameter name, it should be name_pattern I guess?
...
Further thoughts about URLs and wildcards.
3. In ns_directory, list and find with their "pattern" parameter actually
   refer to pathnames, relative to the current working directory (CWD).
   We should say that explicitly in the spec.
4. URLs, according to the RFC do NOT provide wildcards for files.
Hmm, a mail from me seems to have gone astray?  A while ago
in this thread I wrote:
| Quoting [Thilo Kielmann] (Nov 26 2007):
|
|| URLs, however, do not allow for wildcards, according to RFC1738.
| 
| Well, RFC1738 actually refers to wildcards explicitly, e.g. in
| Section 3.6. NEWS:
| 
|     If <newsgroup-name> is "*" (as in <URL:news:*>), it is
|     used to refer to "all available news groups".
So, unless my interpretation is wrong, I'd say that '*' is
explicitly allowed as a wildcard.
True, I think I mentioned in my original mail that '*' was OK, but
the other wildcard characters are not.
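
The distinction between '*' and the other wildcard characters can be
checked against the character classes in the BNF of RFC 1738 (Section 5).
The following is only an illustrative sketch: the helper name url_status
is made up for this example and is not part of SAGA or of any URL library.

```python
import string

# Character classes as given in the RFC 1738 BNF (Section 5).
SAFE = set("$-_.+")
EXTRA = set("!*'(),")
RESERVED = set(";/?:@&=")
UNRESERVED = set(string.ascii_letters + string.digits) | SAFE | EXTRA


def url_status(ch):
    """Classify a single character (hypothetical helper, not a SAGA call)."""
    if ch in UNRESERVED:
        return "unreserved"        # may appear unencoded in a URL
    if ch in RESERVED:
        return "reserved"          # has a special meaning in some schemes
    return "must be encoded"       # unsafe or otherwise not allowed as-is


# The usual shell wildcard characters.
for ch in "*?[]{}":
    print(repr(ch), url_status(ch))
```

Under this classification '*' is the only shell wildcard character that is
unreserved; '?' is reserved, and the bracket and brace characters fall into
the set that has to be encoded.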
...
...
(Non-)options:
a) add specific wildcards (like '*') to the URLs we use.
      This would not be conformant to the RFC, so it would no longer be URLs.
See above.
...
b) "Use" the query mechanism for http to express wildcards for files.
      While possible "in theory" this would be far from obvious, so this would
      NOT be anything "simple" to use. (remember the "S" in SAGA)
Yep, I agree.
...
c) Wildcard characters could be brought into URLs by %-escape sequences.
      Argument as with query: non-intuitive, not simple for the user.
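
To illustrate why %-escaping is not intuitive, here is a small sketch using
Python's standard urllib.parse module (the choice of Python and the helper
name escape_pattern are assumptions made for this example only). Note that
quote() also escapes '*', which is stricter than RFC 1738 requires.

```python
from urllib.parse import quote, unquote


def escape_pattern(pattern):
    """Percent-escape every character outside quote()'s always-safe set.

    Hypothetical helper for illustration; not part of the SAGA API.
    """
    return quote(pattern, safe="")


for pattern in ["*.doc", "image.?pg", "data_[a-z].bin"]:
    escaped = escape_pattern(pattern)
    # The escaping is reversible, so no information is lost ...
    assert unquote(escaped) == pattern
    # ... but the escaped form is hard to read as a wildcard pattern.
    print(pattern, "->", escaped)
```

A user would have to write something like data_%5Ba-z%5D.bin to mean
data_[a-z].bin.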
I agree.  Other options would be (also from my previous
mail):
| And here are two other options actually for dealing with
| wildcards:
|
|  - allow only *, not the full blown shell wildcards
|
|  - or use different characters for wildcards, e.g.
|
|    data_[a-z].bin -> data_((a-z)).bin
|    image.?pg      -> image01.#pg
|
| I would find the second one slightly confusing, but an
| option it is.
...
Summary: we MUST NOT introduce file wildcards to URLs.
Hhmmmm... ;-)
...
This leaves us (IOHO - Ceriel and me) with two possible options for wildcards
for namespace entries (as expressed for operations on ns_directories):
A. Have an additional method expand that takes a string parameter describing
   a pathname, relative to the CWD, (possibly) containing POSIX-style shell
   wildcards.
   expand() has an output parameter, an array of URLs, the expansion.
   In addition to expand(), we add versions of the methods
   copy, link, move, and remove from ns_directory that accept arrays of URLs
   instead of single URLs. (If we do not add these versions, we force the
   users to resort to bulk execution of tasks for a simple thing like
   "remove *.doc")
B. Add versions of the methods copy, link, move, and remove from ns_directory
   that accept a string parameter describing a pathname, relative to the CWD,
   (possibly) containing POSIX-style shell wildcards.
C.   - allow * as wildcard in URLs (in the path element part)
     - allow normal wildcards for the string pattern in list and find
     - for all other wildcards ([a-z], ?, {one,two,three}) use
        expand(), and require user level loops over the result.
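
The difference between A and B can be sketched with a toy, in-memory
directory. Everything here is an assumption for illustration: the class
NsDirectory and the method names remove_urls and remove_pattern are made
up, Python's fnmatch stands in for POSIX-style matching (it handles *, ?
and [a-z], but not {one,two,three}), and only entries directly inside the
directory are considered.

```python
import fnmatch
import posixpath


class NsDirectory:
    """Toy in-memory stand-in for ns_directory (not the SAGA API)."""

    def __init__(self, url, entries):
        self.url = url.rstrip("/")     # URL of the directory, i.e. the CWD
        self.entries = set(entries)    # names of the entries inside it

    def expand(self, pattern):
        """Option A: expand a CWD-relative wildcard string into URLs."""
        names = sorted(n for n in self.entries
                       if fnmatch.fnmatchcase(n, pattern))
        return [self.url + "/" + name for name in names]

    def remove_urls(self, urls):
        """Option A: version of remove that accepts an array of URLs."""
        for url in urls:
            self.entries.discard(posixpath.basename(url))

    def remove_pattern(self, pattern):
        """Option B: version of remove that accepts the wildcard string."""
        self.remove_urls(self.expand(pattern))


d = NsDirectory("file://localhost/data/", ["a.doc", "b.doc", "c.txt"])

# Option A: two steps, expand first and then pass the URL list on.
urls = d.expand("*.doc")
print(urls)
d.remove_urls(urls)

# Option B: one step, the wildcard string goes straight to the method.
d.remove_pattern("*.txt")
print(sorted(d.entries))
```

With A the caller handles the intermediate URL list; with B the expansion
is hidden inside the method that accepts the pattern.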
A and B both have the problem of bloat -- not too badly though (6 calls).
B: why the limitation to relative path names?
Not really needed, indeed, but conceptually, wildcard expansion operates
on a directory, and we are talking about methods on directories here.
...
...
Comparing both options, Ceriel and myself are in favour of B.
It comes with fewer methods and a simpler and more obvious-to-use interface.
I vote for C *blush*.
...
A is a very indirect solution where a user first has to build a list of URLs
from a wildcard string, and then has to pass this list of URLs to, e.g., copy.
Agree.
...
With B, the user can directly pass the wildcard string to, e.g., copy.
The "trick" is that the string is restricted in its expressiveness, namely to
pathnames relative to the CWD.
In favour of C: '*' is probably the most commonly used
wildcard, so allowing it in the standard URL calls would
help a lot.  As for the other wildcards, a detour via expand
does not sound too bad anymore...
I can live with C :-) although it is a bit of an ad-hoc solution.
I like B a bit better, because it is more explicit about which methods
accept wildcards.

Ceriel

Ceriel Jacobs