
And to permissions_allow / permissions_deny.
Yep.
And in list and find, of course, but those take strings, not URLs, so there is no problem here. Here we can (and should) keep the full wildcards, IMHO.
Yes, but that is an unrelated story.
2. In ns_directory, the list method has a parameter pattern, while the find method has name_pattern. These should both be called "pattern", since they refer to the same kind of thing.
Right. The parameter for find is called name_pattern to distinguish it from the additional attrib_pattern parameter of the overloaded find method in the replica package... But yes, they are the same thing. So, if you want to have the same parameter name, it should be name_pattern, I guess?
OK.
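To make that concrete, something like the following minimal Python sketch (NOT the actual SAGA binding; the NSDirectory class and its in-memory entries are made up purely for illustration) would give list() and find() the same name_pattern parameter, with full POSIX-shell wildcard matching on plain string names:

    # Minimal sketch, not the actual SAGA binding: NSDirectory and its
    # in-memory entries are hypothetical, just to show both calls sharing
    # one name_pattern parameter with full POSIX-shell wildcard matching.
    import fnmatch

    class NSDirectory:
        def __init__(self, entries):
            self._entries = list(entries)   # hypothetical entry names (plain strings)

        def list(self, name_pattern="*"):
            # full shell wildcards: '*', '?', '[a-z]' all work on the string names
            return [e for e in self._entries if fnmatch.fnmatchcase(e, name_pattern)]

        def find(self, name_pattern):
            # same matching semantics as list(); recursion etc. omitted in this sketch
            return self.list(name_pattern)

    d = NSDirectory(["data_a.bin", "data_b.bin", "image.jpg", "notes.txt"])
    print(d.list("data_[a-z].bin"))   # ['data_a.bin', 'data_b.bin']
    print(d.find("*.txt"))            # ['notes.txt']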
Further thoughts about URLs and wildcards.
Another take on "why NOT having wildcards in URLs denoting files and directories":

1. The reason for having wildcards in the first place is to have something with the "look and feel" of POSIX shell wildcards in SAGA calls. ==> Everything that contradicts this look-and-feel is to be ruled OUT.
1.a. This means that all character sequences requiring octet-encoding of wildcard characters are OUT.
1.b. This further means that everything that cannot be used in a straightforward way is OUT (meaning: everything that is NOT simple to use).

2. When using URLs, we MUST conform to RFC 1738.

Let's look into RFC 1738 (http://www.ietf.org/rfc/rfc1738.txt):

    2.2 URL Character Encoding Issues

    Unsafe: Other characters are unsafe ... These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`". All unsafe characters must always be encoded within a URL.

    Reserved: The characters ";", "/", "?", ":", "@", "=" and "&" are the characters which may be reserved for special meaning within a scheme.

    Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

Let's look into reserved characters per protocol:

    FTP:  Within a name or CWD component, the characters "/" and ";" are reserved and must be encoded.
    HTTP: Within the <path> and <searchpart> components, "/", ";", "?" are reserved.
    file: (no reserved characters mentioned)

Aside: the use of '*' in the NEWS scheme is irrelevant here, because it only applies to NNTP news, NOT to files or directories.

Summary: POSIX shell-like wildcards in URLs:

- some characters, like "[" and "]", must always be encoded
- other characters MUST be encoded or not, depending on the protocol

This means we can NOT provide wildcards in URLs in an intuitive, obvious-to-use (e.g., protocol-independent) way without violating RFC 1738. We could, however, restrict ourselves to the '*' wildcard only, but that is a very limited form of wildcards; although frequently used, it is not really worth being called "POSIX shell wildcards".
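As a quick illustration of the encoding problem, here are a few made-up patterns percent-encoded the way any URL library would before they can go into a URL path (Python's quote() follows the newer RFC 3986 rules, but the effect on these characters is the same as under RFC 1738):

    # Made-up patterns, percent-encoded before they could go into a URL path.
    from urllib.parse import quote

    for pattern in ["data_[a-z].bin", "image.?pg", "*.txt"]:
        # '*' is among RFC 1738's unencoded "extra" characters, so keep it as-is
        print(pattern, "->", quote(pattern, safe="*"))

    # data_[a-z].bin -> data_%5Ba-z%5D.bin   ('[' and ']' are unsafe, always encoded)
    # image.?pg -> image.%3Fpg               ('?' is reserved, e.g. in HTTP paths)
    # *.txt -> *.txt                         (only '*' survives in readable form)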
Hmm, a mail from me seems to have gone astray? A while ago in this thread I wrote:
So, unless my interpretation is wrong, I'd say that '*' is explicitly allowed as a wildcard.
Your interpretation IS wrong (see above: this is ONLY applicable to NNTP).
| And here are two other options actually for dealing with
| wildcards:
|
| - allow only *, not the full-blown shell wildcards
Too limited (see above).
|
| - or use different characters for wildcards, e.g.
|
|   data_[a-z].bin -> data_((a-z)).bin
|   image.?pg -> image01.#pg
Not just slightly but strongly confusing, no no.
B: why the limitation to relative path names?
Idea: keep URLs for absolute, global identifiers. Have strings with POSIX shell wildcards as local names, relative to the directory the operation is working on.
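Roughly, that split could look like the following hypothetical sketch (names and URLs are made up, this is not the SAGA API): arguments that are URLs are exact, absolute identifiers, while string arguments are patterns matched relative to the directory the object represents.

    # Hypothetical sketch of the proposed split, not the SAGA API.
    import fnmatch

    class Directory:
        def __init__(self, url, entries):
            self.url = url                    # absolute identifier, no wildcards
            self._entries = list(entries)     # hypothetical local entry names

        def copy(self, source_url, target_url):
            # URLs are taken literally; a '*' in a URL would be just a character
            print("copy", source_url, "->", target_url)

        def list(self, pattern="*"):
            # strings get full POSIX-shell wildcard semantics, relative to self.url
            return [e for e in self._entries if fnmatch.fnmatchcase(e, pattern)]

    d = Directory("gridftp://host.example.org/data/", ["run1.dat", "run2.dat", "log.txt"])
    print(d.list("run?.dat"))                 # ['run1.dat', 'run2.dat']
    d.copy("gridftp://host.example.org/data/run1.dat", "file://localhost/tmp/run1.dat")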
In favor of C is that '*' is probably the most commonly used wildcard, so supporting just that one in the standard URL calls would already help a lot. As for the other wildcards, a detour via expand does not sound too bad anymore...
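That "detour via expand" could be something like the following sketch, assuming a hypothetical expand() helper that is not part of the current spec text: a relative wildcard string is resolved into concrete, wildcard-free URLs that the ordinary URL-based calls can then accept.

    # Hedged sketch of the "detour via expand"; expand() is hypothetical.
    import fnmatch

    def expand(base_url, entries, pattern):
        # resolve a shell-style pattern against known entry names into full URLs
        base = base_url.rstrip("/") + "/"
        return [base + name for name in entries if fnmatch.fnmatchcase(name, pattern)]

    entries = ["data_a.bin", "data_b.bin", "image.jpg"]
    for url in expand("gridftp://host.example.org/data", entries, "data_[a-z].bin"):
        print(url)    # e.g. gridftp://host.example.org/data/data_a.bin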
It leaves us with the feeling of a "hack", while we could also have a clean solution: URLs without wildcards, and relative strings with wildcards.

Thilo

--
Thilo Kielmann
http://www.cs.vu.nl/~kielmann/