
Hi all, Andre Merzky wrote:
Hi Thilo, all,
Quoting [Thilo Kielmann] (Nov 29 2007):
Ceriel and I have been chatting about this issue, producing a proposal for a solution.
Two observations (Thilo only) up front:
1. wildcards are ONLY applicable to the methods copy, link, move, and remove in class ns_directory, and to nothing else in the whole name space package.
And to permissions_allow / permissions_deny.
Agreed.
And in list and find of course, but those take strings, not URLs, no problem here. Here we can (and should) leave the full wildcards IMHO.
Agreed.
2. In ns_directory, method list has a parameter pattern, while method find has name_pattern. These should both be "pattern". They refer to the same kind of thing.
Right. The parameter for find is called name_pattern to distinguish it from the additional attrib_pattern pattern in the overloaded find method in the replica package... But yes, they are the same thing. So, if you want to have the same parameter name, it should be name_pattern I guess?
Further thoughts about URLs and wildcards.
3. In ns_directory, list and find with their "pattern" parameter actually refer to pathnames, relative to the current working directory (CWD). We should say that explicitly in the spec.
4. URLs, according to the RFC, do NOT provide wildcards for files.
Hmm, a mail from me seems to have gone astray? A while ago in this thread I wrote:
| Quoting [Thilo Kielmann] (Nov 26 2007):
|
|| URLs, however, do not allow for wildcards, according to RFC1738.
|
| Well, RFC1738 actually refers to wildcards explicitly, e.g. in
| Section 3.6. NEWS:
|
|   If <newsgroup-name> is "*" (as in <URL:news:*>), it is
|   used to refer to "all available news groups".

So, unless my interpretation is wrong, I'd say that '*' is explicitly allowed as a wildcard.
True, I think I mentioned in my original mail that '*' was OK, but the other wildcard characters are not.
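
To make the distinction between '*' and the other wildcard characters concrete, here is a small sketch based on my reading of RFC 1738, Section 2.2. The function name rfc1738_status is made up for illustration, and anything that is neither in the always-allowed set nor reserved is simply lumped together as "unsafe" (must be %-encoded).

```python
import string

# Characters RFC 1738 (Section 2.2) allows unencoded anywhere in a URL:
# alphanumerics plus the specials "$-_.+!*'(),".
ALWAYS_ALLOWED = set(string.ascii_letters + string.digits + "$-_.+!*'(),")

# Reserved characters: allowed only when used for their reserved purpose.
RESERVED = set(";/?:@=&")


def rfc1738_status(ch):
    """Classify one character for unencoded use in an RFC 1738 URL."""
    if ch in ALWAYS_ALLOWED:
        return "allowed"
    if ch in RESERVED:
        return "reserved"
    # Everything else (e.g. "[", "]", "{", "}") has to be %-encoded.
    return "unsafe"


# The usual shell wildcard characters.
for ch in "*?[]{}":
    print(repr(ch), rfc1738_status(ch))
```

Under this reading, '*' may appear as-is, '?' is taken by the query separator, and the bracket and brace characters cannot appear unencoded at all.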
(Non-)options:
a) add specific wildcards (like '*') to the URLs we use. This would not be conformant to the RFC, so they would no longer be URLs.
See above.
b) "Use" the query mechanism for http to express wildcards for files. While possible "in theory" this would be far from obvious, so this would NOT be anything "simple" to use. (remember the "S" in SAGA)
Yep, I agree.
c) Wildcard characters could be brought into URLs by %-escape sequences. Argument as with query: non-intuitive, not simple for the user.
I agree. Other options would be (also from my previous mail):
| And here are two other options actually for dealing with
| wildcards:
|
| - allow only *, not the full-blown shell wildcards
|
| - or use different characters for wildcards, e.g.
|
|     data_[a-z].bin -> data_((a-z)).bin
|     image.?pg      -> image01.#pg
|
| I would find the second one slightly confusing, but an
| option it is.
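
For illustration only, the second option (different characters for wildcards) could be mapped back to shell-style patterns roughly as below. The helper to_shell_pattern and the sample names are hypothetical; Python's fnmatch stands in for the matching and covers *, ? and [...] but not {a,b} alternatives.

```python
import fnmatch
import re


def to_shell_pattern(alt):
    """Map the alternative notation back to shell wildcards (hypothetical helper)."""
    pattern = re.sub(r"\(\((.*?)\)\)", r"[\1]", alt)  # ((a-z)) -> [a-z]
    return pattern.replace("#", "?")                  # #       -> ?


# Made-up directory content, just to have something to match against.
names = ["data_a.bin", "data_1.bin", "image.jpg", "image.png"]

shell = to_shell_pattern("data_((a-z)).bin")
print(shell)
print(fnmatch.filter(names, shell))
print(fnmatch.filter(names, to_shell_pattern("image.#pg")))
```

The translation is mechanical, but a user has to learn a second wildcard notation, which is where the "slightly confusing" part comes from.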
Summary: we MUST NOT introduce file wildcards to URLs.
Hhmmmm... ;-)
This leaves us (IOHO - Ceriel and me) with two possible options for wildcards for namespace entries (as expressed for operations on ns_directories):
A. Have an additional method expand that takes a string parameter describing a pathname, relative to the CWD, (possibly) containing POSIX-style shell wildcards. expand() has an output parameter, an array of URLs, the expansion.
In addition to expand(), we add versions of the methods copy, link, move, and remove from ns_directory that accept arrays of URLs instead of single URLs. (If we do not add these versions, we force the users to resort to bulk execution of tasks for a simple thing like "remove *.doc")
B. Add versions of the methods copy, link, move, and remove from ns_directory that accept a string parameter describing a pathname, relative to the CWD, (possibly) containing POSIX-style shell wildcards.
C. - allow * as wildcard in URLs (in the path element part)
   - allow normal wildcards for the string pattern in list and find
   - for all other wildcards ([a-z], ?, {one,two,three}) use expand(), and require user-level loops over the result.
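
To compare how A and B would feel from the caller's side, here is a rough sketch using a toy in-memory directory. None of this is the SAGA API: the class NSDirectory, the method names remove_urls and remove_pattern, and the file://localhost/data URL are assumptions for illustration, and fnmatch only approximates POSIX shell wildcards.

```python
import fnmatch


class NSDirectory:
    """Toy in-memory stand-in for ns_directory; not the SAGA API."""

    def __init__(self, url, names):
        self.url = url.rstrip("/")
        self.names = set(names)

    # Option A, step 1: expand a CWD-relative pattern into a list of URLs.
    def expand(self, pattern):
        matches = sorted(fnmatch.filter(self.names, pattern))
        return [f"{self.url}/{name}" for name in matches]

    # Option A, step 2: a remove variant that accepts a list of URLs.
    def remove_urls(self, urls):
        for url in urls:
            self.names.discard(url.rsplit("/", 1)[-1])

    # Option B: a remove variant that accepts the pattern string directly.
    def remove_pattern(self, pattern):
        self.remove_urls(self.expand(pattern))


cwd_a = NSDirectory("file://localhost/data", ["a.doc", "b.doc", "c.txt"])
cwd_a.remove_urls(cwd_a.expand("*.doc"))  # option A: two calls

cwd_b = NSDirectory("file://localhost/data", ["a.doc", "b.doc", "c.txt"])
cwd_b.remove_pattern("*.doc")             # option B: one call

print(sorted(cwd_a.names), sorted(cwd_b.names))
```

Both variants end up in the same state; the difference is only whether the caller has to handle the intermediate list of URLs.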
A and B both have the problem of bloat -- not too badly though (6 calls).
B: why the limitation to relative path names?
Not really needed, indeed, but conceptually, wildcard expansion operates on a directory, and we are talking about methods on directories here.
Comparing both options, Ceriel and I are in favour of B. It comes with fewer methods and a simpler and more obvious-to-use interface.
I vote for C *blush*.
A is a very indirect solution where a user first has to build a list of URLs from a wildcard string, and then has to pass this list of URLs to, e.g., copy.
Agree.
With B, the user can directly pass the wildcard string to, e.g., copy. The "trick" is that the string is restricted in its expressiveness, namely to pathnames relative to the CWD.
What speaks for C is that '*' is probably the most commonly used wildcard, so allowing it in the standard URL calls would help a lot. As for the other wildcards, a detour via expand does not sound too bad anymore...
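
A rough sketch of what C could look like in use, again on a toy in-memory directory. The class and method names and the URLs are assumptions, not the SAGA API; here remove treats only '*' in the last path element as a wildcard, and everything else goes through expand plus a user-level loop.

```python
import fnmatch
import posixpath
import re
from urllib.parse import urlsplit


class NSDirectory:
    """Toy in-memory stand-in for ns_directory; not the SAGA API."""

    def __init__(self, url, names):
        self.url = url.rstrip("/")
        self.names = set(names)

    def expand(self, pattern):
        """Full shell-style pattern (string, relative to CWD) -> list of URLs."""
        matches = sorted(fnmatch.filter(self.names, pattern))
        return [f"{self.url}/{name}" for name in matches]

    def remove(self, url):
        """URL argument; only '*' in the last path element acts as a wildcard."""
        name = posixpath.basename(urlsplit(url).path)
        # Escape every piece between the '*'s so nothing else is special.
        regex = ".*".join(re.escape(part) for part in name.split("*"))
        for entry in [n for n in self.names if re.fullmatch(regex, n)]:
            self.names.discard(entry)


d = NSDirectory("file://localhost/data",
                ["a.doc", "b.doc", "img1.jpg", "img2.jpg", "imgs.jpg"])

d.remove("file://localhost/data/*.doc")  # '*' directly in the URL

for url in d.expand("img[0-9].jpg"):     # other wildcards: expand, then loop
    d.remove(url)

print(sorted(d.names))
```

The common "*.doc" case stays a single call, while the less common patterns cost one extra line for the loop.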
I can live with C :-) although it is a bit of an ad-hoc solution. I like B a bit better, because it is more explicit about which methods accept wildcards.

Ceriel