Re: [SAGA-RG] spec

Andre Merzky

12 Dec 2007

8:23 p.m.

Quoting [Thilo Kielmann] (Dec 12 2007):

...

On Tue, Dec 11, 2007 at 11:37:16PM +0100, Andre Merzky wrote:

...
From: Andre Merzky <andre@merzky.net>
To: Shantenu Jha <sjha@cct.lsu.edu>, Thilo Kielmann <kielmann@cs.vu.nl>
Subject: spec

[...]

So, I'll go ahead and submit the spec tomorrow, ok? :-)

Veto!!!

The spec is NOT OK as the wildcard thing is not resolved.

What's the solution???

Thilo

Ah, right, that one... That indeed needs resolving. I'll post this answer to the list again, then.

I guess it's 'your' way (B), as there were two votes (Ceriel, Thilo) against one (me). Hartmut voted for your suggestion as well, but on a choice which did not include 'my' version (C). So, either we just go with B (which I don't like too much, but hey... ;-), or we do another straw poll, which is boring. Not a win-win situation IMHO :-P

Ok, proposal: I summarize the options and their pros and cons below, and you and all list members have a chance to do the right thing and vote 'C' until tomorrow evening. If nothing happens, 'B' is the one we will use.

Options:

(B) Add versions of the methods copy, link, move, and remove from ns_directory that accept a string parameter describing a pathname, relative to the CWD, (possibly) containing POSIX-style shell wildcards

Correction 1: also for permission_allow/permission_deny
Correction 2: remove the limitation to relative paths

(C) Allow '*' as wildcard in URLs (in the path element part). Add an expand method for more elaborate wildcard expansion, which accepts a string and returns a list of URLs. Looping is in user space.

In both cases, wildcards for attribs, find etc. (which all are string patterns) are unaffected. In both cases, wildcards should only be allowed in the path elements of the URLs (i.e. not in the scheme, host, port, auth or query elements of the URL).

I am not sure if we came to a closure on 'Correction 2', but I think that Ceriel's answer ("if you need abs paths, then do a 'cd()' to somewhere where you can use rel paths") is a rather inelegant workaround for a limitation which is not too well motivated, IMHO.

So, please raise your voice now or hold your peace for evermore!

Cheers, Andre.

PS.: sorry for this biased mail ;-)

-- No trees were destroyed in the sending of this message, however, a significant number of electrons were terribly inconvenienced.
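Option (C) can be illustrated with a small sketch. The Python below is not SAGA code: `expand`, `path_matches`, and the `known_urls` list (a stand-in for a real namespace listing) are assumptions made purely for illustration. It honours wildcards only in the path element of the URL and leaves the looping to the caller.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def path_matches(path, pattern):
    """Match component by component so that '*' never crosses a '/'."""
    have, want = path.split("/"), pattern.split("/")
    return len(have) == len(want) and all(
        fnmatchcase(h, w) for h, w in zip(have, want)
    )


def expand(pattern_url, known_urls):
    """Hypothetical expand(): return the known URLs matched by pattern_url.

    Wildcards are honoured in the path element only; scheme, host/port
    and query have to be literally equal.
    """
    pat = urlsplit(pattern_url)
    hits = []
    for url in known_urls:
        cand = urlsplit(url)
        same_place = (cand.scheme, cand.netloc, cand.query) == (
            pat.scheme, pat.netloc, pat.query)
        if same_place and path_matches(cand.path, pat.path):
            hits.append(url)
    return hits


# Stand-in for a directory listing (illustrative values, not real data).
known_urls = [
    "http://localhost/tmp/a.bin",
    "http://localhost/tmp/b.bin",
    "http://localhost/tmp/c.txt",
    "http://otherhost/tmp/d.bin",
]

# Looping happens in user space, as option (C) proposes.
for url in expand("http://localhost/tmp/*.bin", known_urls):
    print("would copy", url)
```

Note that `fnmatchcase` also interprets '?' and '[...]', which goes beyond the plain '*' that option (C) would allow inside URLs; a stricter sketch would reject those characters first.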

Ceriel Jacobs

13 Dec

8:39 a.m.

New subject: spec

Andre Merzky wrote:

...

Quoting [Thilo Kielmann] (Dec 12 2007):

...
On Tue, Dec 11, 2007 at 11:37:16PM +0100, Andre Merzky wrote:

...
From: Andre Merzky <andre@merzky.net>
To: Shantenu Jha <sjha@cct.lsu.edu>, Thilo Kielmann <kielmann@cs.vu.nl>
Subject: spec

[...]

So, I'll go ahead and submit the spec tomorrow, ok? :-)

Veto!!!

The spec is NOT OK as the wildcard thing is not resolved.

What's the solution???

Thilo

Ah, right, that one... That indeed needs resolving. I post this answer to the list again then.

I guess it's 'your' way (B), as there were two votes (Ceriel, Thilo) against one (me).

Well, I was not aware that I have an official vote here! I have a slight preference for B, but am also the origin of the expand method from 'your' version C. I don't really care, as long as the issue is resolved.

...

I am not sure if we came to a closure on 'Correction 2', but I think that Ceriel's answer ("if you need abs paths, then do a 'cd()' to somewhere where you can use rel paths") is a rather inelegant workaround for a limitation which is not too well motivated, IMHO.

I did not write this to defend rel paths, but to point out that it could be done, where you said that it could not be done.

Anyway, since the subject of this mail is "spec", here are a couple more small issues, which are easily fixed:

- rpc package: the parameter constructor does not throw NotImplemented, but the buffer constructor does, and parameter extends buffer.
- there are many places where attributes are defined as having mode "Read". This should be "ReadOnly". (do grep 'mode: Read$' *.tex and grep 'mode: Read,' *.tex to find them.)
- Appendix A, example 1: still uses strings instead of urls, and uses file where only ns_entry is needed.

Cheers, Ceriel

Andre Merzky

1:32 p.m.

New subject: spec

Hi Ceriel, Quoting [Ceriel Jacobs] (Dec 13 2007):

...

...
Ah, right, that one... That indeed needs resolving. I post this answer to the list again then.

I guess it's 'your' way (B), as there were two votes (Ceriel, Thilo) against one (me).

Well, I was not aware that I have an official vote here!

:-) Basically, everybody who subscribes to the mailing list or participates in the meetings can vote.

...

I have a slight preference for B, but am also the origin of the expand method from 'your' version C. I don't really care, as long as the issue is resolved.

...
I am not sure if we came to a closure on 'Correction 2', but I think that Ceriel's answer ("if you need abs paths, then do a 'cd()' to somewhere where you can use rel paths") is a rather inelegant workaround for a limitation which is not too well motivated, IMHO.

I did not write this to defend rel paths, but to point out that it could be done, where you said that it could not be done.

Ok, point taken.

...

Anyway, since the subject of this mail is "spec", here are another couple of small issues, which are easily fixed:

- rpc package: the parameter constructor does not throw NotImplemented, but the buffer constructor does, and parameter extends buffer.

- there are many places where attributes are defined as having mode "Read". This should be "ReadOnly". (do grep 'mode: Read$' *.tex and grep 'mode: Read,' *.tex to find them.)

- Appendix A, example 1: still uses strings instead of urls, and uses file where only ns_entry is needed.

Thanks for those, I'll fix them! Please note that the wildcard issue is REALLY the last known issue - as soon as we have that settled the spec is out. So if you have discovered anything else... ;)

...

Cheers, Ceriel

Thanks, Andre.

Ceriel Jacobs

1:58 p.m.

New subject: spec

Andre Merzky wrote:

...

Please note that the wildcard issue is REALLY the last known issue - as soon as we have that settled the spec is out. So if you have discovered anything else... ;)

Well, it is not as if I have a supply of them and just report a couple of them whenever I feel like it :-)

But, I just noticed another one: in the monitoring package, metric class, there is a sentence that states that

The metric |type|s are the same as defined for attributes, and the metric |value|s are to be formatted as described for the respective attribute types.

But what about the Trigger type? This is not mentioned as one of the attribute types. In fact, it is never really specified what a 'Trigger' is.

Cheers, Ceriel

Andre Merzky

1:58 p.m.

New subject: spec

Quoting [Ceriel Jacobs] (Dec 13 2007):

...

Andre Merzky wrote:

...
Please note that the wildcard issue is REALLY the last known issue - as soon as we have that settled the spec is out. So if you have discovered anything else... ;)

Well, it is not as if I have a supply of them and just report a couple of them whenever I feel like it :-)

So, you mean, your private list of issues is indeed finite? Good news!! :-)

...

But, I just noticed another one: in the monitoring package, metric class, there is a sentence that states that

The metric |type|s are the same as defined for attributes, and the metric |value|s are to be formatted as described for the respective attribute types.

Ah, well, finite + n ... ;-)

...

But what about the Trigger type? This is not mentioned as one of the attribute types. In fact, it is never really specified what a 'Trigger' is.

Ah, good one, will fix that. In short: 'Checkpoint' would be a good example for a trigger. It's not boolean: you never notify an application 'Do not checkpoint now!' - but you use the metric to trigger a checkpoint 'Checkpoint now!'. I guess this is equivalent to an attribute whose existence or absence carries semantic meaning (e.g. 'Checkpointable'). That, however, maps pretty nicely to boolean attributes in all cases I can think of ('Checkpointable=yes'). Does that make sense?

Thanks, Andre.

...

Cheers, Ceriel


Ceriel Jacobs

2:32 p.m.

New subject: spec

Andre Merzky wrote:

...

Quoting [Ceriel Jacobs] (Dec 13 2007):

...

...
But, I just noticed another one: in the monitoring package, metric class, there is a sentence that states that

The metric |type|s are the same as defined for attributes, and the metric |value|s are to be formatted as described for the respective attribute types.

Ah, well, finite + n ... ;-)

...
But what about the Trigger type? This is not mentioned as one of the attribute types. In fact, it is never really specified what a 'Trigger' is.

Ah, good one, will fix that. In short: 'Checkpoint' would be a good example for a trigger. It's not boolean: you never notify an application 'Do not checkpoint now!' - but you use the metric to trigger a checkpoint 'Checkpoint now!'. I guess this is equivalent to an attribute whose existence or absence carries semantic meaning (e.g. 'Checkpointable'). That, however, maps pretty nicely to boolean attributes in all cases I can think of ('Checkpointable=yes').

I think a Trigger is a metric that has no value (or just a single value) but that may get fired (read: the registered callbacks are called). This indeed maps nicely to boolean attributes, with a "fire" when the value changes to "true". But I also think the value is immaterial. Maybe the getAttribute("Value") method should throw an exception when the metric is of type "Trigger"? Then you don't have to think about the attribute type of a "Trigger". Ceriel
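A minimal sketch of the behaviour suggested here, in Python rather than any SAGA language binding. The class `Metric`, the exception `TriggerHasNoValue`, and the method names are hypothetical and chosen for illustration only; the spec itself does not define them in this form.

```python
class TriggerHasNoValue(Exception):
    """Hypothetical error for reading the value of a Trigger metric."""


class Metric:
    """Illustrative metric: has a name, a type, and registered callbacks."""

    def __init__(self, name, mtype, value=None):
        self.name = name
        self.mtype = mtype
        self._value = value
        self._callbacks = []

    def add_callback(self, callback):
        self._callbacks.append(callback)

    def get_attribute(self, key):
        if key == "Value":
            # The value of a Trigger is immaterial, so refuse to return one.
            if self.mtype == "Trigger":
                raise TriggerHasNoValue(self.name)
            return self._value
        if key == "Name":
            return self.name
        if key == "Type":
            return self.mtype
        raise KeyError(key)

    def fire(self):
        """Firing simply calls every registered callback."""
        for callback in self._callbacks:
            callback(self)


checkpoint = Metric("Checkpoint", "Trigger")
checkpoint.add_callback(
    lambda m: print("callback called for", m.get_attribute("Name")))
checkpoint.fire()
```

With this shape, nobody has to decide which attribute type a Trigger's value would have, because the value can never be read.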

Andre Merzky

2:26 p.m.

New subject: spec

Quoting [Ceriel Jacobs] (Dec 13 2007):

...

Andre Merzky wrote:

...
Quoting [Ceriel Jacobs] (Dec 13 2007):

...
...
But, I just noticed another one: in the monitoring package, metric class, there is a sentence that states that

The metric |type|s are the same as defined for attributes, and the metric |value|s are to be formatted as described for the respective attribute types.

Ah, well, finite + n ... ;-)

...
But what about the Trigger type? This is not mentioned as one of the attribute types. In fact, it is never really specified what a 'Trigger' is.

Ah, good one, will fix that. In short: 'Checkpoint' would be a good example for a trigger. It's not boolean: you never notify an application 'Do not checkpoint now!' - but you use the metric to trigger a checkpoint 'Checkpoint now!'. I guess this is equivalent to an attribute whose existence or absence carries semantic meaning (e.g. 'Checkpointable'). That, however, maps pretty nicely to boolean attributes in all cases I can think of ('Checkpointable=yes').

I think a Trigger is a metric that has no value (or just a single value) but that may get fired (read: the registered callbacks are called). This indeed maps nicely to boolean attributes, with a "fire" when the value changes to "true". But I also think the value is immaterial. Maybe the getAttribute("Value") method should throw an exception when the metric is of type "Trigger"? Then you don't have to think about the attribute type of a "Trigger".

Good point, the actual value is indeed immaterial. Will do that, thanks! Andre.

...

Ceriel


Thilo Kielmann

11:06 a.m.

New subject: spec

On Wed, Dec 12, 2007 at 09:23:41PM +0100, Andre Merzky wrote:

...

Options:

(B) Add versions of the methods copy, link, move, and remove from ns_directory that accept a string parameter describing a pathname, relative to the CWD, (possibly) containing POSIX-style shell wildcards

Correction 1: also for permission_allow/permission_deny Correction 2: remove the limitation to relative paths

(C) Allow '*' as wildcard in URLs (in the path element part). Add an expand method for more elaborate wildcard expansion, which accepts a string and returns a list of URLs. Looping is in user space.

...

I am not sure if we came to a closure on 'Correction 2', but I think that Ceriel's answer ("if you need abs paths, then do a 'cd()' to somewhere where you can use rel paths") is a rather inelegant workaround for a limitation which is not too well motivated, IMHO.

The motivation for relative paths is: an absolute path immediately becomes a URL (well, a URL-shaped string), where we cannot have wildcards. This would LOOK LIKE allowing both strings and URLs for absolute paths, which is utterly confusing to the user (because of the subtle differences in acceptable syntax although both LOOK pretty much identical). It also feels like feature bloat.

I still consider 'C' a bloody hack because even the '*' violates URL syntax according to the RFC.

...

So, please raise your voice now or hold your peace for evermore!

Raised ;-) Thilo

...

PS.: sorry for this biased mail ;-)

Sorry for this biased reply ;-)

-- Thilo Kielmann http://www.cs.vu.nl/~kielmann/

Andre Merzky

1:52 p.m.

New subject: spec

Quoting [Thilo Kielmann] (Dec 13 2007):

...

On Wed, Dec 12, 2007 at 09:23:41PM +0100, Andre Merzky wrote:

...
Options:

(B) Add versions of the methods copy, link, move, and remove from ns_directory that accept a string parameter describing a pathname, relative to the CWD, (possibly) containing POSIX-style shell wildcards

Correction 1: also for permission_allow/permission_deny Correction 2: remove the limitation to relative paths

(C) Allow '*' as wildcard in URLs (in the path element part). Add an expand method for more elaborate wildcard expansion, which accepts a string and returns a list of URLs. Looping is in user space.

...
I am not sure if we came to a closure on 'Correction 2', but I think that Ceriel's answer ("if you need abs paths, then do a 'cd()' to somewhere where you can use rel paths") is a rather inelegant workaround for a limitation which is not too well motivated, IMHO.

The motivation for relative paths is: an absolute path immediately becomes a URL, (well, a URL-shaped string)

Why is that?

tmp/data.bin <-- relative
/tmp/data.bin <-- absolute
http://localhost/tmp/data.bin <-- relative
http://localhost//tmp/data.bin <-- absolute

There is no real difference here in appearance, apart from the slash leading the path element of the URL. I say it's confusing to allow one form but not the other.

...

where we cannot have wildcards. This would LOOK LIKE allowing both strings and URLs for absolute paths, utterly confusing to the user (because of the subtle differences in acceptable syntax although both LOOK pretty much identical). It also feels like feature bloat.

I still consider 'C' a bloody hack because even the '*' violates URL syntax according to the RFC.

No, it does not. '*' is explicitly a valid character which can be used w/o encoding in the path element of any URL (in fact in any other element, too). See the second-to-last paragraph of section 2.2 in http://www.ietf.org/rfc/rfc1738.txt:

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

Sorry Thilo, but I think that's pretty clear. The RFC does not prescribe the interpretation of '*' as wildcard, that's true. In several places the RFC _does_ describe character semantics, but that does not affect '*':

Many URL schemes reserve certain characters for a special meaning: their appearance in the scheme-specific part of the URL has a designated semantics. If the character corresponding to an octet is reserved in a scheme, the octet must be encoded. The characters ";", "/", "?", ":", "@", "=" and "&" are the characters which may be reserved for special meaning within a scheme. No other characters may be reserved within a scheme.

These characters are used in various schemes to distinguish different elements of the URL, e.g. query elements in HTTP URLs. Similarly, the RFC does not prescribe the use of '.' as path element to refer to the current directory - but '.' is a safe character, so URL schemes can safely use '.' that way. The same arguments hold for '*', IMHO.
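The character classes quoted above can be written down as a small lookup. This is only a sketch of one reading of the two RFC 1738 passages cited in this mail; the function name `classify` and its three labels are made up for illustration.

```python
import string

# Character classes as quoted from RFC 1738, section 2.2.
ALNUM = set(string.ascii_letters + string.digits)
SAFE_AND_EXTRA = set("$-_.+!*'(),")   # may appear unencoded anywhere
RESERVED = set(";/?:@=&")             # may carry scheme-specific meaning


def classify(ch):
    """Label a single character according to the quoted passages."""
    if ch in ALNUM or ch in SAFE_AND_EXTRA:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    return "must be encoded"


# The usual POSIX shell wildcard characters, plus '.' for comparison.
for ch in "*?[].":
    print(repr(ch), classify(ch))
```

Under this reading '*' and '.' come out as unreserved, '?' as reserved, and '[' and ']' as characters that would have to be encoded.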

...

...
So, please raise your voice now or hold your peace for evermore!

Raised ;-)

Back to square one we are, young Dutchman ;-)

...

Thilo

...
PS.: sorry for this biased mail ;-) Sorry for this biased reply ;-)

hehe :-) Cheers, Andre.

Ceriel Jacobs

2:55 p.m.

New subject: spec

Andre Merzky wrote:

...

Quoting [Thilo Kielmann] (Dec 13 2007):

...

...
The motivation for relative paths is: an absolute path immediately becomes a URL, (well, a URL-shaped string)

Why is that?

tmp/data.bin <-- relative
/tmp/data.bin <-- absolute

http://localhost/tmp/data.bin <-- relative

Well, according to RFC 1738 it is, but RFC 1738 has been superseded by RFC 2396, which in turn has been superseded by RFC 3986. Both of these consider the above an absolute URI, with an absolute path "/tmp/data.bin".

...

http://localhost//tmp/data.bin <-- absolute

Yes, but the second '//' is equivalent to '/'. And, the idea is: "relative path", not "absolute URI with relative path".

...

There is no real difference here in appearance, apart from the slash leading the path element of the URL. I say it's confusing to allow one form but not the other.

Thilo's suggestion allows neither. He only wants to allow relative \emph{path}s, which I agree is the cleanest solution. Cheers, Ceriel

Andre Merzky

9:48 p.m.

New subject: spec

Quoting [Ceriel Jacobs] (Dec 13 2007):

...

Andre Merzky wrote:

...
Quoting [Thilo Kielmann] (Dec 13 2007):

...
...
The motivation for relative paths is: an absolute path immediately becomes a URL, (well, a URL-shaped string)

Why is that?

tmp/data.bin <-- relative
/tmp/data.bin <-- absolute

http://localhost/tmp/data.bin <-- relative

Well, according to RFC 1738 it is, but RFC 1738 has been superseded by RFC 2396, which in turn has been superseded by RFC 3986. Both of these consider the above an absolute URI, with an absolute path "/tmp/data.bin".

Uhm, how is a relative path expressed, then? I tried to read that from the document but couldn't... Or is that impossible in an absolute URI? (I take it that this is a URI where scheme and authority are present?)

Thanks, Andre.

...

...
http://localhost//tmp/data.bin <-- absolute

Yes, but the second '//' is equivalent to '/'.

And, the idea is: "relative path", not "absolute URI with relative path".

...
There is no real difference here in appearance, apart from the slash leading the path element of the URL. I say it's confusing to allow one form but not the other.

Thilo's suggestion allows neither. He only wants to allow relative \emph{path}s, which I agree is the cleanest solution.

Cheers, Ceriel


Ceriel Jacobs

14 Dec

7:31 a.m.

New subject: spec

Andre Merzky wrote:

...

Quoting [Ceriel Jacobs] (Dec 13 2007):

...
Andre Merzky wrote:

...
Quoting [Thilo Kielmann] (Dec 13 2007):

...
...
The motivation for relative paths is: an absolute path immediately becomes a URL, (well, a URL-shaped string)

Why is that?

tmp/data.bin <-- relative
/tmp/data.bin <-- absolute

http://localhost/tmp/data.bin <-- relative

Well, according to RFC 1738 it is, but RFC 1738 has been superseded by RFC 2396, which in turn has been superseded by RFC 3986. Both of these consider the above an absolute URI, with an absolute path "/tmp/data.bin".

Uhm, how is a relative path expressed, then? I tried to read that from the document but couldn't... Or is that impossible in an absolute URI? (I take it that this is a URI where scheme and authority are present?)

I guess the idea is that a relative path is always with respect to another URI. An absolute URI is one with a scheme. So, if it has a scheme, it is absolute and not relative to some other URI. So, indeed, you cannot specify a relative path in an absolute URI. Note that a relative URI can represent an absolute path but still be relative to another URI with respect to scheme and authority. Cheers, Ceriel
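The distinction drawn here can be tried out with Python's standard `urllib.parse`, used below purely as an illustration. The base URI and the helper `describe` are made up for the example, and `urljoin` is taken as a reasonable stand-in for the reference resolution the RFC describes.

```python
from urllib.parse import urljoin, urlsplit


def describe(ref):
    """Return (has_scheme, path_is_absolute) for a URI reference."""
    parts = urlsplit(ref)
    return (bool(parts.scheme), parts.path.startswith("/"))


# Hypothetical base URI that relative references are resolved against.
BASE = "http://localhost/home/user/"

for ref in ("tmp/data.bin", "/tmp/data.bin", "http://localhost/tmp/data.bin"):
    # A reference without a scheme only becomes a full URI once it is
    # resolved against a base; one with a scheme is returned unchanged.
    print(ref, describe(ref), urljoin(BASE, ref))
```

The second reference shows the case mentioned at the end of the mail: a relative URI (no scheme) whose path is nevertheless absolute, so only scheme and authority are taken from the base.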

Thilo Kielmann

15 Dec

2 p.m.

New subject: spec

I've spent some more time studying RFCs and thinking about this thread of discussion.

I hope we can at least agree upon our design goals: whatever we specify into SAGA has to be "simple to use", and as such has to "do the obvious thing" in its respective context.

The aim of the exercise is to provide "POSIX shell wild cards" for files (well, actually for name space entries). While trying to do so, we came across two sub-topics:

a) wild cards possibly in URLs
b) wild cards in strings

About wild cards in URLs.

The valid RFC for URLs is RFC3986, "Uniform Resource Identifier (URI): Generic Syntax".

It says (Introduction, second paragraph):

"This document obsoletes [RFC2396], which merged "Uniform Resource Locators" [RFC1738] and "Relative Uniform Resource Locators" [RFC1808] in order to define a single, generic syntax for all URIs. It obsoletes [RFC2732], which introduced syntax for an IPv6 address. It excludes portions of RFC 1738 that defined the specific syntax of individual URI schemes; those portions will be updated as separate documents. ..."

I have checked IETF's site with RFCs and could not find any RFC documents that would describe new schemes for "file", "ftp", or "http". This means RFC3986 describes the general URI syntax, while the relevant URL types for us (file, ftp, http) are still valid as described in RFC1738.

Having said this, I made the following two observations:

1. RFC3986 says (Introduction, first paragraph, first sentence): "A Uniform Resource Identifier (URI) provides a simple and extensible means for identifying a resource." I'd like to put the emphasis here on "a resource", rather than "a resource or a group of resources". Besides, RFC3986 does NOT contain the terms "wild card", nor "wildcard", not even "pattern".

2. In RFC1738, the character '*' is not required to be escaped. (While other special characters from POSIX shell wild cards are.) In a previous discussion we had already ruled out wild card characters that would need to be escaped as too complicated and non-obvious to use. However, the URL schemes for "file", "ftp", and "http" do not define any wild card patterns. (Only the "news" scheme uses the '*' character as a simple wild card. But this is not relevant for us.)

From both observations I am drawing the conclusion that we MUST NOT use any wild cards, not even the '*' character, in URLs. This is because adding wild-card semantics to these URLs would deviate from both the definitions in RFC3986 and RFC1738, and also from "common use" of URLs, namely for "identifying a single resource."

This leaves us with option b) "wild cards in strings".

We do have consensus about using wild cards for name-space entries in strings. More specifically: in path elements, expressed in strings. However, we do not yet fully agree on the proposal to limit these to path elements that are relative to the name space (read: directory) on which the wild-card enabled functions operate. Both camps argue with simplicity for the user.

The argument AGAINST restricting strings to relative paths is the possible confusion caused by some syntactically valid paths (absolute ones) not being valid under the semantic restriction to relative paths.

The argument FOR restricting strings to relative paths is that absolute paths coincide with URLs and that this would give a second (string) representation for URLs, however with wild cards allowed (see discussion above). Having two representations for (almost) the same thing is considered confusing for the user.

Argument by Andre:

...

...
...
...
tmp/data.bin <-- relative
/tmp/data.bin <-- absolute

Well, I would say that this "absolute" path still is relative, namely to the base URI "file://localhost/". Absolute paths on the same machine form a corner case in grids. Really "absolute" paths identify the machine on which a file/directory resides.

As pointed out by Ceriel, URIs according to RFC 3986 always contain absolute paths, especially after "normalization" has been applied. This means URIs cannot hold relative paths, not in the general case. (And we are asking for problems if we require implementations to NEVER normalize a URI...)

The argument goes like this: A string with an absolute path coincides with a URI, where wild cards are not allowed/desirable. A string with a relative path is "sufficiently different" from a URL such that it is obvious to the user where wild cards are allowed and where they are not (in URLs).

If we agree to restrict strings to relative paths, which use cases are we missing? What can NOT be expressed then? We can still do the following:

saga::directory dir(url);
dir.copy("sub/*/bla[1-9].doc",target-url);

Which can be, for the running application, a third-party copy, honoring wild cards.

I currently cannot think of any use case where it would be a problem to create the dir object first (and instead do the same copy with two URLs directly, but then on which directory object???)

To summarize:

I hereby propose to limit the use of wild cards to strings, and in there to relative paths, because this:
- is sufficiently different from absolute URLs to avoid confusion
- is sufficiently expressive

Regards, Thilo

-- Thilo Kielmann http://www.cs.vu.nl/~kielmann/
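The proposal can be sketched in a few lines of Python. This is not the SAGA API: the `Directory` class, its in-memory `entries` list, and the host names are assumptions for illustration, and `copy` only reports what it would copy. The sketch rejects URL-shaped and absolute patterns and expands wildcards relative to the directory the object was created on.

```python
from fnmatch import fnmatchcase


class Directory:
    """Hypothetical stand-in for a namespace directory; not the SAGA API."""

    def __init__(self, url, entries):
        self.url = url.rstrip("/")
        self.entries = list(entries)  # paths relative to self.url

    def expand(self, pattern):
        """Expand a wildcard pattern given as a relative path string."""
        if "://" in pattern or pattern.startswith("/"):
            raise ValueError("wildcards are only allowed in relative paths")
        want = pattern.split("/")
        hits = []
        for entry in self.entries:
            have = entry.split("/")
            # Compare per path element so '*' never matches across '/'.
            if len(have) == len(want) and all(
                    fnmatchcase(h, w) for h, w in zip(have, want)):
                hits.append(entry)
        return hits

    def copy(self, pattern, target_url):
        """Return (source, target) URL pairs instead of really copying."""
        target = target_url.rstrip("/")
        return [(f"{self.url}/{e}", f"{target}/{e.rsplit('/', 1)[-1]}")
                for e in self.expand(pattern)]


# Illustrative listing; the names and hosts are made up.
d = Directory("gridftp://host.example/data", [
    "sub/a/bla1.doc", "sub/b/bla7.doc", "sub/a/bla0.doc",
    "sub/a/deep/bla2.doc", "other/a/bla1.doc",
])
for source, target in d.copy("sub/*/bla[1-9].doc", "file://localhost/tmp/"):
    print(source, "->", target)
```

Source and target are both full URLs in the result, so the copy can still be a third-party transfer even though the pattern itself is only a relative path.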

Ceriel Jacobs

16 Dec

7:27 p.m.

New subject: spec

Thilo Kielmann wrote:

...

If we agree to restrict strings to relative paths, which use cases are we missing? What can NOT be expressed then? We can still do the following:

saga::directory dir(url);
dir.copy("sub/*/bla[1-9].doc",target-url);

Which can be, for the running application, a third-party copy, honoring wild cards.

Agreed.

...

I currently cannot think of any use case where it would be a problem to create the dir object first (and instead do the same copy with two URLs directly, but then on which directory object???)

There were no use cases that required wildcards anyway. I asked Andre some time ago, and he said that people requested the feature, but that it was not explicit in any of the use cases.

...

To summarize:

I hereby propose to limit the use of wild cards to strings, and in there to relative paths, because this: - is sufficiently different from absolute URLs to avoid confusion - is sufficiently expressive

A very thorough and well-put discussion! I agree. Ceriel

Andre Merzky

7:35 p.m.

New subject: spec

Good arguments. I don't agree with some points, as you know, but your line of argumentation makes sense. So, let's do that, finally.

Big thanks, Andre.

Quoting [Thilo Kielmann] (Dec 15 2007):

...


Ceriel Jacobs

14 Dec

9:07 a.m.

New subject: spec

Hi, here is another glitch: the Job.checkpoint method has as Pre and Post conditions that the Job is in Running state. However, the Notes say that an IncorrectState exception is thrown when the job is not in Running or Suspended state. Can you checkpoint a job in suspended state? Ceriel

Andre Merzky

9:11 a.m.

New subject: spec

Ugh, tough one! For system-level checkpointing this should work in Suspended state as well, but obviously not for application-level CP. But we don't distinguish between them... And what state is the job in when it resumes?

The safest might be to not allow CP in Suspended state then, as the semantics are difficult to nail down for the various backends?

Andre

Quoting [Ceriel Jacobs] (Dec 14 2007):

...

