
Hello all,

Thank you Andrew for your comments and points of clarification. I do hope to incorporate the discussed changes in the RNS draft within the next couple of days. Please see my individual comments in response to your post below:
-------------------------
High level comments:
1) The basic directory function is to provide a mapping, handle = f(“string”), i.e., to map strings to some form of handle.
Thus, in section 1.1, why do we need four different types of junctions? Should we not have just one type of junction, namely an EPR? A directory would then map a string to an EPR: EPR = f(“string”). Instead there are four “types” of junctions:
- EPR
- Virtualized reference: either contains an EPR or a URL that points to some other service
- Referral: points to another RNS service. Q: could this not be modeled as an EPR?
- Alias: points to another “entry” within the “same” RNS service. This seems to imply a container-like model (more on this later).
I suggest that we need only ONE type of junction: an EPR. That would simplify both client coding and the model significantly.
You are correct in asserting that the “basic” directory function is mapping a string to some identifier. That is, however, only the “basic” function; a hierarchical namespace service capable of scaling, delegation, and federation of namespace services must have some means of junctioning namespaces. Although it is arguably unnecessary for the service consumer to be aware of referral junctions in normal query-related operations, it is necessary for management operations, particularly the create operation, which allows administrative grafting of namespaces by the use of referral entries.

First, the “virtualized reference” really only makes sense if the abstract name resolution service is tightly coupled with the namespace service. Since we are factoring the resolver service out, I am also eliminating the “virtualized reference” junction. As for the “alias”, this type of junction is really nothing more than a “referral junction”. After discussing this with the GFS-WG, they feel strongly that “aliases” should be a basic feature of the service, but agree that aliases (with hardlink-like behavior) should be optional.

So we are ultimately left with two “basic” junctions: the EPR junction and the referral junction. A referral junction is necessary since the service is intended to accommodate requests whose paths span multiple repositories. It is important to differentiate between the treatment of a referral junction and an EPR junction; as noted in section 1.4.6 (RNSJunctionFault), when a path extends beyond the leaf node of an EPR junction, the targeted service is unknown and therefore may not be “assumed” to be a delegate namespace service. For this reason, junction types are necessary to enable discriminatory treatment.
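To make the distinction concrete, here is a rough sketch in Python of the two remaining junction types and the discriminatory treatment just described. The class and field names are purely illustrative; the draft defines the actual types in its normative schema.

    # Hypothetical sketch of the two basic junction types; names are
    # illustrative, not the draft's normative schema.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class EPRJunction:
        # Maps a name to one or more EPRs; the targeted service is
        # opaque, so a path may not be assumed to continue past it.
        name: str
        targets: List[str] = field(default_factory=list)

    @dataclass
    class ReferralJunction:
        # Points to another (delegate) RNS service provider; a path
        # that extends past this node is handed off to that provider.
        name: str
        delegate_eprs: List[str] = field(default_factory=list)

    def step_past(junction, remaining_path):
        # Discriminatory treatment per section 1.4.6: only a referral
        # junction may be followed when the path extends beyond it.
        if isinstance(junction, ReferralJunction):
            return ("refer", junction.delegate_eprs, remaining_path)
        raise LookupError("RNSJunctionFault: path extends beyond an EPR junction")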
2) Implied model: repositories. There is a notion in the spec of a “same service instance” that in conversations has also been called a “repository”. The basic idea is that an RNS service may “contain” a set of directories, and that the service has a root. Junction types thus distinguish, for example, between internal and external things that they point to. An RNS service is therefore a rooted tree that may point, at the leaves, to other rooted trees. So, first of all, this model is only implied. If we are going to have that model, it should be right up front and discussed.
This strongly cast description of “rooted trees” is only realized when federating namespace services, which is described accordingly in section 2. The core focus of the specification should be the service itself.
What is a repository? What are its special port types, if any?
The term “repository” simply refers to the container of stored namespace entries. Incidentally, any given RNS service provider only “services” names that are contained within its corresponding repository; there is only one repository per RNS provider. You can think of the repository as the backend datastore or database where names, mappings, and related metadata are persistently stored.
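As an illustration only (the draft does not mandate any particular backend), the repository can be pictured as the single per-provider store of entries:

    # Illustrative only: the repository as the single per-provider
    # store of namespace entries; any real backend (a database, say)
    # would serve equally well.
    class Repository:
        def __init__(self):
            self._entries = {}  # path within this provider -> entry

        def store(self, path, entry):
            self._entries[path] = entry

        def lookup(self, path):
            # A provider only "services" names held in its own
            # repository; anything else is reached via a referral.
            return self._entries.get(path)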
Second, I think that it is not the right way to think about the problem. Directories should be the resources, not collections of directories. If a particular implementation chooses to multiplex a large number of logically independent directories in a single container, great; we will certainly do that too. The issue is what the model is. I feel fairly strongly about this.
This is not clear to me. Regarding the statement “directories should be the resources - not collections of directories”: what are you referring to with regard to RNS? RNS describes “virtual” directories as namespace entries that enable hierarchical relationships. They enable partitioning, grouping, categorizing, etc.; however, how they are used and what they are used for is not mandated. Virtual directories only facilitate directory structure; they do not hinder, restrict, or otherwise alter how namespaces are junctioned together (or how you can refer to another namespace). So what is the “issue” with the “model”?
Links “into” other RNS servers: section 1.1.2.4, “Alias Junction”, is restricted to pointing to entries in the same repository. I think aliases should be able to point to anything, including directories in “other” repositories.
First, an “alias” is intended to facilitate hardlink-like behavior; the alias junction may need to be described further to communicate this point. Consequently, it is far too impractical to allow aliases to refer to namespace entries in other repositories if they are indeed to behave like hardlinks. If you simply want to “refer” to any namespace entry (not just a root node) in another repository, you would use a “referral junction”.
It has been claimed that using the EPR of the “repository” and a path you can get that effect.
As stated above, you would use a referral junction, not an EPR junction.
However, what if the path changes in the other container? My link would break – even if the directory itself still exists.
Correct; this is why some have expressed the desire to describe an optional “alias” type that would function like a hardlink. Given the practical constraints of hardlinks, this is only feasible within a single repository.
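A small sketch may help show why hardlink-like behavior is repository-local: an alias works by sharing the underlying entry record, and across repositories there is no shared record to point at. The paths and URLs below are hypothetical.

    # Hypothetical sketch: an alias as a hardlink shares the same
    # entry record, so changes to the entry are visible under both
    # names.
    repo = {}  # one repository: path -> entry record
    entry = {"targets": ["http://sanjose.abc.com/bar"]}
    repo["/a/original"] = entry
    repo["/b/alias"] = entry  # alias: the same record, not a copy

    entry["targets"].append("http://newyork.abc.com/bar")
    assert repo["/b/alias"]["targets"] == repo["/a/original"]["targets"]
    # A second repository has no way to share this record, which is
    # why cross-repository references use referral junctions instead.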
3) Full path names. In ANY directory system, lookup really takes two parameters: a “root” at which to start, and a path. Often the “root” is implied, or is at some well-known location. RNS, as written, implies that all lookups are based on full paths with an implied, unspecified root.
This is not true. The root of each RNS is specified, since the only way to communicate with an RNS service provider is to first establish an EPR to it. “Full paths” are then interpreted as the path from the root of the current operating service provider.
Assuming that full paths are to be used on all lookups, the potential for both hot spots AND single point of failure are clear.
First, we need to be careful to distinguish between “full paths” and “absolute paths”; please see section 1.1.3. In the comment just above, I used the term “full path” to consistently signify the path from the root of a given repository. I may need to clarify this a bit in the draft, but RNS operations must be able to handle “absolute paths”. If the absolute path extends beyond the current repository (supposing a federation of namespace services with delegated namespaces), then RNS must be able to gracefully refer the client/application to the targeted namespace service provider (see section 2.2). If, however, the absolute path can be resolved within the current repository (which is expected to be the majority of cases), then RNS will simply respond with the appropriate answer.

In practice, RNS operations effectively use “full paths”, but must be able to handle “absolute paths” (utilizing referral messages as the redirecting mechanism). The obvious optimization for an application/client is to simply avoid sending resolution requests to the upper-level RNS service providers if you are working in a lower level of a federated namespace. For example, if I am resolving names in directory “/a/b/c”, and “b” is a referral to a delegate RNS service provider, then I would simply continue my interaction with the RNS service provider that is serving “b” (see section 2.2.1.1).
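The client-side handling this implies might look like the following sketch. The lookup()/connect() calls and the response fields are assumptions for illustration; section 2.2 defines the actual referral messages.

    # Sketch of absolute-path resolution with referral handling;
    # operation names and response shape are assumed, not normative.
    def connect(epr):
        # Placeholder: would establish a client stub for the provider
        # addressed by this EPR (transport details are out of scope).
        raise NotImplementedError(epr)

    def resolve(provider, absolute_path):
        while True:
            response = provider.lookup(absolute_path)
            if response.kind == "answer":
                # Resolved within the current repository: the common case.
                return response.entry
            if response.kind == "referral":
                # The path extends into a delegate namespace: continue
                # with the provider serving the referred-to subtree.
                provider = connect(response.delegate_epr)
                absolute_path = response.remaining_path
            else:
                raise LookupError(absolute_path)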
In conversation with Manuel, he mentioned that his clients cache intermediate parts of the tree, in the sense that “/foo/bar/d1” as a prefix leads to a particular RNS service, and they then use that information to avoid always traversing the tree. Besides the obvious implementation challenges of cache consistency when the tree is changing (a problem we certainly had/have in Legion), there is the modeling issue. If we expect clients to do that, then perhaps the architecture/specification should accommodate that and say that all lookups are relative path lookups with respect to some “root”. The root could be a true “root”, or an interior node in a tree, which is itself a “root” of the subtree it defines.
As stated above, this is described in the draft. The implication of caching here is obscure in light of my description of the EPR that is used to communicate with the most appropriate RNS service provider. What is the modeling issue that you refer to here?
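For reference, the caching described above amounts to remembering which provider serves which prefix. A minimal sketch along those lines, reusing the hypothetical connect() placeholder from the earlier sketch:

    # Minimal prefix cache: once "/foo/bar/d1" is known to be served
    # by a particular provider, later lookups under it skip the
    # upper-level providers.
    class PrefixCache:
        def __init__(self, root_provider):
            self.root = root_provider
            self.cache = {}  # known prefix -> delegate provider EPR

        def learn(self, prefix, delegate_epr):
            # Consistency when the tree changes is the open issue
            # raised above; stale entries would need invalidation.
            self.cache[prefix] = delegate_epr

        def provider_for(self, path):
            best = ""
            for prefix in self.cache:
                if path.startswith(prefix) and len(prefix) > len(best):
                    best = prefix
            return connect(self.cache[best]) if best else self.root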
4) Resolve and file system profile. We discussed these on the last OGSA call; my understanding is that they are going out.
Correct, we have agreed to remove these two items with the following understanding: 1) the Grid File System naming profile will be factored out into an independent document under the GFS-WG; 2) the resolver work will be discussed and an effort will be made to avoid duplicating it. Additionally, the OGSA-Naming WG will not simply disregard this existing work.
5) Iterators. OGSA-Data-WG has discussed iterators in a more general way, e.g., on database query results, etc. I think that whatever is done in RNS should be consistent with whatever is done in OGSA-Data (note: consistency can happen either way).
RNS should be consistent; however, OGSA-Data has not had any material discussion along these lines. Since an iterator is necessary in RNS and can always be modified later if necessary, we are going to leave it as is for this first submission.
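For concreteness, the iterator in question is simply chunked retrieval of large directory listings; a sketch under assumed operation names (not the draft's actual interface):

    # Sketch of chunked listing iteration; list_entries() and its
    # continuation-token convention are assumptions for illustration.
    def iterate_entries(provider, directory_path, chunk=100):
        token = None
        while True:
            # Assumed operation: returns up to `chunk` entries plus a
            # continuation token (None once the listing is exhausted).
            entries, token = provider.list_entries(directory_path, token, chunk)
            for entry in entries:
                yield entry
            if token is None:
                return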
Medium level comments:
S 1.1: “In all cases, junctions are capable of maintaining a list of references (EPRs/URLs) per entry; that is, a single junction may render several available EPRs, each of which represents replicas, copies of the same resource, or operationally identical services.”
Why? Are you saying that replication issues and semantics should be dealt with in the directory structure? Or are you saying that directories are not “sets” in the sense of only one entry, but rather “multi-sets” in the sense that one string can map to multiple things? If the latter, what are the implied semantics? I think it may be safer to keep them as sets.
We are not attempting to deal with replication issues and semantics; however, there is no apparent reason why we should not facilitate a “one-to-many” mapping at the entry level. I am not sure what you are trying to say regarding “sets” and their relationship to “directories”; there is no relationship between what is stated in section 1.1 and the concept of directories. Section 1.1 is attempting to describe the capability of a single junction. Each junction is represented as a name in the namespace, but maintains a mapping to “one or more” targets. The expounded statement regarding “replicas, copies of the same resource, or operationally identical services” is attempting to describe the necessary requirement of ensuring that all of the targets of a single junction refer to “the same thing”. For example, junction “foo” may map to a resource named “bar”, with one instance in San Jose and one in New York. The resource “bar” must be the same in all locations to ensure that “/a/b/c/foo” renders effectively the same resource regardless of which target EPR is selected:

    foo => http://sanjose.abc.com/bar
        => http://newyork.abc.com/bar

Thank you for your time in review and comments.

Best regards,

Manuel Pereira
===============================
IBM Almaden Research Center
===============================
650 Harry Road, San Jose, CA 95120
1-408-927-1935 [T/L 457]
Pager: 800-946-4646 1492425
http://mvp3.almaden.ibm.com
mpereira@us.ibm.com