
Hello all,

Thank you Andrew for your comments and points of clarification. I do hope to incorporate the discussed changes in the RNS draft within the next couple of days. Please see my individual comments in response to your post below:
-------------------------
High level comments:
1) The basic directory function is to provide a mapping, handle = f(“string”), i.e., to map strings to some form of handle.
Thus, in section 1.1, why do we need four different types of junctions? Should we not have just one type of junction, namely an EPR? A directory would then map a string to an EPR: EPR = f(“string”). Instead there are four “types” of junctions:
- EPR
- Virtualized reference: either contains an EPR or a URL that points to some other service
- Referral: points to another RNS service. Q: could this not be modeled as an EPR?
- Alias: points to another “entry” within the “same” RNS service. This seems to imply a container-like model (more on this later).
I suggest that we need only ONE type of junction: an EPR. That would simplify both client coding and the model significantly.
You are correct in asserting that the “basic” directory function is mapping a string to some identifier. That is, however, only the “basic” function; a hierarchical namespace service capable of scaling, delegation, and federation of namespace services must have some means of junctioning namespaces. Although it is arguably unnecessary for the service consumer to be aware of referral junctions in normal query-related operations, it is necessary for management operations, particularly the create operation, which allows administrative grafting of namespaces by the use of referral entries.

First, the “virtualized reference” really only makes sense if the abstract name resolution service is tightly coupled with the namespace service. Since we are factoring the resolver service out, I am also eliminating the “virtualized reference” junction. As for the “alias”, this type of junction is really nothing more than a “referral junction”. After discussing this with the GFS-WG, they feel strongly that “aliases” should be a basic feature of the service, but agree that aliases (with hardlink-like behavior) should be optional.

So we are ultimately left with two “basic” junctions: the EPR junction and the referral junction. A referral junction is necessary since the service is intended to accommodate requests whose paths span multiple repositories. It is important to differentiate between the treatment of a referral junction and an EPR junction; as noted in section 1.4.6 (RNSJunctionFault), when a path extends beyond the leaf node of an EPR junction, the targeted service is unknown and therefore may not be “assumed” to be a delegate namespace service. For this reason, junction types are necessary to enable discriminatory treatment.
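To make the distinction concrete, here is a rough sketch in Python of the two remaining junction types and the discriminatory treatment just described. The class and field names are purely illustrative; the draft defines the actual types in its normative schema.

    # Hypothetical sketch of the two basic junction types; names are
    # illustrative, not the draft's normative schema.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class EPRJunction:
        # Maps a name to one or more EPRs; the targeted service is
        # opaque, so a path may not be assumed to continue past it.
        name: str
        targets: List[str] = field(default_factory=list)

    @dataclass
    class ReferralJunction:
        # Points to another (delegate) RNS service provider; a path
        # that extends past this node is handed off to that provider.
        name: str
        delegate_eprs: List[str] = field(default_factory=list)

    def step_past(junction, remaining_path):
        # Discriminatory treatment per section 1.4.6: only a referral
        # junction may be followed when the path extends beyond it.
        if isinstance(junction, ReferralJunction):
            return ("refer", junction.delegate_eprs, remaining_path)
        raise LookupError("RNSJunctionFault: path extends beyond an EPR junction")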
2) Implied model: repositories. There is a notion in the spec of a “same service instance” that in conversations has also been called a “repository”. The basic idea is that an RNS service may “contain” a set of directories, and that the service has a root. Junction types thus distinguish, for example, between internal and external things that they point to. An RNS service is therefore a rooted tree that may point, at the leaves, to other rooted trees. So, first of all, this model is only implied. If we are going to have that model, it should be right up front and discussed.
This strongly cast description of “rooted trees” is only realized when federating namespace services, which is described accordingly in section 2. The core focus of the specification should be the service itself.
What is a repository? What are its special port types, if any?
The term “repository” simply refers to the container of stored namespace entries. Incidentally, any given RNS service provider only “services” names that are contained within its corresponding repository; there is only one repository per RNS provider. You can think of the repository as the backend datastore or database where names, mappings, and related metadata are persistently stored.
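As an illustration only (the draft does not mandate any particular backend), the repository can be pictured as the single per-provider store of entries:

    # Illustrative only: the repository as the single per-provider
    # store of namespace entries; any real backend (a database, say)
    # would serve equally well.
    class Repository:
        def __init__(self):
            self._entries = {}  # path within this provider -> entry

        def store(self, path, entry):
            self._entries[path] = entry

        def lookup(self, path):
            # A provider only "services" names held in its own
            # repository; anything else is reached via a referral.
            return self._entries.get(path)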
Second, I think that it is not the right way to think about the problem. Directories should be the resources, not collections of directories. If a particular implementation chooses to multiplex a large number of logically independent directories in a single container, great; we will certainly do that too. The issue is what the model is. I feel fairly strongly about this.
This is not clear to me. Regarding the statement “directories should be the resources - not collections of directories”: what are you referring to with regard to RNS? RNS describes “virtual” directories as namespace entries that enable hierarchical relationships. They enable partitioning, grouping, categorizing, etc.; however, how they are used and what they are used for is not mandated. Virtual directories only facilitate directory structure; they do not hinder, restrict, or otherwise alter how namespaces are junctioned together (or how you can refer to another namespace). So what is the “issue” with the “model”?
Links “into” other RNS servers: section 1.1.2.4, “Alias Junction”, is restricted to pointing to entries in the same repository. I think aliases should be able to point to anything, including directories in “other” repositories.
First, an “alias” is intended to facilitate hardlink-like behavior; the alias junction may need to be described further to communicate this point. Consequently, it is far too impractical to allow aliases to refer to namespace entries in other repositories if they are indeed to behave like hardlinks. If you simply want to “refer” to any namespace entry (not just a root node) in another repository, you would use a “referral junction”.
It has been claimed that using the EPR of the “repository” and a path you can get that effect.
As stated above, you would use a referral junction, not an EPR junction.
However, what if the path changes in the other container? My link would break – even if the directory itself still exists.
Correct; this is why some have expressed the desire to describe an optional “alias” type that would function like a hardlink. Given the practical constraints of hardlinks, this is only feasible within a single repository.
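A small sketch may help show why hardlink-like behavior is repository-local: an alias works by sharing the underlying entry record, and across repositories there is no shared record to point at. The paths and URLs below are hypothetical.

    # Hypothetical sketch: an alias as a hardlink shares the same
    # entry record, so changes to the entry are visible under both
    # names.
    repo = {}  # one repository: path -> entry record
    entry = {"targets": ["http://sanjose.abc.com/bar"]}
    repo["/a/original"] = entry
    repo["/b/alias"] = entry  # alias: the same record, not a copy

    entry["targets"].append("http://newyork.abc.com/bar")
    assert repo["/b/alias"]["targets"] == repo["/a/original"]["targets"]
    # A second repository has no way to share this record, which is
    # why cross-repository references use referral junctions instead.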
3) Full path names. In ANY directory system, lookup really takes two parameters: a “root” at which to start, and a path. Often the “root” is implied, or is at some well-known location. RNS, as written, implies that all lookups are based on full paths with an implied, unspecified root.
This is not true. The root of each RNS is specified, since the only way to communicate with an RNS service provider is to first establish an EPR to it. “Full paths” are then interpreted as the path from the root of the current operating service provider.
Assuming that full paths are to be used on all lookups, the potential for both hot spots AND single point of failure are clear.
First, we need to be careful to distinguish between “full paths” and “absolute paths”; please see section 1.1.3. In the comment just above, I used the term “full path” to consistently signify the path from the root of a given repository. I may need to clarify this a bit in the draft, but RNS operations must be able to handle “absolute paths”. If the absolute path extends beyond the current repository (supposing a federation of namespace services with delegated namespaces), then RNS must be able to gracefully refer the client/application to the targeted namespace service provider (see section 2.2). If, however, the absolute path can be resolved within the current repository (which is expected to be the majority of cases), then RNS will simply respond with the appropriate answer.

In practice, RNS operations effectively use “full paths”, but must be able to handle “absolute paths” (utilizing referral messages as the redirecting mechanism). The obvious optimization for an application/client is to simply avoid sending resolution requests to the upper-level RNS service providers if you are working in a lower level of a federated namespace. For example, if I am resolving names in directory “/a/b/c”, and “b” is a referral to a delegate RNS service provider, then I would simply continue my interaction with the RNS service provider that is serving “b” (see section 2.2.1.1).
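The client-side handling this implies might look like the following sketch. The lookup()/connect() calls and the response fields are assumptions for illustration; section 2.2 defines the actual referral messages.

    # Sketch of absolute-path resolution with referral handling;
    # operation names and response shape are assumed, not normative.
    def connect(epr):
        # Placeholder: would establish a client stub for the provider
        # addressed by this EPR (transport details are out of scope).
        raise NotImplementedError(epr)

    def resolve(provider, absolute_path):
        while True:
            response = provider.lookup(absolute_path)
            if response.kind == "answer":
                # Resolved within the current repository: the common case.
                return response.entry
            if response.kind == "referral":
                # The path extends into a delegate namespace: continue
                # with the provider serving the referred-to subtree.
                provider = connect(response.delegate_epr)
                absolute_path = response.remaining_path
            else:
                raise LookupError(absolute_path)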
In conversation with Manuel, he mentioned that his clients cache intermediate parts of the tree, in the sense that “/foo/bar/d1” as a prefix leads to a particular RNS service, and they then use that information to avoid always traversing the tree. Besides the obvious implementation challenges of cache consistency when the tree is changing (a problem we certainly had/have in Legion), there is the modeling issue. If we expect clients to do that, then perhaps the architecture/specification should accommodate that and say that all lookups are relative path lookups with respect to some “root”. The root could be a true “root”, or an interior node in a tree, which is itself a “root” of the subtree it defines.
As stated above, this is described in the draft. The implication of caching here is obscure in light of my description of the EPR that is used to communicate with the most appropriate RNS service provider. What is the modeling issue that you refer to here?
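For reference, the caching described above amounts to remembering which provider serves which prefix. A minimal sketch along those lines, reusing the hypothetical connect() placeholder from the earlier sketch:

    # Minimal prefix cache: once "/foo/bar/d1" is known to be served
    # by a particular provider, later lookups under it skip the
    # upper-level providers.
    class PrefixCache:
        def __init__(self, root_provider):
            self.root = root_provider
            self.cache = {}  # known prefix -> delegate provider EPR

        def learn(self, prefix, delegate_epr):
            # Consistency when the tree changes is the open issue
            # raised above; stale entries would need invalidation.
            self.cache[prefix] = delegate_epr

        def provider_for(self, path):
            best = ""
            for prefix in self.cache:
                if path.startswith(prefix) and len(prefix) > len(best):
                    best = prefix
            return connect(self.cache[best]) if best else self.root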
4) Resolve and file system profile. We discussed these on the last OGSA call; my understanding is that they are going out.
Correct, we have agreed to remove these two items with the following understanding: 1) the Grid File System naming profile will be factored out into an independent document under the GFS-WG; 2) the resolver work will be discussed and an effort will be made to avoid duplicating it. Additionally, the OGSA-Naming WG will not simply disregard this existing work.
5) Iterators. OGSA-Data-WG has discussed iterators in a more general way, e.g., on database query results, etc. I think that whatever is done in RNS should be consistent with whatever is done in OGSA-Data (note: consistency can happen either way).
RNS should be consistent; however, OGSA-Data has not had any material discussion along these lines. Since an iterator is necessary in RNS and can always be modified later if necessary, we are going to leave it as is for this first submission.
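For concreteness, the iterator in question is simply chunked retrieval of large directory listings; a sketch under assumed operation names (not the draft's actual interface):

    # Sketch of chunked listing iteration; list_entries() and its
    # continuation-token convention are assumptions for illustration.
    def iterate_entries(provider, directory_path, chunk=100):
        token = None
        while True:
            # Assumed operation: returns up to `chunk` entries plus a
            # continuation token (None once the listing is exhausted).
            entries, token = provider.list_entries(directory_path, token, chunk)
            for entry in entries:
                yield entry
            if token is None:
                return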
Medium level comments:
S 1.1: “In all cases, junctions are capable of maintaining a list of references (EPRs/URLs) per entry; that is, a single junction may render several available EPRs, each of which represents replicas, copies of the same resource, or operationally identical services.”
Why? Are you saying that replication issues and semantics should be dealt with in the directory structure? Or are you saying that directories are not “sets” in the sense of only one entry, but rather “multi-sets” in the sense that one string can map to multiple things? If the latter, what are the implied semantics? I think it may be safer to keep them as sets.
We are not attempting to deal with replication issues and semantics; however, there is no apparent reason why we should not facilitate a “one-to-many” mapping at the entry level. I am not sure what you are trying to say regarding “sets” and their relationship to “directories”; there is no relationship between what is stated in section 1.1 and the concept of directories. Section 1.1 is attempting to describe the capability of a single junction. Each junction is represented as a name in the namespace, but maintains a mapping to “one or more” targets. The expounded statement regarding “replicas, copies of the same resource, or operationally identical services” is attempting to describe the necessary requirement of ensuring that all of the targets of a single junction refer to “the same thing”. For example, junction “foo” may map to a resource named “bar”, with one instance in San Jose and one in New York. The resource “bar” must be the same in all locations to ensure that “/a/b/c/foo” renders effectively the same resource regardless of which target EPR is selected:

    foo => http://sanjose.abc.com/bar
        => http://newyork.abc.com/bar

Thank you for your time in review and comments.

Best regards,

Manuel Pereira
===============================
IBM Almaden Research Center
===============================
650 Harry Road, San Jose, CA 95120
1-408-927-1935 [T/L 457]
Pager: 800-946-4646 1492425
http://mvp3.almaden.ibm.com
mpereira@us.ibm.com