All,
Here are some of my comments on the RNS document. I jammed
my finger yesterday in basketball, so typing is painful. Therefore, I’m
limiting my comments to the most significant. Thank you to the RNS team for all
of their work.
Andrew
High level comments:
1) The basic function of a directory is to provide a mapping
handle f(“string”), i.e., to map strings to some form of handle.
Thus, regarding section 1.1:
Why do we need four different types of junctions? Should we
not have just one type of junction, namely one that points to an EPR? A directory
would then map a string to an EPR:
EPR f(“string”);
Instead there are four “types” of junctions:
  - EPRs
  - Virtualized reference: contains either an EPR or a URL that points to
    some other service.
  - Referrals: point to another RNS service. Q: could this not be modeled
    as an EPR?
  - Alias: points to another “entry” within the “same” RNS service. This
    seems to imply a container-like model; more on this later.
I suggest that we need only ONE type of junction: an EPR.
That will significantly simplify both client coding and the model.
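
To make the single-junction suggestion concrete, here is a minimal Python sketch of a directory whose only job is EPR f(“string”). The EPR and Directory classes, the entry names, and the example.org addresses are all hypothetical illustrations, not anything taken from the RNS document.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class EPR:
    """Hypothetical stand-in for an endpoint reference (just an address here)."""
    address: str


@dataclass
class Directory:
    """A directory is nothing more than a mapping from entry names to EPRs."""
    entries: Dict[str, EPR] = field(default_factory=dict)

    def add(self, name: str, epr: EPR) -> None:
        self.entries[name] = epr

    def lookup(self, name: str) -> EPR:
        # EPR f("string"): one string in, one EPR out.
        return self.entries[name]


d = Directory()
# An ordinary resource and another directory are stored the same way;
# the client does not need a separate "referral" junction type.
d.add("data", EPR("http://example.org/services/data"))
d.add("projects", EPR("http://example.org/rns/projects-dir"))
```

Under this assumption the client has a single code path: look up the name, get an EPR, and decide what to do based on what the EPR refers to.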
2) Implied model – Repositories
There is a notion in the spec of a “same service
instance”, which in conversations has also been called a “repository”.
The basic idea is that an RNS service may “contain” a set of
directories, and that the service has a root. Junction types therefore
distinguish, for example, between internal and external things that they point
to. An RNS service is thus a rooted tree whose leaves may point
to other rooted trees. First of all, this model is only implied. If we’re
going to have that model, it should be stated right up front and discussed. What is a
repository? What are its special port types, if any? Second, I think
that it is not the right way to think about the problem. Directories should be
the resources, not collections of directories. If a particular
implementation chooses to multiplex a large number of logically independent
directories in a single container, great; we will certainly do
that too. The issue is what the model is. I feel fairly strongly about this.
Links “into” other RNS servers: 1.1.2.4 “Alias
Junction” is restricted to pointing to entries in the same repository. I
think aliases should be able to point to anything, including directories in
“other” repositories. It has been claimed that you can get that effect by using
the EPR of the “repository” and a path. However, what if
the path changes in the other container? My link would break, even if
the directory itself still exists.
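
The broken-link concern can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the registry dict stands in for the network, the “epr:...” strings stand in for real EPRs, and resolve_path is a hypothetical helper, not an RNS operation.

```python
from typing import Dict, Optional

# Hypothetical registry standing in for "the network": EPR string -> directory.
registry: Dict[str, "Directory"] = {}


class Directory:
    def __init__(self, epr: str) -> None:
        self.epr = epr
        self.entries: Dict[str, str] = {}  # entry name -> EPR string
        registry[epr] = self


def resolve_path(root_epr: str, path: str) -> Optional[str]:
    """Walk path from root_epr; return the EPR reached, or None if the path is broken."""
    current = root_epr
    for name in filter(None, path.split("/")):
        directory = registry.get(current)
        if directory is None or name not in directory.entries:
            return None
        current = directory.entries[name]
    return current


# A remote repository whose root contains a "results" directory.
remote_root = Directory("epr:remote-root")
results = Directory("epr:results")
remote_root.entries["results"] = results.epr

# Two ways to link to that directory from some other repository.
link_by_path = (remote_root.epr, "/results")  # repository EPR plus path
link_by_epr = results.epr                     # EPR of the directory itself

# The remote side renames the entry; the directory itself still exists.
remote_root.entries["archive"] = remote_root.entries.pop("results")

path_target = resolve_path(*link_by_path)  # the old path no longer resolves
epr_target = registry.get(link_by_epr)     # the direct link still works
```

In this sketch the path-based link fails after the rename while the EPR-based link still reaches the same directory object.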
3) Full path names
In ANY directory system, lookup really takes two parameters:
a “root” at which to start, and a path. Often the “root”
is implied, or is at some well-known location. RNS, as written, implies
that all lookups are based on full paths with an implied, unspecified root.
If full paths are to be used on all lookups, the
potential for both hot spots AND a single point of failure is clear.
In conversation, Manuel mentioned that his clients
cache intermediate parts of the tree, in the sense that “/foo/bar/d1”
as a prefix leads to a particular RNS service, and then use that information to
avoid traversing the tree every time. Besides the obvious implementation challenge of
cache consistency when the tree is changing (a problem we certainly had/have in
Legion), there is the modeling issue. If we expect clients to do that,
then perhaps the architecture/specification should accommodate it and say
that all lookups are relative path lookups with respect to some “root”.
The root could be a true “root”, or an interior node in a tree, which
is itself the “root” of the subtree it defines.
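
A rough Python sketch of what a two-parameter lookup might look like follows. The Node class, the lookup function, and the “epr:d1” string are hypothetical; the point is only that a lookup from the true root and a lookup from a cached interior node go through the same operation.

```python
from typing import Dict, Union


class Node:
    """Hypothetical directory node; children map a name to a Node or an EPR string."""

    def __init__(self) -> None:
        self.children: Dict[str, Union["Node", str]] = {}


def lookup(root: Node, path: str) -> Union[Node, str]:
    """Resolve path relative to root; any node can serve as the root."""
    current: Union[Node, str] = root
    for name in filter(None, path.split("/")):
        if not isinstance(current, Node):
            raise KeyError(f"{name!r}: parent is not a directory")
        current = current.children[name]
    return current


# Build /foo/bar/d1, where d1 is a leaf EPR.
top, foo, bar = Node(), Node(), Node()
top.children["foo"] = foo
foo.children["bar"] = bar
bar.children["d1"] = "epr:d1"

from_top = lookup(top, "foo/bar/d1")  # full path from the true root
from_bar = lookup(bar, "d1")          # relative path from an interior node
```

A client that already holds a reference to bar never has to touch top or foo, which is the effect the caching clients are after, but expressed in the model rather than as an implementation trick.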
4) Resolve and file system profile. We discussed these on
the last OGSA call; my understanding is that they are going out.
5) Iterators. The OGSA-Data-WG has discussed iterators in a more
general way, e.g., over database query results. I think that whatever is
done in RNS should be consistent with whatever is done in OGSA-Data (note:
consistency can happen either way).
Medium level comments:
Section 1.1:
“In all cases, junctions are capable of maintaining a list of
references (EPRs/URLs) per entry, that is a single junction may render several
available EPRs, each of which represent replicas, copies of the same resource,
or operationally identical services.”
Why? Are you saying that replication issues and semantics should be dealt
with in the directory structure? Or are you saying that directories are not “sets”,
in the sense that a string maps to only one entry, but rather “multi-sets”, in
the sense that one string can map to multiple things? If the latter, what
are the implied semantics?
I think it may be safer to keep them as sets.
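
The difference between the two readings shows up directly in the lookup signature. The following Python sketch uses made-up entry names and “epr:...” strings purely for illustration.

```python
from typing import Dict, List

# "Set" reading: each name maps to exactly one EPR.
set_directory: Dict[str, str] = {"db": "epr:db-primary"}

# "Multi-set" reading: each name maps to a list of EPRs.
multiset_directory: Dict[str, List[str]] = {
    "db": ["epr:db-primary", "epr:db-replica"],
}


def lookup_set(name: str) -> str:
    # The caller gets one answer and needs no further policy.
    return set_directory[name]


def lookup_multi(name: str) -> List[str]:
    # The caller gets several answers and must decide which one to use.
    # Whether they are replicas, copies, or merely equivalent services is
    # exactly the semantics that would have to be specified.
    return list(multiset_directory[name])
```

With the multi-set version every client has to carry a selection policy, and the directory itself says nothing about how the returned EPRs relate to one another.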