Hi, A word of warning - the diagram I posted was not normative in any way. It was my take on what was in the data architecture document at the time, so the links may not be associated with the right terms. It was supposed to engender discussion ... like this ... I'll try to provide answers to some of your points below.
- DataSet is data externalised from the data resource: do you mean that what determines a dataset is not residing in the data resource itself but somewhere external? i seem to be missing something..
I think the concept - as we use it - was originally introduced by Malcolm and adopted in DAIS. The current definition there is: A data set is an encoding of data suitable for externalization outside a data service, for example, as an XML document or as a binary stream. The concept of a data set is introduced to describe data as it appears in the messages passing to and from data services, i.e. between the consumer and the data service. Does that make it any clearer? I think Malcolm's original conception was much richer. It allowed for provenance to be included in the data set envelope, and the payload could be an instruction on how to generate the data - not the data itself. This was all being talked about a long time ago and has not carried through. The ability to talk about an externalised piece of data was useful though; it was adopted in DAIS and is now being used in the OGSA-D WG. Cc'ing Malcolm and Simon in case they wish to comment.
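As a rough illustration only (this is not taken from the DAIS specification), the definition above could be sketched as follows. The names DataSet, encoding, payload, provenance and externalise_rows are all hypothetical, and the provenance field only gestures at the richer envelope idea mentioned above.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class DataSet:
    """Hypothetical envelope for data externalised from a data service."""
    encoding: str                                   # e.g. "xml" or "binary"
    payload: bytes                                  # the encoded data as it travels in a message
    provenance: dict = field(default_factory=dict)  # optional, assumed free-form


def externalise_rows(rows, provenance=None) -> DataSet:
    """Encode rows (a list of dicts) as an XML document wrapped in a DataSet."""
    root = ET.Element("rows")
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for key, value in row.items():
            ET.SubElement(row_el, key).text = str(value)
    return DataSet("xml", ET.tostring(root), provenance or {})
```

The point of the sketch is that the consumer only ever sees the DataSet, never the data resource it came from.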
- composite data service: what do you mean by the unbounded link? (sorry if i missed some discussions here)
Not all of these things are my definitions :-) A simple data service can represent more than one data resource or data service. I did not think that there was a limit, hence I put unbounded.
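One way to picture the unbounded link is as a list with no upper limit on its length. This is a minimal sketch under that assumption; the class names DataResource and DataService, the represents field and the resources() helper are hypothetical and do not come from the data architecture document.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DataResource:
    name: str


@dataclass
class DataService:
    name: str
    # 0..* link: no upper bound on how many resources or services are represented
    represents: List[Union[DataResource, "DataService"]] = field(default_factory=list)

    def resources(self) -> List[DataResource]:
        """Flatten nested services into the underlying data resources."""
        found: List[DataResource] = []
        for target in self.represents:
            if isinstance(target, DataResource):
                found.append(target)
            else:
                found.extend(target.resources())
        return found
```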
- i don't understand how data virtualization can provide data management and data access. i understand how it provides data creation (i think) but then shouldn't it be described the other way round? i.e. data virtualization can trigger data creation? again i seem to be missing something.. so i still have problems with understanding what different people mean by 'virtualization', do we have a final definition of this yet? i may have an outdated version of the doc..
I've had problems with the virtualization term myself. We phased it out of DAIS because it was such an overloaded term and everyone had their own view as to what it meant. I did not quite see what the association was in the document. If you present a modified (or updated) version of the concept map showing how you think it should look then that would be cool.
- looking at your picture, it reads that data capabilities are access, federation, location management and transfer. the problem here is that this is easily misunderstood since 'data capabilities' refers to the grid architecture capabilities, not to capabilities of the actual data being stored. so having this in the same image may be confusing. it may help to talk about architectural capabilities and models in a separate image.
I don't disagree. I was just trying to get a picture of what the conceptual relationships in the document were at the time. It needs to be reviewed, and where it's still not clear the document needs to be clarified.
- i agree with your annotations in the cmap! i think the data description is metadata to be stored either with the data or in a data catalog; and dataset is just like that too, it's metadata of the actual data, not data itself. for transfers you would need to look up the metadata and you're right, it doesn't necessarily materialize except on the actual target.
If you want to take a shot at updating the CMAP then that would be cool. I would not mind iterating with someone/others. It should be clear, though, what comes from the document and what comes from a perception of what it should be (so that it is later clarified in the document). My initial take was to derive the cmap from what was in the document. I should try to update it but if you want to have a go first then please feel free :-). Thanks Peter, Mario
peter
On Tue, 2005-05-03 at 13:54 +0100, Mario Antonioletti wrote:
Hi, I think one thing that has been lacking a bit in the OGSA-D WG so far has been the bigger picture trying to tie some of the concepts together. A long time ago I remember seeing the concept maps that were produced for the WS architecture (see: http://www.w3.org/TR/ws-arch/) and I quite liked these. I don't know what tool they used but I found something similar, if not the same. See:
It's a Java application and I think, though I've not looked too closely, it is free. I have tried to create a concept map for the data architecture and have included an exported image as an attachment to this email, as well as the cmap file that contains the source to the diagram in case anyone wants to play with it. It was very easy to generate and change, and I believe the cmaps can be shared so they could be developed in a collaborative fashion. I think it helps to elicit detail from the model, e.g. what does the data description bind to: the virtualization or the service? Does the idea of having a data set defined in the data transfer make sense? i.e. I think of a data set as a disconnected piece of data that has been materialised outside the data resource and this would only make sense if it were staged. Do we benefit at all from making the distinction between a simple and aggregated data service? etc, etc, etc.
Note that I am not advocating that these should be included in the document, nor that everything should go in one diagram (as it is currently). I think that the diagrams can be nested (maybe). Is this a worthwhile thing to do? If the answer is yes, can it be done so the diagrams are shared? Thoughts?
I will try to distribute something on data access by the close of day today (UK time), albeit it'll be far from complete.
Mario
+-----------------------------------------------------------------------+
|Mario Antonioletti:EPCC,JCMB,The King's Buildings,Edinburgh EH9 3JZ.   |
|Tel:0131 650 5141|mario@epcc.ed.ac.uk|http://www.epcc.ed.ac.uk/~mario/ |
+-----------------------------------------------------------------------+
--
CERN, 1211 Geneva 23, Switzerland