Hi Andre,

Coincidentally, I'm looking at a very similar thing right now. I'm trying to extend an archive which I've been building here at CCT. The archive currently holds netCDF files for the coastal modelers, which support this kind of subsetting. We also plan to roll out the archive to the physicists, who will want to put their huge HDF5 files in the archive and then do hyperslabbing on them (essentially a kind of subsetting, but with a cool name).

I had imagined passing a specification to the archive, represented by attribute/value pairs, along with the LogicalFileName. The service on the other end would then prepare the data for me, place it in a temporary store, and return the URLs of the prepared file. I would then access the file in the normal way.

When your original dataset is 1TB, you have problems: you can't simply prepare the data in the time it takes to do a call and reply. You need to go asynchronous. With the solution I've gone for, I can simply say "this isn't ready yet, but I'm working on it" instead of returning the URLs. The user can check back later (polling), or I can tell them when it's ready (notification). Then they access the data.

How do you make your proposed eRead operation "go asynchronous" if things take a long time? Or would the first read simply hang until the data is prepared?

Jon.

On Jun 13, 2005, at 5:38 AM, Andre Merzky wrote:
Hello Hartmut,
Quoting [Hartmut Kaiser] (Jun 13 2005):
Agreed here.
That extension would look like:
  void lsEModes (out array<string> emodes);
  void eWrite   (in string emode, in string spec, in string buffer, out long len_out);
  void eRead    (in string emode, in string spec, out string buffer, out long len_out);

  - hooks for GridFTP-like opaque ERET/ESTO features
  - spec:  string for the pattern, as in GridFTP's ESTO/ERET
  - emode: string for the identifier, as in GridFTP's ESTO/ERET
  - EMode:        a specific remote I/O command supported by the implementation
  - lsEModes:     list the EModes available in this implementation
  - eRead/eWrite: read/write data according to the eMode and its spec
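To make the proposed calls concrete, here is a minimal Python sketch of the interface plus a toy in-memory backend. The names ls_emodes/e_read/e_write, the "block" eMode, and its "WxH+X+Y" spec syntax (width x height, then x and y offsets, modelled loosely on the jpeg_block spec in the example) are all assumptions for illustration, not part of any agreed SAGA binding.

```python
import re
from abc import ABC, abstractmethod


class EFile(ABC):
    """Hypothetical Python rendering of the proposed lsEModes/eRead/eWrite calls."""

    @abstractmethod
    def ls_emodes(self):
        """Return the names of the eModes this implementation supports."""

    @abstractmethod
    def e_read(self, emode, spec):
        """Return (buffer, len_out) for the data selected by spec under emode."""

    @abstractmethod
    def e_write(self, emode, spec, buffer):
        """Write buffer to the region selected by spec; return len_out."""


# Assumed spec syntax for the toy "block" eMode: WxH+X+Y, i.e. width x height
# followed by x and y offsets (similar to X11 geometry strings).
_BLOCK_SPEC = re.compile(r"(\d+)x(\d+)\+(\d+)\+(\d+)")


def parse_block_spec(spec):
    m = _BLOCK_SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"malformed block spec: {spec!r}")
    return tuple(int(g) for g in m.groups())


class InMemoryGridFile(EFile):
    """Toy backend: a 2D grid of bytes stored as equal-length rows."""

    def __init__(self, rows):
        self.rows = [bytearray(r) for r in rows]

    def ls_emodes(self):
        return ["block"]

    def _select(self, emode, spec):
        # Shared checks for eRead and eWrite: known eMode, block inside the grid.
        if emode not in self.ls_emodes():
            raise NotImplementedError(f"unsupported eMode: {emode}")
        w, h, x, y = parse_block_spec(spec)
        if y + h > len(self.rows) or any(x + w > len(r) for r in self.rows):
            raise ValueError(f"block {spec!r} lies outside the grid")
        return w, h, x, y

    def e_read(self, emode, spec):
        w, h, x, y = self._select(emode, spec)
        buffer = b"".join(bytes(r[x:x + w]) for r in self.rows[y:y + h])
        return buffer, len(buffer)

    def e_write(self, emode, spec, buffer):
        w, h, x, y = self._select(emode, spec)
        if len(buffer) != w * h:
            raise ValueError("buffer size does not match the block size")
        for i in range(h):
            self.rows[y + i][x:x + w] = buffer[i * w:(i + 1) * w]
        return len(buffer)


grid = InMemoryGridFile([b"abcd", b"efgh", b"ijkl"])
if "block" in grid.ls_emodes():
    buff, length = grid.e_read("block", "2x2+1+1")  # (b"fgjk", 4)
```

The point of the shape is that the generic API stays fixed while all mode-specific knowledge lives in the spec parser of each backend.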
Example (in perl for brevity):
  my $file   = SAGA::File->new ("http://www.google.com/intl/en/images/logo.gif");
  my @emodes = $file->lsEModes ();

  if ( grep (/^jpeg_block$/, @emodes) ) {
    my ($buff, $len) = $file->eRead ("jpeg_block", "22x4+7+8");
  }
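Jon's question at the top of the thread, how a call like the eRead above could "go asynchronous" when preparing a subset of a 1TB file takes far longer than a call/reply, can be sketched as a ticket-based service. This is a hypothetical design only: SubsetService, request/status/result, the on_ready callback, and the gsiftp URL are invented names, and a thread pool stands in for whatever the archive really uses to prepare data.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class SubsetService:
    """Hypothetical archive-side service that prepares data subsets in the background.

    request() returns a ticket at once instead of blocking until the subset
    exists; clients either poll status() or pass an on_ready callback.
    """

    def __init__(self, prepare, workers=2):
        self._prepare = prepare  # assumed: callable(lfn, spec) -> list of URLs
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}

    def request(self, lfn, spec, on_ready=None):
        ticket = uuid.uuid4().hex
        future = self._pool.submit(self._prepare, lfn, spec)
        self._jobs[ticket] = future
        if on_ready is not None:
            # Notification: called with the URLs once preparation has finished.
            future.add_done_callback(lambda f: on_ready(ticket, f.result()))
        return ticket

    def status(self, ticket):
        return "ready" if self._jobs[ticket].done() else "pending"

    def result(self, ticket):
        # Blocks until ready; re-raises any error from the preparation step.
        return self._jobs[ticket].result()

    def shutdown(self):
        self._pool.shutdown(wait=True)


def slow_prepare(lfn, spec):
    """Stand-in for subsetting a huge file into a temporary store."""
    time.sleep(0.2)
    return [f"gsiftp://archive.example.org/tmp/{lfn}.{spec}.subset"]


if __name__ == "__main__":
    service = SubsetService(slow_prepare)
    ticket = service.request("run42.h5", "22x4+7+8")
    while service.status(ticket) == "pending":  # polling
        time.sleep(0.05)
    print(service.result(ticket))
    service.shutdown()
```

Polling and notification share one mechanism here; the synchronous behaviour ("the first read just hangs") is simply result() called straight after request().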
I would discourage support for B, since I do not know of any protocol supporting that approach efficiently, and it also needs approximately the same infrastructure setup as C.
As A is easily implementable at the application level, or within any SAGA implementation, there is no need for support at the API level; however, A is insufficient for all but a few trivial cases.
This approach is very generic at the API level (that's good), but it requires exact agreement between client and server on the command syntax used, which may get problematic. If we go this route, we will definitely end up specifying at least a minimal command subset to be supported by the eRead/eWrite commands.
You are right: complexity does not go away magically, but gets moved to the specification of the eModes.
As for a minimal set: I do not think that this is necessary, since the eMode is SUPPOSED to be application specific. OTOH, an intuitive example usable for some cases may be helpful. The standard GridFTP ERET example is partial file access (IIRC: filename, offset, length). That is not very useful for SAGA, since it is already covered by the normal read/write operations.
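Why partial file access adds nothing can be seen in a few lines: a hypothetical "partial" eMode with an assumed "offset,length" spec (loosely after the GridFTP example, not its actual syntax) is just string parsing wrapped around an ordinary seek and read. Both function names are invented for this sketch.

```python
import io


def e_read_partial(f, spec):
    """Hypothetical "partial" eMode; the spec syntax "offset,length" is assumed."""
    offset, length = (int(v) for v in spec.split(","))
    f.seek(offset)
    buffer = f.read(length)
    return buffer, len(buffer)


def plain_read(f, offset, length):
    """The same selection expressed with ordinary seek/read calls."""
    f.seek(offset)
    return f.read(length)


data = io.BytesIO(b"0123456789")
assert e_read_partial(data, "3,4") == (b"3456", 4)
assert plain_read(data, 3, 4) == b"3456"
```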
I simply fear we'll have the same problems we have with the GAT today. The GAT API is in principle usable in a broad range of use cases based on a generic API. The genericity is ensured by using key/value tables in the API itself, allowing quick adaptation to any concrete need. The problem is the missing specification of these key/value pairs, which makes it difficult to achieve reusability.
I absolutely agree that the problem lies right there: semantic overloading of strings. The situation is somewhat better than in GAT though:
- the preferences in GAT are really generic and can be used for anything. The eModes have a very limited scope and are hence much easier to agree on between different implementations.
- as the mapping to GridFTP is 1:1, and GridFTP is quite commonly used, there is at least one other instance against which the modes can be agreed. Hence, every implementation of an eMode can be expected to do the same thing; at least there is a good chance of that.
However, again: you are right. Semantic overloading of strings is not a nice thing to do, and is here only justified by a lack of obvious alternatives.
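One way to contain the semantic overloading of strings is to publish each eMode name together with a machine-checkable grammar for its spec, so that client and server can at least agree on syntax. The sketch below is a hypothetical registry (EModeRegistry, register/validate, and the regex for jpeg_block are all assumptions); it checks syntax only and cannot guarantee that two implementations attach the same meaning to a valid spec.

```python
import re


class EModeRegistry:
    """Hypothetical shared registry: each eMode name is published together with
    a regular expression for its spec string and a short description, so a
    client can reject a malformed spec before it reaches the server."""

    def __init__(self):
        self._modes = {}

    def register(self, name, spec_pattern, description):
        if name in self._modes:
            raise ValueError(f"eMode already registered: {name}")
        self._modes[name] = (re.compile(spec_pattern), description)

    def names(self):
        return sorted(self._modes)

    def validate(self, name, spec):
        if name not in self._modes:
            raise KeyError(f"unknown eMode: {name}")
        pattern, _ = self._modes[name]
        if pattern.fullmatch(spec) is None:
            raise ValueError(f"spec {spec!r} does not match eMode {name!r}")
        return True


registry = EModeRegistry()
registry.register("jpeg_block", r"\d+x\d+\+\d+\+\d+",
                  "WxH+X+Y block of a JPEG image (assumed syntax)")
registry.validate("jpeg_block", "22x4+7+8")  # True
```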
Thanks, Andre.
Regards Hartmut
--
+-----------------------------------------------------------------+
| Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science         | mail: merzky@cs.vu.nl       |
| De Boelelaan 1083a                | www:  http://www.merzky.net |
| 1081 HV Amsterdam, Netherlands    |                             |
+-----------------------------------------------------------------+