On Jun 14, 2005, at 1:24 AM, Andre Merzky wrote:
Quoting [John Shalf] (Jun 14 2005):
Should we find some case that causes problems for a readv/pread model? The hyperslabbing is clearly not one of those cases.
Actually, how would you do an HDF5 hyperslab via readv? The only way I see is instrumenting the HDF5 library and writing a readv file driver, but then you would not use SAGA anyway; that is not application level anymore.
The same problem exists with any of the proposed solutions, including eRead. So I'm not sure if I see the point here.
If you want to read hyperslabs from an HDF5 file at the application level with readv, you would need to mimic the HDF5 lib in order to find the offset of the data set, and you would need to know details about the HDF5 file structure and data layout.
Someone will need to solve the very same problem in order to implement an HDF5-specific eRead interface.
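To make the offset bookkeeping concrete, here is a minimal sketch of what "finding the offsets" means in the simplest possible case. It assumes a contiguous, uncompressed, row-major 3D dataset whose first byte sits at a known file offset; real HDF5 files (chunking, compression, headers) need far more than this. The names Extent, Dim and hyperslab_extents are made up for illustration and are not part of HDF5 or SAGA.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous byte range, the unit a readv/pread-style call consumes.
struct Extent {
    std::size_t offset;  // byte offset from the start of the file
    std::size_t length;  // number of bytes to read
};

// Hypothetical description of one dimension of a hyperslab selection.
struct Dim {
    std::size_t size;    // full extent of the dataset in this dimension
    std::size_t start;   // first selected index
    std::size_t stride;  // step between selected indices
    std::size_t count;   // number of selected indices
};

// Assumption: dims[2] is the fastest-varying dimension and the dataset is
// stored contiguously, starting at byte `base`.
std::vector<Extent> hyperslab_extents(std::size_t base, std::size_t elem_size,
                                      const std::array<Dim, 3>& dims) {
    for (const Dim& d : dims) {
        assert(d.stride > 0);
        assert(d.count == 0 || d.start + (d.count - 1) * d.stride < d.size);
    }
    const Dim& z = dims[0];
    const Dim& y = dims[1];
    const Dim& x = dims[2];

    // A unit stride in the fastest dimension lets a whole row be one extent.
    const bool whole_rows = (x.stride == 1);
    const std::size_t per_row = whole_rows ? (x.count > 0 ? 1 : 0) : x.count;
    const std::size_t run_len = (whole_rows ? x.count : 1) * elem_size;

    std::vector<Extent> out;
    for (std::size_t i = 0; i < z.count; ++i) {
        for (std::size_t j = 0; j < y.count; ++j) {
            for (std::size_t k = 0; k < per_row; ++k) {
                const std::size_t zi = z.start + i * z.stride;
                const std::size_t yi = y.start + j * y.stride;
                const std::size_t xi = x.start + k * x.stride;
                // Row-major linear index of element (zi, yi, xi).
                const std::size_t index = (zi * y.size + yi) * x.size + xi;
                out.push_back({base + index * elem_size, run_len});
            }
        }
    }
    return out;
}
```

The resulting list is already in the form a vector read wants (offset plus length per entry), which is the point about readv-style interfaces made in this thread.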
Compared to that, eRead really is simpler for the application. Here is an example we used for hyperslabbing a 3D scalar field:
  snprintf (pattern1, 255, "(%d, %d, %d, %d)",
            start1, stop1, stride1, reps1);
  snprintf (pattern2, 255, "(%d, %d, %d, %d, %s)",
            start2, stop2, stride2, reps2, pattern1);
  snprintf (pattern3, 255, "(%d, %d, %d, %d, %s)",
            start3, stop3, stride3, reps3, pattern2);

  res = file.eRead (pattern3, (char*) buf, buffer_size);
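The nesting in those snprintf calls can be wrapped in a small helper. The sketch below is one possible way to do it, not part of any SAGA binding; Level and make_pattern are hypothetical names, and the only thing taken from the example is the "(start, stop, stride, reps[, inner])" layout of the string.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One nesting level of the pattern (hypothetical helper type).
struct Level {
    int start;
    int stop;
    int stride;
    int reps;
};

// Builds the nested pattern string. levels[0] is the innermost level
// (pattern1 in the example), the last entry is the outermost one.
std::string make_pattern(const std::vector<Level>& levels) {
    assert(!levels.empty());  // at least one level is required
    std::string pattern;
    for (const Level& l : levels) {
        std::string s = "(" + std::to_string(l.start) + ", " +
                        std::to_string(l.stop) + ", " +
                        std::to_string(l.stride) + ", " +
                        std::to_string(l.reps);
        if (!pattern.empty()) {
            s += ", " + pattern;  // wrap the pattern built so far
        }
        s += ")";
        pattern = s;
    }
    return pattern;
}
```

Using std::string also avoids the fixed 255-byte buffers, which a deeply nested pattern could overflow or truncate.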
So you would actually need to embed this in-situ with your HDF5 code? Or would you go through the HDF5 libraries so that you can push that information string down to the driver layer? It's not clear where exactly you place these calls. And when you *do* insert these calls, it requires some understanding of the HDF5 internal file layout. Or are we going to ditch the HDF5 API and use eRead instead? How then do we use eRead to manage all of the other HDF5 features like compression, groups, iteration, etc.? What is the string spec for an HDF5 group iterator using eRead strings? This is why I fail to see the benefits of the eRead interface (it doesn't prevent us from mucking with the guts of HDF5 if you want to preserve the HDF5 API, and it doesn't reduce complexity for the user if you are going to replace the HDF5 APIs with these stringy pattern requests).
start, stop, stride, reps correspond directly to the HDF5 semantics. So the semantic info is indeed maintained at the application level and, as you said before, its interpretation is pushed to lower levels.
It looks like you will end up encoding the entire HDF5 API as eRead pattern strings and pushing it to the other end of a client-server connection. Again, I'm not sure we have made life easier for the remote HDF5 people.
How would that look for readv?
What I was thinking is that developers of HDF5 may have an interest in defining vector or patterned read operations at the VFD layer of their interface. This would enable them to propagate the kind of information you are attempting to encode in eRead strings down to the driver, where vector-read interfaces can take advantage of it for deeper pipelining of high-latency operations. (They could, for instance, use some of the methods that Thorsten was referring to, or they could use vread/vwrite type operations.)
So the issue is that 1) if you use eRead to replace the HDF5 API, then we are talking about an enormously complex string-encoding interface; 2) if you use eRead in the VFD, then you have to instrument HDF5 to propagate information about patterned reads down to the driver layer. That is of course the same thing you need if you use vread()/readp() (or any of the interfaces that Thorsten described). So I don't see much of a difference in capability there, except that vread/readp already has the information in a form that you can do I/O with. With eRead, you still have to go through and parse some strings to gain access to the same information about the pattern of reads/writes. So it's not merely that eRead is pushing complexity to a different layer... I don't see where it is reducing complexity.
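To illustrate the parsing step, here is a rough sketch of what a driver would have to do before it can issue any I/O from a pattern string. It assumes only the "(start, stop, stride, reps[, inner])" layout from the snprintf example earlier in this thread; the actual eRead grammar may well be richer, and Level, parse_level and parse_pattern are invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// One nesting level recovered from the string (hypothetical type).
struct Level {
    long start;
    long stop;
    long stride;
    long reps;
};

// Parses one "(start, stop, stride, reps[, inner])" group beginning at
// s[pos] and appends levels outermost first. Returns false on bad input.
static bool parse_level(const std::string& s, std::size_t& pos,
                        std::vector<Level>& out) {
    assert(pos <= s.size());  // the cursor never runs past the input
    auto skip_ws = [&] { while (pos < s.size() && s[pos] == ' ') ++pos; };

    skip_ws();
    if (pos >= s.size() || s[pos] != '(') return false;
    ++pos;

    long v[4];
    for (int i = 0; i < 4; ++i) {
        if (i > 0) {
            skip_ws();
            if (pos >= s.size() || s[pos] != ',') return false;
            ++pos;
        }
        const char* begin = s.c_str() + pos;
        char* end = nullptr;
        v[i] = std::strtol(begin, &end, 10);  // skips leading blanks itself
        if (end == begin) return false;       // no digits where expected
        pos += static_cast<std::size_t>(end - begin);
    }
    out.push_back({v[0], v[1], v[2], v[3]});

    skip_ws();
    if (pos < s.size() && s[pos] == ',') {    // optional nested pattern
        ++pos;
        if (!parse_level(s, pos, out)) return false;
        skip_ws();
    }
    if (pos >= s.size() || s[pos] != ')') return false;
    ++pos;
    return true;
}

// Parses a complete pattern string; `out` may hold partial results on failure.
bool parse_pattern(const std::string& s, std::vector<Level>& out) {
    out.clear();
    std::size_t pos = 0;
    if (!parse_level(s, pos, out)) return false;
    while (pos < s.size() && s[pos] == ' ') ++pos;
    return pos == s.size();
}
```

Only after this step does the driver hold the same start/stop/stride/reps numbers that a vread()/readp() style call would have received directly.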
Cheers, Andre.
-- 
+-----------------------------------------------------------------+
| Andre Merzky                      | phon: +31 - 20 - 598 - 7759 |
| Vrije Universiteit Amsterdam (VU) | fax : +31 - 20 - 598 - 7653 |
| Dept. of Computer Science         | mail: merzky@cs.vu.nl       |
| De Boelelaan 1083a                | www: http://www.merzky.net  |
| 1081 HV Amsterdam, Netherlands    |                             |
+-----------------------------------------------------------------+