ByteIO Working Group session at GGF14
11.00 - 12.30, 29/6/05
15 attendees

Minutes
--------

* Introduction
- Next telecon is on the 12th of July, 2005
- Why was write included?
-- The hard part is consistency
-- Consistency is not part of the scope. You can have something like advisory locking outside of the ByteIO interfaces
* Use Cases: Files
- SAGA has found that it's often useful for the writer of a use case to be able to propose an API
- GridFTP is an example of a very successful "grid" service; what is the advantage of ByteIO over GridFTP?
-- GridFTP folks have no problem with the idea of a ByteIO interface which uses GridFTP underneath
- Does stream also include a push model?
-- No, it doesn't; it's just a standard request for data from a stream, i.e. you may only get back Y bytes even though you requested X bytes
- What is RNS?
-- Resource Namespace Service (see GFS-WG)
- We need to go back and add the write point of view to the file use cases
- Two issues bundled in streams: sessions (to amortise the cost of open over many operations)
- Can you overcome repeated authentication/authorisation overheads though?
-- somewhat implementation dependent
-- authorisation using X509 is an issue for sessionless access, as you may need to parse every time
-- when caching is used to amortise the cost, you effectively have sessions underneath
-- connection caching gives performance benefits in GridFTP
-- implementors can choose to implement either session-based or sessionless access as required
-- don't expect that the client API is always sessionless as well (the client has the smarts)
* Use Cases: Database
- Asynchronous Query
-- Similar to GridRPC
- Simple and Complex Insert
- Copying
* Interfaces
- not trying to solve all IO problems right now
- trying to get useful things out to groups now
- added the sessionable streamable interface back in, as we think it's too important to leave out
- Conceptual Interface IRandomByteIO
-- all returns are void?
--- come back to that
- SAGA
-- complex IO that many apps need
--- read a byte, seek a byte, write a byte
--- can cache
--- but the cost of doing all the ops is more than the bytes transferred
--- could cluster ops
--- or do scattered read/write (readv and writev)
--- or do patterns (e.g. I want every second byte for n bytes)
--- or high-level, e.g. GridFTP eread/estore (complicated to set up though)
---- out of scope for ByteIO (OGSA-D?)
--- not sure it's the common case
---- it is for visualisation
-- not difficult to do, e.g. scattered reads/writes, but does the complication outweigh the benefits?
-- suggest waiting until the spec phase; if someone says they want these, then just do it
-- if there aren't use cases, then don't implement it
--- send in the SAGA use cases
-- return from read is obvious
- very difficult to write a normative spec which fits all potential profiles, e.g. could use BaseFault for WSRF
-- need a non-normative document which describes what we are trying to do
-- additional documents which render the concept to a particular profile, covering e.g. exceptions and return values
- is there QoS? How would you combine it?
-- out of scope, look at OGSA-D
- cleaner to have a separate truncate operation?
-- can still truncAppend 0 bytes to achieve this
-- more useful/efficient to have an atomic truncAppend op
- benefits for data going into SOAP: easy
-- require base SOAP protocol
- dataTransferMech QualifiedName is just a predefined string?
-- look at BaseNotification; will define some in the group
-- inclination to put them in another spec, but services could come up with their own
- Conceptual Interface IStreamableByteIO
-- with a non-streamable resource, you're talking to the thing which has the bytes: destroy it and you conceptually destroy the bytes
-- with a streamable resource, if you destroy the stream you notionally don't destroy the bytes, only the stream
-- "beginning of a stream" is not defined (up to implementation). Don't require implementations to cache forever.
- seems incomplete without being able to open streams
-- OGSA traditionally shies away from factories
- could have a streamable ByteIO interface on a non-streamable object; it's hard to come up with a single factory pattern which covers how you may want to open a resource, e.g. additional information for a telescope
-- problem comes up because of the sheer variety of possible sources
-- isn't everything at least streamable?
-- example of the other way round: a hard disk is not
- sessionless interface will not be efficient for remote file servers? Caching will work locally but not remotely?
- why define the non-streamable interface?
-- easier for implementors: garbage collection
- change proposed is that you have to have an open call
-- then what's the difference from the streamable interface?
- is it going to be difficult to get performance from sessionless access?
- Mike Beckerle: I have a problem with two ops for seek and read, and for seek and write
- propose that we have seekread and seekwrite
-- then what happens to the return type if you just want to do a seek?
-- seek would return a long and read would return bytes
-- if you want to find out where you are, you could query e.g. resource properties? Could imply position from seekread
- seek is a delta from a seek origin, which could be beginning, current or end
- still seems slightly cumbersome to go through resource properties
- *** add getCurrentPosition, seekread and seekwrite ***
- getCurrentPosition will depend on rendering to a particular profile
- clients should keep track of where they are!
* Bulk Data transfer
- might want to make a stronger statement: "if you implement DIME, you must name it in this way", but only simple is required
- should the number of bytes be "long"?
-- never going to actually transfer more than a 32-bit number of bytes on the wire?
-- but not really going to argue, e.g. how about "how big is my file? give me that many bytes"
-- 4GB seems big just now, but in the future...
- where do the "whoami" and other details about this go?
-- all in the OGSA Base profile, based on BaseNotification
-- appears in headers, but ByteIO doesn't see it
* Wrap-Up
- implementations don't do all of the OGSA Base profile
- code isn't available yet, need to clean up
- SAGA working on a client-side prototype for fileIO
- crosspost to the SAGA list when it comes out
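The random-access operations discussed under Interfaces (read, write, append, the atomic truncAppend, and the proposed seekread, seekwrite and getCurrentPosition with a beginning/current/end seek origin) can be modelled with a small in-memory toy. This is a hypothetical Python sketch, not the group's interface: the class and method names (ToyRandomByteIO, trunc_append, seek_read, seek_write, get_current_position), the argument order, and zero-padding for writes past the end are all assumptions, and no SOAP/DIME transport is modelled. It keeps the current position on the resource, which only makes sense within a session; in a sessionless rendering the client would track it.

```python
from enum import Enum


class SeekOrigin(Enum):
    """Origins a seek offset may be relative to (beginning, current, end)."""
    BEGINNING = "beginning"
    CURRENT = "current"
    END = "end"


class ToyRandomByteIO:
    """Hypothetical in-memory model of random-access ByteIO operations.

    Names and signatures are illustrative assumptions, not the spec.
    The position lives on the resource, i.e. this assumes a session.
    """

    def __init__(self, initial: bytes = b"") -> None:
        self._data = bytearray(initial)
        self._pos = 0

    def _write_at(self, start: int, data: bytes) -> None:
        # Pad any gap past the end with zero bytes (an assumption),
        # then overwrite in place, extending the resource if needed.
        if start > len(self._data):
            self._data.extend(bytes(start - len(self._data)))
        self._data[start:start + len(data)] = data

    def read(self, start: int, length: int) -> bytes:
        # Reading past the end returns fewer bytes than requested.
        return bytes(self._data[start:start + length])

    def write(self, start: int, data: bytes) -> None:
        self._write_at(start, data)

    def append(self, data: bytes) -> None:
        self._data.extend(data)

    def trunc_append(self, offset: int, data: bytes) -> None:
        # Truncate at `offset`, then append `data`, in one call.
        # Appending zero bytes gives a plain truncate. (Trivially
        # atomic here only because this toy is single-threaded.)
        del self._data[offset:]
        self._data.extend(data)

    def _resolve(self, offset: int, origin: SeekOrigin) -> int:
        # A seek is a delta from the chosen origin.
        base = {
            SeekOrigin.BEGINNING: 0,
            SeekOrigin.CURRENT: self._pos,
            SeekOrigin.END: len(self._data),
        }[origin]
        target = base + offset
        if target < 0:
            raise ValueError("cannot seek before the beginning")
        return target

    def seek_read(self, offset: int, origin: SeekOrigin, length: int) -> bytes:
        # Seek and read as a single operation; length 0 is a pure seek.
        self._pos = self._resolve(offset, origin)
        chunk = self.read(self._pos, length)
        self._pos += len(chunk)
        return chunk

    def seek_write(self, offset: int, origin: SeekOrigin, data: bytes) -> None:
        # Seek and write as a single operation.
        self._pos = self._resolve(offset, origin)
        self._write_at(self._pos, data)
        self._pos += len(data)

    def get_current_position(self) -> int:
        return self._pos


f = ToyRandomByteIO(b"hello world")
f.trunc_append(5, b"")                      # truncate to b"hello"
f.seek_write(0, SeekOrigin.END, b", grid")  # write at the end
print(f.read(0, 100))                       # b'hello, grid'
print(f.seek_read(-4, SeekOrigin.END, 2))   # b'gr'
print(f.get_current_position())             # 9
```

Folding the seek into the read or write keeps a seek-then-read to one round trip, which is the cost concern raised above; the trade-off, also raised in the session, is that a pure seek has to be expressed as a zero-length seek_read.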