
I think it is important that the SAGA file read and write operations are as natural and efficient as possible. I have no experience of reading and writing binary data, but I do note that the modules array and struct both appear relevant.

I would expect the operations read, readline and readlines to be supported; I paste the standard doc below:

file.read([size])
Read at most size bytes from the file (less if the read hits EOF before obtaining size bytes). If the size argument is negative or omitted, read all data until EOF is reached. The bytes are returned as a string object. An empty string is returned when EOF is encountered immediately. (For certain files, like ttys, it makes sense to continue reading after an EOF is hit.) Note that this method may call the underlying C function fread() more than once in an effort to acquire as close to size bytes as possible. Also note that when in non-blocking mode, less data than was requested may be returned, even if no size parameter was given.

Note
This function is simply a wrapper for the underlying fread() C function, and will behave the same in corner cases, such as whether the EOF value is cached.

file.readline([size])
Read one entire line from the file. A trailing newline character is kept in the string (but may be absent when a file ends with an incomplete line). [6] If the size argument is present and non-negative, it is a maximum byte count (including the trailing newline) and an incomplete line may be returned. An empty string is returned only when EOF is encountered immediately.

Note
Unlike stdio's fgets(), the returned string contains null characters ('\0') if they occurred in the input.

file.readlines([sizehint])
Read until EOF using readline() and return a list containing the lines thus read. If the optional sizehint argument is present, instead of reading up to EOF, whole lines totalling approximately sizehint bytes (possibly after rounding up to an internal buffer size) are read. Objects implementing a file-like interface may choose to ignore sizehint if it cannot be implemented, or cannot be implemented efficiently.

Supporting these calls would make reading normal text files completely natural. It would appear that they could be used for binary data too, in conjunction with the struct module. However, I have no experience of handling complex binary structures in Python. Does anybody have practical experience of handling large volumes (at least GB) of binary data in Python? We may well require a totally different call to read into a buffer - but then how do you handle the data once it is in your buffer, other than writing it out again?

Steve
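On the binary-data question: in modern Python (the doc pasted above is from the Python 2 file object, but the idea carries over), read and the struct module do combine naturally, and reading in bounded chunks keeps memory flat even for multi-GB files. A minimal sketch follows; the record layout (a 4-byte little-endian int id plus an 8-byte double) is an invented example, not anything defined by SAGA:

```python
import io
import struct

# Hypothetical record layout for illustration: 4-byte little-endian int
# followed by an 8-byte double, 12 bytes per record with "<" (no padding).
RECORD = struct.Struct("<id")

def iter_records(f, chunk_records=4096):
    """Yield (id, value) tuples from a binary stream, reading a bounded
    chunk at a time so a multi-GB file never sits in memory whole."""
    chunk_size = RECORD.size * chunk_records
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        # iter_unpack requires the chunk to hold whole records; for a
        # well-formed file every chunk, including the last, does, since
        # read() on a regular file returns full chunks until EOF.
        yield from RECORD.iter_unpack(chunk)
```

Anything with a file-like read, e.g. the object returned by open(path, 'rb'), can be passed straight in, which is exactly why supporting the native read call in the SAGA bindings would pay off.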

Hi Steve,

in my humble (non-pythonesque) opinion, a replication of the native file IO API may well make sense. The Java bindings, for example, also provide the native Java file IO API, in addition to the buffer-based SAGA file IO.

My $0.02, Andre.

PS: From what I gathered from Hartmut, he has pretty much the same opinion...

Quoting [Steve Fisher] (Oct 21 2009):
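Andre's suggestion, replicating the native file API on top of buffer-based SAGA I/O, is straightforward to sketch in Python: subclass io.RawIOBase and let io.BufferedReader supply read, readline and readlines for free. The names BufferBackend and read_buffer below are invented for illustration; the real SAGA bindings would supply the equivalent buffer-oriented call:

```python
import io

class BufferBackend:
    """Stand-in for a buffer-oriented SAGA file object. Invented for
    illustration only; it just serves bytes out of an in-memory buffer."""
    def __init__(self, data):
        self._data, self._pos = data, 0

    def read_buffer(self, size):
        # Return up to `size` bytes, advancing the position.
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

class SagaRawIO(io.RawIOBase):
    """Adapts a buffer-based backend to Python's raw I/O interface.
    Wrapping an instance in io.BufferedReader then yields the whole
    familiar read/readline/readlines API."""
    def __init__(self, backend):
        self._backend = backend

    def readable(self):
        return True

    def readinto(self, b):
        data = self._backend.read_buffer(len(b))
        b[:len(data)] = data  # copy into the caller-supplied buffer
        return len(data)      # 0 signals EOF to BufferedReader
```

Usage would then look like f = io.BufferedReader(SagaRawIO(backend)), after which f.readline() and friends behave as on any ordinary Python file, which is the naturalness Steve is after.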
participants (2):
- Andre Merzky
- Steve Fisher