
I think it is important that the SAGA file read and write operations are as natural and efficient as possible. I have no experience of reading and writing binary data, but I do note that the modules array and struct both appear relevant.

I would expect the operations read, readline and readlines to be supported; I paste the standard doc below:

file.read([size])
Read at most size bytes from the file (less if the read hits EOF before obtaining size bytes). If the size argument is negative or omitted, read all data until EOF is reached. The bytes are returned as a string object. An empty string is returned when EOF is encountered immediately. (For certain files, like ttys, it makes sense to continue reading after an EOF is hit.) Note that this method may call the underlying C function fread() more than once in an effort to acquire as close to size bytes as possible. Also note that when in non-blocking mode, less data than was requested may be returned, even if no size parameter was given.

Note
This function is simply a wrapper for the underlying fread() C function, and will behave the same in corner cases, such as whether the EOF value is cached.

file.readline([size])
Read one entire line from the file. A trailing newline character is kept in the string (but may be absent when a file ends with an incomplete line). [6] If the size argument is present and non-negative, it is a maximum byte count (including the trailing newline) and an incomplete line may be returned. An empty string is returned only when EOF is encountered immediately.

Note
Unlike stdio's fgets(), the returned string contains null characters ('\0') if they occurred in the input.

file.readlines([sizehint])
Read until EOF using readline() and return a list containing the lines thus read. If the optional sizehint argument is present, instead of reading up to EOF, whole lines totalling approximately sizehint bytes (possibly after rounding up to an internal buffer size) are read. Objects implementing a file-like interface may choose to ignore sizehint if it cannot be implemented, or cannot be implemented efficiently.

Supporting these calls would make reading normal text files completely natural. It would appear that they could be used for binary data too, in conjunction with the struct module. However, I have no experience of handling complex binary structures in Python. Does anybody have practical experience of handling large volumes (at least GB) of binary data in Python? We may well require a totally different call to read into a buffer - but then how do you handle the data once it is in your buffer, other than writing it out again?

Steve
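On the binary-data question: in modern Python (the doc pasted above is from the Python 2 file object, but the idea carries over), read and the struct module do combine naturally, and reading in bounded chunks keeps memory flat even for multi-GB files. A minimal sketch follows; the record layout (a 4-byte little-endian int id plus an 8-byte double) is an invented example, not anything defined by SAGA:

```python
import io
import struct

# Hypothetical record layout for illustration: 4-byte little-endian int
# followed by an 8-byte double, 12 bytes per record with "<" (no padding).
RECORD = struct.Struct("<id")

def iter_records(f, chunk_records=4096):
    """Yield (id, value) tuples from a binary stream, reading a bounded
    chunk at a time so a multi-GB file never sits in memory whole."""
    chunk_size = RECORD.size * chunk_records
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        # iter_unpack requires the chunk to hold whole records; for a
        # well-formed file every chunk, including the last, does, since
        # read() on a regular file returns full chunks until EOF.
        yield from RECORD.iter_unpack(chunk)
```

Anything with a file-like read, e.g. the object returned by open(path, 'rb'), can be passed straight in, which is exactly why supporting the native read call in the SAGA bindings would pay off.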

Hi Steve,

in my humble (non-pythonesque) opinion, a replication of the native file IO API may well make sense. The Java bindings, for example, also provide the native Java file IO API, in addition to the buffer-based SAGA file IO.

My $0.02, Andre.

PS: From what I gathered from Hartmut, he has pretty much the same opinion...

Quoting [Steve Fisher] (Oct 21 2009):
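Andre's suggestion, replicating the native file API on top of buffer-based SAGA I/O, is straightforward to sketch in Python: subclass io.RawIOBase and let io.BufferedReader supply read, readline and readlines for free. The names BufferBackend and read_buffer below are invented for illustration; the real SAGA bindings would supply the equivalent buffer-oriented call:

```python
import io

class BufferBackend:
    """Stand-in for a buffer-oriented SAGA file object. Invented for
    illustration only; it just serves bytes out of an in-memory buffer."""
    def __init__(self, data):
        self._data, self._pos = data, 0

    def read_buffer(self, size):
        # Return up to `size` bytes, advancing the position.
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

class SagaRawIO(io.RawIOBase):
    """Adapts a buffer-based backend to Python's raw I/O interface.
    Wrapping an instance in io.BufferedReader then yields the whole
    familiar read/readline/readlines API."""
    def __init__(self, backend):
        self._backend = backend

    def readable(self):
        return True

    def readinto(self, b):
        data = self._backend.read_buffer(len(b))
        b[:len(data)] = data  # copy into the caller-supplied buffer
        return len(data)      # 0 signals EOF to BufferedReader
```

Usage would then look like f = io.BufferedReader(SagaRawIO(backend)), after which f.readline() and friends behave as on any ordinary Python file, which is the naturalness Steve is after.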
participants (2):
- Andre Merzky
- Steve Fisher