Re: [SAGA-RG] Python bindings: Buffer class issue

12 Nov 2009

      On Wed, 2009-11-11 at 21:11 +0100, Andre Merzky wrote:
...
Quoting [Manuel Franceschini] (Nov 11 2009):
...
On Mon, Nov 9, 2009 at 11:10 PM, Andre Merzky <andre@merzky.net> wrote:
...
Quoting [Manuel Franceschini] (Nov 09 2009):
...
Hi all,
Quick summary from GFD.90: the SAGA I/O Buffer encapsulates a sequence
of bytes to be used for I/O operations, e.g. read()/write() on files
and streams, and call() on rpc instances. The recent removal of the
buffer class from the Python bindings of the C++ SAGA implementation
led us to think again about this issue. The GFD is C/C++ oriented
Well, it should not be C/C++ oriented, but the bias of the authors
probably shows :-)  The intent was to support binary I/O on any
language, as that was mentioned in many use cases.
...
and therefore the Python implementation is anything but clear in this regard.
Given that memory management is automatic in Python, the notion
of application-managed and implementation-managed Buffer disappears.
From what I learned during the discussion in Banff, this is not
really true: one *can* allocate an array in user space and pass it
to an API by reference, which actually makes it an
application-managed memory segment.  The point in Python seems to be
that nobody is doing that...
Well, in Python there is *only* by-reference parameter passing,
references to objects, that is. Version 2.6 introduced an io module
that allows one to do what you describe. One problem with this is that our
JySAGA bindings can't support this new feature as Jython just reached
version 2.5.1 and it looks like there is quite a long way to go to
2.6.
That is an implementation problem, and should not influence the
python bindings, right? ;-)
Well, defining bindings that break all current implementations and their
usage won't work either. The C++ wrapper now also requires Python >=
2.2. Would all current users be willing/able to upgrade to >= 2.6?

The bindings will have to define which Python version is required. It
is not only a matter of 2.x or 3.x; the 2.x versions also contain
increasingly relevant functionality.

We can either opt for something low (e.g. >= 2.2) to increase
acceptance, or something high (e.g. >= 2.6 or >= 3.0) if these contain
features that are essential for the bindings. A third option is to
specify optional additional functionality for an implementation that's
only targeted at newer versions of Python, but that will probably
generate a lot of confusion.

I'd say we stick to >= 2.2; widely used, and supported by all current
implementations.
...
...
I did some memory profiling with large chunks of data copied from one
file to another and the automatic memory management in Python seemed
to be very efficient. In my tests the garbage collection was
instantaneous. In other words, as soon as there were no more
references to a data chunk, memory was deallocated. So when shuffling
1 MB chunks 10000 times from one file to another, the memory
consumption of the test program never exceeded 2.5 MB. If somebody can
come up with a test program that shows the advantage of using the new
io module in relevant use cases, we could think about using it in the
C++ bindings. Otherwise, why optimize when there's no real problem?
Fair point.
But, BTW, I don't see app-managed buffers as a way to optimize memory
consumption, but as a way to optimize latency, since you save memcpy calls.
In theory at least...
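To make the two positions above concrete, here is a minimal sketch of both copy styles using ordinary file objects rather than any SAGA API. The function names and the chunk size are assumptions chosen for illustration. The first variant gets a fresh immutable chunk from every read(); the second reuses one caller-allocated bytearray via readinto(), which is the closest plain-Python analogue of an application-managed buffer. Whether the second variant is measurably faster is not claimed here; that would need the kind of test program asked for above.

```python
CHUNK = 1024 * 1024  # 1 MB, matching the chunk size used in the test above


def copy_with_new_chunks(src_path, dst_path, chunk_size=CHUNK):
    """Copy with read(): each call returns a new immutable bytes object."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:          # empty result signals end of file
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


def copy_with_reused_buffer(src_path, dst_path, chunk_size=CHUNK):
    """Copy with readinto(): one caller-allocated buffer is reused."""
    buf = bytearray(chunk_size)    # allocated once by the caller
    view = memoryview(buf)         # lets us slice without copying
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            n = src.readinto(buf)  # fills buf in place, returns byte count
            if not n:
                break
            dst.write(view[:n])    # write only the bytes actually read
            copied += n
    return copied
```

Both functions return the number of bytes copied, so they can be swapped for one another when timing or profiling.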
...
...
...
There is no need for a Python SAGA user to tell the bindings who
manages the Buffer, since it is managed by the underlying Python VM.
Another more critical issue is the data type used to hold binary data
in Python. In Python 2.x the immutable 'str' type is used whereas
Python 3.x has a newly introduced immutable 'bytes' type. Let's forget
about 3.x for a moment, since 2.x will be around for at least a couple
more years. In order to manipulate large binary datasets, the mmap
class [0] could be used, which basically transforms an immutable 'str'
into a mutable mmap object. In other words it provides the ability to
efficiently modify binary data.
Not really; it memory-maps a file, not an arbitrary string. However, you
can easily convert a string to a list or array and manipulate that in
place.
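As a small sketch of the conversion mentioned above: the snippet below copies immutable binary data into a mutable container and edits it in place. It is written for a current Python 3 interpreter, which is an assumption and not the >= 2.2 target discussed in this thread; on 2.x the data would be a 'str' and array.tostring() would take the place of tobytes(). The sample data is made up.

```python
from array import array

data = b"hello saga"   # immutable: item assignment on it raises TypeError

# Option 1: an array of unsigned bytes from the array module
arr = array("B", data)
arr[0] = ord("H")                  # modify a single byte in place
result_from_array = arr.tobytes()  # convert back to immutable bytes

# Option 2: a bytearray (available from Python 2.6 onwards)
buf = bytearray(data)
buf[6:10] = b"SAGA"                # replace a slice of equal length in place
result_from_bytearray = bytes(buf)
```

Note that both options make one copy of the data up front; only the subsequent modifications happen in place.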

The real question is: which use cases are we trying to optimize? What
will SAGA Python apps do with binary data?
...
...
...
...
In the VU Python bindings the buffer class is still present, while, as
previously said, in the C++ Python bindings it was removed recently. I
do not see any issues with the removal of the Buffer class in the
Python bindings. However, I'm not sure whether I am forgetting some
corner cases (e.g. async) that would require a dedicated Buffer class.
When removing the Buffer class, the user would simply deal with 'str'
type data to pass data back and forth to a SAGA file, stream or rpc.
If the bindings decide to go for strings, then that should pose no
problem for the async calls, as far as I can tell: the semantics of sync
and async calls are identical (apart from synchronization, obviously).
...
Now, I identified the following crucial questions:
1) Can the Buffer class be safely removed from the Python bindings?
According to the original SAGA use cases: no
According to current SAGA users: yes
What were the original use cases that required a Buffer class?
...
...
...
So, tough call ;-)
What do other people think?
anybody??
...
...
...
3) Is compatibility with Python 3.x a concern right now? In other words,
should the eventual migration to 3.x be taken into consideration?
If 3.x makes something easier, it might be good to be aware of it at
least.  I think all agree that 2.x will be around for a long time,
and that limiting the bindings to 3.x is not an option.  OTOH, it
should be possible to have slightly differing bindings for 2.x and
3.x, depending on the changes in the language itself.
Yeah, I don't think we should think too much about that now. But for
the future it will bring several benefits to the Python bindings.
agree.
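One way the "slightly differing bindings for 2.x and 3.x" idea could look in practice is sketched below: the binary data type is chosen once, based on the interpreter version, and checked at the API boundary. The names BINARY_TYPE and check_binary are hypothetical and not part of any existing SAGA binding.

```python
import sys

# Pick the type that holds binary data on this interpreter:
# 'bytes' on 3.x, 'str' on 2.x (as discussed in this thread).
if sys.version_info[0] >= 3:
    BINARY_TYPE = bytes
else:
    BINARY_TYPE = str


def check_binary(data):
    """Return data unchanged, or raise TypeError if it is not binary."""
    if not isinstance(data, BINARY_TYPE):
        raise TypeError("expected %s, got %s"
                        % (BINARY_TYPE.__name__, type(data).__name__))
    return data
```

A binding could call such a check in write()-style methods, so that the rest of the code does not need to know which Python line it runs on.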
Cheers, Andre.
-Mathijs

Mathijs den Burger