[PyQt] QString API v2 concern...

Thu May 9 20:40:13 BST 2013

Hi Matt,

I'm in no position to comment on your wider point, but...

On 9 May 2013 19:59, Matt Newell <newellm at blur.com> wrote:

> On Monday, May 06, 2013 07:49:25 AM Phil Thompson wrote:
> > The first PyQt5 snapshots are now available. You will need the current
> > SIP snapshot. PyQt5 can be installed alongside PyQt4.
> >
> > I welcome any suggestions for additional changes - as PyQt5 is not
> > intended to be compatible with PyQt4 it is an opportunity to fix and
> > improve things.
> >
> > Current changes from PyQt4:
> >
> > - Versions of Python earlier than v2.6 are not supported.
> >
> > - PyQt4 supported a number of different API versions (QString, QVariant
> > etc.). PyQt5 only implements v2 of those APIs for all versions of Python.
> >
>
> I haven't looked into this deeply, but I am a bit worried about the possible
> performance impact of QString always being converted to a Python
> str/unicode (not to mention the added porting work when going between C++
> and Python).
>
> The vast majority of the PyQt code that we use loads data from libraries
> that deal with Qt types, and either loads that data directly into widgets
> or does some processing and then loads it into widgets. I suspect that
> this kind of usage is very common.
>
> As an example a user of QtSql with the qsqlpsql driver that loads data and
> displays it in a list view is going to see the following data
> transformations/copies:
>
> PyQt4 with v1 QString api:
>
> libpq data comes from socket
> -> QString (probable utf8->utf16)
> -> PyQt wrapper of QString (actual data not copied or converted)
> -> QString (pointer dereference to get Qt type)
>
> PyQt5, PyQt4 with v2 QString api:
>
> libpq data comes from socket
> -> QString (probable utf8->utf16)
> -> unicode (deep copy of data)
> -> QString (deep copy of data)
>
> So instead of one conversion we now have one conversion and two deep
> copies. Another very probable side effect is that in many cases the
> original QString and/or the unicode object will be held in memory,
> resulting in two or possibly even three copies of the data. Even if all
> but the last stage is freed, there will still be two or three copies in
> memory during processing, depending on how the code is written, which can
> reduce performance quite a bit for large data because of CPU cache
> flushing.
>
> So far this is completely theoretical, and I'm sure that for a large
> portion of applications it will have no noticeable effect; however, I
> don't like the idea that things may get permanently less efficient for
> apps that process and display larger data sets.
>
> The one thing that stands out to me as a possible saving grace is that
> (at least in my understanding) both Qt and Python use UTF-16 as their
> internal string format, which means fast copies instead of slower
> conversions, and that some future Qt/Python changes might make
> QString -> unicode -> QString possible without any data copies.
>

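FWIW, the Python-uses-roughly-UTF-16 meme is a common oversimplification.
First, as I'm sure most people know, there are significant changes between
Python 2 str/unicode and Python 3 str, and that is inevitably reflected in
how CPython stores strings on either side of the 2/3 boundary. On top of
that, before 3.3 the internal format depended on how the interpreter was
built: a "narrow" build stored 16-bit units, a "wide" build
(Py_UNICODE_WIDE) 32-bit ones.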

What is less well known is that CPython changed significantly between 3.2
and 3.3 (PEP 393): 3.3 can store a str as an array of 8-, 16- or 32-bit
values, picking the width at run time and converting between them
automatically (with API changes to match). So whatever else happens within
PyQt, I don't think the aspiration to the old 1-copy model can be relied
on. In the event, this is what I came up with for the QString to str
direction (corrections/optimisations welcome!):

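PyObject *Python::unicode(const QString &string)
{
#if PY_MAJOR_VERSION < 3
    /* Python 2.x. http://docs.python.org/2/c-api/unicode.html */
    /* PQ() is assumed to return the QString as UTF-8 char * (not shown). */
    PyObject *s = PyString_FromString(PQ(string));
    PyObject *u = PyUnicode_FromEncodedObject(s, "utf-8", "strict");
    Py_DECREF(s);
    return u;
#elif PY_MINOR_VERSION < 3
    /* Python 3.2 or less.
       http://docs.python.org/3.2/c-api/unicode.html#unicode-objects */
#ifdef Py_UNICODE_WIDE
    /* Wide (32-bit) build: decode UTF-16, combining surrogate pairs. */
    return PyUnicode_DecodeUTF16((const char *)string.constData(),
                                 string.length() * 2, 0, 0);
#else
    /* Narrow (16-bit) build: Py_UNICODE matches QChar, so copy directly. */
    return PyUnicode_FromUnicode((const Py_UNICODE *)string.constData(),
                                 string.length());
#endif
#else
    /* Python 3.3 or greater.
       http://docs.python.org/3.3/c-api/unicode.html#unicode-objects
       PyUnicode_FromKindAndData(PyUnicode_2BYTE_KIND, ...) would treat each
       16-bit unit as one character and split surrogate pairs, so decode
       the data as UTF-16 instead. */
    return PyUnicode_DecodeUTF16((const char *)string.constData(),
                                 string.length() * 2, 0, 0);
#endif
}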

The referenced URLs contain more material.
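To make the 3.3 point concrete, here is a rough standalone sketch (not PyQt or CPython code; decodeUtf16 and pep393Kind are made-up names, and std::u16string merely stands in for the UTF-16 data a QString holds) that works out how many bytes per character PEP 393 would choose for a given string. It assumes the PEP 393 rule that the width is the narrowest one able to hold the largest code point: 1 byte up to U+00FF, 2 bytes up to U+FFFF, 4 bytes beyond that.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-16 code units (the form QString stores)
// into Unicode code points. A valid surrogate pair becomes one code point;
// an unpaired surrogate is passed through as its own code point, since a
// Python 3 str can hold lone surrogates.
std::vector<char32_t> decodeUtf16(const std::u16string &units)
{
    std::vector<char32_t> out;
    out.reserve(units.size());
    for (std::size_t i = 0; i < units.size(); ++i) {
        char32_t hi = units[i];
        if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < units.size()) {
            char32_t lo = units[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                out.push_back(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
                ++i;  // the low surrogate has been consumed as well
                continue;
            }
        }
        out.push_back(hi);
    }
    return out;
}

// Hypothetical helper: bytes per character that PEP 393 would choose,
// i.e. the narrowest width that holds the largest code point.
int pep393Kind(const std::vector<char32_t> &codePoints)
{
    char32_t maxCp = 0;
    for (char32_t cp : codePoints)
        if (cp > maxCp)
            maxCp = cp;
    if (maxCp <= 0xFF)
        return 1;  // ASCII or Latin-1: 8-bit storage
    if (maxCp <= 0xFFFF)
        return 2;  // rest of the BMP: 16-bit storage
    return 4;      // anything beyond U+FFFF: 32-bit storage
}
```

For example, u"caf\u00e9" comes out at 1 byte per character and a single emoji at 4, so in neither case can the 16-bit QString buffer simply be reused; only a BMP string with at least one character above U+00FF maps onto 2-byte storage unit for unit.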

Hth, Shaheed

> At some point I will try to do some benchmarks and look into the actual
> code to see if there is an elegant solution to this potential problem.
>
> Matt
> _______________________________________________
> PyQt mailing list    PyQt at riverbankcomputing.com
> http://www.riverbankcomputing.com/mailman/listinfo/pyqt
>