[PyKDE] QString to Python string conversion trouble

Phil Thompson phil at river-bank.demon.co.uk
Tue Oct 23 22:50:44 BST 2001


>===== Original Message From Andreas Gerstlauer <gerstl at ics.uci.edu> =====
>Phil Thompson wrote:
>
>> Damn. The change was required to handle the case where the QString contained
>> Unicode characters > 128 so that it wouldn't raise an exception. Note that the
>> two strings are logically the same - one encodes newline as \n, the other as
>> \012.
>>
>Well, if I think about it, the behavior I would expect from PyQt in
>terms of unicode and non-unicode strings is that if I use the str()
>function I will get back a Python string object and if I use the
>unicode() function a Python unicode object. In the former case, it's
>ok to get an exception if the string contains characters > 128 but
>in the latter case such characters should come out as wanted.

It's not acceptable to get an exception when applying str() to a QString - you 
should get a Python string object with any characters > 128 properly escaped.

>> Strictly speaking the new behaviour is correct - but I don't want to break
>> existing code.  It's wrong to say that QString/Python string conversion should
>> be transparent - they hold different (but similar) types of data.
>>
>Ok, yes. Let's just say then that it would be nice if they are as
>transparent as possible. And newlines are rather common characters...

When I replied originally I was away from my system. When I actually repeat 
your test I don't get the same result. My transcript is...

>>> from qt import *
>>> l = "Test\nTest"
>>> print l
Test
Test
>>> s = QString(l)
>>> print str(s)
Test\nTest

You said you were using Python 2.0 - I was using Python 2.1.
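
For anyone following along on a current interpreter, the effect in the transcript can be reproduced without PyQt. This is only a sketch, and it assumes Python 3 (where str holds Unicode text and bytes is the byte string) and uses the standard "unicode_escape" codec as a stand-in for the escape conversion discussed here. It is not the PyQt code itself.

```python
# Assumption: Python 3; "unicode_escape" stands in for the escape conversion.
text = "Test\nTest"

# Escape-encode: the newline becomes two characters, a backslash and an "n".
escaped = text.encode("unicode_escape")
print(escaped)  # b'Test\\nTest'

# Decoding with the same codec turns the escape back into a real newline.
restored = escaped.decode("unicode_escape")
print(restored == text)  # True
```

The decode step is roughly the 'decode' workaround mentioned in the quoted text below: it restores the newline, at the cost of an extra call wherever the string leaves Qt.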

>> Is the change only cosmetic, or does it break things?
>>
>Well, it does for me. The specific case we are having here is that I
>am reading a QString from a widget, doing some massaging in Python on it
>and then passing it into a DOM object for parsing (the string is XML). The
>DOM object chokes on the "\012" strings in there...
>The only options I would have are to do all massaging on QStrings, not Python
>strings, or to 'decode' the string I get out of PyQt...
>In general, this problem will appear whenever multiline strings are
>coming out of Qt into Python (e.g. reading from a file, a text editor, etc.)
>
>> The only thing I could do is to check the QString and use the old conversion
>> if all the characters are ASCII, and the new one if any character is
>> non-ASCII.
>>
>Hm, but even then, you should only encode the characters > 128 ...
>No, I think the problem/solution lies deeper...
>
>I looked into your routine and your __str__ method currently always returns
>a Python string object where the chars > 128 are "escape" encoded.

That is the correct behaviour.
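
The narrower escaping suggested in the quoted text above (escape only the characters outside ASCII and leave newlines alone) can be sketched on a current interpreter. This assumes Python 3 and its "backslashreplace" error handler, which is a later addition and not available in Python 2.0 or 2.1. The helper name escape_non_ascii is hypothetical and is not part of PyQt.

```python
def escape_non_ascii(text):
    """Hypothetical helper: escape only characters outside the ASCII range."""
    # ASCII characters, including "\n", are encoded unchanged; anything else
    # is replaced by a backslash escape instead of raising an exception.
    return text.encode("ascii", "backslashreplace")


sample = "Test\nTest \u00e9"
result = escape_non_ascii(sample)
print(result)  # b'Test\nTest \\xe9'
```

With this approach a pure-ASCII multiline string comes out byte-for-byte unchanged, so no separate "is it all ASCII?" check is needed.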

>If I use the unicode() function on a QString this way, is Python smart
>enough to decode those escape strings again and return me a Python unicode
>object with actual Unicode characters (and not the escape string) in it?
>Not in the case of the default ASCII encoding used by unicode() at least...

unicode() does not use the default encoding.

>Therefore, currently, even unicode() won't return a correct unicode
>representation of the QString. So the current approach doesn't really work
>anyway (other than getting rid of the exception but the result is still
>not what wanted).
>
>However, Python, on the other hand, is smart enough to be
>able to convert a unicode object to a string object if the __str__ method
>returns a Python unicode (if the str() method is applied; naturally
>raising an exception if there are >128 chars in there but that is ok).

As I said - that is not Ok. Also, __str__ has to return a string object. In 
Python 2.2 __unicode__ returns a Unicode object.
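
The same split, one hook per result type, still exists in Python 3 under different names: __str__ returns Unicode text and __bytes__ returns a byte string. A minimal sketch of the idea follows. The Text class is a hypothetical stand-in for a QString-like wrapper, not PyQt code, and the choice of escaping in __bytes__ is an assumption made for illustration.

```python
class Text:
    """Hypothetical QString-like wrapper (illustration only, not PyQt)."""

    def __init__(self, value):
        self._value = value

    def __str__(self):
        # Text hook: return the characters unchanged.
        return self._value

    def __bytes__(self):
        # Byte-string hook: never raises; non-ASCII characters are escaped.
        return self._value.encode("ascii", "backslashreplace")


t = Text("na\u00efve\n")
as_text = str(t)
as_bytes = bytes(t)
print(as_bytes)  # b'na\\xefve\n'
```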

>That way, if you apply unicode() you get the real unicode and if you apply
>str() you get the string without any escape encoding.
>
>I tried that here with a test class:
>   class Test:
>       def __str__(self):
>           return u"Test\nTest"
>           #return u"Test\0400Test"
>and it works like a charm, i.e. when applying unicode() to Test instances
>I get the actual unicode representation and when applying str() I get
>the ASCII encoded string (with an exception in the case of chars > 128).
>
>I would expect (and like) unicode() and str() applied to the QString class
>to behave like this!
>
>I played with your code and started by removing the
>  PyUnicode_AsUnicodeEscapeString()
>call in __str__ of qstring.sip and returning the unicode object directly
>instead of the escape encoded string. I tried that, and it basically works.
>
>But then I ran into the problem that in qt.py the shadow class for QString
>does a str() call in its __str__ method. That will invoke Python's
>unicode-to-string automatism and hence result in an exception in the
>unicode case.
>
>Now I am at the end of my knowledge of Python's C API and sip. I couldn't
>figure out how to simply pass the unicode object returned by the QString
>tp_str function code in libqtcmodule.so through the __str__ method of
>the shadow QString in qt.py (such that __str__ returns the unicode object
>and doesn't try to convert it to a string)... Any ideas?

You are basically asking me to restore the previous behaviour - which 
Boudewijn and Marc-Andre (eventually) convinced me was incorrect.

>I think if the last part can be solved that would be the solution to the
>whole problem.

The first thing to try is upgrading your version of Python and we can take it 
from there.

Phil