[PyKDE] QString to Python string conversion trouble

Andreas Gerstlauer gerstl at ics.uci.edu
Tue Oct 23 20:45:36 BST 2001


Phil Thompson wrote:

> Damn. The change was required to handle the case where the QString contained 
> Unicode characters > 128 so that it wouldn't raise an exception. Note that the
> two strings are logically the same - one encodes newline as \n, the other as 
> \012.
>
Well, if I think about it, the behavior I would expect from PyQt in
terms of unicode and non-unicode strings is that if I use the str()
function I get back a Python string object, and if I use the
unicode() function I get a Python unicode object. In the former case
it's ok to get an exception if the string contains characters > 127,
but in the latter case such characters should come out as wanted.

> Strictly speaking the new behaviour is correct - but I don't want to break 
> existing code.  It's wrong to say that QString/Python string conversion should 
> be transparent - they hold different (but similar) types of data.
>
Ok, yes. Let's just say then that it would be nice if they were as
transparent as possible. And newlines are rather common characters...

> Is the change only cosmetic, or does it break things?
> 
Well, it does for me. The specific case we are having here is that I
am reading a QString from a widget, doing some massaging on it in Python,
and then passing it into a DOM object for parsing (the string is XML). The
DOM object chokes on the "\012" sequences in there...
The only options I would have are to do all the massaging on QStrings, not
Python strings, or to 'decode' the string I get out of PyQt...
In general, this problem will appear whenever multiline strings are
coming out of Qt into Python (e.g. reading from a file, a text editor, etc.)
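
To make the 'decode' workaround concrete, here is a minimal sketch. It is an
illustration under assumptions, not PyQt code: it uses a current Python 3
interpreter (where str is already Unicode) and the standard "unicode_escape"
codec as a stand-in for what PyUnicode_AsUnicodeEscapeString() produces. The
helper names escape_encode and escape_decode are made up for this example.
Current Python writes a newline as \n in the escaped form, while the build
discussed here wrote \012; the decoder accepts both.

```python
def escape_encode(text):
    # Assumption: the "unicode_escape" codec stands in for the escape
    # encoding applied in __str__. Newlines and non-ASCII characters
    # become backslash sequences inside a byte string.
    return text.encode("unicode_escape")


def escape_decode(raw):
    # Turn the backslash sequences back into the characters they stand for.
    return raw.decode("unicode_escape")


xml = "<a>\n<b/>\n</a>"
escaped = escape_encode(xml)

# The escaped form no longer contains a real newline, only backslash
# sequences, which is what an XML parser would trip over.
print(b"\n" in escaped)               # False
print(escape_decode(escaped) == xml)  # True

# The octal spelling seen in this thread decodes to the same newline.
print(escape_decode(b"Test\\012Test") == "Test\nTest")  # True
```

This only papers over the problem on the Python side; every caller would
have to remember to decode.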

> The only thing I could do is to check the QString and use the old conversion 
> if all the characters are ASCII, and the new one if any character is 
> non-ASCII.
> 
Hm, but even then, you should only encode the characters > 127 ...
No, I think the problem/solution lies deeper...

I looked into your routine, and your __str__ method currently always returns
a Python string object in which the chars > 127 are "escape" encoded.
If I use the unicode() function on a QString this way, is Python smart
enough to decode those escape strings again and return me a Python unicode
object with actual Unicode characters (and not the escape string) in it?
Not in the case of the default ASCII encoding used by unicode() at least...

Therefore, currently, even unicode() won't return a correct unicode
representation of the QString. So the current approach doesn't really work
anyway (it gets rid of the exception, but the result is still not what
is wanted).
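
A small sketch of why a plain ASCII decode does not bring the characters
back. The assumptions: Python 3, with the standard "unicode_escape" codec
standing in for the escape encoding done in __str__, and bytes.decode("ascii")
playing the role of unicode() with its default ASCII encoding.

```python
text = "caf\u00e9\n"                     # one non-ASCII character plus a newline

# Stand-in for the escape-encoded string returned by __str__.
escaped = text.encode("unicode_escape")  # b'caf\\xe9\\n'

# Decoding as ASCII succeeds, because the escaped form is pure ASCII,
# but the backslash sequences are kept as literal text.
as_ascii = escaped.decode("ascii")

print(as_ascii == text)                           # False
print("\\" in as_ascii)                           # True
print(escaped.decode("unicode_escape") == text)   # True
```

Only a decoder that knows about the escape syntax recovers the original
text; the default conversion has no reason to apply one.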

Python, on the other hand, is smart enough to convert a unicode object
to a string object if the __str__ method returns a Python unicode (when
str() is applied; naturally this raises an exception if there are any
chars > 127 in there, but that is ok).
That way, if you apply unicode() you get the real unicode and if you apply
str() you get the string without any escape encoding.

I tried that here with a test class:
   class Test:
       def __str__(self):
           return u"Test\nTest"
           #return u"Test\u0400Test"
and it works like a charm, i.e. when applying unicode() to Test instances
I get the actual unicode representation and when applying str() I get
the ASCII encoded string (with an exception in the case of chars > 127).

I would expect (and like) unicode() and str() applied to a QString to
behave like this!
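
A rough sketch of that wished-for behaviour, transposed to Python 3, where
__str__ must return text and the Python 2 str() step corresponds roughly to
an explicit, strict ASCII encode. The class name Text and the method
to_ascii are hypothetical and not part of PyQt.

```python
class Text:
    """Hypothetical stand-in for a QString-like wrapper (not PyQt code)."""

    def __init__(self, value):
        self._value = value

    def __str__(self):
        # Return the Unicode text itself, with no escape encoding applied.
        return self._value

    def to_ascii(self):
        # Rough analogue of Python 2 str(): a strict ASCII conversion that
        # raises UnicodeEncodeError on any non-ASCII character.
        return str(self).encode("ascii")


plain = Text("Test\nTest")
print(str(plain) == "Test\nTest")        # True, the newline is a real newline
print(plain.to_ascii() == b"Test\nTest")  # True

try:
    Text("Test\u0400Test").to_ascii()
except UnicodeEncodeError:
    print("non-ASCII text cannot be converted to ASCII")
```

The exception is confined to the lossy conversion; the Unicode path never
has to go through an escaped intermediate form.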

I played with your code and started by removing the
  PyUnicode_AsUnicodeEscapeString()
call in __str__ of qstring.sip and returning the unicode object directly
instead of the escape encoded string. That basically works.

But then I ran into the problem that in qt.py the shadow class for QString
does a str() call in its __str__ method. That invokes Python's automatic
unicode-to-string conversion and hence results in an exception in the
unicode case.

Now I am at the end of my knowledge of Python's C API and sip. I couldn't
figure out how to simply pass the unicode object returned by the QString
tp_str function code in libqtcmodule.so through the __str__ method of
the shadow QString in qt.py (such that __str__ returns the unicode object
and doesn't try to convert it to a string)... Any ideas?

I think if the last part can be solved that would be the solution to the
whole problem.

Andreas





