[PyQt] QString to str auto-conversion issue with SMP codepoints

Shriramana Sharma samjnaa at gmail.com
Fri Jan 3 04:30:30 GMT 2014


Hello. Please see the attached test code. I have PyQt4 4.10.3 on Qt4
4.8.4 on my Kubuntu Saucy system.

I need to be able to handle SMP codepoints in my application.

For instance
𑀅 U+11005 BRAHMI LETTER A
(I hope the SMP character arrives at your inbox correctly!)

Inserting the above character into the QPlainTextEdit and clicking Render prints:

len of string: 2
repr of string: '\ud804\udc05'

IIUC, with Py 3 (or is it after a certain version of PyQt, I forget),
PyQt automatically converts Qt's QStrings to Python strings. QStrings
use UTF-16 and hence any SMP characters input into textboxes and such
are converted to the equivalent surrogate pair.
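
To make the arithmetic concrete, here is a minimal sketch of how UTF-16 splits a supplementary-plane codepoint into a surrogate pair. The helper name to_surrogate_pair is made up for illustration and is not part of PyQt or Qt; it only reproduces the standard UTF-16 encoding rule, which accounts for the two values in the repr above.

```python
def to_surrogate_pair(codepoint):
    """Split a supplementary-plane codepoint into its UTF-16 surrogates.

    Hypothetical helper for illustration only; not a PyQt or Qt function.
    """
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("not a supplementary-plane codepoint")
    offset = codepoint - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)        # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x11005)    # BRAHMI LETTER A
print(hex(high), hex(low))                # prints: 0xd804 0xdc05
```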

However, the surrogates are in themselves useless except as a pair to
represent the trans-BMP codepoint. So I'd like to see PyQt4
auto-handle this when converting QStrings to Python strings, especially
by converting each surrogate pair into the actual usable Unicode codepoint.

That is, I'd like to get a length 1 string '\U00011005' and not a
length 2 string '\ud804\udc05'.

Or if it would be performance-wise costly for PyQt to auto-convert
every QString this way, I'd like to know what is an easy way to
process the resultant string to get my desired form.
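
One possible post-processing step, offered only as a sketch and not as anything PyQt itself does: round-trip the string through UTF-16 with the "surrogatepass" error handler, so that the decoder recombines each pair. The function name merge_surrogates is hypothetical. This assumes Python 3.4 or later, where "surrogatepass" is accepted by the UTF-16 codecs, and it assumes the string contains only well-formed pairs; a lone surrogate would make the decode step raise UnicodeDecodeError.

```python
def merge_surrogates(text):
    """Recombine UTF-16 surrogate pairs in a str into single codepoints.

    Hypothetical helper; assumes Python 3.4+ and well-formed pairs.
    """
    # "surrogatepass" lets the encoder emit the surrogates as raw UTF-16
    # code units instead of raising UnicodeEncodeError.
    raw = text.encode("utf-16-le", "surrogatepass")
    # A normal decode then reads each high/low pair as one codepoint.
    return raw.decode("utf-16-le")


fixed = merge_surrogates('\ud804\udc05')
print(len(fixed), ascii(fixed))           # prints: 1 '\U00011005'
```

Strings without surrogates pass through unchanged, so the helper could be applied to every value read from a widget, at the cost of one encode and one decode per string.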

Thanks!

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.py3
Type: application/octet-stream
Size: 950 bytes
Desc: not available
URL: <http://www.riverbankcomputing.com/pipermail/pyqt/attachments/20140103/7fe83615/attachment.obj>


More information about the PyQt mailing list