[PyQt] QString to str auto-conversion issue with SMP codepoints

Phil Thompson phil at riverbankcomputing.com
Sat Jan 4 17:28:20 GMT 2014


On 03-01-2014 4:30 am, Shriramana Sharma wrote:
> Hello. Please see the attached test code. I have PyQt4 4.10.3 on Qt4
> 4.8.4 on my Kubuntu Saucy system.
>
> I need to be able to handle SMP codepoints in my application.
>
> For instance
> 𑀅 U+11005 BRAHMI LETTER A
> (I hope the SMP character arrives at your inbox correctly!)
>
> Inserting the above character into the QPlainTextEdit and clicking
> Render prints:
>
> len of string: 2
> repr of string: '\ud804\udc05'
>
> IIUC, with Py 3 (or is it after a certain version of PyQt, I forget),
> PyQt automatically converts Qt's QStrings to Python strings. QStrings
> use UTF-16 and hence any SMP characters input into textboxes and such
> are converted to the equivalent surrogate pair.
>
> However, the surrogates are in themselves useless except as a pair to
> represent the trans-BMP codepoint. So I'd like to see PyQt4
> auto-handle this when converting QStrings to actual usable Unicode
> codepoints, especially to convert surrogate
>
> That is, I'd like to get a length 1 string '\U00011005' and not a
> length 2 string '\ud804\udc05'.
>
> Or if it would be performance-wise costly for PyQt to auto-convert
> every QString this way, I'd like to know what is an easy way to
> process the resultant string to get my desired form.
>
> Thanks!

Should be fixed in tonight's PyQt4 and PyQt5 snapshots.
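
For anyone stuck on a build that predates the fix, the "easy way to
process the resultant string" asked about above can be sketched in plain
Python. This is only a workaround sketch, not what the PyQt fix itself
does: combine_surrogates is a hypothetical helper name, and it assumes
Python 3.4 or later, where the 'surrogatepass' error handler is accepted
by the UTF-16 codecs.

```python
def combine_surrogates(text):
    """Merge UTF-16 surrogate pairs in a str into single code points.

    Hypothetical helper, not part of PyQt.
    """
    # Write every code point out as UTF-16 code units; 'surrogatepass'
    # lets the surrogates through as plain 16-bit units.
    raw = text.encode('utf-16-le', 'surrogatepass')
    # Decoding recombines adjacent high/low pairs into one code point.
    # 'surrogatepass' keeps any unpaired surrogate instead of raising.
    return raw.decode('utf-16-le', 'surrogatepass')


fixed = combine_surrogates('\ud804\udc05')
print(len(fixed), ascii(fixed))  # prints: 1 '\U00011005'
```

Strings that contain no surrogates come back unchanged, so the helper
can be applied to any text read from a widget.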

Thanks,
Phil

