[PyQt] Fwd: QString to str auto-conversion issue with SMP codepoints

Shriramana Sharma samjnaa at gmail.com
Mon Jan 6 16:52:20 GMT 2014


Thanks Phil for your answers.
I meant to send the following to the list. It may be useful for others.

---------- Forwarded message ----------
From: Shriramana Sharma
Date: Mon, Jan 6, 2014 at 7:08 PM
...

BTW, to get rid of the surrogate pairs in the converted string, I
concocted the following as a temporary fix:

import re

def fixSurrogatePresence(s):
    '''Returns the input string with UTF-16 surrogate pairs replaced
    by the character they represent'''
    # ideas from:
    # http://www.unicode.org/faq/utf_bom.html#utf16-4
    # http://stackoverflow.com/a/6928284/1503120
    def joinSurrogates(match):
        # Folds the 0xD800 and 0xDC00 bases and the 0x10000 offset of the
        # supplementary planes into a single addend (see the Unicode FAQ).
        SURROGATE_OFFSET = 0x10000 - (0xD800 << 10) - 0xDC00
        return chr((ord(match.group(1)) << 10) +
                   ord(match.group(2)) + SURROGATE_OFFSET)
    # Match a high surrogate immediately followed by a low surrogate.
    return re.sub('([\uD800-\uDBFF])([\uDC00-\uDFFF])', joinSurrogates, s)
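
To make the arithmetic above concrete, here is a small worked example for
U+1D11E (MUSICAL SYMBOL G CLEF), whose UTF-16 form is the pair D834 DD1E.
The names join_pair and join_surrogates_via_codec are made up for this
illustration. The second function is only a sketch of an alternative: it
swaps the regex for a round trip through the UTF-16 codec, and it assumes
Python 3.4 or later, where the 'surrogatepass' error handler is available
for UTF-16.

```python
# Same constant as in the function above.
SURROGATE_OFFSET = 0x10000 - (0xD800 << 10) - 0xDC00

def join_pair(high, low):
    """Combine a high and a low surrogate code unit into one code point."""
    return (high << 10) + low + SURROGATE_OFFSET

# (0xD834 << 10) + 0xDD1E + SURROGATE_OFFSET == 0x1D11E
code_point = join_pair(0xD834, 0xDD1E)

def join_surrogates_via_codec(s):
    """Alternative sketch (assumes Python 3.4+): let the UTF-16 codec
    pair up the surrogates instead of using a regex."""
    # 'surrogatepass' lets the surrogate code points be encoded as plain
    # 16-bit code units; decoding then joins each valid pair.
    return s.encode('utf-16', 'surrogatepass').decode('utf-16', 'surrogatepass')

broken = 'a\ud834\udd1eb'   # str holding the two surrogates separately
fixed = join_surrogates_via_codec(broken)
```

After this runs, fixed holds three characters, the middle one being U+1D11E.
The codec route avoids hand-written arithmetic, but I have not compared it
against the regex version for speed.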

-- 
Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा

