[PyQt] PyQt cannot transform QString into str when reading emoji symbol from QClipboard

Ilya Kulakov kulakov.ilya at gmail.com
Thu Jan 22 12:13:49 GMT 2015


I'm testing the following symbol: 😷

I wrote a simple Objective-C application to check how the native frameworks encode this symbol in UTF-8. Here is the code:

    NSString *str = [[NSPasteboard generalPasteboard] stringForType:@"public.utf8-plain-text"];
    const char *cstr = str.UTF8String;
    size_t i = 0;
    while (cstr[i] != 0)
    {
        NSLog(@"0x%x", cstr[i]);
        ++i;
    }

Then I wrote a simple Qt app to check that the returned QString contains the same bytes:

    QClipboard *clipboard = QApplication::clipboard();
    QString originalText = clipboard->text();
    QByteArray bytes = originalText.toUtf8();
    for (size_t i = 0; i < bytes.count(); ++i)
        qDebug("0x%x", bytes.at(i));

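Both apps print the following (the leading 0xffffff on each value is sign extension from passing a signed char to %x; the underlying bytes are F0 9F 98 B7, the UTF-8 encoding of U+1F637):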

    0xfffffff0
    0xffffff9f
    0xffffff98
    0xffffffb7
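To see where '\ud83d' can come from, a short Python 3 sketch prints the same symbol's UTF-8 bytes without sign extension, next to its UTF-16 code units. QString stores text as UTF-16, so a character outside the Basic Multilingual Plane such as U+1F637 occupies two code units: a high surrogate 0xD83D and a low surrogate 0xDE37. The helper surrogate_pair is hypothetical, written only to show the arithmetic.

```python
# The symbol under test: U+1F637 FACE WITH MEDICAL MASK.
emoji = "\U0001F637"

# UTF-8 bytes, formatted as unsigned values (no 0xffffff prefix).
utf8_bytes = emoji.encode("utf-8")
print(" ".join(f"0x{b:02x}" for b in utf8_bytes))  # 0xf0 0x9f 0x98 0xb7

# UTF-16 code units, the representation QString uses internally.
utf16 = emoji.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
print(" ".join(f"0x{u:04x}" for u in units))  # 0xd83d 0xde37


def surrogate_pair(code_point):
    """Hypothetical helper: split a non-BMP code point into UTF-16 surrogates."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return high, low


print([hex(u) for u in surrogate_pair(ord(emoji))])  # ['0xd83d', '0xde37']
```

So '\ud83d' is not present in the UTF-8 data, but it is the first of the two UTF-16 code units that a QString holds for this symbol.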

However, when I read the text with PyQt (Python 3):

    QApplication.clipboard().text()

the returned value is a single str that cannot be encoded to UTF-8 because of the surrogate '\ud83d' at position 0. Yet, as the byte dumps above show, no such character appears in the clipboard data.
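The symptom can be reproduced in plain Python without Qt. This sketch assumes, without confirmation from the report, that the conversion handed Python each UTF-16 code unit as a separate code point instead of combining the pair into U+1F637; the variable names are illustrative only.

```python
# Hypothetical reconstruction of the str returned by the PyQt call:
# the two UTF-16 code units of U+1F637 kept as two separate code points.
broken = "\ud83d\ude37"

error = None
try:
    broken.encode("utf-8")
except UnicodeEncodeError as exc:
    # The utf-8 codec refuses to encode lone surrogate code points.
    error = exc

print(len(broken))                # 2, although only one symbol was copied
print(error.reason, error.start)  # surrogates not allowed 0
```

The error position matches the one in the report: the high surrogate at index 0 is the first character the encoder rejects.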

That raises two questions:
1. How was this surrogate introduced?
2. How should an application handle it?
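One application-side workaround, offered as a sketch and not as PyQt's own fix, is to re-encode the str as UTF-16 with the surrogatepass error handler and decode it again: Python's UTF-16 decoder merges each valid high/low pair into a single code point. The function name join_surrogates is hypothetical, and replacing unpaired surrogates with U+FFFD is a design choice you may want to change.

```python
def join_surrogates(text: str) -> str:
    """Hypothetical helper: merge UTF-16 surrogate pairs left in a str."""
    # Encode every code point as UTF-16; 'surrogatepass' lets surrogate code
    # points through as raw code units instead of raising an error.
    raw = text.encode("utf-16-le", "surrogatepass")
    # Decoding combines valid high/low pairs into one code point; any unpaired
    # surrogate is replaced with U+FFFD by the 'replace' error handler.
    return raw.decode("utf-16-le", "replace")


fixed = join_surrogates("\ud83d\ude37")
print(hex(ord(fixed)))        # 0x1f637
print(fixed.encode("utf-8"))  # b'\xf0\x9f\x98\xb7'
```

In the application this would wrap the clipboard read, e.g. join_surrogates(QApplication.clipboard().text()). Text that is already well formed passes through unchanged, so the call is safe to apply unconditionally, but it only patches the symptom on the application side and does not change PyQt's conversion.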

The original bug report came from one of our Windows users, but we could not reproduce the problem there. It is, however, easy to reproduce on Mac OS X.

Best Regards,
Ilya Kulakov


