[PyQt] Inconsistent pylupdate5 behaviour on UTF8 data

Giuseppe Corbelli corbelligiuseppe at mesdan.it
Tue Feb 18 10:37:43 GMT 2020


On 2/16/20 2:01 PM, Phil Thompson wrote:
> On 12/02/2020 15:27, Giuseppe Corbelli wrote:
>> Hi all
>> I found a puzzling pylupdate5 behaviour inconsistency between Linux
>> and Windows versions.
>> Scenario: I am extracting translatable strings from python modules.
>> The files are saved as UTF8, I run pylupdate and get different
>> representations in the XML output.
>>
>> pylupdate5 v5.14.1 as Debian package on Linux and fresh pip install in
>> a venv on Windows 10.
>>
>> As you can find in the attached test data:
>>
>> - on windows the 'ç' character (U+00E7, UTF-8 bytes c3 a7, LATIN SMALL
>> LETTER C WITH CEDILLA) is converted to <source>this needs UTF8 encoding:
>> &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;</source>
>>
>> - on linux the same 'ç' correctly converts to <source>this needs UTF8
>> encoding: &#xe7;&#xb0;&#xa7;</source>
>>
>> So it seems that on windows each byte of the utf8 string is replaced
>> with its unicode point in xml numeric character format, while on linux
>> the same applies (correctly) to the character itself (formed by two
>> bytes in UTF8).
>>
>> Am I doing something wrong?
> 
> I can't reproduce this - I get identical results on Windows, Linux and 
> macOS.
> 
> If you want to try and debug your own installation then look at 
> evilBytes() in qpy\pylupdate\metatranslator.cpp

It turns out the problem seems to be in re-parsing the existing XML (or
maybe in something else that escapes me). The dataset is the same as in
my previous email.

This is what happens when you run pylupdate5 (5.14.1) twice in a row on
a Windows 10 box:

(venv_latest) C:\devel\Dynamometer\Supervisor\norms>pylupdate5 -verbose test.pro
Updating 'locale/it_IT.ts'...
     Found 2 source texts (2 new and 0 already existing)

(venv_latest) C:\devel\Dynamometer\Supervisor\norms>pylupdate5 -verbose test.pro
Updating 'locale/it_IT.ts'...
     Found 2 source texts (1 new and 1 already existing)
     Kept 0 obsolete translations
     Removed 1 obsolete untranslated entry

On the second run the UTF8 entry gets mangled.
Everything is fine on Linux with the same pylupdate version.
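
For reference, the two <source> forms from my first email can be
reproduced in a few lines of Python. This is only a sketch of the
symptom, not of what pylupdate5 does internally: to_char_refs is a
hypothetical helper, and the Latin-1 step is an assumption that simply
happens to match the Windows output byte for byte.

```python
def to_char_refs(text):
    """Escape every non-ASCII character as an XML numeric character reference."""
    return "".join(
        c if ord(c) < 128 else "&#x{:x};".format(ord(c)) for c in text
    )


# The three non-ASCII characters from the test string: U+00E7, U+00B0, U+00A7.
source = "\u00e7\u00b0\u00a7"

# Escaping the decoded characters gives the form seen on Linux.
correct = to_char_refs(source)

# Treating each UTF-8 byte as a character of its own (assumed here to be
# equivalent to a Latin-1 decode of the UTF-8 bytes) gives the Windows form.
mangled = to_char_refs(source.encode("utf-8").decode("latin-1"))

print(correct)  # &#xe7;&#xb0;&#xa7;
print(mangled)  # &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;
```

So the mangled entry looks like UTF-8 bytes being interpreted one byte
per character somewhere along the way; where exactly that happens is
still to be found.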

Digging some more...

-- 
Giuseppe Corbelli

