[PyQt] Inconsistent pylupdate5 behaviour on UTF8 data

Phil Thompson phil at riverbankcomputing.com
Mon Mar 2 16:19:04 GMT 2020


Giuseppe,

The fundamental problem is that in moving the code from Qt4 to Qt5 I did 
the least work I could get away with rather than doing the job properly.

Can you confirm that you have a workaround for this?

I'd rather not try and fix it at this stage (given Qt6 is sooner rather 
than later). For PyQt6 I'd rather replace the whole lot with a pure 
Python implementation that compiles and inspects the Python byte code.
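
As a rough illustration of that idea (not PyQt code, only a minimal sketch),
the snippet below compiles a source file and scans the resulting byte code
for a tr()/trUtf8() attribute load immediately followed by a string
constant. The helper names find_tr_strings and iter_code_objects are
hypothetical, and the matching is deliberately naive: it only catches calls
whose first argument is a plain string literal. The relevant point for this
thread is that compile() decodes the source bytes itself (UTF-8 by default),
so the extracted strings are already proper Python str objects.

```python
import dis
import types

# Method names taken from this thread; anything else is out of scope here.
TR_NAMES = {"tr", "trUtf8"}


def iter_code_objects(code):
    """Yield a code object and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def find_tr_strings(source_bytes, filename="<source>"):
    """Return string literals passed directly to .tr() / .trUtf8().

    Assumption: the attribute load is immediately followed by LOAD_CONST,
    which holds for simple calls such as self.tr("text").
    """
    # compile() honours the source encoding declaration (UTF-8 by default).
    top_level = compile(source_bytes, filename, "exec")
    found = []
    for code in iter_code_objects(top_level):
        instructions = list(dis.get_instructions(code))
        for current, following in zip(instructions, instructions[1:]):
            if (
                current.opname in ("LOAD_ATTR", "LOAD_METHOD")
                and current.argval in TR_NAMES
                and following.opname == "LOAD_CONST"
                and isinstance(following.argval, str)
            ):
                found.append(following.argval)
    return found
```

Opcode names differ between Python versions, which is why both LOAD_ATTR and
LOAD_METHOD are checked; a real tool would need to handle far more cases.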

Thanks,
Phil

On 19/02/2020 09:06, Giuseppe Corbelli wrote:
> On 2/18/20 5:58 PM, Phil Thompson wrote:
>> What if you use trUtf8() instead of tr()?
> 
> I explored all the combinations I could think of on Windows 10, PyQt
> 5.14.1 from pip and Linguist 5.13.2, and I could NOT find any working
> combination. Below I am attaching the test results. Rather lengthy and
> boring, I fear.
> 
> If a gist is preferable:
> https://gist.github.com/cowo78/26057f575ddfa3ee20a0b636acd894ff
> 
> 
> Section A - using trUtf8() in code
> ===============================================================================
> Using trUtf8 I ALWAYS get a 'Non-ASCII character detected in trUtf8
> string' warning
> 
> Case 1 - NOT working
> -------------------------------------------------------------------------------
> trUtf8()
> # CODECFORSRC = UTF-8
> # CODECFORTR = UTF-8
> 
> Message created:
> <message encoding="UTF-8">
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: ç°§</source>
>     <translation type="unfinished"></translation>
> </message>
> 
> Repeated pylupdate5 runs are OK, the same message is consistently 
> generated.
> 
> Processed by linguist 5.13.2:
> <message>
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: ç°§</source>
>     <translation>UTF8</translation>
> </message>
> 
> Reprocessed by pylupdate5
> <message>
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: &#xe7;&#xb0;&#xa7;</source>
>     <translation type="obsolete">UTF8</translation>
> </message>
> <message encoding="UTF-8">
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: ç°§</source>
>     <translation type="unfinished"></translation>
> </message>
> 
> 
> Case 2 - NOT working
> -------------------------------------------------------------------------------
> trUtf8()
> CODECFORSRC = UTF-8
> # CODECFORTR = UTF-8
> 
> Message created the FIRST time and subsequent ODD runs
> <message encoding="UTF-8">
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: 簧</source>
>     <translation type="unfinished"></translation>
> </message>
> 
> Message created the SECOND time and subsequent EVEN runs
> <message encoding="UTF-8">
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: ç°§</source>
>     <translation type="unfinished"></translation>
> </message>
> 
> 
> Case 3 - NOT working
> -------------------------------------------------------------------------------
> trUtf8()
> # CODECFORSRC = UTF-8
> CODECFORTR = UTF-8
> 
> Message created:
> <message encoding="UTF-8">
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: ç°§</source>
>     <translation type="unfinished"></translation>
> </message>
> 
> Repeated pylupdate5 runs are OK, the same message is consistently 
> generated.
> 
> Processed by linguist 5.13.2:
> <message>
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: ç°§</source>
>     <translation>utf8</translation>
> </message>
> 
> Reprocessed by pylupdate5
> <message>
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: &#xe7;&#xb0;&#xa7;</source>
>     <translation type="obsolete">utf8</translation>
> </message>
> <message encoding="UTF-8">
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: ç°§</source>
>     <translation type="unfinished"></translation>
> </message>
> 
> 
> Case 4 - NOT working
> -------------------------------------------------------------------------------
> trUtf8()
> CODECFORSRC = UTF-8
> CODECFORTR = UTF-8
> 
> Message created:
> <message encoding="UTF-8">
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: ç°§</source>
>     <translation type="unfinished"></translation>
> </message>
> 
> Repeated pylupdate5 runs are OK, the same message is consistently 
> generated.
> 
> Processed by linguist 5.13.2:
> <message>
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: ç°§</source>
>     <translation>utf8</translation>
> </message>
> 
> Reprocessed by pylupdate5:
> <message>
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: &#xe7;&#xb0;&#xa7;</source>
>     <translation type="obsolete">utf8</translation>
> </message>
> <message encoding="UTF-8">
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: ç°§</source>
>     <translation type="unfinished"></translation>
> </message>
> 
> 
> Section B - using tr() in code
> ===============================================================================
> Case 1 - NOT working
> -------------------------------------------------------------------------------
> tr()
> # CODECFORSRC = UTF-8
> # CODECFORTR = UTF-8
> 
> Message created:
> <message>
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;</source>
>     <translation type="unfinished"></translation>
> </message>
> 
> Repeated runs OK.
> 
> Linguist shows WRONG characters as the source is incorrectly formatted.
> 
> 
> Case 2 - NOT working
> -------------------------------------------------------------------------------
> tr()
> CODECFORSRC = UTF-8
> # CODECFORTR = UTF-8
> 
> Message created the FIRST time and subsequent ODD runs
> <message>
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: &#xe7;&#xb0;&#xa7;</source>
>     <translation type="unfinished"></translation>
> </message>
> 
> Message created the SECOND time and subsequent EVEN runs
> <message>
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;</source>
>     <translation type="unfinished"></translation>
> </message>
> 
> 
> Case 3 - NOT working
> -------------------------------------------------------------------------------
> tr()
> # CODECFORSRC = UTF-8
> CODECFORTR = UTF-8
> 
> Message created:
> <message>
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;</source>
>     <translation type="unfinished"></translation>
> </message>
> 
> Linguist shows WRONG characters as the source is incorrectly formatted.
> 
> 
> Case 4 - NOT working
> -------------------------------------------------------------------------------
> tr()
> CODECFORSRC = UTF-8
> CODECFORTR = UTF-8
> 
> Message created:
> <message encoding="UTF-8">
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: ç°§</source>
>     <translation type="unfinished"></translation>
> </message>
> 
> Repeated pylupdate5 runs are OK, the same message is consistently 
> generated.
> 
> Processed by linguist 5.13.2:
> <message>
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: ç°§</source>
>     <translation>utf8</translation>
> </message>
> 
> Reprocessed by pylupdate5:
> <message>
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: &#xe7;&#xb0;&#xa7;</source>
>     <translation>utf8</translation>
> </message>
> 
> Reprocessed by pylupdate5 on subsequent runs:
> <message>
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: &#xe7;&#xb0;&#xa7;</source>
>     <translation type="obsolete">utf8</translation>
> </message>
> <message>
>     <location filename="../translations_for_testsuite.py" line="6"/>
>     <source>this needs UTF8 encoding: &#xc3;&#xa7;&#xc2;&#xb0;&#xc2;&#xa7;</source>
>     <translation type="unfinished"></translation>
> </message>
> 
> 
> Those who survived until here must be brave.
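
For readers comparing the <source> variants in the results above, here is a
small Python sketch of how they seem to relate to each other. This is only
an interpretation of the quoted output, not a confirmed diagnosis of
pylupdate5: it assumes the intended character is the one shown in Section A,
Case 2 (U+7C27) and that the other forms come from treating its UTF-8 bytes
as Latin-1 once or twice. The helper as_char_refs is hypothetical.

```python
# Intended character, as it appears in Section A, Case 2 (first run).
original = "\u7c27"

# Its UTF-8 encoding is the three bytes E7 B0 A7.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as Latin-1 gives the three-character mojibake form.
mojibake = utf8_bytes.decode("latin-1")

# Encoding the mojibake as UTF-8 again gives six bytes: C3 A7 C2 B0 C2 A7.
double_encoded = mojibake.encode("utf-8")


def as_char_refs(data):
    """Format each byte as a numeric character reference, as in the .ts output."""
    return "".join("&#x%x;" % byte for byte in data)


print(as_char_refs(utf8_bytes))      # single-encoded form
print(as_char_refs(double_encoded))  # double-encoded form

# The single-level mojibake is reversible by undoing the wrong decode.
repaired = mojibake.encode("latin-1").decode("utf-8")
```

Under that assumption, the output alternating between runs would mean the
same string is being decoded with a different codec on each pass.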


