[PyQt] UnicodeDecodeError with output from Windows OS command

Thu Nov 30 22:59:20 GMT 2017

(I assume you accidentally removed the list from the reply, so I re-added it)

On Thu, Nov 30, 2017 at 10:28:58PM +0000, J Barchan wrote:
> Did I mention that I have learnt now what output is causing the error?  It
> turns out it's if robocopy --- which simply reports filenames as it goes
> --- encounters a filename with a "£" (UK pound sterling) character in its
> name.  It would happen just as much if the command were, say, "dir" instead
> of "robocopy".  That pound character is a single byte of 0x9c in the
> output, which decode('utf-8') is barfing on.

Right, because it is not output encoded as utf-8, so utf-8 is the wrong
encoding to decode it with ;-)
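
To make that concrete, here is a minimal sketch using the byte you reported.
The choice of cp850 is only my assumption about which OEM code page your
console is using; 0x9c happens to be the pound sign there.

```python
raw = b"\x9c"  # the single byte reported for the pound sign

# A lone 0x9c is a UTF-8 continuation byte, so strict UTF-8 decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Assumption: the console uses the cp850 OEM code page, where 0x9c is
# the pound sign (U+00A3).
pound = raw.decode("cp850")
```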

> In extensive investigations, all I came across was
> https://riverbankcomputing.com/pipermail/pyqt/2010-January/025564.html:
> 
> > >> > really often I have this kind of code in my application when it comes
> > >> > to converting a QByteArray to a string under Python 3.1.
> > >> >
> > >> > s = bytes(QByteArray).decode()
> > >> >
> > >> > In Python 2 one could use
> > >> >
> > >> > s = unicode(QByteArray)
> > >> >
> > >> > to get the same result. Did I miss something or could QByteArray get
> > >> > a decode() method to make it similar to a Python3 bytes or bytearray
> > >> > type?
> > >>
> > >> The Python3 way to do it is...
> > >>
> > >> s = str(QByteArray, encoding='ascii')
> > >>
> > >> ...or whatever encoding is used.
> > >>
> > >> It would be possible to change things so that...
> > >>
> > >> s = str(QByteArray)
> > >>
> > >> ...automatically uses the default encoding. However that would then
> > >> make it inconsistent with the behaviour of...
> > >>
> > >> s = str(bytes)
> > >>
> > >> ...and I'm not sure that that is a good idea.
> 
> See, that guy is saying in Python 3 "*...or whatever encoding is used*.".
> But in Python 2 he says it "*automatically uses the default encoding*".

That person seems to be talking about ascii encoding only, which is the
simplest case.

> *I just want the Python 2 behaviour. What was the "default encoding" used
> there, which meant I didn't have to specify one explicitly?*  I want the
> Python 3 equivalent.

It's picking ascii and erroring out if that doesn't work, which is basically
what you're seeing ;-)

https://docs.python.org/2/howto/unicode.html#the-unicode-type

    The first argument is converted to Unicode using the specified encoding; if
    you leave off the encoding argument, the ASCII encoding is used for the
    conversion, so characters greater than 127 will be treated as errors: [...]

> Did it just use bytearray.decode('utf-8', 'replace') like you say for the
> C++?

No, the equivalent of bytearray.decode('ascii', 'strict').
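
As a rough illustration of how the error handlers differ (a sketch only; the
sample bytes and the decode_ascii helper are made up for this example):

```python
data = b"caf\x9c"  # made-up sample: three ASCII bytes plus one byte > 127

def decode_ascii(data, errors):
    """Hypothetical helper: decode as ASCII with the given error handler."""
    return data.decode("ascii", errors)

# 'strict' is the default handler and raises on any byte above 127.
try:
    decode_ascii(data, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD for each undecodable byte.
replaced = decode_ascii(data, "replace")

# 'ignore' silently drops undecodable bytes.
ignored = decode_ascii(data, "ignore")
```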

What you probably want in this case is:

   bytearray.decode(sys.getfilesystemencoding())

Assuming that robocopy doesn't somehow re-encode the filenames it gets.
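
One caveat: 0x9c is the pound sign in cp850 and cp437, but not in cp1252, so
the console output may well be in an OEM code page rather than the filesystem
encoding. If you can't be sure, something like the following might help. It
is only a sketch: decode_output is a name I made up, the candidate list is an
assumption you'd adjust for your system, and for a QByteArray you would pass
bytes(the_qbytearray).

```python
def decode_output(raw, encodings, fallback="utf-8"):
    """Try each candidate encoding in order; fall back to a lossy decode."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: try the next candidate.
            continue
    # Nothing fit cleanly, so keep what we can and mark the rest.
    return raw.decode(fallback, errors="replace")

# Assumed candidates: UTF-8 first, then the cp850 OEM code page.
candidates = ["utf-8", "cp850"]
text = decode_output(b"price \x9c5.txt", candidates)
```

Since a single-byte code page such as cp850 accepts nearly any byte sequence,
it should come last in the list; anything after it will never be tried.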

Florian

-- 
https://www.qutebrowser.org  | me at the-compiler.org (Mail/XMPP)
   GPG: 916E B0C8 FD55 A072  | https://the-compiler.org/pubkey.asc
         I love long mails!  | https://email.is-not-s.ms/