[Eric] Issue with non-English text in Django plugin

detlev detlev at die-offenbachs.de
Sun Jun 20 13:40:27 BST 2010


Hi,

please change line 215 of LexerDjango.py from

for token, txt in self.__lexer.get_tokens(unicode(self.editor.text())[:end + 1]):

to

for token, txt in self.__lexer.get_tokens(unicode(self.editor.text())[:end + 1].encode("utf-8")):
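
For background, here is a minimal sketch of why counting bytes rather than
characters should matter. It assumes, based on the symptoms described in the
quoted message, that the editor applies styling by UTF-8 byte count; the
sample text and the token split are made up for illustration. It is written
for Python 3, where str is Unicode text and bytes is the encoded form (in
Python 2 these correspond to unicode and str).

```python
# Hypothetical template line containing Cyrillic text.
text = "{% block заголовок %}"

# Hypothetical token split; a real lexer would produce its own tokens.
chunks = ["{%", " ", "block", " ", "заголовок", " ", "%}"]


def token_lengths(parts):
    """Return the length of each part, as a styling loop might pass it on."""
    return [len(part) for part in parts]


# Lengths counted in characters (what len() gives for Unicode text).
char_lengths = token_lengths(chunks)

# Lengths counted in bytes (what len() gives after encoding to UTF-8).
byte_lengths = token_lengths([part.encode("utf-8") for part in chunks])

# Size of the buffer if the editor stores the text as UTF-8 (assumption).
buffer_size = len(text.encode("utf-8"))

# Bytes left unstyled when character counts are used as byte counts.
shortfall = buffer_size - sum(char_lengths)

print(sum(char_lengths), sum(byte_lengths), buffer_size, shortfall)
```

With character counts the styling stops nine bytes short of the end of the
buffer, one byte per Cyrillic letter, which would match the closing tag
losing one more highlighted symbol for every Russian letter typed.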

Please report back.

Detlev


On Sunday, 20 June 2010, Игорь Митренко wrote:
> > There might be two different issues.
> >
> > 1. The encoding was set to utf-8-default, which means that no suitable
> > encoding was detected and eric simply chose the default setting. My
> > question is: was enhanced encoding detection activated on the
> > Editor->Filehandling page of the config dialog? What is the correct
> > encoding of the file? Would you please send it?
> >
> > 2. The styling may be wrong. Please check whether selecting "Alternative:
> > Django/Jinja" as the lexer language (via the editor context menu) gives
> > correct highlighting.
> >
> > Regards,
> > Detlev
> >
> 1. The correct file encoding is utf-8 (the system-wide locale), and utf-8
> is also set on the Filehandling page. Moreover, I tried turning off
> encoding detection, with no result. Here is my file:
> http://www.mediafire.com/?fdytlqdb51z
> 
> 2. Because I don't have the Jinja plugin installed, I tried other lexers,
> such as HTML/PHP, Python and others. All of them work fine with the same
> file(s) (I mean there was no pretty Django highlighting, but the string
> mangling was gone too).
> 
> 3. Finally, I'd like you to pay attention to my previous messages: the
> problem seems to go away if I use str() instead of unicode(). According to
> http://boodebr.org/main/python/all-about-python-and-unicode , Python may
> return wrong values from len() when you call it on unicode strings. Now I
> see that str() works properly only because it encodes unicode to ascii. So
> it likely won't work with, for example, a Japanese locale, which has no
> ascii representation.
> Look at the short demo I prepared:
> http://img571.imageshack.us/img571/4782/snapshot2g.png (I show only the
> first tag, "block"; the text below is unimportant):
> 
>    1. Everything works great until I use Russian.
>    2. For example, English works fine.
>    3. I typed one Russian symbol. At this point the lexer did not fully
>    highlight the closing tag; note that exactly one symbol is highlighted
>    wrongly.
>    4. I typed a second symbol, and you can see that the lexer now leaves
>    two symbols unhighlighted.
>    5. Each added Russian symbol causes the lexer to "forget" to highlight
>    one more symbol in the closing tag.
> 
> So, here is my explanation:
> One Russian letter in my case takes two bytes.
> len(unicode(one_russian_letter)) returns, as expected, 1. But the lexer
> obviously assumes that one letter takes one byte, so we get a one-byte
> shift and symbol corruption (did you notice that the lexer corrupts only
> an odd number of symbols?). So the lexer misinterprets the length of
> non-English strings. And if I replace unicode() with str(), strings are
> encoded to one-byte ascii and the lexer works fine.
> So, here are two conclusions:
> 
>    1. Eric's lexer subsystem works with strings as plain ascii strings,
>    not unicode.
>    2. Other lexers (Python, HTML/PHP) work fine because they convert
>    strings to ascii.
> 
> Please correct me if I'm wrong.
> 
> P.S.: Exactly the same happens if I don't replace unicode() with str()
> and instead add .encode() to it in the styleText method, so that it now
> looks like "for token, txt in
> self.__lexer.get_tokens(unicode(self.editor.text())*.encode()*[:end +
> 1]):" (without the quotes).
> Also, excuse me for my not-so-good English and my habit of writing long,
> boring text.
> 
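
On the P.S. above: a small sketch of why the position of .encode() relative
to the slice may matter. It only shows general Python behaviour; the sample
string and the value of end are made up, and whether end counts characters
or bytes in styleText is not established here. Written for Python 3 (str is
Unicode text, bytes the encoded form).

```python
text = "блок"  # four Cyrillic letters, two bytes each in UTF-8
end = 2        # hypothetical value, for illustration only

# Slice the text first, then encode: the cut always falls between characters.
slice_then_encode = text[:end + 1].encode("utf-8")

# Encode first, then slice: the cut falls between bytes and may split a letter.
encode_then_slice = text.encode("utf-8")[:end + 1]


def decodes_cleanly(data):
    """Return True if data is complete, valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(slice_then_encode), decodes_cleanly(slice_then_encode))
print(len(encode_then_slice), decodes_cleanly(encode_then_slice))
```

Slicing before encoding keeps whole characters, while slicing the encoded
bytes can end in the middle of a two-byte letter.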


-- 
Detlev Offenbach
detlev at die-offenbachs.de

