[QScintilla] Search Whole Word matches Only on regular expression symbol

Baz Walter bazwal at ftml.net
Sat Oct 12 23:29:20 BST 2013


On 12/10/13 11:52, Phil Thompson wrote:
> So I need to call SCI_SETWORDCHARS when a lexer is set using the value
> returned by the lexer's wordCharacters() method.
>
> Is this likely to cause any unforeseen problems?

As usual with Scintilla, the main source of potential problems is 
single-byte vs multi-byte encodings. For Latin-1, any byte in the range 
0-255 can be set as a word character. But for UTF-8, only the ASCII 
range is relevant: all Unicode characters above 127 are always treated 
as word characters, regardless of what has been set using SCI_SETWORDCHARS.

However, Scintilla's default set of word characters (i.e. those set via 
SCI_SETCHARSDEFAULT) includes the standard alphanumerics and underscore, 
*plus* all the characters in the range 128-255 (regardless of the 
code-page setting).

So, assuming the current lexer wordCharacters() methods only ever return 
ASCII, there is some potential for changes in behaviour if QScintilla is 
being used in *Latin-1* mode (UTF-8 mode should be unaffected).
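
To make the Latin-1 case concrete, here is a rough sketch in plain Python 
(no QScintilla needed). It only models the sets described above, not 
Scintilla's actual code. The function name reclassified_bytes and the 
example lexer set are made up for illustration.

```python
import string

# Default word set as described above: ASCII alphanumerics, underscore,
# plus every byte value in the range 128-255.
ASCII_WORD = set((string.ascii_letters + string.digits + "_").encode("ascii"))
DEFAULT_WORD_BYTES = ASCII_WORD | set(range(128, 256))


def reclassified_bytes(lexer_word_chars):
    """Return the byte values that are word characters by default but would
    stop being word characters if only lexer_word_chars were applied
    (single-byte / Latin-1 mode). Hypothetical helper, illustration only."""
    new_word_bytes = set(lexer_word_chars.encode("latin-1"))
    return DEFAULT_WORD_BYTES - new_word_bytes


# Assumed lexer set: ASCII alphanumerics and underscore only.
lexer_chars = string.ascii_letters + string.digits + "_"
changed = reclassified_bytes(lexer_chars)

print(len(changed))      # how many byte values change class
print(0xE9 in changed)   # 0xE9 is e-acute in Latin-1
```

With an ASCII-only lexer set, every byte from 128 to 255 drops out of the 
word class in this model, so a whole-word search on accented Latin-1 text 
could start matching in different places.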

The only other potential issue I can think of at the moment is that 
setting the word characters automatically resets the whitespace and 
punctuation characters to their default values.
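
The reset means that any custom whitespace set has to be reapplied (via 
SCI_SETWHITESPACECHARS) after the word characters are set. Below is a toy 
model of that ordering in plain Python. CharClassModel and 
apply_lexer_word_chars are hypothetical names, and the default whitespace 
set is a simplified placeholder, not Scintilla's real default.

```python
class CharClassModel:
    """Toy stand-in for the editor's character classes (illustration only)."""

    # Simplified placeholder default, assumed for this sketch.
    DEFAULT_WHITESPACE = frozenset(" \t")

    def __init__(self):
        self.whitespace = set(self.DEFAULT_WHITESPACE)
        self.word = set()

    def set_word_chars(self, chars):
        # Models the behaviour described above: setting the word
        # characters puts the other classes back to their defaults.
        self.whitespace = set(self.DEFAULT_WHITESPACE)
        self.word = set(chars)

    def set_whitespace_chars(self, chars):
        self.whitespace = set(chars)


def apply_lexer_word_chars(model, word_chars):
    """Set the word characters, then restore the previous whitespace set."""
    saved_whitespace = set(model.whitespace)
    model.set_word_chars(word_chars)
    model.set_whitespace_chars(saved_whitespace)


# Naive order: the custom whitespace set is lost.
naive = CharClassModel()
naive.set_whitespace_chars(" \t;")
naive.set_word_chars("abc_")
print(";" in naive.whitespace)

# Save-and-restore order: the custom whitespace set survives.
careful = CharClassModel()
careful.set_whitespace_chars(" \t;")
apply_lexer_word_chars(careful, "abc_$")
print(";" in careful.whitespace)
```

The same save-and-restore idea would apply to any custom punctuation set.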

-- 
Regards
Baz Walter
