[PyQt] Search method for Arabic text

Maziar Parsijani maziar.parsijani at gmail.com
Tue Aug 28 08:57:06 BST 2018


Hi Zachary Scheuren,
Thanks a lot for your answer. I emailed the PyQt list because I use its
widgets, and I forgot to say that I display the text in a QTextEdit. I had
found the pyarabic library before, and as you said, I can remove the marks
with its strip_tashkeel(text) function. But I think the example below shows
what I actually want to do. For reference, on http://tanzil.net, if you
search for "السماء" it finds and highlights "ٱلسَّمَآءِ". I want to know
whether this is possible in Python. I use an SQLite database and can
already find the results this way, but I cannot highlight them.
Example:
Search for: "السماء"
I want to display the Uthmani Quran text, find "ٱلسَّمَآءِ" in it, and
highlight it.
Finding the matches is no problem, because my SQLite table holds several
versions of the Quran text. The problem is highlighting them: I use a
regex, and "السماء" and "ٱلسَّمَآءِ" are not the same string, so the regex
does not match the displayed text.
Please accept my apologies for asking my question without your permission.
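
One possible way to get positions that can be highlighted, sketched here
with only the Python standard library rather than pyarabic: normalize the
displayed text while recording, for every character that is kept, its index
in the original string. A match found in the normalized text can then be
mapped back to offsets in the Uthmani text. The names normalize_with_map
and find_spans are hypothetical, and both the set of Alef variants and the
rule "drop every nonspacing mark" are assumptions that may need tuning for
a particular Quran edition.

```python
import re
import unicodedata

ALEF = "\u0627"
# Assumption: Alef with madda, hamza above, hamza below, and wasla
# are all folded to the bare Alef.
ALEF_VARIANTS = {"\u0622", "\u0623", "\u0625", "\u0671"}
TATWEEL = "\u0640"


def normalize_with_map(text):
    """Strip marks and fold Alef variants, remembering original positions.

    Returns (normalized, index_map), where index_map[i] is the index in
    `text` of the character that produced normalized[i]. A final sentinel
    entry equal to len(text) keeps end-of-match lookups simple.
    """
    kept, index_map = [], []
    for pos, ch in enumerate(text):
        if ch == TATWEEL or unicodedata.category(ch) == "Mn":
            continue  # drop tatweel and nonspacing marks (harakat, shadda, ...)
        kept.append(ALEF if ch in ALEF_VARIANTS else ch)
        index_map.append(pos)
    index_map.append(len(text))
    return "".join(kept), index_map


def find_spans(query, text):
    """Return (start, end) offsets into the original `text` for each match."""
    needle, _ = normalize_with_map(query.strip())
    if not needle:
        return []
    haystack, index_map = normalize_with_map(text)
    # The end offset is the position of the next kept character, so marks
    # that trail the last matched letter are included in the span.
    return [
        (index_map[m.start()], index_map[m.end()])
        for m in re.finditer(re.escape(needle), haystack)
    ]


# Example with the strings from this message.
text = "ٱلسَّمَآءِ"
for start, end in find_spans("السماء", text):
    print(start, end, text[start:end])
```

The returned (start, end) pairs index the original string, so they should
be usable as QTextCursor positions in a QTextEdit filled with the same text
through setPlainText (Arabic letters are single UTF-16 units, which is what
Qt counts). Words whose Uthmani spelling differs in letters rather than
marks, such as a dagger Alef standing in for a full Alef, will still not
match with this sketch.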

On Mon, Aug 27, 2018 at 10:14 AM Zachary Scheuren <angryjaga at gmail.com>
wrote:

> This isn't really a PyQt question. You can do all that in basic Python,
> but it can help if you have something like the pyarabic library. With that
> you can strip out the vocalization before comparing strings. You also need
> to consider all the possible Alefs like in str1 you have Alef with Wasla,
> but str2 only has Alef. pyarabic can also help there with araby.ALEFAT
> which is a list of all possible Alefs with marks. You need to manually
> check that because Alef with Wasla has no Unicode decomposition and the
> wasla isn't encoded as a separate mark. There have been Unicode proposals
> for that, but nothing has happened so far. Anyway, I did a quick little
> test with your strings...
>
> import re
> from pyarabic import araby
> # str2 and str3 are the strings from your original message (quoted below)
> str3_nomarks = araby.separate(str3)[0]  # strips all diacritics
> for c in araby.ALEFAT:  # replace any Alef with a mark by base Alef
>     str3_nomarks = str3_nomarks.replace(c, araby.ALEF)
>
> re.findall(str2, str3_nomarks)
>
> Something like that will get you matches, but if you need to track the
> position in a string you'll have to do some more work since dropping the
> diacritics will throw off the index.
>
>
>
> On Wed, Aug 22, 2018 at 12:43 AM, Maziar Parsijani <
> maziar.parsijani at gmail.com> wrote:
>
>> Hi
>> I have some Arabic strings in my database, and I want to search them
>> like this:
>>
>>   str1 = "ٱلْمُفْلِحُونَ"
>>   str2 = "المفلحون"
>> As you can see, str1 is the same word as str2, but in the Arabic text
>> str1 has extra characters.
>> Is there any way to search for str2 and find both forms in a string
>> like:
>>  str3 = " المفلحون ٱلْمُفْلِحُونَ ٱلنَّاسُ المفلحون ٱلْمُفْلِحُونَ
>> المفلحون ٱلنَّاسُ المفلحون ٱلنَّاسُ "
>>