Enhancement: Arabic Search Normalization (Diacritics and Character Variants) #10483

Open
opened 2025-03-13 03:11:06 +00:00 by abdullahO2 · 0 comments
abdullahO2 commented 2025-03-13 03:11:06 +00:00 (Migrated from github.com)

Hello Organic Maps developers,

I'm writing to request an enhancement to the Arabic search functionality within Organic Maps. Currently, searching for Arabic place names is unreliable if the search term doesn't exactly match the diacritics (tashkeel), hamza variations (أ، إ، ئ، ؤ), and other orthographic details present in the underlying map data. This is a common problem, as many users omit these markings when searching.

OsmAnd recently addressed a similar issue by implementing normalization during both indexing and searching, as discussed here: [Enhanced Arabic Search Functionality: Addressing Diacritics, Character Variants, and Other Search-Related Improvements](https://github.com/osmandapp/OsmAnd/issues/11709). This significantly improves the accuracy and usability of Arabic search.

The solution involves normalizing both the indexed place names and the user's search query. A Java example of such a normalization function is shown below (this could be adapted to C++ or whichever language Organic Maps uses):

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class ArabicNormalizer {

    // Matches every non-spacing mark (Unicode category Mn) in any script,
    // which includes Arabic tashkeel.
    private static final Pattern DIACRITICS_PATTERN = Pattern.compile("\\p{Mn}");

    public static String normalizeArabic(String text) {
        if (text == null) {
            return null;
        }

        // NFD splits precomposed letters into a base letter plus combining marks,
        // so أ, إ, آ, ؤ and ئ become ا, و or ي followed by a combining hamza or madda.
        String normalized = Normalizer.normalize(text, Normalizer.Form.NFD);
        normalized = DIACRITICS_PATTERN.matcher(normalized).replaceAll(""); // Remove diacritics

        // Hamza variations (already folded by the two steps above;
        // listed here to keep the mapping explicit)
        normalized = normalized.replace("إ", "ا");
        normalized = normalized.replace("أ", "ا");
        normalized = normalized.replace("ئ", "ي");
        normalized = normalized.replace("ؤ", "و");

        // Other normalizations
        normalized = normalized.replace("آ", "ا"); // also already folded by NFD
        normalized = normalized.replace("ى", "ي"); // alef maksura -> ya
        normalized = normalized.replace("ة", "ه"); // ta marbuta -> ha

        normalized = normalized.trim().replaceAll("\\s+", " "); // Normalize whitespace

        return normalized;
    }
}
```
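To illustrate the "normalize both sides" idea, here is a minimal sketch of an exact-match lookup that applies the same folding when a name is stored and when a query arrives. Everything in it is hypothetical: `NormalizedNameIndex`, `add`, and `search` are illustrative names, not Organic Maps or OsmAnd APIs, and a real search index would be far more involved than a hash map. The strings are written as Unicode escapes so the file compiles regardless of source encoding; they spell Makkah with tashkeel and Abha with a hamza.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the Organic Maps or OsmAnd implementation.
final class NormalizedNameIndex {

    // Normalized key -> every original spelling stored under that key.
    private final Map<String, List<String>> byKey = new HashMap<>();

    // Same folding rules as the function in this issue, in compact form.
    static String normalize(String text) {
        // Decompose, then drop all non-spacing marks (tashkeel, combining hamza/madda).
        String s = Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{Mn}", "");
        s = s.replace('\u0649', '\u064A')   // alef maksura -> ya
             .replace('\u0629', '\u0647');  // ta marbuta -> ha
        return s.trim().replaceAll("\\s+", " ");
    }

    // Index time: store the original name under its normalized key.
    void add(String name) {
        byKey.computeIfAbsent(normalize(name), k -> new ArrayList<>()).add(name);
    }

    // Query time: normalize the query the same way before the lookup.
    List<String> search(String query) {
        return byKey.getOrDefault(normalize(query), Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        NormalizedNameIndex index = new NormalizedNameIndex();
        index.add("\u0645\u064E\u0643\u0651\u064E\u0629"); // Makkah, with fatha and shadda
        index.add("\u0623\u0628\u0647\u0627");             // Abha, with alef-hamza

        // Queries typed without tashkeel, with ha instead of ta marbuta, and without hamza.
        System.out.println(index.search("\u0645\u0643\u0647").size());
        System.out.println(index.search("\u0627\u0628\u0647\u0627").size());
    }
}
```

Because the stored names and the queries pass through the same `normalize` function, both lookups in `main` find the stored entry even though neither query matches the original spelling character for character. The original spelling is kept as the map value, so results can still be displayed exactly as they appear in the map data.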
Reference: organicmaps/organicmaps-tmp#10483