Get all translations for articles matched by title #15
Reference: organicmaps/wikiparser#15
Currently the program checks for matches against the list of article titles and wikidata QIDs.
The QIDs are language agnostic, so all translations of them will be picked up.
For titles, however, there's no way to tell whether an article is a translation of another title in the list, so only the article in the title's own language is matched.
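As a rough illustration of the matching described above (a minimal sketch, not the actual wikiparser code; the `Page` type and `is_match` function are hypothetical names), a page is picked up if its QID is wanted or if its language and title pair is wanted:

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Page:
    # Hypothetical, simplified view of one article from a dump.
    lang: str
    title: str
    qid: Optional[str]


def is_match(page: Page, wanted_qids: Set[str], wanted_titles: Set[Tuple[str, str]]) -> bool:
    # QIDs are language agnostic: any translation carrying the QID matches.
    if page.qid is not None and page.qid in wanted_qids:
        return True
    # Titles are language specific: only the exact (lang, title) pair matches.
    return (page.lang, page.title) in wanted_titles


# OSM only supplied a French title and no wikidata= tag.
wanted_qids: Set[str] = set()
wanted_titles = {("fr", "Tour Eiffel")}

fr_page = Page("fr", "Tour Eiffel", "Q243")
en_page = Page("en", "Eiffel Tower", "Q243")

fr_matched = is_match(fr_page, wanted_qids, wanted_titles)
en_matched = is_match(en_page, wanted_qids, wanted_titles)
```

Here the French page matches but the English one does not, even though both carry the same QID, because that QID was never in the wanted set.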
Example
For the Eiffel Tower, if OSM doesn't have a `wikidata=` tag, only `wikipedia=fr:Tour Eiffel`, we don't know to extract `en:Eiffel Tower` or `ru:Эйфелева башня` until we process the page in the `fr` dump and get its wikidata QID.
At the same time there will be Russian-only tags that need to be mapped to other languages, but can't be resolved until we process the `ru` dump.
For objects with a `wikidata=` tag this is not a problem, and there are `wikipedia:lang=` tags, but the generator needs to be updated to handle those and not every OSM object has all of the tags.
Solution
A complete mapping from title to QID would need to include all titles and redirects in each supported language.
We can build that either by scanning through all the dumps initially, or by parsing some smaller dumps of redirects and QIDs, using or doing something similar to the wikimapper project.
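A minimal sketch of what such a mapping could look like, assuming the page-to-QID and redirect data have already been parsed into plain tuples (the input shapes and the `build_title_to_qid` name are assumptions for illustration, not the format of any real dump or of wikimapper):

```python
from typing import Dict, Iterable, Tuple

TitleKey = Tuple[str, str]  # (language code, article title)


def build_title_to_qid(
    page_qids: Iterable[Tuple[str, str, str]],  # (lang, title, qid)
    redirects: Iterable[Tuple[str, str, str]],  # (lang, from_title, to_title)
) -> Dict[TitleKey, str]:
    # Direct entries: every article title maps to its own QID.
    mapping: Dict[TitleKey, str] = {
        (lang, title): qid for lang, title, qid in page_qids
    }
    # Redirects inherit the QID of their target, if the target is known.
    # Only one level is followed here; redirect chains would need a loop.
    for lang, source, target in redirects:
        qid = mapping.get((lang, target))
        if qid is not None:
            mapping.setdefault((lang, source), qid)
    return mapping


mapping = build_title_to_qid(
    page_qids=[
        ("fr", "Tour Eiffel", "Q243"),
        ("en", "Eiffel Tower", "Q243"),
    ],
    # Hypothetical redirect, for illustration only.
    redirects=[("en", "Eiffel tower", "Eiffel Tower")],
)
```

With this in hand, every title from OSM could be resolved to a QID before any article dump is scanned, at the cost of building and holding the full mapping.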
Some options to resolve the problem:
I think writing the missed QIDs out after the first scan is a good first step. If doing two passes increases runtime too much, we can investigate the smaller dump option.
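The two-pass idea could be sketched roughly as follows. This is an illustration under simplifying assumptions, not the wikiparser implementation: pages are plain `(lang, title, qid)` tuples held in memory, and `first_pass` and `second_pass` are hypothetical names. A real run would write the missed QIDs to a file between passes.

```python
from typing import Iterable, List, Optional, Set, Tuple

Page = Tuple[str, str, Optional[str]]  # (lang, title, qid)


def first_pass(
    pages: Iterable[Page],
    wanted_qids: Set[str],
    wanted_titles: Set[Tuple[str, str]],
) -> Tuple[List[Page], Set[str]]:
    extracted: List[Page] = []
    missed_qids: Set[str] = set()
    for lang, title, qid in pages:
        by_qid = qid is not None and qid in wanted_qids
        by_title = (lang, title) in wanted_titles
        if by_qid or by_title:
            extracted.append((lang, title, qid))
        # A title-only match reveals a QID we did not know to look for.
        if by_title and not by_qid and qid is not None:
            missed_qids.add(qid)
    return extracted, missed_qids


def second_pass(
    pages: Iterable[Page], missed_qids: Set[str], already: Iterable[Page]
) -> List[Page]:
    # Pick up translations of the missed QIDs, skipping pages already extracted.
    seen = set(already)
    return [p for p in pages if p[2] in missed_qids and p not in seen]


dumps: List[Page] = [
    ("en", "Eiffel Tower", "Q243"),
    ("ru", "Эйфелева башня", "Q243"),
    ("fr", "Tour Eiffel", "Q243"),
    ("en", "Unrelated page", "Q999999999"),  # placeholder QID
]

first, missed = first_pass(dumps, set(), {("fr", "Tour Eiffel")})
second = second_pass(dumps, missed, first)
```

The first scan only extracts the French page but records its QID; the second scan then picks up the English and Russian pages. The cost is reading every dump twice, which is the runtime concern mentioned above.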