Investigate escaping in article titles and urls #7
Reference: organicmaps/wikiparser#7
Wikipedia articles can contain slashes ('/'). Wikipedia accepts them in urls escaped or not, e.g.
https://en.wikipedia.org/wiki/Baltimore%2FWashington_International_Airport
and
https://en.wikipedia.org/wiki/Baltimore/Washington_International_Airport
return the same page, and neither redirects to the other.
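To make the equivalence concrete, here is a minimal Python sketch showing that both forms decode to the same title. The helper name `article_title` is made up for illustration and the standard-library `urllib.parse` stands in for whatever the generator actually uses; it also assumes the usual `/wiki/` path prefix.

```python
from urllib.parse import unquote, urlsplit

def article_title(url: str) -> str:
    """Return the percent-decoded article title from a /wiki/ url (hypothetical helper)."""
    path = urlsplit(url).path  # the path is returned still percent-encoded
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError(f"not an article url: {url}")
    # Decode only after stripping the prefix, so a decoded '/' stays part of the title.
    return unquote(path[len(prefix):])

escaped = "https://en.wikipedia.org/wiki/Baltimore%2FWashington_International_Airport"
plain = "https://en.wikipedia.org/wiki/Baltimore/Washington_International_Airport"

# Both spellings yield the title "Baltimore/Washington_International_Airport".
assert article_title(escaped) == article_title(plain)
```

Comparing decoded titles rather than raw urls is one way to avoid treating the two spellings as different articles.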
The generator attempts to decode urls from OSM tags, and then encodes '%' again when it converts them back into urls.
My guess is that some of the tags that are not urls still have url encoding in them, but determining which are actually url-encoded and which just have '%' in them is a little tricky, and the generator doesn't do that.
It looks like some of the resulting urls are encoded twice, thankfully a small number:
Of those, all except the three below are malformed:
Some seem to be arbitrary character data, for example:
https://sv.wikipedia.org/wiki/Kanngjutarm%25C3%25A4starens_hus
with the extra escaped '%25's removed becomes:
https://sv.wikipedia.org/wiki/Kanngjutarm%C3%A4starens_hus
which the browser converts to:
https://sv.wikipedia.org/wiki/Kanngjutarmästarens_hus
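The repair above can be sketched in Python. This is only a heuristic under a stated assumption: any '%25' immediately followed by two hex digits is treated as a '%' that was escaped a second time. The names `DOUBLE_ESCAPE`, `looks_double_encoded` and `strip_extra_escapes` are made up for this example and are not part of the generator.

```python
import re
from urllib.parse import unquote

# Assumption: '%25' followed by two hex digits is an already-escaped byte
# (such as '%C3') whose '%' was escaped once more.
DOUBLE_ESCAPE = re.compile(r"%25(?=[0-9A-Fa-f]{2})")

def looks_double_encoded(url: str) -> bool:
    """Report whether the url contains at least one '%25XX' sequence."""
    return DOUBLE_ESCAPE.search(url) is not None

def strip_extra_escapes(url: str) -> str:
    """Turn each '%25XX' back into '%XX', removing one layer of escaping only."""
    return DOUBLE_ESCAPE.sub("%", url)

doubled = "https://sv.wikipedia.org/wiki/Kanngjutarm%25C3%25A4starens_hus"
single = strip_extra_escapes(doubled)

print(single)           # https://sv.wikipedia.org/wiki/Kanngjutarm%C3%A4starens_hus
print(unquote(single))  # https://sv.wikipedia.org/wiki/Kanngjutarmästarens_hus
```

The heuristic can misfire: a title that really contains a '%' followed by two hex digits (say, "50%AB") is correctly encoded as "50%25AB" and would be wrongly rewritten, so any match is better treated as a candidate to check than as a confirmed error.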