Preserve whitespace of removed "empty" elements #48
Reference: organicmaps/wikiparser#48
The first commit removes the pretty-printing from the test examples and adds a lot of noise to the diff.
Some articles use non-breaking spaces between quantities and units, which Wikipedia seems to wrap with a span. Elements with no text or whitespace-only text were previously removed to prune `<link>`s and parents of other removed elements.
This fix preserves the internal whitespace of elements that would otherwise be removed for being "empty". It does not distinguish between "meaningful" whitespace and padding between elements that would otherwise be collapsed by HTML formatting rules. It also cannot distinguish between elements that started with only whitespace and nodes that now contain only whitespace after previous steps. The preserved whitespace in the latter case is unlikely to remain because of later processing steps.
Fixes #47, fixes organicmaps/organicmaps#8651
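The behavior described above can be sketched on a toy tree. This is only an illustration under stated assumptions: the `Node` type, `text_of`, `prune_empty`, and `example_tree` are hypothetical names invented here, not the types or functions used in this PR, and the real code works on a parsed HTML document rather than this enum.

```rust
// Toy DOM for illustration only (an assumption, not wikiparser's types).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Element { tag: String, children: Vec<Node> },
    Text(String),
}

/// Concatenated text of a node and all of its descendants.
fn text_of(node: &Node) -> String {
    match node {
        Node::Text(t) => t.clone(),
        Node::Element { children, .. } => children.iter().map(text_of).collect(),
    }
}

/// Prune "empty" elements bottom-up.
/// - No text at all: the element is dropped.
/// - Whitespace-only text: the element is dropped, but its whitespace
///   is kept as a text node in its place.
/// - Anything else: the element is kept with its pruned children.
fn prune_empty(node: Node) -> Option<Node> {
    match node {
        Node::Text(t) => Some(Node::Text(t)),
        Node::Element { tag, children } => {
            let children: Vec<Node> = children.into_iter().filter_map(prune_empty).collect();
            let element = Node::Element { tag, children };
            let text = text_of(&element);
            if text.is_empty() {
                None
            } else if text.chars().all(char::is_whitespace) {
                // `char::is_whitespace` also matches U+00A0 (non-breaking space).
                Some(Node::Text(text))
            } else {
                Some(element)
            }
        }
    }
}

/// Roughly `<p>5<span>&nbsp;</span>km<link></p>` as a toy tree.
fn example_tree() -> Node {
    Node::Element {
        tag: "p".to_string(),
        children: vec![
            Node::Text("5".to_string()),
            Node::Element {
                tag: "span".to_string(),
                children: vec![Node::Text("\u{a0}".to_string())],
            },
            Node::Text("km".to_string()),
            Node::Element { tag: "link".to_string(), children: vec![] },
        ],
    }
}
```

Calling `prune_empty(example_tree())` drops the `link` and the `span` but keeps the non-breaking space between `5` and `km`. As noted in the description, a check like this cannot tell meaningful whitespace from padding between elements.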
Thanks!
Is there any minification used later, to remove unnecessary line endings for final HTML pages?
Would it save a bit more space if the nbsp were encoded directly as the literal character (U+00A0), instead of `&nbsp;`?
It would, but we don't control that part of the writing.
`html5ever` converts the literal to the escaped version, I assume because it is part of the serialization spec.
It's possible to write another `Serializer` like the pretty-printer that minifies instead, but I haven't figured out the whitespace collapsing rules enough to write one. There aren't any crates that implement an `html5ever::Serializer` minifier, so adding an external minifier would need to re-parse the HTML.
See above - we don't do a proper minification step, so whitespace within elements is left in the output.
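To put a rough number on the nbsp question above: assuming the output is UTF-8, a literal U+00A0 takes 2 bytes while the escaped `&nbsp;` takes 6, so each occurrence would save 4 bytes. The sketch below only demonstrates that arithmetic; `escape_nbsp` and `bytes_saved` are hypothetical helpers written for this example and are not part of `html5ever`.

```rust
/// Replace each literal non-breaking space with the `&nbsp;` entity.
/// Hypothetical stand-in for what the serialized output looks like.
fn escape_nbsp(text: &str) -> String {
    text.replace('\u{a0}', "&nbsp;")
}

/// Bytes that would be saved if every nbsp were written literally
/// (as UTF-8) instead of as the escaped entity.
fn bytes_saved(text: &str) -> usize {
    escape_nbsp(text).len() - text.len()
}
```

For the string `"5\u{a0}km"`, the literal form is 5 bytes and the escaped form `5&nbsp;km` is 9 bytes, so `bytes_saved` returns 4.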