Preserve whitespace of removed "empty" elements
Some articles use non-breaking spaces between quantities and units, which Wikipedia seems to wrap in a span. Elements with no text or whitespace-only text were previously removed in order to prune `<link>`s and the parents of other removed elements.

This fix preserves the internal whitespace of elements that would otherwise be removed for being "empty".

It does not distinguish between "meaningful" whitespace and padding between elements that would be collapsed by HTML formatting rules. It also cannot distinguish between elements that _started_ with only whitespace and elements that contain only whitespace as a result of earlier steps. In the latter case, the preserved whitespace is unlikely to survive later processing steps.

Fixes #47, fixes organicmaps/organicmaps#8651

Signed-off-by: Evan Lloyd New-Schmidt <evan@new-schmidt.com>
Parent: 3579410659
Commit: bab29c0de9
2 changed files with 7 additions and 4 deletions
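
The commit message implies that a span holding only a non-breaking space was being treated as "empty". A minimal sketch of why that can happen, assuming the emptiness check relies on Unicode whitespace (the helper names `is_blank` and `is_ascii_blank` are hypothetical and not taken from this repository):

```rust
/// Hypothetical helper: true when `text` is empty or made up solely of
/// Unicode whitespace. `char::is_whitespace` follows the Unicode
/// `White_Space` property, which includes U+00A0 (no-break space).
fn is_blank(text: &str) -> bool {
    text.chars().all(char::is_whitespace)
}

/// Hypothetical stricter variant that only treats ASCII whitespace as blank,
/// so a lone no-break space counts as content.
fn is_ascii_blank(text: &str) -> bool {
    text.chars().all(|c| c.is_ascii_whitespace())
}
```

Under this assumption, `is_blank("\u{a0}")` is true while `is_ascii_blank("\u{a0}")` is false, so a Unicode-aware check would classify the span between a quantity and its unit as whitespace-only.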
@@ -202,7 +202,7 @@ pub fn simplify(document: &mut Html, lang: &str) {
 
     remove_empty_sections(document);
 
-    remove_empty(document);
+    expand_empty(document);
 
     remove_non_element_nodes(document);
 
@@ -305,7 +305,8 @@ fn remove_toplevel_whitespace(document: &mut Html) {
     remove_ids(document, to_remove.drain(..));
 }
 
-fn remove_empty(document: &mut Html) {
+/// Expand elements that contain no text or only whitespace, leaving only their contents.
+fn expand_empty(document: &mut Html) {
     let mut to_remove = Vec::new();
 
     for el in document
@@ -318,7 +319,9 @@ fn remove_empty(document: &mut Html) {
         }
     }
 
-    remove_ids(document, to_remove.drain(..));
+    for id in to_remove.drain(..) {
+        expand_id(document, id);
+    }
 }
 
 fn remove_empty_sections(document: &mut Html) {

@@ -7,7 +7,7 @@
 <li>Chatyr-Dag yayla</li>
 <li>Dologorukovskaya (Subatkan) yayla</li>
 <li>Demirci yayla</li>
-<li>Qarabiy yayla</li></ul><h2>Highest peaks</h2><p>The Crimea's highest peak is the Roman-Kosh (Ukrainian: <span lang="uk">Роман-Кош</span>; Russian: <span lang="ru">Роман-Кош</span>, Crimean Tatar: <span lang="crh">Roman Qoş</span>) on the Babugan Yayla at 1,545 metres (5,069ft). Other important peaks over 1,200 metres include:</p><ul><li>Demir-Kapu (Ukrainian: <span lang="uk">Демір-Капу</span>, Russian: <span lang="ru">Демир-Капу</span>, Crimean Tatar: <span lang="crh">Demir Qapı</span>) 1,540 m in the Babugan Yayla;</li>
+<li>Qarabiy yayla</li></ul><h2>Highest peaks</h2><p>The Crimea's highest peak is the Roman-Kosh (Ukrainian: <span lang="uk">Роман-Кош</span>; Russian: <span lang="ru">Роман-Кош</span>, Crimean Tatar: <span lang="crh">Roman Qoş</span>) on the Babugan Yayla at 1,545 metres (5,069 ft). Other important peaks over 1,200 metres include:</p><ul><li>Demir-Kapu (Ukrainian: <span lang="uk">Демір-Капу</span>, Russian: <span lang="ru">Демир-Капу</span>, Crimean Tatar: <span lang="crh">Demir Qapı</span>) 1,540 m in the Babugan Yayla;</li>
 <li>Zeytin-Kosh (Ukrainian: <span lang="uk">Зейтин-Кош</span>; Russian: <span lang="ru">Зейтин-Кош</span>, Crimean Tatar: <span lang="crh">Zeytün Qoş</span>) 1,537 m in the Babugan Yayla;</li>
 <li>Kemal-Egerek (Ukrainian: <span lang="uk">Кемаль-Егерек</span>, Russian: <span lang="ru">Кемаль-Эгерек</span>, Crimean Tatar: <span lang="crh">Kemal Egerek</span>) 1,529 m in the Babugan Yayla;</li>
 <li>Eklizi-Burun (Ukrainian: <span lang="uk">Еклізі-Бурун</span>, Russian: <span lang="ru">Эклизи-Бурун</span>, Crimean Tatar: <span lang="crh">Eklizi Burun</span>) 1,527 m in the Chatyrdag Yayla;</li>
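
The switch from removing "empty" elements to expanding them can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: the `Node` type, `render`, `sample`, and both function bodies are hypothetical stand-ins written for illustration. They are not the repository's implementation, which operates on an `Html` document and calls an `expand_id` helper whose body is not shown in this diff.

```rust
/// Hypothetical, simplified document model (not the repository's `Html` type).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Text(String),
    Element { tag: &'static str, children: Vec<Node> },
}

/// Concatenated text of a node and all of its descendants.
fn text_of(node: &Node) -> String {
    match node {
        Node::Text(t) => t.clone(),
        Node::Element { children, .. } => children.iter().map(text_of).collect(),
    }
}

/// True when a node carries no text or only Unicode whitespace.
fn is_blank(node: &Node) -> bool {
    text_of(node).chars().all(char::is_whitespace)
}

/// Old behaviour as described: drop "empty" elements along with their contents.
fn remove_empty(nodes: Vec<Node>) -> Vec<Node> {
    nodes
        .into_iter()
        .filter(|n| !(matches!(n, Node::Element { .. }) && is_blank(n)))
        .map(|n| match n {
            Node::Element { tag, children } => Node::Element {
                tag,
                children: remove_empty(children),
            },
            text => text,
        })
        .collect()
}

/// New behaviour as described: replace "empty" elements with their contents,
/// so any whitespace text inside them stays in the parent.
fn expand_empty(nodes: Vec<Node>) -> Vec<Node> {
    let mut out = Vec::new();
    for node in nodes {
        match node {
            Node::Element { tag, children } => {
                let children = expand_empty(children);
                if children.iter().all(is_blank) {
                    // Element is dropped; its (whitespace-only) children survive.
                    out.extend(children);
                } else {
                    out.push(Node::Element { tag, children });
                }
            }
            text => out.push(text),
        }
    }
    out
}

/// Serialize nodes back to a tag soup string for comparison.
fn render(nodes: &[Node]) -> String {
    nodes
        .iter()
        .map(|n| match n {
            Node::Text(t) => t.clone(),
            Node::Element { tag, children } => {
                format!("<{0}>{1}</{0}>", tag, render(children))
            }
        })
        .collect()
}

/// Models `5,069<span>&nbsp;</span>ft<link></link>`.
fn sample() -> Vec<Node> {
    vec![
        Node::Text("5,069".into()),
        Node::Element { tag: "span", children: vec![Node::Text("\u{a0}".into())] },
        Node::Text("ft".into()),
        Node::Element { tag: "link", children: vec![] },
    ]
}
```

In this model, `render(&remove_empty(sample()))` yields `5,069ft`, while `render(&expand_empty(sample()))` keeps the no-break space between `5,069` and `ft`. The childless `link` element disappears in both cases, which mirrors the pruning purpose mentioned in the commit message.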