Proof of Concept #3
Cargo.lock (generated, 1089 lines)

Cargo.toml (13 changes)
@@ -4,10 +4,21 @@ version = "0.0.0"
license = "AGPL-3.0-only"
edition = "2021"
repository = "https://github.com/organicmaps/wikiparser/"

default-run = "om-wikiparser"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
anyhow = { version = "1.0.71", features = ["backtrace"] }
clap = { version = "4.3.2", features = ["derive"] }
env_logger = "0.10.0"
log = "0.4.18"
once_cell = "1.18.0"
scraper = "0.16.0"
serde = { version = "1.0.163", features = ["derive"] }
serde_json = "1.0.96"
url = "2.3.1"
urlencoding = "2.1.2"

[profile.release]
debug = true
overflow-checks = true
README.md
@@ -1,3 +1,8 @@
# wikiparser

_Extracts articles from [Wikipedia database dumps](https://en.wikipedia.org/wiki/Wikipedia:Database_download) for embedding into the `mwm` map files created by [the Organic Maps generator](https://github.com/organicmaps/organicmaps/blob/master/tools/python/maps_generator/README.md)._

## Usage

[`article_processing_config.json`](article_processing_config.json) should be updated when adding a new language.
It defines article sections that are not important for users and should be removed.
article_processing_config.json (new file, 44 lines)
@@ -0,0 +1,44 @@
{
    "sections_to_remove": {
        "de": [
            "Anmerkungen",
            "Anmerkungen und Einzelnachweise",
            "Einzelbelege",
            "Einzelnachweise",
            "Filme",
            "Literatur",
            "Siehe auch",
            "Weblinks"
        ],
        "en": [
            "Bibliography",
            "External links",
            "Further reading",
            "References",
            "See also",
            "Sources"
        ],
        "es": [
            "Enlaces externos",
            "Referencias",
            "Véase también",
            "Vínculos de interés"
        ],
        "fr": [
            "Articles connexes",
            "Bibliographie",
            "Lien externe",
            "Liens externes",
            "Notes et références",
            "Références",
            "Voir aussi"
        ],
        "ru": [
            "Библиография",
            "Литература",
            "Примечания",
            "См. также",
            "Ссылки"
        ]
    }
}

Review thread on this file:
- Does it make sense to sort sections by name?
benches/id_parsing.rs (new file, 43 lines)
@@ -0,0 +1,43 @@
#![feature(test)]
use std::{collections::HashSet, str::FromStr};

extern crate om_wikiparser;
extern crate test;

#[bench]
fn parse_wikipedia(b: &mut test::Bencher) {
    b.iter(|| {
        let title = om_wikiparser::wm::WikipediaTitleNorm::from_url(
            "https://en.wikipedia.org/wiki/Article_Title",
        )
        .unwrap();
    });
}

#[bench]
fn hash_wikipedia(b: &mut test::Bencher) {
    let title = om_wikiparser::wm::WikipediaTitleNorm::from_url(
        "https://en.wikipedia.org/wiki/Article_Title",
    )
    .unwrap();
    let mut set = HashSet::new();
    b.iter(|| {
        set.insert(&title);
    });
}

#[bench]
fn parse_wikidata(b: &mut test::Bencher) {
    b.iter(|| {
        let qid = om_wikiparser::wm::WikidataQid::from_str("Q123456789").unwrap();
    });
}

#[bench]
fn hash_wikidata(b: &mut test::Bencher) {
    let qid = om_wikiparser::wm::WikidataQid::from_str("Q123456789").unwrap();
    let mut set = HashSet::new();
    b.iter(|| {
        set.insert(&qid);
    });
}
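Note: these benchmarks use the unstable `test` crate via `#![feature(test)]`, so they only build on a nightly toolchain, e.g. with `cargo +nightly bench`.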
src/bin/simplify_html.rs (new file, 18 lines)
@@ -0,0 +1,18 @@
//! Apply html article simplification to stdin, and write it to stdout.
//!
//! Usage:
//!     simplify_html < article.html > simplified.html
use std::io::{stdin, stdout, Read, Write};

use om_wikiparser::html::simplify;

fn main() -> anyhow::Result<()> {
    let mut input = String::new();
    stdin().read_to_string(&mut input)?;

    let output = simplify(&input, "en");

    stdout().write_all(output.as_bytes())?;

    Ok(())
}
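Given the `default-run = "om-wikiparser"` setting above, the usage line corresponds to something like `cargo run --bin simplify_html < article.html > simplified.html`.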
92
src/html.rs
Normal file
|
@ -0,0 +1,92 @@
|
|||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
use std::collections::{BTreeMap, BTreeSet};
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
use once_cell::sync::Lazy;
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
use scraper::{ElementRef, Html, Selector};
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
use serde::Deserialize;
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
#[derive(Debug, Deserialize)]
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
struct Config<'a> {
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
#[serde(borrow)]
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
sections_to_remove: BTreeMap<&'a str, BTreeSet<&'a str>>,
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
}
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
static CONFIG: Lazy<Config<'static>> = Lazy::new(|| {
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
serde_json::from_str(include_str!(concat!(
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
env!("CARGO_MANIFEST_DIR"),
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
"/article_processing_config.json"
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
)))
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
        .expect("\"article_processing_config.json\" is either invalid json or the wrong structure")
});

static HEADERS: Lazy<Selector> =
    Lazy::new(|| Selector::parse("h1, h2, h3, h4, h5, h6, h7").unwrap());

pub fn simplify(html: &str, lang: &str) -> String {
    let mut document = Html::parse_document(html);

    let mut to_remove = Vec::new();

    // Remove configured sections and all trailing elements until next section.
nit: Normal sentences are more readable in many cases. Here and in other places.

```suggestion
// Remove sections.
```
    if let Some(bad_sections) = CONFIG.sections_to_remove.get(lang) {
        for header in document.select(&HEADERS) {
            // TODO: Should this join all text nodes?
What's needed to get right answers to these TODOs?

Should title be trimmed?

I need to look through a good sample of the articles and check for things like formatting in headers, the tags that are used, and which Wikimedia meta tags are responsible for a third of the document size. That's part of the next work.
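On the join-all-text-nodes question: `scraper` splits a header's text across multiple nodes whenever the title contains inline markup, so `.text().next()` only sees the first fragment. A small self-contained illustration (not from the diff):

```rust
use scraper::{Html, Selector};

fn main() {
    // A header whose title contains inline formatting.
    let fragment = Html::parse_fragment("<h2>See <i>also</i></h2>");
    let h2_selector = Selector::parse("h2").unwrap();
    let h2 = fragment.select(&h2_selector).next().unwrap();

    // text() yields one &str per descendant text node,
    // so the first item is only the leading fragment.
    assert_eq!(h2.text().next(), Some("See "));

    // Joining all nodes recovers the full title.
    let joined: String = h2.text().collect();
    assert_eq!(joined.trim(), "See also");
}
```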
            let Some(title) = header.text().next() else {
                continue
            };

            if bad_sections.contains(&title.trim()) {
                to_remove.push(header.id());
                let header_level = header.value().name();
                // Strip trailing nodes.
                for sibling in header.next_siblings() {
                    if let Some(element) = sibling.value().as_element() {
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
if element.name() == header_level {
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
// TODO: Should this check for a higher level?
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
break;
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
}
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
}
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
to_remove.push(sibling.id());
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
}
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
}
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
}
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
for id in to_remove.drain(..) {
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
if let Some(mut node) = document.tree.get_mut(id) {
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
node.detach();
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
}
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
}
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
} else {
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
warn!("No sections to remove configured for lang {lang:?}");
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
}
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
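On the `TODO` above: one possible answer, sketched here as an illustration rather than the PR's actual approach, is to compare heading levels numerically and stop at any heading of the same or shallower depth. The `heading_level` helper is hypothetical:

```rust
/// Numeric level of an `h1`..`h6` tag name, if it is one (hypothetical helper).
fn heading_level(tag: &str) -> Option<u8> {
    match tag {
        "h1" => Some(1),
        "h2" => Some(2),
        "h3" => Some(3),
        "h4" => Some(4),
        "h5" => Some(5),
        "h6" => Some(6),
        _ => None,
    }
}

// Inside the sibling loop, instead of comparing tag names for equality:
// stop when the sibling is a heading at the same level or higher.
if let (Some(sibling_level), Some(section_level)) =
    (heading_level(element.name()), heading_level(header_level))
{
    if sibling_level <= section_level {
        break;
    }
}
```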
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
// Remove elements with no text that isn't whitespace.
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
    // Queue elements that contain no non-whitespace text.
    for element in document
        .root_element()
        .descendants()
        .filter_map(ElementRef::wrap)
    {
        if element.text().all(|t| t.trim().is_empty()) {
            to_remove.push(element.id());
        }
    }

    // Detach the queued nodes from the tree.
    for id in to_remove.drain(..) {
        if let Some(mut node) = document.tree.get_mut(id) {
            node.detach();
        }
    }
Can copy-paste be avoided?

I'll be improving and refactoring this in the next PR; if this bit survives I'll move it to a function.
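Until that refactor, the duplicated drain-and-detach loops could already be pulled into a small helper. This is only a sketch, and it assumes `ego_tree` (the tree crate `scraper` is built on) is added as a direct dependency so `NodeId` can be named:

```rust
use ego_tree::NodeId;
use scraper::Html;

/// Detach every queued node from the document tree, draining the buffer so
/// the same `Vec` can be reused for the next removal pass.
fn detach_all(document: &mut Html, ids: &mut Vec<NodeId>) {
    for id in ids.drain(..) {
        if let Some(mut node) = document.tree.get_mut(id) {
            node.detach();
        }
    }
}
```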
    document.html()
}
#[cfg(test)]
mod test {
Is it hard to make a simple test for the function above?

No, I just didn't bother since I'll be changing this in the next PR. I can add some if you'd like.
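For illustration, a test in that spirit could look like the following; `simplify` is a stand-in name for the function under review (its real name and signature aren't shown in this diff), and the HTML is a made-up fixture:

```rust
#[test]
fn drops_whitespace_only_elements() {
    let input = "<html><body><p>   </p><p>Kept text</p></body></html>";
    // `simplify` is a hypothetical name for the function reviewed above.
    let output = simplify(input, "en");
    assert!(!output.contains("<p>   </p>"));
    assert!(output.contains("Kept text"));
}
```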
    use super::*;

    #[test]
    fn static_config_parses() {
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
assert!(!CONFIG.sections_to_remove.is_empty());
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
}
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
||||
}
|
||||
![]() Is there a more robust way to exclude some sections for all languages? Is there a more robust way to exclude some sections for all languages?
![]() Do you mean moving this to a configuration file, or something that works independent of the language? Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names. We could collapse it into a single set and apply it to all languages. Do you mean moving this to a configuration file, or something that works independent of the language?
Trying to parse the template text could be language independent, but I think that would be less robust than checking the header names.
We could collapse it into a single set and apply it to all languages.
![]() No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code) No need to collapse, a config looks like a better option (keep adding many other languages in mind, and potential contributors who doesn't read rust code)
![]() Do you want to load it at compile time or runtime? Do you want to load it at compile time or runtime?
![]() I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change. I added a compile-time json config in 7e6b39a, adding in a flag for loading one at runtime is straightforward if we want to do that later. Using a different config language is also a quick change.
![]()
nit: Normal sentences are more readable in many cases. Here and in other places. ```suggestion
// Remove sections.
```
nit: Normal sentences are more readable in many cases. Here and in other places.
![]() What's needed to get right answers to these TODOs? What's needed to get right answers to these TODOs?
![]() Should title be trimmed? Should title be trimmed?
![]() I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size. I need to look through a good sample of the articles and check for things like formatting in headers, tags that are used, and which wikimedia meta tags are responsible for a third of the document size.
That's part of the next work.
|
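For reference, a compile-time config like the one discussed in this thread can be embedded with `include_str!` and parsed once on first access. This is a minimal sketch assuming the crate's existing `once_cell`/`serde`/`serde_json` dependencies and the shape of `article_processing_config.json`; it is not the actual code from 7e6b39a:

```rust
use std::collections::{BTreeMap, BTreeSet};

use once_cell::sync::Lazy;
use serde::Deserialize;

#[derive(Deserialize)]
pub struct Config {
    // Language code ("de", "en", ...) -> section headers to drop.
    pub sections_to_remove: BTreeMap<String, BTreeSet<String>>,
}

// Bundled into the binary at compile time; the path is relative to this
// source file and assumes the config sits at the repository root.
pub static CONFIG: Lazy<Config> = Lazy::new(|| {
    serde_json::from_str(include_str!("../article_processing_config.json"))
        .expect("article_processing_config.json should be valid JSON")
});
```

A runtime override could then be a flag that reads a user-supplied path into the same `Config` struct, falling back to the embedded default.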
src/lib.rs (new file, 5 lines)
@ -0,0 +1,5 @@

pub mod html;
pub mod wm;

#[macro_use]
extern crate log;
src/main.rs (128 lines changed; removed and added lines interleave in this hunk)
@ -1,54 +1,118 @@

// Usage:
// pv ~/Downloads/enwiki-NS0-20230401-ENTERPRISE-HTML.json.tar.gz | tar xzO | cargo run --release > /dev/null
// # prep outputs from map generator
// cut -f 2 ~/Downloads/id_to_wikidata.csv > /tmp/wikidata_ids.txt
// tail -n +2 ~/Downloads/wiki_urls.txt | cut -f 3 > /tmp/wikipedia_urls.txt
// # feed gzipped tarfile
// pv ~/Downloads/enwiki-NS0-20230401-ENTERPRISE-HTML.json.tar.gz | tar xzO \
//   | cargo run --release -- \
//   --wikidata-ids /tmp/wikidata_ids.txt \
//   --wikipedia-urls /tmp/wikipedia_urls.txt \
//   output_dir
use std::{
    fs::{create_dir, File},
    io::{stdin, BufRead, Write},
    path::{Path, PathBuf},
};

use serde::Deserialize;
use std::io::{self, stdin, BufRead, BufReader, Write};
use anyhow::bail;
use clap::Parser;
#[macro_use]
extern crate log;

#[derive(Deserialize)]
struct Page {
    // TODO: check if CoW has a performance impact
    name: String,
    date_modified: String,
    #[serde(default)]
    url: String,
    main_entity: Option<Wikidata>,
    // TODO: see what impact parsing/unescaping/allocating this has
    article_body: ArticleBody,
    #[serde(default)]
    redirects: Vec<Redirect>,
use om_wikiparser::{
    html::simplify,
    wm::{is_wikidata_match, is_wikipedia_match, parse_wikidata_file, parse_wikipedia_file, Page},
};

#[derive(Parser)]
struct Args {
    output_dir: PathBuf,
    #[arg(long)]
    wikidata_ids: Option<PathBuf>,
    #[arg(long)]
    wikipedia_urls: Option<PathBuf>,
}

#[derive(Deserialize)]
struct Wikidata {
    identifier: String,
}
fn write(dir: impl AsRef<Path>, page: Page) -> anyhow::Result<()> {
    let Some(qid) = page.main_entity.map(|e| e.identifier) else {
        // TODO: handle and still write
        bail!("Page in list but without wikidata qid: {:?} ({})", page.name, page.url);
    };

#[derive(Deserialize)]
struct ArticleBody {
    html: String,
}
    let mut filename = dir.as_ref().to_owned();
    filename.push(qid);
    filename.push(&page.in_language.identifier);
    filename.set_extension("html");

#[derive(Deserialize)]
struct Redirect {
    url: String,
    name: String,
    debug!("{:?}: {:?}", page.name, filename);

    if filename.exists() {
        debug!("Exists, skipping");
        return Ok(());
    }

    let subfolder = filename.parent().unwrap();
    if !subfolder.exists() {
        create_dir(subfolder)?;
    }

    let html = simplify(&page.article_body.html, &page.in_language.identifier);

    let mut file = File::create(&filename)?;
    file.write_all(html.as_bytes())?;

    Ok(())
}

fn main() -> anyhow::Result<()> {
    let dump = BufReader::new(stdin());
    env_logger::Builder::new()
        .filter_level(log::LevelFilter::Info)
        .parse_default_env()
        .try_init()?;

    // TODO: compare different deserialization methods
    // docs warn against using a reader directly, and it's slower than tar can decompress the dump
    let args = Args::parse();

    info!("Loading urls");
    let wikipedia_titles = args
        .wikipedia_urls
        .map(parse_wikipedia_file)
        .transpose()?
        .unwrap_or_default();

    info!("Loading ids");
    let wikidata_ids = args
        .wikidata_ids
        .map(parse_wikidata_file)
        .transpose()?
        .unwrap_or_default();

    if !args.output_dir.is_dir() {
        bail!("output dir {:?} does not exist", args.output_dir)
    }

    info!("Processing dump");
    let dump = stdin().lock();

    // TODO: Compare different deserialization methods.
    // The docs warn against using a reader directly, and it's slower than tar can decompress the dump.
    // let stream = serde_json::Deserializer::from_reader(dump).into_iter::<Page>();
    let stream = dump.lines().map(|r| {
        r.map_err(anyhow::Error::new)
            .and_then(|s| serde_json::from_str::<Page>(&s).map_err(anyhow::Error::new))
    });

    let mut stdout = io::stdout();
    for page in stream {
        let page = page?;
        writeln!(stdout, "{}", page.name)?;

        if !(is_wikidata_match(&wikidata_ids, &page).is_some()
            || is_wikipedia_match(&wikipedia_titles, &page).is_some())
        {
            continue;
        }

        if let Err(e) = write(&args.output_dir, page) {
            error!("Error writing article: {}", e);
        }
    }

    Ok(())
}
src/wm/mod.rs (new file, 205 lines)
@ -0,0 +1,205 @@
1. Is it in English only now?
2. Does it make sense to test this function?

Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.

> 1. Is it in English only now?

Currently it will work with any language, but it only processes a single dump at a time. So when it reads an English dump, each article's JSON has an `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file. Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files. We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.

> 2. Does it make sense to test this function?

The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Do you think there should be more?

1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be parallelized in bash by launching several processes simultaneously), but it should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.

> 1. Does it imply a wrapping script that launches the app for each language/file?

I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly; it could do this in parallel and pass the results to the same worker pool (see the sketch below).

> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.

Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.

It should be ok to run tasks in parallel using a bash for loop. We'll rethink it if necessary later.
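As a rough sketch of that last option, the program could spawn the decompression command itself and stream its stdout. The `tar` invocation mirrors the usage comment in `src/main.rs`; the rest is illustrative rather than code from this PR:

```rust
use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};

fn main() -> anyhow::Result<()> {
    // Spawn the decompressor and pipe its stdout back to us, instead of
    // relying on a shell pipeline feeding stdin. The archive name is an example.
    let mut child = Command::new("tar")
        .args(["xzOf", "enwiki-NS0-20230401-ENTERPRISE-HTML.json.tar.gz"])
        .stdout(Stdio::piped())
        .spawn()?;

    let reader = BufReader::new(child.stdout.take().expect("stdout was piped"));
    for line in reader.lines() {
        // Each line is one article's JSON; this is where it would be
        // handed off to a worker pool.
        let line = line?;
        println!("{} bytes", line.len());
    }

    child.wait()?;
    Ok(())
}
```

Running one such child per language dump would let a single process fan the results out to the same worker pool.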
//! Wikimedia types
use std::{collections::HashSet, ffi::OsStr, fs, num::ParseIntError, str::FromStr};

use anyhow::{anyhow, bail, Context};

use url::Url;

mod page;
pub use page::Page;

/// Read from a file of urls on each line.
pub fn parse_wikidata_file(path: impl AsRef<OsStr>) -> anyhow::Result<HashSet<WikidataQid>> {
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
let contents = fs::read_to_string(path.as_ref())?;
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
contents
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
.lines()
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
.enumerate()
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
.map(|(i, line)| {
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
WikidataQid::from_str(line).with_context(|| {
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
let line_num = i + 1;
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
format!("bad QID value on line {line_num}: {line:?}")
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
})
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
})
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
.collect()
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
}
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// Read article titles from a file of urls on each line.
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
pub fn parse_wikipedia_file(
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
1. Is it in English only now?
2. Does it make sense to test this function?

Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.

> 1. Is it in English only now?

Currently it will work with any language, but it only processes a single dump at a time. So when it reads an English dump, each article's JSON has the `.in_language.identifier` field set to `en`, and the program writes that HTML to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.

> 2. Does it make sense to test this function?

The doctest on the type definition verifies that the two constructors parse and normalize correctly, but it doesn't check the various error cases.
Do you think there should be more?

```rust
    path: impl AsRef<OsStr>,
) -> anyhow::Result<HashSet<WikipediaTitleNorm>> {
    let contents = fs::read_to_string(path.as_ref())?;
    contents
        .lines()
        .enumerate()
        .map(|(i, line)| {
            WikipediaTitleNorm::from_url(line).with_context(|| {
                let line_num = i + 1;
                format!("bad wikipedia url on line {line_num}: {line:?}")
            })
        })
        .collect()
}
```
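The `from_url` constructor used above normalizes a Wikipedia URL into a language/title pair. A minimal, self-contained sketch of that kind of normalization; `TitleNorm` here is an illustration, not the PR's actual `WikipediaTitleNorm`, whose fields and error handling are not shown in this diff:

```rust
use url::Url;

// Illustrative stand-in for WikipediaTitleNorm.
#[derive(Debug, PartialEq, Eq, Hash)]
struct TitleNorm {
    lang: String,
    title: String,
}

impl TitleNorm {
    fn from_url(s: &str) -> Option<Self> {
        let url = Url::parse(s.trim()).ok()?;
        // "en.wikipedia.org" -> "en"
        let lang = url.host_str()?.strip_suffix(".wikipedia.org")?.to_string();
        // "/wiki/Article_Title" -> "Article_Title" (percent-decoding omitted)
        let title = url.path().strip_prefix("/wiki/")?.to_string();
        Some(TitleNorm { lang, title })
    }
}

fn main() {
    let t = TitleNorm::from_url("https://en.wikipedia.org/wiki/Oslo").unwrap();
    assert_eq!((t.lang.as_str(), t.title.as_str()), ("en", "Oslo"));
}
```

The real constructor returns a `Result` rather than an `Option`, so that `with_context` can attach the line number as in the loop above.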
```rust
pub fn is_wikidata_match(ids: &HashSet<WikidataQid>, page: &Page) -> Option<WikidataQid> {
    let Some(wikidata) = &page.main_entity else { return None; };
    let wikidata_id = &wikidata.identifier;
    let wikidata_id = match WikidataQid::from_str(wikidata_id) {
        Ok(qid) => qid,
        Err(e) => {
            warn!(
                "Could not parse QID for {:?}: {:?}: {:#}",
                page.name, wikidata_id, e
            );
            return None;
        }
    };

    ids.get(&wikidata_id).map(|_| wikidata_id)
}

pub fn is_wikipedia_match(
    titles: &HashSet<WikipediaTitleNorm>,
    page: &Page,
) -> Option<WikipediaTitleNorm> {
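    // Check the page's own normalized title against the known `titles` set.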
    match WikipediaTitleNorm::from_title(&page.name, &page.in_language.identifier) {
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
        Err(e) => warn!("Could not parse title for {:?}: {:#}", page.name, e),
        Ok(title) => {
            if titles.get(&title).is_some() {
                return Some(title);
            }
        }
    }
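
    // Otherwise, check each of the page's redirects against the same set.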
    for redirect in &page.redirects {
        match WikipediaTitleNorm::from_title(&redirect.name, &page.in_language.identifier) {
            Err(e) => warn!(
"Could not parse redirect title for {:?}: {:?}: {:#}",
                page.name, redirect.name, e
            ),
            Ok(title) => {
if titles.get(&title).is_some() {
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
return Some(title);
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
}
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
}
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
}
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
}
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
None
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
}
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// Wikidata QID/Q Number
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
///
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// See https://www.wikidata.org/wiki/Wikidata:Glossary#QID
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
///
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// ```
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// use std::str::FromStr;
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
The code under review:

/// use om_wikiparser::wm::WikidataQid;
///
/// let with_q = WikidataQid::from_str("Q12345").unwrap();
/// let without_q = WikidataQid::from_str(" 12345 ").unwrap();
/// assert_eq!(with_q, without_q);
///
/// assert!(WikidataQid::from_str("q12345").is_ok());
/// assert!(WikidataQid::from_str("https://wikidata.org/wiki/Q12345").is_err());
/// assert!(WikidataQid::from_str("Article_Title").is_err());
/// assert!(WikidataQid::from_str("Q").is_err());
/// assert!(WikidataQid::from_str("").is_err());
/// ```
#[derive(Debug, PartialOrd, Ord, PartialEq, Eq, Hash)]
pub struct WikidataQid(u32);

impl FromStr for WikidataQid {
    type Err = ParseIntError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
![]() Does it make sense to test it?
![]() Same as [above](https://github.com/organicmaps/wikiparser/pull/3#discussion_r1239061793).
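For reference, one possible `from_str` body that is consistent with the doctest above; the PR's actual implementation may differ. It trims whitespace, accepts an optional leading `Q`/`q`, and lets `u32::from_str` reject everything else.

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Accept "Q12345", "q12345", or a bare number with surrounding
        // whitespace; anything else falls through to a ParseIntError,
        // including "Q" alone and the empty string.
        let s = s.trim();
        let s = s.strip_prefix('Q').or_else(|| s.strip_prefix('q')).unwrap_or(s);
        s.parse::<u32>().map(WikidataQid)
    }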
|
||||
///
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// assert!(WikidataQid::from_str("q12345").is_ok());
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// assert!(WikidataQid::from_str("https://wikidata.org/wiki/Q12345").is_err());
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// assert!(WikidataQid::from_str("Article_Title").is_err());
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// assert!(WikidataQid::from_str("Q").is_err());
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// assert!(WikidataQid::from_str("").is_err());
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// ```
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
#[derive(Debug, PartialOrd, Ord, PartialEq, Eq, Hash)]
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
pub struct WikidataQid(u32);
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
impl FromStr for WikidataQid {
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
type Err = ParseIntError;
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
fn from_str(s: &str) -> Result<Self, Self::Err> {
    let s = s.trim();
    let s = s.strip_prefix(['Q', 'q']).unwrap_or(s);
    u32::from_str(s).map(WikidataQid)
}
}
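On the testing question above: the doctest covers the happy path, so the error cases could get ordinary unit tests. A hedged sketch (assuming `WikidataQid` derives `Debug` and `PartialEq`; these tests are not in the PR):

#[cfg(test)]
mod tests {
    use super::*;
    use std::str::FromStr;

    #[test]
    fn parses_with_and_without_prefix() {
        // "Q42", " q42 ", and plain "42" should all normalize to the same QID.
        assert_eq!(WikidataQid::from_str("Q42").unwrap(), WikidataQid::from_str(" q42 ").unwrap());
        assert_eq!(WikidataQid::from_str("42").unwrap(), WikidataQid::from_str("Q42").unwrap());
    }

    #[test]
    fn rejects_malformed_values() {
        // Error cases the doctest does not exercise.
        for bad in ["", "Q", "42Q", "wd:Q42", "Q 42"] {
            assert!(WikidataQid::from_str(bad).is_err(), "{bad:?} should not parse");
        }
    }
}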
/// Normalized wikipedia article title that can compare:
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// - titles `Spatial Database`
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// - urls `https://en.wikipedia.org/wiki/Spatial_database#Geodatabase`
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// - osm-style tags `en:Spatial Database`
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
///
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// ```
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// use om_wikiparser::wm::WikipediaTitleNorm;
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
///
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// let title = WikipediaTitleNorm::from_title("Article Title", "en").unwrap();
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
/// let url = WikipediaTitleNorm::from_url("https://en.wikipedia.org/wiki/Article_Title#Section").unwrap();
/// assert_eq!(url, title);
Does it make sense to test it?

I added some checks for whitespace, empty strings, and tests for errors in 70f7edf; is there something else you think should be handled?
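A hypothetical shape for those error-case tests; the real ones added in 70f7edf may differ, and the `from_title(title, lang)` signature is an assumption:

```rust
#[cfg(test)]
mod tests {
    use super::WikipediaTitleNorm;

    #[test]
    fn empty_title_is_rejected() {
        assert!(WikipediaTitleNorm::from_title("", "en").is_err());
    }

    #[test]
    fn whitespace_only_title_is_rejected() {
        assert!(WikipediaTitleNorm::from_title(" \t", "en").is_err());
    }

    // Mirrors the error case already covered by the doctest below.
    #[test]
    fn non_article_url_is_rejected() {
        assert!(WikipediaTitleNorm::from_url("https://en.wikipedia.org/not_a_wiki_page").is_err());
    }
}
```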
///
/// assert!(WikipediaTitleNorm::from_url("https://en.wikipedia.org/not_a_wiki_page").is_err());
/// assert!(WikipediaTitleNorm::from_url("https://wikidata.org/wiki/Q12345").is_err());
/// ```
#[derive(Debug, PartialOrd, Ord, PartialEq, Eq, Hash)]
pub struct WikipediaTitleNorm {
    lang: String,
    name: String,
}

impl WikipediaTitleNorm {
    fn normalize_title(title: &str) -> String {
        // TODO: Compare with map generator url creation, ensure covers all cases.
        title.trim().replace(' ', "_")
    }

// https://en.wikipedia.org/wiki/Article_Title
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
pub fn from_url(url: &str) -> anyhow::Result<Self> {
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
let url = Url::parse(url.trim())?;
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
let (subdomain, host) = url
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
.host_str()
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
.ok_or_else(|| anyhow!("Expected host"))?
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
.split_once('.')
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
.ok_or_else(|| anyhow!("Expected subdomain"))?;
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
if host != "wikipedia.org" {
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
bail!("Expected wikipedia.org for domain")
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
}
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
let lang = subdomain;
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
let mut paths = url
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
.path_segments()
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
.ok_or_else(|| anyhow!("Expected path"))?;

    let root = paths
        .next()
        .ok_or_else(|| anyhow!("Expected first segment in path"))?;

    // Article URLs have the form `/wiki/<Article title>`.
    if root != "wiki" {
        bail!("Expected 'wiki' in path")
    }

    // The second path segment is the percent-encoded article title.
    let title = paths
        .next()
        .ok_or_else(|| anyhow!("Expected second segment in path"))?;
    let title = urlencoding::decode(title)?;

    Self::from_title(&title, lang)
}
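
If more coverage is wanted beyond the constructors' doctest, error-case tests could look roughly like this; the `Title` type name and a string-taking `from_url` signature are assumptions for illustration, not the actual API:

```rust
#[cfg(test)]
mod url_tests {
    use super::*;

    // Assumes a `Title::from_url(&str)` constructor like the one above;
    // adjust the names if the actual type and signature differ.
    #[test]
    fn rejects_paths_outside_wiki() {
        // First segment is "w", not "wiki", so this should bail.
        assert!(Title::from_url("https://en.wikipedia.org/w/index.php?title=Foo").is_err());
    }

    #[test]
    fn rejects_missing_title_segment() {
        // No second path segment, so the title lookup should fail.
        assert!(Title::from_url("https://en.wikipedia.org/wiki").is_err());
    }
}
```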
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be parallelized in bash by launching several processes simultaneously), but it should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
> 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be parallelized in bash by launching several processes simultaneously), but it should be documented. Or is there a better approach?
I haven't tried it out yet, but a couple of options come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`; I'm not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly; it could do this in parallel and pass the results to the same worker pool (sketched below).
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]()
It should be ok to run tasks in parallel using a bash for loop. We'll rethink it if necessary later.
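To make the third option concrete, here is a minimal sketch of spawning the decompressor from Rust and streaming its stdout line by line. The `gunzip` invocation and dump file name are illustrative, and the real program would hand each line to its worker pool rather than print it:

```rust
// Sketch of option 3: spawn the decompression command ourselves and stream
// its stdout into the existing line-oriented processing loop.
// The command and file name are illustrative, not part of this PR.
use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};

fn main() -> anyhow::Result<()> {
    let mut child = Command::new("gunzip")
        .args(["-c", "enwiki-NS0-dump.json.gz"])
        .stdout(Stdio::piped())
        .spawn()?;

    let stdout = child.stdout.take().expect("stdout was piped");
    for line in BufReader::new(stdout).lines() {
        let line = line?;
        // ... hand `line` to the same worker pool that reads stdin today ...
        println!("{} bytes", line.len());
    }

    let status = child.wait()?;
    anyhow::ensure!(status.success(), "gunzip exited with {status}");
    Ok(())
}
```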
|
||||
// en:Article Title
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
fn _from_osm_tag(tag: &str) -> anyhow::Result<Self> {
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
let (lang, title) = tag
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
.trim()
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
.split_once(':')
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
.ok_or_else(|| anyhow!("Expected ':'"))?;
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
Self::from_title(title, lang)
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
}
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
pub fn from_title(title: &str, lang: &str) -> anyhow::Result<Self> {
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
let title = title.trim();
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
let lang = lang.trim();
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
if title.is_empty() {
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
bail!("title cannot be empty or whitespace");
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
}
|
||||
![]()
1. Is it in English only now?
2. Does it make sense to test this function?
![]() Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a Running the program multiple times with different language dumps will fill in the various
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases. Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.
> 1. Is it in English only now?
Currently it will work with any language, but it only processes a single dump at a time. So when it reads an english dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that html to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.
> 2. Does it make sense to test this function?
The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
![]()
1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
![]()
I haven't tried it out yet, but there are a couple of options that come to mind:
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs. > 1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be paralleled in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`, not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly, it could do this in parallel and pass the results to the same worker pool.
> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.
Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.
![]() It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later. It should be ok to run tasks in parallel using bash for loop. We'll rethink it if necessary later.
|
||||
if lang.is_empty() {
bail!("lang cannot be empty or whitespace");
        }
        let name = Self::normalize_title(title);
        let lang = lang.to_owned();
        Ok(Self { name, lang })
    }
}

1. Is it in English only now?
2. Does it make sense to test this function?

Oops, that comment slipped through when I handled multiple languages in 6e5385d. I'll remove it.

> 1. Is it in English only now?

Currently it will work with any language, but it only processes a single dump at a time. So when it reads an English dump, each article json has a `.in_language.identifier` field set to `en`, and the program writes that HTML to a `QXXXXX/en.html` file.
Running the program multiple times with different language dumps will fill in the various `QXXXXX/$lang.html` files.
We could extend it to process multiple dumps in parallel, but I don't expect there to be much of a speedup right now.

> 2. Does it make sense to test this function?

The doctest on the type definition verifies that the two constructors parse and normalize correctly, but doesn't check for various error cases.
Do you think there should be more?
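
For illustration, an error-case test might look like the sketch below. The `Title` type and the `Title::new(title, lang)` signature are assumptions inferred from the diff above (`Self::normalize_title`, `Ok(Self { name, lang })`), not the PR's actual API:

```rust
// Hypothetical sketch: `Title` and `Title::new` are assumed names, not
// taken from this PR.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn rejects_empty_lang() {
        // The diff bails when `lang` is empty; the error message also
        // mentions whitespace, so a whitespace-only case may apply too.
        assert!(Title::new("Berlin", "").is_err());
    }
}
```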

1. Does it imply a wrapping script that launches the app for each language/file? That is ok (and can be parallelized in bash by launching several processes simultaneously), but should be documented. Or is there a better approach?
2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.

> 1. Does it imply a wrapping script that launches the app for each language/file?

I haven't tried it out yet, but there are a couple of options that come to mind:
- Decompress the archives serially so they're all concatenated together into stdin. This looks possible with `gunzip`/`tar`; I'm not sure about python `pgzip`.
- As you say, run the program repeatedly through a wrapper script, using a for loop, `xargs`, `parallel`, etc.
- Pass the decompression command to the program and have it spawn the subprocess directly; it could do this in parallel and pass the results to the same worker pool (see the sketch after this thread).

> 2. It may make sense to check values that are in OSM. Users can make a lot of mistakes.

Understood. I have been using the world list of urls/ids from the map generator with no problems, but if we switch to using OSM data directly I'll rethink this. The program will log any issues it has parsing titles/QIDs.

It should be ok to run tasks in parallel using a bash for loop. We'll rethink it if necessary later.
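
A minimal sketch of that third option, assuming a hypothetical `process_dump` stands in for the existing worker-pool entry point; the function names and the `gunzip -c` invocation are illustrative, not the PR's actual interface:

```rust
use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};

// Placeholder for whatever currently consumes the dump's JSON lines.
fn process_dump(reader: impl BufRead) -> anyhow::Result<()> {
    for line in reader.lines() {
        let _article_json = line?;
        // parse and hand off to the worker pool here
    }
    Ok(())
}

fn read_compressed_dump(path: &str) -> anyhow::Result<()> {
    // Spawn the decompressor and stream its stdout instead of buffering
    // the whole archive in memory.
    let mut child = Command::new("gunzip")
        .args(["-c", path])
        .stdout(Stdio::piped())
        .spawn()?;
    let stdout = child.stdout.take().expect("stdout was piped");
    process_dump(BufReader::new(stdout))?;
    let status = child.wait()?;
    anyhow::ensure!(status.success(), "gunzip exited with {status}");
    Ok(())
}
```

Running one such decompressor per dump on its own thread, all feeding the same worker pool, would give the parallelism discussed above.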
45
src/wm/page.rs
Normal file
@ -0,0 +1,45 @@
use serde::Deserialize;

// TODO: consolidate into single struct
/// Deserialized Wikimedia Enterprise API Article
///
/// For all available fields, see <https://enterprise.wikimedia.com/docs/data-dictionary/>.
#[allow(dead_code)] // TODO: reevaluate fields
#[derive(Deserialize)]
pub struct Page {
    // TODO: Check if CoW has a performance impact.
    pub name: String,
    pub date_modified: String,
    pub in_language: Language,
    #[serde(default)]
    pub url: String,
    pub main_entity: Option<Wikidata>,
    // TODO: See what impact parsing/unescaping/allocating this has.
    pub article_body: ArticleBody,
    #[serde(default)]
    pub redirects: Vec<Redirect>,
}

#[derive(Deserialize)]
pub struct Wikidata {
    pub identifier: String,
}

#[derive(Deserialize)]
pub struct ArticleBody {
    // TODO: Look into RawValue to lazily parse/allocate this:
    // https://docs.rs/serde_json/latest/serde_json/value/struct.RawValue.html
    pub html: String,
}

#[allow(dead_code)] // TODO: Reevaluate fields.
#[derive(Deserialize)]
pub struct Redirect {
    pub url: String,
    pub name: String,
}

#[derive(Deserialize)]
pub struct Language {
    pub identifier: String,
}
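
As a hedged usage sketch, deserializing one line of a dump into `Page` and deriving the `QXXXXX/$lang.html` output path discussed above could look like this; `out_dir` and the helper names are illustrative, not part of the PR:

```rust
use std::path::{Path, PathBuf};

// Illustrative helper: skips pages without a Wikidata QID.
fn output_path(out_dir: &Path, page: &Page) -> Option<PathBuf> {
    let qid = &page.main_entity.as_ref()?.identifier; // e.g. "Q64"
    Some(out_dir.join(qid).join(format!("{}.html", page.in_language.identifier)))
}

fn handle_line(out_dir: &Path, line: &str) -> anyhow::Result<()> {
    let page: Page = serde_json::from_str(line)?;
    if let Some(path) = output_path(out_dir, &page) {
        std::fs::create_dir_all(path.parent().expect("path has a parent"))?;
        std::fs::write(&path, &page.article_body.html)?;
    }
    Ok(())
}
```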