- Write a TSV file with the line number, error, and input text.
- Include the OSM object ID if it is available in the tag file.
- Update run script to write file once before extracting.
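
A minimal sketch of what writing such an error file could look like, using only the standard library. The `ParseFailure` struct, the `write_failures` function, the header row, and the column order are assumptions made for illustration; they are not taken from the actual change.

```rust
use std::io::{self, Write};

/// Hypothetical record for one tag-file line that failed to parse.
pub struct ParseFailure<'a> {
    /// 1-based line number in the tag file.
    pub line: usize,
    /// OSM object ID, if the tag file provided one.
    pub osm_id: Option<&'a str>,
    /// Human-readable description of the error.
    pub error: &'a str,
    /// The input text that could not be parsed.
    pub input: &'a str,
}

/// Replace tabs and line breaks with spaces so a field cannot break the TSV layout.
fn sanitize(field: &str) -> String {
    field.replace(|c: char| c == '\t' || c == '\n' || c == '\r', " ")
}

/// Write a header row followed by one tab-separated row per failure.
/// A missing OSM object ID is written as an empty field.
pub fn write_failures<W: Write>(mut out: W, failures: &[ParseFailure<'_>]) -> io::Result<()> {
    writeln!(out, "line\tosm_id\terror\tinput")?;
    for f in failures {
        writeln!(
            out,
            "{}\t{}\t{}\t{}",
            f.line,
            f.osm_id.unwrap_or(""),
            sanitize(f.error),
            sanitize(f.input)
        )?;
    }
    Ok(())
}
```

Because `write_failures` accepts any `Write` implementor, the same function can target a file opened by the caller or an in-memory `Vec<u8>` in tests.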
Signed-off-by: Evan Lloyd New-Schmidt <evan@new-schmidt.com>
- Downloads latest enterprise dumps in requested languages
- Uses parallel downloading with wget2 if available
- Dumps are stored in subdirectories by date
Signed-off-by: Evan Lloyd New-Schmidt <evan@new-schmidt.com>
- Use rayon and osmpbf crates, output intermediate TSV file in the same
format as osmconvert, for use with the new `--osm-tags` flag.
- Number of threads spawned can be configured with `--procs` flag.
- Replace all Wikidata ID references with QID.
- Update script and documentation to use new subcommands.
- run.sh now expects a pbf file to extract tags from.
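
For readers unfamiliar with the term, a QID is a Wikidata item identifier such as `Q42`. The following is a hedged sketch of how one might be represented and parsed with the standard library only; the `Qid` type and its accepted input forms (optional surrounding whitespace, upper- or lowercase `Q`) are assumptions for illustration and may differ from the project's own type.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical newtype for a Wikidata QID such as `Q42`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Qid(u64);

impl FromStr for Qid {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let s = s.trim();
        // Accept an upper- or lowercase `Q` prefix (an assumption of this sketch).
        let digits = s
            .strip_prefix('Q')
            .or_else(|| s.strip_prefix('q'))
            .ok_or_else(|| format!("missing Q prefix: {s:?}"))?;
        // Reject empty input and signs, which `u64::from_str` alone would partly allow.
        if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
            return Err(format!("invalid QID number: {s:?}"));
        }
        digits
            .parse::<u64>()
            .map(Qid)
            .map_err(|e| format!("QID out of range: {e}"))
    }
}

impl fmt::Display for Qid {
    /// Format in the canonical uppercase form, e.g. `Q42`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Q{}", self.0)
    }
}
```

Storing the numeric part rather than the string keeps comparisons and hashing cheap when matching tags against a large set of IDs.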
Signed-off-by: Evan Lloyd New-Schmidt <evan@new-schmidt.com>
This allows us to extract articles whose title we know but whose QID we do not from other languages' dumps in another pass.
Signed-off-by: Evan Lloyd New-Schmidt <evan@new-schmidt.com>
Per-language section removal is configured with a static JSON file.
This includes a test to make sure the file exists and is formatted correctly.
Signed-off-by: Evan Lloyd New-Schmidt <evan@new-schmidt.com>