Add script for running with map generator #21

Merged
newsch merged 4 commits from add-script into main 2023-08-07 21:05:04 +00:00
newsch commented 2023-07-13 18:08:59 +00:00 (Migrated from github.com)

Closes #17.

Remaining work:

  • document env variable configuration
  • test from scratch with new maps build
Closes #17. Remaining work: - [x] document env variable configuration - [x] test from scratch with new maps build
biodranik (Migrated from github.com) reviewed 2023-07-20 08:39:15 +00:00
@ -0,0 +1,32 @@
use std::process::Command;
/// Pass git-describe through CARGO_GIT_VERSION env variable
biodranik (Migrated from github.com) commented 2023-07-20 08:24:59 +00:00

Why is it needed?

Why is it needed?
biodranik (Migrated from github.com) commented 2023-07-20 08:28:09 +00:00
set -euxo pipefail

Checking pipe failures helps.
If -x echo doesn't hurt, then it can be always used, for better understanding what magic is going under the hood.

```suggestion set -euxo pipefail ``` Checking pipe failures helps. If -x echo doesn't hurt, then it can be always used, for better understanding what magic is going under the hood.
biodranik (Migrated from github.com) commented 2023-07-20 08:28:58 +00:00

Is it a copy-paste between scripts that can be reused? :)

Is it a copy-paste between scripts that can be reused? :)
biodranik (Migrated from github.com) commented 2023-07-20 08:30:31 +00:00

Can echo be used here?

Can echo be used here?
biodranik (Migrated from github.com) commented 2023-07-20 08:31:11 +00:00

Why is it better than echo?

P.S. This and the previous comment are also related to another script.

Why is it better than echo? P.S. This and the previous comment are also related to another script.
biodranik (Migrated from github.com) commented 2023-07-20 08:31:49 +00:00

nit: here and in other places

if [ -z "${MAPS_DIR+}" ]; then
nit: here and in other places ```suggestion if [ -z "${MAPS_DIR+}" ]; then ```
biodranik (Migrated from github.com) commented 2023-07-20 08:33:29 +00:00

Why colon is needed? What is the purpose of this line?

Why colon is needed? What is the purpose of this line?
biodranik (Migrated from github.com) commented 2023-07-20 08:35:47 +00:00

ditto: why printf is better than echo?

ditto: why printf is better than echo?
biodranik (Migrated from github.com) commented 2023-07-20 08:37:49 +00:00

It would be great to clarify why the latest map build is needed at all.

It would be great to clarify why the latest map build is needed at all.
biodranik (Migrated from github.com) commented 2023-07-20 08:39:05 +00:00

Am I correctly understanding the issue with the current approach?

  1. Generator builds maps and creates csv/txt wiki ids files.
  2. Wikiparser runs and generates articles.
  3. Generator should be run again to reuse generated articles?..
Am I correctly understanding the issue with the current approach? 1. Generator builds maps and creates csv/txt wiki ids files. 2. Wikiparser runs and generates articles. 3. Generator should be run again to reuse generated articles?..
newsch (Migrated from github.com) reviewed 2023-07-20 16:07:58 +00:00
newsch (Migrated from github.com) commented 2023-07-20 16:07:58 +00:00

pipefail is unique to bash, I wrote this as a posix sh script. Happy to switch if you'd rather use bash.

`pipefail` is unique to bash, I wrote this as a posix sh script. Happy to switch if you'd rather use bash.
newsch (Migrated from github.com) reviewed 2023-07-20 16:08:29 +00:00
newsch (Migrated from github.com) commented 2023-07-20 16:08:29 +00:00

Yes, should I put this in a third file and source it?

Yes, should I put this in a third file and `source` it?
newsch (Migrated from github.com) reviewed 2023-07-20 16:15:16 +00:00
newsch (Migrated from github.com) commented 2023-07-20 16:15:16 +00:00

I used printf because echo doesn't handle escaped characters like \t and \n in a portable way, and you can format numbers and other things nicely.

I used printf because echo doesn't handle escaped characters like `\t` and `\n` in a portable way, and you can format numbers and other things nicely.
newsch (Migrated from github.com) reviewed 2023-07-20 16:34:12 +00:00
newsch (Migrated from github.com) commented 2023-07-20 16:34:12 +00:00

It sets MAPS_BUILD_ROOT to ~/maps_build if it doesn't exist already, the colon is a builtin no-op so that the expansion is evaluated but not used.

It could be replaced with:

if [ -z "${MAPS_BUILD_ROOT+}" ]; then
    MAPS_BUILD_ROOT="$HOME/maps_build"
fi
It sets `MAPS_BUILD_ROOT` to `~/maps_build` if it doesn't exist already, the colon is a builtin no-op so that the expansion is evaluated but not used. It could be replaced with: ```sh if [ -z "${MAPS_BUILD_ROOT+}" ]; then MAPS_BUILD_ROOT="$HOME/maps_build" fi ```
newsch (Migrated from github.com) reviewed 2023-07-20 16:42:44 +00:00
newsch (Migrated from github.com) commented 2023-07-20 16:42:43 +00:00

To replace the workflow with the scraper:

  1. The generator needs to be run at least to the "Features" stage in order to generate the wiki files.
  2. Running Wikiparser is a replacement for the "DownloadDescriptions" stage
  3. The generator can now be run from the "Descriptions" stage.

So if you'd like to do it in one run, we could update the generator to call wikiparser.
Or we could tweak the generator to only output the descriptions and continue so you can run wikiparser out-of-band.

To replace the workflow with the scraper: 1. The generator needs to be run at least to the ["Features" stage](https://github.com/organicmaps/organicmaps/blob/a1b596bdc64ed5db3eabf6b1e331411aa2e4ab03/tools/python/maps_generator/generator/stages_declaration.py#L116) in order to generate the wiki files. 2. Running Wikiparser is a replacement for the ["DownloadDescriptions" stage](https://github.com/organicmaps/organicmaps/blob/a1b596bdc64ed5db3eabf6b1e331411aa2e4ab03/tools/python/maps_generator/generator/stages_declaration.py#L144) 3. The generator can now be run from the ["Descriptions" stage](https://github.com/organicmaps/organicmaps/blob/a1b596bdc64ed5db3eabf6b1e331411aa2e4ab03/tools/python/maps_generator/generator/stages_declaration.py#L292). So if you'd like to do it in one run, we could update the generator to call wikiparser. Or we could tweak the generator to only output the descriptions and continue so you can run wikiparser out-of-band.
biodranik (Migrated from github.com) reviewed 2023-07-21 05:59:08 +00:00
biodranik (Migrated from github.com) commented 2023-07-21 05:59:07 +00:00

Bash is the default shell used in many companies. It is a good practice to write bash scripts and use bash features.

#!/usr/bin/env bash

Bash is the default shell used in [many companies](https://google.github.io/styleguide/shellguide.html). It is a good practice to write bash scripts and use bash features. `#!/usr/bin/env bash`
biodranik (Migrated from github.com) reviewed 2023-07-21 06:00:13 +00:00
biodranik (Migrated from github.com) commented 2023-07-21 06:00:13 +00:00

Yes, there is nothing wrong with that approach to avoid copy-paste. Imagine you'll introduce a third script, or split your current one into parts.

Yes, there is nothing wrong with that approach to avoid copy-paste. Imagine you'll introduce a third script, or split your current one into parts.
biodranik (Migrated from github.com) reviewed 2023-07-21 06:02:25 +00:00
biodranik (Migrated from github.com) commented 2023-07-21 06:02:25 +00:00
  1. Do we really expect newlines and tabs in logs? Is it good practice?
  2. What kind of number formatting happens in this line?
1. Do we really expect newlines and tabs in logs? Is it good practice? 2. What kind of number formatting happens in this line?
biodranik (Migrated from github.com) reviewed 2023-07-21 06:04:21 +00:00
biodranik (Migrated from github.com) commented 2023-07-21 06:04:21 +00:00
  1. More readable if form is preferred to the less known colon.
  2. Will FOO="${FOO:-default value}" work here?
1. More readable if form is preferred to the less known colon. 2. Will `FOO="${FOO:-default value}"` work here?
biodranik (Migrated from github.com) reviewed 2023-07-21 06:11:31 +00:00
biodranik (Migrated from github.com) commented 2023-07-21 06:11:30 +00:00

Updating the generator to call wikiparser means that generator should wait until wikiparser finishes, right? This approach may be a good temporary start, but we aim to speed up the map generation process as much as possible. That's why the ideal solution would (likely) be to start generator and wikiparser in parallel, as soon as a new osm planet dump is available (or maybe start wikiparser before the generator). So when the generator needs descriptions they will already be available.
That's why it was important to focus on speedy articles extraction/processing from the start.

WDYT?

Updating the generator to call wikiparser means that generator should wait until wikiparser finishes, right? This approach may be a good temporary start, but we aim to speed up the map generation process as much as possible. That's why the ideal solution would (likely) be to start generator and wikiparser in parallel, as soon as a new osm planet dump is available (or maybe start wikiparser before the generator). So when the generator needs descriptions they will already be available. That's why it was important to focus on speedy articles extraction/processing from the start. WDYT?
newsch (Migrated from github.com) reviewed 2023-07-25 15:02:52 +00:00
@ -0,0 +1,32 @@
use std::process::Command;
/// Pass git-describe through CARGO_GIT_VERSION env variable
newsch (Migrated from github.com) commented 2023-07-25 15:02:51 +00:00

I've found it useful in the past to embed the git commit in the binary so that when I'm looking through logs I can tell what version was running.
I can remove it if you don't think it's useful.

I've found it useful in the past to embed the git commit in the binary so that when I'm looking through logs I can tell what version was running. I can remove it if you don't think it's useful.
newsch (Migrated from github.com) reviewed 2023-07-25 15:05:40 +00:00
newsch (Migrated from github.com) commented 2023-07-25 15:05:40 +00:00

Yes, :- should work

Yes, `:-` should work
newsch (Migrated from github.com) reviewed 2023-07-25 15:11:37 +00:00
newsch (Migrated from github.com) commented 2023-07-25 15:11:37 +00:00

Yes, it replaces the blocking scraper script in it's current form.
I thought it would be better to start with this working and then separate and speed up the process.
To separate fully from the generator we need to finish #19.

In the meantime we could also use the outputs from an old map build and run the wikiparser ahead of time/in parallel.

Yes, it replaces the blocking scraper script in it's current form. I thought it would be better to start with this working and then separate and speed up the process. To separate fully from the generator we need to finish #19. In the meantime we could also use the outputs from an old map build and run the wikiparser ahead of time/in parallel.
newsch (Migrated from github.com) reviewed 2023-07-25 15:15:30 +00:00
newsch (Migrated from github.com) commented 2023-07-25 15:15:30 +00:00
  1. I think newlines and tabs are helpful for separating long content, but I don't think it's a necessity here.
  2. On line 25 it wrapper around printf, so any call to log can use printf's formatting string

In other places I've heard printf recommended over echo, but if we're using bash explicitly then the portability concerns are not relevant.

1. I think newlines and tabs are helpful for separating long content, but I don't think it's a necessity here. 2. On line 25 it wrapper around printf, so any call to log can use printf's formatting string In other places I've heard printf recommended over echo, but if we're using bash explicitly then the portability concerns are not relevant.
newsch (Migrated from github.com) reviewed 2023-08-03 14:00:37 +00:00
newsch (Migrated from github.com) commented 2023-08-03 14:00:36 +00:00

@biodranik do you want to use gnu parallel or something else here instead of a serial for loop?

@biodranik do you want to use gnu `parallel` or something else here instead of a serial for loop?
biodranik (Migrated from github.com) reviewed 2023-08-03 16:39:35 +00:00
biodranik (Migrated from github.com) commented 2023-08-03 16:39:35 +00:00

Maybe ideal workflow would be to start wikiparser in parallel with the map generation and make generator aware of when wikiparser finishes (assuming that it finishes faster than generator requires wiki articles).

If wikiparser takes a lot of time, then it's better to run it out of band in advance, considering that its data is rarely updated anyway.

Maybe ideal workflow would be to start wikiparser in parallel with the map generation and make generator aware of when wikiparser finishes (assuming that it finishes faster than generator requires wiki articles). If wikiparser takes a lot of time, then it's better to run it out of band in advance, considering that its data is rarely updated anyway.
biodranik (Migrated from github.com) reviewed 2023-08-03 16:41:40 +00:00
biodranik (Migrated from github.com) commented 2023-08-03 16:41:40 +00:00

Won't using & and wait at the end be enough?

Won't using & and `wait` at the end be enough?
biodranik (Migrated from github.com) approved these changes 2023-08-03 16:42:05 +00:00
biodranik (Migrated from github.com) left a comment

Should I try these changes on a server first?

Should I try these changes on a server first?
newsch (Migrated from github.com) reviewed 2023-08-03 16:55:36 +00:00
newsch (Migrated from github.com) commented 2023-08-03 16:55:36 +00:00

As long as the machine it's running on has enough cores, fine for the maps server but it will bog down on my laptop. I'll use & for now.

As long as the machine it's running on has enough cores, fine for the maps server but it will bog down on my laptop. I'll use `&` for now.
biodranik (Migrated from github.com) reviewed 2023-08-03 17:34:24 +00:00
biodranik (Migrated from github.com) commented 2023-08-03 17:34:24 +00:00

You may add an option to use only one core.

You may add an option to use only one core.
Sign in to join this conversation.
No description provided.