Download script #22

Merged
newsch merged 28 commits from download into main 2023-09-26 15:45:08 +00:00
3 changed files with 228 additions and 1 deletion


@@ -10,6 +10,42 @@ OpenStreetMap commonly stores these as [`wikipedia*=`](https://wiki.openstreetma
[`article_processing_config.json`](article_processing_config.json) should be updated when adding a new language.
It defines article sections that are not important for users and should be removed from the extracted HTML.
## Downloading Dumps
[Enterprise HTML dumps, updated twice a month, are publicly accessible](https://dumps.wikimedia.org/other/enterprise_html/). Please note that each language's dump is tens of gigabytes in size.
Wikimedia requests no more than 2 concurrent downloads, which the included [`download.sh`](./download.sh) script respects:
> If you are reading this on Wikimedia servers, please note that we have rate limited downloaders and we are capping the number of per-ip connections to 2.
> This will help to ensure that everyone can access the files with reasonable download times.
> Clients that try to evade these limits may be blocked.
> Our mirror sites do not have this cap.
See [the list of available mirrors](https://dumps.wikimedia.org/mirrors.html) for other options. Note that most of them do not include the enterprise dumps; check to see that the `other/enterprise_html/runs/` path includes subdirectories with files. The following two mirrors are known to include the enterprise html dumps as of August 2023:
- (US) https://dumps.wikimedia.your.org
- (Sweden) https://mirror.accum.se/mirror/wikimedia.org
For the wikiparser you'll want the ["NS0"](https://en.wikipedia.org/wiki/Wikipedia:Namespace) "ENTERPRISE-HTML" `.json.tar.gz` files.
They are gzipped tar files containing a single file of newline-delimited JSON matching the [Wikimedia Enterprise API schema](https://enterprise.wikimedia.com/docs/data-dictionary/).
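A quick way to peek at the first record without unpacking the whole archive (a sketch assuming GNU tar and `jq` are available; the `name` field is the article title per the Enterprise schema linked above):
```
# Stream the archive, take the first newline-delimited JSON record,
# and print the article title. tar exits early once head closes the pipe.
tar -xOzf enwiki-NS0-20230701-ENTERPRISE-HTML.json.tar.gz | head -n 1 | jq .name
```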
The included [`download.sh`](./download.sh) script handles downloading the latest set of dumps in specific languages.
It maintains a directory with the following layout:
```
<DUMP_DIR>/
├── latest -> 20230701/
├── 20230701/
│ ├── dewiki-NS0-20230701-ENTERPRISE-HTML.json.tar.gz
│ ├── enwiki-NS0-20230701-ENTERPRISE-HTML.json.tar.gz
│ ├── eswiki-NS0-20230701-ENTERPRISE-HTML.json.tar.gz
│ ...
├── 20230620/
│ ├── dewiki-NS0-20230620-ENTERPRISE-HTML.json.tar.gz
│ ├── enwiki-NS0-20230620-ENTERPRISE-HTML.json.tar.gz
│ ├── eswiki-NS0-20230620-ENTERPRISE-HTML.json.tar.gz
│ ...
...
```
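A hypothetical invocation that would produce a layout like the one above (flag meanings are inferred from the script's usage text and the review discussion; adjust languages and the directory to taste):
```
# Download the latest German, English, and Spanish dumps into ./dumps/,
# capping concurrent downloads (-c) and deleting older dump
# subdirectories once the newest set is complete (-D).
LANGUAGES="de en es" ./download.sh -D -c 2 ./dumps/
```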
## Usage
To use with the map generator, see the [`run.sh` script](run.sh) and its own help documentation.

download.sh (Executable file, 191 additions)

@@ -0,0 +1,191 @@
biodranik commented 2023-07-20 06:03:51 +00:00 (Migrated from github.com)

In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).

biodranik commented 2023-07-20 06:04:33 +00:00 (Migrated from github.com)

`set -euxo pipefail` is helpful if you decide to use pipes in the script.

biodranik commented 2023-07-20 06:05:55 +00:00 (Migrated from github.com)

nit: fewer lines of code are easier to read.

```suggestion
if [ -z "${LANGUAGES+}" ]; then
```

biodranik commented 2023-07-20 06:06:30 +00:00 (Migrated from github.com)

nit: (here and below)

```suggestion
for lang in $LANGUAGES; do
```

biodranik commented 2023-07-20 06:09:11 +00:00 (Migrated from github.com)

TMPDIR?

biodranik commented 2023-07-20 08:15:10 +00:00 (Migrated from github.com)

get_wiki_dump.sh: line 11: 1: unbound variable

biodranik commented 2023-08-16 22:18:52 +00:00 (Migrated from github.com)

Do you really need to store runs.html on disk and then clean it up?

newsch commented 2023-08-16 23:38:52 +00:00 (Migrated from github.com)

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.

newsch commented 2023-08-17 15:02:10 +00:00 (Migrated from github.com)

Do you want the script to handle this?

If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump while wikiparser is using it?

biodranik commented 2023-08-17 23:46:19 +00:00 (Migrated from github.com)

1. Aren't files that were open before their deletion on Linux still accessible?
2. Dumps are produced regularly, right? We can set a specific schedule.
3. Script may have an option to automatically delete older dumps.

newsch commented 2023-08-18 17:06:30 +00:00 (Migrated from github.com)

> 1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files.

> 2. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like.

> 3. Script may have an option to automatically delete older dumps.

👍

newsch commented 2023-08-18 18:28:28 +00:00 (Migrated from github.com)

I've added a new option:

```
-D      Delete all old dump subdirectories if the latest is downloaded
```

biodranik commented 2023-08-21 20:53:18 +00:00 (Migrated from github.com)

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.

newsch commented 2023-08-21 21:09:41 +00:00 (Migrated from github.com)

Correct, I'll clarify that.
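To make the scheduling discussion above concrete, a hypothetical cron entry could look like the following (path, languages, and dump directory are placeholders; the 4th and 23rd are chosen to fall a few days after the runs that start on the 1st and 20th):

```
# m h dom mon dow  command
0 3 4,23 * * LANGUAGES="de en es" /path/to/wikiparser/download.sh -D /srv/wikipedia_dumps
```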
#! /usr/bin/env bash
USAGE="Usage: ./download.sh [-hD] [-c <NUM>] <DUMP_DIR>
Download the latest Wikipedia Enterprise HTML dumps.
Arguments:
<DUMP_DIR> An existing directory to store dumps in. Dumps will be grouped
into subdirectories by date, and a link 'latest' will point to
biodranik commented 2023-08-21 20:54:51 +00:00 (Migrated from github.com)

Will the wikiparser generator properly find/load newer versions from the latest dir without specifying explicit file names?

newsch commented 2023-08-21 21:08:15 +00:00 (Migrated from github.com)

For the `run.sh` script, you'll provide a glob of the latest directory:

```
./run.sh descriptions/ planet.osm.pbf $DUMP_DIR/latest/*
```

It doesn't have any special handling for the `$DUMP_DIR` layout.
the latest complete dump subdirectory, if it exists.
biodranik commented 2023-07-20 06:03:51 +00:00 (Migrated from github.com)
Review

In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).

In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).
biodranik commented 2023-07-20 06:04:33 +00:00 (Migrated from github.com)
Review

set -euxo pipefail is helpful if decide to use pipes in the script.

`set -euxo pipefail` is helpful if decide to use pipes in the script.
biodranik commented 2023-07-20 06:05:55 +00:00 (Migrated from github.com)
Review

nit: fewer lines of code are easier to read.

if [ -z "${LANGUAGES+}" ]; then
nit: fewer lines of code are easier to read. ```suggestion if [ -z "${LANGUAGES+}" ]; then ```
biodranik commented 2023-07-20 06:06:30 +00:00 (Migrated from github.com)
Review

nit: (here and below)

for lang in $LANGUAGES; do
nit: (here and below) ```suggestion for lang in $LANGUAGES; do ```
biodranik commented 2023-07-20 06:09:11 +00:00 (Migrated from github.com)
Review

TMPDIR?

TMPDIR?
biodranik commented 2023-07-20 08:15:10 +00:00 (Migrated from github.com)
Review

get_wiki_dump.sh: line 11: 1: unbound variable

get_wiki_dump.sh: line 11: 1: unbound variable
biodranik commented 2023-08-16 22:18:52 +00:00 (Migrated from github.com)
Review

Do you really need to store runs.html on disk and then clean it up?

Do you really need to store runs.html on disk and then clean it up?
newsch commented 2023-08-16 23:38:52 +00:00 (Migrated from github.com)
Review

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.
newsch commented 2023-08-17 15:02:10 +00:00 (Migrated from github.com)
Review

Do you want the script to handle this?

If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump as wikiparser is using it?

Do you want the script to handle this? If it will be running on a cron job, then it might be good to keep 2 copies around. Otherwise the script could delete the last dump as wikiparser is using it?
biodranik commented 2023-08-17 23:46:19 +00:00 (Migrated from github.com)
Review
  1. Aren't files that were open before their deletion on Linux still accessible?
  2. Dumps are produced regularly, right? We can set a specific schedule.
  3. Script may have an option to automatically delete older dumps.
1. Aren't files that were open before their deletion on Linux still accessible? 2. Dumps are produced regularly, right? We can set a specific schedule. 3. Script may have an option to automatically delete older dumps.
newsch commented 2023-08-18 17:06:30 +00:00 (Migrated from github.com)
Review
  1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as run.sh is started before download.sh deletes them, it will be able to access the files.

  1. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like.

  1. Script may have an option to automatically delete older dumps.

👍

> 1. Aren't files that were open before their deletion on Linux still accessible? You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files. > 2. Dumps are produced regularly, right? We can set a specific schedule. Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like. > 3. Script may have an option to automatically delete older dumps. :+1:
newsch commented 2023-08-18 18:28:28 +00:00 (Migrated from github.com)
Review

I've added a new option:

-D      Delete all old dump subdirectories if the latest is downloaded
I've added a new option: ``` -D Delete all old dump subdirectories if the latest is downloaded ```
biodranik commented 2023-08-21 20:53:18 +00:00 (Migrated from github.com)
Review

-c 1, -c 2 and no option behave in the same way with wget2 installed.

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.
newsch commented 2023-08-21 21:09:41 +00:00 (Migrated from github.com)
Review

Correct, I'll clarify that.

Correct, I'll clarify that.
biodranik commented 2023-07-20 06:03:51 +00:00 (Migrated from github.com)
Review

In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).

In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).
biodranik commented 2023-07-20 06:04:33 +00:00 (Migrated from github.com)
Review

set -euxo pipefail is helpful if decide to use pipes in the script.

`set -euxo pipefail` is helpful if decide to use pipes in the script.
biodranik commented 2023-07-20 06:05:55 +00:00 (Migrated from github.com)
Review

nit: fewer lines of code are easier to read.

if [ -z "${LANGUAGES+}" ]; then
nit: fewer lines of code are easier to read. ```suggestion if [ -z "${LANGUAGES+}" ]; then ```
biodranik commented 2023-07-20 06:06:30 +00:00 (Migrated from github.com)
Review

nit: (here and below)

for lang in $LANGUAGES; do
nit: (here and below) ```suggestion for lang in $LANGUAGES; do ```
biodranik commented 2023-07-20 06:09:11 +00:00 (Migrated from github.com)
Review

TMPDIR?

TMPDIR?
biodranik commented 2023-07-20 08:15:10 +00:00 (Migrated from github.com)
Review

get_wiki_dump.sh: line 11: 1: unbound variable

get_wiki_dump.sh: line 11: 1: unbound variable
biodranik commented 2023-08-16 22:18:52 +00:00 (Migrated from github.com)
Review

Do you really need to store runs.html on disk and then clean it up?

Do you really need to store runs.html on disk and then clean it up?
newsch commented 2023-08-16 23:38:52 +00:00 (Migrated from github.com)
Review

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.
newsch commented 2023-08-17 15:02:10 +00:00 (Migrated from github.com)
Review

Do you want the script to handle this?

If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump as wikiparser is using it?

Do you want the script to handle this? If it will be running on a cron job, then it might be good to keep 2 copies around. Otherwise the script could delete the last dump as wikiparser is using it?
biodranik commented 2023-08-17 23:46:19 +00:00 (Migrated from github.com)
Review
  1. Aren't files that were open before their deletion on Linux still accessible?
  2. Dumps are produced regularly, right? We can set a specific schedule.
  3. Script may have an option to automatically delete older dumps.
1. Aren't files that were open before their deletion on Linux still accessible? 2. Dumps are produced regularly, right? We can set a specific schedule. 3. Script may have an option to automatically delete older dumps.
newsch commented 2023-08-18 17:06:30 +00:00 (Migrated from github.com)
Review
  1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as run.sh is started before download.sh deletes them, it will be able to access the files.

  1. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like.

  1. Script may have an option to automatically delete older dumps.

👍

> 1. Aren't files that were open before their deletion on Linux still accessible? You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files. > 2. Dumps are produced regularly, right? We can set a specific schedule. Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like. > 3. Script may have an option to automatically delete older dumps. :+1:
newsch commented 2023-08-18 18:28:28 +00:00 (Migrated from github.com)
Review

I've added a new option:

-D      Delete all old dump subdirectories if the latest is downloaded
I've added a new option: ``` -D Delete all old dump subdirectories if the latest is downloaded ```
biodranik commented 2023-08-21 20:53:18 +00:00 (Migrated from github.com)
Review

-c 1, -c 2 and no option behave in the same way with wget2 installed.

newsch commented 2023-08-21 21:09:41 +00:00 (Migrated from github.com)
Review

Correct, I'll clarify that.

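Without wget2, one way to respect the two-connection cap is to fan the per-language downloads out with at most two parallel jobs; a rough sketch, with the run date and language list as assumptions:

```
#!/usr/bin/env bash
set -euo pipefail

RUN=20230701
BASE="https://dumps.wikimedia.org/other/enterprise_html/runs/$RUN"
LANGUAGES="en de es"

# xargs -P 2 keeps at most two wget processes running at once,
# matching Wikimedia's per-IP connection limit.
printf '%s\n' $LANGUAGES | xargs -P 2 -I{} \
    wget --continue "$BASE/{}wiki-NS0-$RUN-ENTERPRISE-HTML.json.tar.gz"
```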

```
Options:
  -h      Print this help screen.
  -D      Delete all old dump subdirectories if the latest is downloaded.
  -c <NUM> Number of concurrent downloads to allow. Ignored if wget2 is not
           present or MIRROR is not set. Defaults to 2.

Environment Variables:
  LANGUAGES  A whitespace-separated list of wikipedia language codes to
             download dumps of.
             Defaults to the languages in 'article_processing_config.json'.
             See <https://meta.wikimedia.org/wiki/List_of_Wikipedias>.
  MIRROR     A wikimedia dump mirror to use instead of the main wikimedia
             server. See <https://dumps.wikimedia.org/mirrors.html> for a
list of available mirrors, note that many do not include the
required Enterprise HTML dumps.
For example: MIRROR=https://mirror.accum.se/mirror/wikimedia.org
Exit codes:
0 The latest dumps are already present or were downloaded successfully.
1 Argument error.
16 Some of the languages were not available to download. The latest dump may
be in progress, some of the specified languages may not exist, or the
chosen mirror may not host the files.
_ Subprocess error.
"
biodranik commented 2023-07-20 06:03:51 +00:00 (Migrated from github.com)
Review

In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).

In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).
biodranik commented 2023-07-20 06:04:33 +00:00 (Migrated from github.com)
Review

set -euxo pipefail is helpful if decide to use pipes in the script.

`set -euxo pipefail` is helpful if decide to use pipes in the script.
biodranik commented 2023-07-20 06:05:55 +00:00 (Migrated from github.com)
Review

nit: fewer lines of code are easier to read.

if [ -z "${LANGUAGES+}" ]; then
nit: fewer lines of code are easier to read. ```suggestion if [ -z "${LANGUAGES+}" ]; then ```
biodranik commented 2023-07-20 06:06:30 +00:00 (Migrated from github.com)
Review

nit: (here and below)

for lang in $LANGUAGES; do
nit: (here and below) ```suggestion for lang in $LANGUAGES; do ```
biodranik commented 2023-07-20 06:09:11 +00:00 (Migrated from github.com)
Review

TMPDIR?

TMPDIR?
biodranik commented 2023-07-20 08:15:10 +00:00 (Migrated from github.com)
Review

get_wiki_dump.sh: line 11: 1: unbound variable

get_wiki_dump.sh: line 11: 1: unbound variable
biodranik commented 2023-08-16 22:18:52 +00:00 (Migrated from github.com)
Review

Do you really need to store runs.html on disk and then clean it up?

Do you really need to store runs.html on disk and then clean it up?
newsch commented 2023-08-16 23:38:52 +00:00 (Migrated from github.com)
Review

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.
newsch commented 2023-08-17 15:02:10 +00:00 (Migrated from github.com)
Review

Do you want the script to handle this?

If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump as wikiparser is using it?

Do you want the script to handle this? If it will be running on a cron job, then it might be good to keep 2 copies around. Otherwise the script could delete the last dump as wikiparser is using it?
biodranik commented 2023-08-17 23:46:19 +00:00 (Migrated from github.com)
Review
  1. Aren't files that were open before their deletion on Linux still accessible?
  2. Dumps are produced regularly, right? We can set a specific schedule.
  3. Script may have an option to automatically delete older dumps.
1. Aren't files that were open before their deletion on Linux still accessible? 2. Dumps are produced regularly, right? We can set a specific schedule. 3. Script may have an option to automatically delete older dumps.
newsch commented 2023-08-18 17:06:30 +00:00 (Migrated from github.com)
Review
  1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as run.sh is started before download.sh deletes them, it will be able to access the files.

  1. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like.

  1. Script may have an option to automatically delete older dumps.

👍

> 1. Aren't files that were open before their deletion on Linux still accessible? You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files. > 2. Dumps are produced regularly, right? We can set a specific schedule. Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like. > 3. Script may have an option to automatically delete older dumps. :+1:
newsch commented 2023-08-18 18:28:28 +00:00 (Migrated from github.com)
Review

I've added a new option:

-D      Delete all old dump subdirectories if the latest is downloaded
I've added a new option: ``` -D Delete all old dump subdirectories if the latest is downloaded ```
biodranik commented 2023-08-21 20:53:18 +00:00 (Migrated from github.com)
Review

-c 1, -c 2 and no option behave in the same way with wget2 installed.

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.
newsch commented 2023-08-21 21:09:41 +00:00 (Migrated from github.com)
Review

Correct, I'll clarify that.

Correct, I'll clarify that.
biodranik commented 2023-07-20 06:03:51 +00:00 (Migrated from github.com)
Review

In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).

In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).
biodranik commented 2023-07-20 06:04:33 +00:00 (Migrated from github.com)
Review

set -euxo pipefail is helpful if decide to use pipes in the script.

`set -euxo pipefail` is helpful if decide to use pipes in the script.
biodranik commented 2023-07-20 06:05:55 +00:00 (Migrated from github.com)
Review

nit: fewer lines of code are easier to read.

if [ -z "${LANGUAGES+}" ]; then
nit: fewer lines of code are easier to read. ```suggestion if [ -z "${LANGUAGES+}" ]; then ```
biodranik commented 2023-07-20 06:06:30 +00:00 (Migrated from github.com)
Review

nit: (here and below)

    for lang in $LANGUAGES; do

biodranik commented 2023-07-20 06:09:11 +00:00 (Migrated from github.com)
Review

TMPDIR?

biodranik commented 2023-07-20 08:15:10 +00:00 (Migrated from github.com)
Review

get_wiki_dump.sh: line 11: 1: unbound variable

biodranik commented 2023-08-16 22:18:52 +00:00 (Migrated from github.com)
Review

Do you really need to store runs.html on disk and then clean it up?

newsch commented 2023-08-16 23:38:52 +00:00 (Migrated from github.com)
Review

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.
newsch commented 2023-08-17 15:02:10 +00:00 (Migrated from github.com)
Review

Do you want the script to handle this?

If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump while wikiparser is still using it?

biodranik commented 2023-08-17 23:46:19 +00:00 (Migrated from github.com)
Review

  1. Aren't files that were open before their deletion on Linux still accessible?
  2. Dumps are produced regularly, right? We can set a specific schedule.
  3. The script may have an option to automatically delete older dumps.

newsch commented 2023-08-18 17:06:30 +00:00 (Migrated from github.com)
Review

> 1. Aren't files that were open before their deletion on Linux still accessible?

You're right: as long as `run.sh` is started before `download.sh` deletes them, it will still be able to read the files.

> 2. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and it looks like they finish within 3 days.

> 3. The script may have an option to automatically delete older dumps.

👍
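(Aside: the behavior in point 1 is easy to confirm from a shell. The sketch below is illustrative only and not part of `download.sh`; the path is made up.)

```bash
# A file that is already open stays readable after it is deleted; the space is
# only reclaimed once the last descriptor is closed.
echo "dump contents" > /tmp/demo-dump.json
exec 3< /tmp/demo-dump.json   # open the file for reading on fd 3
rm /tmp/demo-dump.json        # unlink the name, as a cleanup step would
cat <&3                       # still prints "dump contents" via the open descriptor
exec 3<&-                     # close fd 3; only now is the space freed
```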
newsch commented 2023-08-18 18:28:28 +00:00 (Migrated from github.com)
Review

I've added a new option:

    -D      Delete all old dump subdirectories if the latest is downloaded
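(A rough sketch of the kind of cleanup `-D` describes, assuming the dated-subdirectory layout with a `latest` symlink; this is an illustration, not the script's actual code.)

```bash
#!/usr/bin/env bash
# Illustration of "-D"-style cleanup: remove dated dump directories other than
# the one the "latest" symlink points at. The DUMP_DIR layout is assumed.
set -euo pipefail
DUMP_DIR=${1:?usage: $0 DUMP_DIR}
latest=$(basename "$(readlink "$DUMP_DIR/latest")")
for dir in "$DUMP_DIR"/2*/; do
    [ -d "$dir" ] || continue
    name=$(basename "$dir")
    if [ "$name" != "$latest" ]; then
        echo "Deleting old dump $dir"
        rm -r "$dir"
    fi
done
```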
biodranik commented 2023-08-21 20:53:18 +00:00 (Migrated from github.com)
Review

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.

newsch commented 2023-08-21 21:09:41 +00:00 (Migrated from github.com)
Review

Correct, I'll clarify that.
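(For reference, one generic way to stay at no more than two simultaneous downloads, whether plain wget or wget2 is installed, is to fan the URL list out over a fixed number of worker processes. This is only an illustration, not necessarily how `download.sh` implements its `-c` option; `urls.txt` and `DUMP_DIR` are placeholders.)

```bash
# Illustration only: run at most 2 wget processes at a time over a list of dump URLs.
xargs --max-args=1 --max-procs=2 \
    wget --continue --directory-prefix="$DUMP_DIR" < urls.txt
```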
set -euo pipefail
# set -x
build_user_agent() {
  # While the dump websites are not part of the API, it's still polite to identify yourself.
  # See https://meta.wikimedia.org/wiki/User-Agent_policy
  subcommand=$1
name="OrganicMapsWikiparserDownloaderBot"
version="1.0"
url="https://github.com/organicmaps/wikiparser"
email="hello@organicmaps.app"
echo -n "$name/$version ($url; $email) $subcommand"
biodranik commented 2023-07-20 06:03:51 +00:00 (Migrated from github.com)
Review

In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).

In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).
biodranik commented 2023-07-20 06:04:33 +00:00 (Migrated from github.com)
Review

set -euxo pipefail is helpful if decide to use pipes in the script.

`set -euxo pipefail` is helpful if decide to use pipes in the script.
biodranik commented 2023-07-20 06:05:55 +00:00 (Migrated from github.com)
Review

nit: fewer lines of code are easier to read.

if [ -z "${LANGUAGES+}" ]; then
nit: fewer lines of code are easier to read. ```suggestion if [ -z "${LANGUAGES+}" ]; then ```
biodranik commented 2023-07-20 06:06:30 +00:00 (Migrated from github.com)
Review

nit: (here and below)

for lang in $LANGUAGES; do
nit: (here and below) ```suggestion for lang in $LANGUAGES; do ```
biodranik commented 2023-07-20 06:09:11 +00:00 (Migrated from github.com)
Review

TMPDIR?

TMPDIR?
biodranik commented 2023-07-20 08:15:10 +00:00 (Migrated from github.com)
Review

get_wiki_dump.sh: line 11: 1: unbound variable

get_wiki_dump.sh: line 11: 1: unbound variable
biodranik commented 2023-08-16 22:18:52 +00:00 (Migrated from github.com)
Review

Do you really need to store runs.html on disk and then clean it up?

Do you really need to store runs.html on disk and then clean it up?
newsch commented 2023-08-16 23:38:52 +00:00 (Migrated from github.com)
Review

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.
newsch commented 2023-08-17 15:02:10 +00:00 (Migrated from github.com)
Review

Do you want the script to handle this?

If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump as wikiparser is using it?

Do you want the script to handle this? If it will be running on a cron job, then it might be good to keep 2 copies around. Otherwise the script could delete the last dump as wikiparser is using it?
biodranik commented 2023-08-17 23:46:19 +00:00 (Migrated from github.com)
Review
  1. Aren't files that were open before their deletion on Linux still accessible?
  2. Dumps are produced regularly, right? We can set a specific schedule.
  3. Script may have an option to automatically delete older dumps.
1. Aren't files that were open before their deletion on Linux still accessible? 2. Dumps are produced regularly, right? We can set a specific schedule. 3. Script may have an option to automatically delete older dumps.
newsch commented 2023-08-18 17:06:30 +00:00 (Migrated from github.com)
Review
  1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as run.sh is started before download.sh deletes them, it will be able to access the files.

  1. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like.

  1. Script may have an option to automatically delete older dumps.

👍

> 1. Aren't files that were open before their deletion on Linux still accessible? You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files. > 2. Dumps are produced regularly, right? We can set a specific schedule. Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like. > 3. Script may have an option to automatically delete older dumps. :+1:
newsch commented 2023-08-18 18:28:28 +00:00 (Migrated from github.com)
Review

I've added a new option:

-D      Delete all old dump subdirectories if the latest is downloaded
I've added a new option: ``` -D Delete all old dump subdirectories if the latest is downloaded ```
biodranik commented 2023-08-21 20:53:18 +00:00 (Migrated from github.com)
Review

-c 1, -c 2 and no option behave in the same way with wget2 installed.

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.
newsch commented 2023-08-21 21:09:41 +00:00 (Migrated from github.com)
Review

Correct, I'll clarify that.

}
# Parse options.
DELETE_OLD_DUMPS=false
CONCURRENT_DOWNLOADS=
while getopts "hDc:" opt
do
case $opt in
h) echo -n "$USAGE"; exit 0;;
D) DELETE_OLD_DUMPS=true;;
c) CONCURRENT_DOWNLOADS=$OPTARG;;
?) echo "$USAGE" | head -n1 >&2; exit 1;;
esac
biodranik commented 2023-07-20 06:03:51 +00:00 (Migrated from github.com)
Review

In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).

In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).
biodranik commented 2023-07-20 06:04:33 +00:00 (Migrated from github.com)
Review

set -euxo pipefail is helpful if decide to use pipes in the script.

`set -euxo pipefail` is helpful if decide to use pipes in the script.
biodranik commented 2023-07-20 06:05:55 +00:00 (Migrated from github.com)
Review

nit: fewer lines of code are easier to read.

if [ -z "${LANGUAGES+}" ]; then
nit: fewer lines of code are easier to read. ```suggestion if [ -z "${LANGUAGES+}" ]; then ```
biodranik commented 2023-07-20 06:06:30 +00:00 (Migrated from github.com)
Review

nit: (here and below)

for lang in $LANGUAGES; do
nit: (here and below) ```suggestion for lang in $LANGUAGES; do ```
biodranik commented 2023-07-20 06:09:11 +00:00 (Migrated from github.com)
Review

TMPDIR?

TMPDIR?
biodranik commented 2023-07-20 08:15:10 +00:00 (Migrated from github.com)
Review

get_wiki_dump.sh: line 11: 1: unbound variable

get_wiki_dump.sh: line 11: 1: unbound variable
biodranik commented 2023-08-16 22:18:52 +00:00 (Migrated from github.com)
Review

Do you really need to store runs.html on disk and then clean it up?

Do you really need to store runs.html on disk and then clean it up?
newsch commented 2023-08-16 23:38:52 +00:00 (Migrated from github.com)
Review

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.
newsch commented 2023-08-17 15:02:10 +00:00 (Migrated from github.com)
Review

Do you want the script to handle this?

If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump as wikiparser is using it?

Do you want the script to handle this? If it will be running on a cron job, then it might be good to keep 2 copies around. Otherwise the script could delete the last dump as wikiparser is using it?
biodranik commented 2023-08-17 23:46:19 +00:00 (Migrated from github.com)
Review
  1. Aren't files that were open before their deletion on Linux still accessible?
  2. Dumps are produced regularly, right? We can set a specific schedule.
  3. Script may have an option to automatically delete older dumps.
1. Aren't files that were open before their deletion on Linux still accessible? 2. Dumps are produced regularly, right? We can set a specific schedule. 3. Script may have an option to automatically delete older dumps.
newsch commented 2023-08-18 17:06:30 +00:00 (Migrated from github.com)
Review

> 1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files.

> 2. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and it looks like they're finished within 3 days.

> 3. Script may have an option to automatically delete older dumps.

👍
newsch commented 2023-08-18 18:28:28 +00:00 (Migrated from github.com)
Review

I've added a new option:

```
-D      Delete all old dump subdirectories if the latest is downloaded
```
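As a rough illustration of that cleanup, assuming the dated-subdirectory layout the script maintains with a `latest` symlink (this is a sketch, not the script's exact implementation):

```bash
# Sketch only: once the latest run is fully downloaded, remove every other
# dated subdirectory under DUMP_DIR, keeping the one `latest` points at.
LATEST_DIR=$(readlink -f "$DUMP_DIR/latest")
for dir in "$DUMP_DIR"/*/; do
    dir=$(readlink -f "$dir")
    [ -d "$dir" ] || continue
    if [ "$dir" != "$LATEST_DIR" ]; then
        echo "Removing old dump directory: $dir"
        rm -r "$dir"
    fi
done
```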
biodranik commented 2023-08-21 20:53:18 +00:00 (Migrated from github.com)
Review

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.
newsch commented 2023-08-21 21:09:41 +00:00 (Migrated from github.com)
Review

Correct, I'll clarify that.
done
shift $((OPTIND - 1))
if [ -z "${1:-}" ]; then
echo "DUMP_DIR is required" >&2
echo -n "$USAGE" >&2
exit 1
fi
# The parent directory to store groups of dumps in.
DUMP_DIR=$(readlink -f "$1")
shift
if [ -n "${1:-}" ]; then
echo "Unexpected extra argument: '$1'" >&2
echo "$USAGE" | head -n1 >&2
exit 1
fi
biodranik commented 2023-08-29 21:54:31 +00:00 (Migrated from github.com)
Review

Can spaces be added here?

newsch commented 2023-09-01 16:02:39 +00:00 (Migrated from github.com)
Review

I haven't seen an example with spaces in the name. All of the browser user agents use CamelCase instead of spaces.

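For context, Wikimedia asks for a descriptive User-Agent with contact information. A hypothetical way to pass one to wget is shown below; the exact string used by `download.sh` is not reproduced here, but the CamelCase convention is the point.

```
# Hypothetical User-Agent: CamelCase tool name plus contact details, as
# Wikimedia's policy requests. The real value in download.sh may differ.
wget --user-agent="WikiparserDownloadScript/1.0 (https://example.org/wikiparser; ops@example.org)" \
    --continue "$url"
```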
if [ ! -d "$DUMP_DIR" ]; then
echo "DUMP_DIR '$DUMP_DIR' does not exist" >&2
exit 1
fi
if [ -n "$CONCURRENT_DOWNLOADS" ]; then
if [ ! "$CONCURRENT_DOWNLOADS" -ge 1 ]; then
echo "Number of concurrent downloads (-n) must be >= 1" >&2
echo "$USAGE" | head -n1 >&2
exit 1
fi
if [ -z "${MIRROR:-}" ]; then
# NOTE: Wikimedia requests no more than 2 concurrent downloads.
# See https://dumps.wikimedia.org/ for more info.
echo "WARN: MIRROR is not set; ignoring -n" >&2
CONCURRENT_DOWNLOADS=
fi
fi
biodranik commented 2023-07-20 06:03:51 +00:00 (Migrated from github.com)
Review

In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).

Review

nit: (here and below)

```suggestion
for lang in $LANGUAGES; do
```

biodranik commented 2023-07-20 06:09:11 +00:00 (Migrated from github.com)
Review

TMPDIR?

biodranik commented 2023-07-20 08:15:10 +00:00 (Migrated from github.com)
Review

get_wiki_dump.sh: line 11: 1: unbound variable

biodranik commented 2023-08-16 22:18:52 +00:00 (Migrated from github.com)
Review

Do you really need to store runs.html on disk and then clean it up?

newsch commented 2023-08-16 23:38:52 +00:00 (Migrated from github.com)
Review

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.

newsch commented 2023-08-17 15:02:10 +00:00 (Migrated from github.com)
Review

Do you want the script to handle this?

If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump as wikiparser is using it?

biodranik commented 2023-08-17 23:46:19 +00:00 (Migrated from github.com)
Review

1. Aren't files that were open before their deletion on Linux still accessible?
2. Dumps are produced regularly, right? We can set a specific schedule.
3. Script may have an option to automatically delete older dumps.

newsch commented 2023-08-18 17:06:30 +00:00 (Migrated from github.com)
Review

> 1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files.

> 2. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and it looks like they finish within 3 days.

> 3. Script may have an option to automatically delete older dumps.

👍

newsch commented 2023-08-18 18:28:28 +00:00 (Migrated from github.com)
Review

I've added a new option:

```
-D      Delete all old dump subdirectories if the latest is downloaded
```

biodranik commented 2023-08-21 20:53:18 +00:00 (Migrated from github.com)
Review

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.

newsch commented 2023-08-21 21:09:41 +00:00 (Migrated from github.com)
Review

Correct, I'll clarify that.
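Pulling the thread above together, here is a minimal sketch (not the actual download.sh) of what the discussed behavior could look like: parsing the runs index from a pipe instead of a saved runs.html, exiting gracefully when the latest dump is already downloaded, and a `-D`-style cleanup of older dump subdirectories. The names `DUMP_DIR` and `DELETE_OLD_DUMPS` and the grep-based date parsing are illustrative assumptions, not the script's real interface.

```
#!/usr/bin/env bash
# Sketch only, not the real download.sh.
set -euxo pipefail   # as suggested above; -x traces commands, pipefail catches pipeline errors

DUMP_DIR=${1:?usage: sketch.sh <DUMP_DIR>}
BASE_URL="https://dumps.wikimedia.org"   # or a mirror, as in the script below

# Find the newest run date (e.g. 20230701) without a temporary runs.html;
# with pipefail, a failure anywhere in this pipeline aborts the script.
LATEST_DUMP=$(wget -qO- "$BASE_URL/other/enterprise_html/runs/" \
    | grep -oE '[0-9]{8}' | sort -u | tail -n 1)

if [ -d "$DUMP_DIR/$LATEST_DUMP" ]; then
    echo "Latest dump $LATEST_DUMP is already downloaded; nothing to do." >&2
    if [ -n "${DELETE_OLD_DUMPS:-}" ]; then
        for dir in "$DUMP_DIR"/*/; do
            name=$(basename "$dir")
            # Keep the newest dump and the "latest" symlink, delete the rest.
            if [ "$name" != "$LATEST_DUMP" ] && [ "$name" != "latest" ]; then
                rm -rf "$dir"
            fi
        done
    fi
    exit 0
fi
```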
# Ensure we're running in the directory of this script.
SCRIPT_PATH=$(dirname "$0")
cd "$SCRIPT_PATH"
SCRIPT_PATH=$(pwd)
# Only load library after changing to script directory.
source lib.sh
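lib.sh itself is not part of this diff, but it presumably provides the `log` helper used a few lines below. A minimal stand-in, under that assumption only (the real lib.sh may differ):

```
# Assumed stand-in for the log helper sourced from lib.sh; not the real code.
log () {
    # Send timestamped messages to stderr so they don't mix with data on stdout.
    echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$@" >&2
}
```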
if [ -n "${MIRROR:-}" ]; then
log "Using mirror '$MIRROR'"
BASE_URL=$MIRROR
else
BASE_URL="https://dumps.wikimedia.org"
fi
if [ -z "${LANGUAGES:-}" ]; then
# Load languages from config.
LANGUAGES=$(jq -r '(.sections_to_remove | keys | .[])' article_processing_config.json)
fi
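For illustration, a hedged sketch of what the `jq` fallback above would emit for a hypothetical `article_processing_config.json` (the real file's keys may differ); `keys` sorts the top-level language codes and `-r` prints one per line:

```sh
# Given a config shaped like:  { "sections_to_remove": { "en": [...], "de": [...], "es": [...] } }
# this prints the language codes (sorted by jq's `keys`), one per line:
#   de
#   en
#   es
jq -r '(.sections_to_remove | keys | .[])' article_processing_config.json
```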
# shellcheck disable=SC2086 # LANGUAGES is intentionally expanded.
log "Selected languages:" $LANGUAGES
biodranik commented 2023-08-16 22:17:42 +00:00 (Migrated from github.com)
Review

nit: Can array be used here without a warning?

newsch commented 2023-08-17 00:05:40 +00:00 (Migrated from github.com)
Review

To convert it to an array with the same semantics it would need to suppress another warning:

```
# shellcheck disable=SC2206 # Intentionally split on whitespace.
LANGUAGES=( $LANGUAGES )
```
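As a follow-up sketch of the trade-off being discussed (names assumed from the surrounding diff, not a committed change): once `LANGUAGES` is an array, the use sites can quote it cleanly without per-line suppressions:

```sh
# Sketch only: iterating the array form with proper quoting.
for lang in "${LANGUAGES[@]}"; do
    echo "would fetch dump for: $lang"
done
```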
log "Fetching run index"
# The date of the latest dump, YYYYMMDD.
LATEST_DUMP=$(wget "$BASE_URL/other/enterprise_html/runs/" --no-verbose -O - \
biodranik commented 2023-08-21 20:53:18 +00:00 (Migrated from github.com)
Review

-c 1, -c 2 and no option behave in the same way with wget2 installed.

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.
newsch commented 2023-08-21 21:09:41 +00:00 (Migrated from github.com)
Review

Correct, I'll clarify that.

Correct, I'll clarify that.
    | grep -Po '(?<=href=")[^"]*' | grep -P '\d{8}' | sort -r | head -n1)
LATEST_DUMP="${LATEST_DUMP%/}"
log "Checking latest dump $LATEST_DUMP"
URLS=
MISSING_DUMPS=0
for lang in $LANGUAGES; do
url="$BASE_URL/other/enterprise_html/runs/${LATEST_DUMP}/${lang}wiki-NS0-${LATEST_DUMP}-ENTERPRISE-HTML.json.tar.gz"
    if ! wget --no-verbose --method=HEAD "$url"; then
        MISSING_DUMPS=$(( MISSING_DUMPS + 1 ))
log "Dump for '$lang' does not exist at '$url'"
continue
fi
URLS="$URLS $url"
done
if [ -z "$URLS" ]; then
log "No dumps available"
biodranik commented 2023-07-20 06:07:06 +00:00 (Migrated from github.com)
Review

"Latest dumps are already downloaded"?

"Latest dumps are already downloaded"?
newsch commented 2023-08-16 21:17:25 +00:00 (Migrated from github.com)
Review

If URLS is empty, then none of the specified languages could be found for the latest dump.

If a newer dump isn't available, it will still check the sizes of the last downloaded dump, and exit with 0.

biodranik commented 2023-08-16 22:11:40 +00:00 (Migrated from github.com)
Review

Good! The goal is to make a cron script that will update files automatically when they are published (and delete old files).

Another question: should previously generated HTML and other temporary files be deleted before relaunching the wikiparser? Does it make sense to cover it in the run script?

newsch commented 2023-08-17 14:58:47 +00:00 (Migrated from github.com)
Review

They shouldn't need to be.

The temporary files are regenerated each time.
The generated HTML will be overwritten if it is referenced in the new planet file.

If an article isn't extracted from the dump due to #24 or something else, then having the old copy still available might be useful.

But if the HTML simplification is changed, and older articles are no longer referenced in OSM, then they will remain on disk unchanged.

exit 16
fi
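
To make the behavior newsch describes above easier to follow, here is a minimal, hypothetical sketch of the URL-collection and early-exit path. It is not the actual `download.sh`; `MIRROR`, `LATEST_DUMP`, `LANGUAGES`, and the `log` helper are assumed names used only for illustration.

```
# Minimal sketch, not the real download.sh: gather one dump URL per requested
# language from the latest run, skip languages that have no dump, and exit
# non-zero only when nothing at all is available.
# MIRROR, LATEST_DUMP, LANGUAGES, and `log` are assumed to exist for illustration.
URLS=""
for lang in $LANGUAGES; do
    url="$MIRROR/other/enterprise_html/runs/$LATEST_DUMP/${lang}wiki-NS0-$LATEST_DUMP-ENTERPRISE-HTML.json.tar.gz"
    # `wget --spider` only checks that the file exists; nothing is downloaded.
    if ! wget --spider --quiet "$url"; then
        log "No $lang dump in run $LATEST_DUMP"
        continue
    fi
    URLS="$URLS $url"
done

if [ -z "$URLS" ]; then
    log "No dumps available"
    exit 16
fi
```

When the latest dump set has already been fetched, the later download step simply re-verifies the sizes of the existing files and exits 0, as newsch notes above.
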
# The subdir to store the latest dump in.
DOWNLOAD_DIR="$DUMP_DIR/$LATEST_DUMP"
if [ ! -e "$DOWNLOAD_DIR" ]; then
mkdir "$DOWNLOAD_DIR"
fi
log "Downloading available dumps"
if type wget2 > /dev/null; then
    # shellcheck disable=SC2086 # URLS should be expanded on spaces.
    wget2 --verbose --progress=bar --continue \
--user-agent "$(build_user_agent wget2)" \
--max-threads "${CONCURRENT_DOWNLOADS:-2}" \
--directory-prefix "$DOWNLOAD_DIR" \
        $URLS
else
biodranik commented 2023-07-20 06:03:51 +00:00 (Migrated from github.com)
Review

In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).

biodranik commented 2023-07-20 06:04:33 +00:00 (Migrated from github.com)
Review

`set -euxo pipefail` is helpful if you decide to use pipes in the script.

biodranik commented 2023-07-20 06:05:55 +00:00 (Migrated from github.com)
Review

nit: fewer lines of code are easier to read.

```suggestion
if [ -z "${LANGUAGES+}" ]; then
```

biodranik commented 2023-07-20 06:06:30 +00:00 (Migrated from github.com)
Review

nit: (here and below)

```suggestion
for lang in $LANGUAGES; do
```

biodranik commented 2023-07-20 06:09:11 +00:00 (Migrated from github.com)
Review

TMPDIR?

biodranik commented 2023-07-20 08:15:10 +00:00 (Migrated from github.com)
Review

get_wiki_dump.sh: line 11: 1: unbound variable

biodranik commented 2023-08-16 22:18:52 +00:00 (Migrated from github.com)
Review

Do you really need to store runs.html on disk and then clean it up?

newsch commented 2023-08-16 23:38:52 +00:00 (Migrated from github.com)
Review

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.

newsch commented 2023-08-17 15:02:10 +00:00 (Migrated from github.com)
Review

Do you want the script to handle this?

If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump while wikiparser is still using it?

biodranik commented 2023-08-17 23:46:19 +00:00 (Migrated from github.com)
Review

1. Aren't files that were open before their deletion on Linux still accessible?
2. Dumps are produced regularly, right? We can set a specific schedule.
3. Script may have an option to automatically delete older dumps.

newsch commented 2023-08-18 17:06:30 +00:00 (Migrated from github.com)
Review

> 1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as run.sh is started before download.sh deletes them, it will be able to access the files.

> 2. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and it looks like they finish within 3 days.

> 3. Script may have an option to automatically delete older dumps.

👍

newsch commented 2023-08-18 18:28:28 +00:00 (Migrated from github.com)
Review

I've added a new option:

```
-D      Delete all old dump subdirectories if the latest is downloaded
```

biodranik commented 2023-08-21 20:53:18 +00:00 (Migrated from github.com)
Review

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.

newsch commented 2023-08-21 21:09:41 +00:00 (Migrated from github.com)
Review

Correct, I'll clarify that.
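To make the graceful-exit behavior from the first comment concrete, here is a minimal sketch of an "already downloaded" check, assuming the `DUMP_DIR`, `LATEST_DUMP`, and `LANGUAGES` variables and the file layout that download.sh maintains; the real check in the script may be structured differently:

```bash
# Sketch only: exit quietly when every requested dump is already on disk.
# DUMP_DIR, LATEST_DUMP (e.g. "20230701") and the space-separated LANGUAGES
# list are assumed to be set; the actual logic in download.sh may differ.
MISSING=0
for lang in $LANGUAGES; do
    file="$DUMP_DIR/$LATEST_DUMP/${lang}wiki-NS0-${LATEST_DUMP}-ENTERPRISE-HTML.json.tar.gz"
    [ -f "$file" ] || MISSING=$((MISSING + 1))
done
if [ "$MISSING" -eq 0 ]; then
    log "Latest dumps ($LATEST_DUMP) are already downloaded"
    exit 0
fi
```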
log "WARN: wget2 is not available, falling back to sequential downloads"
# shellcheck disable=SC2086 # URLS should be expanded on spaces.
wget --continue \
--user-agent "$(build_user_agent wget)" \
--directory-prefix "$DOWNLOAD_DIR" \
$URLS
fi
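The `fi` above closes the sequential-wget fallback. For comparison, the parallel wget2 branch it pairs with could be sketched as follows; the `--max-threads 2` cap (matching Wikimedia's two-connection limit) and the `build_user_agent wget2` argument are assumptions rather than the exact invocation in download.sh:

```bash
# Sketch only, not the exact command used by download.sh.
if command -v wget2 > /dev/null 2>&1; then
    # wget2 downloads several URLs concurrently, so cap it at 2 connections.
    # shellcheck disable=SC2086 # URLS should be expanded on spaces.
    wget2 --continue \
        --max-threads 2 \
        --user-agent "$(build_user_agent wget2)" \
        --directory-prefix "$DOWNLOAD_DIR" \
        $URLS
fi
```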
if [ $MISSING_DUMPS -gt 0 ]; then
log "$MISSING_DUMPS dumps not available yet"
exit 16
fi
log "Linking 'latest' to '$LATEST_DUMP'"
LATEST_LINK="$DUMP_DIR/latest"
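# -f replaces any existing "latest" symlink, and -T treats the destination as the link name itself, so ln never descends into an old "latest" directory.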
ln -sf -T "$LATEST_DUMP" "$LATEST_LINK"
if [ "$DELETE_OLD_DUMPS" = true ]; then
# shellcheck disable=SC2010 # Only matching files with numeric names are used.
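# Collect previously downloaded dump subdirectories (eight-digit YYYYMMDD names), excluding the latest one.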
mapfile -t OLD_DUMPS < <(ls "$DUMP_DIR" | grep -P '^\d{8}$' | grep -vF "$LATEST_DUMP")
if [ "${#OLD_DUMPS[@]}" -gt 0 ]; then
log "Deleting old dumps" "${OLD_DUMPS[@]}"
for old_dump in "${OLD_DUMPS[@]}"; do
rm -r "${DUMP_DIR:?}/${old_dump:?}/"
done
else
log "No old dumps to delete"
fi
fi

2
run.sh
View file

@ -36,7 +36,7 @@ set -euo pipefail
while getopts "h" opt
do
case $opt in
h) echo -n "$USAGE" >&2; exit 0;;
h) echo -n "$USAGE"; exit 0;;
?) echo "$USAGE" | head -n1 >&2; exit 1;;
esac
done
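The change above routes help that was explicitly requested with `-h` to stdout, while the short usage printed for an unrecognized option stays on stderr. A quick way to see the difference (`-x` is just an arbitrary invalid flag):

```
./run.sh -h | less           # requested help goes to stdout, so it can be paged or piped
./run.sh -x 2> errors.log    # invalid option: the one-line usage lands on stderr
```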