Download script #22
36
README.md
@ -10,6 +10,42 @@ OpenStreetMap commonly stores these as [`wikipedia*=`](https://wiki.openstreetma
[`article_processing_config.json`](article_processing_config.json) should be updated when adding a new language.
It defines article sections that are not important for users and should be removed from the extracted HTML.

## Downloading Dumps

[Enterprise HTML dumps, updated twice a month, are publicly accessible](https://dumps.wikimedia.org/other/enterprise_html/). Please note that each language's dump is tens of gigabytes in size.

Wikimedia requests no more than 2 concurrent downloads, which the included [`download.sh`](./download.sh) script respects:
> If you are reading this on Wikimedia servers, please note that we have rate limited downloaders and we are capping the number of per-ip connections to 2.
> This will help to ensure that everyone can access the files with reasonable download times.
> Clients that try to evade these limits may be blocked.
> Our mirror sites do not have this cap.

See [the list of available mirrors](https://dumps.wikimedia.org/mirrors.html) for other options. Note that most of them do not include the enterprise dumps; check to see that the `other/enterprise_html/runs/` path includes subdirectories with files. The following two mirrors are known to include the enterprise html dumps as of August 2023:
- (US) https://dumps.wikimedia.your.org
- (Sweden) https://mirror.accum.se/mirror/wikimedia.org

For the wikiparser you'll want the ["NS0"](https://en.wikipedia.org/wiki/Wikipedia:Namespace) "ENTERPRISE-HTML" `.json.tar.gz` files.

They are gzipped tar files containing a single file of newline-delimited JSON matching the [Wikimedia Enterprise API schema](https://enterprise.wikimedia.com/docs/data-dictionary/).
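
Because each archive wraps newline-delimited JSON, it can be inspected by streaming it through `tar` without unpacking it to disk. The following is a minimal sketch of that idea; the function names `count_articles` and `first_article` are made up for illustration and are not part of `download.sh` or the wikiparser.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Count the records in one dump. Each line streamed out of the archive
# is one JSON object, so the line count is the article count.
count_articles() {
    tar -xzOf "$1" | wc -l | tr -d '[:space:]'
}

# Print only the first record. This runs in a subshell with pipefail
# switched off, because head exits early and tar may then be stopped
# by a closed pipe, which would otherwise count as a failure.
first_article() (
    set +o pipefail
    tar -xzOf "$1" | head -n 1
)
```

Note that `count_articles` still has to decompress the whole archive, which takes a while for a dump of tens of gigabytes.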

The included [`download.sh`](./download.sh) script handles downloading the latest set of dumps in specific languages.
It maintains a directory with the following layout:
```
<DUMP_DIR>/
├── latest -> 20230701/
├── 20230701/
│   ├── dewiki-NS0-20230701-ENTERPRISE-HTML.json.tar.gz
│   ├── enwiki-NS0-20230701-ENTERPRISE-HTML.json.tar.gz
│   ├── eswiki-NS0-20230701-ENTERPRISE-HTML.json.tar.gz
│   ...
├── 20230620/
│   ├── dewiki-NS0-20230620-ENTERPRISE-HTML.json.tar.gz
│   ├── enwiki-NS0-20230620-ENTERPRISE-HTML.json.tar.gz
│   ├── eswiki-NS0-20230620-ENTERPRISE-HTML.json.tar.gz
│   ...
...
```
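
A `latest` link like the one in this layout can be maintained with a relative symlink. The sketch below shows one way to do that; it is an illustration under assumptions, not necessarily how `download.sh` implements it, and the helper names `update_latest` and `current_latest` are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Point <DUMP_DIR>/latest at a dated subdirectory such as 20230701.
# The link target is relative, so it stays valid if DUMP_DIR is moved.
update_latest() {
    local dump_dir=$1 run_date=$2
    [ -d "$dump_dir/$run_date" ] || return 1
    # -n replaces an existing link instead of creating a new link
    # inside the directory the old link points to.
    ln -sfn "$run_date" "$dump_dir/latest"
}

# Print the dated directory name that 'latest' currently points to.
current_latest() {
    local target
    target=$(readlink "$1/latest") || return 1
    printf '%s\n' "${target%/}"
}
```

Calling `update_latest "$DUMP_DIR" 20230701` only after every file of that run has finished downloading would keep `latest` pointing at a complete set.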

## Usage

To use with the map generator, see the [`run.sh` script](run.sh) and its own help documentation.

191
download.sh
Executable file
@ -0,0 +1,191 @@
Do you really need to store runs.html on disk and then clean it up?

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.

Do you want the script to handle this?
If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump as wikiparser is using it?

1. Aren't files that were open before their deletion on Linux still accessible?
2. Dumps are produced regularly, right? We can set a specific schedule.
3. Script may have an option to automatically delete older dumps.

> 1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files.

> 2. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like.

> 3. Script may have an option to automatically delete older dumps.

:+1:

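
The point about open files can be checked with a small experiment. This is only a sketch of the general behavior on Linux and other POSIX systems: the function name `read_after_delete` is made up, file descriptor 3 stands in for a reader such as `run.sh`, and `rm` stands in for the cleanup step.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Open a file, delete it, and then read from the still-open descriptor.
read_after_delete() {
    local file=$1 line
    exec 3<"$file"          # the reader opens the file first
    rm -- "$file"           # the directory entry is removed afterwards
    IFS= read -r line <&3   # the data is still reachable through fd 3
    exec 3<&-               # closing the last descriptor frees the data
    printf '%s\n' "$line"
}

tmp=$(mktemp)
echo 'still readable' > "$tmp"
read_after_delete "$tmp"
```

The guarantee only covers files that were opened before the deletion; a reader that starts afterwards will not find them.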
I've added a new option:
```
-D Delete all old dump subdirectories if the latest is downloaded
```

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.

Correct, I'll clarify that.

In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).

`set -euxo pipefail` is helpful if you decide to use pipes in the script.

nit: fewer lines of code are easier to read.
```suggestion
if [ -z "${LANGUAGES+}" ]; then
```

nit: (here and below)
```suggestion
for lang in $LANGUAGES; do
```

TMPDIR?

get_wiki_dump.sh: line 11: 1: unbound variable
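
Two of the comments above touch on how parameter expansion behaves under `set -u`. The "unbound variable" message is what bash prints when `$1` is read without any argument being passed. Separately, `${LANGUAGES+}` has an empty alternative, so it expands to an empty string whether or not the variable is set, and a `-z` test on it always succeeds. The sketch below uses the `:-` form instead, assuming the intent is to detect a missing or empty value; the function names are hypothetical and not taken from the script.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Succeeds when LANGUAGES is unset or empty. The ':-' form supplies an
# empty default, so the expansion is safe under 'set -u'.
languages_missing() {
    [ -z "${LANGUAGES:-}" ]
}

# Print the first argument, or fail with a message instead of aborting
# with "unbound variable" when no argument was given.
require_dump_dir() {
    if [ -z "${1:-}" ]; then
        echo "missing <DUMP_DIR> argument" >&2
        return 1
    fi
    printf '%s\n' "$1"
}
```

A check that must distinguish an unset variable from one set to the empty string could use `${LANGUAGES+x}` instead, which expands to `x` only when the variable is set.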
#! /usr/bin/env bash
USAGE="Usage: ./download.sh [-hD] [-c <NUM>] <DUMP_DIR>

Download the latest Wikipedia Enterprise HTML dumps.

Arguments:
<DUMP_DIR> An existing directory to store dumps in. Dumps will be grouped
into subdirectories by date, and a link 'latest' will point to
Will wikiparser generator properly find/load newer versions from the latest dir without specifying explicit file names?

For the `run.sh` script, you'll provide a glob of the latest directory:
```
./run.sh descriptions/ planet.osm.pbf $DUMP_DIR/latest/*
```
It doesn't have any special handling for the `$DUMP_DIR` layout.
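
Since the glob is expanded by the shell, an empty or missing `latest` directory would hand `run.sh` the literal pattern instead of file names. A wrapper could guard against that as sketched below; `latest_dump_files` is a hypothetical helper, and the file name pattern is assumed from the dump names shown in the README.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print every dump archive under <DUMP_DIR>/latest, one per line.
# Returns non-zero when nothing matches.
latest_dump_files() {
    local dump_dir=$1 f found=1
    for f in "$dump_dir"/latest/*-ENTERPRISE-HTML.json.tar.gz; do
        # An unmatched glob is left as the literal pattern; skip it.
        [ -e "$f" ] || continue
        printf '%s\n' "$f"
        found=0
    done
    return "$found"
}
```

A caller could test `latest_dump_files "$DUMP_DIR" >/dev/null` first and only then invoke `run.sh` with the glob.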
the latest complete dump subdirectory, if it exists.

Options:
-h Print this help screen.
-D Delete all old dump subdirectories if the latest is downloaded.
-c <NUM> Number of concurrent downloads to allow. Ignored if wget2 is not
present or MIRROR is not set. Defaults to 2.
Environment Variables:
LANGUAGES A whitespace-separated list of wikipedia language codes to
download dumps of.
Defaults to the languages in 'article_processing_config.json'.
See <https://meta.wikimedia.org/wiki/List_of_Wikipedias>.
MIRROR A wikimedia dump mirror to use instead of the main wikimedia
server. See <https://dumps.wikimedia.org/mirrors.html> for a
In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).

`set -euxo pipefail` is helpful if you decide to use pipes in the script.
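
To illustrate what `pipefail` changes: by default a pipeline reports only the status of its last command, so a failing download piped into a parser would go unnoticed. The sketch below runs each case in a separate `bash -c` so it behaves the same whatever shell calls it; `false | cat` stands in for a failing producer:

```sh
# Without pipefail the pipeline's status is that of the last command (cat).
plain_status=0
bash -c 'false | cat' || plain_status=$?

# With pipefail a failure anywhere in the pipeline is reported.
pipefail_status=0
bash -c 'set -o pipefail; false | cat' || pipefail_status=$?

echo "without pipefail: $plain_status, with pipefail: $pipefail_status"
```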
nit: fewer lines of code are easier to read.
```suggestion
if [ -z "${LANGUAGES+}" ]; then
```

nit: (here and below)
```suggestion
for lang in $LANGUAGES; do
```

TMPDIR?

get_wiki_dump.sh: line 11: 1: unbound variable
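
The `unbound variable` error is what `set -u` produces when `$1` or another parameter is read without being set. A sketch of one common guard, using the `:-` default expansion; this is an alternative to the suggestion above, not the author's code. Note that `${LANGUAGES+}` with nothing after the `+` expands to an empty string whether or not the variable is set, so a `-z` test on it is always true. `first_arg` is a hypothetical helper:

```sh
set -u
unset LANGUAGES

# "${LANGUAGES:-}" expands to an empty string instead of aborting under set -u.
if [ -z "${LANGUAGES:-}" ]; then
    languages_state=missing
else
    languages_state=present
fi

# The same guard works for positional parameters such as $1.
first_arg() {
    printf '%s' "${1:-}"
}
```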
list of available mirrors, note that many do not include the
required Enterprise HTML dumps.
For example: MIRROR=https://mirror.accum.se/mirror/wikimedia.org
Exit codes:
0 The latest dumps are already present or were downloaded successfully.
1 Argument error.
16 Some of the languages were not available to download. The latest dump may
be in progress, some of the specified languages may not exist, or the
chosen mirror may not host the files.
_ Subprocess error.
"
|
||||
![]() In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that). In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).
![]()
`set -euxo pipefail` is helpful if decide to use pipes in the script.
![]() nit: fewer lines of code are easier to read.
nit: fewer lines of code are easier to read.
```suggestion
if [ -z "${LANGUAGES+}" ]; then
```
![]() nit: (here and below)
nit: (here and below)
```suggestion
for lang in $LANGUAGES; do
```
![]() TMPDIR? TMPDIR?
![]() get_wiki_dump.sh: line 11: 1: unbound variable get_wiki_dump.sh: line 11: 1: unbound variable
![]() Do you really need to store runs.html on disk and then clean it up? Do you really need to store runs.html on disk and then clean it up?
![]() Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem. Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.
![]() Do you want the script to handle this? If it will be running on a cron job, then it might be good to keep 2 copies around. Do you want the script to handle this?
If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump as wikiparser is using it?
![]()
1. Aren't files that were open before their deletion on Linux still accessible?
2. Dumps are produced regularly, right? We can set a specific schedule.
3. Script may have an option to automatically delete older dumps.
> 1. Aren't files that were open before their deletion on Linux still accessible?
You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files.
> 2. Dumps are produced regularly, right? We can set a specific schedule.
Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like.
> 3. Script may have an option to automatically delete older dumps.
:+1:
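The first point can be checked with a small sketch. It uses a throwaway temporary file rather than a real dump and is only an illustration of the file-descriptor behaviour, not part of `download.sh` or `run.sh`.

```shell
set -eu

# Stand-in for a dump file; the contents are made up for the demo.
dump=$(mktemp)
echo "article data" > "$dump"

# Open the file for reading on descriptor 3, as a running parser would.
exec 3< "$dump"

# Delete the path while the descriptor is still open.
rm "$dump"

# The data stays readable through the descriptor until it is closed.
contents=$(cat <&3)
exec 3<&-

echo "$contents"
```

This only protects files that were already open at the time of deletion. A reader that opens its input files one after another could still fail on a file it had not reached yet.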
I've added a new option:
```
-D Delete all old dump subdirectories if the latest is downloaded
```
`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.
Correct, I'll clarify that.
set -euo pipefail
# set -x
build_user_agent() {
# While the dump websites are not part of the API, it's still polite to identify yourself.
# See https://meta.wikimedia.org/wiki/User-Agent_policy
subcommand=$1
name="OrganicMapsWikiparserDownloaderBot"
version="1.0"
url="https://github.com/organicmaps/wikiparser"
email="hello@organicmaps.app"
echo -n "$name/$version ($url; $email) $subcommand"
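Read together, the diff lines above could be assembled roughly as in the sketch below. The indentation, the closing brace, the use of `printf` in place of `echo -n`, and the sample argument `wget2` are assumptions made for illustration; the rest of the function is not visible in this part of the diff.

```shell
set -eu

# Builds a User-Agent string in the shape described by the Wikimedia
# User-Agent policy: client/version (contact info) tool.
build_user_agent() {
    subcommand=$1
    name="OrganicMapsWikiparserDownloaderBot"
    version="1.0"
    url="https://github.com/organicmaps/wikiparser"
    email="hello@organicmaps.app"
    # printf avoids the portability differences of `echo -n`.
    printf '%s/%s (%s; %s) %s' "$name" "$version" "$url" "$email" "$subcommand"
}

# Hypothetical call: the argument names the tool doing the download.
user_agent=$(build_user_agent "wget2")
echo "$user_agent"
```

The resulting string could then be handed to the downloader, for example through the `--user-agent` option of wget.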
|
||||
In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).

`set -euxo pipefail` is helpful if you decide to use pipes in the script.

nit: fewer lines of code are easier to read.
```suggestion
if [ -z "${LANGUAGES+}" ]; then
```
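
One caveat on the expansion in that suggestion: with an empty alternative, `${LANGUAGES+}` expands to nothing whether or not the variable is set, so `-z` would always be true. `${LANGUAGES:-}` is probably what was intended, since it is safe under `set -u` and is empty only when the variable is unset or empty. A minimal sketch, assuming a hypothetical helper `languages_or_default` that is not part of the script under review:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints $LANGUAGES when it is set and non-empty,
# otherwise prints the default list passed as the first argument.
languages_or_default() {
    local default=$1
    # ${LANGUAGES:-} yields "" for an unset or empty variable,
    # so this test does not trip `set -u`.
    if [ -z "${LANGUAGES:-}" ]; then
        echo "$default"
    else
        echo "$LANGUAGES"
    fi
}
```

The result can then be iterated with the unquoted `for lang in $(languages_or_default "en de"); do ... done` form, which relies on word splitting in the same way as the loop suggested in this review.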
nit: (here and below)
```suggestion
for lang in $LANGUAGES; do
```

TMPDIR?

get_wiki_dump.sh: line 11: 1: unbound variable

Do you really need to store runs.html on disk and then clean it up?
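
The last two comments can be illustrated together. This is only a sketch under assumptions: `require_arg` and `latest_run` are hypothetical names, and the `href="YYYYMMDD/"` pattern is a guess at what the runs listing looks like. `${1:-}` avoids the "unbound variable" error under `set -u`, and reading the listing from stdin removes the need for a temporary `runs.html`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints its first argument, or fails with a message
# instead of aborting with "unbound variable" when no argument is given.
require_arg() {
    local value=${1:-}
    if [ -z "$value" ]; then
        echo "missing required argument" >&2
        return 1
    fi
    echo "$value"
}

# Hypothetical helper: reads an HTML directory listing on stdin and prints
# the newest run name. Assumes runs are linked as href="YYYYMMDD/".
# `sort | tail -n1` is used rather than `sort -r | head -n1` so that an
# early-exiting reader cannot cause a SIGPIPE failure under pipefail.
latest_run() {
    grep -oE 'href="[0-9]{8}/"' | grep -oE '[0-9]{8}' | sort | tail -n1
}

# Possible usage, piping the listing straight from the network:
#   curl -fsSL "https://dumps.wikimedia.org/other/enterprise_html/runs/" | latest_run
```

With `pipefail` enabled, a failed download or a listing with no matching runs makes the whole pipeline fail, so the caller can detect it without inspecting an intermediate file.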
Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.

Do you want the script to handle this?
If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump while wikiparser is using it?

1. Aren't files that were open before their deletion on Linux still accessible?
2. Dumps are produced regularly, right? We can set a specific schedule.
3. Script may have an option to automatically delete older dumps.

> 1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files.

> 2. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like.

> 3. Script may have an option to automatically delete older dumps.

:+1:
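
The first point can be checked with a small experiment. A minimal sketch, in which the file descriptor stands in for wikiparser holding a dump open and `rm` stands in for `download.sh` deleting it; `read_after_delete` is a hypothetical name:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Shows that a file opened before its deletion stays readable through the
# already-open descriptor, even though its directory entry is gone.
read_after_delete() {
    local tmp
    tmp=$(mktemp)
    echo "dump contents" > "$tmp"
    exec 3< "$tmp"   # open for reading on fd 3 (the "reader")
    rm "$tmp"        # unlink the path (the "downloader" cleaning up)
    cat <&3          # the data is still readable via fd 3
    exec 3<&-        # closing the last descriptor releases the file
}
```

This only covers files that are already open. A reader that opens the path after the deletion would fail, which matches the condition stated above about `run.sh` starting first.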
I've added a new option:
```
-D Delete all old dump subdirectories if the latest is downloaded
```

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.

Correct, I'll clarify that.
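
For illustration, the behaviour described for `-D` could look roughly like the following. This is a sketch, not the script's actual implementation: `delete_old_dumps` is a hypothetical helper, and the assumption that each run lives in a `YYYYMMDD` subdirectory of the dump directory is not confirmed by this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: removes every YYYYMMDD subdirectory of $1 except $2.
# Nothing is deleted unless the latest run ($2) is actually present.
delete_old_dumps() {
    local dump_dir=$1 latest=$2 dir
    if [ ! -d "$dump_dir/$latest" ]; then
        echo "latest dump $latest not found; keeping old dumps" >&2
        return 0
    fi
    for dir in "$dump_dir"/*/; do
        [ -d "$dir" ] || continue   # the glob matched nothing
        dir=${dir%/}
        case ${dir##*/} in
            "$latest") ;;           # keep the latest run
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) rm -r -- "$dir" ;;
        esac
    done
}
```

Directories that do not look like a run name are left alone, so unrelated content in the dump directory is not removed.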
}
# Parse options.
DELETE_OLD_DUMPS=false
CONCURRENT_DOWNLOADS=
while getopts "hDc:" opt
do
case $opt in
h) echo -n "$USAGE"; exit 0;;
D) DELETE_OLD_DUMPS=true;;
c) CONCURRENT_DOWNLOADS=$OPTARG;;
?) echo "$USAGE" | head -n1 >&2; exit 1;;
|
||||
![]() In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that). In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).
![]()
`set -euxo pipefail` is helpful if decide to use pipes in the script.
![]() nit: fewer lines of code are easier to read.
nit: fewer lines of code are easier to read.
```suggestion
if [ -z "${LANGUAGES+}" ]; then
```
![]() nit: (here and below)
nit: (here and below)
```suggestion
for lang in $LANGUAGES; do
```
![]() TMPDIR? TMPDIR?
![]() get_wiki_dump.sh: line 11: 1: unbound variable get_wiki_dump.sh: line 11: 1: unbound variable
![]() Do you really need to store runs.html on disk and then clean it up? Do you really need to store runs.html on disk and then clean it up?
![]() Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem. Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.
Do you want the script to handle this?
If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump as wikiparser is using it?

1. Aren't files that were open before their deletion on Linux still accessible?
2. Dumps are produced regularly, right? We can set a specific schedule.
3. Script may have an option to automatically delete older dumps.

> 1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files.

> 2. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and it looks like they finish within 3 days.

> 3. Script may have an option to automatically delete older dumps.

:+1:
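The first point can be checked with a short sketch, assuming a Linux or other POSIX filesystem; `read_after_delete` is a hypothetical name. A file's data stays readable through a descriptor that was opened before the name was removed.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical demo: read a file through a descriptor opened before deletion.
read_after_delete() {
    local tmp
    tmp=$(mktemp)
    echo "dump contents" > "$tmp"
    exec 3< "$tmp"   # open the file for reading on fd 3
    rm "$tmp"        # unlink the name; the open descriptor keeps the data alive
    cat <&3          # the contents are still readable
    exec 3<&-        # close fd 3 so the space can be reclaimed
}
```

Note that this only covers files that are already open at the time of deletion. A reader that opens its inputs one after another would not find the ones it has not reached yet.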
I've added a new option:
```
-D Delete all old dump subdirectories if the latest is downloaded
```
`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.
Correct, I'll clarify that.
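As a rough sketch of what an option like `-D` could do (this is not the code from `download.sh`): the hypothetical helper `delete_old_dumps` below assumes each run lives in its own subdirectory of the dump directory, and that the caller already knows the name of the latest one.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: remove every subdirectory of DUMP_DIR except LATEST.
# Usage: delete_old_dumps DUMP_DIR LATEST
delete_old_dumps() {
    local dump_dir=$1 latest=$2 dir
    # Refuse to delete anything unless the latest run is actually present.
    [ -d "$dump_dir/$latest" ] || return 1
    for dir in "$dump_dir"/*/; do
        dir=${dir%/}                 # drop the trailing slash left by the glob
        [ -d "$dir" ] || continue
        if [ "$(basename "$dir")" != "$latest" ]; then
            rm -r -- "$dir"
        fi
    done
}
```

Checking for the latest subdirectory first means a failed or partial download never leaves the dump directory empty.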
esac
done
shift $((OPTIND - 1))
if [ -z "${1:-}" ]; then
echo "DUMP_DIR is required" >&2
echo -n "$USAGE" >&2
exit 1
fi
# The parent directory to store groups of dumps in.
DUMP_DIR=$(readlink -f "$1")
shift
|
||||
![]() In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that). In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).
![]()
`set -euxo pipefail` is helpful if decide to use pipes in the script.
![]() nit: fewer lines of code are easier to read.
nit: fewer lines of code are easier to read.
```suggestion
if [ -z "${LANGUAGES+}" ]; then
```
![]() nit: (here and below)
nit: (here and below)
```suggestion
for lang in $LANGUAGES; do
```
![]() TMPDIR? TMPDIR?
![]() get_wiki_dump.sh: line 11: 1: unbound variable get_wiki_dump.sh: line 11: 1: unbound variable
![]() Do you really need to store runs.html on disk and then clean it up? Do you really need to store runs.html on disk and then clean it up?
![]() Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem. Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.
Do you want the script to handle this?
If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump while wikiparser is using it?

1. Aren't files that were open before their deletion on Linux still accessible?
2. Dumps are produced regularly, right? We can set a specific schedule.
3. Script may have an option to automatically delete older dumps.

> 1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files.

> 2. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and it looks like they finish within 3 days.

> 3. Script may have an option to automatically delete older dumps.

:+1:

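The first point can be checked with a small sketch (the function name is made up for illustration). On Linux and other POSIX systems, removing a file only removes its directory entry; the data stays readable through any descriptor that was already open. This only helps a reader that opened the file before the deletion, so a process that opens dump files lazily could still fail.

```bash
# Demonstration of unlink semantics: data stays readable through an
# already-open file descriptor after the file name has been removed.
read_after_delete() {
    local tmp line
    tmp="$(mktemp)"
    echo "dump contents" > "$tmp"
    exec 3< "$tmp"      # open the file for reading on descriptor 3
    rm "$tmp"           # remove the name; the data lives while fd 3 is open
    IFS= read -r line <&3
    exec 3<&-           # close the descriptor; the space can be reclaimed now
    echo "$line"
}

read_after_delete   # prints "dump contents"
```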
I've added a new option:
```
-D Delete all old dump subdirectories if the latest is downloaded
```

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.

Correct, I'll clarify that.
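For illustration, a `-D` style cleanup could look roughly like the sketch below. This is not the code from `download.sh`: the function name is hypothetical, the dump directory is assumed to hold one subdirectory per run (for example `20230801/`), and the existence of the latest subdirectory stands in for "the latest is downloaded".

```bash
# Hypothetical sketch of a -D style cleanup.
# $1: dump directory, $2: name of the latest run subdirectory.
delete_old_dumps() {
    local dump_dir="$1" latest="$2" dir
    if [ ! -d "$dump_dir/$latest" ]; then
        echo "Latest dump '$latest' not present, keeping old dumps" >&2
        return 0
    fi
    for dir in "$dump_dir"/*/; do
        [ -d "$dir" ] || continue            # the glob matched nothing
        if [ "$(basename "$dir")" != "$latest" ]; then
            echo "Deleting old dump: $dir"
            rm -r -- "$dir"
        fi
    done
}
```

Refusing to delete anything unless the latest run is present keeps at least one usable dump on disk if a download fails partway.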
if [ -n "${1:-}" ]; then
echo "Unexpected extra argument: '$1'" >&2
echo "$USAGE" | head -n1 >&2
exit 1
fi
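
The extra-argument check above can be placed in context with a small sketch of the surrounding option handling. Everything here is an assumption for illustration: the function name `parse_args`, the `DELETE_OLD` variable, and the idea that `DUMP_DIR` is the single positional argument. Only the `-D` and `-c` options and the error message come from the review. The sketch returns non-zero instead of calling `exit` so it can be tried in an interactive shell.

```bash
# Hypothetical sketch: parse -D and -c N, then one positional DUMP_DIR,
# and reject anything that follows it.
parse_args() {
    local opt OPTIND=1
    DELETE_OLD=false CONCURRENT_DOWNLOADS="" DUMP_DIR=""
    while getopts "Dc:" opt; do
        case "$opt" in
            D) DELETE_OLD=true ;;
            c) CONCURRENT_DOWNLOADS="$OPTARG" ;;
            *) return 1 ;;
        esac
    done
    shift $((OPTIND - 1))
    DUMP_DIR="${1:-}"
    [ -n "$DUMP_DIR" ] || { echo "Missing DUMP_DIR" >&2; return 1; }
    shift
    if [ -n "${1:-}" ]; then
        echo "Unexpected extra argument: '$1'" >&2
        return 1
    fi
}

parse_args -D -c 2 /tmp && echo "$DELETE_OLD $CONCURRENT_DOWNLOADS $DUMP_DIR"
# prints "true 2 /tmp"
```

Using `"${1:-}"` after the `shift` keeps the check safe under `set -u` when no extra argument is present.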
Can spaces be added here?

I haven't seen an example with spaces in the name. All of the browser user agents use CamelCase instead of spaces.

if [ ! -d "$DUMP_DIR" ]; then
echo "DUMP_DIR '$DUMP_DIR' does not exist" >&2
exit 1
fi
if [ -n "$CONCURRENT_DOWNLOADS" ]; then
In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).
`set -euxo pipefail` is helpful if you decide to use pipes in the script.
nit: fewer lines of code are easier to read.
```suggestion
if [ -z "${LANGUAGES:-}" ]; then
```
if [ ! "$CONCURRENT_DOWNLOADS" -ge 1 ]; then
echo "Number of concurrent downloads (-n) must be >= 1" >&2
echo "$USAGE" | head -n1 >&2
exit 1
fi
if [ -z "${MIRROR:-}" ]; then
# NOTE: Wikipedia requests no more than 2 concurrent downloads.
# See https://dumps.wikimedia.org/ for more info.
echo "WARN: MIRROR is not set; ignoring -n" >&2
CONCURRENT_DOWNLOADS=
fi
fi
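
The diff lines above validate `-n` and drop it when no mirror is set. A self-contained sketch of the same idea follows; the function name `validate_concurrency` is hypothetical, and the rejection of non-numeric values is an addition of this sketch rather than something shown in the PR. It is meant to be called only when `-n` was actually passed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the validated -n value, or nothing if it is ignored.
# $1 is the requested number of concurrent downloads, $2 the mirror URL (may be empty).
validate_concurrency() {
    local requested="$1" mirror="${2:-}"

    # Reject empty and non-numeric values before the numeric comparison.
    case "$requested" in
        ''|*[!0-9]*)
            echo "Number of concurrent downloads (-n) must be >= 1" >&2
            return 1
            ;;
    esac

    if [ "$requested" -lt 1 ]; then
        echo "Number of concurrent downloads (-n) must be >= 1" >&2
        return 1
    fi

    # Without a mirror the Wikimedia limit applies, so the value is dropped.
    if [ -z "$mirror" ]; then
        echo "WARN: MIRROR is not set; ignoring -n" >&2
        return 0
    fi

    echo "$requested"
}
```

A caller could then write `CONCURRENT_DOWNLOADS="$(validate_concurrency "$CONCURRENT_DOWNLOADS" "${MIRROR:-}")"` and exit on a non-zero status.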
|
||||
![]() In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that). In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).
![]()
`set -euxo pipefail` is helpful if decide to use pipes in the script.
![]() nit: fewer lines of code are easier to read.
nit: fewer lines of code are easier to read.
```suggestion
if [ -z "${LANGUAGES+}" ]; then
```
![]() nit: (here and below)
nit: (here and below)
```suggestion
for lang in $LANGUAGES; do
```
![]() TMPDIR? TMPDIR?
![]() get_wiki_dump.sh: line 11: 1: unbound variable get_wiki_dump.sh: line 11: 1: unbound variable
![]() Do you really need to store runs.html on disk and then clean it up? Do you really need to store runs.html on disk and then clean it up?
![]() Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem. Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.
![]() Do you want the script to handle this? If it will be running on a cron job, then it might be good to keep 2 copies around. Do you want the script to handle this?
If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump as wikiparser is using it?
1. Aren't files that were open before their deletion on Linux still accessible?
2. Dumps are produced regularly, right? We can set a specific schedule.
3. Script may have an option to automatically delete older dumps.
> 1. Aren't files that were open before their deletion on Linux still accessible?
You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files.
> 2. Dumps are produced regularly, right? We can set a specific schedule.
Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like.
> 3. Script may have an option to automatically delete older dumps.
:+1:
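For reference, the behaviour discussed in point 1 can be checked with a short sketch. A temporary file stands in for a dump, file descriptor 3 plays the role of `run.sh` holding the file open, and `rm` plays the role of `download.sh`; the file name and contents are arbitrary.

```bash
#!/usr/bin/env bash
set -eu

# Temporary file standing in for a dump.
dump=$(mktemp)
echo "article data" > "$dump"

# Open the file for reading on descriptor 3, then delete it.
exec 3< "$dump"
rm "$dump"

# The directory entry is gone, but the open descriptor still reads the data.
still_exists=no
if [ -e "$dump" ]; then still_exists=yes; fi
read -r content <&3
exec 3<&-

echo "exists=$still_exists content=$content"
```

This only helps a reader that already has the file open; a process started after the deletion cannot find the file by path.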
I've added a new option:
```
-D Delete all old dump subdirectories if the latest is downloaded
```
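A rough sketch of what such an option could do, not the actual implementation in `download.sh`: the helper name `delete_old_dumps` and the date-named subdirectory layout are assumptions made for the example.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: remove every subdirectory of $1 except the one named $2.
delete_old_dumps() {
    dump_dir=$1
    latest=$2
    # Refuse to delete anything unless the latest dump directory is present.
    [ -d "$dump_dir/$latest" ] || return 1
    for dir in "$dump_dir"/*/; do
        name=$(basename "$dir")
        if [ "$name" != "$latest" ]; then
            rm -r "$dir"
        fi
    done
}

# Usage with a throwaway directory; the date-named layout is an assumption.
root=$(mktemp -d)
mkdir "$root/20230701" "$root/20230720" "$root/20230801"
delete_old_dumps "$root" 20230801
remaining=$(ls "$root")
echo "remaining: $remaining"
rm -r "$root"
```

Checking for the latest directory first mirrors the "if the latest is downloaded" condition in the option text, so a failed download never leaves the dump directory empty.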
`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.
Correct, I'll clarify that.
# Ensure we're running in the directory of this script.
SCRIPT_PATH=$(dirname "$0")
cd "$SCRIPT_PATH"
SCRIPT_PATH=$(pwd)
# Only load library after changing to script directory.
source lib.sh
if [ -n "${MIRROR:-}" ]; then
log "Using mirror '$MIRROR'"
BASE_URL=$MIRROR
In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).

`set -euxo pipefail` is helpful if you decide to use pipes in the script.

nit: fewer lines of code are easier to read.
```suggestion
if [ -z "${LANGUAGES+}" ]; then
```

nit: (here and below)
```suggestion
for lang in $LANGUAGES; do
```

TMPDIR?

get_wiki_dump.sh: line 11: 1: unbound variable

Do you really need to store runs.html on disk and then clean it up?

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.

Do you want the script to handle this?
If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump while wikiparser is using it?

1. Aren't files that were open before their deletion on Linux still accessible?
2. Dumps are produced regularly, right? We can set a specific schedule.
3. Script may have an option to automatically delete older dumps.

> 1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files.

> 2. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like.

> 3. Script may have an option to automatically delete older dumps.

:+1:

I've added a new option:
```
-D Delete all old dump subdirectories if the latest is downloaded
```

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.

Correct, I'll clarify that.
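For reference, the point about deleted files can be checked with a small experiment. This is only a sketch: the temporary file stands in for a dump, and file descriptor 3 stands in for whatever handle the reader holds. Neither is taken from `run.sh` or `download.sh`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a dump file; the real dumps are .json.tar.gz archives.
dump=$(mktemp)
echo "article data" > "$dump"

# Open the file for reading on descriptor 3, as a long-running reader would.
exec 3< "$dump"

# Simulate the cleanup step removing the file while it is still open.
rm "$dump"

# The path is gone, but the open descriptor still reads the original contents.
contents=$(cat <&3)
exec 3<&-

echo "$contents"
```

This only holds for files that are already open when the deletion happens. A reader that opens each archive lazily could still find a later file missing.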
else
BASE_URL="https://dumps.wikimedia.org"
fi
if [ -z "${LANGUAGES:-}" ]; then
# Load languages from config.
LANGUAGES=$(jq -r '(.sections_to_remove | keys | .[])' article_processing_config.json)
fi
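A note on the expansion used in this check, as a minimal sketch with made-up values: under `set -u`, `${LANGUAGES:-}` expands to the empty string when the variable is unset or empty, whereas `${LANGUAGES+}` expands to an empty word in every case, so a `-z` test on it is always true. The variable names `unset_result`, `set_result` and `plus_result` are only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

unset LANGUAGES

# ${LANGUAGES:-} is empty when the variable is unset or empty, so this is
# safe under `set -u` and -z is true only in those cases.
if [ -z "${LANGUAGES:-}" ]; then
    unset_result="fallback"
else
    unset_result="kept"
fi

LANGUAGES="en de"
if [ -z "${LANGUAGES:-}" ]; then
    set_result="fallback"
else
    set_result="kept"
fi

# ${LANGUAGES+} substitutes an empty word when the variable is set and
# nothing when it is unset, so it is always empty and -z is always true.
if [ -z "${LANGUAGES+}" ]; then
    plus_result="fallback"
else
    plus_result="kept"
fi

echo "$unset_result $set_result $plus_result"
```

With the `+` form, a `LANGUAGES` value supplied by the caller would be overwritten by the config defaults, which is presumably not what is wanted here.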
# shellcheck disable=SC2086 # LANGUAGES is intentionally expanded.
log "Selected languages:" $LANGUAGES
nit: Can an array be used here without a warning?

To convert it to an array with the same semantics, it would need to suppress another warning:
```
# shellcheck disable=SC2206 # Intentionally split on whitespace.
LANGUAGES=( $LANGUAGES )
```
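One possible alternative, sketched here under the assumption that the script targets bash: `read -a` splits on `IFS` without pathname expansion, which should avoid the unquoted expansion that SC2206 flags. The `-d ''` makes `read` consume the whole value, including the newlines that `jq -r` emits between keys. It returns non-zero at end of input, hence the `|| true`. The name `LANGUAGE_ARRAY` and the sample value are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sample value mixing spaces and newlines, as a user-supplied list or
# newline-separated jq output might.
LANGUAGES=$'en de\nfr'

# Split on any IFS whitespace into an array; read hits end of input before
# finding the NUL delimiter, so its non-zero status is ignored.
read -r -d '' -a LANGUAGE_ARRAY <<< "$LANGUAGES" || true

echo "${#LANGUAGE_ARRAY[@]}"
for lang in "${LANGUAGE_ARRAY[@]}"; do
    echo "$lang"
done
```

Whether this is clearer than a single suppressed warning is a judgment call. The plain `$LANGUAGES` expansion is shorter and already documented by the comment.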
log "Fetching run index"
|
||||
![]() In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that). In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).
![]()
`set -euxo pipefail` is helpful if decide to use pipes in the script.
![]() nit: fewer lines of code are easier to read.
nit: fewer lines of code are easier to read.
```suggestion
if [ -z "${LANGUAGES+}" ]; then
```
![]() nit: (here and below)
nit: (here and below)
```suggestion
for lang in $LANGUAGES; do
```
![]() TMPDIR? TMPDIR?
![]() get_wiki_dump.sh: line 11: 1: unbound variable get_wiki_dump.sh: line 11: 1: unbound variable
Do you really need to store runs.html on disk and then clean it up?

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.

Do you want the script to handle this?
If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump as wikiparser is using it?

1. Aren't files that were open before their deletion on Linux still accessible?
2. Dumps are produced regularly, right? We can set a specific schedule.
3. Script may have an option to automatically delete older dumps.

> 1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files.

> 2. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like.

> 3. Script may have an option to automatically delete older dumps.

:+1:
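The first point in this exchange can be checked directly. A minimal sketch, assuming Linux (or another system with the same unlink semantics) and using a throwaway temporary file in place of a real dump; the file name and contents are made up for the demonstration:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for a downloaded dump; name and contents are made up.
workdir=$(mktemp -d)
dump="$workdir/example-dump.json"
printf '%s\n' '{"name":"Example"}' > "$dump"

# The reader opens the file first (file descriptor 3)...
exec 3< "$dump"

# ...then the cleanup step removes it from the directory.
rm "$dump"

# The directory entry is gone, but the open descriptor still reads the data.
IFS= read -r first_line <&3
exec 3<&-
rmdir "$workdir"

echo "read after delete: $first_line"
```

This only protects a reader that already has the file open. A process that tries to open the path after the deletion gets a "No such file or directory" error, which is why the ordering of the two scripts matters.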
I've added a new option:
```
-D Delete all old dump subdirectories if the latest is downloaded
```

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.

Correct, I'll clarify that.
|
||||
# The date of the latest dump, YYYYMMDD.
|
||||
|
||||
LATEST_DUMP=$(wget "$BASE_URL/other/enterprise_html/runs/" --no-verbose -O - \
|
||||
|
||||
| grep -Po '(?<=href=")[^"]*' | grep -P '\d{8}' | sort -r | head -n1)
|
||||
|
||||
LATEST_DUMP="${LATEST_DUMP%/}"
|
||||
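The pipeline that computes `LATEST_DUMP` can be tried offline. The following is a sketch under stated assumptions: it needs GNU grep (for `-P`), it feeds a made-up index snippet in place of the real runs page, and `latest_run` is a hypothetical wrapper name rather than a function from the script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper: read an HTML directory index on stdin and print the
# newest YYYYMMDD run name without its trailing slash.
latest_run() {
    local latest
    # Pull out every href value, keep those containing an 8-digit date,
    # and take the lexicographically largest, which is the most recent date.
    latest=$(grep -Po '(?<=href=")[^"]*' | grep -P '\d{8}' | sort -r | head -n1)
    # Strip a trailing "/" as the script does with ${LATEST_DUMP%/}.
    printf '%s\n' "${latest%/}"
}

# Made-up index snippet standing in for the real page.
sample_index='<a href="../">../</a>
<a href="20230801/">20230801/</a>
<a href="20230820/">20230820/</a>'

latest_run <<< "$sample_index"
```

Because YYYYMMDD names sort the same way lexicographically and chronologically, `sort -r | head -n1` is enough to pick the newest run. With `pipefail` set, a failure in an earlier stage of the pipeline (for example the fetch in the real script) makes the assignment return non-zero instead of silently yielding an empty value.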
|
||||
|
||||
|
||||
log "Checking latest dump $LATEST_DUMP"
|
||||
|
||||
|
||||
|
||||
URLS=
|
||||
|
||||
MISSING_DUMPS=0
|
||||
|
||||
for lang in $LANGUAGES; do
|
||||
|
||||
url="$BASE_URL/other/enterprise_html/runs/${LATEST_DUMP}/${lang}wiki-NS0-${LATEST_DUMP}-ENTERPRISE-HTML.json.tar.gz"
|
||||
|
||||
if ! wget --no-verbose --method=HEAD "$url"; then
|
||||
|
||||
MISSING_DUMPS=$(( MISSING_DUMPS + 1 ))
In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).

`set -euxo pipefail` is helpful if you decide to use pipes in the script.

nit: fewer lines of code are easier to read.
```suggestion
if [ -z "${LANGUAGES+}" ]; then
```

nit: (here and below)
```suggestion
for lang in $LANGUAGES; do
```

TMPDIR?

get_wiki_dump.sh: line 11: 1: unbound variable
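
The "unbound variable" error above is what bash reports when `set -u` is active and a positional parameter such as `$1` is expanded without having been passed. The following is a minimal sketch of one common way to guard against it, not the script's actual code; the function name `first_arg_state` is hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Report whether the first argument was supplied, without tripping `set -u`.
first_arg_state() {
    # "${1:-}" expands to an empty string when $1 is unset or empty,
    # so the test below is safe under `set -u`.
    if [ -z "${1:-}" ]; then
        echo "missing"
    else
        echo "given: $1"
    fi
}

first_arg_state          # prints "missing"
first_arg_state dumps    # prints "given: dumps"

# By contrast, a bare "$1" with no argument aborts under `set -u`.
# Run it in a subshell so that this demonstration keeps going.
if ( set -u; unset_probe() { echo "$1"; }; unset_probe ) 2>/dev/null; then
    echo "bare \$1 expanded"
else
    echo "bare \$1 failed: unbound variable"
fi
```

Note the difference between the two expansion forms: `${VAR:-}` substitutes an empty default when the variable is unset, while `${VAR+word}` substitutes `word` only when the variable is set, so with an empty `word` it expands to nothing in both cases.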

Do you really need to store runs.html on disk and then clean it up?

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.
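
The reply above refers to bash's `pipefail` option. As a rough illustration of why it matters when a download is piped straight into a parser instead of going through a temporary file, the sketch below compares the exit status of the same failing pipeline with the option off and on. The helper name `pipeline_status` is hypothetical, and `false | cat` merely stands in for a failed fetch feeding a later stage.

```shell
#!/usr/bin/env bash

# Return the exit status of a pipeline whose first stage fails.
# The first argument chooses whether pipefail is enabled ("on" or "off").
pipeline_status() {
    (
        if [ "$1" = "on" ]; then set -o pipefail; else set +o pipefail; fi
        # `false` plays the role of a failed download, `cat` of the parser.
        false | cat
    )
}

pipeline_status off && echo "without pipefail: failure is hidden"
pipeline_status on  || echo "with pipefail: failure is reported"
```

Without `pipefail`, the pipeline's status is that of its last command, so a failure in an earlier stage goes unnoticed even under `set -e`.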

Do you want the script to handle this?
If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump as wikiparser is using it?

1. Aren't files that were open before their deletion on Linux still accessible?
2. Dumps are produced regularly, right? We can set a specific schedule.
3. Script may have an option to automatically delete older dumps.

> 1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files.

> 2. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like.

> 3. Script may have an option to automatically delete older dumps.

👍
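
The first point in the exchange above can be checked with a small experiment. This is only a sketch under the assumption that the reader keeps the file open for the whole run; the temporary file stands in for a dump and its content is made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create a stand-in for a dump file (hypothetical content).
dump=$(mktemp)
echo "article data" > "$dump"

# Open the file for reading on descriptor 3, as a long-running reader would.
exec 3< "$dump"

# Delete the path while the descriptor is still open.
rm "$dump"

# The path is gone, but the open descriptor still reads the data.
[ ! -e "$dump" ] && echo "path removed"
contents=$(cat <&3)
echo "still readable: $contents"

# Close the descriptor; the storage is released once nothing holds it open.
exec 3<&-
```

A process that opens the file only after the deletion would fail, which matches the condition stated in the reply: the reader has to start before the cleanup does.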

I've added a new option:
```
-D Delete all old dump subdirectories if the latest is downloaded
```

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.

Correct, I'll clarify that.
log "Dump for '$lang' does not exist at '$url'"
continue
fi
URLS="$URLS $url"
done
if [ -z "$URLS" ]; then
log "No dumps available"
"Latest dumps are already downloaded"?

If `URLS` is empty, then none of the specified languages could be found for the latest dump.
If a newer dump isn't available, it will still check the sizes of the last downloaded dump, and exit with 0.

Good! The goal is to make a cron script that will update files automatically when they are published (and delete old files).
Another question: should previously generated HTML and other temporary files be deleted before relaunching the wikiparser? Does it make sense to cover it in the run script?

They shouldn't _need_ to be.
The temporary files are regenerated each time.
The generated HTML will be overwritten if it is referenced in the new planet file.
If an article isn't extracted from the dump due to #24 or something else, then having the old copy still available might be useful.
But if the HTML simplification is changed, and older articles are no longer referenced in OSM, then they will remain on disk unchanged.
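
The behavior described in this thread (skip languages whose dump is missing, and stop only when none were found) can be sketched in a self-contained form. This is an illustration assembled from the diff lines shown on this page, not the script itself: `dump_exists`, `collect_urls` and the `example.invalid` URL scheme are hypothetical stand-ins, and the function returns 16 where the script exits with that code.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the script's real availability check:
# pretend only the "en" and "de" dumps have been published.
dump_exists() {
    case "$1" in
        */en|*/de) return 0 ;;
        *) return 1 ;;
    esac
}

# Collect the URLs of available dumps; returns 16 if none were found.
collect_urls() {
    URLS=""
    MISSING_DUMPS=0
    local lang url
    for lang in $LANGUAGES; do
        url="https://example.invalid/dumps/$lang"  # placeholder URL scheme
        if ! dump_exists "$url"; then
            MISSING_DUMPS=$(( MISSING_DUMPS + 1 ))
            echo "Dump for '$lang' does not exist at '$url'" >&2
            continue
        fi
        URLS="$URLS $url"
    done
    if [ -z "$URLS" ]; then
        echo "No dumps available" >&2
        return 16
    fi
}

LANGUAGES="en xx de"
collect_urls
echo "urls:$URLS missing: $MISSING_DUMPS"
```

A cron wrapper could then distinguish "nothing published for these languages" (status 16) from "some languages missing" (a non-zero `MISSING_DUMPS` with a usable `URLS` list).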
exit 16
fi
# The subdir to store the latest dump in.
![]() I've added a new option:
I've added a new option:
```
-D Delete all old dump subdirectories if the latest is downloaded
```
![]()
`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.
![]() Correct, I'll clarify that. Correct, I'll clarify that.
|
||||
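For reference, the point about deleted files can be checked with a small sketch. It is not part of `download.sh`; the temporary file is a hypothetical stand-in for a dump, and the only assumption is a POSIX-like system where an unlinked file stays readable through an already open descriptor.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a dump file; not a path used by download.sh.
dump_file="$(mktemp)"
echo "article data" > "$dump_file"

# The reader opens the file first (file descriptor 3)...
exec 3< "$dump_file"

# ...then the file is unlinked, as a cleanup step might do.
rm "$dump_file"

# The open descriptor still reads the original contents.
IFS= read -r line <&3

# Close the descriptor; only now can the data be freed.
exec 3<&-

echo "$line"
```

This only holds when the reader has opened the file before the deletion happens, which matches the `run.sh` / `download.sh` ordering described above.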
DOWNLOAD_DIR="$DUMP_DIR/$LATEST_DUMP"
if [ ! -e "$DOWNLOAD_DIR" ]; then
mkdir "$DOWNLOAD_DIR"
fi
log "Downloading available dumps"
if type wget2 > /dev/null; then
# shellcheck disable=SC2086 # URLS should be expanded on spaces.
wget2 --verbose --progress=bar --continue \
--user-agent "$(build_user_agent wget2)" \
--max-threads "${CONCURRENT_DOWNLOADS:-2}" \
--directory-prefix "$DOWNLOAD_DIR" \
|
||||
In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).

`set -euxo pipefail` is helpful if you decide to use pipes in the script.

nit: fewer lines of code are easier to read.
```suggestion
if [ -z "${LANGUAGES+}" ]; then
```

nit: (here and below)
```suggestion
for lang in $LANGUAGES; do
```

TMPDIR?

get_wiki_dump.sh: line 11: 1: unbound variable

Do you really need to store runs.html on disk and then clean it up?

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.

Do you want the script to handle this?
If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump as wikiparser is using it?

1. Aren't files that were open before their deletion on Linux still accessible?
2. Dumps are produced regularly, right? We can set a specific schedule.
3. Script may have an option to automatically delete older dumps.

> 1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files.

> 2. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like.

> 3. Script may have an option to automatically delete older dumps.

:+1:

I've added a new option:
```
-D Delete all old dump subdirectories if the latest is downloaded
```

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.

Correct, I'll clarify that.
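
For illustration only, the cleanup discussed above could be sketched as follows. This is not the code from `download.sh`: the function names `delete_old_dumps` and `read_after_unlink` are hypothetical, and the sketch assumes that run subdirectories have names that sort chronologically (for example `YYYYMMDD`). The second function demonstrates the point about open files: a reader that already holds the file open keeps access after the path is removed.

```bash
# Hypothetical helper, not taken from download.sh: remove every
# subdirectory of "$1" except the one whose name sorts last.
# Assumes run directories sort chronologically by name (e.g. YYYYMMDD).
delete_old_dumps() {
    local dump_dir latest dir
    dump_dir="$1"
    latest=""
    # Globs expand in sorted order, so the last match is the newest run.
    for dir in "$dump_dir"/*/; do
        if [ -d "$dir" ]; then
            latest="$dir"
        fi
    done
    # Nothing to do when there are no subdirectories at all.
    [ -n "$latest" ] || return 0
    for dir in "$dump_dir"/*/; do
        [ -d "$dir" ] || continue
        if [ "$dir" != "$latest" ]; then
            # Strip the trailing slash left by the glob before removing.
            rm -r -- "${dir%/}"
        fi
    done
}

# Demonstration: a file opened before deletion stays readable through
# the open file descriptor, even though its path is gone.
read_after_unlink() {
    local file
    file="$(mktemp)"
    printf 'dump contents\n' > "$file"
    exec 3< "$file"   # open the file for reading on descriptor 3
    rm -- "$file"     # remove the path while the descriptor is open
    cat <&3           # the data is still readable
    exec 3<&-         # close the descriptor; the space is freed now
}
```

Called as `delete_old_dumps "$DOWNLOAD_DIR"`, the first function keeps only the newest run. Keeping two runs instead, as suggested for cron use, would only need the loop to remember the last two matches.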
$URLS
else
log "WARN: wget2 is not available, falling back to sequential downloads"
# shellcheck disable=SC2086 # URLS should be expanded on spaces.
wget --continue \
--user-agent "$(build_user_agent wget)" \
--directory-prefix "$DOWNLOAD_DIR" \
$URLS
fi
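
The fallback branch above implies a check for whether `wget2` is installed. A minimal sketch of such a check, assuming nothing about how `download.sh` actually performs it, could look like this; the function name `pick_downloader` is hypothetical.

```bash
# Hypothetical helper: print the name of the downloader to use.
# `command -v` is the POSIX way to test whether a program is on PATH.
pick_downloader() {
    if command -v wget2 > /dev/null 2>&1; then
        echo wget2
    else
        # Same situation as the WARN branch above: only plain wget is
        # available, so downloads would run one after another.
        echo wget
    fi
}
```

A caller could branch on `"$(pick_downloader)"` to decide whether concurrent downloads are possible.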
if [ $MISSING_DUMPS -gt 0 ]; then
log "$MISSING_DUMPS dumps not available yet"
exit 16
In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).

`set -euxo pipefail` is helpful if you decide to use pipes in the script.

nit: fewer lines of code are easier to read.
```suggestion
if [ -z "${LANGUAGES:-}" ]; then
```

nit: (here and below)
```suggestion
for lang in $LANGUAGES; do
```

TMPDIR?

`get_wiki_dump.sh: line 11: 1: unbound variable`

Do you really need to store runs.html on disk and then clean it up?

Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.
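
For reference, the `unbound variable` error and the `pipefail` remark above can be reproduced in isolation. This is only a minimal sketch assuming bash; `LANGUAGES_DEMO` and `first_arg` are hypothetical names and are not taken from the script under review.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Under `set -u`, a bare "$1" aborts with "unbound variable" when the script
# is called without arguments. A default expansion avoids that.
first_arg="${1:-}"

# The same default expansion makes an "unset or empty" check safe under `set -u`.
unset LANGUAGES_DEMO
if [ -z "${LANGUAGES_DEMO:-}" ]; then
    unset_result="empty-or-unset"
else
    unset_result="set"
fi

LANGUAGES_DEMO="en de"
if [ -z "${LANGUAGES_DEMO:-}" ]; then
    set_result="empty-or-unset"
else
    set_result="set"
fi

# With pipefail, a pipeline fails if any stage fails, not only the last one.
if false | cat; then
    pipe_result="ok"
else
    pipe_result="failed"
fi

echo "$unset_result $set_result $pipe_result"
```

Because of `pipefail`, the output of a download command can be piped straight into a parser without an intermediate file, and a failed download still stops the script.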
Do you want the script to handle this?
If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump while wikiparser is using it?

1. Aren't files that were open before their deletion on Linux still accessible?
2. Dumps are produced regularly, right? We can set a specific schedule.
3. Script may have an option to automatically delete older dumps.

> 1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files.

> 2. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like.

> 3. Script may have an option to automatically delete older dumps.

:+1:
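
Point 1 can be checked with a small experiment. The sketch below assumes bash on Linux and uses a hypothetical temporary file in place of a real dump; it shows that data stays readable through a descriptor that was opened before the file was deleted.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a dump file; the real dumps are large .tar.gz files.
demo_dir="$(mktemp -d)"
demo_file="$demo_dir/dump.json"
printf 'line one\nline two\n' > "$demo_file"

# Open the file for reading on file descriptor 3, as a long-running reader would.
exec 3< "$demo_file"

# Delete it while it is still open; only the directory entry goes away.
rm "$demo_file"

if [ -e "$demo_file" ]; then
    path_state="present"
else
    path_state="gone"
fi

# The data is still readable through the already-open descriptor.
contents="$(cat <&3)"
exec 3<&-
rmdir "$demo_dir"

echo "path: $path_state"
echo "$contents"
```

Note that this only covers files that are already open when the deletion happens. A reader that opens its input files one at a time could still find a later file missing.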
I've added a new option:
```
-D Delete all old dump subdirectories if the latest is downloaded
```
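
As a rough illustration of what such an option might do, here is a sketch that selects old dump subdirectories with a glob instead of parsing `ls` output. It is not the script's actual implementation: `list_old_dumps`, `demo_dir` and the example dates are hypothetical, and it assumes dump subdirectories are named as 8-digit dates.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the names of 8-digit subdirectories of $1, excluding the one named $2.
list_old_dumps() {
    local dump_dir="$1" latest="$2" path name
    for path in "$dump_dir"/*/; do
        name="$(basename "$path")"
        if [[ "$name" =~ ^[0-9]{8}$ && "$name" != "$latest" ]]; then
            printf '%s\n' "$name"
        fi
    done
}

# Build a throwaway directory tree that mimics a dump directory.
demo_dir="$(mktemp -d)"
mkdir "$demo_dir/20230701" "$demo_dir/20230720" "$demo_dir/20230801" "$demo_dir/notes"
ln -s 20230801 "$demo_dir/latest"

mapfile -t old_dumps < <(list_old_dumps "$demo_dir" "20230801")
for old_dump in "${old_dumps[@]}"; do
    # `:?` aborts instead of expanding to "/" if either variable is empty.
    rm -r "${demo_dir:?}/${old_dump:?}/"
done

if [ -d "$demo_dir/20230801" ] && [ -d "$demo_dir/notes" ]; then
    latest_kept="yes"
else
    latest_kept="no"
fi

rm -r "${demo_dir:?}"
echo "deleted: ${old_dumps[*]}"
echo "latest kept: $latest_kept"
```

The non-date directory and the `latest` symlink are left alone because neither name matches the 8-digit pattern.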

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.

Correct, I'll clarify that.
fi
log "Linking 'latest' to '$LATEST_DUMP'"
LATEST_LINK="$DUMP_DIR/latest"
ln -sf -T "$LATEST_DUMP" "$LATEST_LINK"
if [ "$DELETE_OLD_DUMPS" = true ]; then
# shellcheck disable=SC2010 # Only matching files with numeric names are used.
mapfile -t OLD_DUMPS < <(ls "$DUMP_DIR" | grep -P '^\d{8}$' | grep -vF "$LATEST_DUMP")
if [ "${#OLD_DUMPS[@]}" -gt 0 ]; then
log "Deleting old dumps" "${OLD_DUMPS[@]}"
for old_dump in "${OLD_DUMPS[@]}"; do
|
||||
![]() In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that). In case no new dumps are available, it should just make sure that the latest ones are already downloaded and exit gracefully (and print that).
![]()
`set -euxo pipefail` is helpful if decide to use pipes in the script.
![]() nit: fewer lines of code are easier to read.
nit: fewer lines of code are easier to read.
```suggestion
if [ -z "${LANGUAGES+}" ]; then
```
![]() nit: (here and below)
nit: (here and below)
```suggestion
for lang in $LANGUAGES; do
```
![]() TMPDIR? TMPDIR?
![]() get_wiki_dump.sh: line 11: 1: unbound variable get_wiki_dump.sh: line 11: 1: unbound variable
![]() Do you really need to store runs.html on disk and then clean it up? Do you really need to store runs.html on disk and then clean it up?
![]() Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem. Good point, I had it like that for POSIX sh because there's no pipefail. With bash it shouldn't be a problem.
Do you want the script to handle this?
If it will be running on a cron job, then it might be good to keep 2 copies around.
Otherwise the script could delete the last dump as wikiparser is using it?

1. Aren't files that were open before their deletion on Linux still accessible?
2. Dumps are produced regularly, right? We can set a specific schedule.
3. Script may have an option to automatically delete older dumps.

> 1. Aren't files that were open before their deletion on Linux still accessible?

You're right, as long as `run.sh` is started before `download.sh` deletes them, it will be able to access the files.

> 2. Dumps are produced regularly, right? We can set a specific schedule.

Yes, they're started on the 1st and the 20th of each month, and finished within 3 days it looks like.

> 3. Script may have an option to automatically delete older dumps.

:+1:

I've added a new option:
```
-D Delete all old dump subdirectories if the latest is downloaded
```

`-c 1`, `-c 2` and no option behave in the same way with wget2 installed.

Correct, I'll clarify that.
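The point about deleted files staying accessible can be checked with a small sketch like the one below. The temporary directory, file name and contents are stand-ins, not paths used by `download.sh` or `run.sh`. It only covers a file the reader has already opened; a reader that opens each archive as it gets to it could still find a later file missing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for a dump directory; the path and file contents are made up.
workdir="$(mktemp -d)"
dump="$workdir/dump.json"
printf 'line one\nline two\n' > "$dump"

# Open the file on descriptor 3, as a reader that has already started would.
exec 3< "$dump"

# Remove the directory entry, as a cleanup step would.
rm "$dump"
if [ -e "$dump" ]; then exists_after_rm="yes"; else exists_after_rm="no"; fi

# The data stays readable through the descriptor that was opened earlier.
contents="$(cat <&3)"

# Closing the last descriptor lets the filesystem reclaim the space.
exec 3<&-
rmdir "$workdir"

echo "exists after rm: $exists_after_rm"
echo "$contents"
```

If the cron schedule cannot guarantee that ordering, keeping the previous dump around (that is, not passing `-D`) is the more conservative choice.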
rm -r "${DUMP_DIR:?}/${old_dump:?}/"
done
else
log "No old dumps to delete"
fi
fi
2
run.sh
@ -36,7 +36,7 @@ set -euo pipefail
 while getopts "h" opt
 do
     case $opt in
-        h) echo -n "$USAGE" >&2; exit 0;;
+        h) echo -n "$USAGE"; exit 0;;
         ?) echo "$USAGE" | head -n1 >&2; exit 1;;
     esac
 done
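The `run.sh` change above sends explicitly requested help to stdout while leaving the error hint on stderr. A minimal sketch of that convention follows; the `USAGE` text and the `parse_args` wrapper are hypothetical and exist only so the behaviour can be exercised without exiting the shell (the real script calls `exit` directly).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical usage text; the real run.sh defines its own USAGE.
USAGE="Usage: example.sh [-h]

Options:
    -h      Print this help and exit
"

# Hypothetical wrapper so the parsing can be exercised without exiting the shell.
parse_args() {
    local opt OPTIND=1
    while getopts "h" opt; do
        case $opt in
            # Help was requested explicitly: print to stdout and succeed.
            h) echo -n "$USAGE"; return 0;;
            # Unknown option: short hint on stderr and a non-zero status.
            ?) echo "$USAGE" | head -n1 >&2; return 1;;
        esac
    done
}

help_stdout="$(parse_args -h 2>/dev/null)"
help_stderr="$(parse_args -h 2>&1 >/dev/null)"

if parse_args -x >/dev/null 2>&1; then
    bad_status=0
else
    bad_status=$?
fi

echo "help on stdout: ${#help_stdout} bytes, on stderr: ${#help_stderr} bytes, bad option status: $bad_status"
```

Printing requested help to stdout lets it be piped to a pager or `grep`, while a usage error on stderr does not end up in captured output.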