Add Korean language #42

Merged
lens0021 merged 1 commit from patch-1 into main 2024-04-21 12:56:24 +00:00
lens0021 commented 2024-04-20 09:14:49 +00:00 (Migrated from github.com)
No description provided.
biodranik (Migrated from github.com) reviewed 2024-04-20 09:29:32 +00:00
biodranik (Migrated from github.com) left a comment

Thanks! Did you test it?

@vng where else should the language be enabled?
lens0021 commented 2024-04-21 00:27:11 +00:00 (Migrated from github.com)

I will test this soon.

![image](https://github.com/organicmaps/wikiparser/assets/28209361/b257c867-018f-47a1-8361-ae7c99116a01)
lens0021 commented 2024-04-21 11:27:07 +00:00 (Migrated from github.com)

I could not find a PBF extract for Korea. Should I download the 74.2 GB Planet.osm file to test, too? 😂
lens0021 commented 2024-04-21 12:16:55 +00:00 (Migrated from github.com)

OK, now I've tested this patch and confirmed that the generated HTML files don't have the unwanted sections:

![image](https://github.com/organicmaps/wikiparser/assets/28209361/76aad4bc-7520-4e92-a57f-bf59f120d470)

How I parsed it:

1. Downloaded south-korea-latest.osm.pbf from https://download.geofabrik.de/asia/south-korea.html
2. Downloaded kowiki-NS0-20240420-ENTERPRISE-HTML.json.tar.gz from https://dumps.wikimedia.org/other/enterprise_html/runs/20240420/
3. ```console
   $ rustc --version
   rustc 1.73.0 (cc66ad468 2023-10-03)
   $ cargo run --release --
   $ ls
   article_processing_config.json
   benches/
   build.rs
   Cargo.lock
   Cargo.toml
   download.sh*
   ko.tsv
   kowiki-NS0-20240420-ENTERPRISE-HTML.json.tar.gz
   lib.sh
   LICENSE
   README.md
   run.sh*
   south-korea-latest.osm.pbf
   src/
   target/
   tests/
   ```
4. ```console
   $ target/release/om-wikiparser get-tags south-korea-latest.osm.pbf > ko.tsv
   $ head -n 5 ko.tsv
   @id     @otype  @version        wikidata        wikipedia
   301775326       0       11      Q495739 en:Axe murder incident
   306111080       0       2       Q12621551       ko:판암 나들목
   309675985       0       4       Q16093933       ko:경주 나들목
   309947253       0       42      Q42147  en:Cheongju
   $ mkdir descriptions
   $ tar -xvzf kowiki-NS0-20240420-ENTERPRISE-HTML.json.tar.gz
   $ mkdir dumps
   $ mv kowiki_namespace_0_* dumps
   ```
5. ```console
   $ cat dumps/kowiki_namespace_0_0.ndjson | target/release/om-wikiparser get-articles --osm-tags ko.tsv --write-new-qids new_qids.txt descriptions/
   ```
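As an additional spot-check, one could grep the output for a section heading that should have been stripped. This is only a sketch: the exact layout under `descriptions/` and the configured Korean section titles (e.g. 외부 링크) are assumptions here, not taken from the patch.

```console
$ # Hypothetical spot-check: search the extracted articles for a heading that should have been removed.
$ # Assumes "외부 링크" ("External links") is one of the configured Korean section titles.
$ grep -rl '외부 링크' descriptions/ | head
$ # No matches would indicate the section was stripped from the generated HTML.
```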
biodranik (Migrated from github.com) approved these changes 2024-04-21 12:56:18 +00:00
biodranik (Migrated from github.com) left a comment

Thanks! Let's see if it works without any additional changes in the generator code. Feel free to also add other wiki sections which are not necessary for a quick overview of the place.
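For anyone adding more sections later, here is a purely hypothetical sketch of what extending the Korean entry in `article_processing_config.json` might look like. The key name and overall schema are assumptions (check the real file before copying this); the titles correspond to "References", "See also", "External links", and "Bibliography".

```json
{
  "sections_to_remove": {
    "ko": [
      "각주",
      "같이 보기",
      "외부 링크",
      "참고 문헌"
    ]
  }
}
```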
newsch commented 2024-04-21 16:40:23 +00:00 (Migrated from github.com)

@lens0021 thanks for the contribution and testing so thoroughly!
Did you have any trouble using it? Is there anything that would be helpful to add to the docs?

lens0021 commented 2024-04-21 17:24:40 +00:00 (Migrated from github.com)

> Did you have any trouble using it? Is there anything that would be helpful to add to the docs?

At first, I didn't know how to run the maps build described at:

https://github.com/organicmaps/wikiparser/blob/19d9f2c42aec2da638be81364021cacd6d22be3a/README.md?plain=1#L139

So I tried the alternative way. But I didn't know what a PBF file was. After reading https://wiki.openstreetmap.org/wiki/Planet.osm, I still wasn't sure whether there was an extract file for my country or not. Fortunately, I found it on Google.

Oh, and I didn't expect the config file to be read at build time. I thought the CLI read it.
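In case it trips up anyone else: since there is a `build.rs` in the repository, the config is presumably embedded into the binary at compile time, so — assuming that reading is correct — edits to `article_processing_config.json` only take effect after a rebuild:

```console
$ # Assumption: the config is baked in at build time, so rebuild after editing it.
$ $EDITOR article_processing_config.json
$ cargo build --release
```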
