Evan Lloyd New-Schmidt
382d351740
Write to generator-compatible folder structure

The map generator expects the folder structure produced by the current
scraper in order to add article content to the mwm files.

- Article HTML is written to the wikidata directory.
- A directory is created for each matched title and symlinked to the
  wikidata directory.
- Articles without a QID are written to the article title directory.
- Article titles containing `/` are not escaped, so a single title can
  produce multiple nested subdirectories.

The output folder hierarchy looks like this:
.
├── de.wikipedia.org
│   └── wiki
│       ├── Coal_River_Springs_Territorial_Park
│       │   ├── de.html
│       │   └── ru.html
│       ├── Ni'iinlii_Njik_(Fishing_Branch)_Territorial_Park
│       │   ├── de.html
│       │   └── en.html
│       ...
├── en.wikipedia.org
│   └── wiki
│       ├── Arctic_National_Wildlife_Refuge
│       │   ├── de.html
│       │   ├── en.html
│       │   ├── es.html
│       │   ├── fr.html
│       │   └── ru.html
│       ├── Baltimore
│       │   └── Washington_International_Airport
│       │       ├── de.html
│       │       ├── en.html
│       │       ├── es.html
│       │       ├── fr.html
│       │       └── ru.html
│       ...
└── wikidata
    ├── Q59320
    │   ├── de.html
    │   ├── en.html
    │   ├── es.html
    │   ├── fr.html
    │   └── ru.html
    ├── Q120306
    │   ├── de.html
    │   ├── en.html
    │   ├── es.html
    │   ├── fr.html
    │   └── ru.html
    ...
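
The layout rules above can be illustrated with a minimal Python sketch.
It is not the code from this commit: the function name `write_article`,
its parameters, and the use of relative symlinks are assumptions made
for illustration only. It also assumes titles already use underscores
and do not start with `/`.

```python
import os
from pathlib import Path
from typing import Optional


def write_article(base: Path, lang: str, title: str, html: str,
                  qid: Optional[str] = None) -> Path:
    """Write one article's HTML into the folder layout described above.

    Hypothetical helper, not the project's actual API.
    """
    # Titles are not escaped, so a "/" in the title yields nested directories,
    # e.g. "Baltimore/Washington_International_Airport".
    title_dir = base / f"{lang}.wikipedia.org" / "wiki" / title

    if qid is None:
        # No QID: the HTML lives directly in the article title directory.
        target = title_dir
        target.mkdir(parents=True, exist_ok=True)
    else:
        # With a QID: the HTML lives in the wikidata directory and the
        # title directory is a symlink pointing at it.
        target = base / "wikidata" / qid
        target.mkdir(parents=True, exist_ok=True)
        if not os.path.lexists(title_dir):
            title_dir.parent.mkdir(parents=True, exist_ok=True)
            # Assumption: a relative link, so the tree can be moved as a whole.
            rel = os.path.relpath(target, title_dir.parent)
            title_dir.symlink_to(rel, target_is_directory=True)

    out = target / f"{lang}.html"
    out.write_text(html, encoding="utf-8")
    return out
```

Calling `write_article(base, "en", "Baltimore/Washington_International_Airport", html, qid="Q1")`
with a placeholder QID would write `wikidata/Q1/en.html` and create
`en.wikipedia.org/wiki/Baltimore/Washington_International_Airport` as a
symlink to that directory.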
Signed-off-by: Evan Lloyd New-Schmidt <evan@new-schmidt.com>
2023-07-10 10:29:49 -04:00 |