Commit graph

  • af80f2ad75 Check for DUMP_DIR existence Evan Lloyd New-Schmidt 2023-08-16 17:14:51 -04:00
  • 4c2c6e97ff Rename temp dir Evan Lloyd New-Schmidt 2023-08-16 17:11:56 -04:00
  • 5077ed02f2 Document usage Evan Lloyd New-Schmidt 2023-08-16 17:09:43 -04:00
  • bce44d1ab9 Store in subdirs Evan Lloyd New-Schmidt 2023-08-16 16:44:23 -04:00
  • 54727b968d Working downloads Evan Lloyd New-Schmidt 2023-08-16 16:27:35 -04:00
  • 27ff9cb4dc Track number of missing dumps Evan Lloyd New-Schmidt 2023-08-16 14:42:43 -04:00
  • 0a1e0592ff Use real enterprise dump url Evan Lloyd New-Schmidt 2023-08-16 14:42:03 -04:00
  • fe295b2379 Fix jq list output Evan Lloyd New-Schmidt 2023-08-16 14:41:19 -04:00
  • bae03b91c8 Improve comments Evan Lloyd New-Schmidt 2023-08-16 14:10:35 -04:00
  • 9ee1e8d594 Fix check for uninitialized variable Evan Lloyd New-Schmidt 2023-08-16 14:09:52 -04:00
  • 3a4d1214dc Canonicalize input paths Evan Lloyd New-Schmidt 2023-08-16 14:08:59 -04:00
  • 7254bc3ec8 Add requested changes Evan Lloyd New-Schmidt 2023-07-25 12:07:09 -04:00
  • 28c17a28eb WIP download script Evan Lloyd New-Schmidt 2023-07-18 15:26:50 -04:00
  • c7fe34f3ad Remove header ids Evan Lloyd New-Schmidt 2023-08-15 18:06:56 -04:00
  • b96c2cf4db Refactor simplification Evan Lloyd New-Schmidt 2023-08-15 17:25:51 -04:00
  • 6c02f4a569 Remove coordinates from output Evan Lloyd New-Schmidt 2023-08-15 10:42:25 -04:00
  • c4028e52fa Preserve excerpts Evan Lloyd New-Schmidt 2023-08-10 16:59:43 -04:00
  • 3d3ecb52b2 Minify whitespace between elements Evan Lloyd New-Schmidt 2023-08-15 16:19:16 -04:00
  • 81783695d5 Remove doctype and html element Evan Lloyd New-Schmidt 2023-08-10 16:27:10 -04:00
  • 58f32b43fd Remove empty sections after other removals Evan Lloyd New-Schmidt 2023-08-15 18:36:02 -04:00
  • cc3ae9b629 Remove "(listen)" text Evan Lloyd New-Schmidt 2023-08-15 18:35:14 -04:00
  • 81f528a350 Expand spans, sections, and body after removing head Evan Lloyd New-Schmidt 2023-08-15 16:15:34 -04:00
  • 0a0a94b484 Remove comments Evan Lloyd New-Schmidt 2023-08-15 16:12:07 -04:00
  • 4b776f49d4 Add denylist from Extracts API Evan Lloyd New-Schmidt 2023-08-10 09:30:21 -04:00
  • 75fa04407d Add snapshot tests for html output Evan Lloyd New-Schmidt 2023-08-15 16:03:07 -04:00
  • 32cd084f3f Add simplification logging Evan Lloyd New-Schmidt 2023-08-10 11:41:05 -04:00
  • c9eb7a160a Add option to not simplify when extracting Evan Lloyd New-Schmidt 2023-08-10 10:13:19 -04:00
  • 06f3e63276 Remove header ids Evan Lloyd New-Schmidt 2023-08-15 18:06:56 -04:00
  • 8ae2597c5b Refactor simplification Evan Lloyd New-Schmidt 2023-08-15 17:25:51 -04:00
  • 4be39acdd3 Remove coordinates from output Evan Lloyd New-Schmidt 2023-08-15 10:42:25 -04:00
  • a143036283 Preserve excerpts Evan Lloyd New-Schmidt 2023-08-10 16:59:43 -04:00
  • affe164d86 Minify whitespace between elements Evan Lloyd New-Schmidt 2023-08-15 16:19:16 -04:00
  • 099072a459 Remove doctype and html element Evan Lloyd New-Schmidt 2023-08-10 16:27:10 -04:00
  • 4c3643576c Remove empty sections after other removals Evan Lloyd New-Schmidt 2023-08-15 18:36:02 -04:00
  • 6e73cc1675 Remove "(listen)" text Evan Lloyd New-Schmidt 2023-08-15 18:35:14 -04:00
  • 28790990a9 Expand spans, sections, and body after removing head Evan Lloyd New-Schmidt 2023-08-15 16:15:34 -04:00
  • 8396c12690 Remove comments Evan Lloyd New-Schmidt 2023-08-15 16:12:07 -04:00
  • f9fac73a3d Add denylist from Extracts API Evan Lloyd New-Schmidt 2023-08-10 09:30:21 -04:00
  • b765c07b83 Add snapshot tests for html output Evan Lloyd New-Schmidt 2023-08-15 16:03:07 -04:00
  • 4b917144a1 Add simplification logging Evan Lloyd New-Schmidt 2023-08-10 11:41:05 -04:00
  • c2d4ded75a Add option to not simplify when extracting Evan Lloyd New-Schmidt 2023-08-10 10:13:19 -04:00
  • 941d2b1032 Structure parse errors and only log warning if above threshold Evan Lloyd New-Schmidt 2023-08-09 14:10:40 -04:00
  • 34bb9318d5 Refactor and rename title/qid wrappers Evan Lloyd New-Schmidt 2023-08-09 12:09:40 -04:00
  • bdf6f1a68c Improve url handling Evan Lloyd New-Schmidt 2023-08-08 14:51:32 -04:00
  • 6d242a62aa Extract tags in parallel in rust Evan Lloyd New-Schmidt 2023-08-08 13:12:26 -04:00
  • b6db70f74c Refactor into subcommands Evan Lloyd New-Schmidt 2023-08-07 17:40:32 -04:00
  • 5df2d8d243 Add new option to parse osm tag file Evan Lloyd New-Schmidt 2023-08-02 18:55:53 -04:00
  • b250dd4b13 Structure parse errors and only log warning if above threshold Evan Lloyd New-Schmidt 2023-08-09 14:10:40 -04:00
  • 29cdbe2301 Refactor and rename title/qid wrappers Evan Lloyd New-Schmidt 2023-08-09 12:09:40 -04:00
  • 2532d1365e Improve url handling Evan Lloyd New-Schmidt 2023-08-08 14:51:32 -04:00
  • 3d48c39793 Extract tags in parallel in rust Evan Lloyd New-Schmidt 2023-08-08 13:12:26 -04:00
  • 0ac935c175 Refactor into subcommands Evan Lloyd New-Schmidt 2023-08-07 17:40:32 -04:00
  • a2c113a885 Add new option to parse osm tag file Evan Lloyd New-Schmidt 2023-08-02 18:55:53 -04:00
  • 0fc43767aa Add script Evan Lloyd New-Schmidt 2023-07-13 13:59:46 -04:00
  • d6e892343b Keep charset tags Evan Lloyd New-Schmidt 2023-08-04 17:43:30 -04:00
  • ac556bd3d4 Save and log build commit Evan Lloyd New-Schmidt 2023-07-14 11:15:26 -04:00
  • aa213fbece Make new qid writes atomic Evan Lloyd New-Schmidt 2023-07-14 16:29:26 -04:00
  • b722e7d837 Add script Evan Lloyd New-Schmidt 2023-07-13 13:59:46 -04:00
  • e495afa743 Keep charset tags Evan Lloyd New-Schmidt 2023-08-04 17:43:30 -04:00
  • b924301bfe Save and log build commit Evan Lloyd New-Schmidt 2023-07-14 11:15:26 -04:00
  • 8fe572b7a2 Make new qid writes atomic Evan Lloyd New-Schmidt 2023-07-14 16:29:26 -04:00
  • 75f4f6a21b Add option to dump new QIDs (#20) Evan Lloyd New-Schmidt 2023-07-13 14:04:52 -04:00
  • 308c932f08 Remove old TODO Evan Lloyd New-Schmidt 2023-07-13 11:33:03 -04:00
  • a3cc7df3a8 Clarify CLI options Evan Lloyd New-Schmidt 2023-07-13 11:31:25 -04:00
  • c8ca2f8ef7 Return error if no filters are provided Evan Lloyd New-Schmidt 2023-07-13 11:30:40 -04:00
  • ca972b1b87 Log errors parsing filter files instead of failing Evan Lloyd New-Schmidt 2023-07-11 12:17:19 -04:00
  • 7d287bd5a4 Add option to dump new QIDs to file Evan Lloyd New-Schmidt 2023-07-11 12:02:19 -04:00
  • 45efd77c0d Remove images and links Evan Lloyd New-Schmidt 2023-06-29 15:41:03 -04:00
  • 8ec696ceae Remove images and links Evan Lloyd New-Schmidt 2023-06-29 15:41:03 -04:00
  • 9036e3413f Write to generator-compatible folder structure (#6) Evan Lloyd New-Schmidt 2023-07-10 10:34:20 -04:00
  • 382d351740 Write to generator-compatible folder structure Evan Lloyd New-Schmidt 2023-06-22 18:46:00 -04:00
  • bb1f897cd2 Add checks for whitespace/empty strings in ids and titles Evan Lloyd New-Schmidt 2023-06-23 12:05:45 -04:00
  • 0a0317538c Rewrite comments as sentences for readability Evan Lloyd New-Schmidt 2023-06-23 11:30:55 -04:00
  • 8435682ddf Add support for multiple languages Evan Lloyd New-Schmidt 2023-06-07 15:55:18 -04:00
  • 35faadc693 Optimize wikipedia title parsing Evan Lloyd New-Schmidt 2023-06-06 14:00:07 -04:00
  • f12e8d802c Add id parsing benchmarks Evan Lloyd New-Schmidt 2023-06-06 13:15:58 -04:00
  • d55d3cc7e0 Initial parsing and processing Evan Lloyd New-Schmidt 2023-06-01 10:14:46 -04:00
  • 171cfec10f Add checks for whitespace/empty strings in ids and titles Evan Lloyd New-Schmidt 2023-06-23 12:05:45 -04:00
  • cd67f0e7cc Rewrite comments as sentences for readability Evan Lloyd New-Schmidt 2023-06-23 11:30:55 -04:00
  • dcbc828884 Add support for multiple languages Evan Lloyd New-Schmidt 2023-06-07 15:55:18 -04:00
  • 00256a309e Optimize wikipedia title parsing Evan Lloyd New-Schmidt 2023-06-06 14:00:07 -04:00
  • 00a199e20c Add id parsing benchmarks Evan Lloyd New-Schmidt 2023-06-06 13:15:58 -04:00
  • 34ce30301c Initial parsing and processing Evan Lloyd New-Schmidt 2023-06-01 10:14:46 -04:00
  • aba31775fa Setup GitHub (#2) Evan Lloyd New-Schmidt 2023-06-01 03:25:35 -04:00
  • 97de15d136 Use a better filename Evan Lloyd New-Schmidt 2023-05-31 17:55:35 -04:00
  • 17e8f22c94 Remove unused matrix testing key Evan Lloyd New-Schmidt 2023-05-31 17:53:29 -04:00
  • 454a3cad43 Ignore non-rust files Evan Lloyd New-Schmidt 2023-05-31 17:51:44 -04:00
  • ebf28240e8 Apply suggestions from code review Evan Lloyd New-Schmidt 2023-05-31 15:48:30 -04:00
  • ddcfc1f3d9 Add more context to cache prefix-key Evan Lloyd New-Schmidt 2023-05-31 09:31:22 -04:00
  • 74d4c5e12a Remove explicit rustup install Evan Lloyd New-Schmidt 2023-05-31 09:30:28 -04:00
  • c594288544 Fix formatting Evan Lloyd New-Schmidt 2023-05-30 17:21:26 -04:00
  • c38870a3a0 Add CI tests Evan Lloyd New-Schmidt 2023-05-30 16:34:44 -04:00
  • 5991270650 Fix license identifier Evan Lloyd New-Schmidt 2023-05-30 13:07:46 -04:00
  • ddf6028465 Initial rust setup (#1) Evan Lloyd New-Schmidt 2023-05-30 13:00:05 -04:00
  • 25e471b0c1 Update README.md Alexander Borsuk 2023-05-30 18:59:44 +02:00
  • bf08579dc4 Initial rust setup Evan Lloyd New-Schmidt 2023-05-30 11:49:26 -04:00
  • f72e380d11 Initial commit Alexander Borsuk 2023-05-30 16:01:35 +02:00