[tools] Fix map file stats and add outer geometry stats #2935

Merged
root merged 4 commits from pastk-tools-mwmstats into master 2022-07-10 21:26:16 +00:00
Owner
  1. Fixed some wrong stats, e.g. inner geometry sizes were double-calculated and only the best outer geometry was taken into account. Improved labelling and output formatting.
  2. Added detailed outer geometry stats.

Sample run cmd:

../omim-build-release/generator_tool --stats_general=true --stats_geometry=true --stats_geometry_dup_factor=2.0 --stats_types=true --output="Russia_Mari El"

Sample stats:

File section sizes
              addr :     101929
         altitudes :     341744
           centers :    1150880
        city_roads :      20416
         cross_mwm :     641444
      descriptions :    1429959
          features :    7129219
             geom0 :      14355
             geom1 :      78810
             geom2 :    1068665
             geom3 :    5885068
            header :         32
               idx :    1077635
     isolines_info :          7
         maxspeeds :       4257
              meta :     215981
         metalines :       9796
              offs :     306440
             ranks :      62456
      restrictions :       2914
           rgninfo :          4
        roadaccess :       3300
           routing :     183510
               sdx :     570511
         speedcams :         47
 transit_cross_mwm :         29
              trg0 :     203673
              trg1 :     859230
              trg2 :    2075253
              trg3 :    4299267
           version :         10

               Feature headers: size =   6813593; features =  315361
            incl. inner points: size =   1661890; features =   75621
incl. inner triangles (strips): size =   2507856; features =  169044

Top SIZE by Geometry Type
 0.  Area: size =  10865588; features =  193563; w/names =     2296
 1.  Line: size =  10137284; features =  103928; w/names =    54853
 3. Point: size =    295042; features =   17870; w/names =     5217

Top SIZE by Classificator Type
(a single feature's size may be included in several types)
 0.                isoline-step_10: size =   4649323; features =   33970; w/names =    33970
 1.                 landuse-forest: size =   4461004; features =   14977; w/names =      121
 3.                       building: size =   2984414; features =  158290; w/names =     1160
 4.               isoline-step_100: size =   1980735; features =   12448; w/names =    12448
 5.                  highway-track: size =    803415; features =   11179; w/names =       57
 6.               landuse-farmland: size =    623214; features =    1464; w/names =        1
 7.            natural-water-river: size =    475228; features =     133; w/names =       12
 8.            highway-residential: size =    459388; features =    7581; w/names =     4786
 9.                natural-wetland: size =    443629; features =    2522; w/names =       44
 10.                  natural-water: size =    431182; features =    3470; w/names =      143

Top SIZE by Points Count
 0.   500: size =   2914034; features =    1983
 1.     7: size =    338663; features =   14444
 2.     9: size =    190114; features =    6756
 4.    10: size =    132103; features =    4244
 5.    11: size =    130486; features =    3864
 6.     5: size =    122763; features =    6963
 7.    15: size =    113772; features =    2154
 8.     2: size =    108608; features =   11812
 9.    12: size =    106238; features =    2904
 10.    13: size =    102135; features =    2617

Top SIZE by Triangles Count
 0.     2: size =   1563876; features =  126426
 1.     4: size =    456436; features =   25399
 2.     6: size =    251911; features =    9389
 3.     8: size =    120370; features =    3453
 4.    10: size =    111020; features =    2433
 5.    16: size =     89001; features =     855
 6. 19675: size =     88770; features =       1
 7.  9893: size =     85142; features =       1
 8.    15: size =     84904; features =     840
 9.    12: size =     83249; features =    1461

Top SIZE by Area
 0.    10: size =  19264017; features =  315123; w/names =    62338
 1.   100: size =    488214; features =      40; w/names =        9
 2.   500: size =    430128; features =      17; w/names =        3
 3.    50: size =    428084; features =      65; w/names =        6
 4.   200: size =    363125; features =      25; w/names =        4
 5.    20: size =    292968; features =      90; w/names =        6
 6.  5000: size =     31378; features =       1; w/names =        0

Feature stats by Classificator Type
(a single feature can contain several types and thus its size can be included in several type lines)
                       amenity: size =      2698; features =      46; length =          0 m; area =      22360 m²; w/names =       39
                      building: size =   2984414; features =  158290; length =          0 m; area =   35293862 m²; w/names =     1160
                 building:part: size =     18365; features =     722; length =          0 m; area =     241150 m²; w/names =       17
                      entrance: size =      6207; features =     623; length =          0 m; area =          0 m²; w/names =      177
               internet_access: size =        38; features =       1; length =          0 m; area =          0 m²; w/names =        1
                        office: size =       716; features =      22; length =          0 m; area =       1599 m²; w/names =       21
                          shop: size =      4973; features =     206; length =          0 m; area =       4372 m²; w/names =      185
             aeroway-aerodrome: size =      1377; features =       3; length =          0 m; area =    5421347 m²; w/names =        2
           amenity-arts_centre: size =       645; features =       8; length =          0 m; area =       6422 m²; w/names =        7
                 barrier-block: size =       226; features =      21; length =          0 m; area =          0 m²; w/names =        1
              building-address: size =      3713; features =     354; length =          0 m; area =          0 m²; w/names =        0
               craft-beekeeper: size =       227; features =      20; length =          0 m; area =       3409 m²; w/names =        0
         healthcare-laboratory: size =       158; features =       3; length =          0 m; area =          0 m²; w/names =        3
             highway-bridleway: size =       490; features =       6; length =      22916 m; area =          0 m²; w/names =        0
  historic-archaeological_site: size =        90; features =       2; length =          0 m; area =          0 m²; w/names =        2
          internet_access-wlan: size =      1876; features =      40; length =          0 m; area =       3223 m²; w/names =       27
               isoline-step_10: size =   4649323; features =   33970; length =   39097426 m; area =          0 m²; w/names =    33970

...

Sample outer geometry stats:

Outer LINE geometry
   geom0: size =     14355: elements =      4751; features =     650; elems/feats =   7.3; bytes/elems =  3.0
geom1w/0: size =     48764: elements =     16096; feats w/geom0 =     650; elems/feats =  24.8; size factor =  3.4x; elems factor =  3.4x
   geom1: size =     78810: elements =     26988; features =    3262; elems/feats =   8.3; bytes/elems =  2.9
geom2w/1: size =    174512: elements =     59613; feats w/geom1 =    3262; elems/feats =  18.3; size factor =  2.2x; elems factor =  2.2x
   geom2: size =   1068665: elements =    374913; features =   15426; elems/feats =  24.3; bytes/elems =  2.9
geom3w/2: size =   2249686: elements =    843519; feats w/geom2 =   15426; elems/feats =  54.7; size factor =  2.1x; elems factor =  2.2x
   geom3: size =   5885068: elements =   2209425; features =   28307; elems/feats =  78.1; bytes/elems =  2.7
Geometry almost duplicating (<1.5x less elements) a more detailed one
geom0~=1: size =      2065: elements =       689; features =     135; elems/feats =   5.1; dups size % = 14%
geom1~=2: size =     11554: elements =      4008; features =     777; elems/feats =   5.2; dups size % = 14%
geom2~=3: size =    214457: elements =     73798; features =    2534; elems/feats =  29.1; dups size % = 20%

Outer AREA geometry
   trg0: size =    203673: elements =     40681; features =    3355; elems/feats =  12.1; bytes/elems =  5.0
trg1w/0: size =    604080: elements =    155702; feats w/trg0 =    3355; elems/feats =  46.4; size factor =  3.0x; elems factor =  3.8x
   trg1: size =    859230: elements =    204763; features =   10289; elems/feats =  19.9; bytes/elems =  4.2
trg2w/1: size =   1861674: elements =    504368; feats w/trg1 =   10289; elems/feats =  49.0; size factor =  2.2x; elems factor =  2.5x
   trg2: size =   2075253: elements =    545051; features =   18169; elems/feats =  30.0; bytes/elems =  3.8
trg3w/2: size =   4096561: elements =   1174955; feats w/trg2 =   18169; elems/feats =  64.7; size factor =  2.0x; elems factor =  2.2x
   trg3: size =   4299267: elements =   1227069; features =   24519; elems/feats =  50.0; bytes/elems =  3.5
Geometry almost duplicating (<1.5x less elements) a more detailed one
trg0~=1: size =      1304: elements =       200; features =      44; elems/feats =   4.5; dups size % =  0%
trg1~=2: size =    137969: elements =     35611; features =     753; elems/feats =  47.3; dups size % = 16%
trg2~=3: size =   1066718: elements =    281620; features =    5185; elems/feats =  54.3; dups size % = 51%

E.g. geom3w/2 contains geom3 stats for features that have geom2 also, hence we can compare actual effect of geometry simplification (in terms of data size and points/triangles counts).

Almost duplicating geometry stats give an idea how often geometry simplification doesn't result in much size/complexity difference. E.g. its useful for #2927 analysis. I think a 1.5x threshold is quite conservative (as seen in this sample an avg complexity simplification factor is >= 2.2x).

1. Fixed some wrong stats, e.g. inner geometry sizes were double-calculated and only the best outer geometry was taken into account. Improved labelling and output formatting. 2. Added detailed outer geometry stats. Sample run cmd: ``` ../omim-build-release/generator_tool --stats_general=true --stats_geometry=true --stats_geometry_dup_factor=2.0 --stats_types=true --output="Russia_Mari El" ``` Sample stats: ``` File section sizes addr : 101929 altitudes : 341744 centers : 1150880 city_roads : 20416 cross_mwm : 641444 descriptions : 1429959 features : 7129219 geom0 : 14355 geom1 : 78810 geom2 : 1068665 geom3 : 5885068 header : 32 idx : 1077635 isolines_info : 7 maxspeeds : 4257 meta : 215981 metalines : 9796 offs : 306440 ranks : 62456 restrictions : 2914 rgninfo : 4 roadaccess : 3300 routing : 183510 sdx : 570511 speedcams : 47 transit_cross_mwm : 29 trg0 : 203673 trg1 : 859230 trg2 : 2075253 trg3 : 4299267 version : 10 Feature headers: size = 6813593; features = 315361 incl. inner points: size = 1661890; features = 75621 incl. inner triangles (strips): size = 2507856; features = 169044 Top SIZE by Geometry Type 0. Area: size = 10865588; features = 193563; w/names = 2296 1. Line: size = 10137284; features = 103928; w/names = 54853 3. Point: size = 295042; features = 17870; w/names = 5217 Top SIZE by Classificator Type (a single feature's size may be included in several types) 0. isoline-step_10: size = 4649323; features = 33970; w/names = 33970 1. landuse-forest: size = 4461004; features = 14977; w/names = 121 3. building: size = 2984414; features = 158290; w/names = 1160 4. isoline-step_100: size = 1980735; features = 12448; w/names = 12448 5. highway-track: size = 803415; features = 11179; w/names = 57 6. landuse-farmland: size = 623214; features = 1464; w/names = 1 7. natural-water-river: size = 475228; features = 133; w/names = 12 8. highway-residential: size = 459388; features = 7581; w/names = 4786 9. natural-wetland: size = 443629; features = 2522; w/names = 44 10. natural-water: size = 431182; features = 3470; w/names = 143 Top SIZE by Points Count 0. 500: size = 2914034; features = 1983 1. 7: size = 338663; features = 14444 2. 9: size = 190114; features = 6756 4. 10: size = 132103; features = 4244 5. 11: size = 130486; features = 3864 6. 5: size = 122763; features = 6963 7. 15: size = 113772; features = 2154 8. 2: size = 108608; features = 11812 9. 12: size = 106238; features = 2904 10. 13: size = 102135; features = 2617 Top SIZE by Triangles Count 0. 2: size = 1563876; features = 126426 1. 4: size = 456436; features = 25399 2. 6: size = 251911; features = 9389 3. 8: size = 120370; features = 3453 4. 10: size = 111020; features = 2433 5. 16: size = 89001; features = 855 6. 19675: size = 88770; features = 1 7. 9893: size = 85142; features = 1 8. 15: size = 84904; features = 840 9. 12: size = 83249; features = 1461 Top SIZE by Area 0. 10: size = 19264017; features = 315123; w/names = 62338 1. 100: size = 488214; features = 40; w/names = 9 2. 500: size = 430128; features = 17; w/names = 3 3. 50: size = 428084; features = 65; w/names = 6 4. 200: size = 363125; features = 25; w/names = 4 5. 20: size = 292968; features = 90; w/names = 6 6. 5000: size = 31378; features = 1; w/names = 0 Feature stats by Classificator Type (a single feature can contain several types and thus its size can be included in several type lines) amenity: size = 2698; features = 46; length = 0 m; area = 22360 m²; w/names = 39 building: size = 2984414; features = 158290; length = 0 m; area = 35293862 m²; w/names = 1160 building:part: size = 18365; features = 722; length = 0 m; area = 241150 m²; w/names = 17 entrance: size = 6207; features = 623; length = 0 m; area = 0 m²; w/names = 177 internet_access: size = 38; features = 1; length = 0 m; area = 0 m²; w/names = 1 office: size = 716; features = 22; length = 0 m; area = 1599 m²; w/names = 21 shop: size = 4973; features = 206; length = 0 m; area = 4372 m²; w/names = 185 aeroway-aerodrome: size = 1377; features = 3; length = 0 m; area = 5421347 m²; w/names = 2 amenity-arts_centre: size = 645; features = 8; length = 0 m; area = 6422 m²; w/names = 7 barrier-block: size = 226; features = 21; length = 0 m; area = 0 m²; w/names = 1 building-address: size = 3713; features = 354; length = 0 m; area = 0 m²; w/names = 0 craft-beekeeper: size = 227; features = 20; length = 0 m; area = 3409 m²; w/names = 0 healthcare-laboratory: size = 158; features = 3; length = 0 m; area = 0 m²; w/names = 3 highway-bridleway: size = 490; features = 6; length = 22916 m; area = 0 m²; w/names = 0 historic-archaeological_site: size = 90; features = 2; length = 0 m; area = 0 m²; w/names = 2 internet_access-wlan: size = 1876; features = 40; length = 0 m; area = 3223 m²; w/names = 27 isoline-step_10: size = 4649323; features = 33970; length = 39097426 m; area = 0 m²; w/names = 33970 ... ``` Sample outer geometry stats: ``` Outer LINE geometry geom0: size = 14355: elements = 4751; features = 650; elems/feats = 7.3; bytes/elems = 3.0 geom1w/0: size = 48764: elements = 16096; feats w/geom0 = 650; elems/feats = 24.8; size factor = 3.4x; elems factor = 3.4x geom1: size = 78810: elements = 26988; features = 3262; elems/feats = 8.3; bytes/elems = 2.9 geom2w/1: size = 174512: elements = 59613; feats w/geom1 = 3262; elems/feats = 18.3; size factor = 2.2x; elems factor = 2.2x geom2: size = 1068665: elements = 374913; features = 15426; elems/feats = 24.3; bytes/elems = 2.9 geom3w/2: size = 2249686: elements = 843519; feats w/geom2 = 15426; elems/feats = 54.7; size factor = 2.1x; elems factor = 2.2x geom3: size = 5885068: elements = 2209425; features = 28307; elems/feats = 78.1; bytes/elems = 2.7 Geometry almost duplicating (<1.5x less elements) a more detailed one geom0~=1: size = 2065: elements = 689; features = 135; elems/feats = 5.1; dups size % = 14% geom1~=2: size = 11554: elements = 4008; features = 777; elems/feats = 5.2; dups size % = 14% geom2~=3: size = 214457: elements = 73798; features = 2534; elems/feats = 29.1; dups size % = 20% Outer AREA geometry trg0: size = 203673: elements = 40681; features = 3355; elems/feats = 12.1; bytes/elems = 5.0 trg1w/0: size = 604080: elements = 155702; feats w/trg0 = 3355; elems/feats = 46.4; size factor = 3.0x; elems factor = 3.8x trg1: size = 859230: elements = 204763; features = 10289; elems/feats = 19.9; bytes/elems = 4.2 trg2w/1: size = 1861674: elements = 504368; feats w/trg1 = 10289; elems/feats = 49.0; size factor = 2.2x; elems factor = 2.5x trg2: size = 2075253: elements = 545051; features = 18169; elems/feats = 30.0; bytes/elems = 3.8 trg3w/2: size = 4096561: elements = 1174955; feats w/trg2 = 18169; elems/feats = 64.7; size factor = 2.0x; elems factor = 2.2x trg3: size = 4299267: elements = 1227069; features = 24519; elems/feats = 50.0; bytes/elems = 3.5 Geometry almost duplicating (<1.5x less elements) a more detailed one trg0~=1: size = 1304: elements = 200; features = 44; elems/feats = 4.5; dups size % = 0% trg1~=2: size = 137969: elements = 35611; features = 753; elems/feats = 47.3; dups size % = 16% trg2~=3: size = 1066718: elements = 281620; features = 5185; elems/feats = 54.3; dups size % = 51% ``` E.g. `geom3w/2` contains `geom3` stats for features that have `geom2` also, hence we can compare actual effect of geometry simplification (in terms of data size and points/triangles counts). Almost duplicating geometry stats give an idea how often geometry simplification doesn't result in much size/complexity difference. E.g. its useful for #2927 analysis. I think a 1.5x threshold is quite conservative (as seen in this sample an avg complexity simplification factor is >= 2.2x).
biodranik (Migrated from github.com) requested changes 2022-07-08 21:49:25 +00:00
biodranik (Migrated from github.com) commented 2022-07-08 21:24:19 +00:00

calc_ is not needed here and below

calc_ is not needed here and below
biodranik (Migrated from github.com) commented 2022-07-08 21:26:05 +00:00
      LOG(LINFO, ("General statistics for", dataFile));
```suggestion LOG(LINFO, ("General statistics for", dataFile)); ```
biodranik (Migrated from github.com) commented 2022-07-08 21:26:22 +00:00
      LOG(LINFO, ("Geometry statistics for", dataFile));
```suggestion LOG(LINFO, ("Geometry statistics for", dataFile)); ```
biodranik (Migrated from github.com) commented 2022-07-08 21:26:38 +00:00
      LOG(LINFO, ("Types statistics for", dataFile));
```suggestion LOG(LINFO, ("Types statistics for", dataFile)); ```
@ -16,4 +15,2 @@
using namespace std;
namespace stats
{
biodranik (Migrated from github.com) commented 2022-07-08 21:26:52 +00:00
  static constexpr double kAlmostDupElemsFactor = 1.5;
```suggestion static constexpr double kAlmostDupElemsFactor = 1.5; ```
biodranik (Migrated from github.com) commented 2022-07-08 21:27:29 +00:00

Move outside of try

Move outside of try
biodranik (Migrated from github.com) commented 2022-07-08 21:28:15 +00:00
  static constexpr double kAreas[] = { 10, 20, 50, 100, 200, 500, 1000, 5000, 360*360*12400 };
```suggestion static constexpr double kAreas[] = { 10, 20, 50, 100, 200, 500, 1000, 5000, 360*360*12400 }; ```
@ -55,3 +57,3 @@
{
f.ParseBeforeStatistic();
f.ParseHeader2();
biodranik (Migrated from github.com) commented 2022-07-08 21:30:42 +00:00

Please use kMaxScalesCount directly for better readability of the code below. You can also try using for a less verbose syntax.

Please use kMaxScalesCount directly for better readability of the code below. You can also try `using` for a less verbose syntax.
biodranik (Migrated from github.com) commented 2022-07-08 21:31:14 +00:00
      for (int i = 0; i < n; ++i)
```suggestion for (int i = 0; i < n; ++i) ```
biodranik (Migrated from github.com) commented 2022-07-08 21:33:44 +00:00

Please make reusable constants for each call to geom.m_sizes[i] and alike, it will make the code more readable.

Please make reusable constants for each call to geom.m_sizes[i] and alike, it will make the code more readable.
biodranik (Migrated from github.com) commented 2022-07-08 21:36:11 +00:00

Stats was used above, does it make sense to use PrintStats here?

Stats was used above, does it make sense to use PrintStats here?
biodranik (Migrated from github.com) commented 2022-07-08 21:36:57 +00:00

Please use this constant directly for readability.

Please use this constant directly for readability.
biodranik (Migrated from github.com) commented 2022-07-08 21:38:46 +00:00
    using feature::DataHeader;
    for (int i = 0; i < DataHeader::kMaxScalesCount; ++i)
```suggestion using feature::DataHeader; for (int i = 0; i < DataHeader::kMaxScalesCount; ++i) ```
biodranik (Migrated from github.com) commented 2022-07-08 21:40:18 +00:00

You don't need that constructor, just assign zero to each member variable directly.

You don't need that constructor, just assign zero to each member variable directly.
biodranik (Migrated from github.com) commented 2022-07-08 21:42:46 +00:00

A meaningful error message here and below could be useful.

A meaningful error message here and below could be useful.
biodranik (Migrated from github.com) commented 2022-07-08 21:43:15 +00:00
  size_t const scalesCount = m_loadInfo->GetScalesCount();

?

```suggestion size_t const scalesCount = m_loadInfo->GetScalesCount(); ``` ?
biodranik (Migrated from github.com) commented 2022-07-08 21:43:54 +00:00

Why ind? We always use i for indexes and it for iterators.

Why ind? We always use i for indexes and it for iterators.
biodranik (Migrated from github.com) commented 2022-07-08 21:44:44 +00:00

nit: numPoints or pointsCount?

nit: numPoints or pointsCount?
biodranik (Migrated from github.com) commented 2022-07-08 21:45:47 +00:00

Make the reused reference for m_offsets.m_pts[ind] ?

Make the reused reference for m_offsets.m_pts[ind] ?
biodranik (Migrated from github.com) commented 2022-07-08 21:47:13 +00:00

ditto here and below

ditto here and below
biodranik (Migrated from github.com) commented 2022-07-08 21:41:53 +00:00

Did you find that it was not used anywhere?

Did you find that it was not used anywhere?
pastk reviewed 2022-07-09 11:29:16 +00:00
Author
Owner

a common prefix is needed to keep all 3 options together in the --help output

a common prefix is needed to keep all 3 options together in the `--help` output
pastk reviewed 2022-07-09 11:34:14 +00:00
Author
Owner

What is the point of repeating dataFile on each line?
For a --calc_stats=true --calc_geom_stats=true --calc_type_stats=true command it'll be printed 4 times!

ATM its clean and logical imho:

~/dev/organicmaps$ ../omim-build-release/generator_tool --calc_stats=true --calc_geom_stats=true --calc_type_stats=true --output="Russia_Moscow"
LOG TID(1) INFO    1.699e-05 generator_tool/generator_tool.cpp:656 operator()(): Calculating statistics for ./data/Russia_Moscow.mwm
LOG TID(1) INFO      2.49543 generator_tool/generator_tool.cpp:656 operator()(): Writing general statistics
LOG TID(1) INFO      2.49615 generator_tool/generator_tool.cpp:656 operator()(): Writing geometry statistics
LOG TID(1) INFO      2.49633 generator_tool/generator_tool.cpp:656 operator()(): Writing types statistics
LOG TID(1) INFO      2.49975 generator_tool/generator_tool.cpp:656 operator()(): Stats written to file Russia_Moscow.stats
What is the point of repeating `dataFile` on each line? For a `--calc_stats=true --calc_geom_stats=true --calc_type_stats=true` command it'll be printed 4 times! ATM its clean and logical imho: ``` ~/dev/organicmaps$ ../omim-build-release/generator_tool --calc_stats=true --calc_geom_stats=true --calc_type_stats=true --output="Russia_Moscow" LOG TID(1) INFO 1.699e-05 generator_tool/generator_tool.cpp:656 operator()(): Calculating statistics for ./data/Russia_Moscow.mwm LOG TID(1) INFO 2.49543 generator_tool/generator_tool.cpp:656 operator()(): Writing general statistics LOG TID(1) INFO 2.49615 generator_tool/generator_tool.cpp:656 operator()(): Writing geometry statistics LOG TID(1) INFO 2.49633 generator_tool/generator_tool.cpp:656 operator()(): Writing types statistics LOG TID(1) INFO 2.49975 generator_tool/generator_tool.cpp:656 operator()(): Stats written to file Russia_Moscow.stats ```
biodranik (Migrated from github.com) reviewed 2022-07-09 11:38:23 +00:00
biodranik (Migrated from github.com) commented 2022-07-09 11:38:23 +00:00

-stats, -stats_type (or -stats-type, or better both :) ), -stats_geom

-stats, -stats_type (or -stats-type, or better both :) ), -stats_geom
biodranik (Migrated from github.com) reviewed 2022-07-09 11:39:19 +00:00
biodranik (Migrated from github.com) commented 2022-07-09 11:39:19 +00:00

The point is to easily grep it from logs if necessary for multi-file processing. But you can leave it as is.

The point is to easily grep it from logs if necessary for multi-file processing. But you can leave it as is.
pastk reviewed 2022-07-09 11:46:03 +00:00
Author
Owner

yeap, it was used in stats code only

yeap, it was used in stats code only
pastk reviewed 2022-07-09 11:55:07 +00:00
Author
Owner

ind seems to be used throughout this file to refer to the scale index specifically, so I used it too to be consistent;
I'm ok to change it to i if you think its better still

`ind` seems to be used throughout this file to refer to the scale index specifically, so I used it too to be consistent; I'm ok to change it to `i` if you think its better still
Author
Owner

Made geometry dup factor a configurable param.
And misc changes as per review.

Made geometry dup factor a configurable param. And misc changes as per review.
biodranik (Migrated from github.com) requested changes 2022-07-10 05:14:42 +00:00
biodranik (Migrated from github.com) commented 2022-07-10 05:10:28 +00:00

It would better fit 2 lines instead of 3, no?

It would better fit 2 lines instead of 3, no?
@ -32,3 +33,4 @@
LOG(LWARNING, ("Error reading file:", fPath, ex.Msg()));
}
os << "\n";
}
biodranik (Migrated from github.com) commented 2022-07-10 05:11:50 +00:00

const? std::lower_bound?

const? std::lower_bound?
biodranik (Migrated from github.com) commented 2022-07-10 05:11:59 +00:00

std::distance?

std::distance?
@ -72,2 +105,4 @@
// Header size (incl. inner geometry) + outer geometry size.
uint32_t const allSize = innerStats.m_size + outerGeomSize + outerTrgSize;
double len = 0.0;
biodranik (Migrated from github.com) commented 2022-07-10 05:12:39 +00:00

The current name is bad, let's think about a better one.

The current name is bad, let's think about a better one.
@ -476,4 +475,3 @@
m_parsed.m_points = true;
}
return sz;
}
biodranik (Migrated from github.com) commented 2022-07-10 05:14:23 +00:00

Thanks, it's clear now!

Thanks, it's clear now!
pastk reviewed 2022-07-10 08:22:40 +00:00
@ -72,2 +105,4 @@
// Header size (incl. inner geometry) + outer geometry size.
uint32_t const allSize = innerStats.m_size + outerGeomSize + outerTrgSize;
double len = 0.0;
Author
Owner

yeah its because there is a very first one-byte header also

yeah its because there is a very first one-byte header also
biodranik (Migrated from github.com) reviewed 2022-07-10 09:20:07 +00:00
@ -72,2 +105,4 @@
// Header size (incl. inner geometry) + outer geometry size.
uint32_t const allSize = innerStats.m_size + outerGeomSize + outerTrgSize;
double len = 0.0;
biodranik (Migrated from github.com) commented 2022-07-10 09:20:07 +00:00

ParseSecondHeader?

ParseSecondHeader?
vng (Migrated from github.com) reviewed 2022-07-10 11:50:30 +00:00
@ -18,3 +15,3 @@
namespace stats
{
void FileContainerStatistic(std::ostream & os, string const & fPath)
using namespace feature;
vng (Migrated from github.com) commented 2022-07-10 11:33:34 +00:00

Put using namespace under namespace stats here.
Also, I don't mind to have using std under stats, but it's up to you.

Put using namespace under namespace stats here. Also, I don't mind to have using std under stats, but it's up to you.
@ -72,2 +105,4 @@
// Header size (incl. inner geometry) + outer geometry size.
uint32_t const allSize = innerStats.m_size + outerGeomSize + outerTrgSize;
double len = 0.0;
vng (Migrated from github.com) commented 2022-07-10 11:32:32 +00:00

Well, ParseHeader2 name is ok, while this function is in private sections as it was before.
Why not keep previous ParseToCalculateStats() ?

Well, ParseHeader2 name is ok, while this function is in private sections as it was before. Why not keep previous ParseToCalculateStats() ?
vng (Migrated from github.com) commented 2022-07-10 11:34:17 +00:00

Or FeatureType::GetInnerStats() can call it inside.

Or FeatureType::GetInnerStats() can call it inside.
@ -114,3 +149,3 @@
}
void PrintInfo(std::ostream & os, std::string const & prefix, GeneralInfo const & info, bool measurements)
void PrintInfo(std::ostream & os, std::string const & prefix,
vng (Migrated from github.com) commented 2022-07-10 11:39:09 +00:00

geomElems > 0, trgElems > 0 are useless here

geomElems > 0, trgElems > 0 are useless here
@ -116,1 +151,3 @@
void PrintInfo(std::ostream & os, std::string const & prefix, GeneralInfo const & info, bool measurements)
void PrintInfo(std::ostream & os, std::string const & prefix,
GeneralInfo const & info, uint8_t prefixWidth = 1,
bool names = false, bool measurements = false)
vng (Migrated from github.com) commented 2022-07-10 11:49:07 +00:00

Can't realize, how m_byLineGeomCompared and m_byLineGeomDup will help us to understand something? Ok, we have 5M bytes of geometry that have previous simplified geometry, and?

Why not to collect and print histogram in range of [0.1, 0.2, ..., 0.9, 1.0]?
where factor = geom. m_sizes[ind - 1] / geom. m_sizes[ind] ?

Can't realize, how m_byLineGeomCompared and m_byLineGeomDup will help us to understand something? Ok, we have 5M bytes of geometry that have previous simplified geometry, and? Why not to collect and print histogram in range of [0.1, 0.2, ..., 0.9, 1.0]? where factor = geom. m_sizes[ind - 1] / geom. m_sizes[ind] ?
vng (Migrated from github.com) reviewed 2022-07-10 12:04:53 +00:00
@ -510,0 +579,4 @@
res.m_sizes[ind] = static_cast<uint32_t>(src.Pos() - m_offsets.m_trg[ind]);
res.m_elements[ind] = m_triangles.size() / 3;
}
}
vng (Migrated from github.com) commented 2022-07-10 12:04:53 +00:00

Can be unified with GetOuterTrianglesStats. Store in temporary variable points or modifying m_points

Can be unified with GetOuterTrianglesStats. Store in temporary variable points or modifying m_points
pastk reviewed 2022-07-10 12:58:38 +00:00
@ -72,2 +105,4 @@
// Header size (incl. inner geometry) + outer geometry size.
uint32_t const allSize = innerStats.m_size + outerGeomSize + outerTrgSize;
double len = 0.0;
Author
Owner

Well, ParseHeader2 name is ok, while this function is in private sections as it was before. Why not keep previous ParseToCalculateStats() ?

Because it was used in a context that has nothing to do with stats, i.e. feature_segments_checker.cpp and feature_list.cpp.

> Well, ParseHeader2 name is ok, while this function is in private sections as it was before. Why not keep previous ParseToCalculateStats() ? Because it was used in a context that has nothing to do with stats, i.e. `feature_segments_checker.cpp` and `feature_list.cpp`.
pastk reviewed 2022-07-10 13:40:11 +00:00
@ -116,1 +151,3 @@
void PrintInfo(std::ostream & os, std::string const & prefix, GeneralInfo const & info, bool measurements)
void PrintInfo(std::ostream & os, std::string const & prefix,
GeneralInfo const & info, uint8_t prefixWidth = 1,
bool names = false, bool measurements = false)
Author
Owner
   trg2: size =   2075253: elements =    545051; features =   18169; elems/feats =  530.0; bytes/elems =  3.8
trg3w/2: size =   4096561: elements =   1174955; feats w/trg2 =   18169; elems/feats =  64.7; size factor =  2.0x; elems factor =  2.2x

m_byLineGeomCompared shows us that 4.1Mb or 1.2M of triangles were simplified into 2.1Mb / 0.5M. So overall simplification effectiveness was ~2x (which is not much compared to trg1/trg0 difference).

The factor = geom. m_sizes[ind - 1] / geom. m_sizes[ind] is misleading, because it includes e.g. features with trg3 without trg2, i.e. they were not simplified. Hence we can't judge simplification effectiveness looking at this factor alone.

m_byLineGeomDup shows percentage of very similar geometries. It gives an idea of potential size savings if we decide to omit such similar geometries.

A histogram is a great idea and will give much more info than one number, but its just takes more time to implement :)

``` trg2: size = 2075253: elements = 545051; features = 18169; elems/feats = 530.0; bytes/elems = 3.8 trg3w/2: size = 4096561: elements = 1174955; feats w/trg2 = 18169; elems/feats = 64.7; size factor = 2.0x; elems factor = 2.2x ``` `m_byLineGeomCompared` shows us that 4.1Mb or 1.2M of triangles were simplified into 2.1Mb / 0.5M. So overall simplification effectiveness was ~2x (which is not much compared to trg1/trg0 difference). The `factor = geom. m_sizes[ind - 1] / geom. m_sizes[ind]` is misleading, because it includes e.g. features with trg3 without trg2, i.e. they were not simplified. Hence we can't judge simplification effectiveness looking at this factor alone. `m_byLineGeomDup` shows percentage of very similar geometries. It gives an idea of potential size savings if we decide to omit such similar geometries. A histogram is a great idea and will give much more info than one number, but its just takes more time to implement :)
pastk reviewed 2022-07-10 13:54:09 +00:00
@ -114,3 +149,3 @@
}
void PrintInfo(std::ostream & os, std::string const & prefix, GeneralInfo const & info, bool measurements)
void PrintInfo(std::ostream & os, std::string const & prefix,
Author
Owner

you're right, thanks!

you're right, thanks!
pastk reviewed 2022-07-10 14:00:02 +00:00
@ -510,0 +579,4 @@
res.m_sizes[ind] = static_cast<uint32_t>(src.Pos() - m_offsets.m_trg[ind]);
res.m_elements[ind] = m_triangles.size() / 3;
}
}
Author
Owner

It should be possible (and unifying ParseGeometry with ParseTriangles too) though there will be quite some ifs to handle line/area differences. I just don't have time for this now, sorry.

It should be possible (and unifying ParseGeometry with ParseTriangles too) though there will be quite some `ifs` to handle line/area differences. I just don't have time for this now, sorry.
biodranik (Migrated from github.com) approved these changes 2022-07-10 20:31:43 +00:00
biodranik (Migrated from github.com) left a comment

@vng ?

@vng ?
vng (Migrated from github.com) approved these changes 2022-07-10 21:03:42 +00:00
This repo is archived. You cannot comment on pull requests.
No reviewers
No labels
Accessibility
Accessibility
Address
Address
Android
Android
Android Auto
Android Auto
Android Automotive (AAOS)
Android Automotive (AAOS)
API
API
AppGallery
AppGallery
AppStore
AppStore
Battery and Performance
Battery and Performance
Blocker
Blocker
Bookmarks and Tracks
Bookmarks and Tracks
Borders
Borders
Bug
Bug
Build
Build
CarPlay
CarPlay
Classificator
Classificator
Community
Community
Core
Core
CrashReports
CrashReports
Cycling
Cycling
Desktop
Desktop
DevEx
DevEx
DevOps
DevOps
dev_sandbox
dev_sandbox
Directions
Directions
Documentation
Documentation
Downloader
Downloader
Drape
Drape
Driving
Driving
Duplicate
Duplicate
Editor
Editor
Elevation
Elevation
Enhancement
Enhancement
Epic
Epic
External Map Datasets
External Map Datasets
F-Droid
F-Droid
Fonts
Fonts
Frequently User Reported
Frequently User Reported
Fund
Fund
Generator
Generator
Good first issue
Good first issue
Google Play
Google Play
GPS
GPS
GSoC
GSoC
iCloud
iCloud
Icons
Icons
iOS
iOS
Legal
Legal
Linux Desktop
Linux Desktop
Linux packaging
Linux packaging
Linux Phone
Linux Phone
Mac OS
Mac OS
Map Data
Map Data
Metro
Metro
Navigation
Navigation
Need Feedback
Need Feedback
Night Mode
Night Mode
NLnet 2024-06-281
NLnet 2024-06-281
No Feature Parity
No Feature Parity
Opening Hours
Opening Hours
Outdoors
Outdoors
POI Info
POI Info
Privacy
Privacy
Public Transport
Public Transport
Raw Idea
Raw Idea
Refactoring
Refactoring
Regional
Regional
Regression
Regression
Releases
Releases
RoboTest
RoboTest
Route Planning
Route Planning
Routing
Routing
Ruler
Ruler
Search
Search
Security
Security
Styles
Styles
Tests
Tests
Track Recording
Track Recording
Translations
Translations
TTS
TTS
UI
UI
UX
UX
Walk Navigation
Walk Navigation
Watches
Watches
Web
Web
Wikipedia
Wikipedia
Windows
Windows
Won't fix
Won't fix
World Map
World Map
No milestone
No project
No assignees
2 participants
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: organicmaps/organicmaps-tmp#2935
No description provided.