[tools] Fix map file stats and add outer geometry stats #2935
Labels
No labels
Accessibility
Accessibility
Address
Address
Android
Android
Android Auto
Android Auto
Android Automotive (AAOS)
Android Automotive (AAOS)
API
API
AppGallery
AppGallery
AppStore
AppStore
Battery and Performance
Battery and Performance
Blocker
Blocker
Bookmarks and Tracks
Bookmarks and Tracks
Borders
Borders
Bug
Bug
Build
Build
CarPlay
CarPlay
Classificator
Classificator
Community
Community
Core
Core
CrashReports
CrashReports
Cycling
Cycling
Desktop
Desktop
DevEx
DevEx
DevOps
DevOps
dev_sandbox
dev_sandbox
Directions
Directions
Documentation
Documentation
Downloader
Downloader
Drape
Drape
Driving
Driving
Duplicate
Duplicate
Editor
Editor
Elevation
Elevation
Enhancement
Enhancement
Epic
Epic
External Map Datasets
External Map Datasets
F-Droid
F-Droid
Fonts
Fonts
Frequently User Reported
Frequently User Reported
Fund
Fund
Generator
Generator
Good first issue
Good first issue
Google Play
Google Play
GPS
GPS
GSoC
GSoC
iCloud
iCloud
Icons
Icons
iOS
iOS
Legal
Legal
Linux Desktop
Linux Desktop
Linux packaging
Linux packaging
Linux Phone
Linux Phone
Mac OS
Mac OS
Map Data
Map Data
Metro
Metro
Navigation
Navigation
Need Feedback
Need Feedback
Night Mode
Night Mode
NLnet 2024-06-281
NLnet 2024-06-281
No Feature Parity
No Feature Parity
Opening Hours
Opening Hours
Outdoors
Outdoors
POI Info
POI Info
Privacy
Privacy
Public Transport
Public Transport
Raw Idea
Raw Idea
Refactoring
Refactoring
Regional
Regional
Regression
Regression
Releases
Releases
RoboTest
RoboTest
Route Planning
Route Planning
Routing
Routing
Ruler
Ruler
Search
Search
Security
Security
Styles
Styles
Tests
Tests
Track Recording
Track Recording
Translations
Translations
TTS
TTS
UI
UI
UX
UX
Walk Navigation
Walk Navigation
Watches
Watches
Web
Web
Wikipedia
Wikipedia
Windows
Windows
Won't fix
Won't fix
World Map
World Map
No milestone
No project
No assignees
2 participants
Due date
No due date set.
Dependencies
No dependencies set.
Reference: organicmaps/organicmaps-tmp#2935
Loading…
Add table
Reference in a new issue
No description provided.
Delete branch "pastk-tools-mwmstats"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Sample run cmd:
Sample stats:
Sample outer geometry stats:
E.g.
geom3w/2
containsgeom3
stats for features that havegeom2
also, hence we can compare actual effect of geometry simplification (in terms of data size and points/triangles counts).Almost duplicating geometry stats give an idea how often geometry simplification doesn't result in much size/complexity difference. E.g. its useful for #2927 analysis. I think a 1.5x threshold is quite conservative (as seen in this sample an avg complexity simplification factor is >= 2.2x).
calc_ is not needed here and below
@ -16,4 +15,2 @@
using namespace std;
namespace stats
{
Move outside of try
@ -55,3 +57,3 @@
{
f.ParseBeforeStatistic();
f.ParseHeader2();
Please use kMaxScalesCount directly for better readability of the code below. You can also try
using
for a less verbose syntax.Please make reusable constants for each call to geom.m_sizes[i] and alike, it will make the code more readable.
Stats was used above, does it make sense to use PrintStats here?
Please use this constant directly for readability.
You don't need that constructor, just assign zero to each member variable directly.
A meaningful error message here and below could be useful.
?
Why ind? We always use i for indexes and it for iterators.
nit: numPoints or pointsCount?
Make the reused reference for m_offsets.m_pts[ind] ?
ditto here and below
Did you find that it was not used anywhere?
a common prefix is needed to keep all 3 options together in the
--help
outputWhat is the point of repeating
dataFile
on each line?For a
--calc_stats=true --calc_geom_stats=true --calc_type_stats=true
command it'll be printed 4 times!ATM its clean and logical imho:
-stats, -stats_type (or -stats-type, or better both :) ), -stats_geom
The point is to easily grep it from logs if necessary for multi-file processing. But you can leave it as is.
yeap, it was used in stats code only
ind
seems to be used throughout this file to refer to the scale index specifically, so I used it too to be consistent;I'm ok to change it to
i
if you think its better stillMade geometry dup factor a configurable param.
And misc changes as per review.
It would better fit 2 lines instead of 3, no?
@ -32,3 +33,4 @@
LOG(LWARNING, ("Error reading file:", fPath, ex.Msg()));
}
os << "\n";
}
const? std::lower_bound?
std::distance?
@ -72,2 +105,4 @@
// Header size (incl. inner geometry) + outer geometry size.
uint32_t const allSize = innerStats.m_size + outerGeomSize + outerTrgSize;
double len = 0.0;
The current name is bad, let's think about a better one.
@ -476,4 +475,3 @@
m_parsed.m_points = true;
}
return sz;
}
Thanks, it's clear now!
@ -72,2 +105,4 @@
// Header size (incl. inner geometry) + outer geometry size.
uint32_t const allSize = innerStats.m_size + outerGeomSize + outerTrgSize;
double len = 0.0;
yeah its because there is a very first one-byte header also
@ -72,2 +105,4 @@
// Header size (incl. inner geometry) + outer geometry size.
uint32_t const allSize = innerStats.m_size + outerGeomSize + outerTrgSize;
double len = 0.0;
ParseSecondHeader?
@ -18,3 +15,3 @@
namespace stats
{
void FileContainerStatistic(std::ostream & os, string const & fPath)
using namespace feature;
Put using namespace under namespace stats here.
Also, I don't mind to have using std under stats, but it's up to you.
@ -72,2 +105,4 @@
// Header size (incl. inner geometry) + outer geometry size.
uint32_t const allSize = innerStats.m_size + outerGeomSize + outerTrgSize;
double len = 0.0;
Well, ParseHeader2 name is ok, while this function is in private sections as it was before.
Why not keep previous ParseToCalculateStats() ?
Or FeatureType::GetInnerStats() can call it inside.
@ -114,3 +149,3 @@
}
void PrintInfo(std::ostream & os, std::string const & prefix, GeneralInfo const & info, bool measurements)
void PrintInfo(std::ostream & os, std::string const & prefix,
geomElems > 0, trgElems > 0 are useless here
@ -116,1 +151,3 @@
void PrintInfo(std::ostream & os, std::string const & prefix, GeneralInfo const & info, bool measurements)
void PrintInfo(std::ostream & os, std::string const & prefix,
GeneralInfo const & info, uint8_t prefixWidth = 1,
bool names = false, bool measurements = false)
Can't realize, how m_byLineGeomCompared and m_byLineGeomDup will help us to understand something? Ok, we have 5M bytes of geometry that have previous simplified geometry, and?
Why not to collect and print histogram in range of [0.1, 0.2, ..., 0.9, 1.0]?
where factor = geom. m_sizes[ind - 1] / geom. m_sizes[ind] ?
@ -510,0 +579,4 @@
res.m_sizes[ind] = static_cast<uint32_t>(src.Pos() - m_offsets.m_trg[ind]);
res.m_elements[ind] = m_triangles.size() / 3;
}
}
Can be unified with GetOuterTrianglesStats. Store in temporary variable points or modifying m_points
@ -72,2 +105,4 @@
// Header size (incl. inner geometry) + outer geometry size.
uint32_t const allSize = innerStats.m_size + outerGeomSize + outerTrgSize;
double len = 0.0;
Because it was used in a context that has nothing to do with stats, i.e.
feature_segments_checker.cpp
andfeature_list.cpp
.@ -116,1 +151,3 @@
void PrintInfo(std::ostream & os, std::string const & prefix, GeneralInfo const & info, bool measurements)
void PrintInfo(std::ostream & os, std::string const & prefix,
GeneralInfo const & info, uint8_t prefixWidth = 1,
bool names = false, bool measurements = false)
m_byLineGeomCompared
shows us that 4.1Mb or 1.2M of triangles were simplified into 2.1Mb / 0.5M. So overall simplification effectiveness was ~2x (which is not much compared to trg1/trg0 difference).The
factor = geom. m_sizes[ind - 1] / geom. m_sizes[ind]
is misleading, because it includes e.g. features with trg3 without trg2, i.e. they were not simplified. Hence we can't judge simplification effectiveness looking at this factor alone.m_byLineGeomDup
shows percentage of very similar geometries. It gives an idea of potential size savings if we decide to omit such similar geometries.A histogram is a great idea and will give much more info than one number, but its just takes more time to implement :)
@ -114,3 +149,3 @@
}
void PrintInfo(std::ostream & os, std::string const & prefix, GeneralInfo const & info, bool measurements)
void PrintInfo(std::ostream & os, std::string const & prefix,
you're right, thanks!
@ -510,0 +579,4 @@
res.m_sizes[ind] = static_cast<uint32_t>(src.Pos() - m_offsets.m_trg[ind]);
res.m_elements[ind] = m_triangles.size() / 3;
}
}
It should be possible (and unifying ParseGeometry with ParseTriangles too) though there will be quite some
ifs
to handle line/area differences. I just don't have time for this now, sorry.@vng ?