Issue 2787 - Facebook validation fix #2793

strump · 2022-06-21T10:30:24Z

strump commented

2022-06-21 10:30:24 +00:00

Working on #2787 issue.

Looks like FB validation fails because of the extra character `á` (Latin Small Letter A with acute). It belongs to the [Latin-1 Supplement](https://en.wikipedia.org/wiki/Latin-1_Supplement) block.

The code of this character is `\xE1`. But when the string is passed from Java to C++, the character is converted to the UTF-8 bytes `\xC3\xA1`. How do we write a correct regex that covers 2-byte characters?
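For reference, here is a minimal sketch of how a code point becomes UTF-8 bytes, which is why `\xE1` shows up as `\xC3\xA1` on the C++ side. `EncodeUtf8` is a hypothetical helper written for illustration, not a function from the project, and it assumes the input is a valid Unicode scalar value.

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8 bytes.
// Assumes cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
std::string EncodeUtf8(char32_t cp)
{
  std::string out;
  if (cp < 0x80)
  {
    // 1 byte: 0xxxxxxx
    out += static_cast<char>(cp);
  }
  else if (cp < 0x800)
  {
    // 2 bytes: 110xxxxx 10xxxxxx
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  else if (cp < 0x10000)
  {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  else
  {
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}
```

With this sketch, `EncodeUtf8(0xE1)` yields the two bytes `\xC3\xA1`, so a byte-oriented regex sees two `char` values where the user typed one letter.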

And another question: should we also validate other Latin-1 Supplement characters such as `Ç`, `Ó`, `ß`, `ñ`?

**Update 2022-06-27**
Facebook allows a very broad range of symbols in its page URLs. So validation of the Facebook page input is now simplified: all characters are allowed except ` !@^*()~@[]{}#$%&;,:+"'/\`. This fixes issue #2787 and allows all sorts of non-Latin alphabets in OrganicMaps.

Note: some symbols are allowed by OrganicMaps but not by FB, for example `£¤¥©®`. Eliminating these exceptions is a hard Unicode-related task and is out of the scope of this PR.


vng commented

2022-06-21 13:12:51 +00:00

(Migrated from github.com)

- We can build a UTF-32 string (see `strings::MakeUniString`) and run the regex on this `uint32_t` string.
- How will we get the full set of allowed "ascent descent" chars?
- The `strings::Normalize` function can probably help us here.


strump commented

2022-06-21 16:19:32 +00:00

OK, I dug a little into Facebook pages and got some not-so-funny news.

Go to https://www.facebook.com/pages/create and put any strange symbols in the "Name" field. You can create a page with a very wide range of symbols in its name. And those symbols **could** appear in the URL.

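| Symbol | Allowed in name | Allowed in URL | isalnum() |
| ------ | --------------- | -------------- | --------- |
| `!@^*()~@©[]{}¢¦¨®¯` | - | - | `False` |
| `#$%&;,.:'+/\` | + | - | `False` |
| `¶«»"'/¡£¤¥§¬` | + | - | `False` |
| `ª À Á Â Ã Ä Å Æ Ç` | + | + | `True` |
| `È É Ê Ë Ì Í Î Ï` | + | + | `True` |
| `º` | + | + | `True` |
| `¼½¾` | + | + | `False` |
| `オーガニックマップ` | + | + | `True` |
| `מפות אורגניות` (R-to-L lang) | + | + | `True` |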

It means that if you create a page with the name `MÊGÅ «¿¶» CÄFË º¼½¾`, FB will register the URL `MÊGÅ--CÄFË-º¼½¾-3141592653589793` (you can test it yourself 😉)

So we need to allow only some symbols plus all alphanumeric symbols. This is not a 100% solution (the symbols `¼½¾` would be marked as invalid), but we can live with it.

In Python it would look like:

def validateFB(page_name):
    return all(ch in '-_' or ch.isalnum() for ch in page_name)

**Question:** is there any `isalnum()` function in C++ for testing Unicode characters?


vng commented

2022-06-21 21:58:14 +00:00

(Migrated from github.com)

Well, with this logic:

std::string str = "MÊGÅ--CÄFË";

using namespace strings;
auto uniStr = MakeUniString(str);
MakeLowerCaseInplace(uniStr);
NormalizeInplace(uniStr);
NormalizeDigits(uniStr);

str = ToUtf8(uniStr);
// here we can check that str is a full ascii alpha-numeric string and "-", "_"


biodranik commented

2022-06-22 12:17:37 +00:00

(Migrated from github.com)

`std::regex` should more or less work with UTF-8. Let's find a cleaner solution. For now, please create some unit tests; by the way, they can be built for Android and run on a device.

strump commented

2022-06-22 13:27:42 +00:00

@vng your solution with the `NormalizeInplace` function will work. After normalization we can use a regex to find Latin characters and digits.

But how do we check non-Latin letters, such as Cyrillic or Japanese symbols? Regex doesn't support that.

We can continue with `NormalizeInplace` and accept the fact that OM will mark some Facebook pages as invalid. E.g. https://osm.org/node/5296186621 has `contact:facebook=https://ru-ru.facebook.com/Институт-повышения-квалификации-и-переподготовки-экономических-кадров-БГЭУ-1418862625037617/)`


biodranik commented

2022-06-22 14:39:44 +00:00

(Migrated from github.com)

> We can continue with NormalizeInplace and accept the fact that OM will mark some Facebook pages as invalid.

No. It's better to turn normalization off completely for these pages if there are no other clean solutions. Or add proper Unicode ranges to the `std::regex`.


vng commented

2022-06-23 11:56:06 +00:00

(Migrated from github.com)

I agree with Alex to take FB links as is, without normalization.

strump commented

2022-06-23 13:46:17 +00:00

To check whether a Unicode character is a letter or a symbol, we need a list of allowed ranges.

I wrote a small Python script to find out which Unicode characters are letters from the Python 3.9 perspective: [unicode-ranges-by-python.txt](https://gist.github.com/strump/1714972a79a621fa42331236620fa028#file-unicode-ranges-by-python-txt)

We can write a C++ function to check whether a `UniChar` falls into any of those 250 ranges.
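A rough sketch of such a range lookup, assuming a sorted table of inclusive code point ranges and a binary search over it. The `UniChar` alias, the `CodePointRange` struct and the handful of ranges below are illustrative assumptions only; they are not the project's types and not the generated list of 250 ranges.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

// Assumption: a 32-bit code point type standing in for the project's UniChar.
using UniChar = std::uint32_t;

struct CodePointRange
{
  UniChar first;
  UniChar last;  // inclusive
};

// Illustrative subset, sorted by `first`. A real table would be generated.
constexpr CodePointRange kAlnumRanges[] = {
    {0x30, 0x39},      // 0-9
    {0x41, 0x5A},      // A-Z
    {0x61, 0x7A},      // a-z
    {0xC0, 0xD6},      // À-Ö
    {0xD8, 0xF6},      // Ø-ö
    {0xF8, 0xFF},      // ø-ÿ
    {0x0410, 0x044F},  // А-я
};

bool IsAlphaNumeric(UniChar ch)
{
  // Find the first range that starts after ch; the only range that can
  // contain ch is the one right before it.
  auto const it = std::upper_bound(
      std::begin(kAlnumRanges), std::end(kAlnumRanges), ch,
      [](UniChar c, CodePointRange const & r) { return c < r.first; });
  return it != std::begin(kAlnumRanges) && ch <= std::prev(it)->last;
}
```

The lookup is logarithmic in the number of ranges, so a table with a few hundred entries would stay cheap. Whether such a table matches what Facebook actually accepts is a separate question that the sketch does not answer.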

Looks like Facebook uses the same Unicode table, because I was able to create a page with the name `"Boat ᤀ ᥐ ᦀ ᨖ ᨠ ᐁ ᎀ ዀ ჺ ၕ ༀ ഒ"` and the URL `https://www.facebook.com/Boat-ᤀ-ᥐ-ᦀ-ᨖ-ᨠ-ᐁ-ᎀ-ዀ-ჺ-ၕ-ༀ-ഒ-102615019173012` 😮

What do you think? Should I hardcode the ranges and check the `UniChar` code to detect alphanumerics? (Have you ever thought about using the [ICU lib](https://icu.unicode.org/design/cpp)?)


vng commented

2022-06-23 14:50:09 +00:00

(Migrated from github.com)

Why should we check and modify FB links?
If the link is correct, it should open without corrections.
If the link is incorrect, it won't open in any case, no?


strump commented

2022-06-23 15:01:14 +00:00

@vng in the MWM file and in the editor we store only the username or page name. If the user enters `https://www.facebook.com/hlekrestaurant`, the function `string ValidateAndFormat_facebook(string const & facebookPage)` extracts only the page name `"hlekrestaurant"` and shows it on the PlacePage and in the editor.

The OM editor should support not only URLs but short usernames too. And we need to support validation of non-Latin characters in FB usernames.
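To make the "URL in, page name out" behaviour concrete, here is a simplified sketch. `ExtractFacebookPage` is a hypothetical helper written for this note; it is not the project's `ValidateAndFormat_facebook`, and its handling of bare names and non-Facebook hosts is an assumption.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper, for illustration only.
// Returns the page name for a facebook.com URL, the input itself when it
// contains no '/', and an empty string for URLs on other hosts.
std::string ExtractFacebookPage(std::string_view input)
{
  // Drop an optional scheme.
  constexpr std::string_view kSchemes[] = {"https://", "http://"};
  for (auto const scheme : kSchemes)
  {
    if (input.substr(0, scheme.size()) == scheme)
    {
      input.remove_prefix(scheme.size());
      break;
    }
  }

  auto const slash = input.find('/');
  if (slash == std::string_view::npos)
    return std::string(input);  // looks like a bare username

  // Accept facebook.com and any subdomain such as www. or en-us.
  std::string_view const host = input.substr(0, slash);
  constexpr std::string_view kDomain = "facebook.com";
  constexpr std::string_view kDotDomain = ".facebook.com";
  bool const isFacebookHost =
      host == kDomain ||
      (host.size() > kDotDomain.size() &&
       host.substr(host.size() - kDotDomain.size()) == kDotDomain);
  if (!isFacebookHost)
    return {};

  // Keep the path without trailing slashes.
  std::string_view page = input.substr(slash + 1);
  while (!page.empty() && page.back() == '/')
    page.remove_suffix(1);
  return std::string(page);
}
```

The sketch only splits the string; character validation of the extracted name would still be a separate step.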


vng commented

2022-06-23 21:03:42 +00:00

(Migrated from github.com)

So, what if we keep the username/page name without any validation?

biodranik commented

2022-06-23 21:59:48 +00:00

(Migrated from github.com)

1. We already have ICU in the project.
2. It is relatively easy to write an `IsAlphaNumeric(UniChar)` function by hard-coding the allowed ranges. But are you sure that it would be enough to properly handle all the ranges allowed by FB?
3. Why can't we avoid validating characters at all?


strump commented

2022-06-24 06:14:08 +00:00

I think we need some validation of user input, otherwise invalid data would be posted to OSM. But instead of Unicode magic I can simplify the rules: allow all characters except the symbols `"!@^*()~©[]{}®#$%&;,:'+/\`. If this username check fails, then validate the string as a URL.
What do you think?


biodranik commented

2022-06-24 14:38:04 +00:00

(Migrated from github.com)

Looks like a good idea!

vng commented

2022-06-24 16:03:35 +00:00

(Migrated from github.com)

Agreed, let's filter only the ASCII symbols that are invalid in URLs.

biodranik (Migrated from github.com) requested changes 2022-06-27 22:33:32 +00:00

indexer/indexer_tests/validate_and_format_contacts_test.cpp

					
				@ -7,0 +13,4 @@

				  TEST_EQUAL(osm::ValidateAndFormat_facebook("https://facebook.com/OpenStreetMap"), "OpenStreetMap", ());

				  TEST_EQUAL(osm::ValidateAndFormat_facebook("http://www.facebook.com/OpenStreetMap"), "OpenStreetMap", ());

				  TEST_EQUAL(osm::ValidateAndFormat_facebook("https://www.facebook.com/OpenStreetMap"), "OpenStreetMap", ());

				  TEST_EQUAL(osm::ValidateAndFormat_facebook("https://en-us.facebook.com/OpenStreetMap"), "OpenStreetMap", ());

biodranik (Migrated from github.com) commented

2022-06-27 22:03:54 +00:00

Why is it not possible to encode UTF-8 strings directly, or with the `u8"строка"` prefix?


biodranik (Migrated from github.com) commented

2022-06-27 22:05:37 +00:00

Did you try

  TEST(!osm::ValidateFacebookPage("you-shall-not-pass-£¤¥"), ());

or the same with u8 prefix?


indexer/indexer_tests/validate_and_format_contacts_test.cpp

					
				@ -25,2 +125,3 @@

				  TEST(!osm::ValidateFacebookPage("osm"), ());

				  TEST(!osm::ValidateFacebookPage("invalid_username"), ());

				  TEST(!osm::ValidateFacebookPage("@spaces are not welcome here"), ());

				  TEST(!osm::ValidateFacebookPage("spaces are not welcome here"), ());

biodranik (Migrated from github.com) commented

2022-06-27 22:04:31 +00:00

Is / allowed?

indexer/validate_and_format_contacts.cpp

biodranik (Migrated from github.com) commented

2022-06-27 22:15:11 +00:00

constexpr char kForbiddenFBSymbols[] = " !@^*()~[]{}#$%&;,:+\"'/\\";


biodranik (Migrated from github.com) commented

2022-06-27 22:17:03 +00:00

Are these definitely all the basic forbidden symbols? Maybe a check like `c < '0'` would be more reliable, to filter out the control characters of the ASCII table?


biodranik (Migrated from github.com) commented

2022-06-27 22:18:04 +00:00

The wheel was invented long ago :)
`if (std::string::npos == str.find_first_of(kForbiddenFBSymbols)) { /* not found */ }`
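A self-contained sketch of that check, using the forbidden set suggested earlier in this review. `HasNoForbiddenSymbols` is a hypothetical name chosen for this note. A byte-wise search works for this particular set because it contains only ASCII characters.

```cpp
#include <string>

// Forbidden set as suggested in this review thread (ASCII only).
constexpr char kForbiddenFBSymbols[] = " !@^*()~[]{}#$%&;,:+\"'/\\";

// Hypothetical helper: true when `page` contains none of the forbidden
// ASCII characters. Every byte of a multi-byte UTF-8 sequence is >= 0x80,
// so it can never compare equal to one of these ASCII characters.
bool HasNoForbiddenSymbols(std::string const & page)
{
  return page.find_first_of(kForbiddenFBSymbols) == std::string::npos;
}
```

This guarantee holds only while the forbidden set itself is pure ASCII; adding a character such as `£` to the set would make the byte-wise search unreliable.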


biodranik (Migrated from github.com) commented

2022-06-27 22:27:55 +00:00

This whole block can be written more simply:

string const & fb = facebookPage.front() == '@' ? facebookPage.substr(1) : facebookPage;
if (string::npos == fb.find_first_of(kForbiddenFBSymbols))
  return fb;
return {};


biodranik (Migrated from github.com) commented

2022-06-27 22:31:39 +00:00

The `@` sign was duplicated.

strump reviewed 2022-06-28 16:46:05 +00:00

indexer/indexer_tests/validate_and_format_contacts_test.cpp

					
				@ -7,0 +13,4 @@

				  TEST_EQUAL(osm::ValidateAndFormat_facebook("https://facebook.com/OpenStreetMap"), "OpenStreetMap", ());

				  TEST_EQUAL(osm::ValidateAndFormat_facebook("http://www.facebook.com/OpenStreetMap"), "OpenStreetMap", ());

				  TEST_EQUAL(osm::ValidateAndFormat_facebook("https://www.facebook.com/OpenStreetMap"), "OpenStreetMap", ());

				  TEST_EQUAL(osm::ValidateAndFormat_facebook("https://en-us.facebook.com/OpenStreetMap"), "OpenStreetMap", ());

strump commented

2022-06-28 16:46:04 +00:00

I tried. The problem is that the `find_first_of(...)` method compares two strings byte by byte. That's why:

std::string(u8"¥").find_first_of(u8"ქ") == 1;
std::string("¥").find_first_of("ქ") == 1;

In UTF-8 encoding the two strings `"¥"` and `"ქ"` share the byte `\xa5`. To search for the symbols `'£€¥'` we need to compare Unicode code points, not UTF-8 bytes.
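One way to compare code points instead of bytes is to decode both strings first and then run `std::find_first_of` on the decoded sequences. The sketch below uses a hand-written decoder that assumes well-formed UTF-8; `DecodeUtf8` and `ContainsAnyCodePoint` are hypothetical names, and in the project itself `strings::MakeUniString` or utf8cpp would presumably take the decoder's place.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Minimal UTF-8 decoder. Assumes well-formed input and does no validation.
std::u32string DecodeUtf8(std::string const & s)
{
  std::u32string out;
  for (std::size_t i = 0; i < s.size();)
  {
    auto const b = static_cast<unsigned char>(s[i]);
    // Sequence length follows from the lead byte.
    std::size_t const len = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
    // Keep only the payload bits of the lead byte.
    char32_t cp = len == 1 ? b : b & (0xFF >> (len + 1));
    // Each continuation byte contributes 6 more bits.
    for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
      cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
    out += cp;
    i += len;
  }
  return out;
}

// True when any code point of `text` also occurs in `forbidden`.
bool ContainsAnyCodePoint(std::string const & text, std::string const & forbidden)
{
  auto const t = DecodeUtf8(text);
  auto const f = DecodeUtf8(forbidden);
  return std::find_first_of(t.begin(), t.end(), f.begin(), f.end()) != t.end();
}
```

With the byte strings written out explicitly, `"\xC2\xA5"` (`¥`) and `"\xE1\x83\xA5"` (`ქ`) share the byte `\xA5`, so the byte-wise `find_first_of` reports a match at index 1, while the code point comparison sees U+00A5 and U+10E5 as different.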


strump reviewed 2022-06-28 18:20:20 +00:00

indexer/validate_and_format_contacts.cpp

strump commented

2022-06-28 18:20:20 +00:00

I figured that if `facebookPage.front() == '@'`, there is no point in checking the string as a URL with `ValidateWebsite(facebookPage)`.
In the code you proposed, a double check is still possible for strings starting with `@`.


strump reviewed 2022-06-28 18:37:55 +00:00

indexer/indexer_tests/validate_and_format_contacts_test.cpp

					
				@ -7,0 +13,4 @@

				  TEST_EQUAL(osm::ValidateAndFormat_facebook("https://facebook.com/OpenStreetMap"), "OpenStreetMap", ());

				  TEST_EQUAL(osm::ValidateAndFormat_facebook("http://www.facebook.com/OpenStreetMap"), "OpenStreetMap", ());

				  TEST_EQUAL(osm::ValidateAndFormat_facebook("https://www.facebook.com/OpenStreetMap"), "OpenStreetMap", ());

				  TEST_EQUAL(osm::ValidateAndFormat_facebook("https://en-us.facebook.com/OpenStreetMap"), "OpenStreetMap", ());

strump commented

2022-06-28 18:37:55 +00:00

Rewrote the tests with the `u8"тестовая строка"` syntax.


strump reviewed 2022-06-28 18:39:16 +00:00

indexer/indexer_tests/validate_and_format_contacts_test.cpp

					
				@ -25,2 +125,3 @@

				  TEST(!osm::ValidateFacebookPage("osm"), ());

				  TEST(!osm::ValidateFacebookPage("invalid_username"), ());

				  TEST(!osm::ValidateFacebookPage("@spaces are not welcome here"), ());

				  TEST(!osm::ValidateFacebookPage("spaces are not welcome here"), ());

strump commented

2022-06-28 18:39:16 +00:00

Added a test with the symbols `/@#`. There are more forbidden symbols: `[]{}#$%&;,:`. We need a separate test string for each 😔


strump reviewed 2022-06-28 18:39:58 +00:00

indexer/validate_and_format_contacts.cpp

strump commented

2022-06-28 18:39:57 +00:00

Removed my reinvented wheel ))

biodranik (Migrated from github.com) reviewed 2022-06-29 00:24:48 +00:00

indexer/indexer_tests/validate_and_format_contacts_test.cpp

					
				@ -7,0 +13,4 @@

				  TEST_EQUAL(osm::ValidateAndFormat_facebook("https://facebook.com/OpenStreetMap"), "OpenStreetMap", ());

				  TEST_EQUAL(osm::ValidateAndFormat_facebook("http://www.facebook.com/OpenStreetMap"), "OpenStreetMap", ());

				  TEST_EQUAL(osm::ValidateAndFormat_facebook("https://www.facebook.com/OpenStreetMap"), "OpenStreetMap", ());

				  TEST_EQUAL(osm::ValidateAndFormat_facebook("https://en-us.facebook.com/OpenStreetMap"), "OpenStreetMap", ());

biodranik (Migrated from github.com) commented

2022-06-29 00:24:47 +00:00

Very good catch!

Then you can try either the [utf8cpp](https://github.com/nemtrif/utfcpp) iterators via `#include "3party/utfcpp/source/utf8/unchecked.h"`, or `strings::MakeUniString` and compare u32 characters. Both methods should use [std::find_first_of](https://en.cppreference.com/w/cpp/algorithm/find_first_of).


vng (Migrated from github.com) requested changes 2022-06-29 03:38:01 +00:00

vng (Migrated from github.com) left a comment

- Please use our {} style (braces on a new line).
- I don't know how that happened, but we actually use the `ValidateAndFormat_facebook` function in the generator, yet write tests for the `ValidateFacebookPage` function, which is *not* used anywhere except tests. The same applies to `ValidateInstagramPage`, etc.


indexer/validate_and_format_contacts.cpp

					
				@ -18,2 +17,4 @@

				static auto const s_lineRegex = regex(R"(^[a-z0-9-_.]{4,20}$)");

				// TODO: Current implementation looks only for restricted symbols from ASCII block ignoring

vng (Migrated from github.com) commented

2022-06-29 03:28:39 +00:00

To avoid an additional `substr` copy: `facebookPage.find_first_of(kForbiddenFBSymbols, 1)`

indexer/validate_and_format_contacts.cpp

vng (Migrated from github.com) commented

2022-06-29 03:28:59 +00:00

To avoid an additional `substr` copy: `page.find_first_of(kForbiddenFBSymbols, 1)`
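The second argument of `std::string::find_first_of` is the position to start searching from, which is what makes the copy unnecessary. A small sketch, assuming a leading `@` is the only character that should be skipped; `HasOnlyAllowedSymbols` is a hypothetical name, and treating an empty string as acceptable is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>

// Forbidden set as suggested in this review thread (ASCII only).
constexpr char kForbiddenFBSymbols[] = " !@^*()~[]{}#$%&;,:+\"'/\\";

// Hypothetical helper: checks a page name that may start with '@'
// without building a substring.
bool HasOnlyAllowedSymbols(std::string const & page)
{
  if (page.empty())
    return true;  // nothing to reject; also avoids calling front() on an empty string
  // Skip a single leading '@' by starting the search at index 1.
  std::size_t const start = page.front() == '@' ? 1 : 0;
  return page.find_first_of(kForbiddenFBSymbols, start) == std::string::npos;
}
```

An `@` anywhere after the first position is still rejected, because only the starting index changes and the forbidden set stays the same.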

strump reviewed 2022-07-01 17:37:29 +00:00

indexer/validate_and_format_contacts.cpp

					
				@ -18,2 +17,4 @@

				static auto const s_lineRegex = regex(R"(^[a-z0-9-_.]{4,20}$)");

				// TODO: Current implementation looks only for restricted symbols from ASCII block ignoring

strump commented

2022-07-01 17:37:29 +00:00

Removed page.substr(1)


strump reviewed 2022-07-01 17:37:32 +00:00

indexer/validate_and_format_contacts.cpp

strump commented

2022-07-01 17:37:31 +00:00

Removed page.substr(1)


strump commented

2022-07-01 17:41:31 +00:00

@vng, the `ValidateFacebookPage` function is used by the Android app. See `Editor.nativeIsFacebookPageValid(String)`. When the user enters a Facebook contact in the OrganicMaps editor, this method validates the input.
`ValidateAndFormat_facebook` is used in the generator and the Android app, but is not covered by tests. I'll write unit tests for it.


strump reviewed 2022-07-01 18:04:39 +00:00

indexer/indexer_tests/validate_and_format_contacts_test.cpp

					
				@ -25,2 +125,3 @@

				  TEST(!osm::ValidateFacebookPage("osm"), ());

				  TEST(!osm::ValidateFacebookPage("invalid_username"), ());

				  TEST(!osm::ValidateFacebookPage("@spaces are not welcome here"), ());

				  TEST(!osm::ValidateFacebookPage("spaces are not welcome here"), ());

strump commented

2022-07-01 18:04:38 +00:00

I added a unit test that loops over the forbidden symbols and verifies that validation rejects each one.

strump reviewed 2022-07-01 18:13:29 +00:00

indexer/validate_and_format_contacts.cpp

strump commented

2022-07-01 18:13:29 +00:00

Changed the check: instead of a list of forbidden symbols I now test `(ch >= ' ' && ch <= ',') || (ch >= '[' && ch <= '^') || ...`
Facebook doesn't provide official documentation, so I forbade all symbols in the ASCII range except `'-'`, `'.'` and `'_'` (I checked manually: the symbols are indeed forbidden in names and URLs).

What about Unicode symbols like `£€¥`? First, I don't have an exhaustive list of forbidden symbols, and second, we would need to use `MakeUniString` or utf8cpp.
I suggest leaving it as is: even if the user enters a "bad" symbol that ends up in OSM, it won't cause a big problem.
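A sketch of what such a range check could look like. The comment above elides part of the condition, so the middle ranges here are an assumption derived from the stated rule (forbid printable ASCII punctuation and space, keep letters, digits, `-`, `.` and `_`). They are not copied from the PR, and `IsForbiddenAscii` is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>

// Assumed ranges: printable ASCII that is neither a letter, a digit,
// '-', '.', nor '_'. Control characters are not covered by this sketch.
bool IsForbiddenAscii(char ch)
{
  return (ch >= ' ' && ch <= ',') ||  // space ! " # $ % & ' ( ) * + ,
         ch == '/' ||
         (ch >= ':' && ch <= '@') ||  // : ; < = > ? @
         (ch >= '[' && ch <= '^') ||  // [ \ ] ^
         ch == '`' ||
         (ch >= '{' && ch <= '~');    // { | } ~
}

// Scans from startIndex so that a leading '@' can be skipped by the caller.
bool ContainsInvalidFBSymbol(std::string const & facebookPage, std::size_t startIndex = 0)
{
  for (std::size_t i = startIndex; i < facebookPage.size(); ++i)
  {
    if (IsForbiddenAscii(facebookPage[i]))
      return true;
  }
  return false;
}
```

Bytes of multi-byte UTF-8 sequences fall outside every range above, so non-ASCII characters such as `£€¥` pass this check untouched, which matches the limitation described in the comment.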


biodranik (Migrated from github.com) reviewed 2022-07-01 21:31:09 +00:00

indexer/validate_and_format_contacts.cpp

biodranik (Migrated from github.com) commented

2022-07-01 21:31:09 +00:00

Leave a TODO in the unit tests and in the validation methods for this case.

strump reviewed 2022-07-04 20:10:45 +00:00

indexer/validate_and_format_contacts.cpp

strump commented

2022-07-04 20:10:44 +00:00

Added a TODO comment.

strump commented

2022-07-04 20:11:22 +00:00

Added tests for ValidateAndFormat_* functions


biodranik (Migrated from github.com) reviewed 2022-07-04 20:53:52 +00:00

indexer/validate_and_format_contacts.cpp

biodranik (Migrated from github.com) commented

2022-07-04 20:48:33 +00:00

  auto const size = facebookPage.size();


biodranik (Migrated from github.com) commented

2022-07-04 20:49:11 +00:00

bool containsInvalidFBSymbol(string const & facebookPage, size_t startIndex = 0)


biodranik (Migrated from github.com) commented

2022-07-04 20:49:31 +00:00

  for (auto i = startIndex; i < size; ++i)


biodranik (Migrated from github.com) commented

2022-07-04 20:50:29 +00:00

    if ((ch >= ' ' && ch <= ',') ||


biodranik (Migrated from github.com) commented

2022-07-04 20:51:07 +00:00

        (ch >= '{' && ch <= '~'))


biodranik (Migrated from github.com) commented

2022-07-04 20:51:20 +00:00

      return true;


indexer/validate_and_format_contacts.cpp

					
				@ -230,8 +259,13 @@ bool ValidateFacebookPage(string const & page)

				  if (page.empty())

				    return true;

biodranik (Migrated from github.com) commented

2022-07-04 20:53:45 +00:00


strump reviewed 2022-07-05 06:54:22 +00:00

indexer/validate_and_format_contacts.cpp

strump commented

2022-07-05 06:54:22 +00:00

Thanks. Fixed

vng (Migrated from github.com) approved these changes 2022-07-05 07:58:44 +00:00


Reference: organicmaps/organicmaps-tmp#2793
