Fast function to parse strings into double (binary64) floating-point values, enforces the RFC 7159 (JSON standard) grammar: 4x faster than strtod

fast_double_parser


Fast function to parse strings containing decimal numbers into double-precision (binary64) floating-point values. That is, given the string "1.0e10", it should return a 64-bit floating-point value equal to 10000000000. We do not sacrifice accuracy. The function will match exactly (down to the smallest bit) the result of a standard function like strtod.

We support all major compilers: Visual Studio, GNU GCC, LLVM Clang. We require C++11.

Why should I expect this function to be faster?

Parsing strings into binary numbers (IEEE 754) is surprisingly difficult. Parsing a single number can take hundreds of instructions and CPU cycles, if not thousands. It is relatively easy to parse numbers faster if you sacrifice accuracy (e.g., tolerate 1 ULP errors), but we are interested in "perfect" parsing.
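To make "1 ULP" concrete, here is a small sketch built on std::nextafter from the standard library. The helper names next_up and within_one_ulp are made up for this illustration and are not part of fast_double_parser.

```cpp
#include <cmath>
#include <limits>

// Returns the next representable binary64 value above x.
inline double next_up(double x) {
  return std::nextafter(x, std::numeric_limits<double>::infinity());
}

// True when a and b are the same double or immediate neighbours,
// that is, when they differ by at most 1 ULP.
inline bool within_one_ulp(double a, double b) {
  return a == b || next_up(a) == b || next_up(b) == a;
}
```

An approximate parser may return next_up(x) where an exact parser returns x. The two values agree to many printed digits but have different bit patterns, which is the kind of error "perfect" parsing rules out.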

Instead of trying to solve the general problem, we cover what we believe are the most common scenarios, providing really fast parsing. We fall back on the standard library for the difficult cases. We believe that, in this manner, we achieve the best performance on some of the most important cases.
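The "fast common case, standard library otherwise" strategy can be sketched as follows. This is not the library's code: it uses the classic exact fast path (a decimal significand of at most 2^53 combined with a power of ten between 10^-22 and 10^22), which may differ from what fast_double_parser does internally. The name sketch_parse, its simplified grammar, and the requirement that the whole string be a number are assumptions made for the example.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Powers of ten from 10^0 to 10^22: each is exactly representable in binary64.
static const double kExactPow10[23] = {
    1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,  1e8,  1e9,  1e10, 1e11,
    1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22};

inline bool is_dec_digit(char c) { return c >= '0' && c <= '9'; }

// Hypothetical helper (not the library's parse_number): accepts a whole string
// of the form [-]digits[.digits][(e|E)[+|-]digits] and returns false otherwise.
inline bool sketch_parse(const char *p, double *out) {
  const char *const start = p;
  const bool negative = (*p == '-');
  if (negative) ++p;
  uint64_t w = 0;        // decimal significand, complete only if digits <= 19
  int digits = 0;        // digits seen so far, leading zeros included
  int64_t exponent = 0;  // decimal exponent to apply to w
  if (!is_dec_digit(*p)) return false;
  for (; is_dec_digit(*p); ++p, ++digits) {
    if (digits < 19) w = 10 * w + uint64_t(*p - '0');
  }
  if (*p == '.') {
    ++p;
    if (!is_dec_digit(*p)) return false;
    for (; is_dec_digit(*p); ++p, ++digits, --exponent) {
      if (digits < 19) w = 10 * w + uint64_t(*p - '0');
    }
  }
  if (*p == 'e' || *p == 'E') {
    ++p;
    bool negative_exp = false;
    if (*p == '+' || *p == '-') { negative_exp = (*p == '-'); ++p; }
    if (!is_dec_digit(*p)) return false;
    int64_t e = 0;
    for (; is_dec_digit(*p); ++p) {
      if (e < 100000) e = 10 * e + (*p - '0');  // saturate, avoids overflow
    }
    exponent += negative_exp ? -e : e;
  }
  if (*p != '\0') return false;  // trailing characters are not accepted here

  // Fast path: w and the power of ten are both exact doubles, so a single
  // multiplication or division is correctly rounded. This assumes binary64
  // arithmetic with round-to-nearest (no extended-precision intermediates).
  const uint64_t kMaxExact = uint64_t(1) << 53;
  if (digits <= 19 && w <= kMaxExact && exponent >= -22 && exponent <= 22) {
    double d = double(w);
    d = (exponent < 0) ? d / kExactPow10[-exponent] : d * kExactPow10[exponent];
    *out = negative ? -d : d;
    return true;
  }

  // Difficult case: defer to the standard library. Note that strtod is
  // locale-dependent, so this fallback assumes the "C" locale is active.
  const double d = std::strtod(start, nullptr);
  if (!std::isfinite(d)) return false;  // reject values that overflow binary64
  *out = d;
  return true;
}
```

With this sketch, "1.0e10" takes the fast path (significand 10, decimal exponent 9), whereas an input with more than 19 digits or an exponent such as 1e300 is handed to strtod.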

We have benchmarked our parser on a collection of strings from a sample GeoJSON file (canada.json). Here are some of our results:

parser MB/s
fast_double_parser 660 MB/s
abseil, from_chars 330 MB/s
double_conversion 250 MB/s
strtod 70 MB/s

(configuration: Apple clang version 11.0.0, I7-7700K)

We expect string numbers to follow RFC 7159. In particular, the parser will reject overly large values that would not fit in binary64. It will not produce NaN or infinite values.
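For reference, the RFC 7159 number grammar is: an optional minus sign, an integer part that is either a single 0 or a nonzero digit followed by digits, an optional fraction (a dot and at least one digit), and an optional exponent (e or E, an optional sign, and at least one digit). The checker below is a sketch of that grammar only. The name is_rfc7159_number is made up for the example, it validates a whole string, and it is not how the library itself is implemented.

```cpp
inline bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Returns true when the whole string matches the RFC 7159 "number" production:
//   number = [ "-" ] int [ frac ] [ exp ]
inline bool is_rfc7159_number(const char *p) {
  if (*p == '-') ++p;
  if (*p == '0') {
    ++p;  // a leading zero must stand alone in the integer part
  } else if (*p >= '1' && *p <= '9') {
    while (is_digit(*p)) ++p;
  } else {
    return false;  // no integer part
  }
  if (*p == '.') {
    ++p;
    if (!is_digit(*p)) return false;  // at least one fraction digit
    while (is_digit(*p)) ++p;
  }
  if (*p == 'e' || *p == 'E') {
    ++p;
    if (*p == '+' || *p == '-') ++p;
    if (!is_digit(*p)) return false;  // at least one exponent digit
    while (is_digit(*p)) ++p;
  }
  return *p == '\0';
}
```

Under this grammar, strings such as "+1", ".5", "1.", "01", "0x10", "inf" and "nan" are not numbers, even though strtod accepts several of them.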

The parsing is locale-independent. E.g., it will parse 0.5 as 1/2, but it will not parse 0,5 as 1/2 even if you are under a French system.
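For contrast, here is a sketch of what locale-dependent parsing looks like with standard C++ streams. It does not call fast_double_parser at all. The comma_decimal facet and the stream_parse helper are hypothetical names introduced for the illustration, and the facet only imitates the decimal comma of a French locale so that the example does not depend on which system locales are installed.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet that imitates a locale using ',' as the decimal point.
struct comma_decimal : std::numpunct<char> {
  char do_decimal_point() const override { return ','; }
};

// Reads a leading number from text with a stream imbued with the given locale.
// Returns 0 when no number can be read.
inline double stream_parse(const std::string &text, const std::locale &loc) {
  std::istringstream in(text);
  in.imbue(loc);
  double value = 0;
  in >> value;
  return value;
}
```

With `std::locale french_like(std::locale::classic(), new comma_decimal);`, the call `stream_parse("0,5", french_like)` yields 0.5, while the same text under `std::locale::classic()` yields 0 because reading stops at the comma. A locale-independent parser always behaves like the second case.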

Requirements

You should be able to just drop the header file into your project: it is a header-only library.

If you want to run our benchmarks, you should have

  • Windows, Linux or macOS; presumably other systems can be supported as well
  • A recent C++ compiler
  • A recent CMake (version 3.11 or better)

Usage (benchmarks)

git clone https://github.com/lemire/fast_double_parser.git
cd fast_double_parser
mkdir build
cd build
cmake .. -DFAST_DOUBLE_BENCHMARKS=ON
cmake --build . --config Release  
ctest .
./benchmark

Under Windows, the last line should be ./Release/benchmark.exe.

Sample results

$ ./benchmark 
parsing random integers in the range [0,1)


=== trial 1 ===
fast_double_parser  460.64 MB/s
strtod         186.90 MB/s
abslfromch     168.61 MB/s
absl           140.62 MB/s
double-conv    206.15 MB/s


=== trial 2 ===
fast_double_parser  449.76 MB/s
strtod         174.59 MB/s
abslfromch     152.68 MB/s
absl           157.52 MB/s
double-conv    193.97 MB/s


$ ./benchmark benchmarks/data/canada.txt
read 111126 lines 


=== trial 1 ===
fast_double_parser  662.01 MB/s
strtod         69.73 MB/s
abslfromch     341.74 MB/s
absl           325.23 MB/s
double-conv    249.68 MB/s


=== trial 2 ===
fast_double_parser  611.56 MB/s
strtod         69.53 MB/s
abslfromch     330.00 MB/s
absl           328.45 MB/s
double-conv    243.90 MB/s

API

The current API is simple enough:

#include "fast_double_parser.h" // the file is in the include directory


double x;
char * string = ...
bool isok = fast_double_parser::parse_number(string, &x);

You must check the value of the boolean (isok): if it is false, then the function refused to parse.
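One way to make the check hard to forget is to wrap the call so that a refusal stops processing. The sketch below is written against any callable with the same shape as the call above (a C string and a pointer to double in, a boolean out), so it does not assume the exact declaration of parse_number. The name parse_all is hypothetical.

```cpp
#include <string>
#include <vector>

// Parses every input with the given parser and stores the results in *out.
// Stops and returns false at the first input the parser refuses.
template <typename Parser>
bool parse_all(const std::vector<std::string> &inputs, Parser parse,
               std::vector<double> *out) {
  out->clear();
  for (const std::string &s : inputs) {
    double x;
    if (!parse(s.c_str(), &x)) {
      return false;  // refused: x holds no meaningful value, do not use it
    }
    out->push_back(x);
  }
  return true;
}
```

Assuming the header is included, a call could look like `parse_all(inputs, [](const char *s, double *x) { return fast_double_parser::parse_number(s, x); }, &values)`.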

Users

The library has been reimplemented in Google Wuffs.

Ports

There is a Julia port.

Credit

Contributions are invited.

This is based on an original idea by Michael Eisel (joint work).