Estimating the area of the mandelbrot set

It's around 1.5065918857±0.0000000085

August 2024 Update

You need to read the rest of this article to understand this section.

I had some extra computation credits left, so I tested an extra 36.96 trillion points on some VMs. These computations include

Eight 1048576 grids with 8796093022208 points. This computation yielded 5746 erratic points, and area estimates for including and excluding the erratic points are 1.5065918909(99) and 1.506591887(10), respectively. Analysis output (UTF-8).
99 of 524288 grids with 27212912787456 points. This computation yielded 17847 erratic points, and ares estimates for including and excluding the erratic points are 1.5065918952(84) and 1.5065918915(84), respectively. Analysis output (UTF-8).

Problem and goals

The mandelbrot set is a fractal defined as a set of complex numbers c that does not diverge to infinity when iterated infinitely through f(z) = z² + c, starting at z = 0. This webpage will not attempt to explain the fractal, there are existing resource that does an excellent job explaining what the fractal is. I will assume from now that you have a basic understanding of the mandelbrot set

There exists no analytical solution for the area of the mandelbrot set; no closed form solution or well-behaved series. Although from I know, the area of the mandelbrot set has little discovered implications in mathmatics, it is still a fun and interesting challenge to try to estimate the area of the fractal

The approach

The best existing way to estimate the area of the mandelbrot set is to evaluate a very large number of points within a certain area (the area the entire set encompass) and multiply the percentage of points that is in the set by the area evaluated.

I have written a computer program that does exactly that and does its best job estimating the area of the fractal

Other attempts

My methodology and result will be explained later in the page, I want to first mention other attempts that was made and their results

Robert P. Munafo had an estimation with an estimated error of 2.07 * 10^-8 with 6.7 trillion points tested. His site also provided very useful information to my attempt.
Thorsten Forstemann has, to the best of my knowledge, the best estimate currently known, with an estimated error of 2.8 * 10^-9 [1.5065918849(28)] where 87 trillion points was tested.
As a comparison, my estimation has an error of about 8.49 * 10^-9 [1.5065918857(85)] with 4.4 trillion points tested

The "error" mentioned above is 95% confidence of mean. Since numbers above is averaged together from multiple runs (which are in itself means). The 95% confidence range is σ(1.96)(√n)^-1, where σ is the standard deviation between the results of multiple runs and n is the number of runs.

The reason that my estimation has a tighter confidence interval with less values is because I tested fewer grids with more pixels each rather than more grids with fewer pixels each.

Details about my implementation

The code of my implementation can be found in this github repo (hsingtism/mandelbrot-area)

If you want to use my code, the readme in the aforementioned repository will provide more specific information

I have created a simple, singlethreaded, c program that estimates the area of the fractal. The essence of the program is very simple, it tests a large amount of uniformly distributed complex numbers within a certain range for its membership in the mandelbrot set and count the number of points that is a member, not a member, or undecided (potentially chaotic, or bad-having) pixels.

There are two main methods that the program does the calculation,

The pure monte-carlo generates a number where the real and complex part is between -2 and 2, testing an area of 4. This method is extremely easy to implement and convenient to run since the program can be terminated at any time. The downside is that estimates converge very slowly due to its nature. Testing 1 trillion pixels yields an error of about 1.3 * 10^-5
The main method that was used is the "simple-grid" method. This method divide up the complex plane into very small sections and test one random point that is within that section. This method is more successful and the results mentioned in this document are calculated from this method.

The PRNG

The source of randomness in my programs are acquired from the xorshift128+ pseudorandom number generator. This is the same one the V8 JavaScirpt engine uses; in fact, my implementation is heavily based from this. This is the article that mainly helped me.

The random function simply generates a random 64-bit word, this is converted into a floating point double to be used via some bit-masking.

The PRNG is seeded with system time, some user defined constants, and /dev/urandom on unix-like systems

The membership function

The membership function first checks whether the number is in bound by doing the following checks for input x

Re(x) is between -2.0 and 0.49
Im(x) is between -1.15 and 1.15
Abs(x) is less than 2

If any of the conditions is not satified, the function returns not a member

The function then checks if the point is within the main cardioid or the main bulb, using the shape's bounding boxes to help speed up some calculations. Desmos graphs. Empirically, this greatly speeds up the program's speed

The function then starts a loop where the number is iterated through the iterator function. An orbit detection variable is also started where the number is only iterated every two main iterations. After this, there are 4 ways the function can return

Every 5 iterations, the function checks if the absolute value of the main variable is more than 2; if so, not a member is returned. The value 5 strikes a balance between not checking excessively and not iterating excessively since comparisons are expensive, this number also prevent numbers from shooting up to infinity which "contaminate" the numbers with NaN.
After the absolute value check, a convergence check executes. If the difference between the current number and one of the last iteration is less than a certain threshold, it is assumed that the value converges and member is returned
Every time the orbit variable is iterated (every 2 iterations), if the orbit variable is close enough to the main variable, it is assumed that the value orbits and member is returned.
If the loop does not return after a set limit, undecided is returned. This rarely happens

The thresholds for "close enough" for convergence and orbits are user defined, they have to be set to balance between not returning erroneous data (or doing it extremely rarely) but sufficiently increase performance.

Output and analysis

Results are written into a log file that can be analyzed by a node.js script

For large grid calculations, checkpointing can be enabled to write the result every set amount of scanlines. In case that the program crashes, a new program can start there.

Scan direction

The program test points from the real number line to 1.15i. It starts at -2+0i and goes positive real before wrapping back to another higher imaginary scanline

Results

The accuracy of the value increases as the grid size and points tested increases.

I changed the dwell limit between grid sizes and even between each runs pretty arbitrarily, additionally, the number of runs per grid sizes are also pretty inconsistent; because of this, a table will not be really practical, but I will leave in one just to show the relationship between grid size, points tested, and estimated error

grid size  points (log)  estimated error 
8192       10.0648       1.71e-6
16384
32768      10.1101       7.58e-7
65536      11.2350       1.94e-7
131072     11.2764       1.05e-7
262144     12.0412       3.76e-8
524288
1048576    12.6433       8.49e-9
2097152

- log is log base 10
- estimated error is the number that 
includes undecided points

The first time I ran the 1048576 grid, my equivlance threshold was set too high (2^-16 for orbit and 2^-32 for convergence) resulting in too many points being mistaken for member.

Error is inherent to estimations like this. Although the percision limit for floating point (less than 2^-52 for this application) should not cause any problems at this level of percision since the equivalence threshold of 2^-32 for orbit and 2^-48 for convergence produced similar results to other people's implementation.

Four instances of the 1048576 grid was ran on a DigitalOcean 4-core CPU optimized droplet, yielding my best estimate. Each instance took about 1.92 millions seconds or 22 days, the average computation rate was 574000 complex values per second per core. The droplet costed about 60 US dollars for the time of the computation.

The final estimate uses the value that includes undecided pixels as part of the set. More data can be found in this text file (UTF-8 required).

My best estimate stands at 1.5065918857 with a 95% confidence interval of 0.0000000085

or 1.5065918857±0.0000000085 or 1.5065918857(85)