Main Page | How To Cite | Team | R Package | How It Works | Publications | Download Data and Tools

# The Online Algorithmic Complexity Calculator

## How It Works

If your string is shorter than 13 you should use CTM. When the string is greater than 12, then the BDM method requires block size and overlapping values.

You should always pick the largest block size, as it provides better approximations to algorithmic complexity (in contrast, the smallest block size approximates Shannon Entropy). You can pick any overlapping value, that means whether you take separate blocks, e.g. say your string is 111001010111 and you do block size 6, if the overlapping is 0, then BDM will calculate the algorithmic complexity of 111001 and 010111 and add it up, or you can choose overlapping 1, for which the strings to be evaluated and added up are 111001, 101011 and 010111.

Overlapping helps to deal with leftovers after block partition if the string is not multiple of the block size, otherwise, leftovers of length less than the block size will be discarded (and thus neglected). This paper shows different schemes to deal with boundary conditions. The online calculator only deals with the most 'basic', but we have proven that the error is bounded and thus values are reliable for most purposes. In general, overlapping produces overestimations of complexity, and non-overlapping leads to an underestimation but only for objects not multiple of the block size and for less than the block size itself.

You should always compare results for the same chosen parameters, never otherwise (unless you estimate the error as we did in the paper and then make corrections or take the deviations into consideration).

As for matrices, the same rule holds. Current support for strings is binary and non-binary, but for arrays is currently only binary. With some loss of precision one can always translate anything alphabet into binary with some loss of information due to the extra granularity introduced in the translation.

## Calculating D(n,m)

For more technical details you should read the Bibliography below. But in a nutshell, we calculate a function $D(n,m)(s)$. The function $D(n,m)(s)$ gives the algorithmic probability that a halting machines in the set of all Turing machines with n states and m symbols denoted by $(n,m)$, produces a string $s$. We use a standard model of Turing machine (the one used by Rado to define the Busy Beaver problem) but we have also proven that radical changes to the model produce similar estimations.

Formally, $$D(n, m)(s)=\frac{|\{T\in(n, m) : T(p) = s,\ b\in\{0,1,\cdots,m-1\}\}|}{|\{(T,b)\in(n, m)\times \{0,1,\cdots,m-1\} : T(b) \textit{ halts }\}|}$$, where $T(b)$ represents the output of Turing machine $T$ when running on a blank tape filled with symbol $b$ that produces $s$ upon halting, and $|A|$ represents the cardinality of the set $A$.

For $(n,2)$ with $n<5$, the known Busy Beaver values give the maximim number of steps that a machine can run upon falting. But for $n\geq 5$ or for $(n,m)$ with $m>2$, no Busy Beaver valuer are known, and the size of the machine spaces make impossible a complete exploration to calculate $D(n,m)$ for arbitrary n and m, but educated choice of timeouts can be made and samples produced (see Bibliography).

## Approximating $K(s)$

The function $D(n,m)$ is an approximation to Levin's Universal Distribution $\mathfrak{m}(s)$, and it can be used to approach $K(s)$, the algorithmic complexity of string $s$. By using the Coding Theorem, $$K(s) \simeq -\log_2 \mathfrak{m}(s)$$

The greater value of $n$ we use to calculate $D(n,m)$, the better approximations we make to $K(s)$ for $s$ a string of an alphabet of $m$ symbols. Due to the uncomputability of $D(n,m)$ we work with samples and runtime cutoffs. For the simulation of Turing machines we use a C++ simulator running on a supercomputer of medium size.

## Bibliography

Delahaye J-P and Zenil, H (2012) Numerical Evaluation of the Complexity of Short Strings: A Glance Into the Innermost Structure of Algorithmic Randomness
Applied Mathematics and Computation 219, pp. 63-77.

Soler-Toscano F, Zenil H, Delahaye J-P and Gauvrit N (2014) Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines. PLoS ONE 9(5): e96223.

Zenil H, Soler-Toscano F, Kiani N.A., Hernández-Orozco S, Rueda-Toicen A (2016) A Decomposition Method for Global Evaluation of Shannon Entropy and Local Estimations of Algorithmic Complexity, arXiv:1609.00110

See the Publications section for even more details and applications to other areas and connections to other measures.