Main Page | About | Team | R Package | Methodology | Publications | Download Data and Tools

# The Online Algorithmic Complexity Calculator

## Turing machine formalism

Consider a Turing machine with alphabet $\Sigma=\{0,1, \cdots, m-1\}$ of $m$ symbols and $n$ states $\{1,2, \ldots n\}$ plus an additional Halt state denoted by $0$ (as defined by Rado in his original Busy Beaver paper). The machine runs on a $2$-way unbounded tape. At each step:

1. the machine's current "state" (instruction); and
2. the tape symbol that the machine's head is scanning
define each of the following:
1. a unique symbol to write;
2. a direction to move in: $-1$ (left), $1$ (right) or $0$ (none, when halting); and
3. a state to transition into (which may be the same as the one it was in).
The machine halts if and when it reaches the special halt state $0$. There are $(2\dot{} n\dot{} m + m)^{2n}$ Turing machines with $n$ states and $m$ symbols according to the formalism described above. We denote by $(n,m)$ the set with all those machines.

For the input, we consider machines running on a blank tape (filled with one of the $m$ symbols used as the blank symbol). The output of a halting computation is taken from the contiguous cells on the tape the head of the machine has gone through.

## Calculating D(n,m)

The function $D(n,m)(s)$ gives the probability that a halting machines in $(n,m)$ produces the output string $s$. Formally, $$D(n, m)(s)=\frac{|\{T\in(n, m) : T(p) = s,\ b\in\{0,1,\cdots,m-1\}\}|}{|\{(T,b)\in(n, m)\times \{0,1,\cdots,m-1\} : T(b) \textit{ halts }\}|}$$ Where $T(b)$ represents the output of Turing machine $T$ when running on a blank tape filled with symbol $b$ that produces $s$ upon halting, and $|A|$ represents the cardinality of the set $A$.

For $(n,2)$ with $n<5$, the known Busy Beaver values give the maximim number of steps that a machine can run upon falting. But for $n\geq 5$ or for $(n,m)$ with $m>2$, no Busy Beaver valuer are known, and the size of the machine spaces make impossible a complete exploration to calculate $D(n,m)$, so we make samples.

## Approximating $K(s)$

Function $D(n,m)$ is an approximation to Levin's Universal Distribution $\mathfrak{m}(s)$, and it can be used to approach $K(s)$, the algorithmic complexity of string $s$. By using the Coding Theorem, $$K(s) \simeq -\log_2 \mathfrak{m}(s)$$

The greater value of $n$ we use to calculate $D(n,m)$, the better approximations we make to $K(s)$ for $s$ a string of an alphabet of $m$ symbols. Due to the uncomputability of $D(n,m)$ we work with samples and runtime cutoffs. For the simulation of Turing machines we use a C++ simulator running on a supercomputer of medium size.

## Bibliography

F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit, Calculating Kolmogorov Complexity from the Frequency Output Distributions of Small Turing Machines, at PLOS ONE.

See the Publications section for even more details and applications to other areas and connections to other measures.