Main Page | About | Team | R Package | Methodology | Publications | Download Data and Tools

The Online Algorithmic Complexity Calculator

 

 

 

Methodology

Turing machine formalism

 

Consider a Turing machine with alphabet $\Sigma=\{0,1, \cdots, m-1\}$ of $m$ symbols and $n$ states $\{1,2, \ldots n\}$ plus an additional Halt state denoted by $0$ (as defined by Rado in his original Busy Beaver paper). The machine runs on a $2$-way unbounded tape. At each step:

  1. the machine's current "state" (instruction); and
  2. the tape symbol the machine's head is scanning
define each of the following:
  1. a unique symbol to write;
  2. a direction to move in: $-1$ (left), $1$ (right) or $0$ (none, when halting); and
  3. a state to transition into (which may be the same as the one it was in).
The machine halts if and when it reaches the special halt state $0$. There are $(2\dot{} n\dot{} m + m)^{2n}$ Turing machines with $n$ states and $m$ symbols according to the formalism described above. We denote by $(n,m)$ the set with all those machines.

For the input, we consider machines running on a blank tape (filled with one of the $m$ symbols used as the blank symbol). The output of a halting computation is taken from the contiguous cells on the tape the head of the machine has gone through.

 

Calculating D(n,m)

 

The function $D(n,m)(s)$ gives the probability that a halting machines in $(n,m)$ produces the output string $s$. Formally, $$D(n, m)(s)=\frac{|\{T\in(n, m) : T(p) = s,\ b\in\{0,1,\cdots,m-1\}\}|}{|\{(T,b)\in(n, m)\times \{0,1,\cdots,m-1\} : T(b) \textit{ halts }\}|}$$ Where $T(b)$ represents the output of Turing machine $T$ when running on a blank tape filled with symbol $b$ that produces $s$ upon halting, and $|A|$ represents the cardinality of the set $A$.

For $(n,2)$ with $n<5$, the known Busy Beaver values give the maximim number of steps that a machine can run upon falting. But for $n\geq 5$ or for $(n,m)$ with $m>2$, no Busy Beaver valuer are known, and the size of the machine spaces make impossible a complete exploration to calculate $D(n,m)$, so we make samples.

Approximating $K(s)$

 

Function $D(n,m)$ is an approximation to Levin's Universal Distribution $\mathfrak{m}(s)$, and it can be used to approach $K(s)$, the algorithmic complexity of string $s$. By using the Coding Theorem, $$K(s) \simeq -\log_2 \mathfrak{m}(s)$$

The greater value of $n$ we use to calculate $D(n,m)$, the better approximations we make to $K(s)$ for $s$ a string of an alphabet of $m$ symbols. Due to the uncomputability of $D(n,m)$ we work with samples and runtime cutoffs. For the simulation of Turing machines we use a C++ simulator running on a supercomputer of medium size.

 

Bibliography

 

F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit, Calculating Kolmogorov Complexity from the Frequency Output Distributions of Small Turing Machines, arXiv:1211.1302 [cs.IT],

 

See the Publications section for even more details and applications to other areas and connections to other measures.

 

Content on this site is licensed under a
Creative Commons Attribution 3.0 License

Creative Commons Licence Attribution 3.0 Unported (CC BY 3.0)
View License Deed | View Legal Code
Contact info: hectorz at LABORES dot EU
Cite or attribute as follows:
Soler-Toscano F, Zenil H, Delahaye J-P, Gauvrit N (2014) Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines. PLoS ONE 9(5): e96223. doi:10.1371/journal.pone.0096223
Algorithmic Nature Group - LABORES