I've just tried to create the smallest possible language interpreter. Would you like to join and try?
Rules of the game:
- You should specify a programming language you're interpreting. If it's a language you invented, it should come with a list of commands in the comments.
- Your code should start with example program and data assigned to your code and data variables.
- Your code should end with output of your result. It's preferable that there are debug statements at every intermediate step.
- Your code should be runnable as written.
- You can assume that data are 0 and 1s (int, string or boolean, your choice) and output is a single bit.
- The language should be Turing-complete in the sense that for any algorithm written on a standard model, such as Turing machine, Markov chains, or similar of your choice, it's reasonably obvious (or explained) how to write a program that after being executred by your interpreter performs the algorithm.
- The length of the code is defined as the length of the code after removal of input part, output part, debug statements and non-necessary whitespaces. Please add the resulting code and its length to the post.
- You can't use functions that make compiler execute code for you, such as
eval()
,exec()
or similar.
This is a Community Wiki, meaning neither the question nor answers get the reputation points from votes. But vote anyway!
Python, 250 bytes.
This one is longer than ilya n.'s Python solution, but the language is a little easier to grasp. Each command in this language (I call it Kaputt, the German word for "broken") is one byte. The following 6 commands are defined:
0
: Push a zero onto the stack1
: Push a one onto the stackI
: (if) Pop a value off the stack (which must be zero or one). Run the following block of code (until "i
") if it's a one; skip the block if it's a zero.i
: (endif) Ends the if block and pushes a one if the block was not run (because the "I
" popped a zero). See below for an explanation of the latter.D
: (def) Pops the name of the to-be-defined function off the stack (see below) and binds the following block (until "d
") to that name.d
: (enddef) Ends the function definition.- Any other byte: Check if there is a function by this (one-byte; but see the edit below) name. If so, run this function; if not, push this byte onto the stack for immediate use by
D
.
Nested ifs are legal; nested function definitions are not. The fact that i
(endif) pushes a one if and only if the corresponding if block was not run allows for the following idiom resembling an if/else/endif structure:
# [code that left a one or zero on the stack]
I # abbreviated "(" below
# [code to be run if it was a one]
0iI # abbreviated "/" below
# [code to be run if it was a zero]
1iIi # abbreviated ")" below
# [continuing...]
Note that comments, linebreaks, spaces etc. are not actually allowed.
Here are some examples of common functions. These make use of the abbreviations ( / )
mentioned above.
<D(/)d
defines the function <
(pop) that pops a value off the stack without using it for anything.
&D((1/0)/<0)d
defines the function &
(and) that pops two values of the stack and pushes a one if both values are ones, pushes a zero otherwise.
~D((11/10)/(01/00))d
defines the function ~
(swap) that swaps the top two values on the stack.
RD(R/<)d
defines the function R
(remove) that recursively removes all ones from the top of the stack, and then removes two more values (the top one of which should be zero).
The following interpreter function expects the program p, which is a string (or any other iterable of bytes), and the input stack S, which is a (possibly empty) list of bytes. After the interpreter has run, this list contains the output stack.
def i(p,S,F=0):
A=S.append
F=F or{}
C=0
k=[]
for c in p:
h=0in k
if c=="d":C=0
elif C!=0:C+=[c]
elif c=="I":k+=[int(h or S.pop())]
elif c=="i":k.pop()or A("1")
elif 1-h:
if c=="D":C=F[S.pop()]=[]
else:i(F.get(c)or A(c)or"",S,F)
Obviously, there was no room for error checking, so i()
expects legal Kaputt code. Test cases:
script = "<D(/)d" # < = pop
script += "&D((1/0)/<0)d" # & = and
script += "~D((11/10)/(01/00))d" # ~ = swap
script += "RD(R/<)d" # R = remove
script += "|D(<1/(1/0))d" # | = or
script=script.replace("(","I")
script=script.replace("/","0iI")
script=script.replace(")","1iIi") # ( and / and ) are no real commands of the language.
S=[]
i(script+"1111010111111111R",S)
assert S == ["1","1","1","1","0"]
S=[]
i(script+"00&",S)
assert S == ["0"]
S=[]
i(script+"01~",S)
assert S == ["1","0"]
S=[]
i(script+"00|",S)
assert S == ["0"]
S=[]
i(script+"01|",S)
assert S == ["1"]
Happy coding :-)
Edit: There is nothing inherent in the interpreter that forces tokens to be one byte only. Requiring this was more for consistency with the built-in commands (which are one-byte because that makes the interpreter shorter). If you pass a list of strings instead of a string, you can write more readable Kaputt code like this:
script = """
inc D
(
(
0 0
/
1 0
)
/
1
)
d
""".replace("(","I").replace("/","0 i I").replace(")","1 i I i").split()
This defines a function inc
that increments the two-bit number on top of the stack (LSB on top).
Test:
for n in xrange(4):
S=[]
i(script + [str((n & 2)/2), str(n & 1), "inc"], S)
assert S == [str(((n+1) & 2)/2), str((n+1) & 1)]
Let's call multi-byte function names a language extension :-)
Python program that interprets a language I just created:
from random import randint
z = [randint(0,1), None] # example: input data is 0 or 1
x = '_b_ed_a_i____d_ai' # this program will set p = data[1] = not data[0]
# input x[i], data z[k]
# jumper j
# p is +1 or -1 each step
# commands:
# a set data = 0 or 1
# b j += 0 or +-9 depending on data
# d move data left or right
# e perform jump left or right
# j set jumper to 0
# i end program
# output: p or some data[x], both possible
g = globals()
p = i = 1
a = b = d = j = e = k = 0
while i:
h = x[i]
g[h] = p = -p
i += 1 + j*e
if a: z[k] = i % 2
if z[k]: j += 9*b
k += d
g[h] = 0
# print(a, b, d, e, i, j, k, h, z)
print('Input:', z, 'Output:', p > 0)
Optimized version:
g=globals()
p=i=1
a=b=d=j=e=k=0
while i:
h=x[i]
g[h]=p=-p
i+=1+j*e
if a:z[k]=i%2
if z[k]:j+=9*b
k+=d
g[h]=0
114 bytes
Update: I want to add that the point of my program is not to create any practical language, and not even to somehow have the 'best' (=='shortest') interpreter, but rather to demonstrate interesting programming tricks. For example, I am relying on direct access to global variables through globals()
, so that I never even test for j
command, saving precious bytes :)
#include "/dev/tty"
Assemble the code below using A86 to get a 150 byte BF interpreter!
add dh,10h
push dx
add dh,10h
push dx
mov bl,80h
lea dx,[bx+2]
add bl,byte ptr [bx]
mov byte ptr [bx+1],al
mov ah,3dh
int 21h
pop ds,es
jc l14
mov bx,ax
mov ah,3fh
mov cx,di
xor dx,dx
int 21h
jc l14
mov bx,ax
xor ax,ax
rep stosw
xor di,di
xor si,si
inc ch
l1:
cmp si,bx
jae l14
lodsb
mov bp,8
push l1
l2:
dec bp
js ret
cmp al,[bp+l15]
jne l2
l3:
mov cl,[bp+8+l15]
jmp cx
l4:
inc di
ret
l5:
dec di
ret
l6:
es inc byte ptr [di]
ret
l7:
es dec byte ptr [di]
ret
l8:
mov ah,2
es mov dl,[di]
int 21h
ret
l9:
mov ah,8
int 21h
es mov [di],al
ret
l10:
es cmp byte ptr [di],dh
jne ret
l11:
cmp si,bx
jae l14
lodsb
cmp al,']'
jne l11
ret
l12:
es cmp byte ptr [di],dh
je ret
l13:
dec si
jz l14
cmp byte ptr [si-1],'['
jne l13
l14:
ret
l15:
db '>'
db '<'
db '+'
db '-'
db '.'
db ','
db '['
db ']'
db offset l4-100h
db offset l5-100h
db offset l6-100h
db offset l7-100h
db offset l8-100h
db offset l9-100h
db offset l10-100h
db offset l12-100h
Specify a filename on the command line, no need for double quotes, as the BF source.
Not mine, but take a look at the 210 bit binary lambda calculus self-interpreter here.
Written in Perl, 140 characters long, including the shell invocation and flags:
perl -ne'push@a,split;if(eof){$i=0;while($i>=0){($a,$b,$c)=@a[$i..$i+2];$b>-1?$a[$b]-=$a[$a]:print chr$a[$a];$i=$b>-1&&$a[$b]<=0?$c:$i+3;}}'
Readable version:
#!/usr/bin/perl -n
push @prog, split /\s+/, $_;
if(eof) {
$i = 0;
while($i >= 0) {
($a, $b, $c) = @prog[$i .. $i + 2];
if($b > -1) {
$prog[$b] -= $prog[$a];
} else {
print chr $prog[$a];
}
if($b > -1 and $prog[$b] <= 0) {
$i = $c;
} else {
$i + 3;
}
}
}
The above is an interpreter for pseudo-machine code for a One Instruction Set Computer using the subleq
instruction. Basically, the source code should just be a bunch of numbers separated by whitespace. A simple test program to verify my results (should print "Hi" and a Unix newline):
0 0 6 72 105 10 3 -1 9 4 -1 12 5 -1 15 0 0 -1
Readable version of the test input (works just as well):
0 0 6
72 105 10
3 -1 9
4 -1 12
5 -1 15
0 0 -1
My own entry, implementation of OISC in Ruby. It's 85 bytes long, and I'm sure one could squeeze a few more with some neat tricks. Programs have to contain data in code (line of space-separated numbers). At the moment I'm unable to provide working program written in OISC, but I will do it later.
p,m=0,gets.chomp.split.map(:to_i)
while p>0
p=(m[m[b]]-=m[m[a]])>0?p+3:c
end
$><<m[0]
The code is quite straightforward. m
is a "memory" containing program and data. First line initializes m
with provided code, and p
- memory pointer. Main loop is subleq
operation, written with ternary operator. Last line is smart way to output number contained in memory.
Building on my previous code-golf entry, here is a (slightly generalized for IO) OISC emulator in
Fortran77
Obsfucated and without the loading scaffold (655 characters):
subroutine o(n,m)
integer m(n)
l=1;
do while (l.ne.0)
i=m(l)
j=m(l+1)
k=m(l+2)
mi=mg(n,m,i)
mj=mg(n,m,j)
if(j.eq.(n+2)) then
write(6,*)mj-mi
else
m(l+1)=mj-mi
endif
if (m(l+1).lt.0) then
l=mg(n,m,k)
else
l=l+3
endif
end do
return
end
function mg(n,m,i)
integer m(n)
if (i.eq.n+2) then
read(5,*)mg
elseif (i.eq.n+1) then
mg=0
else
mg=m(i)
endif
return
end
with comments, loading scaffold and so on (2435 characters):
program c
parameter (n=1024) ! The size to use for memeory
integer m(n) ! represent the memory
c Load a program into memory
i=1
1 read(5,*,end=2)l
c write (6,*) "Load ",l," into location ",i
m(i)=l
i=i+1
goto 1
c Run the computer
2 call o(n,m)
stop
end
subroutine o(n,m)
c Simulate a simple computer that supports only a single
c instruction. Memory is a fixed size array of integers.
c
c The supported instruction is subtract-branch-negative which we
c give the assembly syntax
c
c sbn i j k
c
c and acts by subtracting the value in memeory location i from that
c in location j and storing the result in j. If the result is
c negative, the PC is set to k. Because there is only one opcode, it
c is not represented, so a program consists simply of a series of
c triplet (i,j,k), and the PC is generally incremented by 3. The
c program counter starts at 1
c
c Povisions for IO and halting are by way of special memory
c location:
c
c * PC=0 means halt
c * writes (opcode argument j) to memory location n+1 send
c output, reads from n+1 evaluate as 0
c * reads (opcode argument i, j, k) from memory location n+2 fetch
c input, writes to n+2 are forbidden
c * All others reads and writes outside of memeory are forbidden
c n ! the size of the computers memory
integer m(n) ! an array representing the computers memory
l=1; ! program counter
do while (l.ne.0)
i=m(l)
j=m(l+1)
k=m(l+2)
c write (6,*) "Executing from PC=",l,": (",i,", ",j,", ", k,")"
c handle the write special case for output
mi=mg(n,m,i)
mj=mg(n,m,j)
if(j.eq.(n+1)) then
write(6,*)mj-mi
else
c write (6,*) "Setting location ",j," to ",mj-mi
m(l+1)=mj-mi
endif
if (m(l+1).lt.0) then
l=mg(n,m,k)
else
l=l+3
endif
end do
return
end
c Handle the various read special cases
function mg(n,m,i)
integer m(n)
if (i.eq.n+2) then
read(5,*)mg
elseif (i.eq.n+1) then
mg=0
else
mg=m(i)
endif
c write (6,*) "Read ",mg," from location ",i
return
end
Sample program:
13
1025
0
14
1025
0
14
15
0
0
0
0
-1
1
0
resulting in output:
$ cat trivial.oisc | ./a.out
1
-1
which is as expected.
It really is exceedingly straightforward code. The point here is not really how tightly I can code it, but how simple a language you need for Turing completeness.
106-byte solution was posted to codegolf.com contest. It is written in perl and interprets Brainfuck. There is no way to review it at this point, afaik.
It is not my solution, but I believe it is shorter than current entries. Possible solution should be shorter, as there is no need to use Brainfuck as Turing-complete language.
URM interpreter in CoffeeScript, 143 bytes (167 with new lines).
This version of URM has consists of an unbounded amount of registers, with zero, successor and jump operators. It is well known that this is turing-complete.
The URM program is written in the array c
(commands), and the inputs are in the array r
(registers). After the calculation the output is at r[0]
and this value is displayed.
The interpreter, with sample program and input that computes 32+13 (and indeed outputs 45):
# Addition program, similar to http://planetmath.org/examplesofunlimitedregistermachines
c = [['j', 1, 2, 4], ['s', 0], ['s', 2], ['j', 1, 1, 0]]
# Input: 32, 13, thus the desired output is: 45
r = [32, 13]
k = 0
while k < c.length
t = c[k]
k += 1
if t[0] == 'z'
r[t[1]] = 0
if t[0] == 's'
if !r[t[1]]?
r[t[1]] = 0
r[t[1]] += 1
if t[0] == 'j' && r[t[1]] == r[t[2]]
k = t[3]
alert r[0]
Minified version:
k=0
while k<c.length
t=c[k]
k+=1
if t[0]=='z'
r[t[1]]=0
if t[0]=='s'
if !r[t[1]]?
r[t[1]]=0
r[t[1]]+=1
if t[0]=='j'&&r[t[1]]==r[t[2]]
k=t[3]
alert r[0]
What I really like in this code is that is really straightforward and easy to understand.
Custom language: SLA
Keywords:
i - Interpret SLB
q - Quit
Custom Language: SLB
Keywords:
cp("text") - Interpret text as C program
Example:
cp("#include \nint main() { puts(\"Hi!\n\");return 0}")
Interpreter (Written in SLA):
i q
3 characters!
来源:https://stackoverflow.com/questions/1053931/creating-the-shortest-turing-complete-interpreter