I wrote a 231-byte Brainfuck compiler by abusing the entirety

I wrote a 231-byte Brainfuck compiler by abusing the entirety

tutorial, developer, with an peek in opposition to a brighter techno-social lifestyles



[prev]
[next]

2021-07-10

I wrote a 231-byte Brainfuck compiler by abusing the entirety

All offer code for this weblog post would possibly per chance presumably presumably be found right here.

We wrote a Brainfuck compiler back in April. That one developed a simple architecture to enhance a few backends. At the present time, I could per chance presumably presumably love to write a novel Brainfuck compiler. One who would possibly per chance presumably most productive bustle on a single plot, OpenBSD/amd64. And one that is as shrimp as I will presumably impact it. I broke into the sub-256 bytes membership with my compiler, so I believed it used to be recommended sharing and speaking about how I was ready to drag it off.

Whereas this works on my machine, there would possibly per chance presumably presumably be subtle differences about how your machine works that trigger this compiler no longer to work. Make determined to determine your machine’s habits if there are any differences!

We’re optimizing completely for size. Which technique that we would possibly per chance presumably impact selections which are less performant than diverse ways of writing a Brainfuck compiler. That is OK.

The assembly code shall be written the utilize of AT&T syntax. Final time I had a weblog post with assembly code, any individual complained that it used to be no longer Intel syntax. I’m no longer vital afflicted by either and I enlighten my undergraduates each syntaxes because I deem it’s a long way necessary with a understanding to learn each. In the occasion you are an Intel syntax particular person, spend this post as a risk to abet your AT&T syntax abilities.

Reviewing Brainfuck

Let’s rapidly evaluation the syntax of Brainfuck so we know what we want to implement. The table beneath presumes that we are translating Brainfuck to C, have a global array performing because the Brainfuck tape, and have a pointer, *p, that is intialized to the starting of the array.

Image C identical
< --p;
> ++p;
- --*p;
+ ++*p;
, *p = getchar();
. putchar(*p);
[ while (*p) {
] }

Selecting an implementation language

I deem the most productive technique we stand a gamble at increasing a compiler as shrimp as capacity is to utilize hand-optimized assembly. Those precious bytes are now not going to set themselves!

We shall be capable of be the utilize of the GNU binutils for our assembler and the OpenBSD plot linker, which is LLVM lld, for our linker. I'm the utilize of the GNU assembler from ports, which is a substantial newer version of the assembler. This newer assembler is a shrimp bit bit smarter in the case of encoding and that finally ends up saving a byte for us down the motorway. I will level it out once we assemble there.

Coding our strings

Our compiler will bring collectively Brainfuck to C. I selected C since it will enable us to have a few of the smallest output code, and by extension the shortest strings wanted to have built-in to our compiler. We shall be capable of utilize the table above for our translation.

Our compiler therefore begins as follows:

	.text
	.globl	necessary
necessary:

.LSprologue:
	.ascii	"a[65536],*p=a;necessary(){"	# 21

.LS:
	.ascii	"--p;"		# 4
	.ascii	"++p;"		# 4
	.ascii	"--*p;"		# 5
	.ascii	"++*p;"		# 5
	.ascii	"*p=getchar();"	# 13
	.ascii	"putchar(*p); "	# 13
	.ascii	"while(*p){"	# 10
	.ascii	"return 0;"	# 9
	.ascii	"}"		# 1

The tip is just a few boilerplate to repeat the toolchain that we now have a global image named necessary and it (and the entirety else, for that matter) lives in the text piece of our binary.

Starting with .LSprototype is our strings. Labels starting with .L are regarded as native labels for ELF targets delight in my machine. After each string I set aside aside a inform with the sequence of characters in the string. We shall be capable of utilize the OpenBSD kernel facilities for studying and writing, and the write(2) plot call has a parameter for sequence of bytes to write. Yes, we would possibly per chance presumably write a strlen(3) aim, and if truth be told we did that for our echo program. On the replacement hand, writing such a aim would spend too many bytes when put next to easily shoving the ideal price into the register.

The .LSprologue string is change into self sufficient from the the leisure of the .LS strings because the technique I organized the compiler the prologue is passe straight and for all diverse strings we index into the string starting at .LS.

Additionally, you will gape I'm the utilize of the .ascii pseudo-op moderately than the .asciz pseudo-op. This implies that there would possibly per chance be no NUL-termination of any of these strings, so that they're no longer C strings. Since we are offering the real sequence of bytes to write with each write(2) call, we are able to assemble away with this and set a series of bytes in the job.

We're also leaving them in the .text piece. That technically technique they're code. We would possibly per chance presumably set aside them in the .rodata piece however since these string are by no technique written to, I deem it's stunning.

We would possibly per chance presumably take into memoir some more byte-shaving at work. Whereas we are compiling to C, we didn't specify which similar old we had been targeting. We're going very passe by having implicit ints all over the set aside. We're also going to be implicitly declaring getchar(3) and putchar(3) which is invalid in C99 and later, in step with clang. So we can aim C89 and/or K&R C.

As a results of the implicit ints the cells on the Brainfuck tape are 32-bits in size. There is no requirement for any particular cell size, so right here is appropriate from a Brainfuck standpoint.

I also made the getchar and putchar strings equal size, though the putchar string is on the entire one personality shorter. We shall be capable of employ a byte now to set several bytes later.

Finding out code into a buffer

Our parser must learn on this procedure one personality at a time and job each personality. If it's one in all the Brainfuck characers, we output the correct translation, otherwise we transfer on to the next personality. We're going to spend fair correct thing regarding the truth that we are able to learn from stdin and write to stdout with out having to pickle anything up outselves; these facilities are pickle up for us by our ambiance.

We shall be capable of also ought to survey up the syscall numbers for learn(2) and write(2), which are 3 and 4.

We must always always learn in a personality before we are able to parse it and write anything. Our learn can survey delight in this:

.Lparse:
	movb	$3, %al		# Load learn(2) plot call
	movb	$1, %dl		# Read one personality
	xorl	%edi, %edi	# Read from stdin
	leaq	(%rsp), %rsi	# Read into high of stack
	syscall			# learn(0, (%rsp), 1);
	incl	%edi		# Field %edi to 1, for write

We are able to spend fair correct thing about some assumptions on our plot for this to work. On our plot all overall-aim registers are initialized to 0 by the ambiance before program birth up. The movb directions require most productive 2 bytes however attain no longer 0-out the the leisure of the register. If there used to be something already in %rax or %rdx above the lowest byte, it would dwell as-is. But since our ambiance pickle these registers to 0, we are OK. These movb directions are 2 bytes each.

We also selected to utilize xorl as an replacement of movb for the %rdi register. Right here's since it's seemingly you'll presumably presumably presumably't utilize movb on %rdi since it's at its smallest a 16-bit register. Since we are making it 0, xor'ing any 32-bit register with itself is the smallest technique to attain this at 2 bytes.

The leaq instruction is costly at 4 bytes. Or no longer it's truly one in all the smallest variations of this instruction and we can take into memoir a 7-byte version later. In the slay, we want to call the syscall for 2 byte. That is 12 bytes total to assemble a personality from stdin.

We're also assuming that our ambiance pickle up some stack for us and that our stack pointer parts to something practical. That happens to be correct kind. As we discovered before, the ambiance does non-public our argc and argv variables and that's the case right here too. So we now have an inexpensive stack pointer.

Whereas it's no longer straight related to studying in a personality, it turns out that %rdi stays the similar all the way thru learn(2) calls, so we are able to increment it to prepare for an upcoming write(2) call. Incrementing is 2 bytes, which is the smallest technique to turn a 0 into a 1 for us. Even in the case where we attain no longer truly cease up writing anything, as an illustration because we learn in a inform personality, it would now not vital matter since we xor the register before the learn(2) call on every occasion.

Checking for EOF

Now would possibly per chance presumably presumably be a correct kind time to determine if we are at the cease of the file, since if we are we now have executed our work and must exit this procedure. Fortunately, learn(2) returns the sequence of bytes learn in %rax. If that quantity is 1, then we are no longer but executed. In every other case, if it's 0 or -1 then we are executed. A 0 technique cease of file and a -1 technique an error occured. We assemble no longer truly care if we reached cease of file or encountered an error. Each of these imply it's time to exit for us.

Swiftly aside: The Machine V AMD64 ABI calling convention

Our machine uses the Machine V AMD64 ABI calling convention. This dictates how our registers are passe and how registers are saved or no longer all the way thru aim calls, in conjunction with syscalls. For us, we want to know that %rax will non-public the return price of a call, so we must presume that it will commerce after each call. Additionally, %rax contains the syscall amount earlier than making the syscall. The calling convention also says that the %rbx, %rsp, %rbp, and %r12-%r15 registers has to be restored to their usual price by the callee in the occasion that they are passe. Which technique as a long way as we are fervent, we are able to utilize these registers all the way thru calls and presume that their price will slay the similar. Any diverse registers would possibly per chance presumably presumably be clobbered by calls. We must always always also appreciate the registers passe for call parameters. Since learn(2) and write(2) utilize three parameters, we must always always be sure to utilize %rdi, %rsi, and %rdx for these parameters, delight in we did with our learn(2) call above.

It happens to be the case that that %rdi and %rdx attain no longer assemble clobbered by learn(2) and write(2) calls on our machine, which we can utilize to our advantage.

Checking for EOF, portion 2

Now we are able to envision for EOF after each learn:

	cmpl	%edx, %eax	# EOF ? (%eax < 1)
	jl	.Leof

That is all we resolve. Since %rdx would now not assemble clobbered on our plot, we sign it's 1. If %rax is lower than that, we know we either reached the cease of the file or an error occured. In that case, we must jump to the code that exits this procedure.

Parsing the input

One thing that we want to know is that the smallest cmp is when you compare a byte in opposition to %al. But as of now, %rax contains the return code of the learn(2) call. We stand to set eight bytes the utilize of the smallest cmp. We shall be capable of cease up spending three bytes to set eight bytes, which is tall.

We are able to impact this happen by a shrimp bit tweaking our cease of file take a look at and then parsing our input after the cease of file take a look at:

	xchg	%ebp, %eax	# Store return price in %ebp
	movb	(%rsi), %al	# cmpb imm, %al is the smallest cmp
	leal	.LS, %esi	# Preload string
	cmpl	%edx, %ebp	# EOF ? (%ebp < 1)
	jl	.Leof
	cmpb	$60, %al	# '<' ?
	je	.Lleft
	cmpb	$62, %al	# '>' ?
	je	.Lright
	cmpb	$45, %al	# '-' ?
	je	.Ldec
	cmpb	$43, %al	# '+' ?
	je	.Linc
	cmpb	$44, %al	# ',' ?
	je	.Lgetchar
	cmpb	$46, %al	# '.' ?
	je	.Lputchar
	cmpb	$91, %al	# '[' ?
	je	.Lopenloop
	cmpb	$93, %al	# ']' ?
	je	.Lcloseloop
	jmp	.Lparse		# Comment personality, skip

We are able to xchg to assemble our smallest cmp. Right here's the explanation in the back of the newer GNU assembler. The older assembler in atrocious consistently encodes this to a 2-byte sequence however the newer assembler is terribly very top ample to know this would presumably presumably be encoded as a 1-byte instruction and does so. We then set aside aside the personality we learn in into %rax and preload the take care of of all our strings by a 7-byte leal. That is very dear, truly it's the most costly instruction we can come upon. On the replacement hand it's a long way a the biggest substandard for us. Now we create our cease of file take a look at and if we are no longer at the cease of the file we continue on determining which personality we now have and what to attain with it.

Processing each risk

Let's poke away the cease of file processing for later and now address the Brainfuck image processing. There are eight symbols we want to take care of:

.Lright:
	addl	$4, %esi
.Lleft:
	movb	$4, %dl
	jmp	.Lwrite

Let's spend a behold at fair correct this for correct now. This code handles each the < and > symbols. Steal that .LS contains all our strings, prologue notwithstanding. The most necessary personality of .LS corresponds to the < symbol. We are calling that the .Lleft label. That has an offset of 0 so we immediately move to telling write(2) we want to write 4 characters and then head to the syscall. If we instead see a > that is the .Lright tag and we want to offset the place to begin of the string by 4. We then fall thru because each outputs are 4 characters in size so we are able to combine the code to set bytes.

We shall be capable of utilize these tricks just a few more times. Let's continue writing the emblem processing code:

.Ldec:
	subl	$5, %esi	# 13 - 5 = 8
.Linc:
	addl	$13, %esi
	movb	$5, %dl
	jmp	.Lwrite

Let's cease right here too and try but every other trick. Each .Ldec and .Linc output strings which are the similar size. We resolve with a understanding to combine the code delight in we did in the .Lleft and .Lright case. On the replacement hand, we assemble no longer have an index of 0 for either. We assemble round this by creatively subtracting the size of the string from the offset of the string farther from the starting of .LS to assemble the offset of the string nearer to the starting of .LS and then fall thru from there. These subtractions and additions, because they're all the way thru the vary of -128 and 127 all encode to 2 bytes.

Let's continue on:

.Lgetchar:
	subl	$13, %esi	# 31 - 13 = 18
.Lputchar:
	addl	$31, %esi
	movb	$13, %dl
	jmp	.Lwrite

Precisely the similar tricks as .Linc and .Ldec. Advise though that if I did no longer extend the putchar string by one personality at the starting, we would possibly per chance presumably no longer utilize any of these tricks and it would price us an entire bunch bigger than the one byte it price to elongate the putchar string.

Let's attain up our processing code:

.Lopenloop:
	incl	%ebx		# Increment loop counter
	addl	$44, %esi
	movb	$10, %dl
	jmp	.Lwrite
.Lcloseloop:
	decl	%ebx		# Decrement loop counter
	js	.Lexit		# %ebx < 0 ? (%rdi == 1 from the write(2) call)
	addl	$63, %esi
	movb	$1, %dl
	jmp	.Lwrite

Sadly, we can't utilize the fall thru trick right here however there are some diverse tricks we are able to utilize. We utilize the truth that every our overall-aim registers are initialized to 0 and the truth that %rbx is saved all the way thru calls to fair correct birth up the utilize of it successfully out of nowhere to act as our loop counter. The .Lopenloop tag is otherwise easy.

The .Lcloseloop tag has a further trick. Attributable to we want to have a correct kind compiler that has correct kind return values, 0 for success and 1 for failure, we decrement the loop counter and take a look at to take into memoir if it has fallen beneath 0. If it has, there is an imbalance of brackets and we must cease processing and mistake out. Sadly, dec preserves the price of the lift flag, so we can't put it to use to uncover if we now have long gone beneath 0. On the replacement hand, it alters all diverse flags and we are able to utilize the signal flag to determine if we now have long gone detrimental. The js opcode jumps if the signal flag is determined, which it most productive shall be if we now have long gone detrimental. We are able to steer determined of a cmp in opposition to -1 this system. Fortunately, %rsi has already been pickle to 1 and is precisely the price we resolve for _exit(2) once we error out so we attain no longer must impact any changes to %rdi. The leisure is straightfoward.

Writing output

We shall be capable of utilize the truth that stdout comes on hand to utilize from our ambiance with out us having to attain anything and as such we can write our compiled code to stdout. We're also going to utilize one other trick right here: we can set aside the .Lwrite tag between the .Lparse and .Lright labels. Right here's because even as you have a jmp that is all the way thru the vary of -128 and 127 bytes, that encodes to a short jump which is most productive 2 bytes. That is the smallest jmp there is. We resolve all our jmps to be short jumps. Our .Lwrite labels ends with a jmp to .Lparse in characterize to learn in the next personality and if it came after the entire code we fair correct wrote, it shall be bigger than -128 bytes away.

Our .Lwrite tag is terribly short:

.Lwrite:
	movb	$4, %al
	syscall			# write(1, string, size);
	jmp	.Lparse

That is it. The entire writes must position 4 into %rax, impact the syscall, and then learn in the next personality.

Dealing with cease of file and exiting

Now let's write the code to take care of cease of file and the code to take care of exit, since we can utilize some fall thru to take care of them. Let's also set aside aside this unique code straight after .Lwrite and before .Lright, but again to make certain all our jmps are short jumps.

It appears to be like to be delight in this:

.Leof:
	cmpl	%edx, %ebx	# Loop counter < 1 ? (i.e., 0)
	jge	.Lexit
	addl	$54, %esi
	movb	$10, %dl
	movb	$4, %al
	syscall
	xorl	%edi, %edi	# Derive ready to exit
.Lexit:
	movb	$1, %al		# _exit(%edi);
	syscall

When we attain the cease of the file, we want to make certain our loop counter is 0. If it's no longer, we now have a bracket imbalance and must error out straight. Like before, %rdi happens to be 1 from the pickle up for the write(2) call so we attain no longer ought to impact any changes.

The the leisure of the .Leof tag is writing the C epilogue. It then sets %rdi to 0 because we're going to exit efficiently and falls thru to .Lexit which sets the syscall amount to the _exit(2) syscall and calls it, which exits this procedure.

Writing the prologue

The closing thing we want to write in characterize to attain our compiler is the prologue. This code goes straight after the necessary tag and before the .Lparse tag.

We want to impact a call to write(2) with our .LSprologue string:

	popq	%rdi		# Write prologue (argc == 1, confidently)
	leal	.LSprologue, %esi
	movb	$21, %dl
	jmp	.Lwrite

Our closing trick is straight away popping the tip of the stack into %rdi correct to birth with up. As we discovered before, our ambiance puts argc in there. See you later as argc is 1, which it must be, that lets in us to pickle up %rdi because it shall be for this preliminary write(2) call in barely 1 byte. If no longer, successfully then your output shall be unsuitable. It is the one error that goes undetected by our compiler. We're going to fair safe that we now have chop this corner.

And that's the explanation it! Our compiler is executed.

Inserting all of it collectively

Right here's what our closing performed compiler appears to be like to be delight in:

	.text
	.globl	necessary
necessary:
	popq	%rdi		# Write prologue (argc == 1, confidently)
	leal	.LSprologue, %esi
	movb	$21, %dl
	jmp	.Lwrite
.Lparse:
	movb	$3, %al		# Load learn(2) plot call
	movb	$1, %dl		# Read one personality
	xorl	%edi, %edi	# Read from stdin
	leaq	(%rsp), %rsi	# Read into high of stack
	syscall			# learn(0, (%rsp), 1);
	incl	%edi		# Field %edi to 1, for write
	xchg	%ebp, %eax	# Store return price in %ebp
	movb	(%rsi), %al	# cmpb imm, %al is the smallest cmp
	leal	.LS, %esi	# Preload string
	cmpl	%edx, %ebp	# EOF ? (%ebp < 1)
	jl	.Leof
	cmpb	$60, %al	# '<' ?
	je	.Lleft
	cmpb	$62, %al	# '>' ?
	je	.Lright
	cmpb	$45, %al	# '-' ?
	je	.Ldec
	cmpb	$43, %al	# '+' ?
	je	.Linc
	cmpb	$44, %al	# ',' ?
	je	.Lgetchar
	cmpb	$46, %al	# '.' ?
	je	.Lputchar
	cmpb	$91, %al	# '[' ?
	je	.Lopenloop
	cmpb	$93, %al	# ']' ?
	je	.Lcloseloop
	jmp	.Lparse		# Comment personality, skip
.Lwrite:
	movb	$4, %al
	syscall			# write(1, string, size);
	jmp	.Lparse
.Leof:
	cmpl	%edx, %ebx	# Loop counter < 1 ? (i.e., 0)
	jge	.Lexit
	addl	$54, %esi
	movb	$10, %dl
	movb	$4, %al
	syscall
	xorl	%edi, %edi	# Derive ready to exit
.Lexit:
	movb	$1, %al		# _exit(%edi);
	syscall
.Lright:
	addl	$4, %esi
.Lleft:
	movb	$4, %dl
	jmp	.Lwrite
.Ldec:
	subl	$5, %esi	# 13 - 5 = 8
.Linc:
	addl	$13, %esi
	movb	$5, %dl
	jmp	.Lwrite
.Lgetchar:
	subl	$13, %esi	# 31 - 13 = 18
.Lputchar:
	addl	$31, %esi
	movb	$13, %dl
	jmp	.Lwrite
.Lopenloop:
	incl	%ebx		# Increment loop counter
	addl	$44, %esi
	movb	$10, %dl
	jmp	.Lwrite
.Lcloseloop:
	decl	%ebx		# Decrement loop counter
	js	.Lexit		# %ebx < 0 ? (%rdi == 1 from the write(2) call)
	addl	$63, %esi
	movb	$1, %dl
	jmp	.Lwrite

.LSprologue:
	.ascii	"a[65536],*p=a;necessary(){"	# 21

.LS:
	.ascii	"--p;"		# 4
	.ascii	"++p;"		# 4
	.ascii	"--*p;"		# 5
	.ascii	"++*p;"		# 5
	.ascii	"*p=getchar();"	# 13
	.ascii	"putchar(*p); "	# 13
	.ascii	"while(*p){"	# 10
	.ascii	"return 0;"	# 9
	.ascii	"}"		# 1

How many bytes is our compiler?

On OpenBSD, there is just a few most necessary overhead, 24 bytes price. We shall be capable of measure the size of our compiler three ways: by itself, with the OpenBSD overhead, and with the entire ELF overhead.

The compiler by itself:

$ size bf256.o
text    facts    bss     dec     hex
231     0       0       231     e7

Entirely 231 bytes! Now now not wrong. If I remember because it shall be, supposedly the unique Brainfuck compiler used to be 240 bytes.

The bring along with the OpenBSD overhead:

$ size bf256
text    facts    bss     dec     hex
255     0       0       255     ff

So even if you remark that it's unfair that I excluded the OpenBSD overhead in my rely, we restful come in at most productive 255 bytes.

With the entire ELF overhead:

$ ls -l bf256
-rwxr-xr-x  1 brian  brian  952 Jul 10 19: 38 bf256

Correctly, 952 bytes. OK, that's where the ELF overhead will get you. I assemble no longer would like to assemble out a hex editor and begin up whittling down all that. I'm going to yelp it's correct kind ample and giving us all permission to ignore the entire ELF overhead.

Conclusion

It is time for me to position away the text editor. My aim used to be 256 bytes and I fair correct scraped by. Can you acquire extra bytes to shave off?

Update: A 216-byte compiler

I was ready to set a further jmp and an extraneous movb $1, %dl, saving a further 4 bytes, which allowed me so that you can add back one byte and commerce the preliminary popq %rdi to an incl %edi, guaranteeing that the preliminary prologue write will consistently work no matter the sequence of arguments to this procedure and restful set a byte. Tag it right here.

I was ready to set a further byte by getting rid of the extra dwelling in the putchar string and the utilize of the closing personality in the getchar string to pad out the putchar string. As right here is a ; personality, in C that turns into an empty assertion, which is nice C. This will get us all the kind down to 944 bytes with the entire ELF overhead. Tag it right here.

I was ready to set a further 11 bytes by getting rid of the epilogue string entirely and easily the utilize of the closeloop string for the epilogue. Curiously clang would now not poke over it. That will get us to 936 bytes with the entire ELF overhead. Tag it right here.

Top

RSS



Read More

Share your love