By Sijmen J. Mulder, 3 August 2023.
Casey of Molly Rocket posted four interview questions asked for his 1994 Microsoft intership.
This page is about the third: implementing a flood fill check. A flood fill is what you get when you use a bucket tool in a drawing program to fill an area covered by one color with another.
Implementing a flood fill is a fun exercise by itself, but here it’s about a specific part in a specific situation – testing if 4 pixels packed into a byte are all the same specific color:
Implement a function taking a byte (8 bits), representing 4 packed pixels of 2 bits each, and a 2-bit pattern, and checks if all pixels in the byte match the pattern:
int flood_check(uint8_t pattern, uint8_t target)
Examples:
target pattern result explanation 00 00 00 00
10
0
pattern doesn’t occur in target 00 10 10 00
10
0
only par 2 and 3 match 10 10 10 10
10
1
all pairs match
(Update: Casey’s video for the problem is online now and it turns out the point was to return true if any of the pixels match, and, preprocessing the pattern is allowed. This yields a very different solution. Worth a watch!)
One solution we could try is comparing each pair individually against the pattern. The process would roughly be:
But there’s a more elegant solution that doesn’t involve having to figure out these different mask values: pack pattern first and then compare that against target. Suppose we have pattern ab (two bits of unspecified value). We do:
00 00 00 ab original pattern 00 00 ab 00 pattern shifted left by 2 00 ab 00 00 pattern shifted left by 4 ab 00 00 00 pattern shifted left by 6 ----------- OR ab ab ab ab packed pattern
In C, <<
is the shift-left operator and
|
performs the binary OR.
Now it’s a matter of comparing the packed pattern to target. Implementation:
int flood_check_1(uint8_t pattern, uint8_t target)
{
return (pattern << 6 |
pattern << 4 |
pattern << 2 |
pattern) == target;
}
This version does require that pattern really only holds two bits; if higher bits are set they will mess up the combined pattern. This can be alleviated by masking out the other bits:
int flood_check_2(uint8_t pattern, uint8_t target)
{
return ((pattern & 3) << 6 |
(pattern & 3) << 4 |
(pattern & 3) << 2 |
(pattern & 3)) == target;
}
Decimal 3 is 11 in binary. AND-ing that with another value using
&
masks out other bits, that is to say, any other bits
that are 1 become 0.
As an additional exercise, I thought it fun to implement the same in x86-64 assembly. Here is the listing in NASM syntax:
flood_check_x86_64:
xor eax, eax ; clear register A, the output register
xor ecx, ecx ; clear register C, for our packed pattern
mov cl, dil ; copy pattern to C (1 byte)
and cl, 3 ; mask out all but lower 2 bits
mov al, cl ; copy pattern to A where we'll pack it
shl al, 2 ; shift left by 2
or al, cl ; OR with pattern. now 2 packed patterns in C
shl al, 2
or al, cl ; and again, now 3 packed patterns in C
shl al, 2
or al, cl ; and again, now 4 packed patterns in C
cmp al, sil ; compare with target in SI
sete al ; 1 if equal
ret
Again given pattern ab, this version builds up the packed
ab ab ab ab
, used to test against
target, by repeatedly shifting left and adding the pattern on
the end:
00 00 00 ab shift left, OR with pattern 00 00 ab ab shift left, OR with pattern 00 ab ab ab shift left, OR with pattern ab ab ab ab
Some notes on assembler and this code:
mov assings a value, shl is shift-left,
cmp compares (setting status flags), sete sets
a register to 1 if comparison was equal. In this syntax, the left
operand is also the destination, e.g., or a, b
is
equivalent to a = a | b
in C.
We’re using the A, C, SI, and
DI registers. They are 64 bits, but e.g. eax
refers to the lower 32 bits of A, and al
to
the lower 8 bits. Instructions using these smaller values generally
take up fewer bytes in machine code.
This x86-64 calling convention places the first two function arguments in DI and SI respectively, and expects a return value in A.
xor eax, eax
is equivalent to mov eax,
0
but yields shorter machine code. It even clears the upper
32 bits of the full 64 bit register.
Perhaps this could be taken to the next level by using vector instructions to “fan out“ the pattern or target and compare, but that’s way beyond my so I’ll use the easy cop-out of leaving that as ”an exercise for the reader”.