Around the same time I was working on another self-education project, I wanted to better understand how operating systems and kernel-level programming worked.
Obviously the best way to do this would be to write one from scratch.
I am someone who learns by doing and I love learning how things work at a fundamental level. Wanting to understand computers better, especially the distinction between user space and kernel space, I decided to write a kernel for the Raspberry Pi Zero 2 Wireless.
Doin’ my best
Ok, just a quick note here. The time spent on this project was short and happened years ago at the time of writing. I’m going to do my best to recall the details, but I’m likely to misremember some things…
#Setting some boundaries for myself
Now… given that I am someone who can easily take a simple experiment and then dedicate months to it, I decided to set some limitations for myself. My goal was to see how far I could get creating a basic operating system, and I would not allow myself more than two weeks to do it.
I would be writing the kernel according to the C99 standard. I could not use any libraries or tools that would abstract away the hardware from me. This meant I needed to understand the hardware at a much deeper level than I was used to.
#Getting an environment set up
Deploying code to, and testing on, a physical device can be really time consuming. I knew my first step was to get a development environment up and running on my Debian machine so I could quickly iterate on my code.
But this created the first problem to solve.
What kind of environment do I use for the Raspberry Pi Zero 2 Wireless? QEMU can emulate several Raspberry Pi models, but none of them is the one I was using. After some research, I discovered my board used the BCM2710A1 chip, which seemed similar enough to the BCM2837 that QEMU’s raspi3 machine would work for testing.
#Understanding how binary files work
Next up was understanding how to create a binary that could actually boot on bare metal. Unlike normal programs that run on top of an operating system, a kernel needs to be structured in a very specific way that the hardware expects.
In my case, this meant learning about ELF (Executable and Linkable Format) files and how to create a custom linker script. This script would ensure critical kernel code would be placed at the memory addresses the Raspberry Pi’s bootloader expects.
#Writing a bootloader
The bootloader here is really a boot stub: the first piece of my own code that runs once the Pi’s firmware has loaded the kernel image and handed over control. For the Raspberry Pi, this meant writing assembly code that would:
- Set up the initial processor state
- Initialize the stack pointer
- Clear the BSS section (uninitialized data)
- Jump to our kernel’s main function
This was my first real experience writing ARM assembly, and it showed me how much work happens before a single line of C can run.
// This file provides the boot code for the Raspberry Pi
// We start by telling the linker which section of the compiled
// binary this code belongs in
.section ".text.boot"
// Next we define the entry point for the code - this will need to be
// accessible from outside the assembly file
.global _start
_start:
// Read the CPU ID and park every core except core 0 (TEMP)
mrs x1, mpidr_el1
and x1, x1, #3
cbz x1, 2f
// If not core 0, halt
1:
wfe
b 1b
// If we are core 0, continue
2:
// Set up the stack pointer
// (stack grows to a lower address per AAPCS64)
ldr x1, =_start
mov sp, x1
// Clear the BSS section, one 64-bit word at a time
// (assumes __bss_start and __bss_end are 8-byte aligned)
ldr x1, =__bss_start
ldr x2, =__bss_end
3:
cmp x1, x2
b.hs 4f
str xzr, [x1], #8
b 3b
// Jump to our C code's main entry point
4:
bl kernel_main
// If we ever return from kernel_main, halt this core
b 1b
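If the assembly is hard to follow, here is a rough C rendering of two of its steps: picking out the core number and zeroing the BSS. This is only a sketch for illustration, not code from the kernel itself. The names `core_id` and `clear_bss` are made up for this example, and the real versions have to stay in assembly because C cannot run before the stack pointer is set.

```c
#include <stddef.h>
#include <stdint.h>

/* Core number as the boot stub derives it: the low two bits of the
 * MPIDR_EL1 value distinguish the four cores. */
static inline unsigned core_id(uint64_t mpidr)
{
    return (unsigned)(mpidr & 3u);
}

/* Zero every 64-bit word in [start, end), mirroring the intent of the
 * `str xzr, [x1], #8` loop. Assumes both bounds are 8-byte aligned,
 * which the linker script would have to guarantee.
 * Returns the number of words cleared. */
static inline size_t clear_bss(volatile uint64_t *start, volatile uint64_t *end)
{
    size_t cleared = 0;
    while (start < end) {
        *start++ = 0;
        cleared++;
    }
    return cleared;
}
```

Comparing pointers rather than counting down means the loop only needs the two boundary symbols and never has to know the section size.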
#Using memory to address hardware
One of the most interesting aspects of kernel development was learning how to interact with hardware through memory-mapped I/O.
Instead of using nice abstractions like device drivers, you have to:
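- Look up the address of the hardware you want to control in the BCM2835 ARM Peripherals datasheet (the BCM2837 family keeps the same register layout, but at a different peripheral base address)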
- Map that address to a pointer in your code
- Write specific bit patterns to that address to control the hardware
For example, turning on an LED connected to a GPIO pin involved:
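// Point at the memory-mapped GPIO registers
volatile uint32_t* gpio = (volatile uint32_t*)GPIO_BASE;
// Set GPIO pin 16 to output mode (bits 18-20 of FSEL1; 001 = output)
// without disturbing the other pins in the same register
gpio[GPIO_FSEL1] = (gpio[GPIO_FSEL1] & ~(7u << 18)) | (1u << 18);
// Turn the LED on by driving pin 16 high
gpio[GPIO_SET0] = (1u << 16);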
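The magic numbers above generalize to any pin. Below is a minimal sketch of that arithmetic, assuming the register layout from the BCM2835 peripherals datasheet (ten pins per function-select register at three bits each, set registers at offset 0x1C, clear registers at 0x28). The names `gpio_make_output`, `gpio_write`, and the enum constants are hypothetical, not the ones used in the kernel. Taking the register base as a parameter also means the logic can be exercised against a plain array on a development machine.

```c
#include <stdint.h>

/* Register indices in 32-bit words from the GPIO base (byte offset / 4). */
enum {
    GPFSEL0 = 0,  /* 0x00: function select, 10 pins per register, 3 bits per pin */
    GPSET0  = 7,  /* 0x1C: writing a 1 drives the matching pin high */
    GPCLR0  = 10  /* 0x28: writing a 1 drives the matching pin low */
};

/* Configure `pin` (0-53) as an output, leaving its neighbours untouched. */
static inline void gpio_make_output(volatile uint32_t *gpio, unsigned pin)
{
    unsigned reg   = GPFSEL0 + pin / 10;
    unsigned shift = (pin % 10) * 3;
    uint32_t value = gpio[reg];

    value &= ~(UINT32_C(7) << shift);  /* clear the pin's 3-bit field */
    value |=  (UINT32_C(1) << shift);  /* 001 = output */
    gpio[reg] = value;
}

/* Drive `pin` high or low. Set/clear registers are write-1-to-act,
 * so no read-modify-write is needed here. */
static inline void gpio_write(volatile uint32_t *gpio, unsigned pin, int high)
{
    unsigned bank = pin / 32;
    gpio[(high ? GPSET0 : GPCLR0) + bank] = UINT32_C(1) << (pin % 32);
}
```

For pin 16 this works out to register 1 with a shift of 18, the same values as the hand-written version.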
#Getting the kernel to boot
This was easy in QEMU, but not so much on the actual hardware. Why? Well… my assumption that the raspi3 machine in QEMU would be “close enough” to the Raspberry Pi Zero 2 Wireless was wrong. This meant I needed to rework my code and test on actual hardware.
The key was understanding that the Raspberry Pi’s bootloader expects to find certain files on the SD card:
- kernel8.img: The actual kernel binary
- config.txt: Hardware configuration
- bootcode.bin and start.elf: First-stage bootloader files
Getting this working felt like a huge accomplishment, even though my kernel wasn’t doing anything yet besides spinning in a loop!
#Printing to the screen
Getting text output working turned out to be way more complex than I expected. Without any operating system services, you have two main options:
- Use the UART (serial) port for text output
- Write directly to the framebuffer for graphical output
I started with UART since it seemed simpler. This process involved configuring the correct GPIO pins for UART, setting up the UART controller with the correct baud rate, and writing a basic function to transmit characters. This gave me a way to debug my kernel by sending text to my development machine through QEMU’s serial port emulation.
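To give a feel for what "setting up the baud rate" and "transmitting characters" boil down to, here is a small sketch. It assumes the PL011 UART (data register at offset 0x00, flag register at 0x18, "transmit FIFO full" in bit 5, and a divisor split into an integer part plus a 6-bit fraction). Which of the Pi's two UARTs the kernel actually used isn't something I can reconstruct, so treat the register choice as an assumption. All function names are invented for the example.

```c
#include <stdint.h>

/* PL011 register indices in 32-bit words (byte offset / 4). */
enum {
    UART_DR = 0x00 / 4,      /* data register */
    UART_FR = 0x18 / 4,      /* flag register */
    UART_FR_TXFF = 1 << 5    /* transmit FIFO full */
};

/* Split the divisor clock / (16 * baud) into the integer part (IBRD)
 * and the 6-bit fractional part (FBRD), rounding to the nearest 1/64. */
static inline void uart_baud_divisor(uint32_t clock_hz, uint32_t baud,
                                     uint32_t *ibrd, uint32_t *fbrd)
{
    uint64_t div64 = ((uint64_t)clock_hz * 4 + baud / 2) / baud;
    *ibrd = (uint32_t)(div64 >> 6);
    *fbrd = (uint32_t)(div64 & 0x3F);
}

/* Busy-wait until the FIFO has room, then queue one character. */
static inline void uart_putc(volatile uint32_t *uart, char c)
{
    while (uart[UART_FR] & UART_FR_TXFF) {
        /* spin */
    }
    uart[UART_DR] = (uint32_t)(unsigned char)c;
}

/* Send a string, expanding "\n" to "\r\n" for serial terminals. */
static inline void uart_puts(volatile uint32_t *uart, const char *s)
{
    for (; *s != '\0'; s++) {
        if (*s == '\n') {
            uart_putc(uart, '\r');
        }
        uart_putc(uart, *s);
    }
}
```

With a 48 MHz UART clock and 115200 baud, `uart_baud_divisor` yields an integer part of 26 and a fraction of 3.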
#Finally some basic graphics
The last major feature I tackled was basic graphical output. This meant learning about the framebuffer. Then, after a lot of trial and error with the mailbox interface, I managed to get a simple 128x128 block of pixels drawn in the center of my screen.
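Two small pieces of that puzzle can be sketched in portable C. The first is the word written to the mailbox: the upper 28 bits carry the address of a 16-byte-aligned message buffer and the low 4 bits carry the channel number (channel 8 is the property-tag channel). The second is filling a centered square once the firmware has handed back a framebuffer. This assumes 32 bits per pixel and a pitch (bytes per row) reported by the firmware. `mbox_word` and `draw_centered_square` are hypothetical names, and this is a reconstruction of the idea rather than the original code.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine a 16-byte-aligned buffer address with a 4-bit channel number. */
static inline uint32_t mbox_word(uint32_t buffer_addr, unsigned channel)
{
    return (buffer_addr & ~UINT32_C(0xF)) | (channel & 0xFu);
}

/* Fill a size x size square in the middle of a 32bpp framebuffer.
 * `pitch` is the length of one row in bytes, which may be larger
 * than width * 4 if the firmware pads rows. */
static inline void draw_centered_square(uint32_t *fb, uint32_t width,
                                        uint32_t height, uint32_t pitch,
                                        uint32_t size, uint32_t color)
{
    uint32_t x0, y0, x, y;

    if (size > width || size > height) {
        return;  /* square would not fit on screen */
    }
    x0 = (width - size) / 2;
    y0 = (height - size) / 2;

    for (y = y0; y < y0 + size; y++) {
        uint32_t *row = fb + (size_t)y * (pitch / 4);
        for (x = x0; x < x0 + size; x++) {
            row[x] = color;
        }
    }
}
```

Indexing rows by pitch rather than width is the detail that is easy to get wrong: if the two differ, using width produces a skewed image.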
#Where I landed by the end
By the end of my two-week limit, I had a very basic kernel that could do… almost nothing.
But it did give me a much deeper understanding of how computers work at the lowest level. Knowing every line of code running on a machine, from the moment it powers on until it displays something on screen, gave me a whole new appreciation for the abstractions modern operating systems provide.
When you have to implement everything yourself, even simple tasks like displaying text or reading input become significant engineering efforts. It may seem like a pointless exercise, but this has easily expanded my knowledge of writing software tenfold.