
Writing a basic multitasking OS for ARM Cortex-M3 processor

Uditha Atukorala's picture

ARM Cortex-M processors implement two privilege levels and two stacks that can be used by application software. This post discusses my practical approach to developing a simple multitasking operating system for ARM Cortex-M3 processors in C++, combining these features with the processor's powerful exception handling capabilities, and the challenges I faced.

ARM Cortex-M3 processor

Processor Modes

ARM Cortex-M3 MCUs have two processor modes: Handler mode for handling exceptions (interrupts) and Thread mode for executing application software. The processor enters Thread mode after a reset and returns to Thread mode after handling exceptions. When the processor is in Thread mode, the CONTROL register can be used to set the privilege level for the application software to either privileged or unprivileged.

Along with the two processor modes, it also implements two stacks: the main stack (MSP) and the process stack (PSP). The processor always uses the main stack in Handler mode, but we can use the CONTROL register to select the active stack for Thread mode.

Note that you can only change the CONTROL register when in privileged Thread mode (or in Handler mode which is always considered to be privileged).
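As a rough illustration of the CONTROL register layout described above, the host-side sketch below composes and decodes the two bits involved: bit 0 (nPRIV) selects unprivileged Thread mode and bit 1 (SPSEL) selects the process stack. The helper names control_value, is_unprivileged and uses_psp are hypothetical and are not part of the OS source; on the target you would write the value with the CMSIS __set_CONTROL() intrinsic followed by __ISB().

```cpp
#include <cstdint>

// CONTROL register bits on Cortex-M3
constexpr uint32_t CONTROL_nPRIV = 1u << 0;   // 1 = unprivileged Thread mode
constexpr uint32_t CONTROL_SPSEL = 1u << 1;   // 1 = Thread mode uses PSP

// hypothetical helper: build a CONTROL value for the requested configuration
constexpr uint32_t control_value( bool unprivileged, bool use_psp ) {
    return ( unprivileged ? CONTROL_nPRIV : 0u ) | ( use_psp ? CONTROL_SPSEL : 0u );
}

// hypothetical helpers: decode a CONTROL value
constexpr bool is_unprivileged( uint32_t control ) { return ( control & CONTROL_nPRIV ) != 0; }
constexpr bool uses_psp( uint32_t control )        { return ( control & CONTROL_SPSEL ) != 0; }

// after reset: privileged Thread mode on the main stack
static_assert( control_value( false, false ) == 0x0, "reset state" );
// a user task in this OS: unprivileged Thread mode on the process stack
static_assert( control_value( true, true ) == 0x3, "user task state" );
```

The two static_assert lines correspond to the two configurations this OS cares about: the state the processor starts in, and the state every user task runs in.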

"In an OS environment, ARM recommends that threads running in Thread mode use the process stack and the kernel and exception handlers use the main stack."

Core Registers

ARM Cortex-M3 - Core registers


The Operating System (OS)

Making use of the two privilege levels for Thread mode and two stack implementations, the aim is to develop an OS with the following features:

  • Use the main stack for the kernel and exception handlers
  • Use a dedicated process stack for each user task
  • Kernel (system task) will be executed in privileged Thread mode
  • All user tasks will be executed in unprivileged Thread mode
  • Implement counting semaphores for task synchronisation
  • Use C++ as much as possible

Using C++

Is C++ the correct choice for writing the OS? To some extent, no. You are making your life a little difficult by using C++, but it comes with its benefits, especially data abstraction.

But you can't write a complete OS in C++ alone, or even in C alone. I had to use a mix of C, C++ and assembly to achieve what I needed.

The examples you see here were developed and tested against the STM32F10x family of processors (an STM32F103C8T6 to be exact) using the GNU ARM toolchain (gcc-arm-none-eabi-4_8-2014q3). I used Eclipse as my IDE with the GNU ARM Eclipse Plug-ins [1].

Full source code can be found at GitHub.


The Kernel

The kernel will need to initialise user task stacks, schedule tasks and context switch between tasks. Context switching will involve switching between system task(s) running in kernel space using the main stack (MSP) and user tasks running in unprivileged Thread mode using the process stack (PSP). Since we are aiming for a very basic OS rather than a fully featured one, we won't discuss device drivers here.

In order to make sure the kernel gets CPU time periodically we will be using a SysTick exception. And to keep the exception processing time for the SysTick exception to a minimum we won't be doing a context switch here. We will schedule a PendSV exception within the SysTick handler if a context switch is required.

By using ARM Cortex-M3 exception priorities, we can make sure our context switching code in the PendSV handler runs at a lower priority, so that other time-critical exceptions take precedence [2].
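To make the priority idea a little more concrete, here is a small host-side sketch of how a priority number maps onto the 8-bit priority field. It assumes 4 implemented priority bits, as on the STM32F10x (__NVIC_PRIO_BITS = 4), where a larger number means a lower urgency. The helper priority_field is hypothetical; on the target the CMSIS call NVIC_SetPriority( PendSV_IRQn, 15 ) does the equivalent work.

```cpp
#include <cstdint>

// assumption: 4 implemented priority bits, as on STM32F10x
constexpr unsigned PRIO_BITS = 4;

// lowest urgency = largest numeric priority value
constexpr uint32_t LOWEST_PRIORITY = ( 1u << PRIO_BITS ) - 1;

// hypothetical helper: the byte that ends up in the 8-bit priority field;
// the implemented bits sit at the top of the field
constexpr uint8_t priority_field( uint32_t priority ) {
    return static_cast<uint8_t>( ( priority << ( 8 - PRIO_BITS ) ) & 0xFFu );
}

static_assert( LOWEST_PRIORITY == 15, "16 levels, 0..15" );
static_assert( priority_field( LOWEST_PRIORITY ) == 0xF0, "PendSV at the lowest priority" );
static_assert( priority_field( 0 ) == 0x00, "highest configurable priority" );
```

Giving PendSV the lowest priority means a pending context switch only runs once no other exception is active.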

Internal data structures

There are a few different parameters we need to keep track of within the kernel:

  1. System tick counter
  2. Internal status of the kernel
  3. A table of all the tasks (system tasks and user tasks)
  4. A reference to the current running task
    // maximum number of user tasks
    #define MAX_TASKS 2
 
    typedef uint32_t systicks_t;
    typedef uint32_t status_t;
    typedef uint32_t memaddr_t;
 
    // task table
    struct task_t {
        memaddr_t * sp;
        memaddr_t * stk_end;
        status_t status;
 
        task_t() :
            sp( 0 ), stk_end( 0 ), status( 0 ) { }
    };
 
 
    task_t        _tasks[MAX_TASKS + 1];
    systicks_t    _systicks;
    status_t      _status;
    pid_t         _current_task;

In addition to the above, we will use a restricted stack space allocation for user tasks. This helps us to manage the available RAM to some extent but doesn't guarantee that there won't be any memory clashes. Hardware memory protection is not considered for this basic OS.

    memaddr_t *   _stk_space;
    memaddr_t *   _free_stkp;
 
    // reserve memory for task stack space
    _stk_space = new memaddr_t [KERNEL_TASK_STACK_SIZE];
    memset( _stk_space, 0, ( sizeof( memaddr_t ) * KERNEL_TASK_STACK_SIZE ) );
 
    // set free stack pointer
    _free_stkp = _stk_space;


Exception Handlers

We will be using two exception handlers to aid our kernel tasks, SysTick and PendSV.

SysTick Handler

The SysTick exception needs to be frequent enough for the kernel to manage user tasks efficiently, and its handler needs to be quick enough that it doesn't add too much overhead.

I configured my STM32F10x MCU to have a 1ms SysTick interval.

    // system tick frequency
    #define SYSTICK_FREQUENCY_HZ 1000
 
    ...
 
    // configure SysTick exception frequency (1ms / 1kHz for 72MHz MCU clock)
    SysTick_Config( SystemCoreClock / SYSTICK_FREQUENCY_HZ );
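
The arithmetic behind that call can be checked on a host machine. SysTick has a 24-bit reload register, and the CMSIS SysTick_Config() programs (ticks - 1) as the reload value, failing when it doesn't fit. The helpers systick_ticks and systick_reload_fits below are hypothetical names used only for this sketch.

```cpp
#include <cstdint>

// SysTick has a 24-bit reload register
constexpr uint32_t SYSTICK_MAX_RELOAD = 0x00FFFFFFu;

// hypothetical helper: core clock cycles between two SysTick exceptions
constexpr uint32_t systick_ticks( uint32_t core_clock_hz, uint32_t tick_hz ) {
    return core_clock_hz / tick_hz;
}

// hypothetical helper: mirrors the range check done before programming
// (ticks - 1) into the reload register
constexpr bool systick_reload_fits( uint32_t ticks ) {
    return ticks != 0 && ( ticks - 1 ) <= SYSTICK_MAX_RELOAD;
}

static_assert( systick_ticks( 72000000u, 1000u ) == 72000u, "1ms at 72MHz" );
static_assert( systick_reload_fits( 72000u ), "fits in the 24-bit counter" );
// a 1Hz tick straight off a 72MHz clock would not fit
static_assert( !systick_reload_fits( systick_ticks( 72000000u, 1u ) ), "reload too large" );
```

With a 72MHz core clock, a 1kHz tick therefore means 72000 clock cycles between SysTick exceptions, comfortably inside the 24-bit limit.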

The SysTick exception handler itself only increments a counter (which can be used for timers) and schedules a PendSV exception if needed.

    void kernel::systick_handler() {
 
        _systicks++;
 
        if ( _status & KERNEL_SCHEDULER_FLAG ) {
            _status ^= KERNEL_SCHEDULER_FLAG;
            SCB->ICSR |= SCB_ICSR_PENDSVSET_Msk;
        }
 
    }
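
Since the tick counter is a 32-bit unsigned value incremented every millisecond, it wraps around after roughly 49.7 days. If the counter is used for timers, comparing elapsed time with unsigned subtraction keeps the result correct across a wrap. This is only a sketch; ticks_since and timed_out are hypothetical helpers, not functions from the OS source.

```cpp
#include <cstdint>

typedef uint32_t systicks_t;

// hypothetical helper: ticks elapsed since 'start', correct across one wrap-around
constexpr systicks_t ticks_since( systicks_t now, systicks_t start ) {
    return now - start;   // unsigned arithmetic is modulo 2^32
}

// hypothetical helper: have 'timeout' ticks passed since 'start'?
constexpr bool timed_out( systicks_t now, systicks_t start, systicks_t timeout ) {
    return ticks_since( now, start ) >= timeout;
}

static_assert( ticks_since( 150u, 100u ) == 50u, "no wrap" );
static_assert( ticks_since( 5u, 0xFFFFFFFBu ) == 10u, "counter wrapped past zero" );
static_assert( !timed_out( 5u, 0xFFFFFFFBu, 20u ), "only 10 ticks elapsed" );
```

Comparing absolute values instead (for example now >= start + timeout) would misbehave near the wrap point, which is why the difference is computed first.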


PendSV Handler

This is where we would do a context switch between tasks. We will revisit this topic later.


Context switching

At a very basic level, a context switch between two tasks would need to:

  • Save the current task context
  • Store the current task stack pointer so we can use it later
  • Load the next task stack pointer and update the process stack pointer
  • Load the next task context

Saving task context

For our simple OS, saving a task's context [3] means saving the general-purpose registers (R0 - R12) along with the Stack Pointer (SP), Link Register (LR), Program Counter (PC) and Program Status Register (xPSR) of the ARM Cortex-M3 processor. To make things easier we will use the task's stack to store R0 - R12, LR, PC and xPSR, and a data structure within the kernel to store the Stack Pointer reference.

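Since we are using a PendSV exception handler for context switching, half of the registers we need to save (R0 - R3, R12, LR, PC and xPSR) will be pushed by the ARM Cortex-M3 processor itself when it takes the exception. The processor pushes them onto whichever stack was active at that point: the process stack (PSP) for a user task, or the main stack (MSP) for the system task.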

ARM Cortex-M3 - Hardware stack frame

Context Switch Example

Assuming you have the address of a variable to store the current task stack pointer loaded onto "r0" and the next task stack pointer loaded onto "r1", a context switch would be (in assembly):

    push {r4 - r11}
    str  sp, [r0]
    mov  sp, r1
    pop  {r4 - r11}


Launching user tasks

For simplicity we will treat our user tasks as functions that never return. Each user task is assigned a dedicated stack by the kernel, carved out of the memory pre-allocated for user task stacks.

    // context saved by the hardware
    struct stkctx_t {
        memaddr_t r0;
        memaddr_t r1;
        memaddr_t r2;
        memaddr_t r3;
        memaddr_t r12;
        memaddr_t lr;
        memaddr_t pc;
        memaddr_t psr;
    };
 
    // context saved by the software
    struct tskctx_t {
        memaddr_t r4;
        memaddr_t r5;
        memaddr_t r6;
        memaddr_t r7;
        memaddr_t r8;
        memaddr_t r9;
        memaddr_t r10;
        memaddr_t r11;
        memaddr_t lr;
    };
 
 
    memaddr_t * kernel::reserve_stack( size_t stk_size ) {
 
        memaddr_t * stack_space = 0;
 
        // add required stack size for context switching
        stk_size += ( sizeof( stkctx_t ) + sizeof( tskctx_t ) ) / sizeof( memaddr_t );
 
        // do we have enough stack space left?
        if ( ( _free_stkp + stk_size ) < ( _stk_space + KERNEL_TASK_STACK_SIZE ) )  {
 
            _free_stkp += stk_size;
            stack_space = ( _free_stkp - 1 );
 
        }
 
        return stack_space;
 
    }

Note that there is a Link Register (LR) entry in both the hardware stack frame structure (stkctx_t) and the software stack frame structure (tskctx_t) in the code listing above. This is intentional. We will use the LR in the software stack frame to return to Thread mode using PSP after a context switch.
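
To see how much extra room reserve_stack() adds for the two frames, the sizes can be checked at compile time. The sketch below repeats the two structures in a compact form so that it compiles on its own; CONTEXT_WORDS is a hypothetical name for the expression used inside reserve_stack().

```cpp
#include <cstddef>
#include <cstdint>

typedef uint32_t memaddr_t;

// same layouts as the listing above, written compactly
struct stkctx_t { memaddr_t r0, r1, r2, r3, r12, lr, pc, psr; };
struct tskctx_t { memaddr_t r4, r5, r6, r7, r8, r9, r10, r11, lr; };

// extra words added on top of the requested stack size
constexpr size_t CONTEXT_WORDS =
    ( sizeof( stkctx_t ) + sizeof( tskctx_t ) ) / sizeof( memaddr_t );

static_assert( sizeof( stkctx_t ) == 8 * sizeof( memaddr_t ), "hardware frame: 8 words" );
static_assert( sizeof( tskctx_t ) == 9 * sizeof( memaddr_t ), "software frame: 9 words" );
static_assert( CONTEXT_WORDS == 17, "17 words of context per task" );
static_assert( offsetof( stkctx_t, pc ) == 24, "PC is the 7th word of the hardware frame" );
```

So every user task costs 17 words (68 bytes) of stack for its saved context, in addition to whatever the task itself asks for.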

    pid_t kernel::launch( void * (* task)(void *), size_t stk_size ) {
 
        // get the next available pid from the task table
        pid_t pid = -1;
 
        for ( size_t i = 1; i <= MAX_TASKS; i++ ) {
            if (! ( _tasks[i].status & TASK_ACTIVE_FLAG ) ) {
                pid = i;
                break;
            }
        }
 
 
        if ( pid > 0 ) {
 
            // reserve stack memory
            memaddr_t * stack = reserve_stack( stk_size );
 
            if ( stack ) {
 
                // setup context
                stkctx_t * s_ctx = (stkctx_t *)( ( (uint32_t) stack ) - sizeof( stkctx_t ) );
                s_ctx->pc  = (uint32_t) task;
                s_ctx->psr = 0x01000000;              // default PSR value
 
                tskctx_t * t_ctx = (tskctx_t *)( ( (uint32_t) s_ctx ) - sizeof( tskctx_t ) );
                t_ctx->lr = 0xFFFFFFFD;
 
 
                // update task table
                _tasks[pid].stk_end = (memaddr_t *)( ( (uint32_t) stack ) - ( ( sizeof( stkctx_t ) + sizeof( tskctx_t ) ) + ( sizeof( memaddr_t ) * stk_size ) ) );
                _tasks[pid].sp      = (memaddr_t *)( ( (uint32_t) s_ctx ) - sizeof( tskctx_t ) );
                _tasks[pid].status |= TASK_ACTIVE_FLAG;
 
 
                // set the scheduler status flag
                _status |= KERNEL_SCHEDULER_FLAG;
 
            } else {
 
                // failure
                pid = -1;
 
            }
 
        }
 
        return pid;
 
    }

You'll notice that in the kernel::launch member function above we store 0xFFFFFFFD as the Link Register (LR) in the software stack frame (tskctx_t). This is our EXC_RETURN [4] value, which we'll use to return to Thread mode and use PSP after a context switch.
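
The two magic numbers used in kernel::launch can be decoded with a few lines of host-side code. The Cortex-M3 defines three EXC_RETURN values, and 0x01000000 in the PSR is simply the Thumb state bit (bit 24). The constant and helper names below are hypothetical and chosen for this sketch only.

```cpp
#include <cstdint>

// EXC_RETURN values defined for Cortex-M3
constexpr uint32_t EXC_RETURN_HANDLER_MSP = 0xFFFFFFF1u;  // return to Handler mode, main stack
constexpr uint32_t EXC_RETURN_THREAD_MSP  = 0xFFFFFFF9u;  // return to Thread mode, main stack
constexpr uint32_t EXC_RETURN_THREAD_PSP  = 0xFFFFFFFDu;  // return to Thread mode, process stack

// bit 3 selects Thread mode, bit 2 selects the process stack
constexpr bool returns_to_thread( uint32_t exc_return ) { return ( exc_return & 0x8u ) != 0; }
constexpr bool returns_on_psp( uint32_t exc_return )    { return ( exc_return & 0x4u ) != 0; }

// the "default PSR value" only sets the Thumb state bit
constexpr uint32_t XPSR_THUMB_BIT = 1u << 24;

static_assert( returns_to_thread( EXC_RETURN_THREAD_PSP ), "user tasks resume in Thread mode" );
static_assert( returns_on_psp( EXC_RETURN_THREAD_PSP ), "user tasks resume on PSP" );
static_assert( !returns_on_psp( EXC_RETURN_THREAD_MSP ), "system task resumes on MSP" );
static_assert( XPSR_THUMB_BIT == 0x01000000u, "matches the value stored in s_ctx->psr" );
```

The Thumb bit has to be set in the initial PSR because the Cortex-M3 only executes Thumb instructions.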

Context Switch Example using EXC_RETURN

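If we revisit our context switch code to take EXC_RETURN into account, and keep in mind that the handler itself runs on the main stack so the task's stack has to be reached through PSP using "mrs" and "msr", it would be:

    mrs     r2, psp                 // current task's stack pointer
    stmdb   r2!, {r4 - r11, lr}     // save the software frame on the task's stack
    str     r2, [r0]                // remember where the frame was saved
    ldmia   r1!, {r4 - r11, lr}     // load the next task's software frame
    msr     psp, r1                 // PSP now points at its hardware frame
    bx      lr                      // EXC_RETURN: back to Thread mode on PSP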

Note: We are only considering switching between two user tasks here.

PendSV Handler

    void PendSV_Handler() {
 
        // save context
        asm volatile (
            "mrs     r0, msp           \n"
            "push    {r4 - r11, lr}    \n"
            "mov     r11, r0           \n"
        );
 
        // call the kernel scheduler
        os::kernel->scheduler();
 
    }

Before entering the PendSV exception handler, the ARM Cortex-M3 MCU will have saved the hardware stack frame (stkctx_t), and it restores that frame when returning from the exception. Within our PendSV handler we take a copy of MSP in "r0", save the software stack frame (tskctx_t) on the main stack, and then move the copy into "r11". The copy is used to restore MSP if we context switch to a user task.

As the final step we call the kernel scheduler which will use a "BX" instruction to exit from our PendSV exception.

Note: By default the GNU ARM compiler emits a function prologue and epilogue that save and restore a frame pointer in "r7" and align the stack when entering and leaving functions. To suppress this generated code we need to declare our handlers as "naked".

    void PendSV_Handler()  __attribute__ ( ( isr, naked ) );


Kernel Scheduler

We will use a basic round robin algorithm to reschedule tasks from the task table. We expect our tasks to be cooperative and to yield, using semaphores, when they don't need CPU time.

    pid_t kernel::reschedule() {
 
        pid_t next_task = 0;
 
        for ( pid_t task = 1; task <= MAX_TASKS; task++ ) {
 
            if ( ( _tasks[task].status & TASKS_SEM_WAIT_FLAG )
                || ( task == _current_task ) ) {
                continue;
            }
 
            if ( _tasks[task].status & TASK_ACTIVE_FLAG ) {
 
                next_task = task;
                break;
 
            }
 
        }
 
        return next_task;
 
    }
 
 
 
    void kernel::scheduler() {
 
        pid_t next_task    = reschedule();
        pid_t current_task = _current_task;
 
 
        if ( next_task != 0 ) {
 
            // get a copy of MSP
            asm volatile ( "mrs     r12, msp" );
 
            if ( current_task != 0 ) {
 
                // copy the saved context onto task stack space
                asm volatile (
                    "push    {r4 - r11}            \n"
                    "ldmia   r12!, {r4 - r11, lr}  \n"
                    "mrs     r12, psp              \n"
                    "stmdb   r12!, {r4 - r11, lr}  \n"
                    "pop     {r4 - r11}            \n"
                );
 
            } else {
 
                // context switch will try to reset the MSP
                asm volatile ( "mrs     r11, msp" );
 
            }
 
            _current_task = next_task;
            context_switch( _tasks[next_task].sp, _tasks[current_task].sp );
 
        }
 
 
        // we are switching back to system task
 
        // are we already running the system task?
        if ( current_task != 0 ) {
 
            // save user task context
            asm volatile (
                "mrs     r12, psp              \n"
                "str     r12, [%0, #0]         \n"
                : : "r" ( &_tasks[current_task].sp )
            );
 
        }
 
        // task switch to system task
        asm volatile (
            "msr      msp, %0           \n"
            "pop      {r4 - r11, lr}    \n"
            "bx       lr                \n"
            : : "r" ( &_tasks[0].sp )
        );
 
    }
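
The reschedule() function above always scans from pid 1 and skips the current task, which alternates nicely with MAX_TASKS set to 2. With more tasks, a rotation that starts just after the current task would give every runnable task a turn. The following is only a sketch of that variation, not the code from the project: next_round_robin and task_id_t are hypothetical names, MAX_TASKS is raised to 3 for the example, and the flag values are placeholders.

```cpp
#include <cstdint>

typedef uint32_t status_t;
typedef int      task_id_t;              // stands in for the kernel's pid_t

// assumption: placeholder flag values, the real ones live in the kernel headers
constexpr status_t TASK_ACTIVE_FLAG    = 1u << 0;
constexpr status_t TASKS_SEM_WAIT_FLAG = 1u << 1;

constexpr task_id_t MAX_TASKS = 3;

// hypothetical variant of reschedule(): start looking just after the current
// task and wrap around, so every runnable task gets a turn
task_id_t next_round_robin( const status_t ( &status )[MAX_TASKS + 1], task_id_t current ) {
    for ( task_id_t step = 1; step <= MAX_TASKS; step++ ) {
        // user pids are 1..MAX_TASKS, pid 0 is the system task
        task_id_t candidate = ( ( current - 1 + step + MAX_TASKS ) % MAX_TASKS ) + 1;

        if ( candidate == current ) continue;                       // same rule as reschedule()
        if ( status[candidate] & TASKS_SEM_WAIT_FLAG ) continue;    // blocked on a semaphore
        if ( status[candidate] & TASK_ACTIVE_FLAG ) return candidate;
    }
    return 0;   // nothing else is runnable: fall back to the system task
}
```

As in the original, a return value of 0 means no other user task is runnable and the scheduler should switch back to the system task.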


Context Switcher

    extern "C" {
        void context_switch( kernel::memaddr_t * &next_spp, kernel::memaddr_t * &current_spp );
    }

The function itself is implemented in assembly:

                .thumb_func
                .globl  context_switch
    context_switch:
                // r0 = address of the next task's saved sp, r1 = address of the current task's saved sp
                str     r12, [r1, #0]
                ldr     r12, [r0, #0]
                msr     msp, r11
                ldmia   r12!, {r4 - r11, lr}
                msr     psp, r12
                bx      lr

Refer to the Nisos project at GitHub for full source code of this post.



  1. Zylin Embedded CDT and OpenOCD have been used for on-chip debugging 

  2. Refer to Exception model in ARM Cortex-M3 Devices Generic User Guide for more information on exceptions and exception priorities. 

  3. Context (computing), Wikipedia 

  4. ARM Cortex-M3 Devices Generic User Guide, 2.3.7 Exception entry and return