
Architecture

Joseph Yiu, in The Definitive Guide to ARM® CORTEX®-M3 and CORTEX®-M4 Processors (Third Edition), 2014

CONTROL register

The CONTROL register (Figure 4.9) defines:

FIGURE 4.9. CONTROL register in Cortex-M3, Cortex-M4, Cortex-M4 with FPU. The bit nPRIV is not available in the Cortex-M0 and is optional in the Cortex-M0+ processor

• The selection of stack pointer (Main Stack Pointer/Process Stack Pointer)
• Access level in Thread mode (Privileged/Unprivileged)

In addition, for the Cortex-M4 processor with a floating point unit, one bit of the CONTROL register indicates whether the current context (currently executed code) uses the floating point unit.

Note: The CONTROL register for ARMv6-M (e.g., Cortex-M0) is also shown for comparison. In ARMv6-M, support of nPRIV and unprivileged access level is implementation dependent, and is not available in the first generation of the Cortex-M0 products and Cortex-M1 products. It is optional in the Cortex-M0+ processor.

The CONTROL register can only be modified in the privileged access level and can be read in both privileged and unprivileged access levels. The definition of each bit field in the CONTROL register is shown in Table 4.3.

Table 4.3. Bit Fields in CONTROL Register

Bit	Function
nPRIV (bit 0)	Defines the privileged level in Thread mode: When this bit is 0 (default), it is privileged level when in Thread mode. When this bit is 1, it is unprivileged when in Thread mode. In Handler mode, the processor is always in privileged access level.
SPSEL (bit 1)	Defines the Stack Pointer selection: When this bit is 0 (default), Thread mode uses Main Stack Pointer (MSP). When this bit is 1, Thread mode uses Process Stack Pointer (PSP). In Handler mode, this bit is always 0 and write to this bit is ignored.
FPCA (bit 2)	Floating Point Context Active. This bit is only available in the Cortex-M4 with the floating point unit implemented. The exception handling mechanism uses this bit to determine whether registers in the floating point unit need to be saved when an exception occurs. When this bit is 0 (default), the floating point unit has not been used in the current context, so there is no need to save the floating point registers. When this bit is 1, the current context has used floating point instructions, so the floating point registers need to be saved. The FPCA bit is set automatically when a floating point instruction is executed, and it is cleared by hardware on exception entry. There are several options for handling the saving of floating point registers; these are covered in Chapter 13.
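To make the bit layout in Table 4.3 concrete, the following host-side C sketch decodes a CONTROL value into its three fields. The macro, type, and function names are hypothetical (CMSIS-Core device headers provide their own CONTROL_*_Msk definitions); on a target, the input value would come from __get_CONTROL().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow Table 4.3. */
#define CTRL_NPRIV (1u << 0) /* 1 = unprivileged in Thread mode */
#define CTRL_SPSEL (1u << 1) /* 1 = Thread mode uses the PSP */
#define CTRL_FPCA  (1u << 2) /* 1 = FP context active (Cortex-M4 with FPU only) */

/* Decoded view of a CONTROL value. */
typedef struct {
    bool thread_unprivileged;
    bool thread_uses_psp;
    bool fp_context_active;
} control_fields;

static control_fields decode_control(uint32_t control)
{
    control_fields f;
    f.thread_unprivileged = (control & CTRL_NPRIV) != 0;
    f.thread_uses_psp     = (control & CTRL_SPSEL) != 0;
    f.fp_context_active   = (control & CTRL_FPCA) != 0;
    return f;
}
```

For example, the reset value 0 decodes to privileged Thread mode using the MSP with no active floating point context.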

After reset, the CONTROL register is 0. This means that Thread mode uses the Main Stack Pointer as the Stack Pointer and has privileged access. Programs in privileged Thread mode can switch the Stack Pointer selection or switch to the unprivileged access level by writing to CONTROL (Figure 4.10). However, once nPRIV (CONTROL bit 0) is set, the program running in Thread mode can no longer modify the CONTROL register.

FIGURE 4.10. Stack Pointer selection

A program in unprivileged access level cannot switch itself back to privileged access level. This is essential in order to provide a basic security usage model. For example, an embedded system might contain untrusted applications running in unprivileged access level and the access permission of these applications must be restricted to prevent security breaches or to prevent an unreliable application from crashing the whole system.

If it is necessary to switch the processor back to using privileged access level in Thread mode, then the exception mechanism is needed. During exception handling, the exception handler can clear the nPRIV bit (Figure 4.11). When returning to Thread mode, the processor will be in privileged access level.
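The one-way nature of nPRIV described above can be captured in a small state model. This is a simplified host-side sketch that tracks only the processor mode and nPRIV; cpu_model, is_privileged, and write_npriv are hypothetical names, and SPSEL and FPCA are deliberately left out.

```c
#include <stdbool.h>

/* Simplified model: only the mode and CONTROL.nPRIV are tracked. */
typedef struct {
    bool handler_mode; /* true while an exception handler is running */
    bool npriv;        /* CONTROL.nPRIV: true = unprivileged Thread mode */
} cpu_model;

/* Handler mode is always privileged; Thread mode depends on nPRIV. */
static bool is_privileged(const cpu_model *cpu)
{
    return cpu->handler_mode || !cpu->npriv;
}

/* Models an MSR write to the nPRIV bit: the write is ignored when the
   current code is unprivileged. Returns true if the write took effect. */
static bool write_npriv(cpu_model *cpu, bool npriv)
{
    if (!is_privileged(cpu))
        return false;
    cpu->npriv = npriv;
    return true;
}
```

In this model, privileged Thread code can set nPRIV, a later attempt to clear it from Thread mode returns false, and the clear succeeds only once handler_mode is true, matching the flow in Figure 4.11.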

FIGURE 4.11. Switching between privileged thread mode and unprivileged thread mode

When an embedded OS is used, the CONTROL register could be reprogrammed at each context switch to allow some application tasks to run with privileged access level and the others to run with unprivileged access level.

The settings of nPRIV and SPSEL are orthogonal. Four different combinations of nPRIV and SPSEL are possible, although only three of them are commonly used in real world applications, as shown in Table 4.4.

Table 4.4. Different Combinations of nPRIV and SPSEL

nPRIV	SPSEL	Usage Scenario
0	0	Simple applications – the whole application is running in privileged access level. Only one stack is used by the main program and interrupt handlers. Only the Main Stack Pointer (MSP) is used.
0	1	Applications with an embedded OS, with current executing task running in privileged Thread mode. The Process Stack Pointer (PSP) is selected in current task, and the MSP is used by OS Kernel and exception handlers.
1	1	Applications with an embedded OS, with current executing task running in unprivileged Thread mode. The Process Stack Pointer (PSP) is selected in current task, and the MSP is used by OS Kernel and exception handlers.
1	0	Thread mode tasks running with unprivileged access level and using the MSP. This value can be observed when CONTROL is read in Handler mode (where SPSEL is always 0), but it is less likely to be used for user tasks because in most embedded OSs the stack for application tasks is separated from the stack used by the OS kernel and exception handlers.

In most simple applications without an embedded OS, there is no need to change the value of the CONTROL register. The whole application can run in privileged access level and use only the MSP (Figure 4.12).

FIGURE 4.12. Simple applications do not require unprivileged Thread mode

To access the CONTROL register in C, the following functions are available in CMSIS-compliant device-driver libraries:

x = __get_CONTROL(); // Read the current value of CONTROL

__set_CONTROL(x); // Set the CONTROL value to x

There are two points that you need to be aware of when changing the value of the CONTROL register:

• For the Cortex-M4 processor with a floating point unit (FPU), or any other ARMv7-M processor with an FPU, the FPCA bit can be set automatically when floating point instructions are executed. If the program contains floating point operations, the FPCA bit is cleared accidentally, and an interrupt subsequently occurs, the data in the floating point unit registers will not be saved by the exception entry sequence and could be overwritten by the interrupt handler. In this case, the program will not be able to continue correct processing when the interrupted task resumes.
• After the CONTROL register is modified, an Instruction Synchronization Barrier (ISB) instruction (or the __ISB() function in CMSIS-compliant drivers) should architecturally be used to ensure that the change applies to subsequent code. Due to the simple pipelines of the Cortex-M3, Cortex-M4, Cortex-M0+, Cortex-M0, and Cortex-M1, omitting the ISB instruction does not cause any problem on these processors.
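Both points suggest a read-modify-write pattern instead of writing a constant to CONTROL. The sketch below, with hypothetical helper names, computes the new value on the host so that only SPSEL changes and a set FPCA bit survives. The target-side calls appear only in a comment and assume the CMSIS functions __get_CONTROL(), __set_CONTROL(), and __ISB() mentioned above.

```c
#include <stdint.h>

#define CTRL_SPSEL (1u << 1) /* hypothetical name for CONTROL bit 1 */
#define CTRL_FPCA  (1u << 2) /* hypothetical name for CONTROL bit 2 */

/* New CONTROL value that selects the PSP for Thread mode while
   leaving every other bit (including FPCA) unchanged. */
static uint32_t control_with_psp(uint32_t current)
{
    return current | CTRL_SPSEL;
}

/* New CONTROL value that selects the MSP, again preserving other bits. */
static uint32_t control_with_msp(uint32_t current)
{
    return current & ~CTRL_SPSEL;
}

/* On target (sketch only):
     __set_CONTROL(control_with_psp(__get_CONTROL()));
     __ISB();   // architectural recommendation after writing CONTROL
   Writing the constant 0x2 instead would clear a set FPCA bit. */
```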

To access the CONTROL register in assembly, the MRS and MSR instructions are used:

MRS r0, CONTROL ; Read CONTROL register into R0

MSR CONTROL, r0 ; Write R0 into CONTROL register

You can detect if the current execution level is privileged by checking the value of IPSR and CONTROL:

int in_privileged(void)
{
    if (__get_IPSR() != 0) return 1;                 // Handler mode: always privileged
    else if ((__get_CONTROL() & 0x1) == 0) return 1; // Thread mode, nPRIV = 0: privileged
    else return 0;                                   // Thread mode, nPRIV = 1: unprivileged
}


URL: https://www.sciencedirect.com/science/article/pii/B978012408082900004X

Architecture

Joseph Yiu, in Definitive Guide to Arm® Cortex®-M23 and Cortex-M33 Processors, 2021

4.3.5 Setting up and accessing of stack pointers and stack limit registers

After the processor powers up:

• If TrustZone is not implemented, the processor automatically initializes the MSP by reading the vector table.
• If TrustZone is implemented, the processor automatically initializes the MSP_S by reading the Secure vector table.

More information on the vector table is covered in Section 8.6. Other stack pointers not initialized by the reset sequence have to be initialized by software. This includes the situation where Secure software needs to launch a Non-secure application after finishing its security initialization: the Non-secure MSP (MSP_NS) must be initialized by Secure software before the Non-secure application starts.
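As an illustration of what the reset sequence reads, the sketch below extracts the initial stack pointer (word 0) and the reset handler address (word 1) from a vector table image. It is a host-side model under stated assumptions: the struct and function names are hypothetical, and the validity check only tests the Thumb bit of the reset vector and word alignment of the stack pointer. Secure software launching a Non-secure application could apply the same reading to the Non-secure vector table and pass initial_sp to __TZ_set_MSP_NS() (Table 4.12); the branch to the Non-secure reset handler is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t initial_sp;    /* word 0: initial Main Stack Pointer value */
    uint32_t reset_handler; /* word 1: reset handler address, Thumb bit cleared */
    bool     valid;         /* passes the simple sanity checks below */
} reset_vectors;

static reset_vectors read_reset_vectors(const uint32_t *vector_table)
{
    reset_vectors r;
    r.initial_sp    = vector_table[0];
    r.reset_handler = vector_table[1] & ~1u;
    /* Cortex-M executes Thumb code only, so the vector's bit 0 must be 1;
       the stack pointer is assumed to be at least word aligned. */
    r.valid = (vector_table[1] & 1u) != 0 && (vector_table[0] & 3u) == 0;
    return r;
}
```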

Although only one of the SPs is selected at a time (when SP or R13 is used to access it), it is possible to read or write the MSP and PSP directly, provided that the processor is in a privileged state. If the processor is in the Secure privileged state, software can also access the Non-secure stack pointers. The CMSIS-CORE software framework provides a number of functions for stack pointer access (Table 4.12).

Table 4.12. CMSIS-CORE functions for stack pointer access.

CMSIS-CORE function	Usage	Applicable security state
__get_MSP(void)	Gets value of the MSP in the current security state	S/NS
__get_PSP(void)	Gets value of the PSP in the current security state	S/NS
__set_MSP(uint32_t topofstack)	Sets value of the MSP in the current security state	S/NS
__set_PSP(uint32_t topofstack)	Sets value of the PSP in the current security state	S/NS
__TZ_get_MSP_NS(void)	Gets value of the MSP_NS	S
__TZ_get_PSP_NS(void)	Gets value of the PSP_NS	S
__TZ_get_SP_NS(void)	Gets value of the MSP_NS/PSP_NS (dependent upon which one is currently selected in the Non-secure world.)	S
__TZ_set_MSP_NS(uint32_t topofstack)	Sets value of the MSP_NS	S
__TZ_set_PSP_NS(uint32_t topofstack)	Sets value of the PSP_NS	S
__TZ_set_SP_NS(uint32_t topofstack)	Sets value of the MSP_NS/PSP_NS (dependent upon which one is currently selected in the Non-secure world.)	S

Similar to stack pointer access, a number of functions are defined in the CMSIS-CORE for accessing stack limit registers (Table 4.13).

Table 4.13. CMSIS-CORE functions for stack limit registers access.

CMSIS-CORE function	Usage	Applicable security state
__get_MSPLIM(void)	Gets value of the MSPLIM in the current security state	S/NS
__get_PSPLIM(void)	Gets value of the PSPLIM in the current security state	S/NS
__set_MSPLIM(uint32_t limitofstack)	Sets value of the MSPLIM in the current security state	S/NS
__set_PSPLIM(uint32_t limitofstack)	Sets value of the PSPLIM in the current security state	S/NS
__TZ_get_MSPLIM_NS(void)	Gets value of the MSPLIM_NS	S
__TZ_get_PSPLIM_NS(void)	Gets value of the PSPLIM_NS	S
__TZ_set_MSPLIM_NS(uint32_t limitofstack)	Sets value of the MSPLIM_NS	S
__TZ_set_PSPLIM_NS(uint32_t limitofstack)	Sets value of the PSPLIM_NS	S

When using assembly language programming, these functions can be carried out using MRS (move from special register to general register) and MSR (move from general register to special register) instructions.

In general, it is not recommended to change the value of the currently selected SP in C functions, because part of the stack memory could be in use for local variables or other data. Most application code does not need to access the MSP and PSP explicitly. When parameters are passed in function calls, the compiler handles the stack management automatically, so it is transparent to application code.

For software developers working on embedded OS designs, access to the MSP and the PSP is necessary in situations such as:

• Context switching operations that require direct manipulation of the PSP.
• Execution of an OS API (which uses the MSP): the API might need to read the data pushed onto the stack (using the PSP) before the API was called. For example, the pushed data contains the register states before the execution of an SVC instruction, some of which could be the input parameters for the SVC function.
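The second situation is commonly handled by reading the stacked exception frame. The sketch below assumes the basic 8-word frame layout (R0 to R3, R12, LR, return address, xPSR) and the common, but not universal, convention of encoding a service number in the immediate of the 16-bit SVC instruction (0xDF00 | imm8), which sits in the halfword just before the stacked return address. Code memory is modeled as a host buffer at an assumed base address, and all names are hypothetical.

```c
#include <stdint.h>

/* Word offsets in the basic exception stack frame (lowest address first). */
enum { FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3, FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR };

/* On target, frame would be the stack pointer value (e.g. from __get_PSP())
   and the opcode would be read straight from memory. Here, code[] is a
   host buffer whose first halfword is assumed to sit at code_base. */
static uint8_t svc_number(const uint32_t *frame, const uint16_t *code, uint32_t code_base)
{
    uint32_t return_addr = frame[FRAME_PC];               /* address after the SVC */
    uint16_t opcode = code[(return_addr - 2u - code_base) / 2u];
    return (uint8_t)(opcode & 0xFFu);                    /* imm8 field of SVC */
}

/* Stacked argument n (0..3) as seen by the caller of the SVC. */
static uint32_t svc_arg(const uint32_t *frame, unsigned n)
{
    return frame[FRAME_R0 + n];
}
```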


URL: https://www.sciencedirect.com/science/article/pii/B9780128207352000044

OS Support Features

Joseph Yiu, in The Definitive Guide to ARM® CORTEX®-M3 and CORTEX®-M4 Processors (Third Edition), 2014

10.2 Shadowed stack pointer

In Chapter 4 we said that there are two stack pointers in the Cortex®-M processors:

• Main Stack Pointer (MSP) is the default stack pointer. It is used in Thread mode when CONTROL bit[1] (SPSEL) is 0, and it is always used in Handler mode.
• Process Stack Pointer (PSP) is used in Thread mode when CONTROL bit[1] (SPSEL) is set to 1.

Stack operations like PUSH and POP instructions, and most instructions that use SP (R13) use the currently selected stack pointer. You can also access the MSP and PSP directly using MRS and MSR instructions. In simple applications without an embedded OS or RTOS, you can just use the MSP for all operations and ignore the PSP.

In systems with an embedded OS or RTOS, the exception handlers (including part of the OS kernel) use the MSP, while the application tasks use the PSP. Each application task has its own stack space (Figure 10.1), and the context-switching code in the OS updates the PSP each time the context is switched.

FIGURE 10.1. The stack for each task is separated from the others

This arrangement has several benefits:

• If an application task encounters a problem that leads to stack corruption, the stack used by the OS kernel and other tasks is still likely to be intact, which helps to improve system reliability.
• The stack space for each task only needs to cover the task's maximum stack usage plus one level of stack frame (a maximum of 9 words including padding in the Cortex-M3 or Cortex-M4 without a floating point unit, or a maximum of 27 words for the Cortex-M4 with a floating point unit). Stack space needed for ISRs and nested interrupt handling is allocated in the main stack only.
• It makes it easy to create an efficient OS for the Cortex-M processors.
• An OS can also utilize the Memory Protection Unit (MPU) to define the stack region which an application task can use. If an application task has a stack overflow problem, the MPU can trigger a MemManage fault exception and prevent the task from overwriting memory regions outside the stack space allocated for it.
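The sizing rule in the second point can be written as simple arithmetic. This sketch assumes the task's own worst-case usage (in words) is already known from analysis, and that a task using the FPU needs room for the full extended frame; the function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FRAME_WORDS_BASIC 9u  /* R0-R3, R12, LR, PC, xPSR + 1 padding word */
#define FRAME_WORDS_FPU  27u  /* basic 8 + S0-S15 + FPSCR + reserved + 1 padding word */

/* Minimum per-task stack size in words: the task's worst-case usage plus
   one exception stack frame. ISR nesting is charged to the main stack. */
static uint32_t task_stack_words(uint32_t max_task_usage_words, bool fp_context)
{
    return max_task_usage_words + (fp_context ? FRAME_WORDS_FPU : FRAME_WORDS_BASIC);
}
```

For a task whose own worst case is 100 words, this gives 109 words (436 bytes) without an FPU frame and 127 words (508 bytes) with one.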

After power-up, the MSP is initialized from the vector table as part of the processor's reset sequence. The C startup code added by the toolchain can also carry out another stage of stack initialization for the main stack. It is then possible to start using the PSP by initializing it with the MSR instruction and then writing to the CONTROL register to set SPSEL, but it is uncommon to do so.

The simplest way to initialize and start using the PSP (not suitable for most OSs):

LDR  R0,=PSP_TOP   ; PSP_TOP is a constant that defines the top address of the stack
MSR  PSP, R0       ; Set PSP to the top of a process stack
MRS  R0, CONTROL   ; Read current CONTROL
ORRS R0, R0, #0x2  ; Set SPSEL
MSR  CONTROL, R0   ; Write to CONTROL
ISB                ; Execute an ISB after updating CONTROL;
                   ; this is an architectural recommendation

Typically, to use the process stack, the OS runs in Handler mode, programs the PSP directly, and then uses an exception return sequence to “jump” into the application task.

For example, when an OS first starts in Thread mode, it can use the SVC exception to enter the Handler mode (Figure 10.2). Then it can create a stack frame in the process stack, and trigger an exception return that uses the PSP. When the stack frame is loaded, the application task is started.
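The step of creating a stack frame in the process stack can be sketched as follows. This host-side version assumes the basic 8-word frame (no floating point context), a stack array whose top is 8-byte aligned, and hypothetical names. A real OS typically also reserves space for R4 to R11, which its context-switching code saves and restores in software; that part is omitted here.

```c
#include <stdint.h>

#define XPSR_THUMB (1u << 24) /* T bit: must be 1, Cortex-M runs Thumb code only */

/* Builds a basic exception frame at the top of a full-descending task
   stack so that an exception return using the PSP starts entry(arg).
   Returns the address to load into the PSP. */
static uint32_t *init_task_frame(uint32_t *stack, uint32_t stack_words,
                                 uint32_t entry, uint32_t arg, uint32_t exit_lr)
{
    uint32_t *sp = stack + stack_words; /* one past the top of the stack */
    *--sp = XPSR_THUMB;                 /* xPSR */
    *--sp = entry & ~1u;                /* return address (PC), bit 0 cleared */
    *--sp = exit_lr;                    /* LR used if the task function returns */
    *--sp = 0u;                         /* R12 */
    *--sp = 0u;                         /* R3 */
    *--sp = 0u;                         /* R2 */
    *--sp = 0u;                         /* R1 */
    *--sp = arg;                        /* R0: first argument of entry */
    return sp;
}
```

The returned pointer is the value that would be written to the PSP before triggering an exception return that uses the PSP.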

FIGURE 10.2. Initialization of a task in a simple OS

In OS designs, we need to switch between different tasks. This is typically called context switching. Context switching is usually carried out in the PendSV exception handler, which can be triggered by the periodic SysTick exception. Inside the context-switching operation, we need to:

• Save the current status of the registers in the current task
• Save the current PSP value
• Set the PSP to the last SP value for the next task
• Restore the last register values for the next task
• Use exception return to switch to the task
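The bookkeeping part of these steps (saving the outgoing PSP and selecting the next one) can be modeled in plain C. The sketch assumes a fixed task array and round-robin selection, and uses hypothetical names; the register save and restore remain in assembly, as its comments note.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task control block: only the saved PSP is kept. */
typedef struct {
    uint32_t saved_psp;
} tcb;

typedef struct {
    tcb   *tasks;
    size_t count;
    size_t current;
} scheduler;

/* On target, current_psp would be read with __get_PSP() after R4-R11 are
   pushed onto the process stack, and the return value would be written
   with __set_PSP() before R4-R11 of the next task are popped. */
static uint32_t switch_context(scheduler *s, uint32_t current_psp)
{
    s->tasks[s->current].saved_psp = current_psp; /* save outgoing task's PSP */
    s->current = (s->current + 1u) % s->count;    /* round-robin choice */
    return s->tasks[s->current].saved_psp;        /* PSP for incoming task */
}
```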

For example, Figure 10.3 shows a simplified context-switching operation.

FIGURE 10.3. Concept of context switching

Note that context switching is carried out in PendSV, which is typically programmed to the lowest priority level. This prevents context switching from happening in the middle of an interrupt handler, as explained in detail in Section 10.4.


URL: https://www.sciencedirect.com/science/article/pii/B9780124080829000105

Advanced Programming Features and System Behavior

Joseph Yiu, in The Definitive Guide to the ARM Cortex-M3 (Second Edition), 2010

12.1 Running a System with Two Separate Stacks

One of the important features of ARMv7-M architecture is the capability to allow the user application stack to be separated from the privileged/kernel stack. If the optional Memory Protection Unit (MPU) is implemented, it could be used to block user applications from accessing kernel stack memory so that they cannot crash the kernel by memory corruption.

Typically, a robust system based on the Cortex™-M3 has the following properties:

• Exception handlers using the Main Stack Pointer (MSP)
• Kernel code invoked by a System Tick (SYSTICK) exception at regular intervals, running at the privileged access level for task scheduling and system management
• User applications running as threads with the user access level (nonprivileged); these applications use the Process Stack Pointer (PSP)
• Stack memory for the kernel and exception handlers is pointed to by the MSP, and, if the MPU is available, this stack memory is restricted to privileged accesses only
• Stack memory for user applications is pointed to by the PSP

Assuming that the system has Static Random Access Memory (SRAM) and a Memory Protection Unit (MPU), we could set up the MPU so that the SRAM is divided into two regions for user and privileged access (see Figure 12.1). Each region is used for application data as well as for stack memory space. Since stack operation in the Cortex-M3 is full descending, the initial values of the stack pointers need to point to the top of the regions.
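Full descending means that the stack pointer addresses the last item pushed and that a push decrements it before storing, which is why the stack pointers start at the top of each region. The following host-side model (hypothetical names) illustrates this on an ordinary array, with the initial value set just past the last word of the region, as when the address above an SRAM region is used as the initial SP.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t *base; /* lowest word of the region */
    uint32_t *sp;   /* current stack pointer */
} fd_stack;

static void fd_init(fd_stack *s, uint32_t *region, uint32_t words)
{
    s->base = region;
    s->sp = region + words; /* top of the region: nothing pushed yet */
}

static bool fd_push(fd_stack *s, uint32_t value)
{
    if (s->sp == s->base)
        return false;       /* region exhausted */
    *--s->sp = value;       /* decrement first, then store */
    return true;
}

/* Caller must not pop an empty stack. */
static uint32_t fd_pop(fd_stack *s)
{
    return *s->sp++;        /* load, then increment */
}
```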

FIGURE 12.1. Example Memory Use with Privileged Data and User Application Data.

After power-up, only the MSP is initialized (by fetching address 0x0 in the power-up sequence). Additional steps are required to set up a completely robust two-stack system. For applications in assembly code, the setup can be as simple as:

; Start at privileged level (this code is located in user-accessible memory)
BL  MpuSetup      ; Set up MPU regions and enable memory protection
LDR R0,=PSP_TOP   ; Set up Process SP to the top of the process stack
MSR PSP, R0
BL  SystickSetup  ; Set up SysTick and the SysTick exception to
                  ; invoke the OS kernel at regular intervals
MOV R0, #0x3      ; Set up CONTROL so that the user program uses PSP,
MSR CONTROL, R0   ; and switch the current access level to user
ISB               ; Instruction Synchronization Barrier
B   UserApplicationStart ; Now we are at user access level; start user code

This arrangement is fine for assembler, but for C programs, switching stack pointers in the middle of a C function can cause loss of local variables (because in C functions or subroutines, local variables may be put onto stack memory). The Cortex-M3 Technical Reference Manual (TRM) [Ref. 1] suggests that we use an interrupt service routine (ISR) like Supervisor Call (SVC) to invoke the kernel, and then change the stack pointer by modifying the EXC_RETURN value (see Figure 12.2).

FIGURE 12.2. Initialization of Multiple Stacks in a Simple OS.

In most cases, EXC_RETURN modification and stack switching are included in the operating system (OS). After the user application starts, the SYSTICK exception can be used regularly to invoke the OS for system management and possibly arrange context switching, if needed (see Figure 12.3).
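Modifying EXC_RETURN changes the stack pointer because individual bits of that value select the return mode and stack. The sketch below names the three basic ARMv7-M values and tests those bits; bit 4 is relevant only to a Cortex-M4 with FPU, where 0 indicates an extended (floating point) frame. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M EXC_RETURN values for a basic (non-FP) frame. */
#define EXC_RETURN_HANDLER_MSP 0xFFFFFFF1u
#define EXC_RETURN_THREAD_MSP  0xFFFFFFF9u
#define EXC_RETURN_THREAD_PSP  0xFFFFFFFDu

static bool exc_return_uses_psp(uint32_t er)    { return (er & (1u << 2)) != 0; }
static bool exc_return_to_thread(uint32_t er)   { return (er & (1u << 3)) != 0; }
static bool exc_return_basic_frame(uint32_t er) { return (er & (1u << 4)) != 0; }

/* Redirect a return-to-Thread value to the PSP; returns to Handler mode
   always use the MSP, so those values are left unchanged. */
static uint32_t exc_return_select_psp(uint32_t er)
{
    return exc_return_to_thread(er) ? (er | (1u << 2)) : er;
}
```

An OS starting its first task from the SVC handler would, in this model, turn 0xFFFFFFF9 into 0xFFFFFFFD before the exception return, after setting the PSP to a prepared frame.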

FIGURE 12.3. Context Switching in a Simple OS.

Note that context switching is carried out in PendSV (a low-priority exception) to prevent context switching in the middle of an interrupt handler.

However, many applications do not require an OS, but it is still helpful to use separate stacks for different sections of application code to improve reliability. One possible way to handle this is to start the Cortex-M3 with the MSP pointing to a process stack region. This way, initialization is done in the process stack region but using the MSP. Before the user application starts, the following code is executed:

// Start at privileged level, MSP points to the user stack
MpuSetup();           // Set up MPU regions and enable memory protection
SystickSetup();       // Set up SysTick and the SysTick exception for routine
                      // system management code
SwitchStackPointer(); // Call an assembly subroutine to switch SP

/* ------ Inside SwitchStackPointer (assembly) ------
   PUSH {R0, R1, LR}
   MRS  R0, MSP       ; Save current stack pointer
   LDR  R1, =MSP_TOP  ; Change MSP to new location
   MSR  MSP, R1
   MSR  PSP, R0       ; Store current stack pointer in PSP
   MOV  R0, #0x3
   MSR  CONTROL, R0   ; Switch to user mode, and use PSP as current stack
   ISB                ; Architecturally recommended after writing CONTROL
   POP  {R0, R1, PC}  ; Return (POP now uses PSP, which holds the old MSP value)
   ------ Back to C program ------ */

// Now we are in user mode, using PSP, and the local variables are
// still here
UserApplicationStart(); // Start application code in user mode


URL: https://www.sciencedirect.com/science/article/pii/B9781856179638000156

Starting Cortex-M3 Development Using the GNU Tool Chain

Joseph Yiu, in The Definitive Guide to the ARM Cortex-M3 (Second Edition), 2010

19.4.5 Example 5: C Program

One of the main components in the GNU tool chain is the C compiler. In this example, the whole executable is coded using C. In addition, a linker script is needed to put the segments in place. First, let's look at the C program file:

========== example5.c ==========
// Declare functions
void myputs(const char *string1);
void myputc(char mychar);
int main(void);
void Reset_Handler(void);
void NMI_Handler(void);
void HardFault_Handler(void);
void UartInit(void);

// Declare _start - C startup code
extern void _start(void);

//--------------------------------
void Reset_Handler(void)
{
    // Call the CS3 C start-up code
    _start();
}

//--------------------------------
// Dummy handler
void NMI_Handler(void)
{
    return;
}

//--------------------------------
// Dummy handler
void HardFault_Handler(void)
{
    return;
}

//--------------------------------
void UartInit(void)
{
    /* Add your UART initialization code here */
    return;
}

//--------------------------------
// Start of main program
int main(void)
{
#define NVIC_CCR (*((volatile unsigned long *)(0xE000ED14)))
    const char *helloworld = "Hello world\n";
    NVIC_CCR = NVIC_CCR | 0x200; /* Set STKALIGN in NVIC */
    UartInit();
    myputs(helloworld);
    while (1);
    return (0);
}

//--------------------------------
// Function to print a string
void myputs(const char *string1)
{
    char mychar;
    int j;
    j = 0;
    do {
        mychar = string1[j];
        if (mychar != 0) {
            myputc(mychar);
            j++;
        }
    } while (mychar != 0);
    return;
}

//--------------------------------
void myputc(char mychar)
{
#define UART0_DATA (*((volatile unsigned long *)(0x4000C000)))
#define UART0_FLAG (*((volatile unsigned long *)(0x4000C018)))
    // Wait while the flag register bit 5 (transmit FIFO full) is set
    while ((UART0_FLAG & 0x20) != 0);
    // Output character to UART
    UART0_DATA = mychar;
    return;
}
========== end of file ==========

This program prints the “Hello world” message via a UART interface. Depending on the UART you use, you need to provide your own UART setup code or use the device driver library from a microcontroller vendor to initialize the UART.

After reset, the reset handler calls the _start function, which is the C start-up routine. When the C runtime initialization is done, it executes the main() code. The CodeSourcery G++ packages use the CS3 (CodeSourcery Common Start-up Code Sequence) for start-up and vector table handling in microcontrollers. CS3 has a predefined vector table for the Cortex-M3 processor called “__cs3_interrupt_vector_micro.” The vector table is shown in Table 19.2.

Table 19.2. Cortex-M3 Vector Table Definition in CS3

Number	Vector Name	Description
0	__cs3_stack	Initial Main Stack Pointer
1	__cs3_reset	Reset vector
2	__cs3_isr_nmi	Nonmaskable interrupt
3	__cs3_isr_hard_fault	Hard fault
4	__cs3_isr_mpu_fault	Memory management fault
5	__cs3_isr_bus_fault	Bus fault
6	__cs3_isr_usage_fault	Usage fault
7 … 10	__cs3_isr_reserved_7 … 10	Reserved exception types
11	__cs3_isr_svcall	Supervisor Call
12	__cs3_isr_debug	Debug monitor exception
13	__cs3_isr_reserved_13	Reserved exception types
14	__cs3_isr_pendsv	PendSV
15	__cs3_isr_systick	System Tick Timer
16 … 47	__cs3_isr_external_0 … __cs3_isr_external_31	External interrupt
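Because each entry in Table 19.2 is one 32-bit word, an entry's address is the table base plus four times its number, and external interrupt k is entry 16 + k. The helpers below (hypothetical names) capture both relationships; the negative numbers they return for system exceptions happen to match the CMSIS IRQn_Type convention, in which SysTick_IRQn is -1.

```c
#include <stdint.h>

/* Byte offset of vector-table entry n from the table base. */
static uint32_t vector_offset(uint32_t entry_number)
{
    return entry_number * 4u;
}

/* External interrupts start at entry 16 (__cs3_isr_external_0), so
   IRQ k is entry 16 + k; system exceptions give negative values. */
static int32_t irq_number(uint32_t entry_number)
{
    return (int32_t)entry_number - 16;
}
```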

The exception handlers we used in the program are mapped to these vector symbols using a linker script. In addition, the memory layout, including the vector table position, is also defined in this file. Users of CodeSourcery G++ Personal and Professional Editions can find linker scripts for most available Cortex-M3 microcontrollers already included in the installation. Users of the CodeSourcery G++ Lite edition can find a number of generic linker scripts in the arm-none-eabi\lib path. In this example, we use a linker script modified from the generic linker script for Cortex-M processors (generic-m.ld). This modified linker script (cortexm3.ld) is provided in Appendix F.

The command for the compiler and link process is as follows:

$> arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb example5.c -T cortexm3.ld -o example5.out

The memory map information is passed on to the linker during the compile stage.

gcc automatically carries out the linking, so there is no need for a separate linking stage. Finally, the binary and disassembled list files can be generated:

$> arm-none-eabi-objcopy -Obinary example5.out example5.bin

$> arm-none-eabi-objdump -S example5.out > example5.list

The use of Reset_Handler in this C example is optional. You can point “__cs3_reset” to the “_start” start-up routine in the linker script instead.


URL: https://www.sciencedirect.com/science/article/pii/B9781856179638000223

OS support features

Joseph Yiu, in Definitive Guide to Arm® Cortex®-M23 and Cortex-M33 Processors, 2021

11.3.2 Operations of banked stack pointers

As shown in Fig. 4.6, there can be up to four stack pointers in an Armv8-M processor. These are:

• Secure Main Stack Pointer (MSP_S)
• Secure Process Stack Pointer (PSP_S)
• Non-secure Main Stack Pointer (MSP_NS)
• Non-secure Process Stack Pointer (PSP_NS)

The selection of stack pointers being used is determined by the processor's security states (either Secure or Non-secure), the processor's mode (Thread or Handler), and the SPSEL setting in the CONTROL register. This is covered in Sections 4.2.2.3 and 4.3.4. If TrustZone is not implemented, only the Non-secure stack pointers are available.
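The selection rule can be expressed as a small decision function. This sketch assumes the Armv8-M behavior described in this section: Handler mode always uses the MSP of the current security state, and Thread mode consults the SPSEL bit of that state's own (banked) CONTROL register. The enum and function names are hypothetical; without TrustZone, only the Non-secure results apply.

```c
#include <stdbool.h>

typedef enum { SP_MSP_S, SP_PSP_S, SP_MSP_NS, SP_PSP_NS } stack_pointer;

/* spsel_s and spsel_ns are CONTROL_S.SPSEL and CONTROL_NS.SPSEL. */
static stack_pointer active_sp(bool secure, bool handler_mode,
                               bool spsel_s, bool spsel_ns)
{
    /* Handler mode never uses a PSP; Thread mode follows its own SPSEL. */
    bool use_psp = !handler_mode && (secure ? spsel_s : spsel_ns);
    if (secure)
        return use_psp ? SP_PSP_S : SP_MSP_S;
    return use_psp ? SP_PSP_NS : SP_MSP_NS;
}
```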

For programming purposes, normally:

• The MSP and the PSP symbols refer to the stack pointers in the currently selected security state:
  ○ If the processor is in the Secure state, MSP means MSP_S and PSP means PSP_S
  ○ If the processor is in the Non-secure state, MSP means MSP_NS and PSP means PSP_NS
• Secure software can access the MSP_NS and PSP_NS using the MSR and MRS instructions

By default, a Cortex-M processor uses the Main Stack Pointer (MSP) to boot up:

• If TrustZone is implemented, the processor boots up in the Secure privileged state and, by default, selects MSP_S (the Secure MSP). The default value of CONTROL_S.SPSEL (bit 1 in the CONTROL_S register) is 0, which indicates that the MSP has been selected.
• If TrustZone is not implemented, the processor boots up in the (Non-secure) privileged state and, by default, selects the MSP. The default value of CONTROL.SPSEL (bit 1 in the CONTROL register) is 0, which indicates that the MSP has been selected.

In most applications without an embedded OS or RTOS, the MSP can be used for all operations and the PSP can be ignored.

For most RTOS-based systems without TrustZone, the PSP is used by application threads for stack operations. The MSP is used for booting up, for initialization, and for exception handlers (including OS kernel code). In each of these software components, stack operation instructions (such as PUSH, POP, VPUSH, and VPOP) and most instructions that use SP (e.g., using SP/R13 as a base address for data access) use the currently selected stack pointer.

Each application task/thread has its own stack space (Fig. 11.4: Note that the placement of the stack space in this diagram is just an example) with the context switching code in the OS updating the PSP each time the context is switched.

Fig. 11.4. The stack memory allocated for each task/thread is separated from other stacks.

Within the context switching operations, the OS code accesses the PSP directly using the MRS and MSR instructions. These PSP accesses include:

• saving the PSP value of the task being switched out
• setting the PSP to the previously saved PSP value of the task being switched in

By separating the stack spaces, the OS can use either the MPU (Memory Protection Unit) or the stack limit check feature to restrict the maximum amount of stack space each task/thread uses. In addition to restricting the stack memory that is consumed, the OS can also utilize the MPU to restrict which memory address ranges an application task/thread is able to access. More information on this topic is covered in Chapter 12.
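The stack limit check mentioned above can be modeled as a comparison between the new stack pointer value and the limit register. The sketch below is a simplification under stated assumptions: it uses hypothetical names, treats the limit as 8-byte aligned by ignoring bits [2:0], and reports a violation when a push would move SP below the limit. An OS would typically program the limit with __set_PSPLIM() (Table 4.13) at each context switch, alongside the PSP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stack limit check: true if pushing bytes_to_push bytes from
   sp would move the stack pointer below the (8-byte aligned) limit. */
static bool stack_limit_violation(uint32_t sp, uint32_t bytes_to_push, uint32_t limit)
{
    uint32_t aligned_limit = limit & ~7u; /* bits [2:0] of the limit ignored */
    if (bytes_to_push > sp)
        return true;                      /* would wrap below address 0 */
    return (sp - bytes_to_push) < aligned_limit;
}
```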

Cortex-M processor systems with TrustZone have four stack pointers. In a typical system with security software solutions such as Trusted Firmware-M [1] and secure libraries, the four stack pointers could be used as shown in Fig. 11.5.

Fig. 11.5. Stack pointer usage in a TrustZone system.

Using the software architecture as shown in Fig. 11.5:

• the security management software (such as the Secure Partition Manager in Trusted Firmware-M) [1] executes in the Secure privileged state, and
• the secure libraries (such as IoT cloud connectors/clients) execute in the Secure unprivileged state.

By doing so, the security management software can configure the Secure MPU to isolate the various Secure libraries and, thus, prevent those libraries from accessing or corrupting critical data used by the security management software. The use of PSP_S (Secure Process Stack Pointer) allows us to separate the stacks of these libraries.

Similar to the execution of multiple tasks on the Non-secure side in an RTOS environment, these Secure unprivileged libraries might need to be accessed at different times and, to do so, will need security management software to handle the context switching of these libraries. This will involve reprogramming the PSP_S and the reconfiguration of the Secure MPU at each context switch.


URL: https://www.sciencedirect.com/science/article/pii/B9780128207352000111

Cortex-M3 Basics

Joseph Yiu, in The Definitive Guide to the ARM Cortex-M3 (Second Edition), 2010

3.2.3 The Control Register

The control register is used to define the privilege level and the SP selection. This register has 2 bits, as shown in Table 3.3.

Table 3.3. Cortex-M3 Control Register

Bit	Function
CONTROL[1]	Stack status: 1 = Alternate stack is used; 0 = Default stack (MSP) is used. If the processor is in thread mode or at base level, the alternate stack is the PSP. There is no alternate stack for handler mode, so this bit must be 0 when the processor is in handler mode.
CONTROL[0]	0 = Privileged in thread mode; 1 = User state in thread mode. If in handler mode (not thread mode), the processor operates in privileged mode.

CONTROL[1]

In the Cortex-M3, the CONTROL[1] bit is always 0 in handler mode. However, in the thread or base level, it can be either 0 or 1.

This bit is writable only when the core is in thread mode and privileged. In the user state or handler mode, writing to this bit is not allowed. Aside from writing to this register, another way to change this bit is to change bit 2 of the EXC_RETURN value in LR during exception return. This subject is discussed in Chapter 8, where details on exceptions are described.

CONTROL[0]

The CONTROL[0] bit is writable only in a privileged state. Once it enters the user state, the only way to switch back to privileged is to trigger an interrupt and change this in the exception handler.
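The rules in Table 3.3 can be restated as a small host-side C model, shown here as an illustrative sketch rather than target code. All macro, type, and function names are hypothetical; on a real Cortex-M3 the CONTROL value would come from __get_CONTROL() and the current mode from the exception number in IPSR, neither of which is modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks from Table 3.3 (Cortex-M3 CONTROL register). */
#define CONTROL_USER_MSK   (1u << 0)  /* CONTROL[0]: 1 = user state in thread mode */
#define CONTROL_ALTSP_MSK  (1u << 1)  /* CONTROL[1]: 1 = alternate stack (PSP)     */

typedef enum { SP_MSP, SP_PSP } active_sp_t;

/* Handler mode is always privileged; thread mode depends on CONTROL[0]. */
static bool is_privileged(uint32_t control, bool handler_mode)
{
    return handler_mode || (control & CONTROL_USER_MSK) == 0u;
}

/* Handler mode always uses the MSP; thread mode uses the PSP when CONTROL[1] = 1. */
static active_sp_t active_stack(uint32_t control, bool handler_mode)
{
    if (handler_mode) {
        return SP_MSP;
    }
    return (control & CONTROL_ALTSP_MSK) ? SP_PSP : SP_MSP;
}

/* Value an OS might compute before dropping an application thread to
   user state on the PSP: both bits set, any other bits preserved. */
static uint32_t control_for_user_thread(uint32_t control)
{
    return control | CONTROL_USER_MSK | CONTROL_ALTSP_MSK;
}
```

On the target, such a value would typically be written with __set_CONTROL() from privileged thread mode, because CONTROL[1] cannot be written in handler mode; once CONTROL[0] is set, only an exception handler can clear it again.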

To access the control register in C, the following CMSIS functions are available in CMSIS-compliant device driver libraries:

x = __get_CONTROL(); // Read the current value of CONTROL

__set_CONTROL(x); // Set the CONTROL value to x

To access the control register in assembly, the MRS and MSR instructions are used:

MRS r0, CONTROL ; Read CONTROL register into R0

MSR CONTROL, r0 ; Write R0 into CONTROL register


URL: https://www.sciencedirect.com/science/article/pii/B9781856179638000065

Programming with Embedded OS

Joseph Yiu, in The Definitive Guide to Arm® Cortex®-M0 and Cortex-M0+ Processors (Second Edition), 2015

20.1 Introduction

20.1.1 Background

In Chapter 10 we covered the hardware features in the Cortex®-M0 and Cortex-M0+ processors related to OS operations:

•

Banked stack pointers (Main Stack Pointer and Process Stack Pointer)

•

The SVCall and PendSV exceptions and the SVC instruction

•

SysTick timer

We have also covered the concept of context switching, and how it can be done. In this chapter, we will cover examples of using various features in a typical embedded OS called RTX (Real-Time eXecutive) kernel.

Before we start going into technical details of how to use an embedded OS, let us first revisit some of the general concepts of OSs in embedded applications.

20.1.2 Embedded OS and RTOS

There are many types of OSs. Most of you are probably already familiar with OSs for personal computers. For embedded systems, there is also a range of OSs available. In general, an embedded OS can be anything from a simple task scheduler to a fully featured OS like Linux. Many of the OSs running on small microcontrollers only provide task scheduling and intertask communications. On these systems, you usually would not find a graphical user interface or a file system, although some of them provide additional features such as a TCP/IP stack.

Some embedded OSs are called Real-Time Operating Systems (RTOSs); RTOSs are a subset of embedded OSs. An RTOS is designed so that when a certain event occurs, the OS responds within a defined period of time, provided that the software developer sets up the system properly (e.g., task priorities). In addition, an RTOS typically provides a very fast context switching time.

Unlike Cortex-A processors, Cortex-M processors cannot run a full-featured Linux system because there is no virtual address support. In Cortex-A processors, a Memory Management Unit (MMU) is available for remapping logical addresses to physical addresses, which is required for Linux operations. Cortex-M processors have a Memory Protection Unit (MPU), which does not handle address remapping. However, some operations related to MMU features can result in significant latency, and therefore most systems running Linux do not guarantee a system response time. In Cortex-M processor systems, the interrupt latency is low and the MPU operations do not introduce additional delay, which makes Cortex-M processors ideal for many real-time applications.

20.1.3 Why Use an Embedded OS?

As applications grow more complex, the application code has to handle more tasks in parallel, and it becomes increasingly difficult to ensure that such applications run smoothly without an embedded OS. An embedded OS divides the available CPU processing time into a number of time slots and allocates different tasks to the time slots. Since task switching can occur 100 times or more per second, it appears to the application that the tasks are running simultaneously.
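The time-slot idea can be illustrated with a host-side simulation of a round-robin scheduler driven by a periodic timer tick. This is a sketch under assumed parameters (three tasks, a 10-tick slice) with hypothetical names; real kernels also switch tasks on events, such as a task blocking in a delay, which is not modeled. With a 1 ms tick, a 10-tick slice gives 100 task switches per second.

```c
#include <stddef.h>

#define NUM_TASKS   3u
#define SLICE_TICKS 10u   /* hypothetical time slice, in timer ticks */

/* Simulates a round-robin scheduler: every SLICE_TICKS ticks the
   current task is preempted and the next task in turn is selected.
   run_ticks[i] receives the number of ticks task i was given. */
static void simulate_round_robin(unsigned total_ticks, unsigned run_ticks[NUM_TASKS])
{
    unsigned current = 0;

    for (size_t i = 0; i < NUM_TASKS; i++) {
        run_ticks[i] = 0;
    }
    for (unsigned tick = 0; tick < total_ticks; tick++) {
        run_ticks[current]++;                    /* current task uses this tick */
        if ((tick + 1u) % SLICE_TICKS == 0u) {   /* slice expired: switch task  */
            current = (current + 1u) % NUM_TASKS;
        }
    }
}
```

Over 1000 ticks (one second at 1 ms per tick) the three tasks receive 340, 330, and 330 ticks, so each gets roughly a third of the CPU while appearing to run continuously.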

Many embedded applications do not require an OS. For example, if the applications do not have to handle many tasks in parallel, or if the additional tasks are relatively short so they can be processed inside interrupt handlers, the use of an embedded OS is not required. For simple applications, use of an OS could result in unnecessary overhead. For example, extra program size and RAM size are required by the OS, and the OS itself also requires a small amount of processing time. On the other hand, if an application has a number of parallel tasks and requires a fast response time for each task switch, then using an embedded OS can be very important.

An embedded OS also requires hardware resources. For example, most embedded OSs require a timer to generate an interrupt so that the OS can perform task scheduling and system management. On the Cortex-M processors, the SysTick timer is designed for this purpose and is supported by many RTOSs. An embedded OS might also utilize various OS support features of the Cortex-M processors, such as separate stack pointers for the kernel and threads, SVC, and PendSV.

20.1.4 Role of CMSIS-RTOS

CMSIS-RTOS is one of the projects within the Cortex Microcontroller Software Interface Standard (CMSIS). It is an API specification that enables middleware to be designed to work with multiple RTOS products. CMSIS-RTOS itself is not a product, but companies can build an RTOS based on the CMSIS-RTOS APIs, or add a wrapper layer on top of their own OS APIs to provide the same functionality.

Many middleware products are quite complex, and many of them need the task scheduling features of an OS to work. For example, a TCP/IP stack might run as a task inside a multitasking system and might need to spawn additional child tasks when certain service requests are received. Traditionally, middleware includes an OS emulation layer (Figure 20.1) that a software integrator needs to port when using a different OS.

Figure 20.1. The need for OS emulation layer for middleware components.

The porting of the OS emulation layer creates additional work for software developers, or sometimes the middleware vendors, and can increase project risks because the porting might not be straightforward.

CMSIS-RTOS was created to solve this issue. It can be implemented as an additional set of APIs or as a wrapper for existing OS APIs. Since the API is standardized, middleware can be developed based on this API and the product should, in theory, be able to work with any embedded OS that supports CMSIS-RTOS (Figure 20.2).
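The wrapper approach can be sketched in a few lines of C. Everything here is hypothetical: native_sleep_ticks() stands in for a vendor's own OS call, and portable_delay_ms() stands in for a standardized API (in CMSIS-RTOS the standardized delay call is osDelay()). The "native kernel" only counts ticks so that the sketch runs on a host.

```c
#include <stdint.h>

/* --- Hypothetical native OS API (stands in for a vendor RTOS) --- */
#define NATIVE_TICKS_PER_SEC 1000u        /* assumed native tick rate */
static uint32_t native_tick_count;        /* simulated kernel tick counter */

static void native_sleep_ticks(uint32_t ticks)
{
    native_tick_count += ticks;           /* a real kernel would block the caller here */
}

/* --- Standardized API that middleware is written against --- */
static void portable_delay_ms(uint32_t ms)
{
    /* Wrapper layer: translate milliseconds into native ticks
       (64-bit intermediate avoids overflow for large delays). */
    native_sleep_ticks((uint32_t)(((uint64_t)ms * NATIVE_TICKS_PER_SEC) / 1000u));
}

/* --- Middleware: depends only on the standardized API --- */
static void middleware_poll_twice(void)
{
    portable_delay_ms(5u);
    portable_delay_ms(5u);
}
```

Only the wrapper knows the native tick rate, so moving the middleware to another RTOS means reimplementing portable_delay_ms() rather than modifying middleware_poll_twice().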

Figure 20.2. CMSIS-RTOS avoids the need for OS emulation layer for each middleware component.

The RTOS products can still have their own native API interface and application code can still use those directly for additional features or for higher performance. This is good news for application developers because it saves a lot of time in porting middleware and reduces project risks. It is also good news for middleware vendors because it allows their products to work with more OSs.

The CMSIS-RTOS also benefits RTOS vendors: As the amount of middleware that works with CMSIS-RTOS increases, having CMSIS-RTOS support in an embedded OS enables the OS product to work with more middleware. Also, as software in embedded systems increases in complexity and time-to-market becomes more important, the porting of OS emulation layers for middleware is no longer feasible for some projects because of the extra time needed and the associated project risk. CMSIS-RTOS enables RTOS products to reach these markets, which previously could only be covered by a few software platform solutions.

20.1.5 About the Keil® RTX Kernel

There are a number of embedded OSs available for the Cortex-M processors. As an example, we will look at Keil RTX. The OS APIs in RTX are based on the CMSIS-RTOS API. Therefore, applications that are based on RTX can also be used in other RTOS environments, provided the RTOS supports the CMSIS-RTOS APIs.

The Keil RTX Real-Time Kernel is a royalty-free RTOS targeted at microcontroller applications. The CMSIS package can be downloaded from www.arm.com/CMSIS. The RTX libraries and source files are included in the CMSIS-PACK package, so when the CMSIS pack for Keil MDK is installed, RTX is also included.

The RTX in the CMSIS package includes source code and precompiled libraries for ARM tool chains, gcc, and IAR EWARM. The precompiled libraries support little endian as well as big endian (Table 20.1).

Table 20.1. Precompiled libraries for RTX kernel in CMSIS-CORE version 4.2

Processor	Endian	ARM tool chains (Keil® MDK/ARM DS-5)	gcc	IAR EWARM
Cortex®-M0/Cortex-M0+	Little Endian	RTX_CM0.lib	libRTX_CM0.a	RTX_CM0.a
Cortex-M0/Cortex-M0+	Big Endian	RTX_CM0_B.lib	libRTX_CM0_B.a	RTX_CM0_B.a
Cortex-M3	Little Endian	RTX_CM3.lib	libRTX_CM3.a	RTX_CM3.a
Cortex-M3	Big Endian	RTX_CM3_B.lib	libRTX_CM3_B.a	RTX_CM3_B.a
Cortex-M4	Little Endian	RTX_CM4.lib	libRTX_CM4.a	RTX_CM4.a
Cortex-M4	Big Endian	RTX_CM4_B.lib	libRTX_CM4_B.a	RTX_CM4_B.a

Since May 2012, the RTX kernel has been open source. This means you can freely use and redistribute the RTX kernel source code under the conditions described in the license document in the CMSIS installation.

The RTX kernel is supported on all Cortex-M processors in addition to traditional ARM processors such as ARM7 and ARM9. It has the following features:

•

Flexible scheduler: supports pre-emptive, round-robin, and collaborative scheduling schemes

•

Supports mailboxes, events (up to 16 per thread), semaphores, mutex, and timers

•

Unlimited number of defined threads, with maximum of 250 active threads at a time

•

Up to 254 thread priority levels

•

Support for multithreading and thread-safe operations

•

Kernel aware debug support in Keil MDK

•

Fast context switching time

•

Small memory footprint (less than 4 KB for Cortex-M version, less than 5 KB for ARM7/9)

In addition, the Cortex-M version of RTX kernel has the following features:

•

SysTick timer support

•

No interrupt lockout in the Cortex-M versions (the OS does not need to disable interrupts for any OS operations)

ARM also offers a range of middleware (part of Keil MDK Professional), including a file system, USB host and device libraries, a TCP/IP networking suite, a CAN interface library, and a GUI library. These middleware components are designed to work seamlessly with the RTX kernel. The RTX kernel can also work with third-party software products such as communication protocol stacks, data processing codecs, and other middleware.

20.1.6 Setting Up a Simple RTX Example with Keil MDK

The following examples are based on the Keil MDK-ARM development suite 5.12 and CMSIS-RTOS RTX, using the Freescale Freedom FRDM-KL25Z board.

In the first example, we will look at a minimal setup with two threads: main() and a blinky thread. The threads each toggle an LED on the development board. To set up the first project, we use the precompiled version of CMSIS-RTOS RTX (library file RTX_CM0.lib) to simplify the compilation. When creating a new project, the Keil RTX is selected as shown in Figure 20.3.

Figure 20.3. Add Keil® RTX in the project in the Manage Run-Time Environment dialog.

After including the Keil RTX software component in the project, we see a project hierarchy as shown in Figure 20.4. The Keil RTX option adds the following files to the project:

Figure 20.4. Keil RTX software component option adds additional files to the project.

•

RTX_CM0.lib (the precompiled version of the Keil RTX)

•

RTX_Conf_CM.c (a configuration file for various settings in the RTX kernel)

The main application file “blinky.c” is very simple. The LED control functions are moved to a separate file “led_funcs.c”.

Blinky.c with RTX—two threads running in parallel toggling the Red and Green LED on board

#include <MKL25Z4.H>
#include "cmsis_os.h"   // Include header file for RTX CMSIS-RTOS

// System runs at 48MHz
// LED #0, #1 are port B, LED #2 is port D

// LED control functions, defined in led_funcs.c
extern void LED_Config(void);
extern void LED_Set(void);
extern void LED_Clear(void);
extern void LED_On(uint32_t led);
extern void LED_Off(uint32_t led);

/* Thread IDs */
osThreadId t_blinky;   // Declare a thread ID for blinky

/* Function declaration */
void blinky(void const *argument);   // Thread

// --------------------------------------------------------
// Blinky
void blinky(void const *argument) {
  while(1) {
    LED_On(1);      // Green LED on
    osDelay(500);   // delay 500 msec
    LED_Off(1);     // Green LED off
    osDelay(500);   // delay 500 msec
  } // end while
} // end of blinky

// define blinky as thread function
osThreadDef(blinky, osPriorityNormal, 1, 0);

// --------------------------------------------------------
int main(void)
{
  SystemCoreClockUpdate();
  // Configure LED outputs
  LED_Config();
  // Create a task "blinky" and assign thread ID to t_blinky
  t_blinky = osThreadCreate(osThread(blinky), NULL);
  while(1) {
    LED_On(0);      // Red LED on
    osDelay(200);   // delay 200 msec
    LED_Off(0);     // Red LED off
    osDelay(200);   // delay 200 msec
  }
}

The blinky program has the following two threads:

•

main(): starts the second thread, blinky, and toggles the red LED

•

blinky(): toggles the green LED
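Because each thread alternates its LED between on and off with a fixed osDelay() value, the expected blink pattern can be worked out in advance. The following idealized host-side model (the function name is hypothetical) ignores the execution time of LED_On()/LED_Off() and the small start-up offset of the blinky thread: the red LED toggles every 200 ms (400 ms period, 2.5 Hz) and the green LED every 500 ms (1000 ms period, 1 Hz).

```c
#include <stdbool.h>
#include <stdint.h>

/* State at time t_ms of an LED driven by the loop
   { on; delay(half_period_ms); off; delay(half_period_ms); }
   starting with the LED switched on at t = 0.
   half_period_ms must be non-zero. */
static bool led_is_on(uint32_t t_ms, uint32_t half_period_ms)
{
    return ((t_ms / half_period_ms) % 2u) == 0u;
}
```

For example, at t = 500 ms the red LED (half period 200 ms) is on again while the green LED (half period 500 ms) has just turned off, which is why the two colors drift in and out of step.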

Before we start to compile the program, we need to edit a few settings:

•

Clock frequency configuration in system_MKL25Z4.c—set the CLOCK_SETUP macro to 1 so that the processor runs at 48 MHz. This is optional, but if the clock frequency of the processor is different, you should update the clock frequency setting in the RTX as well.

•

RTX kernel configuration in RTX_Conf_CM.c—This file contains various settings regarding RTX operations, see below.

•

Project's debug setting—select CMSIS-DAP and select Serial Wire debug protocol.

For RTX_Conf_CM.c, you could edit the C file directly in the text editor of the μVision IDE. But to make things easier, you can click the Configuration Wizard tab and change the settings using the GUI, as shown in Figure 20.5.

Figure 20.5. RTX Configuration settings display using Configuration Wizard.

As the system clock frequency is set to 48 MHz, we also set the RTOS Kernel Timer input clock frequency to 48 MHz.
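The reason the two clock settings should agree shows up in the reload arithmetic a kernel typically uses to program the 24-bit SysTick timer. The sketch below assumes a 1 ms kernel tick (the actual tick period is another setting in RTX_Conf_CM.c), and the function name is hypothetical. If the configured kernel timer clock does not match the real core clock, every osDelay() period is scaled by the same ratio.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* SysTick reload register is 24 bits wide */

/* Computes the reload value for a tick of tick_us microseconds from a timer
   clock of clock_hz. SysTick counts from the reload value down to 0, so one
   period lasts reload + 1 cycles. Returns false if the period cannot be
   represented in 24 bits. */
static bool systick_reload(uint32_t clock_hz, uint32_t tick_us, uint32_t *reload)
{
    uint64_t cycles = ((uint64_t)clock_hz * tick_us) / 1000000u;

    if (cycles == 0u || cycles - 1u > SYSTICK_MAX_RELOAD) {
        return false;
    }
    *reload = (uint32_t)(cycles - 1u);
    return true;
}
```

At 48 MHz with a 1 ms tick this gives 48,000 cycles per tick, i.e., a reload value of 47,999.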

After the configuration steps are done, we can compile the project, download the application to the board, and test it. If everything is set up properly, the LEDs on the microcontroller board flash red and green at different speeds.


URL: https://www.sciencedirect.com/science/article/pii/B9780128032770000205

Architecture

Joseph Yiu, in The Definitive Guide to the ARM Cortex-M0, 2011

R13, Stack Pointer (SP)

R13 is the stack pointer. It is used for accessing the stack memory via PUSH and POP operations. There are physically two different stack pointers in Cortex-M0. The main stack pointer (MSP, or SP_main in ARM documentation) is the default stack pointer after reset, and it is used when running exception handlers. The process stack pointer (PSP, or SP_process in ARM documentation) can only be used in Thread mode (when not handling exceptions). The stack pointer selection is determined by the CONTROL register, one of the special registers that will be introduced later.

When using ARM development tools, you can access the stack pointer using either “R13” or “SP.” Both uppercase and lowercase (e.g., “r13” or “sp”) can be used. Only one of the stack pointers is visible at a given time. However, you can access the MSP or PSP directly when using the special register access instructions MRS and MSR. In such cases, the register names “MSP” or “PSP” should be used.

The lowest two bits of the stack pointers are always zero, and writes to these two bits are ignored. In ARM processors, PUSH and POP are always 32-bit accesses because the registers are 32-bit, and the transfers in stack operations must be aligned to a 32-bit word boundary. The initial value of MSP is loaded from the first 32-bit word of the vector table from the program memory during the startup sequence. The initial value of PSP is undefined.
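The alignment rule and the reset-time MSP load can be modeled on a host as follows. This is an illustrative sketch with hypothetical names and example values (for instance, a vector table whose word 0 holds 0x20001000, the top of a 4 KB SRAM starting at 0x20000000).

```c
#include <stdint.h>

/* Writes to SP ignore bits [1:0]: returns the value the register actually holds. */
static uint32_t sp_write(uint32_t value)
{
    return value & 0xFFFFFFFCu;
}

/* At reset, the MSP is loaded from word 0 of the vector table
   (word 1 holds the reset handler address). */
static uint32_t initial_msp(const uint32_t *vector_table)
{
    return sp_write(vector_table[0]);
}
```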

It is not necessary to use the PSP. In many applications, the system can completely rely on the MSP. The PSP is normally used in designs with an OS, where the stack memory for OS Kernel and the thread level application code must be separated.


URL: https://www.sciencedirect.com/science/article/pii/B9780123854773100035

Operating System Support Features

Joseph Yiu, in The Definitive Guide to Arm® Cortex®-M0 and Cortex-M0+ Processors (Second Edition), 2015

10.4 Process Stack and PSP

The Cortex®-M0 and Cortex-M0+ processors (also applicable to Cortex-M3/M4/M7) have two Stack Pointers (SPs):

•

the MSP: used at start-up and in exception handlers, including OS operations

•

the PSP: typically used by application tasks in a multitasking system

Both of them are 32-bit registers and can be referenced as R13, but only one is used at a time, depending on the value in the CONTROL special register and the current mode (Handler or Thread). The MSP is the default SP and is initialized at reset by loading the value from the first word of the memory (the start of the vector table). For simple applications, we can use the MSP all the time. In this case, we only have one stack region.

For systems with an embedded OS, or systems that require high reliability and therefore need separate stacks for different parts of the software, we can define multiple stack regions (Figure 10.6): one for the OS kernel and exceptions, and the others for different tasks.

Figure 10.6. Separate memory ranges for OS and application tasks.

Overall, the reasons for separating the SPs and using the PSP for application tasks/threads include the following:

•

To enable easier context switching,

•

To enhance reliability (in this arrangement, stack corruption in an application task is less likely to affect stack use by the OS kernel),

•

To reduce the overall stack size required (stack regions for application tasks do not need to support the stack usage by exception handlers).
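The third point can be made concrete with some back-of-the-envelope arithmetic. The sketch assumes that each task uses task_bytes of its own stack, that exception handlers (including nesting) need handler_bytes beyond the first stack frame, and that the first exception entry pushes an 8-word (32-byte) frame onto whichever stack is in use; it ignores the optional 4-byte padding added to keep the frame double-word aligned. Function names and example figures are hypothetical.

```c
#include <stdint.h>

#define FRAME_BYTES 32u  /* 8 words stacked on exception entry: R0-R3, R12, LR, PC, xPSR */

/* MSP only: every task's stack region must also absorb the worst-case
   exception handler usage, because handlers run on that same stack. */
static uint32_t stack_total_msp_only(uint32_t tasks, uint32_t task_bytes, uint32_t handler_bytes)
{
    return tasks * (task_bytes + FRAME_BYTES + handler_bytes);
}

/* PSP for tasks: each task stack only needs room for the first exception
   frame; handler usage is allocated once, on the main stack. */
static uint32_t stack_total_with_psp(uint32_t tasks, uint32_t task_bytes, uint32_t handler_bytes)
{
    return tasks * (task_bytes + FRAME_BYTES) + handler_bytes;
}
```

With four tasks of 256 bytes each and 512 bytes of handler usage, the totals are 3200 bytes with the MSP only and 1664 bytes with separate process stacks.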

During context switching, the PSP value of the outgoing application task has to be saved, and the PSP is then changed to the saved SP value of the next task.

Very often the OS kernel code requires a stack to operate, and the context switching requires switching of SP. As a result, having two SPs and separating the kernel stack from others makes OS operations easier, because it prevents SP updates from affecting OS kernel data accesses.

The separation of stack memory for different tasks and OS kernel reduces the chance of a stack error. Although a rogue task can corrupt data in the RAM (e.g., stack overflow), an embedded OS can check the SP value during context switching to detect stack errors. An OS can also include MPU support to limit stack usage of each task. As a result it can help to improve the reliability of an embedded system.
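A minimal version of such a check, assuming a downward-growing stack and a hypothetical per-task record that stores the saved PSP together with the task's stack bounds, might look like this. Note that it only detects an overflow after it has happened, at the next context switch; catching the faulting access itself requires the MPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task record kept by a simple OS. Stacks grow downward,
   so a valid saved PSP lies in [stack_base, stack_top]. */
typedef struct {
    uint32_t psp;         /* saved Process Stack Pointer               */
    uint32_t stack_base;  /* lowest address of the task's stack region */
    uint32_t stack_top;   /* initial PSP value (top of the region)     */
} task_ctrl_t;

/* Checks the outgoing task's PSP during a context switch. Returns false
   if the PSP is misaligned or has left the task's region (e.g., it has
   overflowed below the base). */
static bool stack_ok(const task_ctrl_t *t)
{
    return (t->psp & 3u) == 0u &&
           t->psp >= t->stack_base &&
           t->psp <= t->stack_top;
}
```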

In a system with an embedded OS, the OS kernel has to keep track of the SP values for each task during context switching, and switch over the PSP value to allow each task to have their own stack, as shown in Figure 10.7.

Figure 10.7. MSP and PSP activities with simple OS running three tasks.

As covered in Chapter 4, the selection of the pointer is determined by the current mode of the Cortex-M processor and the value of the CONTROL register. When the processor comes out of reset, it is in thread mode, the CONTROL register's value is 0, and the MSP is selected as the default SP.

From the default state, the current SP selection can be changed to use the PSP by programming the CONTROL register. Note that an Instruction Synchronization Barrier (ISB) instruction should be used (an architectural recommendation) after setting bit 1 of the CONTROL register to 1. You can also switch back to using the MSP by clearing bit 1 of the CONTROL register, provided that the processor is still in privileged state.

Figure 10.8 describes the stack pointer switching flows in exception entry and exit sequences. If an exception occurs, the processor will enter handler mode and the MSP will be selected. The stacking process that pushes R0–R3, R12, LR, PC, and xPSR can be carried out using either MSP or PSP, depending on the value of CONTROL register before the exception, as explained in Chapter 8.

Figure 10.8. Switching of stack pointer selection by software or exception entry/exit.

When an exception handler completes, the PC is loaded with the EXC_RETURN value. Depending on the value of the lowest 4 bits of EXC_RETURN, the processor returns to Thread mode with the MSP selected, Thread mode with the PSP selected, or Handler mode with the MSP selected. Bit 1 of the CONTROL register (the SP selection bit) is updated to match bit 2 of the EXC_RETURN value.
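The three EXC_RETURN encodings on the Cortex-M0/M0+ (which have no FPU) and the resulting SP selection can be decoded with a small host-side model. The enum and function names are hypothetical; the values 0xFFFFFFF1, 0xFFFFFFF9, and 0xFFFFFFFD are the architectural EXC_RETURN encodings for ARMv6-M.

```c
#include <stdint.h>

typedef enum { RET_HANDLER_MSP, RET_THREAD_MSP, RET_THREAD_PSP, RET_INVALID } exc_return_t;

/* Decodes an ARMv6-M EXC_RETURN value. */
static exc_return_t decode_exc_return(uint32_t exc_return)
{
    switch (exc_return) {
    case 0xFFFFFFF1u: return RET_HANDLER_MSP;  /* return to Handler mode, MSP */
    case 0xFFFFFFF9u: return RET_THREAD_MSP;   /* return to Thread mode, MSP  */
    case 0xFFFFFFFDu: return RET_THREAD_PSP;   /* return to Thread mode, PSP  */
    default:          return RET_INVALID;
    }
}

/* CONTROL bit 1 (SP selection) after the return follows EXC_RETURN bit 2;
   all other CONTROL bits are left unchanged. */
static uint32_t control_after_return(uint32_t control, uint32_t exc_return)
{
    return (control & ~UINT32_C(2)) | (((exc_return >> 2) & 1u) << 1);
}
```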

The values of the MSP and PSP can be accessed using the MRS and MSR instructions. In general, changing the value of the currently selected SP in C is a bad idea because access to local variables and function parameters can depend on the SP value. If the SP is changed, these variables can no longer be accessed correctly.

If you are using CMSIS compliant device driver libraries, you can access the value of the MSP and PSP with the following functions (Table 10.7):

Table 10.7. CMSIS-CORE functions for accessing MSP and PSP

Functions	Description
uint32_t __get_MSP(void)	Read the current value of the Main Stack Pointer
void __set_MSP(uint32_t topOfMainStack)	Set the value of the Main Stack Pointer
uint32_t __get_PSP(void)	Read the current value of the Process Stack Pointer
void __set_PSP(uint32_t topOfProcStack)	Set the value of the Process Stack Pointer

To implement the context switching sequence shown in Figure 10.7, the following procedure can be used. Note that there are many different ways to implement an embedded OS; the following illustration is only an example.

First, we need to be able to switch from thread code into OS code running in handler mode. Typically, this is carried out with an SVC instruction, which is covered in the next section (Section 10.5). Then we need to set up a stack frame in memory, and use this stack frame in an exception return to jump to the starting point of the first thread (task A). The sequence is illustrated in Figure 10.9.
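The stack frame that starts a task can be built as an ordinary array of words. The following host-side sketch assumes a downward-growing stack whose top is double-word aligned; entry_addr and exit_lr are hypothetical 32-bit target addresses (exit_lr would typically point to an error handler in case the task function ever returns). As some kernels do (FreeRTOS, for example), the sketch clears bit 0 of the stacked PC, and the stacked xPSR has only the Thumb (T) bit set.

```c
#include <stddef.h>
#include <stdint.h>

#define XPSR_THUMB  0x01000000u  /* T bit must be set in the stacked xPSR */
#define FRAME_WORDS 8u           /* R0, R1, R2, R3, R12, LR, PC, xPSR */

/* Builds the frame that an exception return will unstack to start a task.
   The frame occupies the top 8 words of the stack (requires
   stack_words >= FRAME_WORDS). Returns the word index that the initial
   PSP should point to. */
static size_t init_task_frame(uint32_t *stack, size_t stack_words,
                              uint32_t entry_addr, uint32_t exit_lr)
{
    size_t sp = stack_words - FRAME_WORDS;  /* lowest address of the frame */

    stack[sp + 0] = 0u;                     /* R0 (could carry a task argument) */
    stack[sp + 1] = 0u;                     /* R1 */
    stack[sp + 2] = 0u;                     /* R2 */
    stack[sp + 3] = 0u;                     /* R3 */
    stack[sp + 4] = 0u;                     /* R12 */
    stack[sp + 5] = exit_lr;                /* LR: used if the task function returns */
    stack[sp + 6] = entry_addr & ~1u;       /* PC: task entry, bit 0 cleared */
    stack[sp + 7] = XPSR_THUMB;             /* xPSR: only the T bit set */
    return sp;
}
```

On the target, the OS would then set the PSP to the address of stack[sp] with __set_PSP() and perform an exception return with EXC_RETURN = 0xFFFFFFFD, so that the hardware unstacks this frame and starts the task in Thread mode using the PSP.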

Figure 10.9. Initialization of a task in a simple OS by creating a stack frame and then switch to it using exception return.

We also need code to handle context switching. When an application task is interrupted by an exception, registers R0–R3, R12, LR, PC, and xPSR have already been saved to the task's stack by the exception stacking sequence. We need to add code to save R4–R11 to the stack, and then save the current value of the PSP so that we can resume the task later. The operation is illustrated in Figure 10.10.
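The following host-side simulation sketches this save/restore sequence for two tasks, as it might be done inside a PendSV handler. The PSP is modeled as a word index into the current task's stack, and all names are hypothetical. On a real Cortex-M0 this code would be written in assembly, partly because ARMv6-M PUSH/POP cannot access R8–R11 directly, so those registers must be moved through low registers.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_TASKS   2u
#define STACK_WORDS 64u

/* Simulated CPU state relevant to the switch (host model). */
typedef struct {
    uint32_t r4_r11[8];  /* registers not stacked by hardware on exception entry */
    size_t   psp;        /* PSP as a word index into the current task's stack    */
} cpu_t;

static uint32_t task_stack[NUM_TASKS][STACK_WORDS];
static size_t   saved_psp[NUM_TASKS];  /* PSP saved for each task */
static size_t   current_task;

/* Hardware has already stacked R0-R3, R12, LR, PC, xPSR on the PSP.
   Software saves R4-R11 below that frame and records the PSP, then
   restores the next task's R4-R11 and PSP; the exception return would
   unstack the rest of the next task's frame. */
static void context_switch(cpu_t *cpu)
{
    uint32_t *stk = task_stack[current_task];

    cpu->psp -= 8u;                                  /* room for R4-R11 */
    for (size_t i = 0; i < 8u; i++) {
        stk[cpu->psp + i] = cpu->r4_r11[i];          /* save R4-R11 */
    }
    saved_psp[current_task] = cpu->psp;

    current_task = (current_task + 1u) % NUM_TASKS;  /* round-robin choice */

    stk = task_stack[current_task];
    cpu->psp = saved_psp[current_task];
    for (size_t i = 0; i < 8u; i++) {
        cpu->r4_r11[i] = stk[cpu->psp + i];          /* restore R4-R11 */
    }
    cpu->psp += 8u;                                  /* PSP now points at the hardware frame */
}
```

In this model, a task that has never run needs its saved PSP to point 16 words below the top of its stack: 8 words of initial R4–R11 values sitting just below its initial 8-word exception frame.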

Figure 10.10. Example context switching from one task to another in a simple OS.

Section 10.7 of this chapter shows example codes to create a simple multi-tasking system.


URL: https://www.sciencedirect.com/science/article/pii/B9780128032770000102