Copyright © 2001, 2002 by IBM Corporation
Copyright © 2002 by Free Standards Group
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".
Linux is a trademark of Linus Torvalds.
LSB is a trademark of the Free Standards Group in the USA and other countries.
This v1.02 edition, published on 18 November 2002, applies to version 2, release 2, modification 16 of the Linux kernel and to all subsequent releases and modifications until otherwise indicated in new editions. This edition replaces LNUX-1007-02 published in July 2001.
This section describes the processor-specific information for the zSeries processors.
z/Architecture Principles of Operation (SA22–7832) defines the zSeries architecture.
Programs intended to execute directly on the processor use the zSeries instruction set, and the instruction encoding and semantics of the architecture.
An application program can assume that all instructions defined by the architecture that are neither privileged nor optional exist and work as documented.
To be ABI-conforming the processor must implement the instructions of the architecture, perform the specified operations, and produce the expected results. The ABI neither places performance constraints on systems nor specifies what instructions must be implemented in hardware. A software emulation of the architecture could conform to the ABI.
In z/Architecture a processor runs in big-endian mode. (See the Section called Byte ordering.)
The architecture defines an 8-bit byte, a 16-bit halfword, a 32-bit word, a 64-bit doubleword and a 128-bit quadword. Byte ordering defines how the bytes that make up halfwords, words, doublewords and quadwords are ordered in memory. Most significant byte (MSB) ordering, or "Big-Endian" as it is sometimes called, means that the most significant byte of a structure is located in the lowest addressed byte position in a storage unit (byte 0).
Figure 1 to Figure 4 illustrate the conventions for bit and byte numbering within storage units of various widths. These conventions apply to both integer data and floating-point data, where the most significant byte of a floating-point value holds the sign and the exponent (or at least the start of the exponent). The figures show big-endian byte numbers in the upper left corners and bit numbers in the lower corners.
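As a minimal sketch of the byte ordering described above, the following C fragment stores a 32-bit word with its most significant byte at the lowest address and probes the byte order of the host at run time. The function names `store_be32` and `host_is_big_endian` are illustrative only and are not defined by this ABI.

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit word MSB first: byte 0 receives the most significant byte. */
static void store_be32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Return 1 if the host keeps the most significant byte at the lowest address. */
static int host_is_big_endian(void)
{
    uint32_t probe = 0x01020304u;
    unsigned char first;

    memcpy(&first, &probe, 1);   /* read byte 0 of the word */
    return first == 0x01;
}
```

On a zSeries processor `host_is_big_endian()` would be expected to return 1, and the bytes written by `store_be32` would match the in-memory image of the word.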
Table 1 shows how ANSI C scalar types correspond to those of the zSeries processor. For all types a NULL pointer has the value zero (binary).
Table 1. Scalar types
Type | ANSI C | sizeof (bytes) | Alignment | type (zSeries) |
|---|---|---|---|---|
Character | char | 1 | 1 | byte |
Short | short | 2 | 2 | halfword |
Integer | int | 4 | 4 | word |
Long, Long long | long, long long | 8 | 8 | doubleword |
Pointer | any-type * | 8 | 8 | unsigned doubleword |
Floating point | float | 4 | 4 | single precision (IEEE) |
Floating point | double | 8 | 8 | double precision (IEEE) |
Floating point | long double¹ | 16 | 16 | extended precision (IEEE) |
¹Compilers and systems may implement the long double data type in some other way, for performance reasons, using a compiler option. Examples of such formats could be two successive doubles or even a single double. Such usage does not conform to this ABI, however, and runs the risk of passing a wrongly formatted floating-point number to another function as an argument. Programs using other formats should transform long double floating-point numbers to a conforming format before passing them.
Aggregates (structures and arrays) and unions assume the alignment of their most strictly aligned component, that is, the component with the largest alignment. The size of any object, including aggregates and unions, is always a multiple of the alignment of the object. An array uses the same alignment as its elements. Structure and union objects may require padding to meet size and alignment constraints:
An entire structure or union object is aligned on the same boundary as its most strictly aligned member.
Each member is assigned to the lowest available offset with the appropriate alignment. This may require internal padding, depending on the previous member.
If necessary, a structure's size is increased to make it a multiple of the structure's alignment. This may require tail padding if the last member does not end on the appropriate boundary.
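The three rules above can be sketched as a small layout calculator. This is an illustration under stated assumptions, not compiler source: `struct member`, `round_up` and `layout_struct` are hypothetical names, and each member is described only by its size and alignment in bytes.

```c
#include <stddef.h>

/* Hypothetical description of one member: size and alignment in bytes. */
struct member { size_t size; size_t align; };

/* Round n up to the next multiple of align (align must be non-zero). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/*
 * Compute the offset of every member and the size of the whole structure.
 * offsets[i] receives the offset of member i, *struct_align receives the
 * alignment of the structure, and the return value is its size.
 */
static size_t layout_struct(const struct member *m, size_t n,
                            size_t *offsets, size_t *struct_align)
{
    size_t pos = 0, max_align = 1;

    for (size_t i = 0; i < n; i++) {
        pos = round_up(pos, m[i].align);   /* internal padding */
        offsets[i] = pos;
        pos += m[i].size;
        if (m[i].align > max_align)
            max_align = m[i].align;        /* most strictly aligned member */
    }
    *struct_align = max_align;
    return round_up(pos, max_align);       /* tail padding */
}
```

For a structure holding a char, a double and a short, in that order, the sketch yields offsets 0, 8 and 16, an alignment of 8 and a size of 24 bytes.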
In the following examples (Figure 5 to Figure 9), member byte offsets (for the big-endian implementation) appear in the upper left corners.
C struct and union definitions may have "bit-fields," defining integral objects with a specified number of bits (see Table 7).
"Plain" bit-fields (that is, those neither signed nor unsigned) always have non-negative values. Although they may have type short, int or long (which can have negative values), bit-fields of these types have the same range as bit-fields of the same size with the corresponding unsigned type. Bit-fields obey the same size and alignment rules as other structure and union members, with the following additions:
Bit-fields are allocated from left to right (most to least significant).
A bit-field must entirely reside in a storage unit appropriate for its declared type. Thus, a bit-field never crosses its unit boundary.
Bit-fields must share a storage unit with other structure and union members (either bit-field or non-bit-field) if and only if there is sufficient space within the storage unit.
Unnamed bit-fields' types do not affect the alignment of a structure or union, although an individual bit-field's member offsets obey the alignment constraints. An unnamed, zero-width bit-field shall prevent any further member, bit-field or other, from residing in the storage unit corresponding to the type of the zero-width bit-field.
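A simplified model of these allocation rules is sketched below. It handles only runs of bit-fields, ignoring non-bit-field members and the final size of the structure, and it counts bit offsets from the most significant bit of byte 0. The names `struct bitfield` and `place_bitfields` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical description of a bit-field: width and storage unit, in bits. */
struct bitfield { unsigned width; unsigned unit_bits; };

/*
 * Assign each bit-field a bit offset, allocating from left to right.
 * bit_off[i] receives the offset of field i; the return value is the
 * number of bits in use after the last field.
 */
static unsigned place_bitfields(const struct bitfield *f, size_t n,
                                unsigned *bit_off)
{
    unsigned pos = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned unit = f[i].unit_bits;

        if (f[i].width == 0) {
            /* Zero-width field: nothing else may share this storage unit. */
            pos = (pos + unit - 1) / unit * unit;
            bit_off[i] = pos;
            continue;
        }
        /* A field never crosses a boundary of its own storage unit. */
        if (pos / unit != (pos + f[i].width - 1) / unit)
            pos = (pos + unit - 1) / unit * unit;
        bit_off[i] = pos;
        pos += f[i].width;
    }
    return pos;
}
```

Three int bit-fields of widths 5, 6 and 7 share one word at bit offsets 0, 5 and 11. Two int bit-fields of width 20 cannot share a word, so the second starts at bit 32.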
The following examples (Figure 10 through Figure 15) show structure and union member byte offsets in the upper left corners. Bit numbers appear in the lower corners.
This section discusses the standard function calling sequence, including stack frame layout, register usage, and parameter passing.
The ABI makes the assumption that the processor has 16 general purpose registers and 16 IEEE floating point registers. zSeries processors have these registers; each register is 64 bits wide. The use of the registers is described in the table below.
Table 8.
Register name | Usage | Call effect |
|---|---|---|
r0, r1 | General purpose | Volatile¹ |
r2 | Parameter passing and return values | Volatile |
r3, r4, r5 | Parameter passing | Volatile |
r6 | Parameter passing | Saved² |
r7 - r11 | Local variables | Saved |
r12 | Local variable, commonly used as GOT pointer | Saved |
r13 | Local variable, commonly used as Literal Pool pointer | Saved |
r14 | Return address | Volatile |
r15 | Stack pointer | Saved |
f0, f2, f4, f6 | Parameter passing and return values | Volatile |
f1, f3, f5, f7 | General purpose | Volatile |
f8 - f15 | General purpose | Saved |
Access registers 0, 1 | Reserved for system use | Volatile |
Access registers 2-15 | General purpose | Volatile |
¹Volatile: These registers are not preserved across function calls. ²Saved: These registers belong to the calling function. A called function shall save these registers' values before it changes them, restoring their values before it returns.
Registers r6 through r13, r15 and f8 through f15 are nonvolatile; that is, they "belong" to the calling function. A called function shall save these registers' values before it changes them, restoring their values before it returns.
Registers r0, r1, r2, r3, r4, r5, r14 and f0 through f7 are volatile; that is, they are not preserved across function calls.
Furthermore the values in registers r0 and r1 may be altered by the interface code in cross-module calls, so a function cannot depend on the values in these registers having the same values that were placed in them by the caller.
The following registers have assigned roles in the standard calling sequence:
Table 9.
r12 | Global Offset Table pointer. If a position-independent module uses cross-linking the compiler must point r12 to the GOT as described in the Section called Dynamic Linking in the chapter called Program loading and dynamic linking. If not this register may be used locally. |
r13 | Commonly used as the Literal Pool pointer. If the Literal Pool is not required this register may be used locally. |
r14 | This register will contain the address to which a called function will normally return. r14 is volatile across function calls. |
r15 | The stack pointer (stored in r15) will maintain an 8-byte alignment. It will always point to the lowest allocated valid stack frame, and will grow towards low addresses. The contents of the word addressed by this register may point to the previously allocated stack frame. If required it can be decremented by the called function – see the Section called Dynamic stack space allocation. |
Signals can interrupt processes. Functions called during signal handling have no unusual restrictions on their use of registers. Moreover, if a signal handling function returns, the process will resume its original execution path with all registers restored to their original values. Thus programs and compilers may freely use all registers listed above, except those reserved for system use, without the danger of signal handlers inadvertently changing their values.
With these calling conventions the following usage of the registers for inline assemblies is recommended:
General registers r0 and r1 should be used internally whenever possible
General registers r2 to r5 should be second choice
General registers r12 to r15 should only be used for their standard function.
A function will be passed a frame on the runtime stack by the function which called it, and may allocate a new stack frame. A new stack frame is required if the called function will in turn call further functions (which must be passed the address of the new frame). This stack grows downwards from high addresses. Figure 16 shows the stack frame organization. SP in the figure denotes the stack pointer (general purpose register r15) passed to the called function on entry. Maintenance of the back chain pointers is not a requirement of the ABI, but the storage area for these pointers must be allocated whether used or not.
The format of the register save area created by the gcc compiler is:
The following requirements apply to the stack frame:
The stack pointer shall maintain 8-byte alignment.
The stack pointer points to the first word of the lowest allocated stack frame. If the "back chain" is implemented this word will point to the previously allocated stack frame (towards higher addresses), except for the first stack frame, which shall have a back chain of zero (NULL). The stack shall grow downwards, in other words towards lower addresses.
The called function may create a new stack frame by decrementing the stack pointer by the size of the new frame. This is required if this function calls further functions. The stack pointer must be restored prior to return.
The parameter list area shall be allocated by the caller and shall be large enough to contain the arguments that the caller stores in it. Its contents are not preserved across calls.
Other areas depend on the compiler and the code being compiled. The standard calling sequence does not define a maximum stack frame size.
The stack space for the register save area and back chain must be allocated by the caller. The size of these is 160 bytes.
Except for the stack frame header and any padding necessary to make the entire frame a multiple of 8 bytes in length, a function need not allocate space for the areas that it does not use. If a function does not call any other functions and does not require any of the other parts of the stack frame, it need not establish a stack frame. Any padding of the frame as a whole shall be within the local variable area; the parameter list area shall immediately follow the stack frame header, and the register save areas shall contain no padding.
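The size arithmetic implied by these requirements can be sketched as follows. This is a simplification that assumes a frame consisting only of the 160-byte header (back chain and register save area), a parameter list area and a local variable area. `frame_size` and the two macros are illustrative names, and compiler-specific areas and dynamic allocation are ignored.

```c
#include <stddef.h>

#define FRAME_HEADER_BYTES 160u   /* back chain and register save area */
#define STACK_ALIGN_BYTES    8u   /* required stack pointer alignment */

/* Return the number of bytes by which the stack pointer is decremented. */
static size_t frame_size(size_t param_area_bytes, size_t local_bytes)
{
    size_t total = FRAME_HEADER_BYTES + param_area_bytes + local_bytes;

    /* Pad the frame as a whole to a multiple of 8 bytes. */
    return (total + STACK_ALIGN_BYTES - 1) / STACK_ALIGN_BYTES * STACK_ALIGN_BYTES;
}
```

With no parameter area and 20 bytes of locals, the sketch yields 184 bytes: 180 bytes of content plus 4 bytes of padding, which belongs to the local variable area.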
Arguments to called functions are passed in registers. Since all computations must be performed in registers, memory traffic can be eliminated if the caller can compute arguments into registers and pass them in the same registers to the called function, where the called function can then use these arguments for further computation in the same registers. The number of registers implemented in a processor architecture naturally limits the number of arguments that can be passed in this manner.
For Linux for zSeries, the following applies:
General registers r2 to r6 are used for integer values.
Floating point registers f0, f2, f4 and f6 are used for floating point values.
Beside these general rules the following rules apply:
char, short, int, long and long long are passed in general registers.
Structures equivalent to a floating point type are passed in floating point registers. A structure is equivalent to a floating point type if and only if it has exactly one member, which is either of floating point type or itself a structure equivalent to a floating point type.
Structures with a size of 1, 2, 4, or 8 bytes which are not equivalent to a floating point type are passed as integral values.
All other structures are passed by reference. If needed, the called function makes a copy of the value.
Complex numbers are passed as structures.
The following algorithm specifies where argument data is passed for the C language. For this purpose, consider the arguments as ordered from left (first argument) to right, although the order of evaluation of the arguments is unspecified. In this algorithm fr contains the number of the next available floating-point register, gr contains the number of the next available general purpose register, and starg is the address of the next available stack argument word.
INITIALIZE: Set fr=0, gr=2, and starg to the address of parameter word 1.
SCAN: If there are no more arguments, terminate. Otherwise, select one of the following depending on the type of the next argument:
DOUBLE_OR_FLOAT: A DOUBLE_OR_FLOAT is one of the following:
A single length floating point type,
A double length floating point type,
A structure equivalent to a floating point type.
If fr>6, go to OTHER. Otherwise load the argument value into floating-point register fr, set fr to fr+2, and go to SCAN.
SIMPLE_ARG: A SIMPLE_ARG is one of the following:
One of the simple integer types no more than 64 bits wide (char, short, int, long, long long, enum).
A pointer to an object of any type.
A struct or a union of 1, 2, 4 or 8 bytes which is not a structure equivalent to a floating point type.
A struct or union of another size, or a long double, any of which shall be passed as a pointer to the object, or to a copy of the object where necessary to enforce call-by-value semantics. Only if the caller can ascertain that the object is "constant" can it pass a pointer to the object itself.
If gr>6, go to OTHER. Otherwise load the argument value into general register gr, set gr to gr+1, and go to SCAN. Values shorter than 64 bits are sign- or zero-extended (as appropriate) to 64 bits.
OTHER: Arguments not otherwise handled above are passed in the parameter words of the caller's stack frame. SIMPLE_ARGs, as defined above, are considered to have a size of 8 bytes, where simple integer types shorter than 8 bytes are sign- or zero-extended (as appropriate) to 8 bytes, and other arguments of size less than 8 bytes will be placed right-justified into an 8-byte slot. float and double arguments are considered to have a size of 8 bytes, where float arguments will be placed right-justified into an 8-byte slot.
The contents of registers and words which are skipped by the above algorithm for alignment purposes (padding) are undefined.
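The algorithm can be modelled in a few lines of C. The sketch below only records where each argument would be placed; it moves no data. It assumes that floating-point registers are consumed in steps of two (f0, f2, f4, f6), as the register list above suggests, and that every stack argument occupies one 8-byte slot. All type and function names are hypothetical.

```c
#include <stddef.h>

enum arg_class { ARG_DOUBLE_OR_FLOAT, ARG_SIMPLE };
enum arg_home  { IN_GPR, IN_FPR, ON_STACK };

struct arg_loc {
    enum arg_home home;
    unsigned reg;       /* register number when home is IN_GPR or IN_FPR */
    size_t stack_off;   /* byte offset from parameter word 1 when ON_STACK */
};

/* Record the location of each of the n arguments, taken left to right. */
static void assign_args(const enum arg_class *args, size_t n,
                        struct arg_loc *out)
{
    unsigned fr = 0, gr = 2;   /* next free floating-point / general register */
    size_t starg = 0;          /* next free stack slot, relative to word 1 */

    for (size_t i = 0; i < n; i++) {                       /* SCAN */
        if (args[i] == ARG_DOUBLE_OR_FLOAT && fr <= 6) {
            out[i] = (struct arg_loc){ IN_FPR, fr, 0 };
            fr += 2;           /* assumption: f0, f2, f4, f6 */
        } else if (args[i] == ARG_SIMPLE && gr <= 6) {
            out[i] = (struct arg_loc){ IN_GPR, gr, 0 };
            gr += 1;
        } else {                                           /* OTHER */
            out[i] = (struct arg_loc){ ON_STACK, 0, starg };
            starg += 8;        /* every stack argument uses an 8-byte slot */
        }
    }
}
```

Arguments passed by reference (larger structures and long double) would be classified as `ARG_SIMPLE` here, because what is passed is a pointer.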
As an example, assume the declarations and the function call shown in Figure 19. The corresponding register allocation and storage would be as shown in Table 10.
Some otherwise portable C programs depend on the argument passing scheme, implicitly assuming that 1) all arguments are passed on the stack, and 2) arguments appear in increasing order on the stack. Programs that make these assumptions have never been portable, but they have worked on many implementations. However, they do not work on z/Architecture because some arguments are passed in registers. Portable C programs use the header files <stdarg.h> or <varargs.h> to deal with variable argument lists on zSeries and other machines as well.
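For illustration, a variadic function written with <stdarg.h> makes no assumption about whether its arguments arrive in registers or on the stack. The function name `sum_ints` is only an example.

```c
#include <stdarg.h>

/* Sum the `count` int arguments that follow the fixed parameter. */
static long sum_ints(int count, ...)
{
    va_list ap;
    long total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* the implementation locates each argument */
    va_end(ap);
    return total;
}
```

A call such as `sum_ints(8, 1, 1, 1, 1, 1, 1, 1, 1)` passes more arguments than there are parameter registers, yet the function body needs no knowledge of that.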
In general, values are returned in registers, as described in Table 11.
Table 11. Registers for return values
Type | Returned in register: |
|---|---|
char, short, int, long and long long | general register 2 (r2) |
double and float | floating point register 0 (f0) |
Functions shall return float or double values in f0, with float values rounded to single precision. Functions shall return values of type int, long, long long, enum, short and char, or a pointer to any type as unsigned or signed integers as appropriate, zero- or sign-extended to 64 bits if necessary, in r2.
Values of type long double and structures or unions are returned in a storage buffer allocated by the caller.
Processes execute in a 64-bit virtual address space. Memory management translates virtual addresses to physical addresses, hiding physical addressing and letting a process run anywhere in the system's real memory. Processes typically begin with three logical segments, commonly called "text", "data" and "stack". An object file may contain more segments (for example, for debugger use), and a process can also create additional segments for itself with system services.
Note: The term "virtual address" as used in this document refers to a 64-bit address generated by a program, as contrasted with the physical address to which it is mapped.
Memory is organized into pages, which are the system's smallest units of memory allocation. The hardware page size for z/Architecture is 4096 bytes.
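A program that works in page units would normally query the page size at run time rather than hard-code it. The sketch below uses the POSIX `sysconf` call. The helper names are illustrative, and the 4096-byte fallback is an assumption made only for this example.

```c
#include <stddef.h>
#include <unistd.h>

/* Round len up to a whole number of pages. */
static size_t page_round_up(size_t len, size_t page)
{
    return (len + page - 1) / page * page;
}

/* Return the system page size in bytes. */
static size_t system_page_size(void)
{
    long p = sysconf(_SC_PAGESIZE);

    return p > 0 ? (size_t)p : 4096u;   /* assumed fallback if the query fails */
}
```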
Processes have a 42, 53 or 64-bit address space available to them, depending on the Linux kernel level.
Figure 20 shows the virtual address configuration on the zSeries architecture. The segments with different properties are typically grouped in different areas of the address space. The loadable segments may begin at zero (0); the exact addresses depend on the executable file format (see the chapter called Object files and the chapter called Program loading and dynamic linking). The process' stack resides at the end of the virtual memory and grows downwards. Processes can control the amount of virtual memory allotted for stack space, as described below.
Note: Although application programs may begin at virtual address 0, they conventionally begin above 0x1000 (4 Kbytes), leaving the initial 4 Kbytes with an invalid address mapping. Processes that reference this invalid memory (for example by de-referencing a null pointer) generate a translation exception as described in the Section called Exception interface.
Although applications may control their memory assignments, the typical arrangement follows the diagram above. When applications let the system choose addresses for dynamic segments (including shared object segments), the system will prefer addresses in the upper half of the address space (for a 42–bit address space this means addresses above 1 TByte).
The Section called Process initialization describes the initial stack contents. Stack addresses can change from one system to the next – even from one process execution to the next on a single system. A program, therefore, should not depend on finding its stack at a particular virtual address.
A tunable configuration parameter controls the system maximum stack size. A process can also use setrlimit to set its own maximum stack size, up to the system limit. The stack segment is both readable and writable.
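As a hedged sketch of the setrlimit usage mentioned above, the following helper changes the soft stack limit and caps the request at the hard limit, which is taken here to correspond to the system limit. The function name `set_stack_limit` is hypothetical.

```c
#include <sys/resource.h>

/*
 * Set the soft stack limit to `bytes`, capped at the hard limit.
 * Returns 0 on success and -1 on failure.
 */
static int set_stack_limit(rlim_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        bytes = rl.rlim_max;   /* a process cannot exceed the hard limit */
    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_STACK, &rl);
}
```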
Operating system facilities, such as mmap, allow a process to establish address mappings in two ways. Firstly, the program can let the system choose an address. Secondly, the program can request the system to use an address the program supplies. The second alternative can cause application portability problems because the requested address might not always be available. Differences in virtual address space can be particularly troublesome between different architectures, but the same problems can arise within a single architecture.
Processes' address spaces typically have three segments that can change size from one execution to the next: the stack (through setrlimit); the data segment (through malloc); and the dynamic segment area (through mmap). Changes in one area may affect the virtual addresses available for another. Consequently an address that is available in one process execution might not be available in the next. Thus a program that used mmap to request a mapping at a specific address could appear to work in some environments and fail in others. For this reason programs that want to establish a mapping in their address space should let the system choose the address.
Despite these warnings about requesting specific addresses the facility can be used properly. For example, a multiprocess application might map several files into the address space of each process and build relative pointers among the files' data. This could be done by having each process ask for a certain amount of memory at an address chosen by the system. After each process receives its own private address from the system it would map the desired files into memory at specific addresses within the original area. This collection of mappings could be at different addresses in each process but their relative positions would be fixed. Without the ability to ask for specific addresses, the application could not build shared data structures because the relative positions for files in each process would be unpredictable.
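The technique in the preceding paragraph can be sketched with two mmap calls. The first lets the system choose the address of a reserved region; the second places a mapping at a fixed offset inside that region. To stay self-contained the sketch maps anonymous memory, whereas a real application would pass a file descriptor in the second call. `reserve_and_place` is a hypothetical name.

```c
/* Request BSD/POSIX declarations from glibc under strict -std= modes. */
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Reserve `total` bytes at a system-chosen address, then map `len` bytes
 * read/write at `offset` inside the reservation. Returns the base of the
 * reservation, or NULL on failure; *inner receives the fixed mapping.
 */
static void *reserve_and_place(size_t total, size_t offset, size_t len,
                               void **inner)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || offset % (size_t)page != 0 || offset + len > total)
        return NULL;

    /* Step 1: the system chooses the address; nothing is accessible yet. */
    void *base = mmap(NULL, total, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Step 2: request a specific address, but only inside our own region. */
    void *got = mmap((char *)base + offset, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (got == MAP_FAILED) {
        munmap(base, total);
        return NULL;
    }
    *inner = got;
    return base;
}
```

Because the fixed address always lies within a region that the process already owns, the request cannot collide with an unrelated mapping, and the offset of the inner mapping is the same in every process.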
Two execution modes exist in z/Architecture: problem (user) state and supervisor state. Processes run in problem state (the less privileged). The operating system kernel runs in supervisor state. A program executes a supervisor call (svc) instruction to change execution modes.
Note that the ABI does not define the implementation of individual system calls. Instead programs shall use the system libraries. Programs with embedded system call or trap instructions do not conform to the ABI.
The z/Architecture exception mechanism allows the processor to change to supervisor state as a result of six different causes: system calls, I/O interrupts, external interrupts, machine checks, restart interruptions or program checks (unusual conditions arising in the execution of instructions).
When exceptions occur:
information (such as the address of the next instruction to be executed after control is returned to the original program) is saved,
program control passes from user to supervisor level, and
software continues execution at an address (the exception vector) predetermined for each exception.
Exceptions may be synchronous or asynchronous. Synchronous exceptions, being caused by instruction execution, can be explicitly generated by a process. The operating system handles an exception either by completing the faulting operation in a manner transparent to the application or by delivering a signal to the application. The correspondence between exceptions and signals is shown in Table 12.
Table 12. Exceptions and Signals
Exception Name | Signal | Examples |
|---|---|---|
Illegal instruction | SIGILL | Illegal or privileged instruction, Invalid instruction form, Optional, unimplemented instruction |
Storage access | SIGSEGV | Unmapped instruction or data location access, Storage protection violation |
Alignment | SIGBUS | Invalid data item alignment, Invalid memory access |
Breakpoint | SIGTRAP | Breakpoint program check |
Floating exception | SIGFPE | Floating point overflow or underflow, Floating point divide by zero, Floating point conversion overflow, Other enabled floating point exceptions |
The signals that an exception may give rise to are SIGILL, SIGSEGV, SIGBUS, SIGTRAP, and SIGFPE. If one of these signals is generated due to an exception when the signal is blocked, the behavior is undefined.
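A process that wants to observe one of these signals can install a handler with the POSIX `sigaction` call, as in the minimal sketch below. The handler only records the signal number, and the names `record_signal`, `watch_signal` and `last_signal` are illustrative. The sketch is meant to be exercised with `raise`; a handler that simply returns from a real synchronous fault would usually resume at the faulting instruction.

```c
/* Request POSIX declarations from glibc under strict -std= modes. */
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Number of the most recently delivered signal, or 0 if none. */
static volatile sig_atomic_t last_signal = 0;

static void record_signal(int signo)
{
    last_signal = signo;
}

/* Install record_signal for signo. Returns 0 on success, -1 on failure. */
static int watch_signal(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);   /* block no additional signals in the handler */
    return sigaction(signo, &sa, NULL);
}
```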
This section describes the machine state that exec creates for "infant" processes, including argument passing, register usage, and stack frame layout. Programming language systems use this initial program state to establish a standard environment for their application programs. For example, a C program begins executing at a function named main, conventionally declared in the way described in Figure 21:
Briefly, argc is a non-negative argument count; argv is an array of argument strings, with argv[argc] == 0, and envp is an array of environment strings, also terminated by a NULL pointer.
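The conventional declaration is given as a comment in the sketch below, together with a small helper that relies on the NULL termination of both vectors. `vector_length` is a hypothetical name, not part of any standard interface.

```c
#include <stddef.h>

/*
 * Conventional form of the entry function:
 *
 *     int main(int argc, char *argv[], char *envp[]);
 */

/* Count the entries of a NULL-terminated vector such as argv or envp. */
static size_t vector_length(char *const vec[])
{
    size_t n = 0;

    while (vec[n] != NULL)
        n++;
    return n;
}
```

Applied to argv, the helper would be expected to return argc, because argv[argc] is the terminating null pointer.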
Although this section does not describe C program initialization, it gives the information necessary to implement the call to main or to the entry point for a program in any other language.
When a process is first entered (from an exec system call), the contents of registers other than those listed below are unspecified. Consequently, a program that requires registers to have specific values must set them explicitly during process initialization. It should not rely on the operating system to set all registers to 0. Following are the registers whose contents are specified:
Table 13.
r15 | The initial stack pointer, aligned to an 8-byte boundary and pointing to a stack location that contains the argument count (see the Section called Process stack for further information about the initial stack layout) |
fpc | The floating point control register contains 0, specifying "round to nearest" mode and the disabling of floating-point exceptions |
Every process has a stack, but the system defines no fixed stack address. Furthermore, a program's stack address can change from one system to another – even from one process invocation to another. Thus the process initialization code must use the stack address in general purpose register r15. Data in the stack segment at addresses below the stack pointer contain undefined values.
Whereas the argument and environment vectors transmit information from one application program to another, the auxiliary vector conveys information from the operating system to the program. This vector is an array of structures, which are defined in Figure 22.
The structures are interpreted according to the a_type member, as shown in Table 14.
Table 14. Auxiliary Vector Types, a_type
Name | Value | a_un |
|---|---|---|
AT_NULL | 0 | ignored |
AT_IGNORE | 1 | ignored |
AT_EXECFD | 2 | a_val |
AT_PHDR | 3 | a_ptr |
AT_PHENT | 4 | a_val |
AT_PHNUM | 5 | a_val |
AT_PAGESZ | 6 | a_val |
AT_BASE | 7 | a_ptr |
AT_FLAGS | 8 | a_val |
AT_ENTRY | 9 | a_ptr |
AT_NOTELF | 10 | a_val |
AT_UID | 11 | a_val |
AT_EUID | 12 | a_val |
AT_GID | 13 | a_val |
AT_EGID | 14 | a_val |
a_type auxiliary vector types are described in 'Auxiliary Vector Types Description' below.
Auxiliary Vector Types Description
AT_NULL: The auxiliary vector has no fixed length, so an entry of this type is used to denote the end of the vector. The corresponding value of a_un is undefined.
AT_IGNORE: This type indicates the entry has no meaning. The corresponding value of a_un is undefined.
AT_EXECFD: exec may pass control to an interpreter program. When this happens, the system places either an entry of type AT_EXECFD or one of type AT_PHDR in the auxiliary vector. The a_val field in the AT_EXECFD entry contains a file descriptor for the application program's object file.
AT_PHDR: Under some conditions, the system creates the memory image of the application program before passing control to an interpreter program. When this happens, the a_ptr field of the AT_PHDR entry tells the interpreter where to find the program header table in the memory image. If the AT_PHDR entry is present, entries of types AT_PHENT, AT_PHNUM and AT_ENTRY must also be present. See the chapter called Program loading and dynamic linking for more information about the program header table.
AT_PHENT: The a_val field of this entry holds the size, in bytes, of one entry in the program header table at which the AT_PHDR entry points.
AT_PHNUM: The a_val field of this entry holds the number of entries in the program header table at which the AT_PHDR entry points.
AT_PAGESZ: If present, this entry's a_val field gives the system page size in bytes. The same information is also available through sysconf.
AT_BASE: The a_ptr member of this entry holds the base address at which the interpreter program was loaded into memory.
AT_FLAGS: If present, the a_val field of this entry holds 1-bit flags. Undefined bits are set to zero.
AT_ENTRY: The a_ptr field of this entry holds the entry point of the application program to which the interpreter program should transfer control.
AT_NOTELF: The a_val field of this entry is non-zero if the program is in a format other than ELF, for example the old COFF format.
AT_UID: The a_val field of this entry holds the real user id of the process.
AT_EUID: The a_val field of this entry holds the effective user id of the process.
AT_GID: The a_val field of this entry holds the real group id of the process.
AT_EGID: The a_val field of this entry holds the effective group id of the process.
Other auxiliary vector types are reserved. No flags are currently defined for AT_FLAGS on the zSeries architecture.
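The following sketch shows one plausible shape for an auxiliary vector entry and a lookup that stops at the AT_NULL terminator. The structure layout is an assumption that includes only the a_val and a_ptr members named in Table 14. The constants are a subset of that table, and `find_auxv` is a hypothetical helper.

```c
#include <stddef.h>

/* Entry types, with values taken from Table 14 (subset). */
enum { AT_NULL = 0, AT_IGNORE = 1, AT_PHDR = 3, AT_PAGESZ = 6, AT_ENTRY = 9 };

/* Assumed layout of one auxiliary vector entry. */
typedef struct {
    long a_type;
    union {
        long  a_val;
        void *a_ptr;
    } a_un;
} auxv_t;

/* Return the first entry of the given type, or NULL if there is none. */
static const auxv_t *find_auxv(const auxv_t *av, long type)
{
    for (; av->a_type != AT_NULL; av++)
        if (av->a_type == type)
            return av;
    return NULL;
}
```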
When a process receives control, its stack holds the arguments, environment, and auxiliary vector from exec. Argument strings, environment strings, and the auxiliary information appear in no specific order within the information block; the system makes no guarantees about their relative arrangement. The system may also leave an unspecified amount of memory between the null auxiliary vector entry and the beginning of the information block. A sample initial stack is shown in Figure 23.
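Assuming the conventional ordering of the initial stack (argument count, argument pointers, a zero word, environment pointers, a zero word, then the auxiliary vector), start-up code could locate the auxiliary vector as sketched below. The stack is modelled as an array of doublewords, and `locate_auxv` is a hypothetical name.

```c
/*
 * Given the initial stack pointer, return the address of the first
 * auxiliary vector entry. sp[0] holds argc; the layout that follows
 * is the assumption stated in the lead-in.
 */
static const unsigned long *locate_auxv(const unsigned long *sp)
{
    unsigned long argc = sp[0];
    const unsigned long *p = sp + 1 + argc + 1;   /* skip argv[] and its zero word */

    while (*p != 0)                               /* skip envp[] */
        p++;
    return p + 1;                                 /* step over envp's zero word */
}
```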
This section describes example code sequences for fundamental operations such as calling functions, accessing static objects, and transferring control from one part of a program to another. Previous sections discussed how a program may use the machine or the operating system, and they specified what a program may and may not assume about the execution environment. Unlike previous material, the information in this section illustrates how operations may be done, not how they must be done.
As before, examples use the ANSI C language. Other programming languages may use the same conventions displayed below, but failure to do so does not prevent a program from conforming to the ABI. Two main object code models are available:
Absolute code: Instructions can hold absolute addresses under this model. To execute properly, the program must be loaded at a specific virtual address, making the program's absolute addresses coincide with the process' virtual addresses.
Position-independent code: Instructions under this model hold relative addresses, not absolute addresses. Consequently, the code is not tied to a specific load address, allowing it to execute properly at various positions in virtual memory.
The following sections describe the differences between these models. When different, code sequences for the models appear together for easier comparison.
Note: The examples below show code fragments with various simplifications. They are intended to explain addressing modes, not to show optimal code sequences or to reproduce compiler output.
When the system creates a process image, the executable file portion of the process has fixed addresses and the system chooses shared object library virtual addresses to avoid conflicts with other segments in the process. To maximize text sharing, shared objects conventionally use position-independent code, in which instructions contain no absolute addresses. Shared object text segments can be loaded at various virtual addresses without having to change the segment images. Thus multiple processes can share a single shared object text segment, even if the segment resides at a different virtual address in each process.
Position-independent code relies on two techniques:
Control transfer instructions hold addresses relative to the Current Instruction Address (CIA), or use registers that hold the transfer address. A CIA-relative branch computes its destination address in terms of the CIA, not relative to any absolute address.
When the program requires an absolute address, it computes the desired value. Instead of embedding absolute addresses in instructions (in the text segment), the compiler generates code to calculate an absolute address (in a register or in the stack or data segment) during execution.
Because z/Architecture provides CIA-relative branch instructions and also branch instructions using registers that hold the transfer address, compilers can satisfy the first condition easily.
A Global Offset Table (GOT) provides information for address calculation. Position-independent object files (executable and shared object files) have a table in their data segment that holds addresses. When the system creates the memory image for an object file, the table entries are relocated to reflect the absolute virtual addresses assigned for an individual process. Because data segments are private for each process, the table entries can change – unlike text segments, which multiple processes share.
Two position-independent models give programs a choice between more efficient code with some size restrictions and less efficient code without those restrictions. Because of the processor architecture, a GOT with no more than 512 entries (4096 bytes) is more efficient than a larger one. Programs that need more entries must use the larger, more general code. In the following sections, the term "small model position-independent code" is used to refer to code that assumes the smaller GOT, and "large model position-independent code" is used to refer to the general code.
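The size limit above can be checked with a little arithmetic. The following is a minimal sketch in C, assuming that the 4096-byte bound corresponds to what a 12-bit unsigned displacement can address (the 12-bit displacement field is described later in this document; tying it to the small model limit is an assumption of this sketch). The names got_entry_offset and fits_small_model are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* One GOT entry holds one 64-bit address. */
enum { GOT_ENTRY_SIZE = 8 };
/* Largest byte offset a 12-bit unsigned displacement can encode (assumed
 * here to be the reason for the 4096-byte small model limit). */
enum { MAX_DISP12 = 4095 };

/* Byte offset of GOT entry `index` from the start of the table. */
static size_t got_entry_offset(size_t index)
{
    return index * GOT_ENTRY_SIZE;
}

/* True if the entry lies within the first 4096 bytes of the GOT,
 * that is, if small model code could address it. */
static bool fits_small_model(size_t index)
{
    return got_entry_offset(index) <= MAX_DISP12;
}
```

Entry 511 starts at byte 4088 and is the last one that fits; entry 512 would start at byte 4096 and needs the large model.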
This section describes the prolog and epilog code of functions. A function's prolog establishes a stack frame, if necessary, and may save any nonvolatile registers it uses. A function's epilog generally restores registers that were saved in the prolog code, restores the previous stack frame, and returns to the caller.
The prolog of a function has to save the state of the calling function and set up the base register for the code of the function body. The following is in general done by the function prolog:
Save all registers used within the function which the calling function assumes to be non-volatile.
Set up the base register for the literal pool.
Allocate stack space by decrementing the stack pointer.
Set up the dynamic chain by storing the old stack pointer value at stack location zero if the "back chain" is implemented.
Set up the GOT pointer if the compiler is generating position independent code.
(A function that is position independent will probably want to load a pointer to the GOT into a nonvolatile register. This may be omitted if the function makes no external data references. If external data references are only made within conditional code, loading the GOT pointer may be deferred until it is known to be needed.)
Set up the frame pointer if the function allocates stack space dynamically (with alloca).
The compiler tries to do as little as possible of the above; the ideal case is to do nothing at all (for a leaf function without symbolic references).
The epilog of a function restores the registers saved in the prolog (which include the stack pointer) and branches to the return address.
.section .rodata
.align 2
.LC0:
.string "hello, world\n"
.text
.align 4
.globl main
.type main,@function
main:
# Prolog
STMG 11,15,88(15) # Save caller's registers
LARL 13,.LT0_0 # Load literal pool pointer
.section .rodata # Switch for literal pool
.align 2 # to read-only data section
.LT0_0:
.LC2:
.quad 65536
.LTN0_0:
.text # Back to text section
LGR 1,15 # Load stack pointer in GPR 1
AGHI 15,-160 # Allocate stack space
STG 1,0(15) # Store backchain
# Prolog end
LARL 2,.LC0
LG 3,.LC2-.LT0_0(13)
BRASL 14,printf
LGHI 2,0
# Epilog
LG 4,272(15) # Load return address
LMG 11,15,248(15) # Restore registers
BR 4 # Branch back to caller
# Epilog end
Figure 24. Prolog and epilog example
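The offsets used in Figure 24 follow a simple pattern, which the sketch below reproduces in C. It assumes, based only on the figure, that register n is saved at offset 8 * n from the caller's stack pointer and that the function allocates a 160-byte frame; the function names are hypothetical.

```c
#include <stddef.h>

enum { SLOT_SIZE = 8, FRAME_SIZE = 160 };

/* Offset of the save slot for register `reg`, relative to the stack
 * pointer at function entry. Inferred from STMG 11,15,88(15): 88 = 8 * 11. */
static size_t save_slot_offset(unsigned reg)
{
    return (size_t)reg * SLOT_SIZE;
}

/* The same slot as seen from the new stack pointer after AGHI 15,-160. */
static size_t save_slot_offset_after_alloc(unsigned reg)
{
    return save_slot_offset(reg) + FRAME_SIZE;
}
```

This explains why the epilog reloads the return address (register 14) from 272(15) and restores registers 11 through 15 starting at 248(15).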
This section shows a way of providing profiling (entry counting) on zSeries systems. An ABI-conforming system is not required to provide profiling; however, if it does, this is one possible (not required) implementation.
If a function is to be profiled it has to call the _mcount routine after the function prolog. This routine has a special linkage. It gets an address in register 1 and returns without having changed any register. The address is a pointer to a word-aligned one-word static data area, initialized to zero, in which the _mcount routine is to maintain a count of the number of times the function is called.
For example, Figure 25 shows how the code after the function prolog may look.
STMG 7,15,56(15) # Save caller's registers
LGR 1,15 # Stack pointer
AGHI 15,-160 # Allocate new stack frame
STG 1,0(15) # Save backchain
LGR 11,15 # Local stack pointer
.data
.align 4
.LP0: .quad 0 # Profile counter
.text
# Function profiler
STG 14,8(15) # Preserve r14
LARL 1,.LP0 # Load address of profile counter
BRASL 14,_mcount # Branch to _mcount
LG 14,8(15) # Restore r14
Figure 25. Code for profiling
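The effect of the sequence in Figure 25 can be modelled in C. This is only a behavioral sketch: the special linkage of _mcount (address in register 1, no registers changed) cannot be expressed in portable C, and mcount_sketch, square and square_calls are hypothetical names.

```c
/* Hypothetical stand-in for _mcount: it receives the address of the
 * per-function counter and increments it. */
static void mcount_sketch(unsigned long *counter)
{
    ++*counter;
}

/* Static, zero-initialized counter, like the .LP0 area in Figure 25. */
static unsigned long square_calls;

/* A profiled function: the counter is updated right after the prolog. */
static long square(long x)
{
    mcount_sketch(&square_calls);
    return x * x;
}
```

After two calls to square, square_calls holds 2.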
This section describes only objects with static storage duration. It excludes stack-resident objects because programs always compute their virtual addresses relative to the stack or frame pointers.
Because zSeries instructions cannot hold 64-bit addresses directly, a program has to build an address in a register and access memory through that register. In order to do so a function normally has a literal pool that holds the addresses of data objects used by the function. Register 13 is set up in the function prolog to point to the start of this literal pool.
Position-independent code cannot contain absolute addresses. In order to access a local symbol the literal pool contains the (signed) offset of the symbol relative to the start of the pool. Combining the offset loaded from the literal pool with the address in register 13 gives the absolute address of the local symbol. In the case of a global symbol the address of the symbol has to be loaded from the Global Offset Table. The offset in the GOT can either be contained in the instruction itself or in the literal pool. See Figure 26 for an example.
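The two address computations described above can be sketched in C as follows. The sketch models the literal pool start with an ordinary object and the GOT with an array that is filled in statically; in a real process the dynamic linker writes the GOT entries. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static char pool_start;          /* stands in for the address held in register 13 */
static long local_object = 7;    /* a local symbol */
long global_object = 11;         /* a global symbol */

/* Stand-in GOT with one entry. */
static void *got_sketch[] = { &global_object };

/* Signed offset of a symbol relative to the pool start; this is the
 * value a position-independent literal pool would hold. */
static intptr_t offset_from_pool(const void *symbol)
{
    return (intptr_t)((uintptr_t)symbol - (uintptr_t)&pool_start);
}

/* Local symbol: pool base plus the offset loaded from the pool. */
static void *local_address(const void *pool_base, intptr_t offset)
{
    return (void *)((uintptr_t)pool_base + (uintptr_t)offset);
}

/* Global symbol: the address is loaded from the GOT. */
static void *global_address(void *const *got, size_t index)
{
    return got[index];
}
```

Neither lookup embeds an absolute address in the code itself; only the pool base and the GOT contents depend on where the module was loaded.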
Figure 26 through Figure 28 show sample assembly language equivalents to C language code for absolute and position-independent compilations. It is assumed that all shared objects are compiled as position-independent and only executable modules may have absolute addresses. The code in the figures contains many redundant operations as it is only intended to show how each C statement could have been compiled independently of its context. The function prolog is not shown, and it is assumed that it has loaded the address of the literal pool in register 13.
Figure 27. Small model position-independent addressing
Programs can use the z/Architecture BRASL instruction to make direct function calls. A BRASL instruction has a self-relative branch displacement that can reach 4 GBytes in either direction. To call functions beyond this limit (inter-module calls) load the address in a register and use the BASR instruction for the call. Register 14 is used as the first operand of BASR to hold the return address as shown in Figure 29.
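Whether a direct BRASL call can reach its target is a range check on the displacement. The sketch below assumes the displacement is a signed 32-bit count of halfwords, which matches the 4-GByte reach stated above; brasl_can_reach is a hypothetical helper, not part of any toolchain.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a branch located at `from` can reach `to` with a signed
 * 32-bit halfword displacement. */
static bool brasl_can_reach(uint64_t from, uint64_t to)
{
    /* Difference modulo 2^64, reinterpreted as signed (assumes the usual
     * two's complement conversion). */
    int64_t delta = (int64_t)(to - from);

    if (delta % 2 != 0)
        return false;            /* targets are halfword aligned */

    int64_t halfwords = delta / 2;
    return halfwords >= INT32_MIN && halfwords <= INT32_MAX;
}
```

When the check fails, the call has to go through a register with BASR as described above.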
The called function may be in the same module (executable or shared object) as the caller, or it may be in a different module. In the former case, if the called function is not in a shared object, the linkage editor resolves the symbol. In all other cases the linkage editor cannot directly resolve the symbol. Instead the linkage editor generates "glue" code and resolves the symbol to point to the glue code. The dynamic linker will provide the real address of the function in the Global Offset Table. The glue code loads this address and branches to the function itself. See the Section called Procedure Linkage Table in the chapter called Program loading and dynamic linking for more details.
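The role of the glue code can be pictured with a function pointer. In this sketch a single variable stands in for the GOT entry, and its static initializer stands in for the work of the dynamic linker; real glue code is generated by the linkage editor and is described in the Procedure Linkage Table section. The names are hypothetical.

```c
typedef int (*func_ptr)(int);

/* Hypothetical function that lives in another module. */
static int real_target(int x)
{
    return x + 1;
}

/* One GOT slot. The initializer plays the part of the dynamic linker,
 * which provides the real address of the function. */
static func_ptr got_slot = real_target;

/* Glue: load the address from the GOT slot and branch to it. The caller
 * only ever references `glue`, never `real_target` directly. */
static int glue(int x)
{
    return got_slot(x);
}
```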
Programs use branch instructions to control their execution flow. z/Architecture has a variety of branch instructions. The most commonly used of these performs a self-relative jump with a 128-Kbyte range (up to 64 Kbytes in either direction). For large functions another self-relative jump is available with a range of 8 Gbytes (up to 4 Gbytes in either direction).
C language switch statements provide multi-way selection. When the case labels of a switch statement satisfy grouping constraints the compiler implements the selection with an address table. The following examples use several simplifying conventions to hide irrelevant details:
The selection expression resides in register 2.
The case label constants begin at zero.
The case labels, the default, and the address table use assembly names .Lcasei, .Ldef and .Ltab respectively.
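The address-table technique can be illustrated in C with a table of function pointers. This is a sketch of the idea only, under the conventions listed above (selector starts at zero, out-of-range values go to the default); it is not what a compiler emits, and all names are hypothetical.

```c
#include <stddef.h>

typedef int (*case_handler)(void);

static int case0(void) { return 10; }
static int case1(void) { return 20; }
static int case2(void) { return 30; }
static int case_default(void) { return -1; }

/* Plays the role of .Ltab: one entry per case label, starting at zero. */
static const case_handler case_table[] = { case0, case1, case2 };

/* Range-check the selector, then branch through the table. */
static int dispatch(unsigned long selector)
{
    size_t count = sizeof case_table / sizeof case_table[0];

    if (selector >= count)
        return case_default();   /* corresponds to .Ldef */
    return case_table[selector]();
}
```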
The GNU C compiler, and most recent compilers, support dynamic stack space allocation via alloca.
Figure 35 shows the stack frame before and after dynamic stack allocation. The local variables area is used for storage of function data, such as local variables, whose sizes are known to the compiler. This area is allocated at function entry and does not change in size or position during the function's activation.
The parameter list area holds "overflow" arguments passed in calls to other functions. (See the OTHER label in the Section called Parameter passing.) Its size is also known to the compiler and can be allocated along with the fixed frame area at function entry. However, the standard calling sequence requires that the parameter list area begin at a fixed offset (160) from the stack pointer, so this area must move when dynamic stack allocation occurs.
Data in the parameter list area are naturally addressed at constant offsets from the stack pointer. However, in the presence of dynamic stack allocation, the offsets from the stack pointer to the data in the local variables area are not constant. To provide addressability a frame pointer is established to locate the local variables area consistently throughout the function's activation.
Dynamic stack allocation is accomplished by "opening" the stack just above the parameter list area. The following steps show the process in detail:
After a new stack frame is acquired, and before the first dynamic space allocation, a new register, the frame pointer or FP, is set to the value of the stack pointer. The frame pointer is used for references to the function's local, non-static variables. The frame pointer does not change during the execution of a function, even though the stack pointer may change as a result of dynamic allocation.
The amount of dynamic space to be allocated is rounded up to a multiple of 8 bytes, so that 8-byte stack alignment is maintained.
The stack pointer is decreased by the rounded byte count, and the address of the previous stack frame (the back chain) may be stored at the word addressed by the new stack pointer. The back chain is not needed to release this allocation at the end of the function, because the frame pointer can be used to restore the stack pointer.
Figure 35 is a snapshot of the stack layout after the prolog code has dynamically extended the stack frame.
The above process can be repeated as many times as desired within a single function activation. When it is time to return, the stack pointer is set to the value of the back chain, thereby removing all dynamically allocated stack space along with the rest of the stack frame. Naturally, a program must not reference the dynamically allocated stack area after it has been freed.
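The rounding and stack pointer adjustment in the steps above amount to the following arithmetic, shown here as a C sketch that treats the stack pointer as a plain integer. The function names are hypothetical.

```c
#include <stdint.h>

/* Round a byte count up to the next multiple of 8. */
static uint64_t round_up_8(uint64_t nbytes)
{
    return (nbytes + 7) & ~(uint64_t)7;
}

/* New stack pointer after dynamically allocating `nbytes`:
 * the stack grows downward, so the rounded count is subtracted. */
static uint64_t sp_after_alloc(uint64_t sp, uint64_t nbytes)
{
    return sp - round_up_8(nbytes);
}
```

Repeated allocations simply apply sp_after_alloc again to the previous result, while the frame pointer keeps the value the stack pointer had before the first allocation.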
Even in the presence of signals, the above dynamic allocation scheme is "safe." If a signal interrupts allocation, one of three things can happen:
The signal handler can return. The process then resumes the dynamic allocation from the point of interruption.
The signal handler can execute a non-local goto or a jump. This resets the process to a new context in a previous stack frame, automatically discarding the dynamic allocation.
The process can terminate.
Regardless of when the signal arrives during dynamic allocation, the result is a consistent (though possibly dead) process.
This section defines the "Debug with Arbitrary Record Format" (DWARF) debugging format for the zSeries processor family. The zSeries ABI does not define a debug format. However, all systems that do implement DWARF shall use the following definitions.
DWARF is a specification developed for symbolic source-level debugging. The debugging information format does not favor the design of any compiler or debugger.
The DWARF definition requires some machine-specific definitions. The register number mapping is specified for the zSeries processors in Table 24.
This section describes the Executable and Linking Format (ELF).
For file identification in e_ident the zSeries processor family requires the values shown in Table 1.
Table 1. ELF identification values
Position | Value | Comments |
e_ident[EI_CLASS] | ELFCLASS64 | For all 64-bit implementations |
e_ident[EI_DATA] | ELFDATA2MSB | For all Big-Endian implementations |
The ELF header's e_flags field holds bit flags associated with the file. Since the zSeries processor family defines no flags, this member contains zero.
Processor identification resides in the ELF header's e_machine field and must have the value 22, defined as the name EM_S390.
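A loader-style check of these header fields might look like the sketch below. To stay self-contained it defines its own constants rather than including a system header; the numeric values are the standard ELF ones (EI_CLASS = 4, EI_DATA = 5, ELFCLASS64 = 2, ELFDATA2MSB = 2, EM_S390 = 22), and the field offsets are those of the 64-bit ELF header. The SK_ names, the function name and the sample buffers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum {
    SK_EI_CLASS = 4, SK_EI_DATA = 5,
    SK_ELFCLASS64 = 2, SK_ELFDATA2MSB = 2,
    SK_EM_S390 = 22,
    SK_E_MACHINE_OFFSET = 18,   /* 2-byte e_machine field */
    SK_E_FLAGS_OFFSET = 48,     /* 4-byte e_flags field */
    SK_EHDR64_SIZE = 64
};

/* True if the buffer starts with an ELF header carrying the
 * identification, machine and flag values described above. */
static bool looks_like_zseries_elf(const unsigned char *h, size_t len)
{
    if (len < SK_EHDR64_SIZE)
        return false;
    if (h[0] != 0x7f || h[1] != 'E' || h[2] != 'L' || h[3] != 'F')
        return false;
    if (h[SK_EI_CLASS] != SK_ELFCLASS64 || h[SK_EI_DATA] != SK_ELFDATA2MSB)
        return false;

    /* e_machine is stored big-endian, like every other header field. */
    unsigned machine = ((unsigned)h[SK_E_MACHINE_OFFSET] << 8)
                     | h[SK_E_MACHINE_OFFSET + 1];
    if (machine != SK_EM_S390)
        return false;

    /* e_flags must be zero because no flags are defined. */
    for (size_t i = 0; i < 4; i++)
        if (h[SK_E_FLAGS_OFFSET + i] != 0)
            return false;
    return true;
}

/* Minimal sample headers for illustration; unlisted bytes are zero. */
static const unsigned char sample_header[64] = {
    0x7f, 'E', 'L', 'F', 2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 2,    /* e_type */
    0, 22    /* e_machine */
};
static const unsigned char little_endian_header[64] = {
    0x7f, 'E', 'L', 'F', 2, 1, 1
};
```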
Various sections hold program and control information. The sections listed in Table 2 are used by the system and have the types and attributes shown.
Table 2. Special Sections
Name | Type | Attributes |
.got | SHT_PROGBITS | SHF_ALLOC + SHF_WRITE |
.plt | SHT_PROGBITS | SHF_ALLOC + SHF_WRITE + SHF_EXECINSTR |
Special sections are described in Table 3.
Table 3. Special Sections Description
Name | Description |
.got | This section holds the Global Offset Table, or GOT. See the Section called Coding examples in the chapter called Low-level system information and the Section called Global Offset Table in the chapter called Program loading and dynamic linking for more information. |
.plt | This section holds the Procedure Linkage Table, or PLT. See the Section called Procedure Linkage Table in the chapter called Program loading and dynamic linking for more information. |
If an executable file contains a reference to a function defined in one of its associated shared objects, the symbol table section for the file will contain an entry for that symbol. The st_shndx field of that symbol table entry contains SHN_UNDEF. This informs the dynamic linker that the symbol definition for that function is not contained in the executable file itself. If that symbol has been allocated a Procedure Linkage Table entry in the executable file, and the st_value field for that symbol table entry is nonzero, the value is the virtual address of the first instruction of that PLT entry. Otherwise the st_value field contains zero. This PLT entry address is used by the dynamic linker in resolving references to the address of the function. See the Section called Function Addresses in the chapter called Program loading and dynamic linking for details.
Relocation entries describe how to alter the instruction and data relocation fields shown in Figure 1 (bit numbers appear in the lower box corners; byte numbers appear in the upper left box corners).
This specifies a 64-bit field occupying 8 bytes, the alignment of which is 4 bytes unless otherwise specified.
This specifies a 32-bit field occupying 4 bytes, the alignment of which is 4 bytes unless otherwise specified.
This specifies a 32-bit field occupying 4 bytes with 2-byte alignment. The signed value in this field is shifted to the left by 1 before it is used as a program counter relative displacement (for example, the immediate field of a "Load Address Relative Long" instruction).
This specifies a 16-bit field occupying 2 bytes with 2-byte alignment (for example, the immediate field of an "Add Halfword Immediate" instruction).
This specifies a 16-bit field occupying 2 bytes with 2-byte alignment. The signed value in this field is shifted to the left by 1 before it is used as a program counter relative displacement (for example, the immediate field of a "Branch Relative" instruction).
This specifies a 12-bit field contained within a halfword with a 2-byte alignment. The 12 bit unsigned value is the displacement of a memory reference.
This specifies an 8-bit field with a 1-byte alignment.
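The shifted program counter relative fields can be illustrated with the 16-bit variant. The sketch below stores a byte displacement as a signed halfword count in a 2-byte big-endian field and reads it back; the function names are hypothetical and the code is not taken from any linker.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode a byte displacement into a 16-bit PC-relative field.
 * Returns false if the displacement is odd or out of range. */
static bool encode_pc16(int64_t disp, unsigned char field[2])
{
    if (disp % 2 != 0)
        return false;
    int64_t halfwords = disp / 2;          /* the field counts halfwords */
    if (halfwords < INT16_MIN || halfwords > INT16_MAX)
        return false;

    uint16_t raw = (uint16_t)halfwords;
    field[0] = (unsigned char)(raw >> 8);  /* most significant byte first */
    field[1] = (unsigned char)(raw & 0xff);
    return true;
}

/* Decode the field: sign-extend, then shift left by 1. */
static int64_t decode_pc16(const unsigned char field[2])
{
    uint16_t raw = (uint16_t)((field[0] << 8) | field[1]);
    int64_t halfwords = raw >= 0x8000 ? (int64_t)raw - 0x10000 : (int64_t)raw;
    return halfwords * 2;
}

/* Convenience check used to exercise both directions. */
static bool pc16_roundtrips(int64_t disp)
{
    unsigned char field[2];
    return encode_pc16(disp, field) && decode_pc16(field) == disp;
}
```

The 32-bit variant works the same way with a 4-byte field, which is what gives it its much larger reach.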
Calculations in Table 4 assume the actions are transforming a relocatable file into either an executable or a shared object file. Conceptually, the linkage editor merges one or more relocatable files to form the output. It first determines how to combine and locate the input files, next it updates the symbol values, and then it performs relocations.
Relocations applied to executable or shared object files are similar and accomplish the same result. The following notations are used in Table 4:
A: Represents the addend used to compute the value of the relocatable field.
B: Represents the base address at which a shared object has been loaded into memory during execution. Generally, a shared object file is built with a 0 base virtual address, but the execution address will be different.
G: Represents the section offset or address of the Global Offset Table. See the Section called Coding examples in the chapter called Low-level system information and the Section called Global Offset Table in the chapter called Program loading and dynamic linking for more information.
L: Represents the section offset or address of the Procedure Linkage Table entry for a symbol. A PLT entry redirects a function call to the proper destination. The linkage editor builds the initial PLT. See the Section called Procedure Linkage Table in the chapter called Program loading and dynamic linking for more information.
O: Represents the offset into the GOT at which the address of the relocation entry's symbol will reside during execution. See the Section called Coding examples in the chapter called Low-level system information and the Section called Global Offset Table in the chapter called Program loading and dynamic linking for more information.
P: Represents the place (section offset or address) of the storage unit being relocated (computed using r_offset).
R: Represents the offset of the symbol within the section in which the symbol is defined (its section-relative address).
S: Represents the value of the symbol whose index resides in the relocation entry.
Relocation entries apply to bytes, halfwords or words. In each case, the r_offset value designates the offset or virtual address of the first byte of the affected storage unit. The relocation type specifies which bits to change and how to calculate their values. The zSeries family uses only the Elf64_Rela relocation entries with explicit addends. For the relocation entries, the r_addend field serves as the relocation addend. In all cases, the offset, addend, and the computed result use the byte order specified in the ELF header.
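The shape of a relocation entry with an explicit addend can be sketched in C. The struct below mirrors the three fields of Elf64_Rela under a hypothetical name, and the accessors follow the generic 64-bit ELF convention of keeping the symbol index in the upper 32 bits of r_info and the type in the lower 32 bits. The sample entry uses arbitrary values chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical mirror of Elf64_Rela. */
struct rela_sketch {
    uint64_t r_offset;   /* where to apply the relocation */
    uint64_t r_info;     /* symbol index and relocation type */
    int64_t  r_addend;   /* explicit addend */
};

/* Symbol index: upper 32 bits of r_info. */
static uint32_t rela_sym(const struct rela_sketch *r)
{
    return (uint32_t)(r->r_info >> 32);
}

/* Relocation type: lower 32 bits of r_info. */
static uint32_t rela_type(const struct rela_sketch *r)
{
    return (uint32_t)(r->r_info & 0xffffffffu);
}

/* Symbol value plus addend; unsigned arithmetic wraps modulo 2^64. */
static uint64_t symbol_plus_addend(uint64_t symbol_value, int64_t addend)
{
    return symbol_value + (uint64_t)addend;
}

/* Arbitrary sample entry: symbol index 5, type 1, addend -8. */
static const struct rela_sketch sample_rela = {
    0x10, ((uint64_t)5 << 32) | 1, -8
};
```

With a symbol value of 0x1000, the sample entry yields 0xff8 as the value to store at r_offset.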
The following general rules apply to the interpretation of the relocation types in Table 4:
"+" and "-" denote 64-bit modulus addition and subtraction, respectively. ">>" denotes arithmetic right-shifting (shifting with sign copying) of the value of the left operand by the number of bits given by the right operand.
For relocation type half16, the upper 48 bits of the v