S/390 ELF Application Binary Interface Supplement

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

Linux is a trademark of Linus Torvalds.

LSB is a trademark of the Free Standards Group in the USA and other countries.


Table of Contents
Preface
Low-level system information
Machine interface
Processor architecture
Data representation
Function calling sequence
Registers
The stack frame
Parameter passing
Variable argument lists
Return values
Operating system interface
Virtual address space
Page size
Virtual address assignments
Managing the process stack
Coding guidelines
Processor execution modes
Exception interface
Process initialization
Registers
Process stack
Coding examples
Code model overview
Function prolog and epilog
Profiling
Data objects
Function calls
Branching
Dynamic stack space allocation
DWARF definition
Object files
ELF Header
Machine Information
Sections
Special Sections
Symbol Table
Relocation
Program loading and dynamic linking
Program Loading
Dynamic Linking
Dynamic Section
Global Offset Table
Function Addresses
Procedure Linkage Table
GNU Free Documentation License
PREAMBLE
APPLICABILITY AND DEFINITIONS
VERBATIM COPYING
COPYING IN QUANTITY
MODIFICATIONS
COMBINING DOCUMENTS
COLLECTIONS OF DOCUMENTS
AGGREGATION WITH INDEPENDENT WORKS
TRANSLATION
TERMINATION
FUTURE REVISIONS OF THIS LICENSE
How to use this License for your documents
Notices
Programming interface information
Trademarks
Index

Preface

This v1.02 edition, published on 18 November 2002, applies to version 2, release 2, modification 16 of the Linux kernel and to all subsequent releases and modifications until otherwise indicated in new editions. This edition replaces LNUX-1007-02 published in July 2001.


Low-level system information

Machine interface

This section describes the processor-specific information for the S/390 processors.


Processor architecture

[ESA/390 Principles of Operation] (SA22–7201) defines the ESA/390 architecture.

Programs intended to execute directly on the processor use the ESA/390 instruction set, and the instruction encoding and semantics of the architecture.

An application program can assume that all instructions defined by the architecture that are neither privileged nor optional exist and work as documented.

To be ABI-conforming the processor must implement the instructions of the architecture, perform the specified operations, and produce the expected results. The ABI neither places performance constraints on systems nor specifies what instructions must be implemented in hardware. A software emulation of the architecture could conform to the ABI.

There are some instructions in the ESA/390 architecture which are described as 'optional'. Linux for S/390 requires some of these to be available; in particular:

  • additional floating point facilities,

  • compare and move extended,

  • immediate and relative instructions,

  • string instructions.

The ABI guarantees that these instructions are present. In order to comply with the ABI the operating system must emulate these instructions on machines which do not support them in the hardware. Other instructions are not available in some current models; programs using these instructions do not conform to the S/390 ABI and executing them on machines without the extra capabilities will result in undefined behavior.

In the ESA/390 architecture a processor runs in big-endian mode. (See the Section called Byte ordering.)


Data representation

Byte ordering

The architecture defines an 8-bit byte, a 16-bit halfword, a 32-bit word and a 64-bit doubleword. Byte ordering defines how the bytes that make up halfwords, words and doublewords are ordered in memory. Most significant byte (MSB) ordering, or "Big-Endian" as it is sometimes called, means that the most significant byte of a structure is located in the lowest addressed byte position in a storage unit (byte 0).

Figure 1 to Figure 3 illustrate the conventions for bit and byte numbering within storage units of various widths. These conventions apply to both integer data and floating-point data, where the most significant byte of a floating-point value holds the sign and the exponent (or at least the start of the exponent). The figures show big-endian byte numbers in the upper left corners and bit numbers in the lower corners.

Figure 1. Bit and byte numbering in halfwords

Figure 2. Bit and byte numbering in words

Figure 3. Bit and byte numbering in doublewords
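The byte ordering described above can be illustrated with a short C sketch. The helper names store_be32 and load_be32 are hypothetical and not part of the ABI; they spell out the most-significant-byte-first layout with shifts, so they give the same result on any host regardless of its native byte order.

```c
#include <stdint.h>

/* Store a 32-bit word most significant byte first:
 * out[0] receives the most significant byte (byte 0 of the word). */
void store_be32(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)value;
}

/* Reassemble a word from four bytes laid out in big-endian order. */
uint32_t load_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

On an S/390 system a plain store of a word to memory already produces this layout; the explicit shifts are only needed in code that must also run on hosts with a different byte order.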


Fundamental types

Table 1 shows how ANSI C scalar types correspond to those of the S/390 processor. For all types a NULL pointer has the value zero (binary).

Table 1. Scalar types

Type            ANSI C                sizeof (bytes)  Alignment  type (S/390)
--------------  --------------------  --------------  ---------  -------------------------
Character       signed char           1               1          byte
                char
                unsigned char
Short           signed short          2               2          halfword
                short
                unsigned short
Integer         signed int            4               4          word
                int
                unsigned int
                enum
                signed long
                long
                unsigned long
Long long       signed long long      8               8          doubleword
                long long
                unsigned long long
Pointer         any-type *            4               4          unsigned word
                any-type (*) ()
Floating point  float                 4               4          single precision (IEEE)
                double                8               8          double precision (IEEE)
                long double¹          16              16         extended precision (IEEE)

¹Compilers and systems may implement the long double data type in some other way, for performance reasons, using a compiler option. Examples of such formats could be two successive doubles or even a single double. Such usage does not conform to this ABI however, and runs the risk of passing a wrongly formatted floating-point number to another function as an argument. Programs using other formats should transform long double floating-point numbers to a conforming format before passing them.


Aggregates and unions

Aggregates (structures and arrays) and unions assume the alignment of their most strictly aligned component, that is, the component with the largest alignment. The size of any object, including aggregates and unions, is always a multiple of the alignment of the object. An array uses the same alignment as its elements. Structure and union objects may require padding to meet size and alignment constraints:

  • An entire structure or union object is aligned on the same boundary as its most strictly aligned member.

  • Each member is assigned to the lowest available offset with the appropriate alignment. This may require internal padding, depending on the previous member.

  • If necessary, a structure's size is increased to make it a multiple of the structure's alignment. This may require tail padding if the last member does not end on the appropriate boundary.

In the following examples (Figure 4 to Figure 8), member byte offsets (for the big-endian implementation) appear in the upper left corners.

Table 2.

struct {
         char c;
};

Figure 4. Structure smaller than a word

Table 3.

struct {
         char c;
         char d;
         short s;
         long n;
};

Figure 5. No padding

Table 4.

struct {
         char c;
         short s;
};

Figure 6. Internal padding

Table 5.

struct {
         char c;
         double d;
         short s;
};

Figure 7. Internal and tail padding

Table 6.

union {
         char c;
         short s;
         int j;
};

Figure 8. Union padding
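The three layout rules for structures can be sketched as a small calculator. This is a minimal illustration, not part of the ABI: the type struct member and the functions align_up and struct_layout are hypothetical names, and the sizes and alignments fed into it are assumed to come from Table 1.

```c
#include <stddef.h>

/* Hypothetical description of one member: size and alignment in bytes,
 * taken from Table 1 (for example double: size 8, alignment 8). */
struct member { size_t size; size_t align; };

/* Round offset up to the next multiple of align (a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Compute member offsets, total size and alignment of a structure.
 * offsets[i] receives the byte offset of member i. */
size_t struct_layout(const struct member *m, size_t n,
                     size_t *offsets, size_t *align_out)
{
    size_t offset = 0, max_align = 1;

    for (size_t i = 0; i < n; i++) {
        offset = align_up(offset, m[i].align);   /* internal padding */
        offsets[i] = offset;
        offset += m[i].size;
        if (m[i].align > max_align)
            max_align = m[i].align;
    }
    *align_out = max_align;                      /* strictest member */
    return align_up(offset, max_align);          /* tail padding */
}
```

For the structure with members char, double and short (sizes 1, 8, 2 and alignments 1, 8, 2) the sketch yields offsets 0, 8 and 16, an alignment of 8 and a total size of 24 bytes, that is, both internal and tail padding.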


Bit-fields

C struct and union definitions may have "bit-fields," defining integral objects with a specified number of bits (see Table 7).

Table 7. Bit fields

Bit-field type       Width n   Range
-------------------  --------  -------------------
signed char          1 to 8    -2ⁿ⁻¹ to 2ⁿ⁻¹ - 1
char                           0 to 2ⁿ - 1
unsigned char                  0 to 2ⁿ - 1
signed short         1 to 16   -2ⁿ⁻¹ to 2ⁿ⁻¹ - 1
short                          0 to 2ⁿ - 1
unsigned short                 0 to 2ⁿ - 1
signed int           1 to 32   -2ⁿ⁻¹ to 2ⁿ⁻¹ - 1
int                            0 to 2ⁿ - 1
unsigned int                   0 to 2ⁿ - 1
enum                           0 to 2ⁿ - 1
signed long                    -2ⁿ⁻¹ to 2ⁿ⁻¹ - 1
long                           0 to 2ⁿ - 1
unsigned long                  0 to 2ⁿ - 1
signed long long     1 to 64   -2ⁿ⁻¹ to 2ⁿ⁻¹ - 1
long long                      0 to 2ⁿ - 1
unsigned long long             0 to 2ⁿ - 1

"Plain" bit-fields (that is, those neither signed nor unsigned) always have non-negative values. Although they may have type short, int or long (which can have negative values), bit-fields of these types have the same range as bit-fields of the same size with the corresponding unsigned type. Bit-fields obey the same size and alignment rules as other structure and union members, with the following additions:

  • Bit-fields are allocated from left to right (most to least significant).

  • A bit-field must entirely reside in a storage unit appropriate for its declared type. Thus, a bit-field never crosses its unit boundary.

  • Bit-fields must share a storage unit with other structure and union members (either bit-field or non-bit-field) if and only if there is sufficient space within the storage unit.

  • Unnamed bit-fields' types do not affect the alignment of a structure or union, although an individual bit-field's member offsets obey the alignment constraints. An unnamed, zero-width bit-field shall prevent any further member, bit-field or other, from residing in the storage unit corresponding to the type of the zero-width bit-field.

The following examples (Figure 9 through Figure 14) show structure and union member byte offsets in the upper left corners. Bit numbers appear in the lower corners.

Figure 9. Bit numbering

Figure 10. Left-to-right allocation

Figure 11. Boundary alignment

Figure 12. Storage unit sharing

Figure 13. Union allocation

Figure 14. Unnamed bit fields
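As a rough illustration of left-to-right allocation and of the rule that a bit-field never crosses its storage unit boundary, consider the following sketch. The functions place_bitfield and extract_field are hypothetical helpers, bit positions are counted from the most significant bit (bit 0), and unnamed zero-width bit-fields are not modeled.

```c
/* Return the bit offset (counted from the most significant bit of the
 * structure's first byte) at which a bit-field of `width` bits is placed,
 * given the next free bit `pos` and the size in bits of the storage unit
 * of the field's declared type (8, 16, 32 or 64). */
unsigned place_bitfield(unsigned pos, unsigned width, unsigned unit_bits)
{
    unsigned unit_start = pos - (pos % unit_bits);

    if (pos + width > unit_start + unit_bits)   /* would cross the boundary */
        return unit_start + unit_bits;          /* start in the next unit   */
    return pos;
}

/* Extract an unsigned field of `width` bits (1 to 32) that starts `bit`
 * bits from the left of a 32-bit storage unit. */
unsigned long extract_field(unsigned long unit, unsigned bit, unsigned width)
{
    unsigned shift = 32 - bit - width;
    unsigned long mask = (width == 32) ? 0xFFFFFFFFul : ((1ul << width) - 1);

    return (unit >> shift) & mask;
}
```

With these definitions a 5-bit int field that would start at bit 30 is moved to bit 32, the start of the next word, while a 13-bit short field starting at bit 3 still fits into its halfword and stays where it is.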


Function calling sequence

This section discusses the standard function calling sequence, including stack frame layout, register usage, and parameter passing.


Registers

The ABI makes the assumption that the processor has 16 general purpose registers and 16 IEEE floating point registers. S/390 processors have 16 general purpose registers; newer models have 16 IEEE floating point registers but older systems have only four non-IEEE floating point registers. On these older machines Linux for S/390 emulates 16 IEEE registers within the kernel. The width of the general purpose registers is 32 bits, and the width of the floating point registers is 64 bits. The use of the registers is described in the table below.

Table 8.

Register name          Usage                                                  Call effect
---------------------  -----------------------------------------------------  -----------
r0, r1                 General purpose                                        Volatile¹
r2, r3                 Parameter passing and return values                    Volatile
r4, r5                 Parameter passing                                      Volatile
r6                     Parameter passing                                      Saved²
r7 - r11               Local variables                                        Saved
r12                    Local variable, commonly used as GOT pointer           Saved
r13                    Local variable, commonly used as Literal Pool pointer  Saved
r14                    Return address                                         Volatile
r15                    Stack pointer                                          Saved
f0, f2                 Parameter passing and return values                    Volatile
f4, f6                 General purpose                                        Saved
f1, f3, f5, f7 - f15   General purpose                                        Volatile
Access register 0      Reserved for system use                                Volatile
Access registers 1-15  General purpose                                        Volatile

¹Volatile: These registers are not preserved across function calls.

²Saved: These registers belong to the calling function. A called function shall save these registers' values before it changes them, restoring their values before it returns.

  • Registers r6 through r13, r15, f4 and f6 are nonvolatile; that is, they "belong" to the calling function. A called function shall save these registers' values before it changes them, restoring their values before it returns.

  • Registers r0, r1, r2, r3, r4, r5, r14, f0, f1, f2, f3, f5, f7 through f15 are volatile; that is, they are not preserved across function calls.

  • Furthermore the values in registers r0 and r1 may be altered by the interface code in cross-module calls, so a function cannot depend on the values in these registers having the same values that were placed in them by the caller.

The following registers have assigned roles in the standard calling sequence:

Table 9.

r12

Global Offset Table pointer. If a position-independent module uses cross-linking, the compiler must point r12 to the GOT as described in the Section called Dynamic Linking in the chapter called Program loading and dynamic linking. If not, this register may be used locally.

r13

Commonly used as the Literal Pool pointer. If the Literal Pool is not required this register may be used locally.

r14

This register will contain the address to which a called function will normally return. r14 is volatile across function calls.

r15

The stack pointer (stored in r15) will maintain an 8-byte alignment. It will always point to the lowest allocated valid stack frame, and will grow towards low addresses. The contents of the word addressed by this register may point to the previously allocated stack frame. If required it can be decremented by the called function – see the Section called Dynamic stack space allocation.

Signals can interrupt processes. Functions called during signal handling have no unusual restrictions on their use of registers. Moreover, if a signal handling function returns, the process will resume its original execution path with all registers restored to their original values. Thus programs and compilers may freely use all registers listed above, except those reserved for system use, without the danger of signal handlers inadvertently changing their values.


Register usage

With these calling conventions the following usage of the registers for inline assemblies is recommended:

  • General registers r0 and r1 should be used internally whenever possible

  • General registers r2 to r5 should be second choice

  • General registers r12 to r15 should only be used for their standard function.


The stack frame

A function will be passed a frame on the runtime stack by the function which called it, and may allocate a new stack frame. A new stack frame is required if the called function will in turn call further functions (which must be passed the address of the new frame). This stack grows downwards from high addresses. Figure 15 shows the stack frame organization. SP in the figure denotes the stack pointer (general purpose register r15) passed to the called function on entry. Maintenance of the back chain pointers is not a requirement of the ABI, but the storage area for these pointers must be allocated whether used or not.

Figure 15. Standard stack frame

The format of the register save area created by the gcc compiler is:

Figure 16. Register save area

The following requirements apply to the stack frame:

  • The stack pointer shall maintain 8-byte alignment.

  • The stack pointer points to the first word of the lowest allocated stack frame. If the "back chain" is implemented this word will point to the previously allocated stack frame (towards higher addresses), except for the first stack frame, which shall have a back chain of zero (NULL). The stack shall grow downwards, in other words towards lower addresses.

  • The called function may create a new stack frame by decrementing the stack pointer by the size of the new frame. This is required if this function calls further functions. The stack pointer must be restored prior to return.

  • The parameter list area shall be allocated by the caller and shall be large enough to contain the arguments that the caller stores in it. Its contents are not preserved across calls.

  • Other areas depend on the compiler and the code being compiled. The standard calling sequence does not define a maximum stack frame size.

The stack space for the register save area and back chain must be allocated by the caller. The size of these is 96 bytes.

Except for the stack frame header and any padding necessary to make the entire frame a multiple of 8 bytes in length, a function need not allocate space for the areas that it does not use. If a function does not call any other functions and does not require any of the other parts of the stack frame, it need not establish a stack frame. Any padding of the frame as a whole shall be within the local variable area; the parameter list area shall immediately follow the stack frame header, and the register save areas shall contain no padding.
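The size requirements above can be summarized in a short sketch. The function frame_size and the two macro names are hypothetical; the sketch assumes only what is stated in this section, namely a 96-byte header (back chain and register save area) and a total frame length that is a multiple of 8 bytes.

```c
#include <stddef.h>

#define FRAME_HEADER_BYTES 96u  /* back chain plus register save area */
#define STACK_ALIGN        8u   /* the stack pointer stays 8-byte aligned */

/* Size of a new frame holding `param_bytes` of outgoing parameter list
 * area and `local_bytes` of local variables. Any padding is counted as
 * part of the local variable area. */
size_t frame_size(size_t param_bytes, size_t local_bytes)
{
    size_t raw = FRAME_HEADER_BYTES + param_bytes + local_bytes;

    return (raw + STACK_ALIGN - 1) & ~(size_t)(STACK_ALIGN - 1);
}
```

A function that calls others but needs neither stack parameters nor locals would, under these assumptions, decrement the stack pointer by exactly 96 bytes.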


Parameter passing

Arguments to called functions are passed in registers. Since all computations must be performed in registers, memory traffic can be eliminated if the caller can compute arguments into registers and pass them in the same registers to the called function, where the called function can then use these arguments for further computation in the same registers. The number of registers implemented in a processor architecture naturally limits the number of arguments that can be passed in this manner.

For Linux for S/390, the following applies:

  • General registers r2 to r6 are used for integer values.

  • Floating point registers f0 and f2 are used for floating point values.

If there are more than five integral values or two floating point values, the rest of the arguments are passed on the stack 96 bytes above the initial stack pointer.

Besides these general rules, the following rules apply:

  • char, short and int are passed in general registers.

  • long long values are passed in two consecutive general registers if the number of the next available register is less than 6. If the upper 32 bits would fall into general register 6, this register is skipped and the whole 64-bit value is passed on the stack.

  • Structures equivalent to a floating point type are passed in floating point registers. A structure is equivalent to a floating point type if and only if it has exactly one member, which is either of floating point type or itself a structure equivalent to a floating point type.

  • Structures with a size of 1, 2, or 4 bytes which are not equivalent to a floating point type are passed as integral values.

  • Structures with a size of 8 bytes which are not equivalent to a floating point type are passed as an integral value in two registers.

  • All other structures are passed by reference. If needed, the called function makes a copy of the value.

  • Complex numbers are passed as structures.

Figure 17. Parameter list area

The following algorithm specifies where argument data is passed for the C language. For this purpose, consider the arguments as ordered from left (first argument) to right, although the order of evaluation of the arguments is unspecified. In this algorithm fr contains the number of the next available floating-point register, gr contains the number of the next available general purpose register, and starg is the address of the next available stack argument word.

INITIALIZE

Set fr=0, gr=2, and starg to the address of parameter word 1.

SCAN

If there are no more arguments, terminate. Otherwise, select one of the following depending on the type of the next argument:

DOUBLE_OR_FLOAT:

A DOUBLE_OR_FLOAT is one of the following:

  • A single length floating point type,

  • A double length floating point type.

  • A structure equivalent to a floating point type.

If fr>2, that is, if there are no more available floating-point registers, go to OTHER. Otherwise, load the argument value into floating-point register fr, set fr to fr+2, and go to SCAN.

SIMPLE_ARG

A SIMPLE_ARG is one of the following:

  • One of the simple integer types no more than 32 bits wide (char, short, int, long, enum).

  • A pointer to an object of any type.

  • A struct or a union of 1, 2 or 4 bytes which is not a structure equivalent to a floating point type.

  • A struct or union of another size, or a long double, any of which shall be passed as a pointer to the object, or to a copy of the object where necessary to enforce call-by-value semantics. Only if the caller can ascertain that the object is "constant" can it pass a pointer to the object itself.

If gr>6, go to OTHER. Otherwise load the argument value into general register gr, set gr to gr+1, and go to SCAN. Values shorter than 32 bits are sign- or zero-extended (as appropriate) to 32 bits.

DOUBLE_ARG

A DOUBLE_ARG is one of type long long, or is a struct or a union of size 8 bytes which is not a structure equivalent to a floating point type.

If gr>5, set gr to 7 and go to OTHER. Otherwise, load the lower-addressed word of the argument into gr and the higher-addressed word into gr+1, set gr to gr+2, and go to SCAN.

OTHER

Arguments not otherwise handled above are passed in the parameter words of the caller's stack frame. SIMPLE_ARGs, as defined above, are considered to have a size of 4 bytes, where simple integer types shorter than 4 bytes are sign- or zero-extended (as appropriate) to 4 bytes, and other arguments of size less than 4 bytes will be placed right-justified into a 4 byte slot. float arguments have a size of 4 bytes; long long and double arguments have a size of 8 bytes.

Copy the argument to the current stack position starg, using the argument size of 4 or 8 bytes as given above. Increment starg by the argument size, then go to SCAN.

The contents of registers and words which are skipped by the above algorithm for alignment purposes (padding) are undefined.

As an example, assume the declarations and the function call shown in Figure 18. The corresponding register allocation and storage would be as shown in Table 10.

int i, j, k, l;
long long ll;
double f, g, h;
int m;

x = func(i, j, g, k, l, ll, f, h, m);

Figure 18. Parameter passing example

Table 10. Parameter passing example: Register allocation

General purpose registers  Floating-point registers  Stack frame offset
-------------------------  ------------------------  ------------------
r2: i                      f0: g                     96: ll
r3: j                      f2: f                     104: h
r4: k                                                112: m
r5: l
r6: -

In this example r6 is unused as the long long variable ll will not fit into a single register.
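The INITIALIZE, SCAN and OTHER steps can be traced with a small simulation. This is a sketch under stated assumptions rather than compiler code: the names arg_class, arg_loc and assign_args are hypothetical, each argument is assumed to be classified already, starg is expressed as an offset from the stack pointer starting at 96, and no alignment padding of stack arguments is modeled.

```c
enum arg_class { DOUBLE_OR_FLOAT, SIMPLE_ARG, DOUBLE_ARG };

/* Where one argument ends up: kind is 'r' (general register), 'f'
 * (floating-point register) or 's' (stack); where is the register
 * number or the offset from the stack pointer. */
struct arg_loc { char kind; int where; };

/* Assign locations to n arguments. size[i] is the stack size (4 or 8
 * bytes) that argument i occupies if it is handled by OTHER. */
void assign_args(const enum arg_class *cls, const int *size, int n,
                 struct arg_loc *out)
{
    int fr = 0, gr = 2, starg = 96;                 /* INITIALIZE */

    for (int i = 0; i < n; i++) {                   /* SCAN */
        if (cls[i] == DOUBLE_OR_FLOAT && fr <= 2) {
            out[i] = (struct arg_loc){ 'f', fr };
            fr += 2;
        } else if (cls[i] == SIMPLE_ARG && gr <= 6) {
            out[i] = (struct arg_loc){ 'r', gr };
            gr += 1;
        } else if (cls[i] == DOUBLE_ARG && gr <= 5) {
            out[i] = (struct arg_loc){ 'r', gr };   /* uses gr and gr+1 */
            gr += 2;
        } else {                                    /* OTHER */
            if (cls[i] == DOUBLE_ARG)
                gr = 7;                             /* r6 is skipped */
            out[i] = (struct arg_loc){ 's', starg };
            starg += size[i];
        }
    }
}
```

Fed with the nine arguments of Figure 18, the sketch reproduces Table 10: i, j, k and l go to r2 through r5, g and f go to f0 and f2, and ll, h and m are placed at stack offsets 96, 104 and 112. Note that m goes to the stack although r6 is free, because the DOUBLE_ARG step has already set gr to 7.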


Variable argument lists

Some otherwise portable C programs depend on the argument passing scheme, implicitly assuming that 1) all arguments are passed on the stack, and 2) arguments appear in increasing order on the stack. Programs that make these assumptions have never been portable, but they have worked on many implementations. However, they do not work on the ESA/390 architecture because some arguments are passed in registers. Portable C programs use the header files <stdarg.h> or <varargs.h> to deal with variable argument lists on S/390 and other machines as well.
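A minimal example of the portable approach follows. The function sum_ints is a hypothetical illustration; it walks its arguments only through the <stdarg.h> macros, so it makes no assumption about whether an argument arrived in a register or on the stack.

```c
#include <stdarg.h>

/* Sum `count` int arguments using only the <stdarg.h> interface, so the
 * code does not depend on where the ABI places each argument. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

A call such as sum_ints(7, 1, 1, 1, 1, 1, 1, 1) passes more integer arguments than there are parameter registers, and the same source still works unchanged.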


Return values

In general, values are returned in registers, as described in Table 11.

Table 11. Registers for return values

Type                       Returned in register:
-------------------------  ----------------------------------
char, short, int and long  general register 2 (r2)
long long                  general registers 2 and 3 (r2, r3)
double and float           floating point register 0 (f0)

Functions shall return float or double values in f0, with float values rounded to single precision. Functions shall return values of type int, long, enum, short and char, or a pointer to any type as unsigned or signed integers as appropriate, zero- or sign-extended to 32 bits if necessary, in r2.

Values of type long long and unsigned long long shall be returned with the lower addressed half in r2 and the higher in r3.

Values of type long double and structures or unions are returned in a storage buffer allocated by the caller. The address of this buffer is passed as a hidden argument in r2 as if it were the first argument, causing gr in the argument passing algorithm above to be initialized to 3 instead of 2.


Operating system interface

Virtual address space

Processes execute in a 31-bit virtual address space. Memory management translates virtual addresses to physical addresses, hiding physical addressing and letting a process run anywhere in the system's real memory. Processes typically begin with three logical segments, commonly called "text", "data" and "stack". An object file may contain more segments (for example, for debugger use), and a process can also create additional segments for itself with system services.

Note

The term "virtual address" as used in this document refers to a 31-bit address generated by a program, as contrasted with the physical address to which it is mapped.


Page size

Memory is organized into pages, which are the system's smallest units of memory allocation. The hardware page size for the ESA/390 architecture is 4096 bytes.
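Code that works with page-granular addresses usually needs to round to page boundaries. The following sketch assumes the 4096-byte page size stated above; the macro S390_PAGE_SIZE and both function names are hypothetical, and addresses are assumed to fit in 31 bits so that the addition cannot overflow.

```c
#include <stdint.h>

#define S390_PAGE_SIZE 4096u  /* hardware page size stated above */

/* Address of the first byte of the page containing addr. */
uint32_t page_round_down(uint32_t addr)
{
    return addr & ~(uint32_t)(S390_PAGE_SIZE - 1);
}

/* Lowest page boundary that is not below addr. */
uint32_t page_round_up(uint32_t addr)
{
    return page_round_down(addr + S390_PAGE_SIZE - 1);
}
```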


Virtual address assignments

Processes have the full 31-bit address space available to them.

Figure 19 shows the virtual address configuration on the S/390 architecture. The segments with different properties are typically grouped in different areas of the address space. The loadable segments may begin at zero (0); the exact addresses depend on the executable file format (see the chapter called Object files and the chapter called Program loading and dynamic linking). The process' stack resides at the end of the virtual memory and grows downwards. Processes can control the amount of virtual memory allotted for stack space, as described below.

Figure 19. Virtual address configuration

Note

Although application programs may begin at virtual address 0, they conventionally begin above 0x1000 (4 Kbytes), leaving the initial 4 Kbytes with an invalid address mapping. Processes that reference this invalid memory (for example by de-referencing a null pointer) generate a translation exception as described in the Section called Exception interface.

Although applications may control their memory assignments, the typical arrangement follows the diagram above. When applications let the system choose addresses for dynamic segments (including shared object segments), the system will prefer addresses in the upper half of the address space (above 1 Gbyte).


Managing the process stack

The Section called Process initialization describes the initial stack contents. Stack addresses can change from one system to the next, and even from one process execution to the next on a single system. A program, therefore, should not depend on finding its stack at a particular virtual address.

A tunable configuration parameter controls the system maximum stack size. A process can also use setrlimit to set its own maximum stack size, up to the system limit. The stack segment is both readable and writable.
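A process might adjust its own stack limit roughly as follows. This is a sketch using the POSIX getrlimit and setrlimit calls with RLIMIT_STACK; the function name cap_stack_limit is hypothetical, and it only ever lowers the soft limit, which does not require special privileges.

```c
#include <sys/resource.h>

/* Lower this process's soft stack limit to `bytes` if the current soft
 * limit is larger or unlimited. Returns 0 on success, -1 on error. */
int cap_stack_limit(rlim_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > bytes) {
        rl.rlim_cur = bytes;            /* hard limit is left unchanged */
        return setrlimit(RLIMIT_STACK, &rl);
    }
    return 0;                           /* already within the cap */
}
```

Raising the soft limit works the same way, but only up to the hard limit reported in rlim_max.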


Coding guidelines

Operating system facilities, such as mmap, allow a process to establish address mappings in two ways. Firstly, the program can let the system choose an address. Secondly, the program can request the system to use an address the program supplies. The second alternative can cause application portability problems because the requested address might not always be available. Differences in virtual address space can be particularly troublesome between different architectures, but the same problems can arise within a single architecture.

Processes' address spaces typically have three segments that can change size from one execution to the next: the stack (through setrlimit); the data segment (through malloc); and the dynamic segment area (through mmap). Changes in one area may affect the virtual addresses available for another. Consequently an address that is available in one process execution might not be available in the next. Thus a program that used mmap to request a mapping at a specific address could appear to work in some environments and fail in others. For this reason programs that want to establish a mapping in their address space should let the system choose the address.

Despite these warnings about requesting specific addresses the facility can be used properly. For example, a multiprocess application might map several files into the address space of each process and build relative pointers among the files' data. This could be done by having each process ask for a certain amount of memory at an address chosen by the system. After each process receives its own private address from the system it would map the desired files into memory at specific addresses within the original area. This collection of mappings could be at different addresses in each process but their relative positions would be fixed. Without the ability to ask for specific addresses, the application could not build shared data structures because the relative positions for files in each process would be unpredictable.
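The technique described in the last paragraph can be sketched with mmap. The function map_inside_reservation is a hypothetical helper; for brevity it maps anonymous memory instead of files, and it assumes that offset and len are multiples of the page size. MAP_ANONYMOUS is a widely available extension rather than part of this ABI.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Reserve `total` bytes at a system-chosen address, then place a
 * readable and writable mapping of `len` bytes at `offset` inside it.
 * Returns the address of the inner mapping, or NULL on failure. */
void *map_inside_reservation(size_t total, size_t offset, size_t len,
                             void **base_out)
{
    /* Step 1: let the system choose where the whole area lives. */
    char *base = mmap(NULL, total, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Step 2: request a specific address, but only inside our own area. */
    void *inner = mmap(base + offset, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (inner == MAP_FAILED) {
        munmap(base, total);
        return NULL;
    }
    *base_out = base;
    return inner;
}
```

Because the fixed address lies inside an area the process already owns, the request cannot collide with an unrelated mapping, and the position of the inner mapping relative to the base is the same in every process that follows this scheme.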


Processor execution modes

Two execution modes exist in the ESA/390 architecture: problem (user) state and supervisor state. Processes run in problem state (the less privileged). The operating system kernel runs in supervisor state. A program executes a supervisor call (svc) instruction to change execution modes.

Note that the ABI does not define the implementation of individual system calls. Instead programs shall use the system libraries. Programs with embedded system call or trap instructions do not conform to the ABI.


Exception interface

The ESA/390 exception mechanism allows the processor to change to supervisor state as a result of six different causes: system calls, I/O interrupts, external interrupts, machine checks, restart interruptions or program checks (unusual conditions arising in the execution of instructions).

When exceptions occur:

  1. information (such as the address of the next instruction to be executed after control is returned to the original program) is saved,

  2. program control passes from user to supervisor level, and

  3. software continues execution at an address (the exception vector) predetermined for each exception.

Exceptions may be synchronous or asynchronous. Synchronous exceptions, being caused by instruction execution, can be explicitly generated by a process. The operating system handles an exception either by completing the faulting operation in a manner transparent to the application or by delivering a signal to the application. The correspondence between exceptions and signals is shown in Table 12.

Table 12. Exceptions and Signals

Exception Name       Signal   Examples
-------------------  -------  ---------------------------------------------
Illegal instruction  SIGILL   Illegal or privileged instruction
                              Invalid instruction form
                              Optional, unimplemented instruction
Storage access       SIGSEGV  Unmapped instruction or data location access
                              Storage protection violation
Alignment            SIGBUS   Invalid data item alignment
                              Invalid memory access
Breakpoint           SIGTRAP  Breakpoint program check
Floating exception   SIGFPE   Floating point overflow or underflow
                              Floating point divide by zero
                              Floating point conversion overflow
                              Other enabled floating point exceptions

The signals that an exception may give rise to are SIGILL, SIGSEGV, SIGBUS, SIGTRAP, and SIGFPE. If one of these signals is generated due to an exception when the signal is blocked, the behavior is undefined.
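An application that wants to be notified of one of these signals can install a handler with the POSIX sigaction call. The sketch below is an illustration only: the names record_signal, install_exception_handler and last_exception_signal are hypothetical, and the handler does nothing but record the signal number.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t last_signal = 0;

static void record_signal(int signo)
{
    last_signal = signo;   /* only async-signal-safe work in a handler */
}

/* Install record_signal for one of the signals in Table 12.
 * Returns 0 on success, -1 on error. */
int install_exception_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Number of the most recently recorded signal, or 0 if none. */
int last_exception_signal(void)
{
    return (int)last_signal;
}
```

The handler can be exercised without a real exception by calling raise(SIGTRAP) after installing it for SIGTRAP. A handler for a genuine synchronous exception such as SIGSEGV generally should not simply return, since the faulting instruction would be executed again.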


Process initialization

This section describes the machine state that exec creates for "infant" processes, including argument passing, register usage, and stack frame layout. Programming language systems use this initial program state to establish a standard environment for their application programs. For example, a C program begins executing at a function named main, conventionally declared in the way described in Figure 20:

   extern int main (int argc, char *argv[ ], char *envp[ ]);

Figure 20. Declaration for main

Briefly, argc is a non-negative argument count; argv is an array of argument strings, with argv[argc] == 0, and envp is an array of environment strings, also terminated by a NULL pointer.

Although this section does not describe C program initialization, it gives the information necessary to implement the call to main or to the entry point for a program in any other language.
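Both argv and envp are arrays of string pointers that end with a NULL pointer, so the same loop can walk either of them. The helper count_strings below is a hypothetical illustration of that convention.

```c
#include <stddef.h>

/* Count the entries of a NULL-terminated string vector such as argv
 * or envp. The terminating NULL pointer is not counted. */
size_t count_strings(char *const vec[])
{
    size_t n = 0;

    while (vec[n] != NULL)
        n++;
    return n;
}
```

Inside main, count_strings(argv) is expected to equal argc, and count_strings(envp) gives the number of environment strings.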


Registers

When a process is first entered (from an exec system call), the contents of registers other than those listed below are unspecified. Consequently, a program that requires registers to have specific values must set them explicitly during process initialization. It should not rely on the operating system to set all registers to 0. Following are the registers whose contents are specified:

Table 13.

r15

The initial stack pointer, aligned to an 8-byte boundary and pointing to a stack location that contains the argument count (see the Section called Process stack for further information about the initial stack layout)

fpc

The floating point control register contains 0, specifying "round to nearest" mode and the disabling of floating-point exceptions


Process stack

Every process has a stack, but the system defines no fixed stack address. Furthermore, a program's stack address can change from one system to another – even from one process invocation to another. Thus the process initialization code must use the stack address in general purpose register r15. Data in the stack segment at addresses below the stack pointer contain undefined values.

Whereas the argument and environment vectors transmit information from one application program to another, the auxiliary vector conveys information from the operating system to the program. This vector is an array of structures, which are defined in Figure 21.

typedef struct {
        int a_type;
        union {
                long a_val;
                void *a_ptr;
                void (*a_fcn)();
        } a_un;
} auxv_t;

Figure 21. Auxiliary vector structure

The structures are interpreted according to the a_type member, as shown in Table 14.

Table 14. Auxiliary Vector Types, a_type

Name         Value   a_un
AT_NULL      0       ignored
AT_IGNORE    1       ignored
AT_EXECFD    2       a_val
AT_PHDR      3       a_ptr
AT_PHENT     4       a_val
AT_PHNUM     5       a_val
AT_PAGESZ    6       a_val
AT_BASE      7       a_ptr
AT_FLAGS     8       a_val
AT_ENTRY     9       a_ptr
AT_NOTELF    10      a_val
AT_UID       11      a_val
AT_EUID      12      a_val
AT_GID       13      a_val
AT_EGID      14      a_val

a_type auxiliary vector types are described in 'Auxiliary Vector Types Description' below.

Auxiliary Vector Types Description

AT_NULL

The auxiliary vector has no fixed length; so an entry of this type is used to denote the end of the vector. The corresponding value of a_un is undefined.

AT_IGNORE

This type indicates the entry has no meaning. The corresponding value of a_un is undefined.

AT_EXECFD

exec may pass control to an interpreter program. When this happens, the system places either an entry of type AT_EXECFD or one of type AT_PHDR in the auxiliary vector. The a_val field in the AT_EXECFD entry contains a file descriptor for the application program's object file.

AT_PHDR

Under some conditions, the system creates the memory image of the application program before passing control to an interpreter program. When this happens, the a_ptr field of the AT_PHDR entry tells the interpreter where to find the program header table in the memory image. If the AT_PHDR entry is present, entries of types AT_PHENT, AT_PHNUM and AT_ENTRY must also be present. See the chapter called Program loading and dynamic linking for more information about the program header table.

AT_PHENT

The a_val field of this entry holds the size, in bytes, of one entry in the program header table at which the AT_PHDR entry points.

AT_PHNUM

The a_val field of this entry holds the number of entries in the program header table at which the AT_PHDR entry points.

AT_PAGESZ

If present, this entry's a_val field gives the system page size in bytes. The same information is also available through sysconf.

AT_BASE

The a_ptr member of this entry holds the base address at which the interpreter program was loaded into memory.

AT_FLAGS

If present, the a_val field of this entry holds 1-bit flags. Undefined bits are set to zero.

AT_ENTRY

The a_ptr field of this entry holds the entry point of the application program to which the interpreter program should transfer control.

AT_NOTELF

The a_val field of this entry is non-zero if the program is in a format other than ELF, for example the old COFF format.

AT_UID

The a_val field of this entry holds the real user id of the process.

AT_EUID

The a_val field of this entry holds the effective user id of the process.

AT_GID

The a_val field of this entry holds the real group id of the process.

AT_EGID

The a_val field of this entry holds the effective group id of the process.

Other auxiliary vector types are reserved. No flags are currently defined for AT_FLAGS on the S/390 architecture.
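The following C sketch shows one way a program might search the auxiliary vector for an entry of a given type. The function name find_auxv is hypothetical. The structure repeats Figure 21, with a_fcn written as a prototype, and the two constants are taken from Table 14.

```c
#include <assert.h>
#include <stddef.h>

enum { AT_NULL = 0, AT_PAGESZ = 6 };    /* values from Table 14 */

typedef struct {
    int a_type;
    union {
        long a_val;
        void *a_ptr;
        void (*a_fcn)(void);
    } a_un;
} auxv_t;

/* Return the first entry of the requested type, or NULL if the
   AT_NULL terminator is reached first. Hypothetical helper. */
const auxv_t *find_auxv(const auxv_t *vec, int type)
{
    assert(vec != NULL);
    for (; vec->a_type != AT_NULL; vec++) {
        if (vec->a_type == type)
            return vec;
    }
    return NULL;
}
```

Because the vector has no fixed length, the loop relies only on the AT_NULL entry to find the end. A caller that wants the page size would read the a_val member of the entry returned for AT_PAGESZ, after checking that the entry is present.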

When a process receives control, its stack holds the arguments, environment, and auxiliary vector from exec. Argument strings, environment strings, and the auxiliary information appear in no specific order within the information block; the system makes no guarantees about their relative arrangement. The system may also leave an unspecified amount of memory between the null auxiliary vector entry and the beginning of the information block. A sample initial stack is shown in Figure 22.

Figure 22. Initial Process Stack
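The following C sketch locates the three vectors on a simulated initial stack. It assumes the conventional System V ordering (argument count, argument pointers, a null word, environment pointers, a null word, then the auxiliary vector) and treats each stack slot as one pointer-sized word. The type initial_stack and the function parse_initial_stack are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t argc;          /* argument count found at the stack pointer */
    const uintptr_t *argv;   /* argc pointers followed by a null word */
    const uintptr_t *envp;   /* pointers followed by a null word */
    const uintptr_t *auxv;   /* (type, value) pairs ending with AT_NULL */
} initial_stack;

/* Split a simulated initial stack into its parts. `sp` plays the role
   of the initial value of register r15. Assumed layout, see lead-in. */
initial_stack parse_initial_stack(const uintptr_t *sp)
{
    initial_stack s;
    const uintptr_t *p;

    assert(sp != NULL);
    s.argc = sp[0];
    s.argv = sp + 1;
    s.envp = s.argv + s.argc + 1;   /* skip argv and its null terminator */
    for (p = s.envp; *p != 0; p++)  /* walk to the end of envp */
        ;
    s.auxv = p + 1;                 /* first word after envp's null */
    return s;
}
```

The sketch never looks below the stack pointer, in keeping with the rule that data at lower addresses is undefined.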


Coding examples

This section describes example code sequences for fundamental operations such as calling functions, accessing static objects, and transferring control from one part of a program to another. Previous sections discussed how a program may use the machine or the operating system, and they specified what a program may and may not assume about the execution environment. Unlike previous material, the information in this section illustrates how operations may be done, not how they must be done.

As before, examples use the ANSI C language. Other programming languages may use the same conventions displayed below, but failure to do so does not prevent a program from conforming to the ABI. Two main object code models are available:

Absolute code

Instructions can hold absolute addresses under this model. To execute properly, the program must be loaded at a specific virtual address, making the program's absolute addresses coincide with the process' virtual addresses.

Position-independent code

Instructions under this model hold relative addresses, not absolute addresses. Consequently, the code is not tied to a specific load address, allowing it to execute properly at various positions in virtual memory.

The following sections describe the differences between these models. When different, code sequences for the models appear together for easier comparison.

Note

The examples below show code fragments with various simplifications. They are intended to explain addressing modes, not to show optimal code sequences or to reproduce compiler output.


Code model overview

When the system creates a process image, the executable file portion of the process has fixed addresses and the system chooses shared object library virtual addresses to avoid conflicts with other segments in the process. To maximize text sharing, shared objects conventionally use position-independent code, in which instructions contain no absolute addresses. Shared object text segments can be loaded at various virtual addresses without having to change the segment images. Thus multiple processes can share a single shared object text segment, even if the segment resides at a different virtual address in each process.

Position-independent code relies on two techniques:

  • Control transfer instructions hold addresses relative to the Current Instruction Address (CIA), or use registers that hold the transfer address. A CIA-relative branch computes its destination address in terms of the CIA, not relative to any absolute address.

  • When the program requires an absolute address, it computes the desired value. Instead of embedding absolute addresses in instructions (in the text segment), the compiler generates code to calculate an absolute address (in a register or in the stack or data segment) during execution.

Because the ESA/390 architecture provides CIA-relative branch instructions and also branch instructions using registers that hold the transfer address, compilers can satisfy the first condition easily.

A Global Offset Table (GOT) provides information for address calculation. Position-independent object files (executable and shared object files) have a table in their data segment that holds addresses. When the system creates the memory image for an object file, the table entries are relocated to reflect the absolute virtual address as assigned for an individual process. Because data segments are private for each process, the table entries can change – unlike text segments, which multiple processes share.

Two position-independent models give programs a choice between more efficient code with some size restrictions and less efficient code without those restrictions. Because of the processor architecture, a GOT with no more than 1024 entries (4096 bytes) is more efficient than a larger one. Programs that need more entries must use the larger, more general code. In the following sections, the term "small model position-independent code" is used to refer to code that assumes the smaller GOT, and "large model position-independent code" is used to refer to the general code.
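The 1024-entry limit follows from simple arithmetic: each GOT entry is 4 bytes, and an instruction displacement is a 12-bit unsigned value (0 to 4095). The following C sketch works through that calculation. The function and constant names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
    GOT_ENTRY_SIZE   = 4,       /* one 31-bit address per entry */
    DISPLACEMENT_MAX = 4095     /* largest 12-bit unsigned displacement */
};

/* True when every entry of a GOT with `entries` slots can be reached
   with a displacement from the GOT pointer alone. */
bool got_fits_small_model(size_t entries)
{
    return entries * GOT_ENTRY_SIZE <= (size_t)DISPLACEMENT_MAX + 1;
}

/* Byte offset of entry `index`, usable as an instruction displacement
   in the small model. */
unsigned got_displacement(size_t index)
{
    assert(index * GOT_ENTRY_SIZE <= (size_t)DISPLACEMENT_MAX);
    return (unsigned)(index * GOT_ENTRY_SIZE);
}
```

The last entry of a 1024-entry table sits at offset 4092, which still fits the displacement. Entry 1024 would need offset 4096, which does not.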


Function prolog and epilog

This section describes the prolog and epilog code of functions. A function's prolog establishes a stack frame, if necessary, and may save any nonvolatile registers it uses. A function's epilog generally restores registers that were saved in the prolog code, restores the previous stack frame, and returns to the caller.


Prolog

The prolog of a function has to save the state of the calling function and set up the base register for the code of the function body. The following is in general done by the function prolog:

  • Save all registers used within the function which the calling function assumes to be non-volatile.

  • Set up the base register for the literal pool.

  • Allocate stack space by decrementing the stack pointer.

  • Set up the dynamic chain by storing the old stack pointer value at stack location zero if the "back chain" is implemented.

  • Set up the GOT pointer if the compiler is generating position independent code.

    (A function that is position independent will probably want to load a pointer to the GOT into a nonvolatile register. This may be omitted if the function makes no external data references. If external data references are only made within conditional code, loading the GOT pointer may be deferred until it is known to be needed.)

  • Set up the frame pointer if the function allocates stack space dynamically (with alloca).

The compiler tries to do as little as possible of the above; the ideal case is to do nothing at all (for a leaf function without symbolic references).


Epilog

The epilog of a function restores the registers saved in the prolog (which include the stack pointer) and branches to the return address.


Prolog and epilog example

.LC18:
          .string "hello, world\n"
          .align  4
          .globl  main
          .type   main,@function
main:
                                       # Prolog
          STM     11,15,44(15)         # Save caller's registers
          BRAS    13,.LTN0_0           # Set up literal pool and branch over
.LT0_0:
.LC21:
          .long   .LC18
.LC22:
          .long   printf
.LTN0_0:
          LR      1,15                 # Load stack pointer in GPR 1
          AHI     15,-96               # Allocate stack space
          ST      1,0(15)              # Save backchain
                                       # Prolog end
          L       2,.LC21-.LT0_0(13)   # Load address of the string
          L       1,.LC22-.LT0_0(13)   # Load address of printf
          BASR    14,1                 # Call printf
          SLR     2,2                  # Set return value to 0

                                       # Epilog
          L       4,152(15)            # Load return address
          LM      11,15,140(15)        # Restore registers
          BR      4                    # Branch back to caller
                                       # Epilog end

Figure 23. Prolog and epilog example
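The epilog offsets 152 and 140 in Figure 23 follow from the prolog: the registers are stored starting at offset 44 from the caller's stack pointer, and the stack pointer is then lowered by 96 bytes. The following C sketch reproduces that arithmetic. The function name saved_reg_offset is hypothetical.

```c
#include <assert.h>

/* Offset, from the stack pointer after frame allocation, of the slot
   in which register `reg` was stored by "STM first_reg,15,save_base(15)"
   executed before the stack pointer was lowered by frame_size bytes. */
int saved_reg_offset(int first_reg, int save_base, int frame_size, int reg)
{
    assert(reg >= first_reg && reg <= 15);
    return frame_size + save_base + 4 * (reg - first_reg);
}
```

With the values from Figure 23, register 11 is found at 96 + 44 = 140, the operand of the LM instruction. Register 14, the return address, is found three words later at 152.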


Profiling

This section shows a way of providing profiling (entry counting) on S/390 systems. An ABI-conforming system is not required to provide profiling; however, if it does, this is one possible (not required) implementation.

If a function is to be profiled it has to call the _mcount routine after the function prolog. This routine has a special linkage. It gets an address in register 1 and returns without having changed any register. The address is a pointer to a word-aligned one-word static data area, initialized to zero, in which the _mcount routine is to maintain a count of the number of times the function is called.

For example, Figure 24 shows how the code after the function prolog may look.

          STM     7,15,28(15)          # Save caller's registers
          BRAS    13,.LTN0_0           # Jump to function prolog
.LT0_0:
.LC3:     .long   _mcount              # Literal pool entry for _mcount
.LC4:     .long   .LP0                 # Literal pool entry for profile counter
.LTN0_0:
          LR      1,15                 # Stack pointer
          AHI     15,-96               # Allocate new stack frame
          ST      1,0(15)              # Save backchain
          LR      11,15                # Local stack pointer
          .data
          .align 4
.LP0:     .long   0                    # Profile counter
          .text
                                       # Function profiler
          ST    14,4(15)               # Preserve r14
          L     14,.LC3-.LT0_0(13)     # Load address of _mcount
          L     1,.LC4-.LT0_0(13)      # Load address of profile counter
          BASR  14,14                  # Branch to _mcount
          L     14,4(15)               # Restore r14

Figure 24. Code for profiling
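The counting scheme can be modeled in C as follows. This is only a sketch of the bookkeeping: the real _mcount routine has a special linkage that preserves all registers, which plain C cannot express. The names count_entry, profiled_function and profiled_call_count are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for _mcount: increment the one-word counter whose address
   the profiled function passes in. */
void count_entry(unsigned int *counter)
{
    assert(counter != NULL);
    ++*counter;
}

/* One static counter per profiled function, initialized to zero,
   corresponding to .LP0 in Figure 24. */
static unsigned int profiled_calls;

int profiled_function(int x)
{
    count_entry(&profiled_calls);   /* done right after the prolog */
    return x + 1;
}

/* Read back the number of times profiled_function was entered. */
unsigned int profiled_call_count(void)
{
    return profiled_calls;
}
```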


Data objects

This section describes only objects with static storage duration. It excludes stack-resident objects because programs always compute their virtual addresses relative to the stack or frame pointers.

Because S/390 instructions cannot hold 31-bit addresses directly, a program has to build an address in a register and access memory through that register. In order to do so a function normally has a literal pool that holds the addresses of data objects used by the function. Register 13 is set up in the function prolog to point to the start of this literal pool.

Position-independent code cannot contain absolute addresses. In order to access a local symbol the literal pool contains the (signed) offset of the symbol relative to the start of the pool. Combining the offset loaded from the literal pool with the address in register 13 gives the absolute address of the local symbol. In the case of a global symbol the address of the symbol has to be loaded from the Global Offset Table. The offset in the GOT can either be contained in the instruction itself or in the literal pool. See Figure 26 and Figure 27 for examples.
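The two address calculations can be modeled in C as follows. The sketch uses 32-bit integers in place of registers and an array in place of the GOT. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local symbol: the literal pool base (register 13) plus the signed
   offset of the symbol relative to the start of the pool. */
uint32_t local_symbol_address(uint32_t pool_base, int32_t offset)
{
    return pool_base + (uint32_t)offset;
}

/* Global symbol: the address stored in the GOT slot found at byte
   offset got_offset from the start of the table. */
uint32_t global_symbol_address(const uint32_t *got, uint32_t got_offset)
{
    assert(got != NULL && got_offset % 4 == 0);
    return got[got_offset / 4];
}
```

In the local case no memory other than the literal pool is read. In the global case one extra load from the GOT is needed, and the dynamic linker is free to change the slot's content without touching the code.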

Figure 25 through Figure 27 show sample assembly language equivalents to C language code for absolute and position-independent compilations. It is assumed that all shared objects are compiled as position-independent and only executable modules may have absolute addresses. The code in the figures contains many redundant operations as it is only intended to show how each C statement could have been compiled independently of its context. The function prolog is not shown, and it is assumed that it has loaded the address of the literal pool in register 13.

Table 15.

C                       S/390 machine instructions (Assembler)

extern int src;
extern int dst;
extern int *ptr;

dst = src;              # Literal pool
                        .LT0:
                        .LC1:      .long dst
                        .LC2:      .long src
                        # Code
                                   L     2,.LC1-.LT0(13)
                                   L     1,.LC2-.LT0(13)
                                   MVC   0(4,2),0(1)

ptr = &dst;             # Literal pool
                        .LT0:
                        .LC1:      .long ptr
                        .LC2:      .long dst
                        # Code
                                   L     1,.LC1-.LT0(13)
                                   MVC   0(4,1),.LC2-.LT0(13)

*ptr = src;             # Literal pool
                        .LT0:
                        .LC1:      .long ptr
                        .LC2:      .long src
                        # Code
                                   L     1,.LC1-.LT0(13)
                                   L     2,.LC2-.LT0(13)
                                   L     3,0(1)
                                   MVC   0(4,3),0(2)

Figure 25. Absolute addressing

Table 16.

C                       S/390 machine instructions (Assembler)

extern int src;
extern int dst;
extern int *ptr;

dst = src;              # Literal pool
                        .LT0:
                        .LC1:      .long _GLOBAL_OFFSET_TABLE_-.LT0
                        # Code
                                   L     12,.LC1-.LT0(13)
                                   LA    12,0(12,13)
                                   L     2,dst@GOT(12)
                                   L     1,src@GOT(12)
                                   MVC   0(4,2),0(1)

ptr = &dst;             # Literal pool
                        .LT0:
                        .LC1:      .long _GLOBAL_OFFSET_TABLE_-.LT0
                        # Code
                                   L     12,.LC1-.LT0(13)
                                   LA    12,0(12,13)
                                   L     1,ptr@GOT(12)
                                   L     2,dst@GOT(12)
                                   ST    2,0(1)

*ptr = src;             # Literal pool
                        .LT0:
                        .LC1:      .long _GLOBAL_OFFSET_TABLE_-.LT0
                        # Code
                                   L     12,.LC1-.LT0(13)
                                   LA    12,0(12,13)
                                   L     1,ptr@GOT(12)
                                   L     2,src@GOT(12)
                                   L     3,0(1)
                                   MVC   0(4,3),0(2)

Figure 26. Small model position-independent addressing

Table 17.

C                       S/390 Assembler

extern int src;
extern int dst;
extern int *ptr;

dst = src;              # Literal pool
                        .LT0:
                        .LC1:      .long dst@GOT
                        .LC2:      .long src@GOT
                        .LC3:      .long _GLOBAL_OFFSET_TABLE_-.LT0
                        # Code
                                   L     12,.LC3-.LT0(13)
                                   LA    12,0(12,13)
                                   L     2,.LC1-.LT0(13)
                                   L     1,.LC2-.LT0(13)
                                   L     2,0(2,12)
                                   L     1,0(1,12)
                                   MVC   0(4,2),0(1)

ptr = &dst;             # Literal pool
                        .LT0:
                        .LC1:      .long ptr@GOT
                        .LC2:      .long dst@GOT
                        .LC3:      .long _GLOBAL_OFFSET_TABLE_-.LT0
                        # Code
                                   L     12,.LC3-.LT0(13)
                                   LA    12,0(12,13)
                                   L     2,.LC1-.LT0(13)
                                   L     1,.LC2-.LT0(13)
                                   L     2,0(2,12)
                                   L     1,0(1,12)
                                   ST    1,0(2)

*ptr = src;             # Literal pool
                        .LT0:
                        .LC1:      .long ptr@GOT
                        .LC2:      .long src@GOT
                        .LC3:      .long _GLOBAL_OFFSET_TABLE_-.LT0
                        # Code
                                   L     12,.LC3-.LT0(13)
                                   LA    12,0(12,13)
                                   L     1,.LC1-.LT0(13)
                                   L     2,.LC2-.LT0(13)
                                   L     1,0(1,12)
                                   L     2,0(2,12)
                                   L     3,0(1)
                                   MVC   0(4,3),0(2)

Figure 27. Large model position-independent addressing


Function calls

Programs can use the ESA/390 BRAS instruction to make direct function calls. A BRAS instruction has a self-relative branch displacement that can reach 64 Kbytes in either direction. Hence the use of the BRAS instruction is limited to very rare cases. The usual method of calling a function is to load the address in a register and use the BASR instruction for the call. Register 14 is used as the first operand of BASR to hold the return address as shown in Figure 28.
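The reach of BRAS can be checked with a short calculation: the instruction encodes a signed 16-bit count of halfwords relative to its own address. The following C sketch tests whether a target is within range. The function name bras_reachable is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a BRAS located at address `from` can branch to `to`.
   The displacement is a signed 16-bit number of halfwords. */
bool bras_reachable(uint32_t from, uint32_t to)
{
    int64_t delta;
    int64_t halfwords;

    assert(from % 2 == 0);          /* instructions are halfword aligned */
    delta = (int64_t)to - (int64_t)from;
    if (delta % 2 != 0)
        return false;               /* target is not halfword aligned */
    halfwords = delta / 2;
    return halfwords >= INT16_MIN && halfwords <= INT16_MAX;
}
```

The exact limits are 65536 bytes backward and 65534 bytes forward, which is why a compiler cannot rely on BRAS for calls to arbitrary functions.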

The called function may be in the same module (executable or shared object) as the caller, or it may be in a different module. In the former case, if the called function is not in a shared object, the linkage editor resolves the symbol. In all other cases the linkage editor cannot directly resolve the symbol. Instead the linkage editor generates "glue" code and resolves the symbol to point to the glue code. The dynamic linker will provide the real address of the function in the Global Offset Table. The glue code loads this address and branches to the function itself. See the Section called Procedure Linkage Table in the chapter called Program loading and dynamic linking for more details.

Table 18.

C                       S/390 machine instructions (Assembler)

extern void func();
extern void (*ptr)();

ptr = func;             # Literal pool
                        .LT0:
                        .LC1:      .long ptr
                        .LC2:      .long func
                        # Code
                                   L     1,.LC1-.LT0(13)
                                   MVC   0(4,1),.LC2-.LT0(13)

func();                 # Literal pool
                        .LT0:
                        .LC1:      .long func
                        # Code
                                   L     1,.LC1-.LT0(13)
                                   BASR  14,1

(*ptr) ();              # Literal pool
                        .LT0:
                        .LC1:      .long ptr
                        # Code
                                   L     1,.LC1-.LT0(13)
                                   L     1,0(1)
                                   BASR  14,1

Figure 28. Absolute direct function call

Table 19.

C                       S/390 machine instructions (Assembler)

extern void func();
extern void (*ptr)();

ptr = func;             # Literal pool
                        .LT0:
                        .LC1:      .long _GLOBAL_OFFSET_TABLE_-.LT0
                        # Code
                                   L     12,.LC1-.LT0(13)
                                   LA    12,0(12,13)
                                   L     1,ptr@GOT(12)
                                   L     2,func@GOT(12)
                                   ST    2,0(1)

func();                 # Literal pool
                        .LT0:
                        .LC1:      .long _GLOBAL_OFFSET_TABLE_-.LT0
                        .LC2:      .long func@PLT-.LT0
                        # Code
                                   L     12,.LC1-.LT0(13)
                                   LA    12,0(12,13)
                                   L     1,.LC2-.LT0(13)
                                   BAS   14,0(1,13)

(*ptr) ();              # Literal pool
                        .LT0:
                        .LC1:      .long _GLOBAL_OFFSET_TABLE_-.LT0
                        # Code
                                   L     12,.LC1-.LT0(13)
                                   LA    12,0(12,13)
                                   L     1,ptr@GOT(12)
                                   L     2,0(1)
                                   BASR  14,2

Figure 29. Small model position-independent direct function call

Table 20.

C                       S/390 machine instructions (Assembler)

extern void func();
extern void (*ptr)();

ptr = func;             # Literal pool
                        .LT0:
                        .LC1:      .long ptr@GOT
                        .LC2:      .long func@GOT
                        .LC3:      .long _GLOBAL_OFFSET_TABLE_-.LT0
                        # Code
                                   L     12,.LC3-.LT0(13)
                                   LA    12,0(12,13)
                                   L     2,.LC1-.LT0(13)
                                   L     1,.LC2-.LT0(13)
                                   L     2,0(2,12)
                                   L     1,0(1,12)
                                   ST    1,0(2)

func();                 # Literal pool
                        .LT0:
                        .LC1:      .long _GLOBAL_OFFSET_TABLE_-.LT0
                        .LC2:      .long func@PLT-.LT0
                        # Code
                                   L     12,.LC1-.LT0(13)
                                   LA    12,0(12,13)
                                   L     1,.LC2-.LT0(13)
                                   BAS   14,0(1,13)

(*ptr) ();              # Literal pool
                        .LT0:
                        .LC1:      .long ptr@GOT
                        .LC2:      .long _GLOBAL_OFFSET_TABLE_-.LT0
                        # Code
                                   L     12,.LC2-.LT0(13)
                                   LA    12,0(12,13)
                                   L     1,.LC1-.LT0(13)
                                   L     1,0(1,12)
                                   L     2,0(1)
                                   BASR  14,2

Figure 30. Large model position-independent direct function call

Table 21.

C                       S/390 machine instructions (Assembler)

extern void func();
extern void (*ptr) ();

ptr = func;             # Literal pool
                        .LT0:
                        .LC1:      .long ptr
                        .LC2:      .long func
                        # Code
                                   L     1,.LC1-.LT0(13)
                                   MVC   0(4,1),.LC2-.LT0(13)

(*ptr) ();              # Literal pool
                        .LT0:
                        .LC1:      .long ptr
                        # Code
                                   L     1,.LC1-.LT0(13)
                                   L     1,0(1)
                                   BASR  14,1

Figure 31. Absolute indirect function call

Table 22.

C                       S/390 machine instructions (Assembler)

extern void func();
extern void (*ptr) ();

ptr = func;             # Literal pool
                        .LT0:
                        .LC1:      .long _GLOBAL_OFFSET_TABLE_-.LT0
                        # Code
                                   L     12,.LC1-.LT0(13)
                                   LA    12,0(12,13)
                                   L     1,ptr@GOT(12)
                                   L     2,func@GOT(12)
                                   ST    2,0(1)

(*ptr) ();              # Literal pool
                        .LT0:
                        .LC1:      .long _GLOBAL_OFFSET_TABLE_-.LT0
                        # Code
                                   L     12,.LC1-.LT0(13)
                                   LA    12,0(12,13)
                                   L     1,ptr@GOT(12)
                                   L     2,0(1)
                                   BASR  14,2

Figure 32. Small model position-independent indirect function call

Table 23.

C                       S/390 machine instructions (Assembler)

extern void func();
extern void (*ptr) ();

ptr = func;             # Literal pool
                        .LT0:
                        .LC1:      .long ptr@GOT
                        .LC2:      .long func@GOT
                        .LC3:      .long _GLOBAL_OFFSET_TABLE_-.LT0
                        # Code
                                   L     12,.LC3-.LT0(13)
                                   LA    12,0(12,13)
                                   L     2,.LC1-.LT0(13)
                                   L     1,.LC2-.LT0(13)
                                   L     2,0(2,12)
                                   L     1,0(1,12)
                                   ST    1,0(2)

(*ptr) ();              # Literal pool
                        .LT0:
                        .LC1:      .long ptr@GOT
                        .LC2:      .long _GLOBAL_OFFSET_TABLE_-.LT0
                        # Code
                                   L     12,.LC2-.LT0(13)
                                   LA    12,0(12,13)
                                   L     1,.LC1-.LT0(13)
                                   L     1,0(1,12)
                                   L     2,0(1)
                                   BASR  14,2

Figure 33. Large model position-independent indirect function call


Branching

Programs use branch instructions to control their execution flow. The ESA/390 architecture has a variety of branch instructions. The most commonly used of these performs a self-relative jump with a 128-Kbyte range (up to 64 Kbytes in either direction).

Table 24.

C

S/390 machine instructions (Assembler)

label:

        ...

        goto label;

.L01:

          &