C Program Compilation Steps
When you compile a C program you get an executable. Have you ever wondered what happens during compilation and how a C program is converted into an executable?
In this module we will learn the stages involved in compiling a C program using gcc on Linux.
Normally, building a C program involves four stages to get an executable: preprocessing, compilation, assembly, and linking.
The following figure shows the steps involved in building a C program, from preprocessing through to loading the executable image into memory to run the program.
Compiling with gcc using different options
-E Preprocess only; do not compile, assemble or link
-S Compile only; do not assemble or link
-c Compile and assemble, but do not link
-o <file> Place the output into <file>
We will use the hello.c program below to explain all four phases.
#include<stdio.h> //Line 1
#define MAX_AGE 21 //Line 2
int main(void)
{
    printf( "Maximum age : %d ",MAX_AGE); //Line 5
}
Preprocessing is the very first stage through which source code passes. In this stage the following tasks are done:
- Macro substitution
- Comments are stripped off
- Expansion of the included files
To understand preprocessing better, you can compile the above hello.c program using the -E flag with gcc. This will generate the preprocessed file hello.i.
>gcc -E hello.c -o hello.i
//hello.i file content
# 1 "hello.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "hello.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 28 "/usr/include/stdio.h" 3 4
Truncated some text…
extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__));
# 918 "/usr/include/stdio.h" 3 4
# 2 "hello.c" 2
printf( "Maximum age : %d ",21);
In the above code (hello.i) you can see that macros are substituted with their values (MAX_AGE with 21 in the printf statement), comments are stripped off (//Line 1, //Line 2 and //Line 5), and included header files are expanded (<stdio.h>).
Compilation is the second stage. It takes the output of the preprocessor (hello.i) and generates assembly source code (hello.s).
> gcc -S hello.i -o hello.s
//hello.s file content
.string "Maximum age : %d "
.type main, @function
movq %rsp, %rbp
.cfi_offset 6, -16
movl $.LC0, %eax
movl $21, %esi
movq %rax, %rdi
movl $0, %eax
.cfi_def_cfa 7, 8
.size main, .-main
.ident "GCC: (GNU) 4.4.2 20091027 (Red Hat 4.4.2-7)"
The above is assembly code, which the assembler can read and translate into machine code.
Assembly is the third stage of compilation. It takes the assembly source code (hello.s) and translates it into relocatable machine code. The assembler output is stored in an object file (hello.o).
>gcc -c hello.s -o hello.o
The output of this stage is a machine-level file (hello.o), so we cannot read its content as text. If you still try to open hello.o and view it, you'll see something that is totally unreadable:
//hello.o file content
^@^@^@^@¾^U^@^@^@H<89>ç¸^@^@^@^@è^@^@^@^@éã^@^@^@Maximum age :%d
The one thing we can say about this file is that it is in ELF (Executable and Linkable Format). This is the standard format on Linux for the machine-level object files and executables that gcc produces.
Linking is the final stage of compilation. It takes one or more object files or libraries as input and combines them to produce a single executable file (hello). In doing so, it resolves references to external symbols, assigns final addresses to procedures/functions and variables, and revises code and data to reflect the new addresses (a process called relocation).
> gcc hello.o -o hello
> ./hello
Maximum age : 21
Now you know the C program compilation steps (preprocessing, compiling, assembly, and linking). There is a lot more to explain about the linking phase.