Compilation process in C

What is Compilation?

Compilation in C is the process of translating human-readable C source code into machine code that a computer's processor can execute. The process involves several stages: preprocessing, compiling, assembling, and linking. Consider the following small C program:


#include <stdio.h>
int main() {
    printf("Hello dotnetustad.com");
    return 0;
}

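On disk, the source file above is itself stored as binary data. The bits below are the ASCII encoding of the source text, one byte per character. They show how the computer stores the file, not the processor instructions that compilation produces; the compiled executable contains different bytes.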


00100011 01101001 01101110 01100011 01101100 01110101 01100100 01100101 00100000 00111100 01110011 
01110100 01100100 01101001 01101111 00101110 01101000 00111110 00001010 01101001 01101110 01110100 
00100000 01101101 01100001 01101001 01101110 00101000 00101001 00100000 01111011 00001010 00100000 
00100000 00100000 00100000 01110000 01110010 01101001 01101110 01110100 01100110 00101000 00100010 
01001000 01100101 01101100 01101100 01101111 00100000 01100100 01101111 01110100 01101110 01100101 
01110100 01110101 01110011 01110100 01100001 01100100 00101110 01100011 01101111 01101101 00100010 
00101001 00111011 00001010 00100000 00100000 00100000 00100000 01110010 01100101 01110100 01110101 
01110010 01101110 00100000 00110000 00111011 00001010 01111101 

Compiling C code transforms the source code into object code and, finally, into an executable. The process can be broken down into four distinct stages: preprocessing, compilation, assembly, and linking.

During preprocessing, the source code is taken as input and its comments are removed. The preprocessor also interprets preprocessor directives. For instance, if the program contains a directive such as #include <stdio.h>, the preprocessor replaces that directive with the contents of the 'stdio.h' file.

Here is a more detailed overview of the typical path from C source code to a running program:

  1. Preprocessing:
    • The first step is preprocessing, where the C preprocessor (often a separate program) processes the source code before actual compilation.
    • It handles preprocessor directives like #include, #define, and #ifdef, performs macro substitutions, and removes comments.
    • The output of this stage is called the "preprocessed code."
  2. Compilation:
    • The preprocessed code is fed into the C compiler (e.g., gcc, clang, or MSVC), which generates assembly code or an intermediate representation.
    • The compiler checks the code for syntax and type errors and looks for optimization opportunities.
    • If errors are found, they are reported as compilation errors.
  3. Assembly:
    • The assembler translates the assembly code into machine code stored in an object file. Some compilers produce object code directly, so no separate assembly file appears unless you ask for one.
  4. Linking:
    • In the case of multiple source files, each source file is compiled separately into an object file (e.g., .o or .obj).
    • The linker (e.g., ld on Unix-like systems) is responsible for linking these object files together along with any necessary system libraries to create the final executable.
    • It resolves references to functions or variables defined in other files and ensures that everything is properly connected.
  5. Output Generation:
    • The linker generates an executable file (e.g., an .exe on Windows or a binary file on Unix-like systems).
    • This executable file is what you can run to execute your C program.
  6. Execution:
    • You can now run the generated executable. For example, on the command line, you would typically type ./program (Unix-like systems) or program.exe (Windows).

[Figure: simplified diagram of the compilation process]

During these steps, the compiler and linker perform various optimizations to improve the program's performance and size. These optimizations can include removing dead code, inlining functions, and reordering instructions.

Understanding the compilation process is essential for diagnosing and fixing compilation errors, optimizing code, and managing dependencies when working with C programs.

Let's understand through an example:

helloWorld.c


#include <stdio.h>
int main() {
    printf("Hello dotnetustad.com");
    return 0;
}

[Figure: flowchart of the compilation stages for helloWorld.c]

The flowchart traces the program through the following sequence of steps:

  1. First, the input file, helloWorld.c, is passed to the preprocessor, which expands the source code by including headers and substituting macros. The resulting file is referred to as helloWorld.i.
  2. The expanded source, helloWorld.i, is passed to the compiler, which translates it into assembly code. The generated file carries the .s extension: helloWorld.s.
  3. The assembly code is passed to the assembler, which translates it into object code stored in an object file (e.g., helloWorld.o or helloWorld.obj).
  4. The linker combines the object file with any other object files and the required library code, such as the library that provides printf, and produces the final executable file.
  5. When the executable is run, the loader loads it into memory and the program begins executing.