ASSEMBLER, LINKER AND LOADER




ASSEMBLER

An assembler is a program that converts assembly language into machine code.
It takes the basic commands and operations from assembly code and converts them into binary code that can be recognized by a specific type of processor.


The assembler reads the assembly language source code twice before it outputs object code. Each read of the source code is called a pass.

This is because assembly language source code often contains forward references. A forward reference occurs when a label is used as an operand, for example as a branch target, earlier in the code than the definition of the label. The assembler cannot know the address of the forward reference label until it reads the definition of the label.

During each pass, the assembler performs different functions. In the first pass, the assembler:

Checks the syntax of the instruction or directive. It faults if there is an error in the syntax, for example if a label is specified on a directive that does not accept one.

Determines the size of the instruction and data being assembled and reserves space.

Determines offsets of labels within sections.

Creates a symbol table containing label definitions and their memory addresses.

In the second pass, the assembler:

Faults if an undefined reference is specified in an instruction operand or directive.

Encodes the instructions using the label offsets from pass 1, where applicable.

Generates relocations.

Generates debug information if requested.

Outputs the object file.
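
The two passes described above can be sketched in a few lines of Python. This is only a toy model, not how any real assembler is implemented: it assumes every instruction is 4 bytes, that a label starts in the first column, and that the mnemonics (NOP, B) and the tuple-based output format are hypothetical.

```python
# Toy two-pass assembler. Assumptions (not from any real tool): every
# instruction occupies 4 bytes, and the mnemonics and output are made up.
INSTRUCTION_SIZE = 4


def parse(line):
    """Split a source line into (label or None, mnemonic, operand or None)."""
    label = None
    if not line[0].isspace():          # a label starts in the first column
        label, _, line = line.partition(" ")
    parts = line.split()
    mnemonic = parts[0]
    operand = parts[1] if len(parts) > 1 else None
    return label, mnemonic, operand


def pass_one(lines):
    """Size each instruction and record the offset of every label."""
    symbols, offset = {}, 0
    for line in lines:
        label, _, _ = parse(line)
        if label is not None:
            symbols[label] = offset
        offset += INSTRUCTION_SIZE
    return symbols


def pass_two(lines, symbols):
    """Encode instructions, resolving label operands from the symbol table."""
    output, offset = [], 0
    for line in lines:
        _, mnemonic, operand = parse(line)
        if operand is not None and operand not in symbols:
            raise ValueError(f"undefined reference: {operand}")
        target = symbols[operand] if operand is not None else None
        output.append((offset, mnemonic, target))
        offset += INSTRUCTION_SIZE
    return output


source = [
    "start NOP",
    "      B done",     # forward reference: 'done' is defined later
    "      NOP",
    "done  NOP",
]

symbols = pass_one(source)             # pass 1 learns where 'done' lives
encoded = pass_two(source, symbols)    # pass 2 can now encode 'B done'
```

The branch to done can only be encoded in pass 2 because its offset is not known until pass 1 has read the whole source.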

Memory addresses of labels are determined and finalized in the first pass, so the assembly code must not change during the second pass: every instruction must be seen in both passes. In particular, you must not define a symbol after a :DEF: test for that symbol. The assembler faults if it sees code in pass 2 that was not seen in pass 1, or code in pass 1 that is not seen in pass 2.

Line not seen in pass 1

The following example shows that num EQU 42 is not seen in pass 1 but is seen in pass 2:
    AREA x,CODE
    [ :DEF: foo
num EQU 42
    ]
foo DCD num
    END
Assembling this code generates the error:
A1903E: Line not seen in first pass; cannot be assembled.

Line not seen in pass 2

The following example shows that MOV r1, r2 is seen in pass 1 but not in pass 2:
    AREA x,CODE
    [ :LNOT: :DEF: foo
    MOV r1, r2
    ]
foo MOV r3, r4
    END
Assembling this code generates the error:
A1909E: Line not seen in second pass; cannot be assembled.
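
Why the two passes disagree can be illustrated with a small Python simulation. It is a deliberately simplified model and not the assembler's real logic: it assumes un-nested conditional blocks, treats any line starting in the first column as a label definition, and assumes pass 2 begins with the symbols collected in pass 1. The function names are hypothetical.

```python
def lines_seen(source, symbols):
    """Return the lines one pass would assemble, updating symbols as it goes.

    Toy model: '[ :DEF: name' or '[ :LNOT: :DEF: name' opens a conditional
    block, ']' closes it, and a line starting in column 0 defines a label.
    """
    seen, active = [], True
    for line in source:
        words = line.split()
        if words[0] == "[":
            defined = words[-1] in symbols
            active = (not defined) if ":LNOT:" in words else defined
        elif words[0] == "]":
            active = True
        elif active:
            if not line[0].isspace():
                symbols.add(words[0])      # label definition
            seen.append(line.strip())
    return seen


def compare_passes(source):
    """Return (lines only in pass 1, lines only in pass 2)."""
    symbols = set()
    first = lines_seen(source, symbols)    # pass 1 starts with no symbols
    second = lines_seen(source, symbols)   # pass 2 reuses pass 1's symbols
    only_first = [l for l in first if l not in second]
    only_second = [l for l in second if l not in first]
    return only_first, only_second


example_1 = [
    "    AREA x,CODE",
    "    [ :DEF: foo",
    "num EQU 42",
    "    ]",
    "foo DCD num",
    "    END",
]
example_2 = [
    "    AREA x,CODE",
    "    [ :LNOT: :DEF: foo",
    "    MOV r1, r2",
    "    ]",
    "foo MOV r3, r4",
    "    END",
]
```

In both examples the :DEF: test is evaluated before foo is defined in pass 1 and after it is known in pass 2, so the conditional block takes a different branch in each pass.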





LINKER

In computer science, a linker is a computer program that takes one or more object files generated by a compiler or assembler and combines them into a single executable program.




How the Linker Works

The compiler compiles a single high-level language file (a C file, for example) into a single object module file. The linker (ld) works only with object modules, which are the smallest unit it can link together.

Typically, on the linker command line, you specify a set of previously compiled object modules followed by a list of libraries, including the Standard C Library. The linker takes the object modules named on the command line and links them together. Afterwards there will probably be a set of "undefined references". A reference is a use of a symbol, most often a function call. An undefined reference is such a use, for example a function call, with no matching definition among the modules linked so far.

The linker then goes through the libraries, in order, to match the undefined references with function definitions found in the libraries. When it finds a function that matches a call, the linker links in the object module that contains the function. This part is important: the linker links in THE ENTIRE OBJECT MODULE in which the function is located. Remember, the linker knows nothing about the internals of an object module other than its symbol names (such as function names), which is why the object module is the smallest unit it can link.

When no undefined references remain, the linker has linked everything it needs and outputs the final application.
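
The resolution process described above can be sketched in Python. This is a simplified model and not the algorithm of ld or any other real linker: modules are plain dictionaries, a library is an ordered list of modules, and all module and symbol names (main.o, print_line and so on) are hypothetical.

```python
def link(objects, libraries):
    """Return the names of the modules linked into the final program.

    Each module is a dict with 'name', 'defines' and 'references' (sets of
    symbol names). A library is an ordered list of such modules.
    """
    linked = list(objects)
    defined = set().union(*(m["defines"] for m in linked))
    undefined = set().union(*(m["references"] for m in linked)) - defined

    for library in libraries:                 # libraries are searched in order
        progress = True
        while progress:                       # a pulled-in module may need more
            progress = False
            for module in library:
                if module in linked:
                    continue
                if module["defines"] & undefined:
                    linked.append(module)     # the whole module comes along
                    defined |= module["defines"]
                    undefined |= module["references"]
                    undefined -= defined
                    progress = True

    if undefined:
        raise ValueError(f"undefined references: {sorted(undefined)}")
    return [m["name"] for m in linked]


main_o = {"name": "main.o", "defines": {"main"}, "references": {"print_line"}}
libc = [
    {"name": "print.o", "defines": {"print_line", "print_error"},
     "references": {"write_bytes"}},
    {"name": "write.o", "defines": {"write_bytes"}, "references": set()},
    {"name": "math.o", "defines": {"square_root"}, "references": set()},
]

linked_names = link([main_o], [libc])
```

Here print.o is pulled in whole, including print_error, which nothing calls, while math.o is left out because none of its symbols are referenced.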







LOADER

The loader is the part of the operating system that loads an executable from disk into primary memory (RAM) for execution. It allocates memory space for the executable module in main memory and then transfers control to the first instruction of the program.

How the Loader Works

When a compiled program (for example, one built with the xlC compiler) is traced with the strace command, the first system call reported is usually execve(), which invokes the loader.

The loader sets up the process, which involves:

Reading the file and creating an address space for the process.

Creating page table entries for the instructions, data and program stack, and initializing the register set.

Executing a jump to the first instruction of the program, which generally causes a page fault that brings the first page of instructions into memory.
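
The execve() call mentioned above can be tried directly from Python on a Unix-like system. The sketch below is only an illustration: it assumes os.fork is available, and the helper name run_via_execve is hypothetical. It shows the call that hands a program to the loader, not the loader's internal steps.

```python
import os
import sys


def run_via_execve(argv, env=None):
    """Fork, replace the child's image with argv[0], return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: on success execve() never returns, because the kernel has
        # replaced this process image with the newly loaded program.
        try:
            os.execve(argv[0], argv, dict(os.environ) if env is None else env)
        finally:
            os._exit(127)     # reached only if execve() failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


# Load a fresh Python interpreter that exits immediately with status 7.
exit_code = run_via_execve([sys.executable, "-c", "raise SystemExit(7)"])
```

Running a script like this under strace -f should show the execve() call made by the child process.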