Tuesday, January 8, 2013

C++ Compiling

I’ve found that the college doesn’t really do a good job of describing what, exactly, is happening when you hit ‘enter’ after the “g++ filename.cpp” command has been typed in. This is a general post I hope to be able to point people at in the future, and will be a -very- broad overview of what is happening when a file gets compiled. I’m not going to talk about tokens or machine code or any of that; I’m going to cover, quickly, the most common preprocessor directives, and broadly what it is the linker does.

First, preprocessor directives. If a line begins with a #, it’s a preprocessor directive. These directives are not C++ code; they’re instructions to the preprocessor, and as such, they are carried out before actual proper compilation begins. The #include directive is essentially a copy and paste operation. The file named in the #include will have its entire contents copied, and then pasted into the file at the location of the #include. If you use #include "file", with the double quotes, the preprocessor will typically start its search in the same directory as the file doing the including. If instead you use #include <file>, with the angle brackets, the preprocessor will start looking at some location defined by your compiler, typically where your standard header files live.

Another useful set of preprocessor directives is #ifndef, #define, and #endif, which are used together. The first one can be read as ‘if not defined’. It should be at the top of every header file you write, and should be immediately followed by a #define. What this tells the preprocessor is ‘if this name hasn’t been defined yet, define it now, and keep going’. Normally the name will be similar to the name of the header file. So, for my header, nonsense.h, the top of the file should look like this:

#ifndef NONSENSE_H
#define NONSENSE_H

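At the end of all the code in my header, I will put in a #endif. If NONSENSE_H has already been defined, the preprocessor skips everything between the #ifndef and the #endif. What this does is make sure the same header doesn’t get pasted into one .cpp file more than once, even if it’s included along several different routes. So if main.cpp has a #include "nonsense.h" and also includes another header which itself has a #include "nonsense.h", the contents of nonsense.h will still only show up once in main.cpp. Each .cpp file is compiled on its own, so nonsense.cpp and whatever.cpp still get their own single copy. This is good, because nonsense.h should, as a header file, have all my declarations in it, including things like class definitions, and C++ will get cranky ( read: not compile ) if it finds the same class defined twice in one file.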
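
To see the guard doing its job, here is a minimal sketch that pastes the contents of a made-up nonsense.h into one file twice, the same way two #include lines would. The struct Nonsense and the function guarded_value are hypothetical names used only for illustration.

```cpp
// Sketch: the contents of a hypothetical nonsense.h, pasted in twice,
// the way two #include "nonsense.h" lines would paste it.

// ---- first copy ----
#ifndef NONSENSE_H
#define NONSENSE_H
struct Nonsense {
    int value;
};
#endif

// ---- second copy ----
// NONSENSE_H is already defined, so the preprocessor skips everything
// up to the matching #endif and the struct is not defined a second time.
#ifndef NONSENSE_H
#define NONSENSE_H
struct Nonsense {
    int value;
};
#endif

// Uses the single surviving definition of Nonsense.
int guarded_value() {
    Nonsense n;
    n.value = 42;
    return n.value;
}
```

If you delete the #ifndef, #define, and #endif lines from both copies, the compiler should reject the file because struct Nonsense is defined twice.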

There is another common use for #define. It is often used to create constants, for example #define PI 3.14. It’s worth noting that it’s not making an actual constant, like const double pi = 3.14. All it’s doing is forcing a text substitution. Once the preprocessor hits this particular #define, it goes through the rest of the file, and everywhere it sees PI as a separate word, it replaces it with 3.14. Remember, the preprocessor does not know C++, so PI has no type and no address, and the compiler never even sees the name. Be careful when doing this to not treat PI ( or whatever your #define is ) like a variable.
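
Here is a small sketch of how the substitution can bite. PI is the macro from above; TWO_PI, two_pi, macro_result, and const_result are hypothetical names I made up for the comparison, and TWO_PI is left without parentheses on purpose.

```cpp
#define PI 3.14                 // from the post: plain text substitution
#define TWO_PI PI + PI          // hypothetical macro, deliberately unparenthesized

const double pi = 3.14;         // a real, typed constant
const double two_pi = pi + pi;  // evaluated once, stored as a value

// After substitution this reads: 3.14 + 3.14 * 2, so only the second
// 3.14 is doubled.
double macro_result() {
    return TWO_PI * 2;
}

// two_pi is a value (6.28), so this is 6.28 * 2.
double const_result() {
    return two_pi * 2;
}
```

The macro version comes out to roughly 9.42 and the const version to roughly 12.56, even though the two functions look the same. Wrapping the macro body in parentheses would fix this particular case, but a real constant avoids the problem altogether.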

There are other uses for the preprocessor directives, but those are the most common ones. Now onto the linker. The linker is a promise keeper, of sorts. It checks the promises that you as a programmer have made, and makes sure that you’ve kept those promises in code. The promises you’ve -made- are your declarations. Keeping those promises happens in your definitions. A function declaration, for example, is this:

int factorial( int n );

That’s the promise. By making this declaration, you are promising that somewhere, in one of the files that get linked together, there will be a definition. Your definition might be something like this:

int factorial( int n ) {
    // Base case: 0! and 1! are both 1. Using <= also stops the recursion when n is 0.
    if ( n <= 1 ) return 1;
    return ( n * factorial( n - 1 ) );
}

And that’s a promise kept. If you break the promise, by declaring and calling a function that never gets defined anywhere, the linker is what complains, typically with an ‘undefined reference’ error. The linker also takes all the .o files generated during compilation and links them together into one executable file. For your own code this will usually be static linking, where all the code actually ends up in that one executable file. However, you will also run across dynamic linking; on most systems the standard library itself is linked this way by default. Dynamic linking is complicated, and all I’ll say about it here is that when you’re using dynamic linking, the code will -not- all be combined into a single executable file. Instead, there will be the executable file, and it will need some external code in order to run properly, usually in the form of DLLs ( dynamic-link libraries, on Windows ) or SO ( shared object, on Linux ) files.
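
The promise-then-keep pattern can be sketched in a single file. This is only an illustration: factorial_of_five is a hypothetical helper, and in a real project the definition of factorial would often live in a separate .cpp file, with the linker connecting the call to it.

```cpp
// The promise: a declaration. No body yet.
int factorial( int n );

// This function can already call factorial(), because the compiler
// takes the declaration above on trust.
int factorial_of_five() {
    return factorial( 5 );
}

// The promise kept: the definition. If this were missing from every
// file being linked, the build would fail at the link step.
int factorial( int n ) {
    if ( n <= 1 ) return 1;   // base case covers 0 and 1
    return n * factorial( n - 1 );
}
```

Calling factorial_of_five() from a main() gives 120. Remove the definition at the bottom and the file still compiles to a .o on its own; it’s only when you try to build the executable that things fall over.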
