5: Compiler Optimization
A good compiler can have a huge effect on code performance. Most PC compilers are good, but not great, at optimization. Be aware that sometimes the compiler won't perform optimizations even though it can. The compiler assigns a higher priority to producing consistent and correct code than optimizing performance. Be thankful for small favors.
5.1: Compiler C Language Settings
The following table lists all of the MS Visual C++ 6.0 "C" optimizations for reference. Alternate methods are given when an optimization can be specified directly in the code. Microsoft default values for release builds are highlighted.
| Name | Option | Description |
| Blend | /GB | Optimize for 386 and above |
| Pentium | /G5 | Optimize for Pentium and above |
| Pentium Pro | /G6 | Optimize for Pentium Pro and above |
| Windows | /GA | Optimize for Windows (specifically access to thread-specific data) |
| DLL | /GD | Not currently implemented. Reserved for future use. |
|  | /Gd | Caller cleans stack. Slow. Allows variable argument functions. Alternate: __cdecl |
| Stdcall | /Gz | Callee cleans stack. Fast. No variable argument functions. Alternate: __stdcall |
|  | /Gr | Callee cleans stack. Uses registers. Fastest. No variable argument functions. Can't be used with _export. Alternate: __fastcall |
| String pooling | /Gf | Put duplicate strings in one memory location. |
| String pooling RO | /GF | Put duplicate strings in one read-only memory location. |
| Stack probes off | /Gs | Turn off stack checking. Alternate: #pragma check_stack |
| Func-level linking | /Gy | Linker includes only the functions that are referenced rather than the entire contents of the OBJ |
| Small | /O1 | Same as /Og /Oy /Ob1 /Gs /Gf /Gy /Os (global opts, omit frame ptr, allow inlines, stack probes off, string pooling, func-level linking, favor code size over speed) |
| Fast | /O2 | Same as /Og /Oy /Ob1 /Gs /Gf /Gy /Oi /Ot (global opts, omit frame ptr, allow inlines, stack probes off, string pooling, func-level linking, intrinsic functions, favor code speed) |
| No aliasing | /Oa | Assume no aliasing occurs within functions. Alternate: #pragma optimize("a") |
| Intra-func aliasing | /Ow | Assume aliasing occurs across function calls but not within function bodies. Alternate: #pragma optimize("w") |
| Disable all opts | /Od | Turn off all optimizations |
| Global opts | /Og | Turn on loop, common subexpression and register optimizations. Alternate: #pragma optimize("g") |
| Intrinsic functions | /Oi | Replace specific functions with inline versions (memcpy, strcpy, strlen, etc.). Alternate: #pragma intrinsic/function |
| Float consistency | /Op | Increase the precision of floating-point operations at the expense of speed and size |
| Small code | /Os | Favor code size over speed. Alternate: #pragma optimize("s") |
| Fast code | /Ot | Favor code speed over size. Alternate: #pragma optimize("t") |
| Full optimizations | /Ox | Enable the following: /Ob1 /Og /Oi /Ot /Oy /Gs |
| Omit frame pointer | /Oy | Suppress creation of frame pointers on call stack. Frees the EBP register for other uses. Alternate: #pragma optimize("y") |
| Struct packing | /Zp8 | Sets the structure member alignment (/Zpn, where n = 1, 2, 4, 8 (default) or 16). Smaller values generate smaller, slower code. Larger values generate larger, faster code. |
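Several rows above name an in-source alternate to a command-line switch. The following is a minimal sketch of how two of them might be used, assuming a Microsoft compiler; the pragmas are guarded by _MSC_VER so the file still builds elsewhere. The function names SumBytes and SumBytesUnoptimized are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(_MSC_VER)
// In-source alternate to /Oi: request the inline (intrinsic) form of memcpy.
#pragma intrinsic(memcpy)
#endif

// Hypothetical helper: copies up to 64 bytes into a local buffer and sums them.
unsigned SumBytes(const unsigned char* src, std::size_t count)
{
    assert(src != NULL);
    unsigned char buffer[64];
    if (count > sizeof(buffer))
        count = sizeof(buffer);
    std::memcpy(buffer, src, count);

    unsigned total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += buffer[i];
    return total;
}

#if defined(_MSC_VER)
// Turn every optimization off for the functions that follow,
// for example to work around a suspected optimizer problem.
#pragma optimize("", off)
#endif

// Sums the bytes in place; compiled unoptimized under MSVC.
unsigned SumBytesUnoptimized(const unsigned char* src, std::size_t count)
{
    assert(src != NULL);
    unsigned total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += src[i];
    return total;
}

#if defined(_MSC_VER)
// Restore whatever /O options were given on the command line.
#pragma optimize("", on)
#endif
```

Both functions return the same sum for the same input; only the generated code differs. Scoping a pragma to one or two functions keeps the rest of the file on the project-wide settings.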
5.2: Compiler C++ Language Settings
The following table lists all of the Microsoft Visual C++ 6.0 "C++" optimizations for reference. Alternate methods are given when an optimization can be specified directly in the code. Microsoft default values for release builds are highlighted.
| Name | Option | Description |
|  | __declspec(novtable) | Stops the compiler from generating code to initialize the vfptr in the constructor. Apply to pure interface classes for code size reduction. |
|  | __declspec(nothrow) | Stops the compiler from tracking unwindable objects. Apply to functions that don't throw exceptions for code size reduction. Same as using C++ throw() specification. |
|  | /GR- | Turn off run-time type information. |
|  | /GX | Turn on exception handling. |
|  | /Ob1 | Allow functions marked inline to be inline. Alternate: inline, __forceinline, #pragma inline_depth/inline_recursion |
|  | /Ob2 | Inline functions deemed appropriate by compiler. Alternate: #pragma auto_inline/inline_depth/inline_recursion |
| Ctor displacement | /vd0 | Disable constructor displacement. Choose this option only if no class constructors or destructors call virtual functions. Use /vd1 (default) to enable. Alternate: #pragma vtordisp |
| Best case ptrs | /vmb | Use best case "pointer to class member" representation. Use this option if you always define a class before you declare a pointer to a member of the class. The compiler will issue an error if it encounters a pointer declaration before the class is defined. Alternate: #pragma pointers_to_members |
| Gen. purpose ptrs | /vmg | Use general purpose "pointer to class member" representation (the opposite of /vmb). Required if you need to declare a pointer to a member of a class before defining the class. Requires one of the following inheritance models: /vmm, /vms, /vmv. Alternate: #pragma pointers_to_members |
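The /vmb requirement, that a class be defined before any pointer to one of its members is declared, is easier to see in code. The sketch below shows the ordering that satisfies it. Shape, ShapeQuery and Query are hypothetical names chosen for illustration, and the code is ordinary C++ that does not depend on the switch itself.

```cpp
#include <cassert>

// The class is fully defined first...
class Shape
{
public:
    Shape(int width, int height) : width_(width), height_(height) {}
    int Area() const { return width_ * height_; }
    int Perimeter() const { return 2 * (width_ + height_); }
private:
    int width_;
    int height_;
};

// ...and only then is a pointer to one of its members declared,
// which is the ordering /vmb expects.
typedef int (Shape::*ShapeQuery)() const;

// Calls whichever member function the pointer selects.
int Query(const Shape& shape, ShapeQuery query)
{
    assert(query != 0);
    return (shape.*query)();
}
```

A call such as Query(Shape(3, 4), &Shape::Area) dispatches through the member pointer. Had ShapeQuery been declared against a forward declaration of Shape, /vmb would be the wrong choice and /vmg would be needed instead.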
5.3: The Ultimate Compiler Settings
The following table lists the ultimate options for fast programs. Microsoft default values for release builds are highlighted.
| Name | Option | Description |
|  | __declspec(novtable) | Stops compiler from generating code to initialize the vfptr in the constructor. Apply to pure interface classes. |
|  | __declspec(nothrow) | Stops compiler from tracking unwindable objects. Apply to functions that don't throw exceptions. Recommend using the Standard C++ exception specification throw() instead. |
| Pentium Pro | /G6 | Optimize for Pentium Pro and above (program might not run on Pentium) |
| Windows | /GA | Optimize for Windows |
|  | /Gr | Fastest calling convention |
| String pooling RO | /GF | Merge duplicate strings into one read-only memory location |
|  | /GR- | Turn off run-time type information. |
| Stack probes off | /Gs | Turn off stack checking |
|  | /GX- | Turns off exception handling (assumes program isn't using exception handling) |
| Func-level linking | /Gy | Only include functions that are referenced |
| Assume no aliasing | /Oa | Assume no aliasing occurs within functions |
|  | /Ob2 or /Ob1 | Inline any function deemed appropriate by compiler or turn inlining on. Alternates: inline, __forceinline |
| Global opts | /Og | Full loop, common subexpression and register optimizations |
| Intrinsic functions | /Oi | Replaces specific functions with inline versions (memcpy, strcpy, etc.) |
| Fast code | /Ot | Favor code speed over size (see notes below) |
| Omit frame pointer | /Oy | Omit frame pointer |
| Ctor displacement | /vd0 | Disable constructor displacement. |
| Best case ptrs | /vmb | Use best case "pointer to class member" representation |
Be aware that some of these options can cause your program to fail. See section 5.7 on unsafe optimizations. There are also some optimizations that you might choose not to use for your specific application. For instance, if you're using RTTI or exception handling, don't turn those options off.
Optimizing for space can actually be faster than optimizing for speed, because programs optimized for speed are almost always larger and therefore more likely to cause additional paging than programs optimized for space. In fact, all Microsoft device drivers and Windows NT itself are built to minimize space. Try both ways and see which is faster for your app.
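Trying both ways means measuring. The following is a minimal timing sketch that could be compiled once with /O1 and once with /O2 and the results compared. It assumes a C++11 compiler for the chrono header, which is newer than the compiler this article targets. Workload and TimeWorkloadMs are hypothetical names, and Workload stands in for whatever code is actually under test.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical workload standing in for the code under test.
unsigned long Workload(std::size_t n)
{
    unsigned long sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += static_cast<unsigned long>(i % 7);
    return sum;
}

// Runs Workload `repeats` times and returns the elapsed
// wall-clock time in milliseconds.
double TimeWorkloadMs(std::size_t n, int repeats)
{
    assert(repeats > 0);
    typedef std::chrono::steady_clock Clock;

    // The volatile sink keeps the optimizer from discarding the calls.
    volatile unsigned long sink = 0;

    const Clock::time_point start = Clock::now();
    for (int r = 0; r < repeats; ++r)
        sink = sink + Workload(n);
    const Clock::time_point stop = Clock::now();

    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A small loop like this one fits in cache either way, so it will not show the paging effect described above. Time the real application under a realistic memory load before drawing conclusions.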
5.4: Disable VTable Initialization
The Microsoft-specific __declspec(novtable) option instructs the compiler not to initialize the virtual function pointer in the constructor of the given object. Normally, this would be a "bad thing." However, for abstract classes, there's no reason to initialize the pointer, because it will always be properly initialized when a concrete class derived from the object is constructed.
By the way, this option is misnamed. It sounds like it removes the vtable itself, which isn't at all true. The option should be called noinitvtable.
Now consider four example classes. Image is a typical abstract class. Frame is derived from Image. ImageNV and FrameNV are the same as Image and Frame respectively, except ImageNV uses the novtable option.
The ImageNV constructor has two fewer instructions, namely the instructions that initialize the virtual function table pointer. The optimized constructor is 30% faster. Microsoft's own ATL class library uses this compiler option extensively.
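The four classes described above might look like the following minimal sketch. The class names come from the text. The Width member, the value 640, the NOVTABLE macro and the CheckSameBehavior function are assumptions added for illustration, and the macro expands to nothing on compilers other than Microsoft's.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#define NOVTABLE __declspec(novtable)
#else
#define NOVTABLE  // assumption: no equivalent on other compilers
#endif

// Typical abstract class: its constructor stores a vfptr that is
// overwritten as soon as the derived constructor runs.
class Image
{
public:
    virtual ~Image() {}
    virtual int Width() const = 0;
};

class Frame : public Image
{
public:
    virtual int Width() const { return 640; }
};

// Same interface, but the constructor skips the vfptr store.
class NOVTABLE ImageNV
{
public:
    virtual ~ImageNV() {}
    virtual int Width() const = 0;
};

class FrameNV : public ImageNV
{
public:
    virtual int Width() const { return 640; }
};

// Hypothetical check: both hierarchies behave the same through
// their base-class interfaces.
void CheckSameBehavior()
{
    Frame frame;
    FrameNV frameNV;
    const Image& image = frame;
    const ImageNV& imageNV = frameNV;
    assert(image.Width() == imageNV.Width());
}
```

The option is only safe here because ImageNV is never instantiated on its own and its constructor and destructor call no virtual functions.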
5.5: Indicate Functions that Don't Throw Exceptions
The Microsoft-specific __declspec(nothrow) option instructs the compiler not to track unwindable objects as it normally would in case an exception is thrown and objects must be unwound on the stack. A more portable method is to use the Standard C++ exception specification throw(). This indicates that the specified function will not throw an exception.
Consider three example functions. MayThrow is a typical function. The compiler must assume that it could throw an exception. NoThrowMS is specified using nothrow. NoThrowStdC is specified using throw(). Calling NoThrowMS or NoThrowStdC is about 1% faster than calling MayThrow.
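The three functions described above might be declared as in the following minimal sketch. The function names come from the text; the bodies, the macros NOTHROW_MS and NOTHROW_STD, and the SumOfAll caller are assumptions for illustration. One further assumption: on compilers that report C++11 or later, the sketch substitutes noexcept for throw(), since the older spelling has since been deprecated.

```cpp
#include <cassert>
#include <stdexcept>

#if defined(_MSC_VER)
#define NOTHROW_MS __declspec(nothrow)
#else
#define NOTHROW_MS  // assumption: no equivalent on other compilers
#endif

#if __cplusplus >= 201103L
#define NOTHROW_STD noexcept  // assumption: modern spelling of throw()
#else
#define NOTHROW_STD throw()
#endif

// Typical function: the compiler must assume it can throw.
int MayThrow(int value)
{
    if (value < 0)
        throw std::invalid_argument("value must be non-negative");
    return value * 2;
}

// Microsoft-specific promise that no exception escapes.
NOTHROW_MS int NoThrowMS(int value)
{
    return value * 2;
}

// Standard C++ promise that no exception escapes.
int NoThrowStdC(int value) NOTHROW_STD
{
    return value * 2;
}

// Hypothetical caller: only the MayThrow call forces the compiler
// to keep unwind bookkeeping for this function.
int SumOfAll(int value)
{
    assert(value >= 0);
    return MayThrow(value) + NoThrowMS(value) + NoThrowStdC(value);
}
```

Either annotation is a promise, not a check made on your behalf. Mark a function this way only if nothing it calls can throw.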
5.6: Use the Fastcall Calling Convention
The Microsoft Visual C++ compiler supports the following function calling conventions: cdecl, stdcall and fastcall. Fastcall is roughly 2% faster than cdecl on a typical function call. Use fastcall. Your program will thank you.
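Besides the /Gd, /Gz and /Gr switches, a convention can be selected per function with the __cdecl, __stdcall and __fastcall keywords. The following is a minimal sketch assuming a 32-bit x86 build with a Microsoft compiler. The CALL_ macros and the Add functions are hypothetical, and on other targets the macros expand to nothing so the code still compiles.

```cpp
#include <cassert>

#if defined(_MSC_VER) && defined(_M_IX86)
#define CALL_CDECL    __cdecl
#define CALL_STDCALL  __stdcall
#define CALL_FASTCALL __fastcall
#else
// Assumption: on other targets the keywords are unavailable or have no effect.
#define CALL_CDECL
#define CALL_STDCALL
#define CALL_FASTCALL
#endif

// Caller cleans the stack; arguments are passed on the stack.
int CALL_CDECL AddCdecl(int a, int b) { return a + b; }

// Callee cleans the stack; arguments are passed on the stack.
int CALL_STDCALL AddStdcall(int a, int b) { return a + b; }

// Callee cleans the stack; the first arguments travel in registers.
int CALL_FASTCALL AddFastcall(int a, int b) { return a + b; }

// Hypothetical check: the convention changes how arguments travel,
// never the result.
void CheckConventionsAgree(int a, int b)
{
    assert(AddCdecl(a, b) == AddStdcall(a, b));
    assert(AddStdcall(a, b) == AddFastcall(a, b));
}
```

The keyword becomes part of the function's type, so a declaration in a header and its definition must agree. Functions with variable argument lists have to stay cdecl.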
5.7: Warning: Unsafe Optimizations Ahead
Don't change your optimization settings recklessly. Although most settings would never cause your program to crash, there are some settings that should be used only when you know your code is conforming to Microsoft's recommendations for that particular setting.
The following table lists all potentially risky optimizations.
| Name | Option | Notes |
| Pentium | /G5 | Code won't run on 486 or below (use /GB instead) |
| Pentium Pro | /G6 | Code won't run on Pentium or below (use /GB instead) |
| String pooling | /Gf | If a string is modified, it will be modified for any variable that points to it |
| String pooling RO | /GF | If a string is modified, memory exception occurs |
| Stack probes off | /Gs | A stack overflow will crash the program without an overflow error |
| Exception handling | /GX- | If exception handling is not enabled, an exception may crash the program |
| Assume no aliasing | /Oa | If there is aliasing in the program, the optimization can cause corrupted data |
| Inline expansion | /Ob1 | Inlines can cause unexpected code bloat and cache misses |
| Inline any | /Ob2 | Inlines can cause unexpected code bloat and cache misses |
| Intrinsic functions | /Oi | Intrinsic functions increase code size |
| Float consistency | /Op | Resulting floating-point code will be larger and slower |
| Ctor displacement | /vd0 | A virtual function may be passed an incorrect "this" pointer if it is invoked from within a constructor or destructor. |
| Struct packing | /Zpn | Can cause compatibility problems if packing is modified |
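The /Oa row is the easiest one to get wrong, so here is a minimal sketch of code that breaks the no-aliasing assumption, next to a version that does not depend on it. AddToBoth and AddToBothNoAlias are hypothetical names. The results described below are what a build without /Oa produces.

```cpp
#include <cassert>
#include <cstddef>

// Adds *delta to a[0] and a[1]. If delta points at a[0], the first
// store changes *delta, so a correct build must reload it before the
// second add. Under /Oa the compiler is allowed to skip that reload.
void AddToBoth(int* a, const int* delta)
{
    assert(a != NULL && delta != NULL);
    a[0] += *delta;
    a[1] += *delta;
}

// Reads *delta once into a local, so the result is the same whether
// or not the two pointers overlap.
void AddToBothNoAlias(int* a, const int* delta)
{
    assert(a != NULL && delta != NULL);
    const int d = *delta;
    a[0] += d;
    a[1] += d;
}
```

Starting from the array {1, 10} with delta pointing at element 0, AddToBoth yields {2, 12} while AddToBothNoAlias yields {2, 11}. Code written like the second function gives the same answer with or without /Oa.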