Back to C++ Optimization Techniques
1: General Strategies
There's a right way and a wrong way to do optimization. Here's some strategies common to all programming endeavors that work and don't work.
1.1: Optimization Strategies that Bomb
- Assuming some operations are faster than others. When it comes to optimizing, never ever assume anything. Benchmark everything. Use a profiler. Even while I was doing examples for this paper, some of my "optimizations" turned out to be major duds.
- Reducing code to improve performance. Reducing code might improve performance; it might not. Increasing the amount of code will often improve performance. Loop unrolling is a prime example.
- Optimizing as you go. Big mistake. As software engineer Donald Knuth said, "premature optimization is the root of all evil." Optimization is one of the last steps of a project. Plan for it, but don't optimize too soon. If you do, you'll end up optimizing code that you either don't use or that doesn't need to be streamlined in the first place. However, there are some efficiency techniques you can use throughout your project from day one. These tips can make your code more readable and concise. I've listed them in the "
as you go" section.
Worrying about performance before concentrating on code correctness. Never! Write the code without optimizations first. Use the profiler to determine if it needs to be revised. Don't ignore performance issues; let performance issues guide your design, data structures, and algorithms, but don't let performance affect every aspect of your code. In a typical application, only a small percentage of the code requires optimization. In a game, it's usually the inner loops of the blitting, AI or physics routines.
1.2: Optimization Strategies that Work
- Set design goals for performance levels. How responsive should the app be? Be specific. Think in terms of concrete millisecond values. Put the values in the specs. What's the target frame rate? How will the program deal with Internet latency and bandwidth issues? In what portions of the application is efficiency critical, and in what portions is efficiency secondary?
- Choose a program architecture appropriate to the problem at hand. When you evaluate each option, whether it be the choice of using a single-threaded vs. multithreaded architecture, using a database vs. a flat file, using a custom engine or writing your own, be sure to consider efficiency.
- Select the proper data structures. Carefully evaluate whether you should use floating point or integer math, lists or vectors, hash tables or trees. The right data structures can make the difference between a great game and a dog. That's why the id team used BSP trees instead of Z-buffering in Quake. BSP trees best solved their particular problem.
- Choose the right algorithms. A linear search may be more appropriate than a binary search. Insertion sort may be faster than quicksort. See the discussion of swapping algorithms in the
next section to see how small algorithm variations can effect performance.
Be sure to concentrate on perceived performance. Focus on how the app feels, not the actual numbers. Perhaps increasing object velocity produces a better app than increasing the frame rate. Try it. Use progress bars, animations or other effects to hide level loading or other long processes. Be sure the program is responsive to player input, too. If your app runs at 60 fps but takes half a second to process mouse clicks, it's time to refocus your optimization efforts where it will make a difference.
Understand the costs of common programming operations and algorithms. Is integer division faster than multiplication or the other way around? Is floating-point math faster than integer math for your target platform? Knowing the answer can help you make your app run faster. See the costs of common operations.
Profile to find bottlenecks. Add your optimizations. Profile to find bottlenecks. Rinse and repeat. Don't settle on an optimization without verifying that it actually improves the program. Always run "before" and "after" benchmarks to evaluate optimizations. Store the results of each profiling run so you can compare the differences.
Evaluate near-final code. Don't evaluate the debugging version, and don't evaluate the release version too early - you'll end up modifying code that you'll throw out anyway.
Use your QA department (or person) to create and automate profiling runs. If you don't have a QA department, automate the profiling runs yourself. You'll never regret it.
1.3: Example of Selecting the Proper Algorithm: Swapping
Swapping objects is a common operation, especially during sorting. The standard C++ swapping function is very simple.
template <class T> void MySwap(T& a, T& b)
{
T t = a; // copy ctor
a = b; // assignment operator
b = t; // assignment operator
} // destructor (element t)
Swapping is so simple that we really only need a single function to handle it, right? Not necessarily. Often an object can provide its own swapping method that is considerably faster than calling the object's constructor, assignment operator (twice), and destructor. In fact, with STL, there are many specialized swap routines, including string::swap, list::swap, and so forth.
As you can see, calling STL swap is the same as using the MySwap algorithm above. However, for specialized classes, like strings, swapping has been optimized to be 6-7 times faster. For the bitmap object, which has extremely expensive constructors and assignment operators, the bitmap::swap routine is over 8000 times faster!
Back to C++ Optimization Techniques