Coding Techniques That Can Improve Performance
- Use the _Import keyword to identify data and function
imports from a DLL. This generates the most efficient
code. Avoid setting /qnoautoimported unless you really
need to generate objects without DLL considerations.
- Minimize the use of external (extern) variables to
improve aliasing information.
- Use the const qualifier whenever possible.
- Use static functions whenever possible. When declaring
C++ member functions, use the const specifier whenever possible.
- Avoid taking the address of local variables. If you use a
local variable as a temporary variable and must take its
address, avoid reusing the temporary variable. Taking the
address of a local variable inhibits optimizations that
would otherwise be done on calculations involving that
variable.
- Avoid using short int (16-bit) values. Because all
integer arithmetic is done on long values, using short
values requires extra conversions. Also, 16-bit
operations are more costly to execute on x86 processors
than are 8-bit or 32-bit operations.
- Avoid using long long int types, except where absolutely
necessary. Extra instructions must be generated to
perform operations on such data types.
- Use unsigned types whenever possible. Faster code can be
generated for comparisons of unsigned types and for
division or modulo operations involving unsigned types.
- Use #pragma alloc_text and #pragma data_seg
to group code and data respectively, to improve the
locality of reference. Use #pragma alloc_text to group
functions that are used at the same time so that they can
be stored together. They might then fit on a single page
that can be used and then discarded. #pragma data_seg
works in a similar manner for grouping data.
- Make sure your data is aligned on a multiple of its size.
For example, align double types on an 8-byte boundary.
- Use constants instead of variables where possible. The
optimizer will be able to do a better job reducing
run-time calculations by doing them at compile-time
instead. For instance, if a loop body has a constant
number of iterations, use constants in the loop condition
to improve optimization: for (i=0; i<4; i++)
can be better optimized than for (i=0; i<x; i++).
- Avoid goto statements that jump into the middle of loops.
Such statements inhibit certain optimizations.
- Where possible, the most frequently accessed member of a
structure should be placed first within the structure.
Since no offset is needed to access the first member,
doing so can improve size and speed.
- Improve the predictability of your code by making the
fall-through path more probable. That is, code like
if (error) {handle error} else {real code}
should be written as
if (!error) {real code} else {handle error}.
- If one or two cases of a switch are typically executed
much more frequently than other cases, break out those
cases by handling them separately before the switch
statement.
- Inline your functions selectively. Inlined functions
require less overhead and are generally faster than a
function call. The best candidates for inlining are small
functions that are called frequently from a few places.
Large functions and functions that are called rarely may
not be good candidates for inlining.
Automatic inlining, which you enable by specifying the
/Oi option with a value, is not as effective as
selective inlining.
- Use try blocks for exception handling only when necessary
because they can inhibit optimization. Use the /Gx compiler option
to suppress the generation of exception handling code in
programs where it is not needed. Unless you specify this
option, some exception handling code is generated even
for programs that do not use catch or try blocks.
- Avoid using overloaded operators to perform arithmetic
operations on user-defined types. The compiler cannot
perform the same optimizations for objects as it can for
simple types.
- Avoid performing a deep copy if a shallow copy
is all you require. For an object that contains pointers
to other objects, a shallow copy copies only the pointers
and not the objects to which they point. The result is
two objects that point to the same contained object. A
deep copy, however, copies the pointers and the objects
they point to, as well as any pointers or objects
contained within that object, and so on.
- When you use the Collection classes from the IBM Open
Class Library to create classes, use a high level of
abstraction. After you establish the type of access to
your class, you can create more specific implementations.
This can improve performance with minimal code change.
- If you do not use the argc and argv arguments to main,
create a dummy _setuparg function that contains no code
(you will also need to specify the /EXTDICTIONARY linker
option).
- If you do not use the envp argument to main, create a
dummy _setupenv function that contains no code.
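
Several of the control-flow suggestions in the list above can be combined in one small routine. The following is a minimal sketch, not code from the product: the names MsgKind, kLanes, sum_lanes, and handle_message are hypothetical, and it assumes that kData is by far the most common message kind and that errors are rare.

```cpp
#include <cassert>

// Hypothetical message kinds; kData is assumed to be the frequent case.
enum MsgKind { kData = 0, kOpen = 1, kClose = 2, kReset = 3 };

// The loop bound is a compile-time constant, so the optimizer
// knows the exact number of iterations.
const int kLanes = 4;

// static: the function is visible only within this source file.
static int sum_lanes(const int *lanes)
{
    assert(lanes != 0);
    int total = 0;
    for (int i = 0; i < kLanes; i++) {
        total += lanes[i];
    }
    return total;
}

static int handle_message(MsgKind kind, const int *lanes, bool error)
{
    // The likely (non-error) path is the fall-through path;
    // the rare error path sits in the else branch.
    if (!error) {
        // The frequent case is handled before the switch statement.
        if (kind == kData) {
            return sum_lanes(lanes);
        }
        switch (kind) {
        case kOpen:  return 1;
        case kClose: return 2;
        case kReset: return 3;
        default:     return 0;
        }
    } else {
        return -1;
    }
}
```

With an array holding 1, 2, 3, 4, the call handle_message(kData, lanes, false) returns 10 without ever reaching the switch. Whether this layout actually helps depends on how skewed the case frequencies are in your program, so measure before and after.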
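
The difference between a shallow copy and a deep copy, as described in the list above, can be illustrated as follows. This is only a sketch under stated assumptions: the Buffer type and the functions shallow_copy, deep_copy, and demo_copies are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical type: an object that contains a pointer to other data.
struct Buffer {
    char       *data;
    std::size_t size;
};

// Shallow copy: only the pointer and the size are copied.
// Both Buffer objects refer to the same block of memory.
static Buffer shallow_copy(const Buffer &src)
{
    Buffer copy = src;
    return copy;
}

// Deep copy: a new block is allocated and the contents are duplicated.
// This costs an allocation plus a copy of src.size bytes.
static Buffer deep_copy(const Buffer &src)
{
    assert(src.data != 0);
    Buffer copy;
    copy.size = src.size;
    copy.data = new char[src.size];
    std::memcpy(copy.data, src.data, src.size);
    return copy;
}

// Returns true if the two kinds of copy behave as described.
static bool demo_copies()
{
    Buffer original;
    original.size = 6;
    original.data = new char[original.size];
    std::memcpy(original.data, "hello", 6);   // 5 characters plus '\0'

    Buffer alias = shallow_copy(original);
    Buffer clone = deep_copy(original);

    original.data[0] = 'j';   // seen through alias, not through clone

    bool ok = alias.data == original.data
           && clone.data != original.data
           && alias.data[0] == 'j'
           && clone.data[0] == 'h';

    delete[] clone.data;
    delete[] original.data;   // alias shares this block; free it once only
    return ok;
}
```

The shallow copy is cheaper, but the two objects then share ownership of the block, so the program must make sure the block is released exactly once.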
Some coding practices, although often necessary, will slow
down program performance:
- Using the setjmp and longjmp functions. These functions
involve storing and restoring the state of the thread.
- Using #pragma handler. This #pragma causes code to be
generated to
register and deregister an exception handler for a
function.
- Calling 16-bit code. The compiler performs a number of
conversions to allow interaction between 32-bit and
16-bit code.
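
To illustrate the setjmp and longjmp item above, the following sketch contrasts an error path built on those functions with one that uses an ordinary return code. The function names are hypothetical, and the sketch makes no claim about how large the difference is on any particular compiler; it only shows where the save and restore of the execution context takes place.

```cpp
#include <cassert>
#include <csetjmp>

// Holds the execution context saved by setjmp.
static std::jmp_buf g_recover;

// Hypothetical helper: jumps back to the setjmp call on bad input.
static int parse_digit_jmp(char c)
{
    if (c < '0' || c > '9') {
        std::longjmp(g_recover, 1);   // restores the saved context
    }
    return c - '0';
}

// Variant 1: every call saves the context, even when no error occurs.
static int digit_or_minus1_jmp(char c)
{
    if (setjmp(g_recover) != 0) {
        return -1;                    // reached only through longjmp
    }
    return parse_digit_jmp(c);
}

// Variant 2: an ordinary return code; nothing is saved or restored.
static int digit_or_minus1(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    return -1;
}

// Self-check: both variants must give the same answer.
static void check_variants(char c)
{
    assert(digit_or_minus1_jmp(c) == digit_or_minus1(c));
}
```

Both variants return 7 for '7' and -1 for 'x'. Where a return code is enough, the second form avoids the context save on every call.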
