CoreServices/CoreServices.h includes CarbonCore/CarbonCore.h includes CarbonCore/Timer.h
This eases the burden on the programmer: rather than hunt down individual header files, you only need to include one framework/header combination. You can include other frameworks as needed.
The top of the file contains declarations for several global variables, including the timer proc that gets installed in the Time Manager queue. Once the application runs its course (five iterations), it disposes of the timer proc rather than leave it installed.
main first calls an init function to set up the timer task, then loops, waiting for a flag to be set. After disposing of the timer proc, main returns.
The timer proc (MyTimerProc) prints the current time and increments the counter. If the counter is below its limit, MyTimerProc re-primes the timer task; otherwise it sets the done flag.
MyInit sets up the timer task, installs it, then primes it the first time.
You could write this application to instead provide a user interface with windows, menus, and so on. Simply include the appropriate frameworks in the source code files.
#include <stdio.h>
#include <CoreServices/CoreServices.h>

void MyInit( void );
void MyTimerProc( TMTaskPtr tmTaskPtr );

/* Set by the timer proc, which runs outside main's loop, so mark it volatile. */
volatile Boolean gQuitFlag = false;
int gCount = 0;
TimerUPP gMyTimerProc = NULL;

int main( int argc, char *argv[] )
{
    MyInit();
    /* Spin until the timer proc signals that it has finished. */
    while ( false == gQuitFlag ) {
        ;
    }
    DisposeTimerUPP( gMyTimerProc );
    return 0;
}

void MyTimerProc( TMTaskPtr tmTaskPtr )
{
    DateTimeRec localDateTime;

    GetTime( &localDateTime );
    printf( "MyTimerProc at %d:%d:%d\n", localDateTime.hour,
            localDateTime.minute, localDateTime.second );
    gCount++;
    if ( gCount > 4 )
    {
        gQuitFlag = true;
    }
    else
    {
        /* Re-prime the task to fire again in 1000 milliseconds. */
        PrimeTimeTask( ( QElemPtr ) tmTaskPtr, 1000 );
    }
}

void MyInit( void )
{
    /* static: the Time Manager keeps using this record after MyInit returns. */
    static struct TMTask myTask;
    OSErr err = 0;

    gMyTimerProc = NewTimerUPP( MyTimerProc );
    if ( gMyTimerProc != NULL )
    {
        myTask.qLink = NULL;
        myTask.qType = 0;
        myTask.tmAddr = gMyTimerProc;
        myTask.tmCount = 0;
        myTask.tmWakeUp = 0;
        myTask.tmReserved = 0;
        err = InstallTimeTask( ( QElemPtr )&myTask );
        if ( err == noErr )
            PrimeTimeTask( ( QElemPtr )&myTask, 1000 );
        else {
            DisposeTimerUPP( gMyTimerProc );
            gMyTimerProc = NULL;
            gQuitFlag = true;
        }
    }
}
Mac OS X frameworks allow you to add system features and user interface capabilities to your applications. The frameworks are arranged hierarchically, so you only need to include in your source code the top-level framework of interest, rather than sub-frameworks underneath. When invoking GCC, the -framework linker option may be fed to the compiler, which will pass the option to the linker. The following GCC invocation compiles test.c, links against the CoreServices framework, and generates the executable output file test, as specified by the -o flag.
% gcc -framework CoreServices -o test test.c
To run the executable, invoke it by name. This example runs the file test in the current directory. The "./" specifies that the path to the command starts in the current directory, and "test" is the name of the file to execute. The output appears in the Terminal window.
% ./test
MyTimerProc at 16:41:27
MyTimerProc at 16:41:28
MyTimerProc at 16:41:29
MyTimerProc at 16:41:30
MyTimerProc at 16:41:31
%
Here are brief descriptions of several GCC command-line options:
The verbose flag (-v) displays details of each command executed by GCC.
% gcc -v
Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs
Thread model: posix
gcc version 3.3 20030304 (Apple Computer, Inc. build 1640)
%
This example adds -v to the Carbon build example. Note the options passed to the linker as part of the ld invocation, near the bottom of the listing.
% gcc -v -framework CoreServices -o test test.c
Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs
Thread model: posix
gcc version 3.3 20030304 (Apple Computer, Inc. build 1640)
/usr/libexec/gcc/darwin/ppc/3.3/cc1 -quiet -v -D__GNUC__=3 -D__GNUC_MINOR__=3
-D__GNUC_PATCHLEVEL__=0 -D__APPLE_CC__=1640 -D__DYNAMIC__ test.c -fPIC -quiet
-dumpbase test.c -auxbase test -version -o /var/tmp//ccPmjEgo.s
GNU C version 3.3 20030304 (Apple Computer, Inc. build 1640) (ppc-darwin)
compiled by GNU C version 3.3 20030304 (Apple Computer, Inc. build 1640).
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/local/include"
ignoring nonexistent directory "/usr/ppc-darwin/include"
ignoring nonexistent directory "/Local/Library/Frameworks"
#include "..." search starts here:
#include <...> search starts here:
/usr/include/gcc/darwin/3.3
/usr/include
End of search list.
Framework search starts here:
/System/Library/Frameworks
/Library/Frameworks
End of framework search list.
/usr/libexec/gcc/darwin/ppc/as -arch ppc -o /var/tmp//ccpC3PmB.o
/var/tmp//ccPmjEgo.s
ld -arch ppc -dynamic -o test -lcrt1.o -lcrt2.o -L/usr/lib/gcc/darwin/3.3
-L/usr/lib/gcc/darwin -L/usr/libexec/gcc/darwin/ppc/3.3/../../..
-framework CoreServices /var/tmp//ccpC3PmB.o -lgcc -lSystem |
c++filt3
%
The following GCC invocation compiles test.c but does not invoke the linker; the -c flag stops the process after compilation.
% gcc -c test.c
By default the output file will be named test.o. If you need a different file name, specify the -o flag followed by the name.
If you choose to stop the build process after compilation, you can then link by invoking the compiler again but allowing it to continue after the compilation step. This is the recommended approach. If you invoke the linker manually you may spend a lot of time getting the linker flags correct.
The -g flag instructs GCC to include debug info for use when running GDB.
% gcc -g -c test.c
Warning flags, including -W and the pickier -Wall, display warnings regarding code that is not technically in error, but that may cause problems. For example, removing the explicit cast in the call to PrimeTimeTask in the Carbon Example results in a warning:
PrimeTimeTask( tmTaskPtr, 1000 );
% gcc -c -W test.c
test.c: In function `MyInit':
test.c:63: warning: passing arg 1 of `PrimeTimeTask' from incompatible
pointer type
%
The -framework linker option was discussed above. This example links against the CoreServices framework, and invokes the compiler in verbose mode (-v) so you can see the generated linker call. The output file will be named test, and the input to this step is the file test.o.
% gcc -v -framework CoreServices -o test test.o
Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs
Thread model: posix
gcc version 3.3 20030304 (Apple Computer, Inc. build 1640)
ld -arch ppc -dynamic -o test -lcrt1.o -lcrt2.o -L/usr/lib/gcc/darwin/3.3
-L/usr/lib/gcc/darwin -L/usr/libexec/gcc/darwin/ppc/3.3/../../..
-framework CoreServices test.o -lgcc -lSystem |
c++filt3
%
To link against multiple frameworks, include each with its own -framework flag:
-framework CoreServices -framework Carbon
You can change compiler versions using gcc_select.
sudo /usr/sbin/gcc_select <version: 2, 3 or 3.x>
% sudo /usr/sbin/gcc_select 3.1
Default compiler has been set to:
Apple Computer, Inc. GCC version 1256, based on gcc version 3.1 20021003
(prerelease)
%
The -l flag lists available compiler versions.
% gcc_select -l
Available compiler versions:
2.95.2 3.1 3.3 3.3-fast
%
Run gcc_select -h to view additional options.
The compiler version matters because changes to the Application Binary Interface since Mac OS X v10.0 have rendered C++ and Objective-C++ executables incompatible with earlier releases. However, C and Objective-C programs still run the same. To build for 10.1 and earlier you must use version 2 of GCC for C++/Obj-C++ applications and kernel extensions; version 2.95.2 was the final GCC release that shipped with Mac OS X v10.1. GCC version 3.1 shipped with Mac OS X v10.2 Jaguar, and 3.3 with Mac OS X v10.3 Panther. Note that if you mix languages in the application you should rebuild using the appropriate compiler version: 2.95.2 for Mac OS X v10.0 and v10.1, 3.1 for v10.2 Jaguar, and 3.3 for v10.3 Panther.
This list is not exhaustive. Look at the gcc man page or one of the recommended references at the end of this article for additional flags.
Makefiles help automate the build process. A makefile is typically named makefile or Makefile, and contains commands for GCC regarding various targets and their dependencies. You can change the commands in the file and, once it is working, not worry about forgetting a flag or option. This is very useful in the middle of the night when you are tired and likely to make mistakes. GCC command lines can get very long, so keeping them in a makefile makes you less likely to invoke the compiler incorrectly.
A makefile contains a set of targets. Each target may be dependent on other targets. Each target also includes a command-line invocation preceded by a <tab> character. Here is the syntax:
# Comments begin with a '#'.
target-name: [dependency_1 dependency_2 ...]
	command [flags] input-file(s)
another-target-name: dependency
	command [flags] input-file(s)
The following example incorporates the test.c file used to generate the code optimization samples. The first target, named test, depends on the target test.obj. If the output generated by target test.obj is newer than test, or is not a file, then the make utility will run the appropriate command line. In this case, it invokes GCC to generate a file named test, using the file test.o as input.
Where does test.o come from? It is the output generated by the test.obj target. The input to target test.obj is the file test.c. You can specify an output file using the -o option, or let GCC name the output file using the pattern input-filename.o (lowercase letter 'o').
# The target named test depends on target test.obj.
# The command for target test:
# 1. invokes gcc,
# 2. generates a file named test (no extension) as output, and
# 3. uses file test.o as input.
#
test: test.obj
	gcc -o test test.o
#
# The next target name is test.obj. That is only its name.
# The name does not have to relate to what the target actually builds.
#
# This target builds object files from source.
# The command for test.obj instructs gcc to:
# 1. stop after compilation (no linking),
# 2. print verbose output, and
# 3. use the file test.c as input.
#
# The default output file name here will be test.o.
#
test.obj: test.c
	gcc -c -v test.c
#
# Remove unwanted binaries, both the file named test and any .o files.
#
clean:
	rm test *.o
#
# Generate assembly files. Useful for debugging.
#
asm0:
	gcc -v -S -O0 -o test.O0.s test.c
asm1:
	gcc -v -S -O1 -o test.O1.s test.c
asm2:
	gcc -v -S -O2 -o test.O2.s test.c
asm3:
	gcc -v -S -O3 -o test.O3.s test.c
You invoke the make utility and specify the target, shown here from within a Terminal session. Any messages will appear in the window.
$ make test
gcc -c -v test.c
Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs
Thread model: posix
gcc version 3.3 20030304 (Apple Computer, Inc. build 1640)
/usr/libexec/gcc/darwin/ppc/3.3/cc1 -quiet -v -D__GNUC__=3 -D__GNUC_MINOR__=3
-D__GNUC_PATCHLEVEL__=0 -D__APPLE_CC__=1640 -D__DYNAMIC__ test.c -fPIC
-quiet -dumpbase test.c -auxbase test -version -o /var/tmp//ccmr23gL.s
GNU C version 3.3 20030304 (Apple Computer, Inc. build 1640) (ppc-darwin)
compiled by GNU C version 3.3 20030304 (Apple Computer, Inc. build 1640).
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/local/include"
ignoring nonexistent directory "/usr/ppc-darwin/include"
ignoring nonexistent directory "/Local/Library/Frameworks"
#include "..." search starts here:
#include <...> search starts here:
/usr/include/gcc/darwin/3.3
/usr/include
End of search list.
Framework search starts here:
/System/Library/Frameworks
/Library/Frameworks
End of framework search list.
/usr/libexec/gcc/darwin/ppc/as -arch ppc -o test.o /var/tmp//ccmr23gL.s
gcc -o test test.o
$
If the make script is stored in a file named something other than makefile, pass the filename as a flag. For example, if the file was instead named test.mk, use:
make -f test.mk test
The first step is to compile your code with the -g flag: this includes GDB information in the object files. Without it you will get strange errors when you try to use GDB to run your executable.
You invoke GDB using the gdb command along with the name of the executable to debug. GDB prints a banner, after which it is ready to accept commands. The example used here is the Carbon application discussed earlier.
% gdb test
GNU gdb 5.3-20030128 (Apple version gdb-309) (Thu Dec 4 15:41:30 GMT 2003)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "powerpc-apple-darwin".
Reading symbols for shared libraries ... done
(gdb)
The best thing to start with is a breakpoint. This command sets a breakpoint on line 1.
(gdb) break 1
Breakpoint 1 at 0x1bd8: file test.c, line 1.
(gdb)
Begin execution by typing run. GDB prints information about what it is doing, then proceeds to and stops at the breakpoint.
(gdb) run
Starting program: /ktree/test
[Switching to thread 1 (process 1254 thread 0x1603)]
Reading symbols for shared libraries ......... done
Breakpoint 1, main (argc=795571314, argv=0x65652f74) at test.c:12
12 {
(gdb)
The step command steps into a function.
(gdb) step
MyInit () at test.c:47
47 OSErr err = 0;
(gdb)
Use the print command to view a variable value.
(gdb) print err
$1 = 0
(gdb)
Step over a function using next.
(gdb) next
main (argc=1, argv=0xbffffbb0) at test.c:13
13 MyInit();
(gdb) next
15 while ( false == gQuitFlag ) {
(gdb)
Next, set a breakpoint in the callback (line 28), then continue execution. GDB pauses when execution reaches the breakpoint. Check the date/time value. Note that GDB does its best when displaying the structure fields. The structure looks a bit strange, so check the data type of localDateTime using the whatis command. This action may be performed on any variable that is in scope, and is useful if you do not want to jump back to a code editor window and look at the source code.
(gdb) break 28
Breakpoint 2 at 0x2b8c: file test.c, line 28.
(gdb) continue
Continuing.
[Switching to process 1080 thread 0x1203]
Breakpoint 2, MyTimerProc (tmTaskPtr=0xbffffd10) at test.c:28
28 GetTime( &localDateTime );
(gdb) print localDateTime
$3 = {
year = 0,
month = 119,
day = 25967,
hour = 7018,
minute = 0,
second = 0,
dayOfWeek = 0
}
(gdb) whatis localDateTime
type = DateTimeRec
(gdb)
Step through the next couple of lines and view the formatted output, which looks correct.
(gdb) step
30 printf( "MyTimerProc at %d:%d:%d\n", localDateTime.hour,
localDateTime.minute, localDateTime.second );
(gdb) step
MyTimerProc at 14:45:21
32 gCount++;
(gdb)
Use the where command to view a stack trace:
(gdb) where
#0 MyTimerProc (tmTaskPtr=0xbffffd10) at test.c:32
#1 0x902be7e8 in TimerThread ()
#2 0x900246e8 in _pthread_body ()
(gdb)
Change a variable value using set. This example sets gCount past its threshold value, which causes the program to terminate prematurely. Notice that the source code following each line number is the next line to be executed, not the last line executed.
(gdb) print gCount
$4 = 0
(gdb) step
34 if ( gCount > 4 )
(gdb) print gCount
$5 = 1
(gdb) set gCount = 5
(gdb) step
36 gQuitFlag = true;
(gdb) continue
Continuing.
Program exited normally.
(gdb)
Here is the conventional way to stop debugging and exit GDB:
(gdb) stop
(gdb) quit
Code optimization may be performed by the programmer, the compiler, or the runtime environment. This section focuses on optimizations that GCC can perform at build time. The typical tradeoff is to choose smaller code size over faster execution speed, or vice versa. It is impossible to fully optimize for both at the same time, though GCC does its best, as do other compilers. When in doubt, it may be better to optimize for size, since smaller code may execute relatively faster. For example, large functions or loops containing data access patterns that do not exhibit a strong locality of reference may not fit into a processor's cache lines, which can lead to cache misses and subsequent fetches from memory. Smaller functions or loops with local data access stand a better chance of fitting in a given cache line and requiring fewer memory accesses.
Another reason for looking at optimization settings is because often developers debug without optimization enabled, then release an optimized version to the public. Being familiar (not necessarily intimate) with the assembly listing of your program under various optimization settings may help you determine where execution failed when you receive the occasional crash report.
Remember that optimized code typically bears little resemblance to the original source code. This makes it difficult to look at an assembly listing for an optimized program and determine the flow of control. It can be nearly impossible to look at a crash log and determine the point in an optimized program where a problem occurred, unless you have symbols included, which is not typically the case.
Unoptimized code follows the original source code directly, making it easier to debug. In fact, you will have better luck first debugging the code and then optimizing it, rather than the other way around. Trying to do both simultaneously is also a bad idea.
You can use the -S GCC option to stop the compilation process before running the assembler. The following command generates an unoptimized (level 0) output file named test.O0.s from input file test.c. You can then dissect the assembly code in the file using a text editor.
gcc -S -O0 -o test.O0.s test.c
Several of the common options are discussed here. The GCC manual contains additional information regarding optimization settings. The source code and assembly listings are available in the Optimization Example (88KB) folder.
Using an optimization flag of -O0 turns off optimization. This is the best setting when debugging code the first time, and maybe beyond. You should use this setting to generate a baseline build from which to start your debugging and subsequent performance analysis efforts. The machine instructions map easily to the source code, so to twist the WYSIWYG acronym a bit, "what you wrote is what you get" in the debugger. Several Level 0 examples are provided for reference in the following discussions.
At Level 1 (-O1), GCC attempts to reduce both code size and execution time. Only certain types of optimizations apply here. Register allocation attempts to place as many variables in registers as will fit, for faster access and fewer load/store instruction pairs.
This function generates the accompanying machine instructions under Level 0 and Level 1:
void arrayAssignmentLoop( void ) {
unsigned int count = 10;
unsigned int array[ 10 ], item = 0;
do {
array[ item++ ] = count;
count--;
} while ( count > 0 );
}
With no optimization (-O0) enabled, the value for count is stored and updated on the stack.
_arrayAssignmentLoop:
stmw r30,-8(r1)
stwu r1,-128(r1)
mr r30,r1
li r0,10
stw r0,32(r30) ; count stored at 32 bytes off the SP
li r0,0
stw r0,96(r30)
L11:
addi r11,r30,96
lwz r9,0(r11)
mr r0,r9
slwi r2,r0,2
addi r0,r30,32
add r2,r2,r0
addi r2,r2,16
lwz r0,32(r30)
stw r0,0(r2)
addi r9,r9,1
stw r9,0(r11)
lwz r2,32(r30) ; Load count into r2
addi r0,r2,-1 ; Decrement count
stw r0,32(r30) ; Store count back on the stack
lwz r0,32(r30) ; Load count for comparison
cmpwi cr7,r0,0
bne cr7,L11
lwz r1,0(r1)
lmw r30,-8(r1)
blr
Enabling Level 1 optimization (-O1) moves those values to registers. It eliminates the need for the variable count, loading and using the count register instead.
_arrayAssignmentLoop:
li r0,10 ; max stored in r0
mtctr r0 ; Move 10 to count register
; var count has been optimized away
li r2,0
addi r9,r1,-64
L10:
slwi r0,r2,2
mfctr r11
stwx r11,r9,r0
addi r2,r2,1
bdnz L10 ; Decrement count register and branch if not zero
blr
At Level 2 (-O2), GCC applies additional optimizations but excludes loop unrolling and implicit function inlining, both of which reduce execution time but increase code size. You can use the inline keyword to indicate functions that should be inlined, and GCC will determine whether to perform the inlining. Common subexpression elimination, strength reduction, and loop optimizations are also performed. (See definitions in the following section, Specific Optimizations.)
For example, this source code generates the accompanying machine instructions under Level 0 and Level 2:
unsigned int doWhileWithReturn( void ) {
unsigned int i = 100;
unsigned int result = 0;
unsigned int a = 31, b = 2, c = 99;
do {
result += a * b;
c = a * b;
} while ( i-- > 0 );
c = a * b;
return result;
}
Here is the unoptimized code (the -O0 option):
_doWhileWithReturn:
stmw r30,-8(r1) ; save non-volatile registers
stwu r1,-80(r1) ; SP update
mr r30,r1
li r0,100 ; i
stw r0,32(r30)
li r0,0 ; result
stw r0,36(r30)
li r0,31 ; a
stw r0,40(r30)
li r0,2 ; b
stw r0,44(r30)
li r0,99 ; c
stw r0,48(r30)
L16:
lwz r2,40(r30) ; load a into r2
lwz r0,44(r30) ; load b into r0
mullw r2,r2,r0 ; multiply a and b, store in r2
lwz r0,36(r30) ; load result
add r0,r0,r2 ; add product to result
stw r0,36(r30) ; store new value of result
lwz r2,40(r30)
lwz r0,44(r30)
mullw r0,r2,r0 ; multiply a and b, store in r0
stw r0,48(r30) ; store new value of c
lwz r2,32(r30) ; load i
addi r0,r2,-1 ; subtract 1 from i
mr r2,r0
stw r2,32(r30) ; store i
li r0,-1
cmpw cr7,r2,r0 ; compare i to -1, update condition register
bne cr7,L16 ; loop if i > 0 (branch to label L16)
lwz r2,40(r30)
lwz r0,44(r30)
mullw r0,r2,r0 ; multiply a and b, store in r0
stw r0,48(r30) ; store new value of c
lwz r0,36(r30) ; Load result
mr r3,r0 ; Move result to r3
lwz r1,0(r1) ; Restore SP
lmw r30,-8(r1) ; Restore registers
blr ; Return
The -O2 option optimizes most of the loop and eliminates unused variables:
_doWhileWithReturn:
li r0,101 ; Load count register with 101
mtctr r0
L21: ; Loop has been almost completely optimized away
bdnz L21 ; Decrement count register and branch to label L21
; if not zero
li r3,6262 ; Load result value of 6,262 into r3
blr
At Level 3 (-O3), GCC applies additional optimizations including implicit inlining, or inlining of functions not marked with the keyword inline. This is a general-purpose speed optimization setting.
This mode, invoked by -fast (for C and Objective-C; use -fastf for C++ and Objective-C++), packages a number of optimizations that target the G5. This mode generates faster, though probably larger, code. It will unroll loops, transpose nested loops (change the access order to improve locality of reference), convert loop initialization to memset calls, and inline library calls.
This setting aggressively inlines functions. For example, here is a main function that, aside from a few variable assignments, simply calls other functions.
int main( void ) {
unsigned int result;
double doubleResult;
arrayAssignmentLoop();
result = doWhileWithReturn();
printf( "doWhileWithReturn returned %d\n", result );
doubleResult = doubleTest();
printf( "doubleTest returned %lf\n", doubleResult );
return 0;
}
The unoptimized version calls each function:
_main:
mflr r0
stmw r30,-8(r1)
stw r0,8(r1)
stwu r1,-96(r1)
mr r30,r1
bcl 20,31,"L00000000001$pb"
"L00000000001$pb":
mflr r31
bl L_arrayAssignmentLoop$stub ; Branch to arrayAssignmentLoop
bl L_doWhileWithReturn$stub ; Branch to doWhileWithReturn
mr r0,r3
stw r0,64(r30)
addis r3,r31,ha16(LC0-"L00000000001$pb")
la r3,lo16(LC0-"L00000000001$pb")(r3)
lwz r4,64(r30)
bl L_printf$stub
bl L_doubleTest$stub ; Branch to doubleTest
...
The -fast optimized version has inlined the calls to arrayAssignmentLoop and doWhileWithReturn:
_main:
mflr r2
li r3,2 ; Begin inlined and unrolled arrayAssignmentLoop
li r11,10
li r10,9
li r9,8
li r8,7
li r7,6
li r6,5
li r5,4
stw r2,8(r1)
stwu r1,-128(r1)
li r4,3
stw r3,96(r1)
stw r11,64(r1)
stw r10,68(r1)
stw r9,72(r1)
stw r8,76(r1)
stw r7,80(r1)
stw r6,84(r1)
stw r5,88(r1)
stw r4,92(r1) ; End of arrayAssignmentLoop
li r3,1
li r2,99
stw r3,100(r1)
.p2align 4,,15
L149: ; Top of loop for doWhileWithReturn
cmpwi cr0,r2,3
addi r2,r2,-4
bne cr0,L149 ; Branch to top of loop
lis r5,ha16(LC1)
li r4,6262 ; Result of doWhileWithReturn
la r3,lo16(LC1)(r5)
bl L_printf$stub
bl _doubleTest ; Branch to doubleTest
...
You can force smaller code size using the -Os flag. Smaller code may be a good choice because it can reduce cache misses and paging.
This example compares doubleTest under both -fast and -Os. Here is the source code:
double doubleTest( void ) {
const unsigned int limit = 100;
double array[ limit ][ limit ], sum = 0;
unsigned int i, j;
for ( i = 0; i < limit; i++ ) {
for ( j = 0; j < limit; j++ ) {
array[ i ][ j ] = i;
sum += array[ i ][ j ];
}
}
return sum;
}
Under -fast the inner loop gets unrolled, resulting in 10 each of the instructions stfd (store double precision floating-point) and fadd (floating-point add double precision).
_doubleTest:
...
L117: ; Top of outer loop
rldicl r5,r11,0,32
li r9,0
add r2,r10,r8
std r5,32(r30)
lfd f2,32(r30)
fcfid f0,f2
.p2align 4,,15
L116: ; Top of inner loop
fadd f11,f1,f0
addi r9,r9,10
stfd f0,0(r2)
stfd f0,8(r2)
stfd f0,16(r2)
stfd f0,24(r2)
cmplwi cr0,r9,99
stfd f0,32(r2)
stfd f0,40(r2)
stfd f0,48(r2)
stfd f0,56(r2)
stfd f0,64(r2)
stfd f0,72(r2)
addi r2,r2,80
fadd f10,f11,f0
fadd f9,f10,f0
fadd f8,f9,f0
fadd f7,f8,f0
fadd f6,f7,f0
fadd f5,f6,f0
fadd f4,f5,f0
fadd f3,f4,f0
fadd f1,f3,f0
ble cr0,L116 ; Branch to top of inner loop
addi r11,r11,1
addi r10,r10,800
cmplwi cr1,r11,99
ble cr1,L117 ; Branch to top of outer loop
...
Under -Os the inner loop is much tighter, with only 1 store and add pair per iteration. A profiler can help you determine whether this executes quicker than the -fast version.
_doubleTest:
...
L41: ; Top of outer loop
stw r9,36(r30)
li r8,100
stw r10,32(r30)
mtctr r8
lfd f0,32(r30)
add r2,r11,r0
fsub f0,f0,f13
L46: ; Top of inner loop
stfd f0,0(r2) ; Store followed by add
fadd f1,f1,f0
addi r2,r2,8
bdnz L46 ; Branch to top of inner loop
addi r9,r9,1
addi r11,r11,800
cmplwi cr7,r9,99
ble+ cr7,L41 ; Branch to top of outer loop
...
Loop unrolling expands a loop to include two or more iterations before checking the loop conditional. Use the flag -floop-optimize.
With function inlining, functions with a line count below a certain threshold may have their instruction sequence substituted for the corresponding function call. Since function calls involve overhead for stack frame setup, this may result in faster (though longer) code. Turn on inlining using -finline-functions.
Strength reduction replaces expensive operations with simpler operations. Use -fstrength-reduce.
For example, this loop:
for ( i = 0; i < 1000; i++ )
sum += i * 5;
generates this unoptimized code:
li r0,0
stw r0,40(r30) ; sum
li r0,0
stw r0,32(r30) ; i
... ; Top of loop
lwz r0,32(r30) ; Load i into r0
mulli r2,r0,5 ; Multiply r0 by 5, place result in r2
lwz r0,40(r30) ; Load sum
add r0,r0,r2 ; Add r2 to sum
stw r0,40(r30) ; Store new sum
... ; Update i and branch to top of loop
This optimized version replaces the multiplication in the loop with an add:
li r3,0 ; sum
li r2,0 ; i
L30:
add r3,r3,r2 ; Update sum
addi r2,r2,5 ; Add 5 to r2 each iteration: 5 + 5 + 5 ...
bdnz L30
...
Dead code elimination removes unused code from the final image, reducing its size. This is currently an experimental feature in GCC: use the flag -fssa-dce.
Common subexpression elimination replaces multiple references to the same expression with the result of that expression (calculated once). Several variants exist, but the basic version is -fgcse.
With loop-invariant code motion, an expression whose value does not change between loop iterations may be moved out of the loop. The flag -floop-optimize handles this and other loop optimizations.
Instruction scheduling for a particular processor may result in faster execution on such a system, though running the same code on other processors may be less efficient.
With cross-module inlining, the compiler has the entire application (all compilation units) in view when determining what and where to inline. Cross-module inlining is enabled by two things: -fast and the inclusion of all compilation units on a single compiler invocation command line.
With Feedback Directed Optimization (FDO) the compiler uses a runtime profile of the application in order to make inlining and hot/cold code location decisions. It does this via a three step process:
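Build the application using -fast with -fcreate-profile specified;
run the application so that it generates the runtime profile;
rebuild using -fast with -fuse-profile to generate the FDO based optimization.
Additional optimization settings are described in the GCC manual.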
This article touched on the fundamentals of GCC, optimization, make, and GDB. The following resources provide additional information about these tools on Mac OS X:
man pages for the gcc, gdb, and make commands.
Also try these non-ADC resources:
If you want to keep up with GCC on other platforms try the GCC home page. Keep in mind that Apple-specific builds and docs are not available on this site.
Posted: 2004-07-12