A loop is designed to run a large number of operations quickly, and the compiler can usually convert it into an efficient section of code. However, if the loop contains branching statements such as if statements, the C++ compiler is limited in what it can do. Before C++20, standard C++ had no portable way of telling the compiler that one branch of execution is likely to happen 99.9% of the time and so the other path needn't be optimised (compiler-specific extensions such as GCC's __builtin_expect are the exception). Branching statements are bad news if you want quick code, so they should be avoided if at all possible.
One way to avoid a branching statement is to break the original loop into sub-loops. This is the technique I have used to cope with image edges. When the filter is applied to the image it accesses pixels on either side of the one currently being calculated. If that pixel is at or near the edge of the image, the filter hangs off the side and tries to access pixels that aren't there. In C++ such an out-of-bounds access is undefined behaviour and will more than likely cause a segmentation fault.
The slow way of solving this problem would be to insert an if statement or two into the code to detect when the loop is nearing the image edge. The optimised routines described in this technical note use three loops instead of one. The first loop deals with the starting edge of the column/line, the third loop deals with the ending edge, and the middle loop deals with everything in between. Where a pixel does not exist the code substitutes the pixel from the corresponding edge. Unfortunately this means there still has to be some 'decision' arithmetic; this is done with inline ternary operators, which are hopefully faster than if statements.
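The three-loop idea can be sketched as follows. This is only an illustration under stated assumptions, not the code from the optimised routines: the 5-tap filter, its coefficients and the function names are hypothetical, and the line is held in a std::vector<int> for simplicity. The first function is the 'slow way' with bounds checks on every pixel; the second splits the work into a starting-edge loop, a check-free middle loop and an ending-edge loop, using ternary operators to substitute the edge pixel.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical symmetric 5-tap filter; the real coefficients are not given in the text.
constexpr int kRadius = 2;
constexpr int kTaps[2 * kRadius + 1] = {1, 4, 6, 4, 1};

// The "slow way": one loop, with bounds checks executed for every tap of every pixel.
std::vector<int> filter_branchy(const std::vector<int>& line) {
    assert(!line.empty());  // edge substitution needs at least one real pixel
    const int n = static_cast<int>(line.size());
    std::vector<int> out(line.size());
    for (int x = 0; x < n; ++x) {
        int sum = 0;
        for (int k = -kRadius; k <= kRadius; ++k) {
            int i = x + k;
            if (i < 0) i = 0;          // off the starting edge
            if (i > n - 1) i = n - 1;  // off the ending edge
            sum += kTaps[k + kRadius] * line[i];
        }
        out[x] = sum;
    }
    return out;
}

// Three loops: starting edge, middle, ending edge.
std::vector<int> filter_three_loops(const std::vector<int>& line) {
    assert(!line.empty());
    const int n = static_cast<int>(line.size());
    const int last = n - 1;
    std::vector<int> out(line.size());
    const int head = std::min(kRadius, n);         // end of the starting-edge region
    const int tail = std::max(head, n - kRadius);  // start of the ending-edge region

    // Loop 1: starting edge. The upper clamp only matters for very short lines.
    for (int x = 0; x < head; ++x) {
        int sum = 0;
        for (int k = -kRadius; k <= kRadius; ++k) {
            int i = x + k;
            i = (i < 0) ? 0 : i;
            i = (i > last) ? last : i;
            sum += kTaps[k + kRadius] * line[i];
        }
        out[x] = sum;
    }

    // Loop 2: middle. Every tap is inside the line, so no decisions are needed.
    for (int x = head; x < tail; ++x) {
        int sum = 0;
        for (int k = -kRadius; k <= kRadius; ++k) {
            sum += kTaps[k + kRadius] * line[x + k];
        }
        out[x] = sum;
    }

    // Loop 3: ending edge. Here x >= kRadius, so only the upper clamp is needed.
    for (int x = tail; x < n; ++x) {
        int sum = 0;
        for (int k = -kRadius; k <= kRadius; ++k) {
            int i = x + k;
            i = (i > last) ? last : i;
            sum += kTaps[k + kRadius] * line[i];
        }
        out[x] = sum;
    }
    return out;
}
```

Both functions produce the same output; the difference is that the middle loop, which covers almost every pixel of a realistic line, contains no decisions at all. Whether a ternary compiles to something cheaper than an if statement depends on the compiler and target, so the benefit in the edge loops should be measured rather than assumed.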
Different compilers perform different optimisations, so I tried another of the free compilers to compare execution times. Intel's ICC compiler yielded execution times around 15% longer than GCC. However, the ICC compiler options are specified in terms of which Pentium architecture the code will run on, and the test PC was an Athlon machine, which may have put ICC at a disadvantage.
The GCC compiler performs several optimisations, but in the case of up-conversion the most important one is loop unrolling. At the end of every cycle of a loop the CPU needs to evaluate a condition to decide whether to loop again or exit. Loop unrolling changes a loop so that there are fewer cycles and hence the loop condition is evaluated fewer times. For example, a 20-cycle loop that contains one operation can be unrolled into a 5-cycle loop that contains four operations. Obviously loop unrolling does not work for all loops, but I verified its effectiveness for my code by compiling it twice: once with all the optimisations switched on and once with just loop unrolling switched on. The fully optimised version ran three times faster than the default version, and the loop-unrolled version ran almost twice as fast as the default version.
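To make the transformation concrete, here is a hand-written sketch of what unrolling by four does to a simple summing loop. The function names are hypothetical and the example is not taken from the up-conversion code; in practice the compiler performs this rewrite itself, so the source would normally keep the rolled form.

```cpp
#include <cassert>
#include <cstddef>

// Rolled form: the loop condition is evaluated once per element.
int sum_rolled(const int* v, std::size_t n) {
    assert(v != nullptr || n == 0);
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += v[i];
    }
    return sum;
}

// Unrolled by four: for n = 20 the main loop runs 5 cycles of four operations.
int sum_unrolled4(const int* v, std::size_t n) {
    assert(v != nullptr || n == 0);
    int sum = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += v[i];
        sum += v[i + 1];
        sum += v[i + 2];
        sum += v[i + 3];
    }
    // Clean-up loop for the 0 to 3 elements left over when n is not a multiple of 4.
    for (; i < n; ++i) {
        sum += v[i];
    }
    return sum;
}
```

The clean-up loop shows one reason unrolling does not suit every loop: when the cycle count is not a multiple of the unroll factor, extra code is needed to handle the remainder. With GCC, unrolling can be requested on its own with the -funroll-loops option; the text does not state which options were actually used for the measurements above.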