Optimised C++ routines for image up-conversion and down-conversion

Up-conversion

Optimising the conversion: Arithmetic

Arithmetic takes time, so an easy way to speed up execution is to trim out unnecessary calculations and optimise those that cannot be removed. If the code has been planned efficiently there will be few calculations that can simply be deleted. If the result of a calculation is known before the code is compiled, a constant or look-up table can be used instead. Multiplying or dividing a non-negative integer by a power of two can be replaced with a shift operation, which the processor can perform more cheaply than a general multiply or divide.
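As a minimal sketch of the shift replacement, assuming unsigned values and a factor of 8 (the function names are hypothetical and not taken from the project code):

```cpp
#include <cstdint>

// Multiply by 8 (2^3) with a left shift. Unsigned types are used so that
// the shift is well defined and matches the arithmetic exactly.
constexpr std::uint32_t timesEight(std::uint32_t x) {
    return x << 3;
}

// Divide by 8 with a right shift. For unsigned values this gives the same
// result as x / 8 (the remainder is discarded).
constexpr std::uint32_t divideByEight(std::uint32_t x) {
    return x >> 3;
}
```

Most optimising compilers make this substitution themselves when the power of two is a compile-time constant, so the hand-written shift matters mainly where the compiler cannot see the constant. For negative signed values a right shift and a division round differently, which is why the sketch sticks to unsigned types.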

C++ uses a different number of bytes to store different numeric types: typically 4 bytes for integers and floats, 2 bytes for short integers and 1 byte for characters. Calculations on these types may therefore take different amounts of time. In general it is sensible to use a type that has sufficient accuracy and will not overflow during the calculation. Balancing these accuracy considerations against the number of bytes needed to hold the variable is a trade-off worth investigating.
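One way to investigate this trade-off is to work out the worst-case value a calculation can produce and check it against the range of each candidate type. The sketch below assumes signed fixed-width types from `<cstdint>`; the helper names and the example figures (a maximum pixel value of 255 and integer taps whose absolute values sum to 300) are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Largest magnitude a filter sum can reach: every pixel at its maximum,
// every tap contributing with the same sign.
constexpr long long worstCaseSum(long long maxPixel, long long sumOfAbsTaps) {
    return maxPixel * sumOfAbsTaps;
}

// True if 'value' can be stored in the signed integer type T without
// overflow. Intended for signed types no wider than long long.
template <typename T>
constexpr bool fitsIn(long long value) {
    return value >= std::numeric_limits<T>::min() &&
           value <= std::numeric_limits<T>::max();
}
```

With the figures above, `fitsIn<std::int16_t>(worstCaseSum(255, 300))` is false while the same check for `std::int32_t` is true, so a 2-byte accumulator would risk overflow in that case.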

In the case of the up-conversion code, a quick saving is obtained by halving the number of multiplies performed when applying the filter to the image. As the filter is symmetrical, there is a choice between performing ten multiplications and ten additions or five multiplications and ten additions. This is because the taps are mirrored about the centre of the filter, so it is more efficient to sum each pair of pixels that share the same tap value before performing the multiply.
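The saving can be sketched as follows, assuming an even number of integer taps with `taps[i] == taps[numTaps - 1 - i]`. The function names and signatures are hypothetical, not those of the project code.

```cpp
#include <cstddef>

// Direct form: one multiply per tap.
constexpr int filterDirect(const int* pixels, const int* taps,
                           std::size_t numTaps) {
    int sum = 0;
    for (std::size_t i = 0; i < numTaps; ++i) {
        sum += pixels[i] * taps[i];
    }
    return sum;
}

// Folded form for a symmetric filter with an even number of taps.
// Pixels that share a tap value are added first, so only numTaps / 2
// multiplies are needed. Only the first half of the taps is read.
constexpr int filterFolded(const int* pixels, const int* halfTaps,
                           std::size_t numTaps) {
    int sum = 0;
    for (std::size_t i = 0; i < numTaps / 2; ++i) {
        sum += (pixels[i] + pixels[numTaps - 1 - i]) * halfTaps[i];
    }
    return sum;
}
```

Both functions return the same value for a symmetric filter; the folded form simply trades half of the multiplies for the pairwise additions.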

Using short integers resulted in code that executed roughly twice as fast as code using integers. However, I was unable to separate memory access time from calculation time, so the speed increase is probably due to the fact that the PC can fit twice as many short integers in working memory (see IO section). Holding the calculation results in such small types introduced overflow problems, and working around them made the code harder to read, so short integers were not used in the final code.

I also tested the theory that floating-point calculations are slower than integer calculations, but the results were inconclusive. Unfortunately the majority of execution time was spent on IO, so changing the calculations did not make a large difference. The integer calculations appeared to be slightly faster, but the time saving was below the margin of error in the timing values. The final code uses integer values for the pixels as they make the code easier to understand and eliminate any type conversions.

It is sensible to assume that the filter taps will be in floating-point format, as most filter design software will output them in this form. The next optimisation involved scaling all the taps by a factor of 256 and rounding the result to the nearest integer. All the tap calculations then involved integers and there were no type conversions. The scaling can easily be removed by shifting the resulting pixel value right by one byte (8 bits), which is an efficient operation. Once again this change did not have a significant effect on execution time, but I believe it would if the code spent more of its time doing computation. This would be the case for smaller images or when the code is run on a processor with a significantly larger cache.
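A minimal sketch of the scaling, assuming the taps arrive as `double` values; the names `kShift`, `scaleTap` and `removeScaling` are hypothetical.

```cpp
// Scaling by 256 is the same as a shift of 8 bits (one byte).
constexpr int kShift = 8;

// Scale a floating-point tap by 256 and round to the nearest integer.
// Done once, before filtering, so the inner loop only sees integers.
constexpr int scaleTap(double tap) {
    return static_cast<int>(tap * 256.0 + (tap >= 0.0 ? 0.5 : -0.5));
}

// Remove the factor of 256 from a filtered value by shifting right one
// byte. The shift truncates; for negative sums the result relies on the
// compiler using an arithmetic shift, as mainstream compilers do.
constexpr int removeScaling(int scaledSum) {
    return scaledSum >> kShift;
}
```

For example, a tap of 0.5 becomes 128, and a pixel of 200 multiplied by that tap gives 25600, which shifts back down to 100. Rounding the taps introduces a small error in each coefficient, which is the price paid for keeping the arithmetic in integers.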
