Trading accuracy for performance has always been a big deal. In most cases, highly complex and time-consuming applications are exactly the ones that need the highest accuracy, and accuracy has always been the sticking point when using a GPU to speed up such algorithms. Since GPUs don't support double-precision arithmetic, it looks hard to achieve high precision with them. A CPU can do floating-point division in double precision, but even the latest GPU from NVIDIA (the GeForce 8800) implements division as multiplication by a single-precision reciprocal.
There will be situations where you have to deal with very convoluted shapes. In such cases it is hard to settle for single-precision accuracy. On the other hand, if the CPU is used, it might be impractical to get the algorithm running in real time. That is when the tough question comes up: can accuracy be compromised for performance?
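To get a feel for the size of the gap, here is a minimal sketch that emulates single-precision reciprocal multiplication on the CPU and compares it with an ordinary double-precision division. The helper names (`f32`, `div_recip_single`, `relative_error`) are made up for this illustration, and rounding through `struct` only approximates what real GPU hardware does.

```python
import struct


def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def div_double(a, b):
    """Plain double-precision division, as a CPU would do it."""
    return a / b


def div_recip_single(a, b):
    """Division as a * (1 / b), rounding every intermediate to single precision.

    This is an emulation for illustration, not the actual GPU instruction sequence.
    """
    a32, b32 = f32(a), f32(b)
    recip = f32(1.0 / b32)
    return f32(a32 * recip)


def relative_error(approx, exact):
    """Relative difference between an approximate and a reference value."""
    return abs(approx - exact) / abs(exact)


a, b = 1.0, 3.0
exact = div_double(a, b)
approx = div_recip_single(a, b)
print("double precision:", exact)
print("single precision:", approx)
print("relative error:  ", relative_error(approx, exact))
```

A single operation loses very little, but an iterative algorithm repeats such operations many times, and that is where the errors start to matter.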
If the answer is Yes!
If the highest accuracy is not so important, we can of course go for the GPU. Its massive computation power can be used to run the algorithm in real time. Allowing a percentage of tolerance in the output becomes an easy way of optimizing your algorithm. In this case you must be sure that the tolerance stays within a range that keeps the algorithm usable.
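One way to make "a percentage of tolerance" concrete is to compare the fast result with a high-accuracy reference and check the error against a budget. The sketch below does this for a long summation, with single precision emulated on the CPU. The function names and the two tolerance values are assumptions chosen for the example, not something prescribed by any GPU toolkit.

```python
import math
import struct


def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_double(values):
    """High-accuracy reference sum (stands in for the CPU version)."""
    return math.fsum(values)


def sum_single(values):
    """Running sum rounded to single precision after every addition
    (stands in for the GPU version)."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total


def within_tolerance(approx, reference, tolerance_percent):
    """True if approx deviates from reference by at most tolerance_percent."""
    if reference == 0.0:
        return approx == 0.0
    error_percent = 100.0 * abs(approx - reference) / abs(reference)
    return error_percent <= tolerance_percent


values = [0.1] * 100000
reference = sum_double(values)
fast = sum_single(values)
print("reference:", reference, "fast:", fast)
print("usable at 1% tolerance:    ", within_tolerance(fast, reference, 1.0))
print("usable at 0.001% tolerance:", within_tolerance(fast, reference, 0.001))
```

Whether a given budget is acceptable depends entirely on the application, so the check is best run on data that looks like your real input.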
If the answer is a big No!
Here comes the problem. You have an algorithm that cannot run in real time because the available CPU resources lack the computation power, and the GPU forces you to settle for single-precision arithmetic. So what can we do to improve performance while keeping the highest accuracy? One idea is to use both the CPU and the GPU for the execution. First, run all the parallel code on the GPU and calculate the output. Then measure the error of that output using a CPU version of the code, which gives the highest accuracy. Finally, run a small number of iterations of your algorithm on the CPU, starting from the GPU result, to find the best value.
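The two-stage idea can be sketched with a toy problem. Below, Newton's iteration for a square root stands in for "your algorithm": the coarse stage runs with every operation rounded to single precision (emulating the GPU), and the fine stage continues in double precision from where the coarse stage stopped. The function names, starting guess and tolerances are all assumptions made for this illustration.

```python
import math
import struct


def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def iterate(step, x, rel_tol, max_iter):
    """Apply step until the relative change is at most rel_tol.

    Returns the final value and the number of iterations used.
    """
    for count in range(1, max_iter + 1):
        nxt = step(x)
        if abs(nxt - x) <= rel_tol * abs(nxt):
            return nxt, count
        x = nxt
    return x, max_iter


def hybrid_sqrt(a, x0):
    """Square root of a: coarse stage in emulated single precision, then double."""
    a32 = f32(a)

    def step_single(x):
        # Newton step with every intermediate rounded to single precision.
        return f32(0.5 * f32(x + f32(a32 / x)))

    def step_double(x):
        # Same Newton step in full double precision.
        return 0.5 * (x + a / x)

    coarse, n_coarse = iterate(step_single, f32(x0), 1e-6, 200)
    fine, n_fine = iterate(step_double, coarse, 1e-15, 50)
    return fine, n_coarse, n_fine


root, n_coarse, n_fine = hybrid_sqrt(2.0, 1000.0)
print("result:", root, "reference:", math.sqrt(2.0))
print("single-precision iterations:", n_coarse)
print("double-precision iterations:", n_fine)
```

Newton's iteration converges very quickly near the solution, so the double-precision stage is short here. An algorithm that converges more slowly will need more CPU iterations, but it still starts from a much better guess.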
A case study
Suppose you have to register two 3D surfaces. First, calculate the registration parameters on the GPU; it may take a lot of iterations to find the rotation, translation, scaling and shearing parameters. Once approximate registration parameters have been found on the GPU, run a few more iterations on the CPU to reach the best convergence. You get the performance improvement by doing the bulk of the iterations on the GPU, and the accuracy is still good because the final calculation is done on the CPU, starting from the approximate parameters calculated on the GPU. This strategy can be used in most cases where the highest accuracy is needed.
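As a heavily simplified sketch of this case study, the code below registers two small 3D point sets that differ only by a uniform scale and a translation (no rotation or shearing). It uses plain gradient descent on the mean squared distance. The coarse stage rounds every operation to single precision to emulate the GPU, and the fine stage continues in double precision. The point set, step size, tolerances and function names are all assumptions chosen to keep the example short.

```python
import struct


def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def f64(x):
    """Identity: keep full double precision."""
    return x


def gradient(src, dst, scale, shift, rnd):
    """Gradient of half the mean squared residual of scale * p + shift - q."""
    g_scale = 0.0
    g_shift = [0.0, 0.0, 0.0]
    for p, q in zip(src, dst):
        for k in range(3):
            residual = rnd(rnd(rnd(scale * p[k]) + shift[k]) - q[k])
            g_scale = rnd(g_scale + rnd(residual * p[k]))
            g_shift[k] = rnd(g_shift[k] + residual)
    n = len(src)
    return rnd(g_scale / n), [rnd(g / n) for g in g_shift]


def descend(src, dst, scale, shift, rnd, tol, max_iter, rate=0.5):
    """Gradient descent until the largest parameter update is at most tol."""
    src = [[rnd(c) for c in p] for p in src]
    dst = [[rnd(c) for c in q] for q in dst]
    scale = rnd(scale)
    shift = [rnd(c) for c in shift]
    for count in range(1, max_iter + 1):
        g_scale, g_shift = gradient(src, dst, scale, shift, rnd)
        d_scale = rnd(rate * g_scale)
        d_shift = [rnd(rate * g) for g in g_shift]
        scale = rnd(scale - d_scale)
        shift = [rnd(c - d) for c, d in zip(shift, d_shift)]
        if max(abs(d) for d in [d_scale] + d_shift) <= tol:
            return scale, shift, count
    return scale, shift, max_iter


# Source surface: the eight corners of a cube centred at the origin.
corners = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]
true_scale, true_shift = 1.3, (0.7, -0.2, 2.1)
target = [[true_scale * c + t for c, t in zip(p, true_shift)] for p in corners]

# Stage 1: emulated single precision. Stage 2: double precision, warm start.
s0, t0, n_coarse = descend(corners, target, 1.0, [0.0] * 3, f32, 1e-5, 500)
s1, t1, n_fine = descend(corners, target, s0, t0, f64, 1e-12, 500)
# For comparison: double precision all the way from the same initial guess.
_, _, n_cold = descend(corners, target, 1.0, [0.0] * 3, f64, 1e-12, 500)

print("scale:", s1, "shift:", t1)
print("coarse:", n_coarse, "fine:", n_fine, "double only:", n_cold)
```

A real registration has more parameters and far more points, so each iteration is much more expensive. That is exactly the work worth moving to the GPU.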