Approximates division using multiply and shift.
Speeds up both sizes (8x8 and 16x16) by 30 times.
Fix the call sites to use the RTCD function.
Delete sse2 and mips implementation. They were based on a previous
implementation of the filter. It was changed in Dec 2015: