Commit ba6ab46c authored by Jingning Han's avatar Jingning Han
Browse files

Optimze inv 16x16 DCT with 10 non-zero coeffs - P1

This commit is the first patch optimizing SSE2 implementation of inverse
16x16 DCT with <10 non-zero coefficients. It focused on the first 1-D (row)
transformation. It exploits the fact that only top-left 4x4 block contains
non-zero coefficients, in a 2-D inverse 16x16 DCT with <10 coeffients.

The average runtime of idct16x16_10 unit is reduced from
883 cycles -> 779 cycles (12% faster).

For pedestrian_area_1080p 300 frames at 4000 kbps, the speed 2 runtime goes
down from 310651 ms  -> 305910 ms. The decoding speed goes up from
80.37 fps -> 80.87 fps.

Change-Id: Ic6f3ac5a637a76c07ba73ddaafe318a699fea645
parent 8fcb74e6
Branches
Tags
Showing with 53 additions and 109 deletions
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment