Commit 876c72a0 authored by levytamar82's avatar levytamar82
Browse files

AVX2 Convolve Optimization

Two convolve functions were optimized for AVX2:
1. vp9_filter_block1d16_h8
2. vp9_filter_block1d16_v8
vp9_filter_block1d16_v8 was optimized for AVX2 by reducing the number of
loop strides by half, two strides were processed in parallel.
vp9_filter_block1d16_v8 was also optimized in the same way also some of the
loads were being done outside of the loop and by that preventing redundant
loads.
This Optimization gives 43% function level gain and 1.3% user level gain.
Now can be compiled in Windows

Change-Id: I2714124cfb0c14a77d7a0ce126a20db92ffbf92c
parent 20d0f2b9
No related merge requests found
Showing with 588 additions and 3 deletions
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment