1. 04 Jun, 2015 1 commit
    • Make vp9 subpixel match vp8 · eb88b172
      Johann authored
      The only difference between the two was that the vp9 function allowed
      for every step in the bilinear filter (16 steps) while vp8 only allowed
      for half of those. Since all the call sites in vp9 shift the input left
      by one (<< 1), vp9 only ever used the same steps as vp8.
      
      This will allow moving the subpel variance to vpx_dsp with the rest of
      the variance functions.
      
      Change-Id: I6fa2509350a2dc610c46b3e15bde98a15a084b75
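The equivalence described in this commit message can be sketched in plain C. This is a minimal illustration, not the libvpx code: the helper names `bilinear_taps` and `steps_match` are hypothetical, and the tap formula (two taps that sum to 128, spaced evenly) is an assumption about how such a bilinear table is laid out.

```c
#include <assert.h>
#include <stdint.h>

#define FILTER_WEIGHT 128 /* assumed: the two taps always sum to 128 */

/* Hypothetical helper: compute the two bilinear taps for position `step`
 * in a filter that divides one pixel into `num_steps` equal positions. */
static void bilinear_taps(int step, int num_steps, int16_t taps[2]) {
  assert(step >= 0 && step < num_steps);
  taps[1] = (int16_t)(step * FILTER_WEIGHT / num_steps);
  taps[0] = (int16_t)(FILTER_WEIGHT - taps[1]);
}

/* Returns 1 when step (offset << 1) of a 16-step filter has the same taps
 * as step `offset` of an 8-step filter, and 0 otherwise. */
static int steps_match(int offset) {
  int16_t fine[2], coarse[2];
  bilinear_taps(offset << 1, 16, fine);
  bilinear_taps(offset, 8, coarse);
  return fine[0] == coarse[0] && fine[1] == coarse[1];
}
```

Under these assumptions, `steps_match(offset)` holds for every `offset` from 0 to 7, which is why a caller that always doubles its offset never reaches the odd entries of the 16-step table.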
  2. 15 May, 2015 1 commit
  3. 14 Aug, 2014 1 commit
    • 32 Align Load bug · efdfdf57
      levytamar82 authored
      In sub_pixel_avg_variance, the parameter sec was also read with an
      aligned load; it is now read with an unaligned load.
      
      Change-Id: I4d4966e0291059ea4d705baed1503dc58444fcb7
  4. 08 Aug, 2014 1 commit
    • Fix bug 807 · 69a5f5ec
      levytamar82 authored
      In the sub_pixel_*variance* functions, dst is aligned to 16 bytes and
      not to 32 bytes, so the data is now loaded with unaligned loads.
      
      Change-Id: I2e0b9745543697efc56fefa32857ea10117af135
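The alignment problem behind the two fixes above can be shown with a small portable C sketch. The helpers `is_aligned` and `align_up` are hypothetical names introduced only for illustration and are not part of libvpx. The point is that a pointer can satisfy a 16-byte guarantee while still failing a 32-byte one, in which case a 32-byte aligned load (for example `_mm256_load_si256`) is unsafe and an unaligned load (for example `_mm256_loadu_si256`) is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if `p` is a multiple of `alignment` bytes.
 * `alignment` must be a power of two. */
static int is_aligned(const void *p, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}

/* Hypothetical helper: round `p` up to the next multiple of `alignment`
 * (a power of two). The caller must leave enough slack in the buffer. */
static const uint8_t *align_up(const uint8_t *p, size_t alignment) {
  const uintptr_t mask = (uintptr_t)(alignment - 1);
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (const uint8_t *)(((uintptr_t)p + mask) & ~mask);
}
```

For a 32-byte aligned `base`, the address `base + 16` passes `is_aligned(base + 16, 16)` but fails `is_aligned(base + 16, 32)`, which matches the situation the commit messages describe for dst and sec.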
  5. 01 Mar, 2014 1 commit
    • AVX2 SubPixel AVG Variance Optimization · ea149096
      levytamar82 authored
      Optimizing 2 functions to process 32 elements in parallel instead of 16:
      1. vp9_sub_pixel_avg_variance64x64
      2. vp9_sub_pixel_avg_variance32x32
      Both of those functions were calling vp9_sub_pixel_avg_variance16xh_ssse3.
      Instead of calling that function, they now call
      vp9_sub_pixel_avg_variance32xh_avx2, which is written in AVX2 and
      processes 32 elements in parallel.
      This optimization gave an 80% function-level gain and a 2% user-level gain.
      
      Change-Id: Iea694654e1b7612dc6ed11e2626208c2179502c8
  6. 19 Feb, 2014 1 commit
  7. 14 Feb, 2014 1 commit
    • AVX2 SubPixel Variance Optimization · 52dac5d1
      levytamar82 authored
      Optimizing 2 functions to process 32 elements in parallel instead of 16:
      1. vp9_sub_pixel_variance64x64
      2. vp9_sub_pixel_variance32x32
      Both of those functions were calling vp9_sub_pixel_variance16xh_ssse3.
      Instead of calling that function, they now call
      vp9_sub_pixel_variance32xh_avx2, which is written in AVX2 and
      processes 32 elements in parallel.
      This optimization gave a 70% function-level gain and a 2% user-level gain.
      
      Change-Id: I4f5cb386b346ff6c878a094e1c3b37e418e50bde
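The strip-based structure that the two AVX2 commits describe can be sketched in scalar C. This is a simplified stand-in, not the libvpx implementation: it omits the subpixel bilinear filtering and the averaging with a second predictor, uses no SIMD, and the names `strip_sum_sse` and `block_variance` are hypothetical. It only illustrates that a wide block is covered by column strips, so a 64-wide block needs four passes of a 16-wide kernel but only two passes of a 32-wide one, with the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical kernel: accumulate the sum and the sum of squared errors of
 * (src - ref) over one strip that is strip_w columns wide and h rows tall. */
static void strip_sum_sse(const uint8_t *src, int src_stride,
                          const uint8_t *ref, int ref_stride,
                          int strip_w, int h, int64_t *sum, uint64_t *sse) {
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < strip_w; ++x) {
      const int diff = src[y * src_stride + x] - ref[y * ref_stride + x];
      *sum += diff;
      *sse += (uint64_t)(diff * diff);
    }
  }
}

/* Variance of a w x h block, computed as sse - sum^2 / (w * h) and
 * processed in column strips of strip_w (w must be a multiple of strip_w). */
static uint64_t block_variance(const uint8_t *src, int src_stride,
                               const uint8_t *ref, int ref_stride,
                               int w, int h, int strip_w) {
  int64_t sum = 0;
  uint64_t sse = 0;
  assert(strip_w > 0 && w % strip_w == 0);
  for (int x = 0; x < w; x += strip_w) {
    strip_sum_sse(src + x, src_stride, ref + x, ref_stride, strip_w, h,
                  &sum, &sse);
  }
  return sse - (uint64_t)((sum * sum) / (w * h));
}
```

Calling `block_variance` with `strip_w` set to 16 or to 32 gives the same variance for a 64x64 or 32x32 block; the strip width only changes how many kernel passes are needed, which is where a wider SIMD kernel can save time.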