1. 29 Sep, 2016 1 commit
    • Unify loopfilter function names · 7f1f3518
      Linfeng Zhang authored
      Rename vpx_lpf_horizontal_edge_8() to vpx_lpf_horizontal_16().
      Rename vpx_lpf_horizontal_edge_16() to vpx_lpf_horizontal_16_dual().
      
      Change-Id: I798ca8fbbd657d06d3db2bfb0fb3321168f49e52
  2. 25 Jul, 2016 1 commit
  3. 23 Jun, 2016 1 commit
  4. 26 May, 2016 1 commit
    • Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2 · 4b5e462d
      Linfeng Zhang authored
      Followed the code style of other lpf functions.
      These 2 functions put 2 rows of data in a single xmm register,
      so they have similar but not identical filter operations,
      and cannot share the same macros.
      
      Change-Id: I3bab55a5d1a1232926ac8fd1f03251acc38302bc
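
The idea of keeping two rows of data in a single xmm register can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: the helper names `load_two_rows`, `abs_diff_u8`, and `abs_diff_two_rows` are hypothetical, and only standard SSE2 intrinsics are used.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: load 8 pixels from each of two rows into one register.
 * Row 0 lands in the low 64 bits, row 1 in the high 64 bits. */
static __m128i load_two_rows(const uint8_t *row0, const uint8_t *row1) {
  const __m128i lo = _mm_loadl_epi64((const __m128i *)row0);
  const __m128i hi = _mm_loadl_epi64((const __m128i *)row1);
  return _mm_unpacklo_epi64(lo, hi);
}

/* Hypothetical helper: per-byte |a - b|, built from two saturating
 * subtractions. One call covers both packed rows. */
static __m128i abs_diff_u8(__m128i a, __m128i b) {
  return _mm_or_si128(_mm_subs_epu8(a, b), _mm_subs_epu8(b, a));
}

/* Computes |a0 - b0| into out[0..7] and |a1 - b1| into out[8..15]
 * with a single pass over one 128-bit register pair. */
static void abs_diff_two_rows(const uint8_t *a0, const uint8_t *a1,
                              const uint8_t *b0, const uint8_t *b1,
                              uint8_t out[16]) {
  const __m128i a = load_two_rows(a0, a1);
  const __m128i b = load_two_rows(b0, b1);
  _mm_storeu_si128((__m128i *)out, abs_diff_u8(a, b));
}
```

Packing two 8-pixel rows this way halves the number of arithmetic instructions per row, at the cost of filter steps that must account for which half of the register holds which row.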
  5. 17 Feb, 2016 2 commits
  6. 16 Feb, 2016 1 commit
  7. 17 Jul, 2015 1 commit
  8. 16 Jul, 2015 1 commit
  9. 02 Jul, 2015 1 commit
    • VP9_LPF_VERTICAL_16_DUAL_SSE2 optimization · 3c5256d5
      levytamar82 authored
      Optimized the vp9_lpf_vertical_16_dual function for the x86 32-bit target. The hot spot in that function was the call to transpose8x16.
      The gcc-generated assembly contained unneeded fills and spills to the stack. By interleaving 2 loads and unpack instructions, in addition to hoisting the consumer
      instruction closer to the producer instructions, we eliminated most of the fills and spills and improved the function-level performance by 17%.
      Credit for writing the function as well as finding the root cause goes to Erik Niemeyer (erik.a.niemeyer@intel.com).
      
      Change-Id: I6173cf53956d52918a047d1c53d9a673f952ec46
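
The interleaving of loads and unpacks described above can be illustrated with a simplified 8x8 byte transpose. This is a sketch under stated assumptions, not the transpose8x16 from the commit: the function name `transpose_8x8_u8` and its parameters are hypothetical, and whether a given compiler preserves the statement order in the generated assembly is not guaranteed.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (not the libvpx transpose8x16): transposes an 8x8 block
 * of bytes. Each pair of row loads is followed immediately by the unpack that
 * consumes it, so fewer values need to stay live at the same time. */
static void transpose_8x8_u8(const uint8_t *in, int in_pitch, uint8_t *out,
                             int out_pitch) {
  __m128i r0, r1, t0, t1, t2, t3, u0, u1, u2, u3, v0, v1, v2, v3;

  /* Load two rows, then interleave their bytes right away. */
  r0 = _mm_loadl_epi64((const __m128i *)(in + 0 * in_pitch));
  r1 = _mm_loadl_epi64((const __m128i *)(in + 1 * in_pitch));
  t0 = _mm_unpacklo_epi8(r0, r1);
  r0 = _mm_loadl_epi64((const __m128i *)(in + 2 * in_pitch));
  r1 = _mm_loadl_epi64((const __m128i *)(in + 3 * in_pitch));
  t1 = _mm_unpacklo_epi8(r0, r1);
  r0 = _mm_loadl_epi64((const __m128i *)(in + 4 * in_pitch));
  r1 = _mm_loadl_epi64((const __m128i *)(in + 5 * in_pitch));
  t2 = _mm_unpacklo_epi8(r0, r1);
  r0 = _mm_loadl_epi64((const __m128i *)(in + 6 * in_pitch));
  r1 = _mm_loadl_epi64((const __m128i *)(in + 7 * in_pitch));
  t3 = _mm_unpacklo_epi8(r0, r1);

  /* Interleave 16-bit pairs: each 32-bit lane now holds 4 rows of a column. */
  u0 = _mm_unpacklo_epi16(t0, t1); /* columns 0..3, rows 0..3 */
  u1 = _mm_unpackhi_epi16(t0, t1); /* columns 4..7, rows 0..3 */
  u2 = _mm_unpacklo_epi16(t2, t3); /* columns 0..3, rows 4..7 */
  u3 = _mm_unpackhi_epi16(t2, t3); /* columns 4..7, rows 4..7 */

  /* Interleave 32-bit lanes: each 64-bit half is one full output row. */
  v0 = _mm_unpacklo_epi32(u0, u2); /* columns 0 and 1 */
  v1 = _mm_unpackhi_epi32(u0, u2); /* columns 2 and 3 */
  v2 = _mm_unpacklo_epi32(u1, u3); /* columns 4 and 5 */
  v3 = _mm_unpackhi_epi32(u1, u3); /* columns 6 and 7 */

  _mm_storel_epi64((__m128i *)(out + 0 * out_pitch), v0);
  _mm_storel_epi64((__m128i *)(out + 1 * out_pitch), _mm_srli_si128(v0, 8));
  _mm_storel_epi64((__m128i *)(out + 2 * out_pitch), v1);
  _mm_storel_epi64((__m128i *)(out + 3 * out_pitch), _mm_srli_si128(v1, 8));
  _mm_storel_epi64((__m128i *)(out + 4 * out_pitch), v2);
  _mm_storel_epi64((__m128i *)(out + 5 * out_pitch), _mm_srli_si128(v2, 8));
  _mm_storel_epi64((__m128i *)(out + 6 * out_pitch), v3);
  _mm_storel_epi64((__m128i *)(out + 7 * out_pitch), _mm_srli_si128(v3, 8));
}
```

On a 32-bit x86 target only eight xmm registers are available, so consuming each loaded pair before loading the next is one plausible way to reduce stack traffic.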
  10. 15 May, 2015 1 commit
  11. 07 May, 2015 1 commit
    • replace DECLARE_ALIGNED_ARRAY w/DECLARE_ALIGNED · fd3658b0
      James Zern authored
      this macro was used inconsistently and only differs in behavior from
      DECLARE_ALIGNED when an alignment attribute is unavailable. this macro
      is used with calls to assembly, while generic c-code doesn't rely on it,
      so in a c-only build without an alignment attribute the code will
      function as expected.
      
      Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79
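
For readers unfamiliar with this kind of macro, the following sketch shows how an alignment macro of this shape can be defined and used. The definition is an assumption written for illustration and may differ from the one in the libvpx headers; `scratch_is_aligned` is a hypothetical helper.

```c
#include <stdint.h>

/* Illustrative definition only; the actual libvpx macro may differ. */
#if defined(__GNUC__)
#define DECLARE_ALIGNED(n, typ, val) typ val __attribute__((aligned(n)))
#elif defined(_MSC_VER)
#define DECLARE_ALIGNED(n, typ, val) __declspec(align(n)) typ val
#else
/* No alignment attribute available: fall back to a plain declaration. */
#define DECLARE_ALIGNED(n, typ, val) typ val
#endif

/* Hypothetical helper: declares a 16-byte-aligned scratch buffer, as code
 * calling into SIMD routines would, and reports whether it is aligned. */
static int scratch_is_aligned(void) {
  DECLARE_ALIGNED(16, uint8_t, scratch[64]);
  scratch[0] = 0;
  return ((uintptr_t)scratch & 15) == 0;
}
```

In the fallback branch the buffer is still valid for plain C code, which matches the commit's point that a C-only build does not depend on the attribute.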
  12. 18 Sep, 2014 2 commits
  13. 18 Dec, 2013 1 commit
    • rename loop filter functions · b720ba16
      Jim Bankoski authored
      This renames all the loop filter functions so that they no
      longer refer to mb
      
      Change-Id: I8a58a8c7fd253d835cb619bde13913e896ece90b
  14. 22 Nov, 2013 1 commit
    • Do vertical loopfiltering in parallel · ed36720b
      Yunqing Wang authored
      This patch followed "Add filter_selectively_vert_row2 to enable
      parallel loopfiltering" commit, and added x86 SSE2 optimization
      to do 16-pixel filtering in parallel. For other optimizations
      (neon and dspr2), current 16-pixel functions were done by calling
      8-pixel functions twice, and real 16-pixel functions could be added
      later.
      
      Decoder speedup:
      tulip clip:     2% speed gain;
      old_town_cross: 1.2% speed gain;
      bus:            2% speed gain.
      
      Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7
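
The fallback mentioned above, where a 16-pixel function is implemented by calling an 8-pixel function twice, can be sketched in plain C. The filter below is a toy stand-in, not the VP9 loop filter: `toy_lpf_vertical_8` and `toy_lpf_vertical_16_dual` are hypothetical names, and the smoothing rule is chosen only to keep the example short.

```c
#include <stdint.h>

/* Toy stand-in for an 8-pixel vertical-edge filter: for 8 consecutive rows,
 * average the pixels on either side of the edge at s when they differ by at
 * most thresh. The real VP9 filter is considerably more involved. */
static void toy_lpf_vertical_8(uint8_t *s, int pitch, uint8_t thresh) {
  for (int i = 0; i < 8; ++i) {
    const int p0 = s[i * pitch - 1];
    const int q0 = s[i * pitch];
    const int diff = p0 > q0 ? p0 - q0 : q0 - p0;
    if (diff <= thresh) {
      const uint8_t avg = (uint8_t)((p0 + q0 + 1) >> 1);
      s[i * pitch - 1] = avg;
      s[i * pitch] = avg;
    }
  }
}

/* 16-pixel "dual" variant built from two 8-pixel calls. The second call
 * starts 8 rows further down and may use its own threshold. */
static void toy_lpf_vertical_16_dual(uint8_t *s, int pitch, uint8_t thresh0,
                                     uint8_t thresh1) {
  toy_lpf_vertical_8(s, pitch, thresh0);
  toy_lpf_vertical_8(s + 8 * pitch, pitch, thresh1);
}
```

A wrapper like this keeps the 16-pixel entry point available on every platform, so a true 16-pixel implementation can replace it later without changing callers.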
  15. 16 Nov, 2013 1 commit
    • Do horizontal loopfiltering in parallel · 64f728ca
      Yunqing Wang authored
      This patch followed "Rewrite filter_selectively_horiz for parallel
      loopfiltering" commit, and added x86 SSE2 optimization to do
      16-pixel filtering in parallel. Also, corrected the declaration
      of aligned arrays. For 8-pixel-in-parallel case, improved the
      calculation of the masks and filters. Updated the threshold loading
      since the thresholds were already duplicated. Updated neon C functions
      to call neon loopfilters twice.
      
      Using tulip clip, tests showed it gave a ~1.5% decoder speed gain.
      
      Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
  16. 29 Aug, 2013 1 commit
    • Improved mb_lpf_horizontal_edge_w_sse2_8 · 22dc946a
      Scott LaVarnway authored
      This patch is a reformatted version of optimizations done by
      engineers at Intel (Erik/Tamar) who have been providing
      performance feedback for VP9.  For the test clips used (720p, 1080p),
      up to 1.2% performance improvement was seen.
      
      Change-Id: Ic1a7149098740079d5453b564da6fbfdd0b2f3d2
  17. 16 Jul, 2013 2 commits
  18. 14 Jul, 2013 3 commits
  19. 10 Jul, 2013 1 commit
    • Wide loopfilter 16 pix at a time · 64f7a4d8
      John Koleszar authored
      Where possible, do the 16 pixel wide filter while doing the horizontal
      filtering pass. The same approach can be taken for the mbloop_filter
      when that's implemented. Doing so on the vertical pass is a little more
      involved, but possible.
      
      Change-Id: I010cb505e623464247ae8f67fa25a0cdac091320
  20. 12 Jun, 2013 3 commits
  21. 10 May, 2013 1 commit
  22. 02 Apr, 2013 1 commit
    • Demux vp9_loopfilter_x86.c · 3db60c8c
      Johann authored
      Allow more careful targeting of compiler flags.
      
      Change-Id: I963ab4a6479dedb165419310dfca52a58a9877b8
  23. 07 Feb, 2013 1 commit
  24. 23 Jan, 2013 1 commit
    • Intrinsic version of loopfilter now matches C code · 6a997400
      Scott LaVarnway authored
      Updated the intrinsic code to match Yaowu's latest loopfilter change.
      (I584393906c4f5f948a581d6590959522572743bb)
      
      The decoder performance improved by ~30% for the test clip used.
      
      Change-Id: I026cfc75d5bcb7d8d58be6f0440ac9e126ef39d2
  25. 14 Jan, 2013 1 commit
  26. 12 Jan, 2013 2 commits
    • WIP: Added sse2 version of vp9_mb_lpf_horizontal_edge_w · b20ce07d
      Scott LaVarnway authored
      and vp9_mb_lpf_vertical_edge_w_sse2.  This was quickly done so we can
      run some tests over the weekend.  Future commits will optimize/refactor these
      functions further.
      
      The decoder performance improved by ~17% for the clip used.
      
      Change-Id: I612687cd5a7670ee840a0cbc3c68dc2b84d4af76
    • Add loop filtering for UV plane · 9a1d73d0
      Yaowu Xu authored
      Applies to block boundaries within an MB when only 8x8 block boundaries
      are filtered for Y.
      
      Change-Id: Ie1c804c877d199e78e2fecd8c2d3f1e114ce9ec1
  27. 11 Jan, 2013 1 commit
  28. 20 Dec, 2012 1 commit
  29. 28 Nov, 2012 1 commit
    • Further improve macroblock loop filters · d2021386
      Yunqing Wang authored
      This change included:
      1. Aligned reads in vp9_mbloop_filter_vertical_edge function.
      Since we actually read 16 bytes, we can align the reads to read
      starting at (s - 8) instead of (s - 5).
      2. Combined u, v loop filters.
      3. Added 8x16 transpose.
      
      This gave 2% decoder performance gain (tulip clip).
      
      Change-Id: Ib14c2f1645c4a3436df17fe2f24789506bf0bb58
  30. 27 Nov, 2012 1 commit
    • Add vp9_ prefix to all vp9 files · fcccbcbb
      John Koleszar authored
      Support for gyp which doesn't support multiple objects in the same
      static library having the same basename.
      
      Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
  31. 03 Nov, 2012 1 commit
  32. 01 Nov, 2012 1 commit