1. 26 May, 2016 1 commit
    • Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2 · 4b5e462d
      Linfeng Zhang authored
      Followed the code style of other lpf functions.
      These 2 functions put 2 rows of data in a single xmm register,
      so they have similar but not identical filter operations,
      and cannot share the same macros.
      Change-Id: I3bab55a5d1a1232926ac8fd1f03251acc38302bc
  2. 17 Feb, 2016 2 commits
  3. 16 Feb, 2016 1 commit
  4. 17 Jul, 2015 1 commit
  5. 16 Jul, 2015 1 commit
  6. 02 Jul, 2015 1 commit
    • VP9_LPF_VERTICAL_16_DUAL_SSE2 optimization · 3c5256d5
      levytamar82 authored
      The vp9_lpf_vertical_16_dual function was optimized for the x86 32-bit target. The hot code in that function was caused by the call to transpose8x16.
      The GCC-generated assembly created unneeded fills and spills to the stack. By interleaving 2 loads and unpack instructions, in addition to hoisting the consumer
      instructions closer to the producer instructions, we eliminated most of the fills and spills and improved function-level performance by 17%.
      Credit for writing the function, as well as for finding the root cause, goes to Erik Niemeyer (erik.a.niemeyer@intel.com).
      Change-Id: I6173cf53956d52918a047d1c53d9a673f952ec46
  7. 15 May, 2015 1 commit
  8. 07 May, 2015 1 commit
    • James Zern authored
      This macro was used inconsistently and only differs in behavior from
      DECLARE_ALIGNED when an alignment attribute is unavailable. This macro
      is used with calls to assembly, while generic C code doesn't rely on it,
      so in a C-only build without an alignment attribute the code will
      function as expected.
      Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79
  9. 18 Sep, 2014 2 commits
  10. 18 Dec, 2013 1 commit
    • rename loop filter functions · b720ba16
      Jim Bankoski authored
      This renames all the loop filter functions so that they no
      longer refer to mb.
      Change-Id: I8a58a8c7fd253d835cb619bde13913e896ece90b
  11. 22 Nov, 2013 1 commit
    • Do vertical loopfiltering in parallel · ed36720b
      Yunqing Wang authored
      This patch followed "Add filter_selectively_vert_row2 to enable
      parallel loopfiltering" commit, and added x86 SSE2 optimization
      to do 16-pixel filtering in parallel. For the other optimizations
      (neon and dspr2), the current 16-pixel functions were implemented by
      calling the 8-pixel functions twice, and real 16-pixel functions could
      be added later.
      Decoder speedup:
      tulip clip:     2% speed gain;
      old_town_cross: 1.2% speed gain;
      bus:            2% speed gain.
      Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7
  12. 16 Nov, 2013 1 commit
    • Do horizontal loopfiltering in parallel · 64f728ca
      Yunqing Wang authored
      This patch followed "Rewrite filter_selectively_horiz for parallel
      loopfiltering" commit, and added x86 SSE2 optimization to do
      16-pixel filtering in parallel. Also, corrected the declaration
      of aligned arrays. For 8-pixel-in-parallel case, improved the
      calculation of the masks and filters. Updated the threshold loading
      since the thresholds were already duplicated. Updated neon C functions
      to call neon loopfilters twice.
      Using tulip clip, tests showed it gave a ~1.5% decoder speed gain.
      Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
  13. 29 Aug, 2013 1 commit
    • Improved mb_lpf_horizontal_edge_w_sse2_8 · 22dc946a
      Scott LaVarnway authored
      This patch is a reformatted version of optimizations done by
      engineers at Intel (Erik/Tamar) who have been providing
      performance feedback for VP9.  For the test clips used (720p, 1080p),
      up to 1.2% performance improvement was seen.
      Change-Id: Ic1a7149098740079d5453b564da6fbfdd0b2f3d2
  14. 16 Jul, 2013 2 commits
  15. 14 Jul, 2013 3 commits
  16. 10 Jul, 2013 1 commit
    • Wide loopfilter 16 pix at a time · 64f7a4d8
      John Koleszar authored
      Where possible, do the 16 pixel wide filter while doing the horizontal
      filtering pass. The same approach can be taken for the mbloop_filter
      when that's implemented. Doing so on the vertical pass is a little more
      involved, but possible.
      Change-Id: I010cb505e623464247ae8f67fa25a0cdac091320
  17. 12 Jun, 2013 3 commits
  18. 10 May, 2013 1 commit
  19. 02 Apr, 2013 1 commit
    • Demux vp9_loopfilter_x86.c · 3db60c8c
      Johann authored
      Allow more careful targeting of compiler flags.
      Change-Id: I963ab4a6479dedb165419310dfca52a58a9877b8
  20. 07 Feb, 2013 1 commit
  21. 23 Jan, 2013 1 commit
    • Intrinsic version of loopfilter now matches C code · 6a997400
      Scott LaVarnway authored
      Updated the intrinsic code to match Yaowu's latest loopfilter change.
      The decoder performance improved by ~30% for the test clip used.
      Change-Id: I026cfc75d5bcb7d8d58be6f0440ac9e126ef39d2
  22. 14 Jan, 2013 1 commit
  23. 12 Jan, 2013 2 commits
    • WIP: Added sse2 version of vp9_mb_lpf_horizontal_edge_w · b20ce07d
      Scott LaVarnway authored
      and vp9_mb_lpf_vertical_edge_w_sse2.  This was quickly done so we can
      run some tests over the weekend.  Future commits will optimize/refactor these
      functions further.
      The decoder performance improved by ~17% for the clip used.
      Change-Id: I612687cd5a7670ee840a0cbc3c68dc2b84d4af76
    • Add loop filtering for UV plane · 9a1d73d0
      Yaowu Xu authored
      This filters the UV planes on block boundaries within a MB when only
      8x8 block boundaries are filtered for Y.
      Change-Id: Ie1c804c877d199e78e2fecd8c2d3f1e114ce9ec1
  24. 11 Jan, 2013 1 commit
  25. 20 Dec, 2012 1 commit
  26. 28 Nov, 2012 1 commit
    • Further improve macroblock loop filters · d2021386
      Yunqing Wang authored
      This change included:
      1. Aligned reads in the vp9_mbloop_filter_vertical_edge function.
      Since we actually read 16 bytes, we can align the reads to start
      at (s - 8) instead of (s - 5).
      2. Combined u, v loop filters.
      3. Added 8x16 transpose.
      This gave 2% decoder performance gain (tulip clip).
      Change-Id: Ib14c2f1645c4a3436df17fe2f24789506bf0bb58
  27. 27 Nov, 2012 1 commit
    • Add vp9_ prefix to all vp9 files · fcccbcbb
      John Koleszar authored
      Needed to support gyp, which doesn't allow multiple objects in the
      same static library to have the same basename.
      Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
  28. 03 Nov, 2012 1 commit
  29. 01 Nov, 2012 3 commits
  30. 31 Oct, 2012 1 commit
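The 4b5e462d message says the `_4` filters put 2 rows of data in a single xmm register. A minimal SSE2 sketch of that packing idea, assuming 8-pixel-wide rows `p1`, `p0`, `q0`, `q1` on either side of a horizontal edge: packing `p1` with `q1` and `p0` with `q0` lets one saturating absolute difference produce both `|p1 - p0|` and `|q1 - q0|`. The helper names are hypothetical and this is not the committed libvpx code; it needs an x86 target with SSE2.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Per-byte |a - b| for unsigned lanes: in every lane one of the two
 * saturating subtractions is zero, so OR-ing them yields the magnitude. */
static __m128i abs_diff_u8(__m128i a, __m128i b) {
  return _mm_or_si128(_mm_subs_epu8(a, b), _mm_subs_epu8(b, a));
}

/* Hypothetical helper: packs two 8-byte rows into one register,
 * row_lo in bytes 0-7 and row_hi in bytes 8-15. */
static __m128i load_two_rows(const uint8_t *row_lo, const uint8_t *row_hi) {
  const __m128i lo = _mm_loadl_epi64((const __m128i *)row_lo);
  const __m128i hi = _mm_loadl_epi64((const __m128i *)row_hi);
  return _mm_unpacklo_epi64(lo, hi);
}

/* Hypothetical helper: writes |p1 - p0| to out[0..7] and |q1 - q0| to
 * out[8..15] with a single abs-diff on the packed registers. */
static void edge_activity(const uint8_t *p1, const uint8_t *p0,
                          const uint8_t *q0, const uint8_t *q1,
                          uint8_t out[16]) {
  assert(p1 && p0 && q0 && q1 && out);
  const __m128i p1q1 = load_two_rows(p1, q1);
  const __m128i p0q0 = load_two_rows(p0, q0);
  _mm_storeu_si128((__m128i *)out, abs_diff_u8(p1q1, p0q0));
}
```

Because both sides of the edge travel through the same instructions, the layout of the packed registers, not the arithmetic, is what differs from filters that keep one row per register.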
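The 3c5256d5 message traces the stack spills to `transpose8x16` and removes them by interleaving loads with unpack instructions, so each unpack sits close to the loads that feed it. The sketch below shows that source ordering on a smaller 8x8 byte transpose built from SSE2 unpacks. It is an illustration under that assumption, not the committed function, and a compiler remains free to reschedule it; it needs an x86 target with SSE2.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

static __m128i load_row8(const uint8_t *p) {
  return _mm_loadl_epi64((const __m128i *)p); /* 8 bytes into low half */
}

static void store_row8(uint8_t *p, __m128i v) {
  _mm_storel_epi64((__m128i *)p, v); /* low 8 bytes of v */
}

/* Transposes an 8x8 byte block: dst row j receives src column j.
 * Comments write "rc" for the byte at src row r, column c. */
static void transpose_8x8(const uint8_t *src, int src_stride, uint8_t *dst,
                          int dst_stride) {
  assert(src && dst);
  /* Stage 1: each pair of loads is consumed by an unpack right away,
   * so only two freshly loaded rows are live at a time. */
  const __m128i a0 = _mm_unpacklo_epi8(load_row8(src + 0 * src_stride),
                                       load_row8(src + 1 * src_stride));
  /* a0 = 00 10 01 11 ... 07 17 */
  const __m128i a1 = _mm_unpacklo_epi8(load_row8(src + 2 * src_stride),
                                       load_row8(src + 3 * src_stride));
  const __m128i a2 = _mm_unpacklo_epi8(load_row8(src + 4 * src_stride),
                                       load_row8(src + 5 * src_stride));
  const __m128i a3 = _mm_unpacklo_epi8(load_row8(src + 6 * src_stride),
                                       load_row8(src + 7 * src_stride));

  /* Stage 2: 16-bit interleave, b0 = 00 10 20 30 01 11 21 31 ... 33. */
  const __m128i b0 = _mm_unpacklo_epi16(a0, a1); /* cols 0-3, rows 0-3 */
  const __m128i b1 = _mm_unpackhi_epi16(a0, a1); /* cols 4-7, rows 0-3 */
  const __m128i b2 = _mm_unpacklo_epi16(a2, a3); /* cols 0-3, rows 4-7 */
  const __m128i b3 = _mm_unpackhi_epi16(a2, a3); /* cols 4-7, rows 4-7 */

  /* Stage 3: 32-bit interleave, each register holds two whole columns. */
  const __m128i c0 = _mm_unpacklo_epi32(b0, b2); /* cols 0, 1 */
  const __m128i c1 = _mm_unpackhi_epi32(b0, b2); /* cols 2, 3 */
  const __m128i c2 = _mm_unpacklo_epi32(b1, b3); /* cols 4, 5 */
  const __m128i c3 = _mm_unpackhi_epi32(b1, b3); /* cols 6, 7 */

  /* Low half of each register is one output row, high half the next. */
  store_row8(dst + 0 * dst_stride, c0);
  store_row8(dst + 1 * dst_stride, _mm_srli_si128(c0, 8));
  store_row8(dst + 2 * dst_stride, c1);
  store_row8(dst + 3 * dst_stride, _mm_srli_si128(c1, 8));
  store_row8(dst + 4 * dst_stride, c2);
  store_row8(dst + 5 * dst_stride, _mm_srli_si128(c2, 8));
  store_row8(dst + 6 * dst_stride, c3);
  store_row8(dst + 7 * dst_stride, _mm_srli_si128(c3, 8));
}
```

A vertical-edge filter can use such a transpose to turn columns into rows, filter with row-oriented code, and transpose back.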
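The James Zern commit message contrasts the removed macro with `DECLARE_ALIGNED`, noting that alignment only matters for buffers handed to assembly. A minimal sketch of a `DECLARE_ALIGNED`-style macro, assuming GCC/Clang `__attribute__((aligned(n)))` and MSVC `__declspec(align(n))`; the name `EXAMPLE_DECLARE_ALIGNED` and the fallback branch are illustrative, not copied from libvpx. Without an alignment attribute the fallback declares an ordinary variable, which generic C code tolerates but SIMD code using aligned loads does not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro following the common DECLARE_ALIGNED pattern. */
#if defined(__GNUC__)
#define EXAMPLE_DECLARE_ALIGNED(n, typ, val) typ val __attribute__((aligned(n)))
#elif defined(_MSC_VER)
#define EXAMPLE_DECLARE_ALIGNED(n, typ, val) __declspec(align(n)) typ val
#else
/* No alignment attribute available: plain declaration, no guarantee. */
#define EXAMPLE_DECLARE_ALIGNED(n, typ, val) typ val
#endif

/* Returns 1 when p is a multiple of `alignment` (a power of two). */
static int is_aligned(const void *p, uintptr_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Declares a 16-byte-aligned stack buffer, as aligned SIMD loads require,
 * fills it, and reports whether the alignment request was honored. */
static int aligned_buffer_demo(void) {
  EXAMPLE_DECLARE_ALIGNED(16, uint8_t, buf[32]);
  for (int i = 0; i < 32; ++i) buf[i] = (uint8_t)i;
  return is_aligned(buf, 16) && buf[31] == 31;
}
```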
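The ed36720b message notes that the neon and dspr2 16-pixel functions were, at that time, built by calling the 8-pixel functions twice. A plain-C sketch of that fallback for a vertical edge, assuming a filter that processes 8 rows per call and a `pitch` in bytes between rows; `toy_lpf_vertical_8` is a hypothetical stand-in that only averages the two pixels straddling the edge, not a real loop filter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-row "filter": for each row, if the step across the
 * vertical edge at s is at most `limit`, replace both pixels next to the
 * edge with their rounded average. */
static void toy_lpf_vertical_8(uint8_t *s, int pitch, uint8_t limit) {
  assert(s);
  for (int row = 0; row < 8; ++row) {
    uint8_t *p = s + row * pitch;
    const int diff = p[-1] > p[0] ? p[-1] - p[0] : p[0] - p[-1];
    if (diff <= limit) {
      const uint8_t avg = (uint8_t)((p[-1] + p[0] + 1) >> 1);
      p[-1] = avg;
      p[0] = avg;
    }
  }
}

/* 16-row "dual" version built from two 8-row calls: the second call starts
 * 8 rows further down and may use its own threshold. */
static void toy_lpf_vertical_8_dual(uint8_t *s, int pitch, uint8_t limit0,
                                    uint8_t limit1) {
  toy_lpf_vertical_8(s, pitch, limit0);
  toy_lpf_vertical_8(s + 8 * pitch, pitch, limit1);
}
```

A true 16-pixel SIMD version would instead process both halves in one pass, which is what the SSE2 work in that commit targets.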
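The 64f728ca message says threshold loading was simplified because the thresholds were already duplicated. One reading of that, sketched under the assumption that each threshold is stored as 16 identical bytes: a single 16-byte load yields a ready-to-use vector with no broadcast step, and the thresholds for the two 8-pixel halves of a 16-pixel edge combine with one unpack. The function names are hypothetical; it needs an x86 target with SSE2.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: thr0 and thr1 each point at 16 copies of one
 * threshold byte. Lanes 0-7 of the result take thr0 (first 8 pixels),
 * lanes 8-15 take thr1 (next 8 pixels). Unaligned loads are used so the
 * sketch works with any buffer. */
static __m128i load_dual_threshold(const uint8_t *thr0, const uint8_t *thr1) {
  assert(thr0 && thr1);
  const __m128i t0 = _mm_loadu_si128((const __m128i *)thr0);
  const __m128i t1 = _mm_loadu_si128((const __m128i *)thr1);
  return _mm_unpacklo_epi64(t0, t1);
}

/* 0xFF in lanes where diff <= thr, 0x00 elsewhere. SSE2 has no unsigned
 * byte compare, so a saturating subtract followed by a compare with zero
 * stands in for it. */
static __m128i le_mask_u8(__m128i diff, __m128i thr) {
  return _mm_cmpeq_epi8(_mm_subs_epu8(diff, thr), _mm_setzero_si128());
}
```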