  1. 30 Nov, 2015 1 commit
    • SSE2 speed up of h_predictor_4x4 · 9d29d762
      Jian Zhou authored
      Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers.
      Speed up by ~25% in ./test_intra_pred_speed.
      
      Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d
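For orientation, a horizontal 4x4 intra predictor fills each output row with the left-neighbour pixel of that row. The scalar sketch below shows only that behaviour; the function name `h_predictor_4x4_ref` and its signature are hypothetical and are not the SSE2 routine from this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference: every pixel in row r takes the value
 * left[r]. A SIMD version would broadcast each left pixel across a row. */
static void h_predictor_4x4_ref(uint8_t *dst, ptrdiff_t stride,
                                const uint8_t *left) {
  assert(dst != NULL && left != NULL);
  for (int r = 0; r < 4; ++r) {
    memset(dst, left[r], 4); /* replicate one left pixel over 4 columns */
    dst += stride;
  }
}
```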
  2. 25 Nov, 2015 2 commits
  3. 23 Nov, 2015 1 commit
  4. 21 Nov, 2015 1 commit
  5. 19 Nov, 2015 4 commits
  6. 18 Nov, 2015 1 commit
  7. 11 Nov, 2015 1 commit
  8. 10 Nov, 2015 2 commits
  9. 22 Oct, 2015 1 commit
  10. 20 Oct, 2015 1 commit
    • Optimize vpx_quantize_{b,b_32x32} assembler. · 9cfba09a
      Geza Lore authored
      Added optimization of the 8 bit assembly quantizer routines. This makes
      these functions up to 100% faster, depending on encoding parameters.
      
      This patch makes the encoder faster in both the high bitdepth and 8bit
      configurations. In the high bitdepth configuration, it affects profile 0
      only.
      
      Based on my profiling with 1080p input, the net gain is 1-3% for
      the 8 bit config, and around 2.5-4.5% for the high bitdepth config,
      depending on target bitrate. The difference between the 8 bit and high
      bitdepth configurations for the same encoder run is reduced by 1% in all
      cases I have profiled.
      
      Change-Id: I86714a6b7364da20cd468cd784247009663a5140
  11. 16 Oct, 2015 1 commit
  12. 14 Oct, 2015 1 commit
  13. 13 Oct, 2015 1 commit
  14. 09 Oct, 2015 2 commits
  15. 06 Oct, 2015 1 commit
    • SSSE3 optimisation for quantize in high bit depth · 37c68efe
      Julia Robson authored
      When configured with high bit depth enabled, the 8bit quantize
      function stopped using optimised code. This made 8bit content
      decode slowly. This commit re-enables the SSSE3 optimisations.
      
      Change-Id: I194b505dd3f4c494e5c5e53e020f5d94534b16b5
  16. 05 Oct, 2015 2 commits
  17. 01 Oct, 2015 1 commit
    • vp10: reimplement d45/4x4 to match vp8 instead of vp9. · 62a15795
      Ronald S. Bultje authored
      This is more a proof of concept than anything else. The problem here
      isn't so much how to code it, but rather where to place the resulting
      code. All intrapred DSP code lives in vpx_dsp, so do we want the vp10
      specific intra pred functions to live there, or in vp10/?
      
      See issue 1015.
      
      Change-Id: I675f7badcc8e18fd99a9553910ecf3ddf81f0a05
  18. 30 Sep, 2015 2 commits
    • vp8: change build_intra4x4_predictors() to use vpx_dsp. · c26a9eca
      Ronald S. Bultje authored
      I've added a few new functions (d45e, d63e, he, ve) to cover the
      filtered h/v 4x4 predictors that are vp8-specific, the "correct"
      d45 with the correctly filtered bottom-right pixel (as opposed to
      the unfiltered version in vp9), and the "broken" d63 with weirdly
      filtered bottom-right pixels (which is correctly filtered in vp9).
      
      There may be a minor performance impact on all systems because we
      have to do an extra copy of the Above pixel array to incorporate
      the topleft pixel in the same array (thus fitting the vpx_dsp API).
      In addition, armv6 will have a more serious performance impact b/c
      I removed the armv6/vp8-specific assembly. I'm not sure anyone
      cares...
      
      Change-Id: I7f9e5ebee11d8e21aca2cd517a69eefc181b2e86
    • vp8: change build_intra_predictors_mby_s to use vpx_dsp. · 54d48955
      Ronald S. Bultje authored
      Change-Id: I2000820e0c04de2c975d370a0cf7145330289bb2
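The d45 difference described in c26a9eca (filtered versus unfiltered bottom-right pixel) can be illustrated with a small sketch. It assumes a 3-tap (1, 2, 1) smoothing filter over eight above pixels, and it assumes the "unfiltered" variant simply copies the last above pixel into the corner. The helper name `d45_4x4_sketch` and the `filter_corner` flag are hypothetical and do not come from vpx_dsp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 3-tap smoothing filter with rounding: (a + 2b + c + 2) / 4. */
#define AVG3(a, b, c) (((a) + 2 * (b) + (c) + 2) >> 2)

/* Hypothetical helper, not the libvpx function. above[] must hold 8 pixels.
 * filter_corner != 0: bottom-right pixel is filtered like the others.
 * filter_corner == 0: bottom-right pixel is the raw above[7] (assumed to be
 * what "unfiltered" means). */
static void d45_4x4_sketch(uint8_t *dst, ptrdiff_t stride,
                           const uint8_t *above, int filter_corner) {
  assert(dst != NULL && above != NULL);
  for (int r = 0; r < 4; ++r) {
    for (int c = 0; c < 4; ++c) {
      const int i = r + c; /* position along the 45-degree diagonal */
      /* Past the end of above[], repeat the last available pixel. */
      const int next2 = (i + 2 < 8) ? above[i + 2] : above[7];
      dst[r * stride + c] = (uint8_t)AVG3(above[i], above[i + 1], next2);
    }
  }
  if (!filter_corner) dst[3 * stride + 3] = above[7];
}
```

The two variants differ in exactly one output pixel, which is why the commit adds a separate function rather than changing the existing one.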
  19. 29 Sep, 2015 1 commit
    • Accelerated transform in high bit depth · 406030d1
      Julia Robson authored
      When configured with high bitdepth enabled, the 8bit transform
      stopped using optimised code. This made 8bit content decode slowly.
      
      Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea
  20. 18 Sep, 2015 1 commit
  21. 17 Sep, 2015 1 commit
    • vpx_subpixel_8t_ssse3: fix reg counts/access · 683b5a31
      James Zern authored
      fixes build on windows x64; previously 'heightq' i.e., the 64-bit register
      was accessed when only the 32-bit value was needed. given this is from a
      stack variable the upper bits were undefined.
      
      + bump register/xmm counts; users of SETUP_LOCAL_VARS touch xmm13 in
      64-bit builds and filter_block1d16_v* uses one extra temp variable
      
      Change-Id: I9c768c0b2047481d1d3b11c2e16b2f8de6eb0d80
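The register-width issue in 683b5a31 can be modelled in plain C. The sketch assumes a 32-bit argument stored in an 8-byte stack slot whose upper half holds leftover data; the helper names are hypothetical, and the garbage value is chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model an 8-byte argument slot: the low 32 bits hold the int argument,
 * the high 32 bits hold whatever was there before (undefined). */
static uint64_t make_slot(uint32_t value, uint32_t garbage) {
  return ((uint64_t)garbage << 32) | value;
}

/* Analogous to using the full 64-bit register: garbage leaks in. */
static uint64_t read_full(uint64_t slot) { return slot; }

/* Analogous to using only the 32-bit register: garbage is discarded. */
static uint32_t read_low32(uint64_t slot) { return (uint32_t)slot; }

static void slot_demo(void) {
  const uint64_t slot = make_slot(16, 0xDEADBEEFu);
  assert(read_low32(slot) == 16); /* the intended height */
  assert(read_full(slot) != 16);  /* wrong whenever the upper bits are nonzero */
}
```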
  22. 16 Sep, 2015 1 commit
    • vp10: code sign bit before absolute value in non-arithcoded header. · a3df343c
      Ronald S. Bultje authored
      For reading, this makes the operation branchless, although it still
      requires two shifts. For writing, this makes the operation as fast
      as writing an unsigned value, branchlessly. This is also how other
      codecs typically code signed, non-arithcoded bitstream elements.
      
      See issue 1039.
      
      Change-Id: I6a8182cc88a16842fb431688c38f6b52d7f24ead
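One way to get a branchless signed read with two shifts, as a3df343c describes, is two's-complement sign extension: write the low `bits + 1` bits of the value, then on read shift the field to the top of the word and shift it back down. The sketch below assumes that approach; it is not taken from the commit. `BitBuf`, `put_bits`, `get_bits`, `put_signed` and `get_signed` are hypothetical names, and the read relies on arithmetic right shift of negative ints, which is implementation-defined in C but standard on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal MSB-first bit buffer; not the libvpx reader/writer.
 * The buffer must be zero-initialised before writing. */
typedef struct {
  uint8_t buf[16];
  unsigned pos; /* current bit position */
} BitBuf;

static void put_bits(BitBuf *b, unsigned value, int n) {
  assert(b->pos + (unsigned)n <= sizeof(b->buf) * 8);
  for (int i = n - 1; i >= 0; --i, ++b->pos)
    b->buf[b->pos >> 3] |=
        (uint8_t)(((value >> i) & 1u) << (7 - (b->pos & 7)));
}

static unsigned get_bits(BitBuf *b, int n) {
  unsigned v = 0;
  assert(b->pos + (unsigned)n <= sizeof(b->buf) * 8);
  for (int i = 0; i < n; ++i, ++b->pos)
    v = (v << 1) | ((b->buf[b->pos >> 3] >> (7 - (b->pos & 7))) & 1u);
  return v;
}

/* Write: same cost as an unsigned literal of bits + 1 bits, no branch. */
static void put_signed(BitBuf *b, int value, int bits) {
  put_bits(b, (unsigned)value, bits + 1);
}

/* Read: the left shift moves the sign bit to the top of the word, the
 * right shift sign-extends it back down. No branch on the sign. */
static int get_signed(BitBuf *b, int bits) {
  const int shift = (int)(sizeof(unsigned) * 8) - bits - 1;
  const unsigned v = get_bits(b, bits + 1) << shift;
  return (int)v >> shift; /* assumes arithmetic right shift */
}
```

Because the sign bit is the first bit of the field, a round trip such as `put_signed(&b, -5, 6)` followed by `get_signed(&b, 6)` needs no comparison on either side.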
  23. 08 Sep, 2015 1 commit
  24. 04 Sep, 2015 1 commit
    • VPX: subpixel_8t_ssse3 asm using x86inc · 19588302
      Scott LaVarnway authored
      This is based on the original patch optimized for 32bit
      platforms by Tamar/Ilya and now uses the x86inc style asm.
      The assembly was also modified to support 64bit platforms.
      
      Change-Id: Ice12f249bbbc162a7427e3d23fbf0cbe4135aff2
  25. 31 Aug, 2015 1 commit
  26. 28 Aug, 2015 1 commit
  27. 27 Aug, 2015 2 commits
    • Add sse2 versions of halfpix variance · a28b2c6f
      Johann authored
      These were lost in the great sub pixel variance move of
      6a82f0d7
      
      Not having these functions caused a ~10% performance regression in
      some realtime vp8 encodes.
      
      Change-Id: I50658483d9198391806b27899f2c0d309233c4b5
    • vpx_dsp_common: add VPX prefix to MIN/MAX · 5e16d397
      James Zern authored
      prevents redeclaration warnings;
      vp8 has its own define which will be resolved in a future commit
      
      Change-Id: Ic941fef3dd4262fcdce48b73075fe6b375f11c9c
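The motivation for the prefix in 5e16d397 is that unprefixed `MIN`/`MAX` can collide with definitions in system headers on some platforms. A minimal sketch, assuming the prefixed names are `VPXMIN` and `VPXMAX`; the helper `clamp_int` is hypothetical and only demonstrates use.

```c
#include <assert.h>

/* Prefixed macros cannot clash with a MIN/MAX already defined elsewhere
 * (for example by <sys/param.h> on some systems). Like any function-style
 * macro of this shape, arguments may be evaluated twice. */
#define VPXMIN(x, y) (((x) < (y)) ? (x) : (y))
#define VPXMAX(x, y) (((x) > (y)) ? (x) : (y))

/* Hypothetical helper: clamp v to the inclusive range [lo, hi]. */
static int clamp_int(int v, int lo, int hi) {
  assert(lo <= hi);
  return VPXMAX(lo, VPXMIN(v, hi));
}
```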
  28. 26 Aug, 2015 1 commit
  29. 20 Aug, 2015 1 commit
  30. 19 Aug, 2015 1 commit
  31. 18 Aug, 2015 1 commit