1. 30 Nov, 2015 1 commit
    • SSE2 speed up of h_predictor_4x4 · 9d29d762
      Jian Zhou authored
      Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers.
      Speed up by ~25% in ./test_intra_pred_speed.
      
      Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d
  2. 25 Nov, 2015 1 commit
    • add vp9_satd_neon · eb1d0f8d
      James Zern authored
      ~60-65% faster at the function level across block sizes
      
      Change-Id: Iaf8cbe95731c43fdcbf68256e44284ba51a93893
  3. 23 Nov, 2015 1 commit
  4. 20 Nov, 2015 2 commits
    • fix vp9_satd_sse2 · 60760f71
      James Zern authored
      accumulate satd in 32-bits
      + add unit test
      
      Change-Id: I6748183df3662ddb9d635f9641f9586f2fd38ad5
    • vp9_satd: return an int · 3e0138ed
      James Zern authored
      the final sum may use up to 26 bits
      
      + add a unit test
      + disable the sse2 version, as its result will roll over; this will be
      fixed in a future commit
      
      Change-Id: I2a49811dfaa06abfd9fa1e1e65ed7cd68e4c97ce
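The need for a wider return type can be illustrated with a minimal scalar sketch. The function names below are hypothetical, and the block size of 1024 coefficients is an assumption made only for this illustration; this is not the libvpx source. With 1024 coefficients of magnitude close to 2^15, the sum approaches 2^25, which matches the "up to 26 bits" remark once a sign bit is counted, and a 16-bit accumulator would wrap.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference: sum of absolute values of `length`
 * coefficients, accumulated and returned as an int. */
int satd_sketch(const int16_t *coeff, int length) {
  int satd = 0;
  for (int i = 0; i < length; ++i) satd += abs(coeff[i]);
  return satd;
}

/* Same sum reduced modulo 2^16, to show what a 16-bit result would
 * report once the true sum no longer fits. */
uint16_t satd_sketch_u16(const int16_t *coeff, int length) {
  return (uint16_t)satd_sketch(coeff, length);
}
```

For 1024 coefficients of magnitude 32767, `satd_sketch` returns 33553408, while the 16-bit variant keeps only the low 16 bits of that value.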
  5. 19 Nov, 2015 1 commit
    • Speed up tm_predictor_4x4 · 79b68626
      Jian Zhou authored
      tm_predictor_4x4 is implemented with SSE2 using XMM registers.
      Speed up by ~25% in ./test_intra_pred_speed.
      
      Change-Id: I25074b78d476a2cb17f81cf654bdfd80df2070e0
  6. 14 Nov, 2015 1 commit
  7. 13 Nov, 2015 2 commits
  8. 10 Nov, 2015 1 commit
  9. 09 Nov, 2015 2 commits
  10. 06 Nov, 2015 3 commits
  11. 05 Nov, 2015 1 commit
  12. 03 Nov, 2015 1 commit
  13. 31 Oct, 2015 1 commit
  14. 30 Oct, 2015 1 commit
  15. 29 Oct, 2015 1 commit
  16. 28 Oct, 2015 2 commits
  17. 22 Oct, 2015 1 commit
  18. 21 Oct, 2015 1 commit
    • Optimize vp9_highbd_block_error_8bit assembly. · aa8f8522
      Geza Lore authored
      A new version of vp9_highbd_block_error_8bit is now available which is
      optimized with AVX assembly. AVX itself does not buy us much, but the
      non-destructive 3-operand encoding of the 128-bit SSEn integer
      instructions helps to eliminate move instructions. The Sandy Bridge
      micro-architecture cannot eliminate move instructions in the processor
      front end, so AVX will help on these machines.
      
      Two further optimizations are applied:
      
      1. The common case of computing block error on 4x4 blocks is optimized
      as a special case.
      2. All arithmetic is speculatively done on 32 bits only. At the end of
      the loop, the code detects if overflow might have happened and if so,
      the whole computation is re-executed using higher precision arithmetic.
      This case however is extremely rare in real use, so we can achieve a
      large net gain here.
      
      The optimizations rely on the fact that the coefficients are in the
      range [-(2^15-1), 2^15-1], and that the quantized coefficients always
      have the same sign as the input coefficients (in the worst case they are
      0). These are the same assumptions that the old SSE2 assembly code for
      the non high bitdepth configuration relied on. The unit tests have been
      updated to take this constraint into consideration when generating test
      input data.
      
      Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
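The speculative 32-bit strategy described above can be sketched in scalar C. This is a minimal illustration under stated assumptions, not the AVX assembly from the commit: the function name, the int16_t input type, and the per-step wrap test are all hypothetical, and the real code is described as checking for possible overflow once at the end of the loop.

```c
#include <stdint.h>

/* Hypothetical sketch: returns the sum of squared differences between
 * coeff and dqcoeff, and stores the sum of squared coeff values in *ssz.
 * Assumes |coeff[i]| <= 2^15 - 1 and that dqcoeff[i] has the same sign
 * as coeff[i] (or is 0), as stated in the commit message. */
int64_t block_error_sketch(const int16_t *coeff, const int16_t *dqcoeff,
                           int n, int64_t *ssz) {
  uint32_t err32 = 0, ssz32 = 0;
  int wrapped = 0;

  /* Fast path: all arithmetic in 32 bits. */
  for (int i = 0; i < n; ++i) {
    const int32_t d = coeff[i] - dqcoeff[i];
    const uint32_t d2 = (uint32_t)d * (uint32_t)d; /* d*d modulo 2^32 */
    const uint32_t c2 = (uint32_t)(coeff[i] * coeff[i]);
    err32 += d2;
    ssz32 += c2;
    /* An unsigned add wrapped if the result is smaller than the addend. */
    wrapped |= (err32 < d2) | (ssz32 < c2);
  }
  if (!wrapped) {
    *ssz = ssz32;
    return err32;
  }

  /* Rare slow path: redo the whole computation in 64 bits. */
  int64_t err = 0, sz = 0;
  for (int i = 0; i < n; ++i) {
    const int64_t d = coeff[i] - dqcoeff[i];
    err += d * d;
    sz += (int64_t)coeff[i] * coeff[i];
  }
  *ssz = sz;
  return err;
}
```

The design bet is that typical blocks have small coefficients, so the cheap path almost always succeeds and the cost of the occasional recomputation is negligible on average.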
  19. 16 Oct, 2015 1 commit
  20. 09 Oct, 2015 2 commits
  21. 08 Oct, 2015 1 commit
    • Optimization of 8bit block error for high bitdepth · 0134764f
      Geza Lore authored
      If high bit depth configuration is enabled, but encoding in profile 0,
      the code now falls back on optimized SSE2 assembler to compute the
      block errors, similar to when high bit depth is not enabled.
      
      Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
  22. 07 Oct, 2015 1 commit
  23. 06 Oct, 2015 1 commit
    • invalid_file_test: loosen error check w/tile-threading · fb209003
      James Zern authored
      The serial decode check is too strict for tile-threaded decoding, as
      there is no guarantee of the decode order or of which specific error
      will take precedence. Currently a tile-level error is not forwarded, so
      the frame will simply be marked corrupt.
      
      Change-Id: I51cf1e39e44bedeac93746154b36a4ccb2f059b1
  24. 30 Sep, 2015 4 commits
  25. 26 Sep, 2015 2 commits
    • vp9/10: improve support for render_width/height. · 812945a8
      Ronald S. Bultje authored
      In the decoder, map this to the output variable vpx_image_t.r_w/h.
      This is intended as an improved version of VP9D_GET_DISPLAY_SIZE,
      which doesn't work with parallel frame decoding. In the encoder,
      map this to a codec control func (VP9E_SET_RENDER_SIZE) that takes
      a w/h pair argument in an int[2] (identical to VP9D_GET_DISPLAY_SIZE).
      
      Also add render_size to the encoder_param_get_to_decoder unit test.
      
      See issue 1030.
      
      Change-Id: I12124c13602d832bf4c44090db08c1009c94c7e8
    • comment out fdct32 · 6a382101
      Angie Chiang authored
      comment out fdct32
      remove fdct32 test
      
      Change-Id: I31c47fb435377465cd3265e39621ca50d3aae656
  26. 25 Sep, 2015 1 commit
  27. 24 Sep, 2015 3 commits