  1. 07 Sep, 2017 1 commit
  2. 14 Aug, 2017 1 commit
  3. 11 Aug, 2017 1 commit
  4. 11 Jul, 2017 1 commit
  5. 05 May, 2017 1 commit
  6. 01 May, 2017 1 commit
      move vp9_error_intrin_avx2.c · 2ff01aa1
      Johann authored
      There is only one avx2 implementation. Drop '_intrin'
      
      Change-Id: I887a0d27d58567eaad49f749f127eca61313f312
  7. 27 Apr, 2017 1 commit
      vp9 temporal filter: sse4 implementation · 6dfeea65
      Johann authored
      Approximates division using multiply and shift.
      
      Speeds up both sizes (8x8 and 16x16) by 30 times.
      
      Fix the call sites to use the RTCD function.
      
      Delete the sse2 and mips implementations. They were based on a previous
      implementation of the filter, which was changed in Dec 2015:
      ece4fd5d
      
      BUG=webm:1378
      
      Change-Id: I0818e767a802966520b5c6e7999584ad13159276
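
A minimal sketch of the multiply-and-shift idea named in the commit message above, written in plain C rather than SSE4 intrinsics. The helper names (`reciprocal_q16`, `div_by_mul_shift`, `count_mismatches`) and the choice of 16 fractional bits are illustrative assumptions, not details taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: fixed-point reciprocal of d with 16 fractional
 * bits, rounded up so that small inputs never come out one too low. */
uint32_t reciprocal_q16(uint32_t d) {
  assert(d > 0);
  return ((1u << 16) + d - 1) / d;
}

/* Approximates x / d as (x * recip) >> 16: one multiply and one shift. */
uint32_t div_by_mul_shift(uint32_t x, uint32_t recip_q16) {
  return (uint32_t)(((uint64_t)x * recip_q16) >> 16);
}

/* Counts inputs in [0, max_x] where the approximation differs from
 * true integer division, to show where the shortcut is safe. */
uint32_t count_mismatches(uint32_t d, uint32_t max_x) {
  const uint32_t recip = reciprocal_q16(d);
  uint32_t mismatches = 0;
  for (uint32_t x = 0; x <= max_x; ++x) {
    if (div_by_mul_shift(x, recip) != x / d) ++mismatches;
  }
  return mismatches;
}
```

With a divisor of 9, for example, the reciprocal is 7282 and the result matches `x / 9` for every `x` up to 32767. At 32768 it yields 3641 instead of 3640, which is why the technique is an approximation that depends on the input range.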
  8. 09 Mar, 2017 1 commit
      move vp9_scale_and_extend_frame_c to vp9_frame_scale.c · 2f31a164
      James Zern authored
      this is similar to the x86 configuration and helps mitigate an issue
      with a circular dependency between this function and the ssse3 variant
      causing an outsized increase in binary size (~300K for chrome)
      chrome.dll:
      .text 255B000 -> 252B000
      .data 7B000 -> 75000
      -221184 bytes
      
      BUG=chromium:697956
      
      Change-Id: Ic95b142ecd62dd4f1795788aa27dd8fab59b708c
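
The size figures in the message above can be cross-checked with a few lines of C. This sketch assumes the `.text` and `.data` values are hexadecimal section sizes; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Signed change in bytes when a section goes from `before` to `after`. */
int64_t section_delta(uint64_t before, uint64_t after) {
  return (int64_t)after - (int64_t)before;
}

/* Sum of the .text and .data changes quoted in the commit message,
 * reading the section sizes as hexadecimal (an assumption). */
int64_t chrome_dll_delta(void) {
  const int64_t text = section_delta(0x255B000, 0x252B000); /* -0x30000 */
  const int64_t data = section_delta(0x7B000, 0x75000);     /* -0x6000 */
  return text + data;
}

/* Cross-check against the "-221184 bytes" figure in the message. */
void check_reported_total(void) {
  assert(chrome_dll_delta() == -221184);
}
```

Under that reading, `.text` shrinks by 196608 bytes and `.data` by 24576 bytes, which adds up to the reported total.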
  9. 24 Feb, 2017 2 commits
      consolidate block_error functions · 904b957a
      Johann authored
      vp9_highbd_block_error_8bit_c was a very simple wrapper around
      vp9_block_error_c. The SSE2 implementation was practically identical to
      the non-HBD one. It was missing some minor improvements which only
      went into the original version.
      
      In quick speed tests, the AVX implementation showed minimal
      improvement over SSE2 when it does not detect overflow. However, when
      overflow is detected the function is run a second time. The
      OperationCheck test seems to trigger this case and reverses any
      speed benefits by running ~60% slower. AVX2 on the other hand is
      always 30-40% faster.
      
      Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
      Make vp9_scale_and_extend_frame_ssse3 work for hbd when bitdepth = 8. · 0998a146
      Jerome Jiang authored
      Only works for bitdepth = 8 when compiled with the high bitdepth flag.
      Gives a 4x speedup for 1:2 down/upsampling.
      
      Validated manually for:
      1) Dynamic resize for a single layer encoding
      2) SVC encoding with 3 spatial layers
      Results are bitexact with the patch and the speed gain (~4x) in the
      scaling was verified.
      
      BUG=webm:1371
      
      Change-Id: I1bdb5f4d4bd0df67763fc271b6aa355e60f34712
  10. 14 Feb, 2017 1 commit
  11. 07 Feb, 2017 1 commit
  12. 24 Jan, 2017 1 commit
  13. 27 Aug, 2016 1 commit
  14. 25 Aug, 2016 1 commit
      Create interface for the ALT_REF_AQ class · 292d221f
      Yury Gitman authored
      The current commit is just an API template for the rest of the code;
      the inner logic will be added later.

      Altref frames generate a lot of bitrate and, at the same time, other
      frames refer to them a lot, so it makes sense to apply a special
      compensation-based adaptive quantization scheme to altref frames.
      For example, blocks that are good predictors for future frames get
      the quantizer chosen by rate control, while bad predictors get a
      worse one.
      
      Change-Id: Iba3f8ec349470673b7249f6a125f6859336a47c8
  15. 30 Jun, 2016 1 commit
  16. 08 Jun, 2016 1 commit
  17. 27 May, 2016 1 commit
      Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10. · af7fb17c
      Linfeng Zhang authored
      Function level timing test shows about 27% time saving on
      a Xeon E5-2680 v2 desktop.
      
      Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and
      rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid
      duplicate basenames.
      
      Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2()
      are identical. TODO: They should be unified later if there is
      no intention to keep a duplicate.
      
      Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d
  18. 24 May, 2016 1 commit
  19. 08 Feb, 2016 1 commit
  20. 04 Feb, 2016 1 commit
  21. 14 Jan, 2016 1 commit
      Adding an aq mode for 360 videos · 02345be9
      Debargha Mukherjee authored
      Different quality levels are used for different regions in
      the frame depending on how far they are vertically from the
      center. Specifically, three segments are used based on the
      mi_row index with respect to the number of mi_rows in
      the frame.
      
      Change-Id: Ifc8b777bc58ea8521dffc4640360c67d99f8d381
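
The row-based segmentation described above might look roughly like the following. The function name `segment_for_row` and the 1/8 and 1/4 band boundaries are assumptions made for illustration; the commit message only states that three segments are chosen from `mi_row` relative to `mi_rows`.

```c
#include <assert.h>

/* Hypothetical mapping from a row index to one of three segments.
 * Segment 0 covers the vertical centre of the frame; segments 1 and 2
 * cover bands progressively closer to the top and bottom edges.
 * The 1/8 and 1/4 thresholds are illustrative, not from the commit. */
unsigned int segment_for_row(int mi_row, int mi_rows) {
  assert(mi_rows > 0 && mi_row >= 0 && mi_row < mi_rows);
  if (mi_row < mi_rows / 8 || mi_row > mi_rows - mi_rows / 8) return 2;
  if (mi_row < mi_rows / 4 || mi_row > mi_rows - mi_rows / 4) return 1;
  return 0;
}
```

An encoder could then attach a different quantizer offset to each segment, so that rows far from the centre are coded at a different quality level.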
  22. 14 Dec, 2015 1 commit
  23. 11 Nov, 2015 1 commit
      Add AVX vectorized vp9_diamond_search_sad · 5eefd3eb
      Geza Lore authored
      This function now has an AVX intrinsics version which is about 80%
      faster compared to the C implementation. This provides a 2-4% total
      speed-up for encode, depending on encoding parameters. The function
      utilizes 3 properties of the cost function lookup table, constructed
      in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
      For the joint cost:
        - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
      For the component costs:
        - For all i: mvsadcost[0][i] == mvsadcost[1][i]
              (equal per component cost)
        - For all i: mvsadcost[0][i] == mvsadcost[0][-i]
              (Cost function is even)
      These must hold, otherwise the AVX version of the function cannot be used.
      
      Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
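
The three table properties listed in the message above can be expressed as a small checker. This is a sketch under stated assumptions: the function names, the `max_mv` parameter, and the tiny demo tables are hypothetical, and the component tables are assumed to be pointers into the middle of an array so that negative indices are valid.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the cost tables satisfy the three properties:
 * equal non-zero joint costs, equal per-component costs, and an even
 * cost function. comp_cost[c] must be valid for -max_mv..max_mv. */
bool sad_cost_tables_are_avx_compatible(const int joint_cost[4],
                                        const int *const comp_cost[2],
                                        int max_mv) {
  assert(max_mv >= 0);
  if (joint_cost[1] != joint_cost[2] || joint_cost[2] != joint_cost[3])
    return false;
  for (int i = 0; i <= max_mv; ++i) {
    if (comp_cost[0][i] != comp_cost[1][i]) return false;   /* same per component */
    if (comp_cost[0][-i] != comp_cost[1][-i]) return false; /* also for negative i */
    if (comp_cost[0][i] != comp_cost[0][-i]) return false;  /* even function */
  }
  return true;
}

/* Builds tiny tables with cost(i) = |i| and optionally breaks one
 * entry so that the check fails. */
bool demo_check(bool break_table) {
  enum { MAX_MV = 4 };
  int row[2][2 * MAX_MV + 1];
  for (int c = 0; c < 2; ++c)
    for (int i = -MAX_MV; i <= MAX_MV; ++i)
      row[c][i + MAX_MV] = i < 0 ? -i : i;
  if (break_table) row[0][0] += 1; /* entry for index -MAX_MV, component 0 */
  const int joint[4] = { 0, 7, 7, 7 };
  const int *const comp[2] = { &row[0][MAX_MV], &row[1][MAX_MV] };
  return sad_cost_tables_are_avx_compatible(joint, comp, MAX_MV);
}
```

A caller could run such a check once after building the tables and fall back to the C implementation when it fails.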
  24. 06 Nov, 2015 1 commit
      Revert "Add AVX vectorized vp9_diamond_search_sad" · 30466f26
      James Zern authored
      This reverts commit f1342a7b.
      
      This breaks 32-bit builds:
       runtime error: load of misaligned address 0xf72fdd48 for type 'const
      __m128i' (vector of 2 'long long' values), which requires 16 byte
      alignment
      
      + _mm_set1_epi64x is incompatible with some versions of Visual Studio
      
      Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
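
The alignment failure quoted above can be reproduced arithmetically: a `__m128i` access through an aligned load such as `_mm_load_si128` needs an address that is a multiple of 16, and the reported address is not. The helper names below are hypothetical, and the sketch only checks addresses; it does not show how the reland fixed the issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of bytes by which `address` overshoots the previous multiple
 * of `alignment` (which must be a power of two). */
uintptr_t misalignment(uintptr_t address, uintptr_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return address & (alignment - 1);
}

/* True when `address` is a multiple of `alignment`. */
bool is_aligned(uintptr_t address, uintptr_t alignment) {
  return misalignment(address, alignment) == 0;
}
```

For the address in the error message, `misalignment(0xf72fdd48, 16)` is 8, so the data is only 8-byte aligned. Typical remedies are an unaligned load (`_mm_loadu_si128`) or forcing 16-byte alignment on the data.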
  25. 05 Nov, 2015 1 commit
      Add AVX vectorized vp9_diamond_search_sad · f1342a7b
      Geza Lore authored
      This function now has an AVX intrinsics version which is about 80%
      faster compared to the C implementation. This provides a 2-4% total
      speed-up for encode, depending on encoding parameters. The function
      utilizes 3 properties of the cost function lookup table, constructed
      in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
      For the joint cost:
        - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
      For the component costs:
        - For all i: mvsadcost[0][i] == mvsadcost[1][i]
              (equal per component cost)
        - For all i: mvsadcost[0][i] == mvsadcost[0][-i]
              (Cost function is even)
      These must hold, otherwise the AVX version of the function cannot be used.
      
      Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
  26. 02 Nov, 2015 1 commit
      Move noise level estimate outside denoiser. · c7da053d
      Marco authored
      Source noise level estimate is also useful for
      setting variance encoder parameters (variance thresholds,
      qp-delta, mode selection, etc), so allow it to be used also
      if denoising is not on.
      
      Change-Id: I4fe23d47607b4e17a35287057f489c29114beed1
  27. 21 Oct, 2015 1 commit
      Optimize vp9_highbd_block_error_8bit assembly. · aa8f8522
      Geza Lore authored
      A new version of vp9_highbd_block_error_8bit is now available which is
      optimized with AVX assembly. AVX itself does not buy us too much, but
      the non-destructive 3 operand format encoding of the 128bit SSEn integer
      instructions helps to eliminate move instructions. The Sandy Bridge
      micro-architecture cannot eliminate move instructions in the processor
      front end, so AVX will help on these machines.
      
      Two further optimizations are applied:
      
      1. The common case of computing block error on 4x4 blocks is optimized
      as a special case.
      2. All arithmetic is speculatively done on 32 bits only. At the end of
      the loop, the code detects if overflow might have happened and if so,
      the whole computation is re-executed using higher precision arithmetic.
      This case however is extremely rare in real use, so we can achieve a
      large net gain here.
      
      The optimizations rely on the fact that the coefficients are in the
      range [-(2^15-1), 2^15-1], and that the quantized coefficients always
      have the same sign as the input coefficients (in the worst case they are
      0). These are the same assumptions that the old SSE2 assembly code for
      the non high bitdepth configuration relied on. The unit tests have been
      updated to take this constraint into consideration when generating test
      input data.
      
      Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
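
The speculative scheme from item 2 above can be sketched in scalar C: accumulate in 32 bits, check afterwards whether overflow might have occurred, and redo the work in 64 bits if so. The overflow test used here (a conservative bound from the largest magnitude seen) is an assumption chosen for simplicity; the commit does not spell out how the assembly detects overflow. The function names and the `int16_t` input type are likewise illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { MAX_DEMO_COEFFS = 1024 };

/* Reference path: 64-bit accumulation, exact for 16-bit inputs. */
int64_t block_error_wide(const int16_t *coeff, const int16_t *dqcoeff,
                         int n, int64_t *ssz) {
  int64_t err = 0, sq = 0;
  for (int i = 0; i < n; ++i) {
    const int64_t d = (int64_t)coeff[i] - dqcoeff[i];
    err += d * d;
    sq += (int64_t)coeff[i] * coeff[i];
  }
  *ssz = sq;
  return err;
}

/* Fast path: 32-bit accumulation, then a conservative check. If
 * n * max_magnitude^2 could exceed 32 bits, recompute in 64 bits. */
int64_t block_error_speculative(const int16_t *coeff, const int16_t *dqcoeff,
                                int n, int64_t *ssz, int *used_fallback) {
  uint32_t err = 0, sq = 0, max_mag = 0;
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    const int32_t c = coeff[i];
    const int32_t d = c - dqcoeff[i];
    const uint32_t ac = (uint32_t)abs(c);
    const uint32_t ad = (uint32_t)abs(d);
    err += ad * ad; /* unsigned arithmetic wraps; result is discarded if so */
    sq += ac * ac;
    if (ac > max_mag) max_mag = ac;
    if (ad > max_mag) max_mag = ad;
  }
  *used_fallback = (uint64_t)n * max_mag * max_mag > UINT32_MAX;
  if (*used_fallback) return block_error_wide(coeff, dqcoeff, n, ssz);
  *ssz = sq;
  return err;
}

/* Demo: constant block with zero dequantized values. Returns 0 on a
 * mismatch, 1 if the fast path sufficed, 2 if the fallback was used. */
int demo_run(int16_t value, int n) {
  int16_t coeff[MAX_DEMO_COEFFS], dqcoeff[MAX_DEMO_COEFFS];
  int64_t ssz_fast, ssz_wide;
  int used_fallback;
  assert(n >= 0 && n <= MAX_DEMO_COEFFS);
  for (int i = 0; i < n; ++i) {
    coeff[i] = value;
    dqcoeff[i] = 0;
  }
  const int64_t fast =
      block_error_speculative(coeff, dqcoeff, n, &ssz_fast, &used_fallback);
  const int64_t wide = block_error_wide(coeff, dqcoeff, n, &ssz_wide);
  if (fast != wide || ssz_fast != ssz_wide) return 0;
  return used_fallback ? 2 : 1;
}
```

Small coefficients stay on the 32-bit path, while a 32x32 block filled with the extreme value 32767 triggers the recomputation. That matches the trade-off the message describes: the common case is cheap and the rare case pays for a second pass.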
  28. 08 Oct, 2015 1 commit
      Optimization of 8bit block error for high bitdepth · 0134764f
      Geza Lore authored
      If high bit depth configuration is enabled, but encoding in profile 0,
      the code now falls back on optimized SSE2 assembler to compute the
      block errors, similar to when high bit depth is not enabled.
      
      Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
  29. 07 Aug, 2015 1 commit
  30. 28 Jul, 2015 4 commits
  31. 27 Jul, 2015 1 commit
  32. 22 Jul, 2015 1 commit
  33. 20 Jul, 2015 1 commit
  34. 17 Jul, 2015 1 commit
      Migrate quantization functions from vp9/ to vpx_dsp/ · 38f1fbbb
      Yunqing Wang authored
      The following quantization functions were moved:
      vp9_quantize_b
      vp9_quantize_b_32x32
      vp9_highbd_quantize_b
      vp9_highbd_quantize_b_32x32
      
      vp9_quantize_dc
      vp9_quantize_dc_32x32
      vp9_highbd_quantize_dc
      vp9_highbd_quantize_dc_32x32
      
      The purpose of doing that was to allow these functions to be shared
      by multiple codecs.
      
      Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
  35. 07 Jul, 2015 1 commit
  36. 06 Jul, 2015 1 commit