  1. 14 Dec, 2017 1 commit
    • add copyright to rtcd files · e4b3f03c
      Johann authored
      Allows them to pass the license check in chromium.
      
      BUG=chromium:98319
      
      Change-Id: Iefc1706152a549d8c4ae774c917596bf1c9492d8
      e4b3f03c
  2. 09 Nov, 2017 1 commit
    • vpx: [x86] add vp9_block_error_fp_avx2() · 62ab5e99
      Scott LaVarnway authored
      SSE2 asm vs AVX2 intrinsics speed gains:
      blocksize   16: ~1.00
      blocksize   64: ~1.17
      blocksize  256: ~1.67
      blocksize 1024: ~1.81
      
      Change-Id: I2a86db239cf57e3ff617890ccb2d236aba83ad5e
      62ab5e99
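For context on what is being accelerated, the following is a minimal scalar sketch of a block-error computation: the sum of squared differences between original and dequantized coefficients. The name `block_error_fp_ref` and the `int16_t` coefficient type are assumptions made for illustration; this is not the libvpx source and makes no claim about how the AVX2 version is organized.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference (not the libvpx implementation).
 * Sums the squared differences between the original and dequantized
 * coefficients. The 64-bit accumulator means even large blocks of
 * 16-bit coefficients cannot overflow. */
static int64_t block_error_fp_ref(const int16_t *coeff,
                                  const int16_t *dqcoeff,
                                  size_t block_size) {
  int64_t error = 0;
  for (size_t i = 0; i < block_size; ++i) {
    /* Widen before subtracting so the difference cannot wrap. */
    const int32_t diff = (int32_t)coeff[i] - (int32_t)dqcoeff[i];
    error += (int64_t)diff * diff;
  }
  return error;
}
```

A SIMD version processes several coefficients per iteration, which is consistent with the gains above growing with block size.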
  3. 03 Nov, 2017 1 commit
  4. 13 Sep, 2017 1 commit
  5. 07 Sep, 2017 1 commit
  6. 23 Aug, 2017 1 commit
    • quantize fp: neon implementation · e83d99d7
      Johann authored
      About 4x faster when values are below the dequant threshold and 10x
      faster if everything needs to be calculated.
      
      Both numbers would improve if the division for dqcoeff could be
      simplified.
      
      BUG=webm:1426
      
      Change-Id: I8da67c1f3fcb4abed8751990c1afe00bc841f4b2
      e83d99d7
  7. 14 Aug, 2017 1 commit
  8. 10 Jul, 2017 1 commit
    • remove vp9_full_sad_search · 109faffe
      Johann authored
      This code is unused in vp9. Only vp8 still contains references to
      vpx_sad_NxMx[3|8] and only for sizes 16x16, 16x8, 8x16, 8x8 and 4x4.
      
      Remove the remaining sizes and all the highbitdepth versions.
      
      BUG=webm:1425
      
      Change-Id: If6a253977c8e0c04599e25cbeb45f71a94f563e8
      109faffe
  9. 10 May, 2017 2 commits
  10. 05 May, 2017 1 commit
  11. 03 May, 2017 1 commit
  12. 28 Apr, 2017 1 commit
    • Use uint32_t for accumulator · 657f3e9f
      Johann authored
      Be specific about the data type size.
      
      Use convenience macro vp9_zero_array.
      
      Change-Id: I5fadf7dbd408befb73820d85db0be4832e8cfcbd
      657f3e9f
  13. 27 Apr, 2017 1 commit
    • vp9 temporal filter: sse4 implementation · 6dfeea65
      Johann authored
      Approximates division using multiply and shift.
      
      Speeds up both sizes (8x8 and 16x16) by 30 times.
      
      Fix the call sites to use the RTCD function.
      
      Delete sse2 and mips implementation. They were based on a previous
      implementation of the filter. It was changed in Dec 2015:
      ece4fd5d
      
      BUG=webm:1378
      
      Change-Id: I0818e767a802966520b5c6e7999584ad13159276
      6dfeea65
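The "multiply and shift" idea mentioned in the commit above can be sketched in general terms as follows. This is a generic illustration, not the constants or table used in the libvpx filter: the function name `approx_div`, the 16-bit shift, and the stated input range are all assumptions.

```c
#include <stdint.h>

/* Hypothetical helper: approximates x / d without a divide instruction.
 * The reciprocal is scaled by 2^16 and rounded up, so the product,
 * shifted right by 16, matches integer division as long as
 * x * d < 65536 (for example x < 4096 with 1 <= d <= 16).
 * In a real filter the reciprocals would come from a precomputed table;
 * the division below only stands in for that table lookup. */
static uint32_t approx_div(uint32_t x, uint32_t d) {
  const uint32_t reciprocal = 65536u / d + 1u;
  return (x * reciprocal) >> 16;
}
```

For example, `approx_div(100, 3)` yields 33 and `approx_div(255, 9)` yields 28, the same as integer division. Outside the stated range the result can be off by one, which is why the commit describes this as an approximation.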
  14. 18 Apr, 2017 1 commit
    • vp9: Add phase to get averaging filter for 1:2 downsampling. · 348bdc01
      Marco authored
      The scaling filter with zero shift will give sub-sampling for
      2x downsampling. Allow for a phase shift to get an averaging filter.
      
      Usage is for source scaling in 1 pass SVC mode for 1:2 downscale.
      Reduces aliasing in downsampled image.
      
      Keep the phase to 0/off for now.
      
      Change-Id: Ic547ea0748d151b675f877527e656407fcf4d51e
      348bdc01
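The difference between sub-sampling and averaging described in the commit above can be shown on a single row of pixels. This is a simplified sketch using a 2-tap average; the encoder's actual scaling kernels are not reproduced here, and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Zero phase: keep every second sample and drop the rest
 * (plain sub-sampling). */
static void downscale2_subsample(const uint8_t *src, uint8_t *dst,
                                 size_t dst_len) {
  for (size_t i = 0; i < dst_len; ++i) dst[i] = src[2 * i];
}

/* Half-sample phase with a 2-tap filter: each output is the rounded
 * average of a pair of neighbouring input samples. */
static void downscale2_average(const uint8_t *src, uint8_t *dst,
                               size_t dst_len) {
  for (size_t i = 0; i < dst_len; ++i)
    dst[i] = (uint8_t)((src[2 * i] + src[2 * i + 1] + 1) >> 1);
}
```

On an alternating row such as 0, 255, 0, 255, sub-sampling produces 0, 0 (the pattern aliases to a flat black row) while averaging produces 128, 128, which illustrates the aliasing reduction the commit refers to.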
  15. 22 Mar, 2017 1 commit
  16. 24 Feb, 2017 3 commits
    • consolidate block_error functions · 904b957a
      Johann authored
      vp9_highbd_block_error_8bit_c was a very simple wrapper around
      vp9_block_error_c. The SSE2 implementation was practically identical to
      the non-HBD one. It was missing some minor improvements which only
      went into the original version.
      
      In quick speed tests, the AVX implementation showed minimal
      improvement over SSE2 when it does not detect overflow. However, when
      overflow is detected the function is run a second time. The
      OperationCheck test seems to trigger this case and reverses any
      speed benefits by running ~60% slower. AVX2 on the other hand is
      always 30-40% faster.
      
      Change-Id: I9fcb9afbcb560f234c7ae1b13ddb69eca3988ba1
      904b957a
    • Make vp9_scale_and_extend_frame_ssse3 work for hbd when bitdepth = 8. · 0998a146
      Jerome Jiang authored
      Only works for bitdepth = 8 when compiled with the high bitdepth flag.
      About 4x speedup for 1:2 down/upsampling.
      
      Validated manually for:
      1) Dynamic resize for a single layer encoding
      2) SVC encoding with 3 spatial layers
      Results are bitexact with the patch and the speed gain (~4x) in the
      scaling was verified.
      
      BUG=webm:1371
      
      Change-Id: I1bdb5f4d4bd0df67763fc271b6aa355e60f34712
      0998a146
    • block error sse2: use tran_low_t · 3c16bbb7
      Johann authored
      Change-Id: Ib04990e4a7bda9fbf501f294da2057a2b2595deb
      3c16bbb7
  17. 16 Feb, 2017 5 commits
  18. 14 Feb, 2017 2 commits
  19. 07 Feb, 2017 1 commit
  20. 01 Feb, 2017 1 commit
    • Fix real-time compression regression in hbd mode · 969957f9
      Jingning Han authored
      This commit resolves the compression performance regression in
      the real-time encoding setting when high bit-depth mode is enabled.
      
      The current solution temporarily disables the SIMD implementations
      of vpx_satd, hadamard8x8, and hadamard16x16 in high bit-depth mode.
      
      The commit makes the coding results bit-wise identical between the
      regular coding pipeline and the high bit-depth pipeline at profile 0.
      
      BUG=webm:1365
      
      Change-Id: Icfb900821733749685370460a1a5a7e07f76f4bf
      969957f9
  21. 12 Dec, 2016 1 commit
  22. 05 Nov, 2016 1 commit
  23. 30 Sep, 2016 1 commit
  24. 20 Sep, 2016 1 commit
  25. 12 Jul, 2016 1 commit
  26. 30 Jun, 2016 1 commit
  27. 08 Jun, 2016 1 commit
  28. 27 May, 2016 1 commit
    • Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10. · af7fb17c
      Linfeng Zhang authored
      Function level timing test shows about 27% time saving on
      a Xeon E5-2680 v2 desktop.
      
      Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and
      rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid
      duplicate basenames.
      
      Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2()
      are identical. TODO: They should be unified later if there is
      no intention to keep a duplicate.
      
      Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d
      af7fb17c
  29. 02 May, 2016 1 commit
  30. 31 Mar, 2016 1 commit
  31. 04 Feb, 2016 1 commit
  32. 14 Dec, 2015 1 commit