1. 20 Sep, 2017 1 commit
  2. 14 Aug, 2017 1 commit
  3. 04 Aug, 2017 2 commits
  4. 30 Jun, 2017 1 commit
  5. 26 Jun, 2017 2 commits
  6. 21 Jun, 2017 1 commit
    • Clean 32x32 full idct sse2 and ssse3 code · 2b43a1ee
      Linfeng Zhang authored
      vpx_idct32x32_1024_add_ssse3() is actually an SSE2 function and is
      faster than vpx_idct32x32_1024_add_sse2(), so it replaces the slower
      one. All changes are code relocations; no new code.
      
      Change-Id: I5dac0e98cc411a4ce05660406921118986638d19
  7. 15 Jun, 2017 2 commits
  8. 13 Jun, 2017 3 commits
  9. 21 Mar, 2017 1 commit
  10. 14 Mar, 2017 1 commit
  11. 11 Mar, 2017 1 commit
    • inv_txfm_ssse3,butterfly: fix win32 abi compatibility · 48fca113
      James Zern authored
      only the first 3 parameters can be aligned to 16 bytes, as required by
      __m128i; make them all pointers for consistency.
      
      since:
      07c48ccf Improve idct32x32_34_add SSSE3 intrinsics performance
      
      BUG=webm:1384
      
      Change-Id: I0324f701e723a27cb470036a180693ba8829d01d
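The pointer-passing approach described in this commit can be sketched as follows. This is an illustration only, not the code from the commit: `add_sub_epi16` and `first_lane` are hypothetical names, and the example assumes an x86 target with SSE2 available. Taking every `__m128i` argument by pointer means no vector is passed by value, so the 16-byte alignment limit on by-value parameters under the 32-bit Windows ABI does not come into play.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical butterfly-style helper: per-lane sum and difference of two
 * vectors of 16-bit values. All vectors are passed by pointer rather than
 * by value. */
static void add_sub_epi16(const __m128i *in0, const __m128i *in1,
                          __m128i *out_sum, __m128i *out_diff) {
  assert(in0 && in1 && out_sum && out_diff);
  const __m128i a = *in0;
  const __m128i b = *in1;
  *out_sum = _mm_add_epi16(a, b);
  *out_diff = _mm_sub_epi16(a, b);
}

/* Hypothetical helper: read lane 0 of a vector as a signed 16-bit value. */
static int16_t first_lane(__m128i v) {
  return (int16_t)_mm_extract_epi16(v, 0);
}
```

A caller keeps its vectors in local variables and passes their addresses, for example `add_sub_epi16(&a, &b, &sum, &diff);`.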
  12. 10 Mar, 2017 1 commit
    • Improve idct32x32_135_add SSSE3 intrinsics performance · 327add99
      Yi Luo authored
      - Split the inv txfm into three parts to avoid stack spillover.
      - Function level speed improves ~12%.
      - Use function and macro to remove some repeated code.
      
      Change-Id: I14f5f072334fd766808cb52bf648df792e7379ee
  13. 01 Mar, 2017 1 commit
    • Improve idct32x32_34_add SSSE3 intrinsics performance · 07c48ccf
      Yi Luo authored
      - Split the transform into first half and second half.
      - Reschedule the instructions to avoid stack spillover.
      - Function level speed improves ~16%.
      
      Change-Id: I166889840d23aa8a273eca00f6fbdae8b4566f35
  14. 17 Feb, 2017 2 commits
    • Fix idct8x8 SSSE3 SingleExtremeCoeff unit tests · 1f8e8e5b
      Yi Luo authored
      - In the SSSE3 optimization, 16-bit addition and subtraction would
        overflow when the input coefficients are extreme 16-bit signed
        values.
      - Function-level time increases (in ms):
        idct8x8_64: 284 -> 294
        idct8x8_12: 145 -> 158
      
      BUG=webm:1332
      
      Change-Id: I1e4bf9d30a6d4112b8cac5823729565bf145e40b
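The overflow mentioned in this commit can be illustrated with a scalar model of a single 16-bit lane. This is a sketch under stated assumptions, not the fix from the commit: `add16_wrap`, `add32_wide`, and `check_extreme_inputs` are hypothetical names. The commit message does not say how the overflow was avoided, and widening to 32 bits is shown here only as one common remedy.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of a wrapping 16-bit add (the per-lane behavior
 * of _mm_add_epi16). The sum is computed in unsigned arithmetic so the C
 * code itself has no undefined behavior. */
static int16_t add16_wrap(int16_t a, int16_t b) {
  const uint16_t u = (uint16_t)((uint16_t)a + (uint16_t)b);
  return (u >= 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* Same sum carried out in 32 bits, which cannot overflow for 16-bit inputs. */
static int32_t add32_wide(int16_t a, int16_t b) {
  return (int32_t)a + (int32_t)b;
}

/* Extreme inputs: the 16-bit lane wraps, the widened sum does not. */
static void check_extreme_inputs(void) {
  assert(add16_wrap(INT16_MAX, INT16_MAX) == -2);
  assert(add32_wide(INT16_MAX, INT16_MAX) == 65534);
  assert(add16_wrap(INT16_MIN, INT16_MIN) == 0);
  assert(add32_wide(INT16_MIN, INT16_MIN) == -65536);
}
```

Carrying intermediate values in wider lanes generally costs extra instructions, which is in line with the small slowdown reported above.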
    • Replace idct32x32_1024_add_ssse3 assembly with intrinsics · f62dcc9c
      Yi Luo authored
      - Encoding/decoding test with BQTerrace_1920x1080_60.y4m on an
        i7-6700: no obvious user-level speed regression.
      - Passed unit tests.
      
      Change-Id: I20688e0dd3731021ec8fb4404734336f1a426bfc
  15. 16 Feb, 2017 1 commit
    • Add idct32x32_135_add SSSE3 intrinsics · 72a43e23
      Yi Luo authored
      - Replace the corresponding assembly code.
      - No user-level speed degradation.
      - Unit tests passed.
      
      Change-Id: Idd0c5a4bad4976f1617c34100cb46e75e3b961e5
  16. 14 Feb, 2017 2 commits
  17. 08 Feb, 2017 1 commit
  18. 01 Feb, 2017 1 commit