- 01 Dec, 2017 1 commit
-
-
Johann authored
nasm should infer .text but does not for windows: https://bugzilla.nasm.us/show_bug.cgi?id=3392451 Change-Id: Ib195465e5f33405f5ff61c4cf88aa2a72640cacb
-
- 29 Nov, 2017 1 commit
-
-
Kyle Siefring authored
Change-Id: Ie60381a0c6ee01f828cd364a43f01517f4cb03e9
-
- 27 Nov, 2017 1 commit
-
-
Johann authored
Change-Id: I9f95f47bc7ecbb7980f21cbc3a91f699624141af
-
- 17 Nov, 2017 1 commit
-
-
Kyle Siefring authored
Change-Id: If8b91aaa883c01107f0ea3468139fa24cfb301d2
-
- 15 Nov, 2017 1 commit
-
-
Johann authored
Fixes a build issue when relocation is not allowed: relocation R_X86_64_32 against '.rodata' can not be used when making a shared object Change-Id: Ica3e90c926847bc384e818d7854f0030f4d69aa0
-
- 10 Nov, 2017 1 commit
-
-
Scott LaVarnway authored
SSE2 intrinsic vs AVX2 intrinsic speed gains:
blocksize 16: ~1.33
blocksize 64: ~1.51
blocksize 256: ~3.03
blocksize 1024: ~3.71
Change-Id: I79b28cba82d21f9dd765e79881aa16d24fd0cb58
-
- 09 Nov, 2017 1 commit
-
-
Scott LaVarnway authored
Change-Id: I7364a157de39eb7137b599808474b8d46d19d376
-
- 03 Nov, 2017 1 commit
-
-
Kyle Siefring authored
The added AVX-512 support requires the subset of AVX-512 added in Skylake-X. Change-Id: I39666b00d10bf96d06c709823663eb09b89265b7
-
- 26 Oct, 2017 1 commit
-
-
Scott LaVarnway authored
Eliminates the following instruction for the x86 (64 bit) intrinsic code: movslq %esi,%rax Change-Id: I8f5ebd40726f998708a668b0f52ea7a0576befae
-
- 24 Oct, 2017 1 commit
-
-
Kyle Siefring authored
Changed the intrinsics to perform summation similar to the way the assembly does. The new code diverges from the assembly by preferring unsaturated additions.
Results for Haswell:
SSSE3
Horiz/Vert  Size  Speedup
Horiz       x4    ~32%
Horiz       x8    ~6%
Vert        x8    ~4%
AVX2
Horiz/Vert  Size  Speedup
Horiz       x16   ~16%
Vert        x16   ~14%
BUG=webm:1471
Change-Id: I7ad98ea688c904b1ba324adf8eb977873c8b8668
-
- 23 Oct, 2017 1 commit
-
-
Scott LaVarnway authored
Use an intermediate buffer before storing to coeffs when highbitdepth is enabled. Change-Id: I101981a1995f1108ad107c55c37d6e09eadb404b
-
- 20 Oct, 2017 1 commit
-
-
Scott LaVarnway authored
~10% performance gain. Fixed the cosmetics noted in the previous commit. Change-Id: Iddf475f34d0d0a3e356b2143682aeabac459ed13
-
- 19 Oct, 2017 1 commit
-
-
Scott LaVarnway authored
This version is ~1.91x faster than the sse2 version. When highbitdepth is enabled, it is ~1.74x. Change-Id: I2b0e92ede9f55c6259ca07bf1f8c8a5d0d0955bd
-
- 17 Oct, 2017 1 commit
-
-
Kyle Siefring authored
Change-Id: I6539111dfb35a43028e9755785b2e9ea31854305
-
- 16 Oct, 2017 1 commit
-
-
Linfeng Zhang authored
Note that this change triggers a different C version on SSSE3 and generates different scaled output. It is 2x as fast as the version that calls vpx_scaled_2d_ssse3(). Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
-
- 10 Oct, 2017 1 commit
-
-
Linfeng Zhang authored
Change-Id: I51c190f0a88685867df36912522e67bdae58a673
-
- 09 Oct, 2017 1 commit
-
-
Kyle Siefring authored
Also adds vpx_convolve8_avg_horiz_avx2. Change-Id: I38783d972ac26bec77610e9e15a0a058ed498cbf
-
- 08 Oct, 2017 1 commit
-
-
Kyle Siefring authored
vpx_convolve8_avg works by first running a normal horizontal filter, then a vertical filter that averages at the end. The added vpx_convolve8_avg_avx2 calls pre-existing AVX2 code for the horizontal step. vpx_convolve8_avg_vert_avx2 is also added, but it only uses ssse3 code. Change-Id: If5160c0c8e778e10de61ee9bf42ee4be5975c983
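The two-step flow described in this commit message can be sketched in plain C. This is only an illustration under stated assumptions, not the libvpx code: every `sketch_` name is hypothetical, the filter is assumed to have 8 taps summing to 128, blocks are assumed to be at most 64x64, and the rounding is a guess at a typical scheme.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TAPS 8        /* assumed: 8-tap filter, per the "convolve8" name */
#define SKETCH_FILTER_BITS 7 /* assumed: taps sum to 1 << 7 = 128 */
#define SKETCH_MAX 64        /* assumed: maximum block width and height */

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t sketch_clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Apply one 8-tap filter to samples spaced `step` apart, then round to 8 bits. */
static uint8_t sketch_filter(const uint8_t *p, int step, const int16_t *taps) {
  int k, sum = 0;
  for (k = 0; k < SKETCH_TAPS; ++k) sum += p[k * step] * taps[k];
  sum = (sum + (1 << (SKETCH_FILTER_BITS - 1))) >> SKETCH_FILTER_BITS;
  return sketch_clip_pixel(sum);
}

/* Hypothetical stand-in for a convolve8_avg routine. `src` must be readable
 * 3 pixels before and 4 pixels past the w x h block in both directions. */
static void sketch_convolve8_avg(const uint8_t *src, int src_stride,
                                 uint8_t *dst, int dst_stride,
                                 const int16_t *taps_x, const int16_t *taps_y,
                                 int w, int h) {
  uint8_t temp[(SKETCH_MAX + SKETCH_TAPS - 1) * SKETCH_MAX];
  const int rows = h + SKETCH_TAPS - 1; /* extra rows feed the vertical taps */
  const uint8_t *top_left = src - 3 * src_stride - 3;
  int x, y;

  assert(w > 0 && w <= SKETCH_MAX && h > 0 && h <= SKETCH_MAX);

  /* Step 1: normal horizontal filter into an intermediate buffer. */
  for (y = 0; y < rows; ++y)
    for (x = 0; x < w; ++x)
      temp[y * w + x] = sketch_filter(top_left + y * src_stride + x, 1, taps_x);

  /* Step 2: vertical filter, averaging with the existing dst pixel at the end. */
  for (y = 0; y < h; ++y)
    for (x = 0; x < w; ++x) {
      const int filtered = sketch_filter(temp + y * w + x, w, taps_y);
      uint8_t *out = dst + y * dst_stride + x;
      *out = (uint8_t)((*out + filtered + 1) >> 1);
    }
}
```

With identity taps `{ 0, 0, 0, 128, 0, 0, 0, 0 }` in both directions, both filter passes return the source pixel unchanged, so each output reduces to `(dst + src + 1) >> 1`.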
-
- 04 Oct, 2017 1 commit
-
-
Linfeng Zhang authored
Change-Id: I882da3a04884d5fabd4cd591c28682cbb2d76aa5
-
- 03 Oct, 2017 4 commits
-
-
Scott LaVarnway authored
BUG=webm:1462,766721 Change-Id: Icfa536a8e38623636b96c396e3c94889bfde7a98
-
Linfeng Zhang authored
Change-Id: Id6a8c549709a3c516ed5d7b719b05117c5ef8bac
-
Linfeng Zhang authored
Add some load and store sse2 inline functions. Change-Id: Ib1e0650b5a3d8e2b3736ab7c7642d6e384354222
-
Linfeng Zhang authored
Change-Id: Ic369dd86b3b81686f68fbc13ad34ab8ea8846878
-
- 29 Sep, 2017 1 commit
-
-
Scott LaVarnway authored
C vs SSE2 speed gains:
_4x4 : ~1.81x
C vs SSSE3 speed gains:
_8x8 : ~1.96x
_16x16 : ~1.88x
_32x32 : ~2.02x
BUG=webm:1411
Change-Id: Iefaf8b39afbbfe34c1ad1d21e3a003b20f1f61e0
-
- 28 Sep, 2017 1 commit
-
-
Scott LaVarnway authored
C vs SSE2 speed gains:
_4x4 : ~2.04x
C vs SSSE3 speed gains:
_8x8 : ~2.82x
_16x16 : ~5.93x
_32x32 : ~2.79x
BUG=webm:1411
Change-Id: I31d949695991c067dac89d91e0bed3e666c94993
-
- 27 Sep, 2017 2 commits
-
-
Linfeng Zhang authored
Exposed by a fuzz test in high bitdepth. The bug was introduced in commit 64653fa1. BUG=webm:1466 Change-Id: Idd77d5c6a60efb9241471611ce1aba0646cb6ff5
-
Scott LaVarnway authored
C vs SSE2 speed gains:
_4x4 : ~1.95x
C vs SSSE3 speed gains:
_8x8 : ~3.30x
_16x16 : ~5.67x
_32x32 : ~3.87x
BUG=webm:1411
Change-Id: Ib483989b25614aa89b635e8c087d0879a5d71904
-
- 22 Sep, 2017 1 commit
-
-
Scott LaVarnway authored
C vs SSSE3 speed gains:
_4x4 : ~2.45x
_8x8 : ~10.61x
_16x16 : ~11.34x
_32x32 : ~6.36x
BUG=webm:1411
Change-Id: Ic91389a4f1a8ad093f498afe53765b897fb9be09
-
- 20 Sep, 2017 3 commits
-
-
Linfeng Zhang authored
BUG=webm:1450 Change-Id: If59743aafe99226e0ec67ab5d20678ce25f53ab8
-
Linfeng Zhang authored
BUG=webm:1450 Change-Id: Ib046fe28caec5b9ebdc9d0152df7c54ff4266858
-
Linfeng Zhang authored
The unnecessary upcast to (int) will be cleaned up later. BUG=webm:1450 Change-Id: Ia234575206d5a74540526924b06ed3939322d063
-
- 19 Sep, 2017 2 commits
-
-
Linfeng Zhang authored
Extract a couple of static functions into their caller functions. Change-Id: If8d8a0e217fba6b402d2a79ede13b5b444ff08a0
-
Scott LaVarnway authored
C vs SSE2 speed gains:
_4x4 : ~2.94x
C vs SSSE3 speed gains:
_8x8 : ~8.69x
_16x16 : ~6.32x
_32x32 : ~5.33x
BUG=webm:1411
Change-Id: I2c35b527eac2229f17aaa9d118fb601e7195efe4
-
- 12 Sep, 2017 1 commit
-
-
Johann authored
This reverts commit 8c42237b. Because ssse3 code is used for the reference, the qcoeff and dqcoeff reference buffers must be aligned.
Original change's description:
> quantize avx: copy 32x32 implementation
>
> Ensure avx and ssse3 stay in sync by testing them against each other.
>
> Change-Id: I699f3b48785c83260825402d7826231f475f697c
Change-Id: Ieeef11b9406964194028b0d81d84bcb63296ae06
-
- 11 Sep, 2017 1 commit
-
-
Scott LaVarnway authored
C vs SSE2 speed gains:
_4x4 : ~2.31x
C vs SSSE3 speed gains:
_8x8 : ~4.73x
_16x16 : ~10.88x
_32x32 : ~4.80x
BUG=webm:1411
Change-Id: I0bac29db261079181ddabc6814bd62c463109caf
-
- 07 Sep, 2017 1 commit
-
-
Linfeng Zhang authored
So that 4 to 1 frame scaling can call them. Change-Id: I9ec438aa63b923ba164ad3c59d7ecfa12789eab5
-
- 05 Sep, 2017 2 commits
-
-
Linfeng Zhang authored
so that the convolve functions are independent of table alignment. Change-Id: Ieab132a30d72c6e75bbe9473544fbe2cf51541ee
-
Scott LaVarnway authored
C vs SSE2 speed gains:
_4x4 : ~7.64x
_8x8 : ~16.60x
_16x16 : ~8.15x
_32x32 : ~5.05x
BUG=webm:1411
Change-Id: If165d419711cfda901bd428a05ca1560a009e62e
-
- 30 Aug, 2017 1 commit
-
-
Scott LaVarnway authored
C vs SSE2 speed gains:
_4x4 : ~6.49x
_8x8 : ~10.82x
_16x16 : ~7.61x
_32x32 : ~5.29x
BUG=webm:1411
Change-Id: Ibc30c50cb7139049bf05298010803499e6ef949b
-
- 29 Aug, 2017 1 commit
-
-
Scott LaVarnway authored
C vs SSE2 speed gains:
_4x4 : ~7.39x
_8x8 : ~11.36x
_16x16 : ~8.68x
_32x32 : ~4.33x
BUG=webm:1411
Change-Id: I7f1487cd1531d4e7f0fbb4596fed3bfb72a59d58
-