- 25 Jun, 2013 - 2 commits
-
-
Jingning Han authored
This reduces 16x16 2D-DCT runtime from 865 cycles to 837 cycles. Change-Id: I137758b81cd127b936175284310e81378db64552
-
Jingning Han authored
This commit uses the butterfly structure to enable an sse2 implementation of the 8x8 ADST/DCT hybrid transform. The runtime of the hybrid transform module goes down from 1170 cycles to 245 cycles. Overall speed-up around 1.5%. Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f
-
- 21 Jun, 2013 - 5 commits
- 20 Jun, 2013 - 26 commits
-
-
Jim Bankoski authored
-
Jim Bankoski authored
Change-Id: Idfd69e66e8982275eb00d8007a55efd1a4f86a98
-
James Zern authored
-
Frank Galligan authored
- size_t vs int. Change-Id: Ib47ebd932a4b69db9f52a43000bb69d0a96b9134
-
James Zern authored
This reverts commit 90a9900a. Seems to break the Mac build: src/include/gtest/internal/gtest-port.h:1208:: pthread_mutex_lock(&mutex_)failed with error 22 Abort trap: 6 Change-Id: Icbe31161d7c27f1b0a28d33409e7712430bbf0ae
-
Jingning Han authored
-
Johann authored
-
Dmitry Kovalev authored
-
Dmitry Kovalev authored
-
Deb Mukherjee authored
Improves the rd modeling functions and implements them using interpolation from a table, which is a little faster. Also uses sse rather than var as input to the modeling function: since no dc prediction is used, sse works a little better. derfraw300: +0.05% Speedup: ~1% Change-Id: I151353c6451e0e8fe3ae18ab9842f8f67e5151ff
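A minimal sketch of table-based interpolation of the kind this message describes, not the code from the commit. The table values, the fixed-point scale X_SHIFT, and the names kRateTab and model_rate_interp are all hypothetical; how the lookup position is derived from a block's sse is left to the caller.

```c
/* Hypothetical samples of a modeling curve at x = 0, 1, 2, ... (integer
 * part of x). The values are made up for illustration only. */
#define X_SHIFT 4 /* x carries 4 fractional bits */
static const int kRateTab[] = { 4096, 2048, 1024, 512, 256, 128, 64, 32, 0 };
#define TAB_LAST ((int)(sizeof(kRateTab) / sizeof(kRateTab[0])) - 1)

/* Linearly interpolates the table at fixed-point position x.
 * Positions outside the table are clamped to the first or last entry. */
int model_rate_interp(int x) {
  int idx, frac;
  const int one = 1 << X_SHIFT;

  if (x <= 0) return kRateTab[0];
  idx = x >> X_SHIFT;
  if (idx >= TAB_LAST) return kRateTab[TAB_LAST];
  frac = x & (one - 1);

  /* Weighted average of the two neighbouring entries; all terms are
   * non-negative, so the right shift is well defined. */
  return (kRateTab[idx] * (one - frac) + kRateTab[idx + 1] * frac) >> X_SHIFT;
}
```

With these made-up entries, a position halfway between the first two samples (x = 8) yields the midpoint of 4096 and 2048. The lookup replaces a per-call evaluation of the curve with two loads and a weighted sum, which is the kind of saving the message attributes to the table.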
-
Johann authored
dboolhuff.c(50) : warning C4267: 'initializing' : conversion from 'size_t' to 'int' Change-Id: I6b85759efb2fa19f362f406623d8a7583a55c036
-
Jim Bankoski authored
Adds a new speed feature to force partitioning to be greater than or less than a certain size. Change-Id: I8c048eeeef93700ae822eccf98f8751a45b2e7d0
-
Jim Bankoski authored
This feature lets you set a partitioning size to be used by the entire frame. Change-Id: I208a4c8c701375cbb054418266f677768b6f8f06
-
Jim Bankoski authored
This uses variance to decide how to split partitions. Variance is calculated using the nearest mv, always from the last ref frame. Change-Id: Idd015b4a9aa3bc82591759eac239680c07496896
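A rough sketch of a variance-driven split decision, under stated assumptions rather than the commit's actual logic. The function names, the threshold, and the minimum block size are hypothetical; the caller is assumed to pass a prediction block already fetched from the last reference frame at the nearest mv.

```c
#include <stdint.h>

/* Variance (scaled by the pixel count) of the difference between a source
 * block and its prediction, computed as sse - sum^2 / n. */
uint32_t block_diff_variance(const uint8_t *src, int src_stride,
                             const uint8_t *pred, int pred_stride,
                             int size) {
  int64_t sum = 0, sse = 0;
  int r, c;
  for (r = 0; r < size; ++r) {
    for (c = 0; c < size; ++c) {
      const int d = src[r * src_stride + c] - pred[r * pred_stride + c];
      sum += d;
      sse += d * d;
    }
  }
  return (uint32_t)(sse - (sum * sum) / ((int64_t)size * size));
}

/* Hypothetical decision rule: split when the residual variance exceeds a
 * threshold and the block is still larger than the minimum size. */
int should_split(uint32_t variance, uint32_t threshold, int size,
                 int min_size) {
  return size > min_size && variance > threshold;
}
```

A block whose residual is a constant offset has zero variance and stays whole, while a block with a strongly varying residual is divided until the minimum size is reached.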
-
Jim Bankoski authored
Change-Id: Ie24489a4d39f3e53e816eeebf75a1c9c7d94515a
-
Jim Bankoski authored
Change-Id: Ideee45cad8b38087c509cd404484728e85d0c427
-
Jim Bankoski authored
This uses the speed feature functionality for code. Change-Id: I9cd16c0c5f98520ae27ebba81aa2c178546587f8
-
Jim Bankoski authored
Force the slow partitioning path for keyframes, altref and overlays. Change-Id: I1a286361bf74083e71973575a7296be46eb98742
-
Ronald S. Bultje authored
Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 -> 3min58). Specific changes to timings for each function compared to original assembly-optimized versions (or just new version timings if no previous assembly-optimized version was available):
sse2 4x4: 99 -> 82 cycles
sse2 4x8: 128 cycles
sse2 8x4: 121 cycles
sse2 8x8: 149 -> 129 cycles
sse2 8x16: 235 -> 245 cycles (?)
sse2 16x8: 269 -> 203 cycles
sse2 16x16: 441 -> 349 cycles
sse2 16x32: 641 cycles
sse2 32x16: 643 cycles
sse2 32x32: 1733 -> 1154 cycles
sse2 32x64: 2247 cycles
sse2 64x32: 2323 cycles
sse2 64x64: 6984 -> 4442 cycles
ssse3 4x4: 100 cycles (?)
ssse3 4x8: 103 cycles
ssse3 8x4: 71 cycles
ssse3 8x8: 147 cycles
ssse3 8x16: 158 cycles
ssse3 16x8: 188 -> 162 cycles
ssse3 16x16: 316 -> 273 cycles
ssse3 16x32: 535 cycles
ssse3 32x16: 564 cycles
ssse3 32x32: 973 cycles
ssse3 32x64: 1930 cycles
ssse3 64x32: 1922 cycles
ssse3 64x64: 3760 cycles
Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d
-
Jim Bankoski authored
need to rework these Change-Id: I17dc2c88d2faadd2f8fb117c52c25f04ea2e9856
-
Jim Bankoski authored
The new print out includes skips and has prefixed sections so you can grep to find things like transforms chosen on each frame. Change-Id: I195043424647d9514cfc3ff6720a5b20d010fa1b
-
Jim Bankoski authored
-
Jim Bankoski authored
Change-Id: I26e80ede80cb4389378a95afa95d229092a9859a
-
Jingning Han authored
Enable sign bias check and round-trip error unit tests for 4x4 hybrid transform modules. Change-Id: Icd3d839f098d4b92b00ff76eac146765b039d0d3
-
John Koleszar authored
-
Yaowu Xu authored
Since intra block decoding is handled by decode_sb_intra() separately. Change-Id: I42d757884714084c92fc23ec5d35d4dc946f4b15
-
- 19 Jun, 2013 - 6 commits
-
-
Dmitry Kovalev authored
Change-Id: Iab96e6a50aec543c63e15cd134f9d5f01ca7ceff
-
James Zern authored
Currently threading is internal to libvpx, so thread safety is unneeded in libgtest; Visual Studio builds already operate this way, as they do not have pthread.h available by default. This removes the unconditional link to libpthread, using $(extralibs) instead should libvpx require it. Change-Id: Ieae1d693406653a54b54fba818c598836797d33b
-
Yunqing Wang authored
-
Yunqing Wang authored
Optimized the quantization function by making it a two-pass process. The first pass quickly checks the transform coefficients against the base ZBIN and keeps only the coefficients large enough to need quantization. A skip check is added: if all coefficients are within the base ZBIN, no quantization is needed. The second pass is the actual quantization pass, which only processes the coefficient subset determined in the first pass. This reduces the computation. Furthermore, an alternative method is used for large transform sizes, which often have sparse nonzero quantized coefficients. Overall, the encoder speedup is about 4%. The quantization function itself gets 20% faster. Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22
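The two-pass idea can be sketched as follows. This is an illustration under simplifying assumptions, not the libvpx routine: quantize_two_pass, idx_scratch, and the single zbin/step pair are hypothetical, and plain integer division stands in for the fixed-point multiply a real encoder would use.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical two-pass quantizer sketch.
 * idx_scratch must have room for n entries.
 * Returns the number of nonzero quantized coefficients. */
int quantize_two_pass(const int16_t *coeff, int n, int zbin, int step,
                      int16_t *qcoeff, int *idx_scratch) {
  int i, kept = 0, nonzero = 0;

  /* Pass 1: cheap screen against the base zero bin. Record only the
   * indices whose magnitude reaches the bin; everything else stays zero. */
  for (i = 0; i < n; ++i) {
    qcoeff[i] = 0;
    if (abs(coeff[i]) >= zbin) idx_scratch[kept++] = i;
  }

  /* Skip check: every coefficient fell inside the zero bin. */
  if (kept == 0) return 0;

  /* Pass 2: quantize only the surviving subset. */
  for (i = 0; i < kept; ++i) {
    const int k = idx_scratch[i];
    const int mag = (abs(coeff[k]) + step / 2) / step; /* round to nearest */
    qcoeff[k] = (int16_t)(coeff[k] < 0 ? -mag : mag);
    if (mag) ++nonzero;
  }
  return nonzero;
}
```

The saving comes from the first loop being a bare compare per coefficient, so blocks that are mostly or entirely inside the zero bin never reach the more expensive rounding and division step.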
-
Yaowu Xu authored
Change-Id: Ic924f07c6ab0c929c6cdf11880d3c625806e272c
-
Dmitry Kovalev authored
Change-Id: I183a38997a9d01e4a1b869e92509f6915216fa09
-
- 18 Jun, 2013 - 1 commit
-
-
John Koleszar authored
-