- 03 Jul, 2013 - 1 commit
-
-
Jingning Han authored
These serve as building blocks for SSE2 8x8 and 16x16 ADST/DCT hybrid transform coding. Change-Id: I4089a754c66e0c986f67d9b8ec4dfb9627ad430d
-
- 29 Jun, 2013 - 1 commit
-
-
Christian Duvivier authored
43,000 -> 5,750 cycles, about 7.5x faster. Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0
-
- 25 Jun, 2013 - 1 commit
-
-
Jingning Han authored
This commit enables 8x8 DCT and hybrid transform unit tests. It also tunes the forward hybrid transform rounding opertions for more precise round-trip performance. Change-Id: If05c1ce59d75d641b9c6c91527d02d3a6ef498c3
-
- 18 Jun, 2013 - 1 commit
-
-
Jingning Han authored
This commit makes use of dual fdct32x32 versions for rate-distortion optimization loop and encoding process, respectively. The one for rd loop requires only 16 bits precision for intermediate steps. The original fdct32x32 that allows higher intermediate precision (18 bits) was retained for the encoding process only. This allows speed-up for fdct32x32 in the rd loop. No performance loss observed. Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3
-
- 30 May, 2013 - 1 commit
-
-
Yaowu Xu authored
The commit changed to use a new variant of Walsh-Hadamard Transform by Tim Terriberry. This new variant has the best compression among a number of variants that developed by Tim. Change-Id: Icb3a88515463cfc644b17ca046fcd139db2557e9
-
- 27 May, 2013 - 1 commit
-
-
Timothy B. Terriberry authored
Saves 1 add, 3 shifts (and a shift bias) per 1-D transform. Change-Id: I1104bb1679fe342b2f9677df8a9cdc0cb9699e7d
-
- 16 Apr, 2013 - 2 commits
-
-
Christian Duvivier authored
Scalar path is about 1.3x faster (2.1% overall encoder speedup). SSE2 path is about 5.0x faster (8.4% overall encoder speedup). Change-Id: I360d167b5ad6f387bba00406129323e2fe6e7dda
-
Christian Duvivier authored
Scalar path is about 1.3x faster (2.1% overall encoder speedup). SSE2 path is about 5.0x faster (8.4% overall encoder speedup). Change-Id: I360d167b5ad6f387bba00406129323e2fe6e7dda
-
- 15 Mar, 2013 - 1 commit
-
-
Christian Duvivier authored
Scalar path is about 1.5x faster (3.1% overall encoder speedup). SSE2 path is about 7.2x faster (7.8% overall encoder speedup). Change-Id: I06da5ad0cdae2488431eabf002b0d898d66d8289
-
- 13 Mar, 2013 - 1 commit
-
-
Yaowu Xu authored
The commit changed the name of files and function to remove obselete reference to LLM and x8. Change-Id: I973b20fc1a55149ed68b5408b3874768e6f88516
-
- 28 Feb, 2013 - 1 commit
-
-
Christian Duvivier authored
Scalar path is about 1.4x faster (4% overall encoder speedup). SSE2 path is about 7x faster (13% overall encoder speedup). Change-Id: I7e85d8225a914a74c61ea370210414696560094d
-
- 27 Feb, 2013 - 1 commit
-
-
Dmitry Kovalev authored
Fixing code style, using array lookup instead of switch statements for forward hybrid transforms (in the same way as for their inverses). Consistent usage of ROUND_POWER_OF_TWO macro in appropriate places. Change-Id: I0d3822ae11f928905fdbfbe4158f91d97c71015f
-
- 26 Feb, 2013 - 2 commits
-
-
Yaowu Xu authored
The commit improves the 32x32 forward dct implementation: 1. change to use same constants and rounding as other forward dcts 2. select rounding to specifically minimize the roundtrip error, which improved average 19/block to .77/block using 100000 random input. Test showed a small but consistent gain on all test sets, about .15% Change-Id: If0afd6a71880a522f60c1c234be0462092c2eb53
-
Dmitry Kovalev authored
Pitch now means the number of elements, not the number of bytes. Change-Id: Idb9f2f012e39b09d596a3cc1802305a80b7c13af
-
- 25 Feb, 2013 - 3 commits
-
-
Jingning Han authored
Increase the first stage dynamic range by 4 times, and reduce it back with proper rounding before applying the second stage. Hence it still fits in the given dynamic range and slightly improves the key frame coding performance. Change-Id: Ia4c5907446f20a95dc3de079c314b3ad1221d8aa
-
Jingning Han authored
Rebased. Remove the old matrix multiplication transform computation. The 16x16 ADST/DCT can be switched on/off and evaluated by setting ACTIVE_HT16 300/0 in vp9/common/vp9_blockd.h. Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f
-
Yaowu Xu authored
This commit added pre/post scaling for first half of fDCT16x16 to reduce error, by simulation of 100,000 blocks for random inputs, the average sse reduced from 2.1/block to 0.0498/block. also enabled tests for 16x16 fDCT and iDCT Change-Id: Id2a95f0464c6dd4118797d456237ae90274c0f02
-
- 23 Feb, 2013 - 1 commit
-
-
Yaowu Xu authored
The commit added a final rounding choice for 8x8 forward dct to get rid of a sign bias at DC position and improve the accuracry in term of round trip error for 8x8 fDCT/iDCT. This commit also enabled forward 8x8 dct test. Change-Id: Ib67f99b0a24d513e230c7812bc04569d472fdc50
-
- 22 Feb, 2013 - 1 commit
-
-
Jingning Han authored
This patch includes 4x4, 8x8, and 16x16 forward butterfly ADST/DCT hybrid transform. The kernel of 4x4 ADST is sin((2k+1)*(n+1)/(2N+1)). The kernel of 8x8/16x16 ADST is of the form sin((2k+1)*(2n+1)/4N). Change-Id: I8f1ab3843ce32eb287ab766f92e0611e1c5cb4c1
-
- 20 Feb, 2013 - 1 commit
-
-
Yaowu Xu authored
Change-Id: I7b7b8d4fda3a23699e0c920d727f8c15d37d43aa
-
- 15 Feb, 2013 - 1 commit
-
-
Ronald S. Bultje authored
Change-Id: I4f46d142c2a8d1e8a880cfac63702dcbfb999b78
-
- 14 Feb, 2013 - 1 commit
-
-
Yunqing Wang authored
Used same algorithm as others. Change-Id: Ifdac560762aec9735cb4bb6f1dbf549e415c38a0
-
- 13 Feb, 2013 - 1 commit
-
-
Paul Wilkins authored
Removal of experiment to simplify code base for other changes. Change-Id: If0a33952504558511926ad212bc311fc2bffb19a
-
- 12 Feb, 2013 - 1 commit
-
-
Yunqing Wang authored
Use consistent algorithm. Change-Id: Ib8484821ebc454b9d3380a3d6571798decd037f3
-
- 08 Feb, 2013 - 1 commit
-
-
Yunqing Wang authored
Test on derf set showed 0.047% overall psnr change. Change-Id: Id16c276c251a3943850ac9b95e9b09a56cf42b19
-
- 07 Feb, 2013 - 3 commits
-
-
Yaowu Xu authored
also removed some un-unsed functions. Change-Id: Ie363bcc8d94441d054137d2ef7c4fe59f56027e5
-
Jingning Han authored
Refactor the 8x8 inverse hybrid transform. It is now consistent with the new inverse DCT. Overall performance loss (due to the use of this variant ADST, and the rounding errors in the butterfly implementation) for std-hd is -0.02. Fixed BUILD warning. Devise a variant of the original ADST, which allows butterfly computation structure. This new transform has kernel of the form: sin((2k+1)*(2n+1) / (4N)). One of its butterfly structures using floating-point multiplications was reported in Z. Wang, "Fast algorithms for the discrete W transform and for the discrete Fourier transform", IEEE Trans. on ASSP, 1984. This patch includes the butterfly implementation of the inverse ADST/DCT hybrid transform of dimension 8x8. Change-Id: I3533cb715f749343a80b9087ce34b3e776d1581d
-
Ronald S. Bultje authored
Change-Id: I8508f1a3d3430f998bb9295f849e88e626a52a24
-
- 05 Feb, 2013 - 1 commit
-
-
Yaowu Xu authored
This commit changes the 4x4 iDCT to use same algorithm & constants as other iDCTs. The 4x4 fDCT is also changed to be based on the new iDCT. Change-Id: Ib1a902693228af903862e1f5a08078c36f2089b0
-
- 31 Jan, 2013 - 1 commit
-
-
Yaowu Xu authored
The commit fixes a minor error in 16 point fdct where in a rotation can produce result of -1 instead of 0. Change-Id: I45aac4a52bcd06225c6d04e643547a13e1c1aade
-
- 14 Jan, 2013 - 2 commits
-
-
Yaowu Xu authored
The warnings caused verify failure with gerrit for several commits Change-Id: I030df8638bd69b8783a3ac58e720ff9f0bfd546c
-
John Koleszar authored
Previous commit does not build cleanly on Jenkins with the DWT/DCT hybrid experiment enabled (--enable-dwtdcthybrid). Change-Id: Ia67e8f59d17ef2d5200ec6b90dfe6711ed6835a5
-
- 13 Jan, 2013 - 1 commit
-
-
Deb Mukherjee authored
Fixes some scaling issues. Adds an option to only compute the dct on the low-low subband for 32x32 and 64x64 blocks using only a single 16x16 dct after 1 and 2 wavelet decomposition levels respectively. Also adds an option to use a 8x8 dct as building block. Currenlty with the 2/6 filter and with a single 16x16 dct on the low low band, the reuslts compared to full 32x32 dct is as follows: derf: -0.15% yt: -0.29% std-hd: -0.18% hd: -0.6% These are my current recommended settings, since the 2/6 filter is very simple. Results with 8x8 dct are about 0.3% worse. Change-Id: I00100cdc96e32deced591985785ef0d06f325e44
-
- 10 Jan, 2013 - 1 commit
-
-
Ronald S. Bultje authored
Change-Id: I615651e4c7b09e576a341ad425cf80c393637833
-
- 08 Jan, 2013 - 1 commit
-
-
Deb Mukherjee authored
This is to add to the 64x64 transform experiment as an alternative to a 64x64 DCT. Two levels of wavelet decomposition is used on a 64x64 block, followed by 16x16 DCT on the four lowest subbands. The highest three subbands are left untransformed after the first level DWT. Change-Id: I3d48d5800468d655191933894df6b46e15adca56
-
- 26 Dec, 2012 - 1 commit
-
-
John Koleszar authored
Various fixups to resolve issues when building vp9-preview under the more stringent checks placed on the experimental branch. Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07
-
- 13 Dec, 2012 - 1 commit
-
-
Deb Mukherjee authored
Modifies the scanning pattern and uses a floating point 16x16 dct implementation for now to handle scaling better. Also experiments are in progress with 2/6 and 9/7 wavelets. Results have improved to within ~0.25% of 32x32 dct for std-hd and about 0.03% for derf. This difference can probably be bridged by re-optimizing the entropy stats for these transforms. Currently the stats used are common between 32x32 dct and dwt/dct. Experiments are in progress with various scan pattern - wavelet combinations. Ideally the subbands should be tokenized separately, and an experiment will be condcuted next on that. Change-Id: Ia9cbfc2d63cb7a47e562b2cd9341caf962bcc110
-
- 07 Dec, 2012 - 1 commit
-
-
Ronald S. Bultje authored
This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds code all over the place to wrap that in the bitstream/encoder/decoder/RD. Some implementation notes (these probably need careful review): - token range is extended by 1 bit, since the value range out of this transform is [-16384,16383]. - the coefficients coming out of the FDCT are manually scaled back by 1 bit, or else they won't fit in int16_t (they are 17 bits). Because of this, the RD error scoring does not right-shift the MSE score by two (unlike for 4x4/8x8/16x16). - to compensate for this loss in precision, the quantizer is halved also. This is currently a little hacky. - FDCT and IDCT is double-only right now. Needs a fixed-point impl. - There are no default probabilities for the 32x32 transform yet; I'm simply using the 16x16 luma ones. A future commit will add newly generated probabilities for all transforms. - No ADST version. I don't think we'll add one for this level; if an ADST is desired, transform-size selection can scale back to 16x16 or lower, and use an ADST at that level. Additional notes specific to Debargha's DWT/DCT hybrid: - coefficient scale is different for the top/left 16x16 (DCT-over-DWT) block than for the rest (DWT pixel differences) of the block. Therefore, RD error scoring isn't easily scalable between coefficient and pixel domain. Thus, unfortunately, we need to compute the RD distortion in the pixel domain until we figure out how to scale these appropriately. Change-Id: I00386f20f35d7fabb19aba94c8162f8aee64ef2b
-
- 27 Nov, 2012 - 1 commit
-
-
John Koleszar authored
Support for gyp which doesn't support multiple objects in the same static library having the same basename. Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
-
- 25 Nov, 2012 - 1 commit
-
-
Jim Bankoski authored
More cleanup to do after this, but this is a good chunk of removing rtcd. Change-Id: I551db75e341a0a85c3ad650df1e9a60dc305681a
-