- 16 Jul, 2013 - 1 commit
-
-
Ronald S. Bultje authored
Cycle times: 4x4: 151 to 131 cycles (15% faster) 8x8: 334 to 306 cycles (9% faster) 16x16: 1401 to 1368 cycles (2.5% faster) 32x32: 7403 to 7367 cycles (0.5% faster) Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup. Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f
-
- 11 Jul, 2013 - 1 commit
-
-
Dmitry Kovalev authored
Adding segmentation struct to vp9_seg_common.h. Struct members are from macroblockd and VP9Common structs. Moving segmentation related constants and enums to vp9_seg_common.h. Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03
-
- 01 Jul, 2013 - 2 commits
-
-
Ronald S. Bultje authored
Encode time of bus (speed 0) 50 frames @ 1500kbps goes from 2min14.4 to 2min10.1, i.e. a 2.3% overall speed increase. Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87
-
Ronald S. Bultje authored
Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is x86-64 only, it needs some minor modifications to be 32bit compatible, because it uses 15 xmm registers, whereas 32bit only has 8. Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
-
- 28 Jun, 2013 - 1 commit
-
-
Ronald S. Bultje authored
This commit replaces zrun_zbin_boost, a method of biasing non-zero coefficients following runs of zero-coefficients to be rounded towards zero, with an explicit skip-block choice in the RD loop. The logic is basically that if individual coefficients should be rounded towards zero (from a RD point of view), the trellis/optimize loop should take care of it. If whole blocks should be zero (from a RD point of view), a single RD check is much more efficient than a complete serialization of the quantization loop. Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim. SIMD for quantize will follow in a separate patch. Results for other test sets pending. Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
-
- 27 Jun, 2013 - 1 commit
-
-
Ronald S. Bultje authored
Encoding time of first 50 frames of bus @ 1500kbps (speed 0) goes from 3min15.0 to 3min10.9, i.e. 2.1% faster overall. Change-Id: If592ee99be09bcd34a7c8498347f44e7305e982c
-
- 19 Jun, 2013 - 1 commit
-
-
Yunqing Wang authored
Optimized the quantization function by making it a two-pass process. The first pass does a quick checking of the transform coefficients against the base ZBIN, and only keep the good enough set of coefficients for quantization. A skipping check is added. If all coefficients are within the base ZBIN, no quantization is needed. The second pass is the actual quantization pass, which only processes the coefficient subset determined in first pass. This reduces the computation. Furthermore, an alternitive method is used for large transform size, which often has sparse nonzero quantized coefficients. Overall, the encoder speedup is about 4%. The quantization function itself gets 20% faster. Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22
-
- 23 May, 2013 - 1 commit
-
-
Paul Wilkins authored
Removal from under configure flag. A bit renaming Change-Id: I2213229dfe852001dfec16b149f47c52ce88f3aa
-
- 17 May, 2013 - 1 commit
-
-
John Koleszar authored
This is a mostly-working implementation of an extra channel in the bitstream. Configure with --enable-alpha to test. Notable TODOs: - Add extra channel to all mismatch tests, PSNR, SSIM, etc - Configurable subsampling - Variable number of planes (currently always uses all 4) - Loop filtering - Per-plane lossless quantizer - ARNR support This implementation just uses the same contents as the Y channel for the A channel, due to lack of content and general pain in playing back 4 channel content. A later patch will use the actual alpha channel passed in from outside the codec. Change-Id: Ibf81f023b1c570bd84b3064e9b4b8ae52e087592
-
- 07 May, 2013 - 3 commits
-
-
Dmitry Kovalev authored
Change-Id: If57e360c187a475fc90edb8c7170f498efcb31a5
-
Paul Wilkins authored
Skip Q values between the q.0 mode and a real q of 2.0 as these are not valuable from an RD perspective. Change-Id: I110c4858c57f97315953f4d88a2596d4764360df
-
Jingning Han authored
Pull sb8x8 out of experimental list. verified via borg run tests. Fixed unit test failures. Change-Id: I12a4bbd17395930580c048ab68becad1ffe46e76
-
- 03 May, 2013 - 1 commit
-
-
John Koleszar authored
This allows removing a large number of transform size specific functions, as well as supporting 444/alpha by routing all code through the subsampling-aware path. Change-Id: Ieb085cebe9f37f24fc24de179898b22abfda08a4
-
- 02 May, 2013 - 1 commit
-
-
John Koleszar authored
Creates a common encode (subtract, transform, quantize, optimize, inverse transform, reconstruct) function for all sb sizes, including the old 16x16 path. Change-Id: I964dff1ea7a0a5c378046a069ad83495f54df007
-
- 01 May, 2013 - 1 commit
-
-
Scott LaVarnway authored
Currently, only two values are used. Removed the unused values. Change-Id: Idc5b8be354d84ffc68df39ea3e45f9f50d977b35
-
- 30 Apr, 2013 - 2 commits
-
-
Ronald S. Bultje authored
Work-in-progress, not yet ready for review. TODO items: - bitstream writing (encoder) and reading (decoder) - decoder reconstruction Change-Id: I5afb7284e7e0480847b47cd0097cb469433c9081
-
Dmitry Kovalev authored
Moving common code from encoder and decoder to vp9_get_qindex function. Also moving quant-related constants from vp9_onyxc_int.h to vp9_quant_common.h. Change-Id: I70c5bfbaa1c8bf00fde0bfc459d077f88b6d46c8
-
- 26 Apr, 2013 - 1 commit
-
-
Dmitry Kovalev authored
Change-Id: I3a6d601e90e8740b9c26dd0afbfe9d467b75d367
-
- 25 Apr, 2013 - 3 commits
-
-
John Koleszar authored
There were 4 variants of the quantize loop in vp9_quantize.c, now there is 1. Change-Id: Ic853393411214b32d46a6ba53769413bd14e1cac
-
Ronald S. Bultje authored
Basic assumption: when talking about transform units, use b_; when talking about macroblock indices, use mb_. Change-Id: Ifd163f595d4924ff892de4eb0401ccd56dc81884
-
John Koleszar authored
This data can vary per-plane, but not per-block. Change-Id: I1971b0b2c2e697d2118e38b54ef446e52f63c65a
-
- 24 Apr, 2013 - 2 commits
-
-
John Koleszar authored
This data is fixed at the MB level, so move it to the common part of MACROBLOCK. Change-Id: Idd8c87118e501cdf0a202bd84c28b502a8234edf
-
John Koleszar authored
Quantizers can vary per plane, but not per block. Move these values to the per-plane part of MACROBLOCK. Change-Id: I320a55e38b7b28b29aec751a4aca5ccd0c9b9326
-
- 23 Apr, 2013 - 1 commit
-
-
John Koleszar authored
This commit moves the coeff storage from the MACROBLOCK struct to its per-plane part. The next commit will remove the coeff member from the BLOCK structure so that it is consistently accessed per-plane. Also refactors vp9_sb_block_error_c and vp9_sb_uv_block_error_c to be variable subsampling aware. Change-Id: I18c30f87f27c3a012119b6c1970d5fa499804455
-
- 22 Apr, 2013 - 2 commits
-
-
Dmitry Kovalev authored
Change-Id: Id4306ef6d65d4a3984aed50b775bdf48d4f6c438
-
Deb Mukherjee authored
This patch does not seem to give any benefits. Change-Id: I9d2b4091d6af3dfc0875f24db86c01e2de57f8db
-
- 16 Apr, 2013 - 1 commit
-
-
Dmitry Kovalev authored
New names are y_dc_delta_q, uv_dc_delta_q, uv_ac_delta_q. Change-Id: I4acae1fc23a4697ce2c5a5becb8dc28ef0a4b552
-
- 15 Apr, 2013 - 1 commit
-
-
Ronald S. Bultje authored
Change-Id: I697514efd6024e1b4153bbde58ae5e323b030981
-
- 11 Apr, 2013 - 1 commit
-
-
Ronald S. Bultje authored
More specifically, remove vp9_quantize_mb*, vp9_optimize_mb*, vp9_inverse_transform_mb* and vp9_transform_mb*. Instead, use the generic _sb* functions that take a size argument, and call them with BLOCK_SIZE_MB16X16. Change-Id: I33024afea95d3a23ffbc1df7da426e4645110f29
-
- 10 Apr, 2013 - 1 commit
-
-
Ronald S. Bultje authored
Merge sb32x32 and sb64x64 functions; allow for rectangular sizes. Code gives identical encoder results before and after. There are a few macros for rectangular block sizes under the sbsegment experiment; this experiment is not yet functional and should not yet be used. Change-Id: I71f93b5d2a1596e99a6f01f29c3f0a456694d728
-
- 09 Apr, 2013 - 1 commit
-
-
Dmitry Kovalev authored
Renaming Y1dequant to y_dequant, UVdequant to uv_dequant, QIndex to qindex. Change-Id: I1c356e5f886deb3f8807dc212de9799b55b09d58
-
- 05 Apr, 2013 - 1 commit
-
-
John Koleszar authored
Continue migrating data from BLOCKD/MACROBLOCKD to the per-plane structures. Change-Id: Ibbfa68d6da438d32dcbe8df68245ee28b0a2fa2c
-
- 04 Apr, 2013 - 1 commit
-
-
John Koleszar authored
Start grouping data per-plane, as part of refactoring to support additional planes, and chroma planes with other-than 4:2:0 subsampling. Change-Id: Idb76a0e23ab239180c818025bae1f36f1608bb23
-
- 26 Mar, 2013 - 2 commits
-
-
Ronald S. Bultje authored
These are mostly just for experimental purposes. I saw small gains (in the 0.1% range) when playing with this on derf. Change-Id: Ib21eed477bbb46bddcd73b21c5c708a5b46abedc
-
Deb Mukherjee authored
Replaces the default tables for single coefficient magnitudes with those obtained from an appropriate distribution. The EOB node is left unchanged. The model is represeted as a 256-size codebook where the index corresponds to the probability of the Zero or the One node. Two variations are implemented corresponding to whether the Zero node or the One-node is used as the peg. The main advantage is that the default prob tables will become considerably smaller and manageable. Besides there is substantially less risk of over-fitting for a training set. Various distributions are tried and the one that gives the best results is the family of Generalized Gaussian distributions with shape parameter 0.75. The results are within about 0.2% of fully trained tables for the Zero peg variant, and within 0.1% of the One peg variant. The forward updates are optionally (controlled by a macro) model-based, i.e. restricted to only convey probabilities from the codebook. Backward updates can also be optionally (controlled by another macro) model-based, but is turned off by default. Currently model-based forward updates work about the same as unconstrained updates, but there is a drop in performance with backward-updates being model based. The model based approach also allows the probabilities for the key frames to be adjusted from the defaults based on the base_qindex of the frame. Currently the adjustment function is a placeholder that adjusts the prob of EOB and Zero node from the nominal one at higher quality (lower qindex) or lower quality (higher qindex) ends of the range. The rest of the probabilities are then derived based on the model from the adjusted prob of zero. Change-Id: Iae050f3cbcc6d8b3f204e8dc395ae47b3b2192c9
-
- 07 Mar, 2013 - 3 commits
-
-
Dmitry Kovalev authored
Change-Id: I44660975e9985310d8c654c158ee7a61291b5a08
-
Ronald S. Bultje authored
This also changes the RD search to take account of the correct block index when searching (this is required for ADST positioning to work correctly in combination with tx_select). Change-Id: Ie50d05b3a024a64ecd0b376887aa38ac5f7b6af6
-
Deb Mukherjee authored
This patch revamps the entropy coding of coefficients to code first a non-zero count per coded block and correspondingly remove the EOB token from the token set. STATUS: Main encode/decode code achieving encode/decode sync - done. Forward and backward probability updates to the nzcs - done. Rd costing updates for nzcs - done. Note: The dynamic progrmaming apporach used in trellis quantization is not exactly compatible with nzcs. A suboptimal approach has been used instead where branch costs are updated to account for changes in the nzcs. TODO: Training the default probs/counts for nzcs Change-Id: I951bc1e22f47885077a7453a09b0493daa77883d
-
- 05 Mar, 2013 - 1 commit
-
-
Ronald S. Bultje authored
Split macroblock and superblock tokenization and detokenization functions and coefficient-related data structs so that the bitstream layout and related code of superblock coefficients looks less like it's a hack to fit macroblocks in superblocks. In addition, unify chroma transform size selection from luma transform size (i.e. always use the same size, as long as it fits the predictor); in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma transform will now use the 16x16 (instead of the 8x8) chroma transform, and 64x64 superblocks using the 32x32 luma transform will now use the 32x32 (instead of the 16x16) chroma transform. Lastly, add a trellis optimize function for 32x32 transform blocks. HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There's a few negative points here and there that I might want to analyze a little closer. Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
-
- 27 Feb, 2013 - 1 commit
-
-
Ronald S. Bultje authored
Consistent with VP8. Change-Id: I8c316ee49f072e15abbb033a80e9c36617891f07
-