- 01 Jul, 2013 - 1 commit
-
-
Yaowu Xu authored
Change-Id: I921c9faba6386535aaf717a54301dd346a9b8540
-
- 29 Jun, 2013 - 12 commits
-
-
Jingning Han authored
-
Christian Duvivier authored
43,000 -> 5,750 cycles, about 7.5x faster. Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0
-
Ronald S. Bultje authored
-
Johann authored
-
James Zern authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
chm authored
- Add add_constant_residual_8x8 16x16 32x32 functions - Tested under RealView debugger enviroment Change-Id: I5c3a432f651b49bf375de6496353706a33e3e68e
-
Dmitry Kovalev authored
-
James Zern authored
since: 92479d95 Make update_partition_context faster fixes: vp9/common/vp9_blockd.h:408:22: error: non-constant-expression cannot be narrowed from type 'int' to 'char' in initializer list [-Wc++11-narrowing] char pcvalue[2] = {~(0xe << boffset), ~(0xf <<boffset)}; ^~~~~~~~~~~~~~~~~ Change-Id: Id5b00b9a72d00a2b314081a23879bd1fa3ce983b
-
Jingning Han authored
This commit enables SSE2 4x4 foward hybrid transform. The runtime goes from 249 cycles down to 74 cycles. Overall around 2% speed-up at no compression performance change. Change-Id: Iad4d526346e05c7be896466c05500711bb763660
-
Yaowu Xu authored
Change-Id: I692d800af1f976c84a76f8bd66864c4b39540abc
-
- 28 Jun, 2013 - 19 commits
-
-
Jingning Han authored
-
Dmitry Kovalev authored
Change-Id: Id641e5188adf55e53e606e5813ae45feaf7abbd2
-
Dmitry Kovalev authored
-
Jingning Han authored
Change-Id: I7c46354c4983feb5f6202c3ab4a1d9534da7e30f
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
-
Ronald S. Bultje authored
Makes cost_coeffs() a lot faster: 4x4: 236 -> 181 cycles 8x8: 888 -> 588 cycles 16x16: 3550 -> 2483 cycles 32x32: 17392 -> 12010 cycles Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup. Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
-
Dmitry Kovalev authored
-
Dmitry Kovalev authored
Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated ones from vp9_onyx_int.h and vp9_onyxd_int.h. Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6
-
Ronald S. Bultje authored
4x4: 234 -> 236 cycles 8x8: 878 -> 888 cycles 16x16: 3664 -> 3550 cycles 32x32: 18134 -> 17392 cycles Change-Id: I37a51bfbb0060a3a54f09c6045c14a989811ed78
-
Ronald S. Bultje authored
Cycle timings for first 3 frames of bus (speed 0) at 1500kbps: 4x4: 298 -> 234 cycles 8x8: 1227 -> 878 cycles 16x16: 23426 -> 18134 cycles 32x32: 4906 -> 3664 cycles Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes from 3min0.7 to 2min51.6 seconds, i.e. 5.3% faster. Change-Id: I68a0e1b530b0563b84a67342cca4b45146077e95
-
Ronald S. Bultje authored
This commit replaces zrun_zbin_boost, a method of biasing non-zero coefficients following runs of zero-coefficients to be rounded towards zero, with an explicit skip-block choice in the RD loop. The logic is basically that if individual coefficients should be rounded towards zero (from a RD point of view), the trellis/optimize loop should take care of it. If whole blocks should be zero (from a RD point of view), a single RD check is much more efficient than a complete serialization of the quantization loop. Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim. SIMD for quantize will follow in a separate patch. Results for other test sets pending. Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
-
Yaowu Xu authored
-
Yaowu Xu authored
-
Yaowu Xu authored
Change-Id: I379617c1c731a686b3f7e032b8805860c1055b12
-
Yaowu Xu authored
This commit change the partition search order to allow checking of rectangular partition to be done after square partitions. It also added a speed feature to skip rectangular partition check when NONE is better than SPLIT in RD sense. This feature roughly speed up encoder by 1.5X with loss on compression -0.91% on cif set -0.56% on stdhd set Change-Id: I0d2d06993041aa9ea9073fcc39c54f73a127dfa4
-
Ronald S. Bultje authored
-
James Zern authored
-
Ronald S. Bultje authored
Change-Id: I0b2be0ec2c410a527f88b95a44f24ac967b2dac1
-
- 27 Jun, 2013 - 8 commits
-
-
Dmitry Kovalev authored
Using vp9_set_pred_flag function instead of custom code, adding decode_tokens function which is now called from decode_atom, decode_sb_intra, and decode_sb. Change-Id: Ie163a7106c0241099da9c5fe03069bd71f9d9ff8
-
Frank Galligan authored
- Added vp9_loop_filter_horizontal_edge_neon and vp9_loop_filter_vertical_edge_neon. - The functions are based off the vp8 loopfilter functions. - Matches x86 md5 checksum. Change-Id: Id1c4dddb03584227e5ecd29f574a6ac27738fdd0
-
Dmitry Kovalev authored
-
Dmitry Kovalev authored
-
Ronald S. Bultje authored
Encoding time of first 50 frames of bus @ 1500kbps (speed 0) goes from 3min15.0 to 3min10.9, i.e. 2.1% faster overall. Change-Id: If592ee99be09bcd34a7c8498347f44e7305e982c
-
Paul Wilkins authored
-
Paul Wilkins authored
-
Paul Wilkins authored
-