- 17 Jul, 2013 - 5 commits
-
-
Ronald S. Bultje authored
+0.2% SSIM and glbPSNR on derfraw300. Change-Id: I9cba0bca55e606a22f557c7732b064f738efe84d
-
Yunqing Wang authored
Current partition checking starts from small sizes, and then goes up to large sizes. This experiment uses the small partitions' motion estimation result, which is already available, to speed up the large partition's motion estimation. We can decide to skip some patition checkings if they are unlikely choices. We could use the motion vector(MV) result as current partition's prediction MV, limit the search range and reference frame. Current result at speed 1: psnr loss: 1.19% for stdhd, 0.287% for derf. speed gain: 14% for sunflower(hd), 11% for akiyo. Further improvement will be done later. Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab
-
Paul Wilkins authored
Use an estimate based on DC_PRED for intra uv cost within the rd loop then only do a full uv mode analysis if an intra mode is chosen. Significant speed gains in some cases. Currently only enabled for speed 2 pending speed/quality tests. Change-Id: Ie851a12400d5483bce47ec0e3ccb8516041e91c0
-
Paul Wilkins authored
Apply limit if search_method == USE_LARGESTALL to the range of UV tx sizes searched. Change-Id: I6db29f0dd237285ffc50d75a37e8b68151ad821c
-
Jingning Han authored
This commit makes the encoder to perform motion search only once per reference frame type for each 4x4/4x8/8x4 block. For bus_cif at 2000 kbps, the runtime goes from 253812ms -> 217817ms (14% speed-up) for speed 0. Change-Id: I5f17599ccc8cfaf93ccb4f98fcb6008af6d79e92
-
- 16 Jul, 2013 - 8 commits
-
-
Dmitry Kovalev authored
Change-Id: I4884cdc2557d25d50c7c4f7e19b1ad8bdb93cd63
-
Dmitry Kovalev authored
Removing tile_rows and tile_columns from VP9Common, removing redundant constants MIN_TILE_WIDTH and MAX_TILE_WIDTH, changing signature of vp9_get_tile_n_bits. Change-Id: I8ff3104a38179b2c6900df965c144c1d6f602267
-
James Zern authored
s/frame_rate/framerate/g Change-Id: I6fc3e088e419c5f46e3a9390dd8a2cad2677a2fc
-
Dmitry Kovalev authored
Making implementation of vp9_set_pred_flag_{seg_id, mbskip} consistent with vp9_get_segment_id without using confusing sub(a, b) macro. Passing mi_row and mi_col to functions explicitly instead of replying on mb_to_right_edge and mb_to_bottom_edge. Change-Id: I54c1087dd2ba9036f8ba7eb165b073e807d00435
-
Paul Wilkins authored
Change-Id: I94b97a966b5efbc9a243048f1f5ddbbdc4b1846e
-
Yaowu Xu authored
This is a short term optimization till we work out a decoder implementation requiring no frame border extension. Change-Id: I02d15bfde4d926b50a4e58b393d8c4062d1be70f
-
Dmitry Kovalev authored
Removing unused and duplicated constants, moving them from *.h to *.c if possible. Change-Id: Ief4d6b984a3ca2e9b38504f0d855ed072cf7133f
-
Ronald S. Bultje authored
Cycle times: 4x4: 151 to 131 cycles (15% faster) 8x8: 334 to 306 cycles (9% faster) 16x16: 1401 to 1368 cycles (2.5% faster) 32x32: 7403 to 7367 cycles (0.5% faster) Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup. Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f
-
- 15 Jul, 2013 - 3 commits
-
-
Ronald S. Bultje authored
Also inline some of the block calculations to assist the compiler to not do silly things like calculating the same offset (or converting between raster/transform block offset or block, mi and pixel unit) many, many, many times. Cycle times: 4x4: 584 -> 505 cycles (16% faster) 8x8: 1651 -> 1560 cycles (6% faster) 16x16: 7897 -> 7704 cycles (2.5% faster) 32x32: 16096 -> 15852 cycles (1.5% faster) Overall, this saves about 0.5 seconds (1min49.8 -> 1min49.3) on the first 50 frames of bus (speed 0) @ 1500kbps, i.e. 0.5% overall. Change-Id: If3dd62453f8e2ab9d4ee616bc4ea956fb8874b80
-
Jingning Han authored
Skip the inverse transform and reconstruction of inter-mode coded blocks in the rate-distortion optimization loop, when skip_encode_sb feature is turned on. This provides about 1% speed-up at speed 0, and 1.5% speed-up at speed 1. No performance change in both settings. Change-Id: I2932718bf4d007163702b61b16b6ff100cf9d007
-
Jingning Han authored
This speed feature allows the encoder to largely remove the spatial dependency between blocks inside a 64x64 superblock, thereby removing the need to repeatedly encode superblocks per partition type in the rate-distortion optimization loop. A major challenge lies in the intra modes tested in the rate-distortion optimization loop. The subsequent blocks do not have access to the reconstructed boundary pixels without the intermediate coding steps. This was resolved by using the original pixels for intra prediction in the rd loop, followed by an appropriately designed distortion modeling on the quantization parameters. Experiments also suggested that the performance impact is more discernible at lower bit-rate/psnr settings. Hence a quantizer dependent threshold is applied to deactivate skip of block coding. For bus_cif at 2000 kbps, speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB performance loss. speed 1: runtime 65312ms -> 61536ms, (7...
-
- 14 Jul, 2013 - 1 commit
-
-
James Zern authored
frames_since_golden / frames_till_alt_ref_frame are unused. Change-Id: I348e7689d4d75412cf4de7703d885be942e4a26b
-
- 13 Jul, 2013 - 1 commit
-
-
Dmitry Kovalev authored
Change-Id: Id9b6ceeddca3f9b34bfada5c499b1e7a2f42c30b
-
- 12 Jul, 2013 - 8 commits
-
-
James Zern authored
this was never fleshed out in the context of VP8, for which it was added. for VP9 it has no meaning. Change-Id: Iba2ecc026d9e947067b96690245d337e51e26eff
-
Dmitry Kovalev authored
Also removing unused declarations from vp9_entropymode.h file. Change-Id: Ib9c5826db3584a32f6bb3297a76c522b99d83402
-
Yaowu Xu authored
Change-Id: I23a75c495ed7ea917d7f312bef0990e20a6b53d9
-
James Zern authored
lg2 -> log2 Change-Id: I0602ddff49e42c9c40c29c084d04b7592b9f8edf
-
Deb Mukherjee authored
Implements some of the helper functions more efficiently with lookups rathers than branches. Modeling function is consolidated to reduce some computations. Also merged the two enums BLOCK_SIZE_TYPES and BlockSize into one because there is no need to keep them separate (even though the semantics are a little different). No bitstream or output change. About 0.5% speedup Change-Id: I7d71a66e8031ddb340744dc493f22976052b8f9f
-
Dmitry Kovalev authored
Removing redundant function arguments and curly braces. Change-Id: I46e02561f33fe02e84a3b19756f03b9504bd6a1b
-
Johann authored
For some reason iOS builds take a really long time to sort this function out. It's not used anywhere so remove it. Change-Id: Ia5c8513a0d9c7eb32641cca58ca1c1113e2dd9f4
-
Ronald S. Bultje authored
Change-Id: I78a79fc51c2d7cc3c261f35b569155397f3dc0c4
-
- 11 Jul, 2013 - 5 commits
-
-
Dmitry Kovalev authored
Change-Id: Iccd4ab95ea51a6d57ed43947f2fd7ad92e8979cf
-
Dmitry Kovalev authored
Adding segmentation struct to vp9_seg_common.h. Struct members are from macroblockd and VP9Common structs. Moving segmentation related constants and enums to vp9_seg_common.h. Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03
-
Jingning Han authored
The function encode_block is called only by inter-prediction modes, hence removing the transform type branching there. Change-Id: I34a3172e28ce2388835efd0f8781922211bff857
-
Paul Wilkins authored
With sf->auto_mv_step_size on it is questionable whether sf->reduce_first_step_size is worthwhile. At speed 2 it was not having a big impact. Even at speed 2 sf->optimize_coefficients = 0 is not having a big speed imapct so for now I have moved it down into a higher speed setting. Change-Id: I8a54de76d486ad37aabce76474889da2768b14c1
-
Ronald S. Bultje authored
Change-Id: Ia942e56cf322821d42ba06178672791eeee2847e
-
- 10 Jul, 2013 - 9 commits
-
-
Dmitry Kovalev authored
Change-Id: I0543e72fa092eef3976b65e16bb597197c364873
-
Jingning Han authored
This commit fixed the mis-use of the tx_type for inverse transform in intra4x4 rate-distortion optimization loop. It improves the overall coding performance. Change-Id: I7fe9953175b74890357dbcee33c138573766e980
-
Dmitry Kovalev authored
Change-Id: Ic5257fa8278e9b6297de230e4fd26a1e23ad2bb7
-
Jim Bankoski authored
Change-Id: I5dea4570cb05df27a522abf6e7b695998654284a
-
Jim Bankoski authored
Change-Id: Ie0cb732fdcb98616a422c4463bff80642248d136
-
Deb Mukherjee authored
Adds a speed feature to eliminate full-rd computation if the modeled rd or rd based on a different parameter in the same mode is already a lot larger than the best rd yet. Specifically, only search the sharp and smooth filters if the modeled rd cost based on the regular filter is within a certain factor of the best rd cost so far. Also, skip full-rd computation of non splitmv inter modes if the modeled rd cost based on pred error is within the same factor of the best rd cost so far. Also adds some enhancements in the rd search for splitmv mode to speed things up by early breakouts. Negligible impact on performance. Resuts on derfraw300: psnr: -0.013% with the splitmv enhancements, -0.24% with the rd breakout feature on. speedup: 6% with splitmv enhancements, 20% with also residual breakout (tested on football sequence at 600 Kbps) Change-Id: I37abc308ea9f110c1679ce649b6a7e73ab1ad5fc
-
Jingning Han authored
This commit enables 16x16 ADST/DCT forward hybrid transform using SSE2 operations. It reduces the runtime from 5433 cycles to 1621 cycles, at no compression performance loss. Change-Id: I75fd7f1984e9e28846af459f810ff0d6ae125230
-
Ronald S. Bultje authored
Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from 2min4.9 to 2min3.1, i.e. a 1.4% speedup overall. Change-Id: I9b25e87974430cb942caa276410bb2eda815bd83
-
Yaowu Xu authored
Change-Id: I721ebdeef2b53ce3e5c3eba3f7462ae2103c95a8
-