Commits · 83c7e13a6bcd1535d9547ef3c89816bf993b458b · BC / public / external / libvpx

17 Jul, 2013 - 5 commits

Do a skip-block check for sub8x8 partitions also. · 83c7e13a
Ronald S. Bultje authored 11 years ago
```
+0.2% SSIM and glbPSNR on derfraw300.

Change-Id: I9cba0bca55e606a22f557c7732b064f738efe84d
```

Speed up motion estimation using small partitions' result (experiment) · df90d58f

Yunqing Wang authored 11 years ago

Current partition checking starts from small sizes, and then goes up
to large sizes. This experiment uses the small partitions' motion
estimation result, which is already available, to speed up the
large partition's motion estimation. We can decide to skip some
partition checks if they are unlikely choices. We could use the
motion vector (MV) result as the current partition's prediction MV, and
limit the search range and reference frame.
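
As a rough illustration of the idea above, and not the encoder's actual code, the sketch below seeds a large partition's search with the average of its sub-partitions' motion vectors and then searches only a small window around that seed. The function names, the averaging rule, the window size and the toy cost function are all assumptions made for this example.

```python
# Hypothetical sketch: reuse small-partition motion vectors (MVs) to guide
# the motion search of the enclosing large partition. Not libvpx code.

def predict_mv(sub_mvs):
    """Average the (row, col) MVs already found for the sub-partitions."""
    n = len(sub_mvs)
    row = round(sum(mv[0] for mv in sub_mvs) / n)
    col = round(sum(mv[1] for mv in sub_mvs) / n)
    return (row, col)

def limited_search(cost, center, search_range):
    """Test every MV within +/- search_range of center; return the cheapest."""
    best_mv, best_cost = None, float("inf")
    for dr in range(-search_range, search_range + 1):
        for dc in range(-search_range, search_range + 1):
            mv = (center[0] + dr, center[1] + dc)
            c = cost(mv)
            if c < best_cost:
                best_mv, best_cost = mv, c
    return best_mv, best_cost

def toy_cost(mv):
    """Toy cost with its minimum at (3, -2); a real encoder would use SAD or RD cost."""
    return abs(mv[0] - 3) + abs(mv[1] + 2)

sub_mvs = [(2, -2), (3, -1), (4, -3), (3, -2)]  # results from four small partitions
center = predict_mv(sub_mvs)
best_mv, best_cost = limited_search(toy_cost, center, search_range=1)
```

With a window of +/- 1 the search visits 9 candidates instead of a full-range search, which is where a speed gain of this kind would come from; the quality loss arises when the true best MV lies outside the window.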

Current result at speed 1:
psnr loss: 1.19% for stdhd, 0.287% for derf.
speed gain: 14% for sunflower(hd), 11% for akiyo.

Further improvement will be done later.

Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab

Move uv intra mode selection in rd loop. · 2ee338ce

Paul Wilkins authored 11 years ago

Use an estimate based on DC_PRED for intra uv cost
within the rd loop then only do a full uv mode analysis
if an intra mode is chosen.

Significant speed gains in some cases. Currently only
enabled for speed 2 pending speed/quality tests.
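
A minimal sketch of this two-stage decision follows, assuming a simplified candidate list and made-up costs. `pick_mode`, the candidate tuples and the cost table are hypothetical and only illustrate the control flow: a single DC_PRED-based estimate stands in for the chroma cost of every intra candidate inside the loop, and the full uv mode search runs once afterwards, only if an intra mode won.

```python
# Hypothetical sketch of a two-stage chroma (uv) intra decision. Not libvpx code.
# All names and costs below are made up for illustration.

UV_MODES = ["DC_PRED", "V_PRED", "H_PRED", "TM_PRED"]

def pick_mode(candidates, uv_cost):
    """candidates: list of (name, is_intra, cost_without_intra_uv)."""
    dc_estimate = uv_cost("DC_PRED")  # one cheap estimate shared by all intra candidates
    best_name, best_is_intra, best_total = None, False, float("inf")
    for name, is_intra, cost in candidates:
        total = cost + (dc_estimate if is_intra else 0)
        if total < best_total:
            best_name, best_is_intra, best_total = name, is_intra, total
    # The full uv mode analysis happens only if an intra mode won the loop.
    uv_mode = min(UV_MODES, key=uv_cost) if best_is_intra else None
    return best_name, uv_mode

toy_uv_cost = {"DC_PRED": 20, "V_PRED": 12, "H_PRED": 25, "TM_PRED": 30}.get
candidates = [("inter", False, 100), ("intra", True, 60)]
chosen, uv_mode = pick_mode(candidates, toy_uv_cost)
```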

Change-Id: Ie851a12400d5483bce47ec0e3ccb8516041e91c0

Limit transform sizes searched for uv intra. · 6c667f0f

Paul Wilkins authored 11 years ago

Apply limit if search_method == USE_LARGESTALL
to the range of UV tx sizes searched.

Change-Id: I6db29f0dd237285ffc50d75a37e8b68151ad821c

Skip redundant motion search in 4x4 level rd loop · a142d6fc

Jingning Han authored 11 years ago

This commit makes the encoder perform motion search only once
per reference frame type for each 4x4/4x8/8x4 block. For bus_cif
at 2000 kbps, the runtime goes from 253812ms -> 217817ms
(14% speed-up) for speed 0.
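
One way to picture "search only once per reference frame" is a cache keyed by block and reference frame, as in the sketch below. This is an assumption about the general technique, not a description of how the commit implements it; `motion_search`, `mv_cache` and the loop over repeated evaluations are hypothetical.

```python
# Hypothetical sketch: cache motion-search results per (block, reference frame)
# so that repeated evaluations reuse them. Not libvpx code.

search_calls = 0

def motion_search(block, ref_frame):
    """Stand-in for an expensive motion search; returns a placeholder MV."""
    global search_calls
    search_calls += 1
    return (block, ref_frame)

mv_cache = {}

def cached_motion_search(block, ref_frame):
    """Run the search the first time a key is seen, then reuse the stored result."""
    key = (block, ref_frame)
    if key not in mv_cache:
        mv_cache[key] = motion_search(block, ref_frame)
    return mv_cache[key]

# Four sub-blocks, three reference frames, three repeated evaluations of each.
for evaluation in range(3):
    for block in range(4):
        for ref_frame in ("LAST", "GOLDEN", "ALTREF"):
            cached_motion_search(block, ref_frame)
```

In this toy setup the 36 requests trigger only 12 real searches, one per block and reference frame pair.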

Change-Id: I5f17599ccc8cfaf93ccb4f98fcb6008af6d79e92

16 Jul, 2013 - 8 commits

Removing MV_GROUP_UPDATE define and corresponding code. · 3997da0d
Dmitry Kovalev authored 11 years ago
```
Change-Id: I4884cdc2557d25d50c7c4f7e19b1ad8bdb93cd63
```

Cleaning up tile code. · 9482a0bf

Dmitry Kovalev authored 11 years ago

Removing tile_rows and tile_columns from VP9Common, removing redundant
constants MIN_TILE_WIDTH and MAX_TILE_WIDTH, changing signature of
vp9_get_tile_n_bits.

Change-Id: I8ff3104a38179b2c6900df965c144c1d6f602267

use consistent framerate naming · 9581eb6e
James Zern authored 11 years ago
```
s/frame_rate/framerate/g

Change-Id: I6fc3e088e419c5f46e3a9390dd8a2cad2677a2fc
```

Rewriting vp9_set_pred_flag_{seg_id, mbskip}. · 863138a2

Dmitry Kovalev authored 11 years ago

Making implementation of vp9_set_pred_flag_{seg_id, mbskip} consistent
with vp9_get_segment_id without using confusing sub(a, b) macro. Passing
mi_row and mi_col to functions explicitly instead of relying on
mb_to_right_edge and mb_to_bottom_edge.

Change-Id: I54c1087dd2ba9036f8ba7eb165b073e807d00435

Minor cleanup in code to find uv tx_size. · 30d2ea45
Paul Wilkins authored 11 years ago
```
Change-Id: I94b97a966b5efbc9a243048f1f5ddbbdc4b1846e
```

Change to extend full border only when needed · 5b915ebd

Yaowu Xu authored 11 years ago

This is a short-term optimization until we work out a decoder
implementation requiring no frame border extension.

Change-Id: I02d15bfde4d926b50a4e58b393d8c4062d1be70f

Removing and moving around constant definitions. · ca75f125

Dmitry Kovalev authored 11 years ago

Removing unused and duplicated constants, moving them from *.h to *.c
if possible.

Change-Id: Ief4d6b984a3ca2e9b38504f0d855ed072cf7133f

Inline vp9_quantize() in xform_quant(). · 1ff94fea

Ronald S. Bultje authored 11 years ago

Cycle times:
4x4:    151 to  131 cycles (15% faster)
8x8:    334 to  306 cycles (9% faster)
16x16: 1401 to 1368 cycles (2.5% faster)
32x32: 7403 to 7367 cycles (0.5% faster)

Total encode time of first 50 frames of bus @ 1500kbps (speed 0)
goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup.
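
The per-size percentages above appear to follow the "old / new - 1" convention (throughput gain) rather than "(old - new) / old" (cycles saved). That is an inference from the numbers, not something the commit states. The sketch below computes both so the two conventions can be compared; the helper names are made up for this example.

```python
# Sketch: two common ways to express a speedup from before/after cycle counts.

def speedup_pct(old, new):
    """Throughput gain: how much more work fits in the same time after the change."""
    return (old / new - 1.0) * 100.0

def time_saved_pct(old, new):
    """Reduction in cycles relative to the old cycle count."""
    return (old - new) / old * 100.0

# Cycle counts quoted in the commit message, as (before, after).
cycles = {
    "4x4": (151, 131),
    "8x8": (334, 306),
    "16x16": (1401, 1368),
    "32x32": (7403, 7367),
}

for size, (old, new) in cycles.items():
    print(f"{size}: {speedup_pct(old, new):.1f}% faster, "
          f"{time_saved_pct(old, new):.1f}% fewer cycles")
```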

Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f

15 Jul, 2013 - 3 commits

Inline xform_quant() in encode_block_intra(). · 6fb41874

Ronald S. Bultje authored 11 years ago

Also inline some of the block calculations to assist the compiler to
not do silly things like calculating the same offset (or converting
between raster/transform block offset or block, mi and pixel unit)
many, many, many times.

Cycle times:
4x4:     584 ->   505 cycles (16% faster)
8x8:    1651 ->  1560 cycles (6% faster)
16x16:  7897 ->  7704 cycles (2.5% faster)
32x32: 16096 -> 15852 cycles (1.5% faster)

Overall, this saves about 0.5 seconds (1min49.8 -> 1min49.3) on the
first 50 frames of bus (speed 0) @ 1500kbps, i.e. 0.5% overall.

Change-Id: If3dd62453f8e2ab9d4ee616bc4ea956fb8874b80

Skip inter-coded block reconstruction in rd loop · 043e0f9d

Jingning Han authored 11 years ago

Skip the inverse transform and reconstruction of inter-mode coded
blocks in the rate-distortion optimization loop, when skip_encode_sb
feature is turned on. This provides about 1% speed-up at speed 0,
and 1.5% speed-up at speed 1. No performance change in either setting.

Change-Id: I2932718bf4d007163702b61b16b6ff100cf9d007

Skip duplicate block encoding in the rd loop · faff6ed0

Jingning Han authored 11 years ago

This speed feature allows the encoder to largely remove the spatial
dependency between blocks inside a 64x64 superblock, thereby removing
the need to repeatedly encode superblocks per partition type in the
rate-distortion optimization loop.

A major challenge lies in the intra modes tested in the rate-distortion
optimization loop. The subsequent blocks do not have access to the
reconstructed boundary pixels without the intermediate coding steps.
This was resolved by using the original pixels for intra prediction
in the rd loop, followed by an appropriately designed distortion
modeling on the quantization parameters. Experiments also suggested
that the performance impact is more discernible at lower bit-rate/psnr
settings. Hence a quantizer dependent threshold is applied to deactivate
skip of block coding.

For bus_cif at 2000 kbps,
speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
         performance loss.

speed 1: runtime 65312ms  -> 61536ms, (7...

14 Jul, 2013 - 1 commit

vp9: remove frames_{since,till}.. from MACROBLOCKD · dc1d2331

James Zern authored 11 years ago

frames_since_golden / frames_till_alt_ref_frame are unused.

Change-Id: I348e7689d4d75412cf4de7703d885be942e4a26b

13 Jul, 2013 - 1 commit

Using vp9_copy and vp9_zero instead of custom code. · 42907098
Dmitry Kovalev authored 11 years ago
```
Change-Id: Id9b6ceeddca3f9b34bfada5c499b1e7a2f42c30b
```

12 Jul, 2013 - 8 commits

yv12config: remove YUV_TYPE · 4fc6c88e

James Zern authored 11 years ago

this was never fleshed out in the context of VP8, for which it was
added. for VP9 it has no meaning.

Change-Id: Iba2ecc026d9e947067b96690245d337e51e26eff

Adding struct tx_probs and struct tx_counts to cleanup the code. · cc662dd7
Dmitry Kovalev authored 11 years ago
```
Also removing unused declarations from vp9_entropymode.h file.

Change-Id: Ib9c5826db3584a32f6bb3297a76c522b99d83402
```
Fix a build issue · fb754b18
Yaowu Xu authored 11 years ago
```
Change-Id: I23a75c495ed7ea917d7f312bef0990e20a6b53d9
```
vp9: consistent 'log2' variable naming · 0195fb53
James Zern authored 11 years ago
```
lg2 -> log2

Change-Id: I0602ddff49e42c9c40c29c084d04b7592b9f8edf
```

Some minor cleanups for efficiency · 94c481f9

Deb Mukherjee authored 11 years ago

Implements some of the helper functions more efficiently with
lookups rather than branches. Modeling function is consolidated
to reduce some computations.
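
To illustrate the "lookups rather than branches" pattern in general terms, the sketch below shows the same helper written both ways. The block-size constants and the helper names are illustrative assumptions, not the libvpx enum or its functions.

```python
# Hypothetical sketch: replace a branch chain with a table lookup.
# Illustrative block-size indices; not the actual libvpx enum values.
BLOCK_4X4, BLOCK_8X8, BLOCK_16X16, BLOCK_32X32, BLOCK_64X64 = range(5)

def width_log2_branches(bsize):
    """Branch-based version: one comparison per block size."""
    if bsize == BLOCK_4X4:
        return 2
    elif bsize == BLOCK_8X8:
        return 3
    elif bsize == BLOCK_16X16:
        return 4
    elif bsize == BLOCK_32X32:
        return 5
    else:
        return 6

# Table indexed by block size: log2 of the block width in pixels.
WIDTH_LOG2_LOOKUP = [2, 3, 4, 5, 6]

def width_log2_lookup(bsize):
    """Lookup-based version: a single indexed read, no branching."""
    return WIDTH_LOG2_LOOKUP[bsize]
```

Both versions return the same values; the table form trades a few bytes of constant data for the removal of data-dependent branches in a frequently called helper.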

Also merged the two enums BLOCK_SIZE_TYPES and BlockSize into
one because there is no need to keep them separate (even though
the semantics are a little different).

No bitstream or output change.

About 0.5% speedup

Change-Id: I7d71a66e8031ddb340744dc493f22976052b8f9f

Removing redundant code mostly from vp9_pred_common.{h, c}. · dd150e8e
Dmitry Kovalev authored 11 years ago
```
Removing redundant function arguments and curly braces.

Change-Id: I46e02561f33fe02e84a3b19756f03b9504bd6a1b
```

Remove print_nmvcounts · e6ab476d

Johann authored 11 years ago

For some reason iOS builds take a really long time to sort this
function out.

It's not used anywhere so remove it.

Change-Id: Ia5c8513a0d9c7eb32641cca58ca1c1113e2dd9f4

Remove unused function block_error(). · ee09dd99
Ronald S. Bultje authored 11 years ago
```
Change-Id: I78a79fc51c2d7cc3c261f35b569155397f3dc0c4
```

11 Jul, 2013 - 5 commits

Calling is_inter_mode() instead of custom code. · 8c05e590
Dmitry Kovalev authored 11 years ago
```
Change-Id: Iccd4ab95ea51a6d57ed43947f2fd7ad92e8979cf
```

Moving segmentation related vars into separate struct. · c4ad3273

Dmitry Kovalev authored 11 years ago

Adding segmentation struct to vp9_seg_common.h. Struct members are from
macroblockd and VP9Common structs. Moving segmentation related constants
and enums to vp9_seg_common.h.

Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03

Remove unnecessary tx_type branch in encode_block · b9381b6f

Jingning Han authored 11 years ago

The function encode_block is called only by inter-prediction modes,
hence removing the transform type branching there.

Change-Id: I34a3172e28ce2388835efd0f8781922211bff857

Speed 2 feature adjustment. · 5290eeab

Paul Wilkins authored 11 years ago

With sf->auto_mv_step_size on it is questionable
whether sf->reduce_first_step_size is worthwhile.
At speed 2 it was not having a big impact.

Even at speed 2 sf->optimize_coefficients = 0 is not
having a big speed impact so for now I have moved it
down into a higher speed setting.

Change-Id: I8a54de76d486ad37aabce76474889da2768b14c1

Remove unused fwalsh/fdct x86 SIMD implementations. · c13e0bcb
Ronald S. Bultje authored 11 years ago
```
Change-Id: Ia942e56cf322821d42ba06178672791eeee2847e
```

10 Jul, 2013 - 9 commits

Removing unused TOKENEXTRA arg from pick_sb_modes function. · 544d8c33
Dmitry Kovalev authored 11 years ago
```
Change-Id: I0543e72fa092eef3976b65e16bb597197c364873
```

Fix tx_type bug in intra4x4 rd loop · 18803f9c

Jingning Han authored 11 years ago

This commit fixes the misuse of the tx_type for the inverse transform
in the intra4x4 rate-distortion optimization loop. It improves the
overall coding performance.

Change-Id: I7fe9953175b74890357dbcee33c138573766e980

Adding write_compressed_header function. · 0ac5e4dd
Dmitry Kovalev authored 11 years ago
```
Change-Id: Ic5257fa8278e9b6297de230e4fd26a1e23ad2bb7
```
configure with internal stats not working · 68ef7a6b
Jim Bankoski authored 11 years ago
```
Change-Id: I5dea4570cb05df27a522abf6e7b695998654284a
```
remove warnings when NDEBUG is set · 6591cf2f
Jim Bankoski authored 11 years ago
```
Change-Id: Ie0cb732fdcb98616a422c4463bff80642248d136
```

Prunes out full-rd computation based on modeled rd · 53ff43ad

Deb Mukherjee authored 11 years ago

Adds a speed feature to eliminate full-rd computation if the modeled
rd or rd based on a different parameter in the same mode is already
a lot larger than the best rd yet.

Specifically, only search the sharp and smooth filters if the modeled
rd cost based on the regular filter is within a certain factor of the
best rd cost so far. Also, skip full-rd computation of non splitmv
inter modes unless the modeled rd cost based on pred error is within the
same factor of the best rd cost so far.
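
The pruning rule can be sketched as follows, under the assumption that a cheap modeled cost is available for each candidate before the expensive full computation. `search_modes`, the `factor` value and the toy cost tables are hypothetical; the real thresholds and candidate sets are not given in this message.

```python
# Hypothetical sketch of pruning full RD computation with a cheaper modeled RD cost.
# Not libvpx code; names, factor and costs are made up for illustration.

def search_modes(modes, modeled_rd, full_rd, factor=2.0):
    """Run full_rd only for modes whose modeled cost is within factor of the best so far."""
    best_mode, best_rd = None, float("inf")
    evaluated = []
    for mode in modes:
        if modeled_rd(mode) > factor * best_rd:
            continue  # modeled cost is already far above the best: skip the full computation
        evaluated.append(mode)
        rd = full_rd(mode)
        if rd < best_rd:
            best_mode, best_rd = mode, rd
    return best_mode, best_rd, evaluated

toy_modeled = {"A": 100, "B": 500, "C": 150}.get
toy_full = {"A": 110, "B": 480, "C": 105}.get
best_mode, best_rd, evaluated = search_modes(["A", "B", "C"], toy_modeled, toy_full)
```

Here candidate B is never fully evaluated because its modeled cost exceeds twice the best full cost found so far. A larger factor prunes less and is safer for quality; a smaller one prunes more aggressively.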

Also adds some enhancements in the rd search for splitmv mode to
speed things up by early breakouts. Negligible impact on performance.

Results on derfraw300:
psnr:    -0.013% with the splitmv enhancements, -0.24% with the rd
         breakout feature on.
speedup: 6% with splitmv enhancements, 20% with also residual breakout
         (tested on football sequence at 600 Kbps)

Change-Id: I37abc308ea9f110c1679ce649b6a7e73ab1ad5fc

SSE2 16x16 ADST/DCT hybrid transform · 11442353

Jingning Han authored 11 years ago

This commit enables 16x16 ADST/DCT forward hybrid transform using SSE2
operations. It reduces the runtime from 5433 cycles to 1621 cycles, at
no compression performance loss.

Change-Id: I75fd7f1984e9e28846af459f810ff0d6ae125230

Remove memcpy() in handle_inter_mode() filter selection. · b1df674a

Ronald S. Bultje authored 11 years ago

Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.

Change-Id: I9b25e87974430cb942caa276410bb2eda815bd83

Add a feature to reduce chroma intra mode search · bed27a96
Yaowu Xu authored 11 years ago
```
Change-Id: I721ebdeef2b53ce3e5c3eba3f7462ae2103c95a8
```