Commits · decb1b94deb4420ab540b0963f1ca91ed20137c6 · BC / public / external / libvpx

27 Jul, 2013 - 2 commits

Inverse dimension order in token_cost array. · 118ccdcd

Ronald S. Bultje authored 11 years ago

This allows us to advance the position at the band level only when we
go from one band to the next. More importantly, it lets us use an add
instead of a multiply instruction, and omit the instruction altogether
if the band doesn't change from one coef to the next, which is slightly
faster (probably more noticeable on systems where a multiply is
expensive, such as ARM).
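
The idea can be sketched as follows. This is a minimal illustration, not the libvpx code: the table sizes, the `band_costs` type, the `band_len` run-length array, and `sum_token_costs` are all hypothetical names introduced here. It assumes the band index never decreases along the scan, so the current band's sub-table can be tracked with a pointer that is bumped only when the band changes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table sizes, chosen only for illustration. */
enum { NUM_BANDS = 3, NUM_CTX = 2, NUM_TOKENS = 3 };

/* With the band as the outermost dimension, one band's costs form a
 * contiguous sub-table, and moving to the next band is a pointer bump. */
typedef unsigned int band_costs[NUM_CTX][NUM_TOKENS];

/* Sum the costs of n tokens visited in scan order. band_len[b] is the
 * (positive) number of consecutive scan positions that fall in band b. */
static unsigned int sum_token_costs(const band_costs *costs,
                                    const unsigned char *band_len,
                                    const unsigned char *tokens,
                                    const unsigned char *ctxs, size_t n) {
  const band_costs *cur = costs;   /* sub-table of the current band */
  unsigned int band = 0;
  unsigned int left = band_len[0]; /* positions left in this band */
  unsigned int total = 0;
  size_t i;

  for (i = 0; i < n; ++i) {
    assert(band < NUM_BANDS);
    assert(ctxs[i] < NUM_CTX && tokens[i] < NUM_TOKENS);
    total += (*cur)[ctxs[i]][tokens[i]];
    if (--left == 0) { /* band changes: one add, no multiply */
      ++cur;
      ++band;
      left = (band < NUM_BANDS) ? band_len[band] : 0;
    }
  }
  return total;
}
```

While the band stays the same, the loop body touches neither `cur` nor `band`, which is the case the commit message describes as omitting the instruction altogether.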

Change-Id: I4343fe35b9f9a47fa00b217bdcbf5f91ff96c381

Shortcut 8x8/16x16 inverse 2D-DCT · 38fa4871

Jingning Han authored 11 years ago

This commit brought back the shortcut implementation of the 8x8/16x16
inverse 2D-DCT. When eob <= 10, it skips the inverse transform
operations on rows 4:7 (8x8) or 4:15 (16x16) in the first pass. For
bus_cif at 1000 kbps, this provides about a 2% speed-up at speed 0.
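
A minimal sketch of the row-skipping idea for the 8x8 case, assuming (as the commit implies) that the first 10 scan positions all lie in rows 0 to 3, so rows 4 to 7 hold only zeros. The 1-D transform here is a stand-in running sum, not a DCT, and `toy_1d` and `inverse_2d` are hypothetical names; the only property relied on is that a linear transform maps an all-zero row to an all-zero row.

```c
#include <assert.h>
#include <string.h>

enum { DIM = 8 }; /* 8x8 block */

/* Stand-in linear 1-D transform (a running sum). It is NOT a DCT; like any
 * linear transform, it maps an all-zero input to an all-zero output. */
static void toy_1d(const int *in, int *out) {
  int acc = 0, i;
  for (i = 0; i < DIM; ++i) {
    acc += in[i];
    out[i] = acc;
  }
}

/* Two-pass separable inverse transform with the eob <= 10 shortcut. */
static void inverse_2d(const int *coeffs, int *pixels, int eob) {
  int tmp[DIM * DIM];
  int col_in[DIM], col_out[DIM];
  int r, c, rows;

  assert(eob >= 0 && eob <= DIM * DIM);
  /* Assumption: with eob <= 10, every non-zero coefficient is in rows 0..3. */
  rows = (eob <= 10) ? 4 : DIM;

  /* First pass: transform only the rows that can be non-zero. The skipped
   * rows stay zero, which is what the transform would have produced. */
  memset(tmp, 0, sizeof(tmp));
  for (r = 0; r < rows; ++r) toy_1d(coeffs + r * DIM, tmp + r * DIM);

  /* Second pass: all columns still have to be transformed. */
  for (c = 0; c < DIM; ++c) {
    for (r = 0; r < DIM; ++r) col_in[r] = tmp[r * DIM + c];
    toy_1d(col_in, col_out);
    for (r = 0; r < DIM; ++r) pixels[r * DIM + c] = col_out[r];
  }
}
```

Calling `inverse_2d` with `eob` set to 10 or to 64 on a block whose rows 4 to 7 are zero gives identical output; the shortcut only halves the work of the first pass.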

Change-Id: I453e2d72956467d75be4ad8c04b4482ab889d572

26 Jul, 2013 - 3 commits

Special handle on DC only inverse 8x8 2D-DCT · 325e0aa6

Jingning Han authored 11 years ago

This commit enables special handling for the 8x8 inverse 2D-DCT
when only the DC coefficient is quantized to a non-zero value. For
bus_cif at 2000 kbps, it provides about a 1% speed-up at speed 0.
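
When only the DC coefficient is non-zero, the inverse transform reduces to adding one constant to every pixel of the block. The sketch below illustrates that; it is not the committed code. The fixed-point constant (cos(pi/4) scaled by 2^14), the final shift of 5, and all function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point convention: cos(pi/4) scaled by 2^14. */
#define DCT_CONST_BITS 14
#define COSPI_16_64 11585

/* Rounding right shift. For negative values this relies on an arithmetic
 * right shift, which is implementation-defined in C. */
static int round_shift(int value, int bits) {
  return (value + (1 << (bits - 1))) >> bits;
}

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Add the reconstruction of a DC-only 8x8 block to dest. */
static void idct8x8_dc_only_add(int16_t dc, uint8_t *dest, int stride) {
  int out, offset, r, c;

  assert(stride >= 8);
  out = round_shift(dc * COSPI_16_64, DCT_CONST_BITS);  /* row pass */
  out = round_shift(out * COSPI_16_64, DCT_CONST_BITS); /* column pass */
  offset = round_shift(out, 5);                         /* assumed scaling */

  /* Every output sample is the same value, so no butterflies are needed. */
  for (r = 0; r < 8; ++r)
    for (c = 0; c < 8; ++c)
      dest[r * stride + c] = clip_pixel(dest[r * stride + c] + offset);
}
```

Under these assumptions a DC value of 1024 adds 16 to each pixel, i.e. roughly dc / 64, with the result clipped to the 8-bit range.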

Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011

Auto min and max partition size experiment. · fe5e2a91

Paul Wilkins authored 11 years ago

Speed feature experiment to set an upper and lower
partition size limit based on what has been seen
in spatial neighbors.
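
A rough sketch of deriving the limits from spatial neighbors, assuming a hypothetical `blk_size` scale ordered from smallest to largest. The names and the fallback to the full range when no neighbor is available are illustrative choices, not taken from the commit.

```c
#include <assert.h>

/* Hypothetical block-size scale, ordered from smallest to largest. */
typedef enum { BLK_4X4, BLK_8X8, BLK_16X16, BLK_32X32, BLK_64X64 } blk_size;

/* Set the partition search range to the smallest and largest block sizes
 * seen in the already-coded neighbors (for example above and left). */
static void partition_range_from_neighbors(const blk_size *neighbors,
                                           int count, blk_size *min_size,
                                           blk_size *max_size) {
  int i;

  assert(count >= 0);
  if (count == 0) { /* no neighbors available: search the full range */
    *min_size = BLK_4X4;
    *max_size = BLK_64X64;
    return;
  }
  *min_size = *max_size = neighbors[0];
  for (i = 1; i < count; ++i) {
    if (neighbors[i] < *min_size) *min_size = neighbors[i];
    if (neighbors[i] > *max_size) *max_size = neighbors[i];
  }
}
```

The partition search would then only evaluate sizes between `*min_size` and `*max_size`.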

This seems to give quite reasonable speed gains in local tests
(10-15%), and when used with speed 0 the losses are small
(0.25% derf, 0.35% stdhd). However, for now I am only
enabling it on speed 1, as there may be clashes with the existing
temporal partition selection in speed 2.

Using a tighter min / max around the range derived from the
neighbors increases speed further but at the cost of a
bigger quality loss. However, I think this spatial method could
be combined with data from either the last frame or a variance
method (or both) to refine the range of minimum and maximum
partition size. I.e. consider the min and max from spatial and
temporal neighbors and the variance recommendation.

Change-Id: I1b96bf8b84368d6aad0c7aa600fe141b4f07435f

Modify static threshold calculation · 52256cdb

Yunqing Wang authored 11 years ago

Used 3 * standard_deviation in the internal threshold calculation
instead of a fitted curve. This actually approximates the algorithm
better.
For comparison, similar tests were run. The overall PSNR loss is
lower than before:
1. derf set:
when static-thresh = 1, psnr loss is 0.329%;
when static-thresh = 500, psnr loss is 0.970%;
2. stdhd set:
when static-thresh = 1, psnr loss is 0.922%;
when static-thresh = 500, psnr loss is 1.307%;

Similar speedup is achieved. For example,
clip            bitrate  static-thresh psnr    time
akiyo(cif)       500        0          48.952  5.077s(50f)
akiyo            500        500        48.866  4.169s(50f)

parkjoy(1080p)   4000       0          30.388  78.20s(30f)
parkjoy          4000       500        30.367  70.85s(30f)

sunflower(1080p) 4000       0          44.402  74.55s(30f)
sunflower        4000       500        44.414  68.69s(30f)

Change-Id: Ic78833642ce1911dbbd1cb6c899a2d7e2dfcc1f3

25 Jul, 2013 - 7 commits

Add encoding option --static-thresh · d36852b7

Yunqing Wang authored 11 years ago

This option exists in VP8, and it was rewritten in VP9 to support
skipping on different partition levels. After prediction is done,
we can check if the residuals in the partition block will be all
quantized to 0. If this is true, the skip flag is set, and only
prediction data are needed in reconstruction. Based on the DCT's energy
conservation property, the skipping check can be estimated in the
spatial domain.

The prediction error is calculated and compared to a threshold.
The threshold is determined by the dequant values, and also
adjusted by partition sizes. To be precise, the DC and AC parts
for Y, U, and V planes are checked to decide skipping or not.
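
A minimal sketch of such a spatial-domain check for a single plane, assuming an orthonormal transform so that energy is preserved: the squared DC coefficient then equals sum^2 / n and the remaining AC energy equals SSE - sum^2 / n. How the two thresholds are derived from the dequant values and the partition size is not shown; `dc_thresh`, `ac_thresh`, and `can_skip_block` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Decide whether a block's residual can be skipped without running the
 * forward transform, by estimating DC and AC energy from the pixels. */
static int can_skip_block(const int16_t *residual, int n,
                          int64_t dc_thresh, int64_t ac_thresh) {
  int64_t sum = 0, sse = 0, dc_energy, ac_energy;
  int i;

  assert(n > 0);
  for (i = 0; i < n; ++i) {
    sum += residual[i];
    sse += (int64_t)residual[i] * residual[i];
  }
  dc_energy = sum * sum / n;   /* squared DC coefficient (orthonormal DCT) */
  ac_energy = sse - dc_energy; /* energy left in all AC coefficients */
  return dc_energy <= dc_thresh && ac_energy <= ac_thresh;
}
```

In the scheme the commit describes, a check of this kind would be repeated for the Y, U, and V planes, and the skip flag set only if all of them pass.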

Tests showed that:
1. derf set:
when static-thresh = 1, psnr loss is 0.666%;
when static-thresh = 500, psnr loss is 1.162%;
2. stdhd set:
when static-thresh = 1, psnr loss is 1.249%;
when static-thresh = 500, psnr loss is 1.668%;

Across different clips, the encoding speedup ranges from a few
percent to over 20% when static-thresh <= 500. For example:
clip            bitrate  static-thresh psnr    time
akiyo(cif)       500        0          48.923  5.635s(50f)
akiyo            500        500        48.863  4.402s(50f)

parkjoy(1080p)   4000       0          30.380  77.54s(30f)
parkjoy          4000       500        30.384  69.59s(30f)

sunflower(1080p) 4000       0          44.461  85.2s(30f)
sunflower        4000       500        44.418  78.1s(30f)

Higher static-thresh values give larger speedup with larger
quality loss.

Change-Id: I857031ceb466ff314ab580ac5ec5d18542203c53

General cleanups. · 7131cb0e

Dmitry Kovalev authored 11 years ago

Removing unused constants, macros, and function declarations. Using
ROUND_POWER_OF_TWO macro, vp9_zero, vp9_copy where possible. Moving
#include from *.h to *.c. Merging for loops for motion vectors.

Change-Id: Ic3bf841764a2bb177128bb3a6d7aa8f68229cd13

Adding lookup table for size group. · 08fd41cc

Dmitry Kovalev authored 11 years ago

Change-Id: Ia6144d77ebed66e0739b62e4d673e26a95aa9550

Simplify handling of sub-partition motion vectors · be700e14

Adrian Grange authored 11 years ago

Simplified the code that extracts and uses the motion
vectors for the 4 sub-partitions in rd_pick_partition.

Change-Id: Iaf698ef7ee3aef9edd59015e1ae065dd359b17d9

Make coeff_optimize initialized per-plane · 2f58faff

Jingning Han authored 11 years ago

This commit makes the initialization of trellis coeff optimization
a per-plane operation, thereby eliminating the redundant steps in
encode_sby and encode_sbuv. It makes the encoder at speed 0 slightly
faster.

Change-Id: Iffe9faca6a109dafc0dd69dc7273cbdec19b17cd

Removing vp9_adapt_mode_context function. · 47d61f00

Dmitry Kovalev authored 11 years ago

Moving code from vp9_adapt_mode_context to vp9_adapt_mode_probs.

Change-Id: I60829c30b28968cd813551ef3a206dfb98d323c9

fix a bug where flags are not reset · 3e386aef

Yaowu Xu authored 11 years ago

The feature that uses small partition results as a measure to skip
mode evaluation at larger partition requires the flags to be reset.
The reset was missing in the code path that calls rd_use_partition().

Change-Id: Ia0a3a0aee1a862b6e2333d596808db7c48033d50

24 Jul, 2013 - 7 commits

Removing CONFIG_BALANCED_COEFTREE experiment. · fcc34796

Dmitry Kovalev authored 11 years ago

Change-Id: I61a8b0101eac3ee2e0621d56151b90c269fd4db4

Adding condition inside get_tx_type_{4x4, 8x8, 16x16}. · 9139ee09

Dmitry Kovalev authored 11 years ago

Adding the plane type check inside get_tx_type_{4x4, 8x8, 16x16},
because callers always performed it outside of these functions.

Change-Id: I02f0bbfee8063474865bd903eb25b54d26e07230

Use local variables rather than structure members · 4cfd36d8

Adrian Grange authored 11 years ago

Although local copies of the mode member variables
(mode, ref_frame) were made, they were not used in
all places. Also, made a local copy of the
second_ref_frame member.

Change-Id: I84d8c822e5cb3d8a02fc3de8a4037ca3fea8bfad

Save pixels instead of coefficients in intra4x4 RD loop. · 7817d322

Ronald S. Bultje authored 11 years ago

Prevents doing duplicate IDCTs; encoding of first 50 frames of bus
(speed 0) @ 1500kbps goes from 1min4.0 to 1min3.5, i.e. 0.87% faster
overall.

Change-Id: I2df39e29ed9d5ea5e7d2704a34940ba622832ddd

Add best_rd breakout in intra4x4 RD loop. · b72ecbb1

Ronald S. Bultje authored 11 years ago

Encoding time of first 50 frames of bus (speed 0) @ 1500kbps goes from
1min5.4 to 1min4.0, i.e. 2.2% faster overall.
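
A generic sketch of a best_rd breakout, assuming non-negative per-block costs: accumulation stops as soon as the running total can no longer beat the best RD cost found so far. The array of precomputed costs stands in for the expensive per-block evaluation; `accumulate_rd` and the `evaluated` counter are hypothetical and exist only to make the saving visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum per-block RD costs, abandoning the candidate once it cannot beat
 * best_rd. Returns INT64_MAX when abandoned. *evaluated receives the
 * number of blocks that were actually processed. */
static int64_t accumulate_rd(const int64_t *block_cost, size_t n,
                             int64_t best_rd, size_t *evaluated) {
  int64_t total = 0;
  size_t i;

  *evaluated = 0;
  for (i = 0; i < n; ++i) {
    assert(block_cost[i] >= 0); /* costs must not decrease the total */
    total += block_cost[i];
    ++*evaluated;
    if (total >= best_rd) return INT64_MAX; /* early breakout */
  }
  return total;
}
```

The earlier a good candidate is found, the lower `best_rd` is and the sooner later candidates are abandoned.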

Change-Id: I8c32f2aff9a649ce7dd49d910dc5ba16b99c3bc6

Correct spelling mistakes · bc8b0529

Adrian Grange authored 11 years ago

Change-Id: Id4138293efeac4503b2e01ce7a6c150a5abeef77

Moving counts from FRAME_CONTEXT to new struct FRAME_COUNTS. · 1099a436

Dmitry Kovalev authored 11 years ago

Counts are separate from frame context. We have several frame contexts but
need only one copy of all counts.

Change-Id: I5279b0321cb450bbea7049adaa9275306a7cef7d

23 Jul, 2013 - 10 commits

Unify the use of encode_b_args/optimize_block_args · ab77828b

Jingning Han authored 11 years ago

The struct optimize_block_args is defined the same as encode_b_args.
Remove this redundant definition and use encode_b_args consistently.

Change-Id: I1703aeeb3bacf92e98a34f4355202712110173d9

Removing LOW_PRECISION_MV_UPDATE define. · 8d13b0d1

Dmitry Kovalev authored 11 years ago

Change-Id: I78d16ee758e1fae0200b746f00031f6d9c6d6ce7

Rolled-up several for loops into one · 646edbc1

Adrian Grange authored 11 years ago

Several consecutive for loops executed over the same
index range, so I rolled them into one.

Change-Id: I5cfcc8c38c738478965768409cca9d09adf224e1

Removing vp9_is_interpolating_filter array. · db7f5d28

Dmitry Kovalev authored 11 years ago

All filters are interpolating now, so we don't need this array: all
of its values evaluate to true.

Change-Id: I9af6d8219ae0eb984063cd15e4e2296374ae4961

Make xform_quant operations tx_type independent · e9e2fe8e

Jingning Han authored 11 years ago

The xform_quant() module is only used by inter modes, hence removing
the redundant switches therein conditioned on tx_type.

Change-Id: Ib87ce5b2f2e4cbf3ceb133a1108afa173c933a3f

Skip inverse transform when eob is zero · 0359ad7f

Jingning Han authored 11 years ago

When all the transform coefficients were quantized to zero, skip
the inverse transform operation. For bus_cif at 1000 kbps, the
runtime goes from 154967ms -> 149842ms, i.e., about 3% speed-up,
at speed 0.

Change-Id: Ic0a813fff5e28972d4888ee42d8747846a6c3cc6

pack_inter_mode_mvs cleanup · 7bc294a3

Scott LaVarnway authored 11 years ago

xd->mode_info_context is set to m prior to this call.

Change-Id: Ibc442529961750c29ccf0c6cae08cb2b0431415f

clean up bw, bh · 86a9dec7

Jim Bankoski authored 11 years ago

Many structures use bw and bh, and they have different meanings. This CL
starts to clean this up and removes unnecessary two-step operations that
look up a log value and then shift.

Also removed the partition type multiple operation code in bitstream.c.

Change-Id: I7e03e552bdfc0939738e430862e3073d30fdd5db

Renaming of segment constants. · 32042af1

Paul Wilkins authored 11 years ago

Renamed:
  MAX_MB_SEGMENTS to MAX_SEGMENTS
  MB_SEG_TREE_PROBS to SEG_TREE_PROBS

The minimum unit for segmentation in the segment map
is now 8x8 so it is misleading to use MB_ as macro-block
traditionally refers to a 16x16 region.

Change-Id: I0b55a6f0426bb46dd13435fcfa5bae0a30a7fa22

vp9: make some static tables const · 3c8cce35

James Zern authored 11 years ago

Change-Id: I8bcae51271673da8755c66a51aea005dfe6a3739

22 Jul, 2013 - 8 commits

More optimizations for cost_coeffs(). · e20fcd95

Ronald S. Bultje authored 11 years ago

4x4:    163 ->  123 cycles (33% faster)
8x8:    491 ->  399 cycles (23% faster)
16x16: 1889 -> 1763 cycles (7% faster)
32x32: 8311 -> 8180 cycles (1.6% faster)

Overall encoding time of first 50 frames of bus (speed 0) @ 1500kbps
goes from 1min4.33 to 1min3.00, i.e. 2.11% faster.

Change-Id: Ib52d1dbb5649b14de769d3e7a74af67440b5284f

Adding update_tx_counts function. · b2fc6fa9

Dmitry Kovalev authored 11 years ago

Moving common encoder/decoder code to update_tx_counts. Also renaming
vp9_get_pred_probs_tx_size to get_tx_probs2 and adding get_tx_probs to
call vp9_get_pred_context_tx_size inside read_selected_tx_size only once
(twice before).

Change-Id: Ia50247f3893de88ef8e9041b0d44be44a40aaa4d

fix a build error · fc186dca

Yaowu Xu authored 11 years ago

Change-Id: I3b05687f439ff6a7c426d2c97a6c58c831fa51ac

Diamond search change to accelerate movement · a1e2d50b

Deb Mukherjee authored 11 years ago

Optional change in diamond search to continue in the best move
direction until that move turns worse.
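
The behaviour described above can be sketched with a simplified four-neighbor pattern search. This is not the libvpx diamond search: `mv_t`, `cost_fn`, `example_cost` (squared distance to an arbitrary target), and `pattern_search` are hypothetical, and the sketch assumes a cost that eventually stops improving along any direction.

```c
#include <assert.h>

typedef struct { int row, col; } mv_t;
typedef int (*cost_fn)(mv_t mv);

/* Illustrative cost only: squared distance to an arbitrary target (7, -3). */
static int example_cost(mv_t mv) {
  const int dr = mv.row - 7, dc = mv.col + 3;
  return dr * dr + dc * dc;
}

static mv_t pattern_search(mv_t start, cost_fn cost, int max_iters) {
  static const mv_t step[4] = { {-1, 0}, {1, 0}, {0, -1}, {0, 1} };
  mv_t best = start;
  int best_cost = cost(best);
  int iter;

  assert(max_iters >= 0);
  for (iter = 0; iter < max_iters; ++iter) {
    const mv_t center = best;
    int best_dir = -1, i;

    /* Regular step: try the four neighbors of the current center. */
    for (i = 0; i < 4; ++i) {
      const mv_t cand = { center.row + step[i].row, center.col + step[i].col };
      const int c = cost(cand);
      if (c < best_cost) { best_cost = c; best = cand; best_dir = i; }
    }
    if (best_dir < 0) break; /* no neighbor improves: local minimum */

    /* Keep moving in the winning direction until the move turns worse. */
    for (;;) {
      const mv_t cand = { best.row + step[best_dir].row,
                          best.col + step[best_dir].col };
      const int c = cost(cand);
      if (c >= best_cost) break;
      best_cost = c;
      best = cand;
    }
  }
  return best;
}
```

For large displacements the inner loop replaces several full pattern evaluations with one cost check per step, which may be why the change is reported to help most on high-motion content.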

This is still WIP since the exact way the new method is to be used is
under investigation. One option is to make it an option in diamond
search and use it only when motion is large.

Overall slightly positive on derfraw300 (+0.02%) and stdhdraw (+0.13%),
but it works a lot better for high-motion sequences (e.g. football: +1%).

Change-Id: If88e01a6021daa0cda934680cdc70be1ee04f798

Optimize operation flow in sub8x8 rd loop · 409e77f2

Jingning Han authored 11 years ago

Stack the rate-distortion statistics in the sub8x8 rd loop. This allows
the encoder to skip the forward transform, quantization, and coeff cost
estimation, in the sub8x8 rd optimization search, if the motion
vector(s) are of integer pixel value, and have been tested in the
previous prediction filter type rd loops of the same block.

This gives about a 2% speed-up for bus_cif at 2000 kbps, at speed 0.
Its efficacy depends on how frequently the motion search selects an
integer motion vector.

Change-Id: Iee15d4283ad4adea05522c1d40b198b127e6dd97

Re-order mode search in rd. · 1d189d64

Paul Wilkins authored 11 years ago

Mode search order in rd loop changed to better reflect
observed hit counts.

Also some adjustment of the baseline mode rd thresholds
to reflect the order change and observed frequencies.

Change-Id: I47a131cc83e11551df8add6d6d8d413d78d3a63c

fix left over overflow · 2ac8b50c

Jim Bankoski authored 11 years ago

This CL fixes issues rbultje brought up that I somehow neglected when I
submitted yaowu's patch.

Change-Id: I07ad18796317822510b96e951c88d29f194a3c2e

Fix build error. · 888375d2

Paul Wilkins authored 11 years ago

When CONFIG_POSTPROC is set, there was a now-invalid
reference to cm->filter_level.

Changed to cpi->mb.e_mbd.lf.filter_level, in line with
change Iaf5fb71c33719cdfa1b991f671caf071be9ea035.

Change-Id: If746e60044903f7ba8d0d346225b3d015226c7d0

21 Jul, 2013 - 1 commit

Skip buffer update in sub8x8 rd loop · c725502b

Jingning Han authored 11 years ago

This commit allows the encoder to skip a few buffer update steps in
rd_pick_best_mbsegmentation when early breakout has been triggered
in rd_check_segment_txsize. It provides about a 1% speed-up for
bus_cif at 2000 kbps at speed 0.

Change-Id: Ica034f10a24dec572b397d8389a2b81020ebc0b9

20 Jul, 2013 - 2 commits

added checks to prevent rate/distortion overflow · ea284d62

Yaowu Xu authored 11 years ago

At speed 2, due to the threshold scheme used, the rate and distortion
may be assigned the value INT_MAX. This patch adds checks to prevent
the INT_MAX value from being used in further calculation of RD
scores. It also changes the assertion in rd_use_partition() to
mirror the similar assertion in rd_pick_partition().
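
A minimal sketch of such a guard, assuming INT_MAX is used as a sentinel meaning "no valid result". The cost formula and the names `rd_cost` and `add_rate` are hypothetical and are not the encoder's actual RD macros.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Combine rate and distortion into an RD score, propagating the INT_MAX
 * sentinel instead of feeding it into the arithmetic. */
static int64_t rd_cost(int rate, int dist, int rate_mult, int dist_mult) {
  assert(rate >= 0 && dist >= 0);
  if (rate == INT_MAX || dist == INT_MAX) return INT64_MAX;
  return (int64_t)rate * rate_mult + (int64_t)dist * dist_mult;
}

/* Add two partial rates; the sum stays at INT_MAX if either input is the
 * sentinel or if the true sum would overflow. */
static int add_rate(int a, int b) {
  assert(a >= 0 && b >= 0);
  if (a == INT_MAX || b == INT_MAX) return INT_MAX;
  return (a > INT_MAX - b) ? INT_MAX : a + b;
}
```

Without the first check, adding anything to an INT_MAX rate would overflow a signed int, which is undefined behaviour in C.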

Change-Id: Idb52c543cc1e10abdf6e6a5d6e9cb535a42214dc

Removing pre probabilities from FRAME_CONTEXT. · 7e703de7

Dmitry Kovalev authored 11 years ago

Using cm->frame_contexts[cm->frame_context_idx] as source of previous
probabilities.

Change-Id: Ie03778acf0e7bebdc3a1f6a51854d4a0712f24a1
