Commits · 83c7e13a6bcd1535d9547ef3c89816bf993b458b · BC / public / external / libvpx

17 Jul, 2013 - 5 commits

Do a skip-block check for sub8x8 partitions also. · 83c7e13a
Ronald S. Bultje authored 11 years ago
```
+0.2% SSIM and glbPSNR on derfraw300.

Change-Id: I9cba0bca55e606a22f557c7732b064f738efe84d
```

Speed up motion estimation using small partitions' result (experiment) · df90d58f

Yunqing Wang authored 11 years ago

Partition checking currently starts from small sizes and then goes up
to large sizes. This experiment uses the small partitions' motion
estimation result, which is already available, to speed up the
large partition's motion estimation. We can decide to skip some
partition checks if they are unlikely choices. We could use the
motion vector (MV) result as the current partition's prediction MV, and
limit the search range and reference frame.

Current result at speed 1:
psnr loss: 1.19% for stdhd, 0.287% for derf.
speed gain: 14% for sunflower(hd), 11% for akiyo.

Further improvement will be done later.

Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab


Move uv intra mode selection in rd loop. · 2ee338ce

Paul Wilkins authored 11 years ago

Use an estimate based on DC_PRED for intra uv cost
within the rd loop then only do a full uv mode analysis
if an intra mode is chosen.

Significant speed gains in some cases. Currently only
enabled for speed 2 pending speed/quality tests.

Change-Id: Ie851a12400d5483bce47ec0e3ccb8516041e91c0


Limit transform sizes searched for uv intra. · 6c667f0f

Paul Wilkins authored 11 years ago

Apply limit if search_method == USE_LARGESTALL
to the range of UV tx sizes searched.

Change-Id: I6db29f0dd237285ffc50d75a37e8b68151ad821c


Skip redundant motion search in 4x4 level rd loop · a142d6fc

Jingning Han authored 11 years ago

This commit makes the encoder perform motion search only once
per reference frame type for each 4x4/4x8/8x4 block. For bus_cif
at 2000 kbps, the runtime goes from 253812ms -> 217817ms
(14% speed-up) for speed 0.

Change-Id: I5f17599ccc8cfaf93ccb4f98fcb6008af6d79e92


16 Jul, 2013 - 2 commits
- Minor cleanup in code to find uv tx_size. · 30d2ea45
  Paul Wilkins authored 11 years ago
```
Change-Id: I94b97a966b5efbc9a243048f1f5ddbbdc4b1846e
```
- Removing and moving around constant definitions. · ca75f125
  Dmitry Kovalev authored 11 years ago
```
Removing unused and duplicated constants, moving them from *.h to *.c
if possible.

Change-Id: Ief4d6b984a3ca2e9b38504f0d855ed072cf7133f
```
15 Jul, 2013 - 1 commit

Skip duplicate block encoding in the rd loop · faff6ed0

Jingning Han authored 11 years ago

This speed feature allows the encoder to largely remove the spatial
dependency between blocks inside a 64x64 superblock, thereby removing
the need to repeatedly encode superblocks per partition type in the
rate-distortion optimization loop.

A major challenge lies in the intra modes tested in the rate-distortion
optimization loop. The subsequent blocks do not have access to the
reconstructed boundary pixels without the intermediate coding steps.
This was resolved by using the original pixels for intra prediction
in the rd loop, followed by an appropriately designed distortion
modeling on the quantization parameters. Experiments also suggested
that the performance impact is more discernible at lower bit-rate/psnr
settings. Hence a quantizer dependent threshold is applied to deactivate
skip of block coding.

For bus_cif at 2000 kbps,
speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
         performance loss.

speed 1: runtime 65312ms  -> 61536ms, (7...


12 Jul, 2013 - 3 commits

Fix a build issue · fb754b18
Yaowu Xu authored 11 years ago
```
Change-Id: I23a75c495ed7ea917d7f312bef0990e20a6b53d9
```

Some minor cleanups for efficiency · 94c481f9

Deb Mukherjee authored 11 years ago

Implements some of the helper functions more efficiently with
lookups rather than branches. The modeling function is consolidated
to reduce some computations.
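
As an illustration of the lookup-versus-branch idea, here is a minimal sketch. The enum `BlkSize` and both helper names are hypothetical and much smaller than the real libvpx block-size enum; the sketch only shows the general pattern of replacing a `switch` with a constant table.

```c
#include <assert.h>

/* Hypothetical block-size enum, not the libvpx definition. */
typedef enum { BLK_4X4, BLK_8X8, BLK_16X16, BLK_32X32, BLK_64X64, BLK_COUNT } BlkSize;

/* Branch-based helper: one case per block size. */
static int width_log2_branchy(BlkSize b) {
  switch (b) {
    case BLK_4X4:   return 2;
    case BLK_8X8:   return 3;
    case BLK_16X16: return 4;
    case BLK_32X32: return 5;
    default:        return 6;
  }
}

/* Lookup-based helper: a single indexed load, no data-dependent branches. */
static int width_log2_lookup(BlkSize b) {
  static const int kWidthLog2[BLK_COUNT] = { 2, 3, 4, 5, 6 };
  assert((int)b >= 0 && (int)b < BLK_COUNT);
  return kWidthLog2[b];
}
```

Both helpers return the same values; the table version simply trades the branch chain for a small constant array.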

Also merged the two enums BLOCK_SIZE_TYPES and BlockSize into
one because there is no need to keep them separate (even though
the semantics are a little different).

No bitstream or output change.

About 0.5% speedup

Change-Id: I7d71a66e8031ddb340744dc493f22976052b8f9f


Remove unused function block_error(). · ee09dd99
Ronald S. Bultje authored 11 years ago
```
Change-Id: I78a79fc51c2d7cc3c261f35b569155397f3dc0c4
```

11 Jul, 2013 - 2 commits

Calling is_inter_mode() instead of custom code. · 8c05e590
Dmitry Kovalev authored 11 years ago
```
Change-Id: Iccd4ab95ea51a6d57ed43947f2fd7ad92e8979cf
```

Moving segmentation related vars into separate struct. · c4ad3273

Dmitry Kovalev authored 11 years ago

Adding segmentation struct to vp9_seg_common.h. Struct members are from
macroblockd and VP9Common structs. Moving segmentation related constants
and enums to vp9_seg_common.h.

Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03


10 Jul, 2013 - 7 commits

Fix tx_type bug in intra4x4 rd loop · 18803f9c

Jingning Han authored 11 years ago

This commit fixes the misuse of the tx_type for the inverse transform
in the intra4x4 rate-distortion optimization loop. It improves the
overall coding performance.

Change-Id: I7fe9953175b74890357dbcee33c138573766e980


remove warnings when NDEBUG is set · 6591cf2f
Jim Bankoski authored 11 years ago
```
Change-Id: Ie0cb732fdcb98616a422c4463bff80642248d136
```

Prunes out full-rd computation based on modeled rd · 53ff43ad

Deb Mukherjee authored 11 years ago

Adds a speed feature to eliminate full-rd computation if the modeled
rd or rd based on a different parameter in the same mode is already
a lot larger than the best rd yet.

Specifically, only search the sharp and smooth filters if the modeled
rd cost based on the regular filter is within a certain factor of the
best rd cost so far. Also, skip full-rd computation of non splitmv
inter modes unless the modeled rd cost based on pred error is within the
same factor of the best rd cost so far.
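
The pruning test described above can be sketched roughly as follows. The function name, the integer `factor` parameter, and the use of division to avoid overflow are all assumptions for illustration; the commit message does not state the actual factor or how the comparison is coded.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the full-rd computation is still worth running, i.e. the
 * modeled rd cost is within `factor` times the best rd cost so far.
 * Dividing the modeled cost (rather than multiplying best_rd) avoids
 * overflow when best_rd is still at its initial INT64_MAX value. */
static int full_rd_worthwhile(int64_t modeled_rd, int64_t best_rd, int factor) {
  assert(factor > 0 && modeled_rd >= 0);
  return modeled_rd / factor <= best_rd;
}
```

With a hypothetical factor of 2, a modeled cost of 150 against a best cost of 100 would still get the full-rd pass, while a modeled cost of 250 would be pruned.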

Also adds some enhancements in the rd search for splitmv mode to
speed things up by early breakouts. Negligible impact on performance.

Results on derfraw300:
psnr:    -0.013% with the splitmv enhancements, -0.24% with the rd
         breakout feature on.
speedup: 6% with splitmv enhancements, 20% with also residual breakout
         (tested on football sequence at 600 Kbps)

Change-Id: I37abc308ea9f110c1679ce649b6a7e73ab1ad5fc


Remove memcpy() in handle_inter_mode() filter selection. · b1df674a

Ronald S. Bultje authored 11 years ago

Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.

Change-Id: I9b25e87974430cb942caa276410bb2eda815bd83


Add a feature to reduce chroma intra mode search · bed27a96
Yaowu Xu authored 11 years ago
```
Change-Id: I721ebdeef2b53ce3e5c3eba3f7462ae2103c95a8
```

removing case statements around prediction entropy coding · fb027a76

Jim Bankoski authored 11 years ago

Removes SEG_ID
Removes MBSKIP
Removes SWITCHABLE_INTERP
Removes INTRA_INTER
Removes COMP_INTER_INTER
Removes COMP_REF_P
Removes SINGLE_REF_P1
Removes SINGLE_REF_P2
Removes TX_SIZE

Change-Id: Ie4520ae1f65c8cac312432c0616cc80dea5bf34b


Revert "Remove memcpy() in handle_inter_mode() filter selection." · 205efbc1
Yaowu Xu authored 11 years ago
```
This reverts commit fcf7998a.

Change-Id: Ic6532223faec9f1483b78adb2e37b79c7b1a0efb
```

09 Jul, 2013 - 1 commit
- Unbreak lossless. · 059c0ba5
  Ronald S. Bultje authored 11 years ago
```
Change-Id: I8130ec9b5371c65e885f245a5ac73840c23cb4a1
```
08 Jul, 2013 - 4 commits

Don't recalculate mv_ref costs for each block/partition. · 8fde07a3

Ronald S. Bultje authored 11 years ago

Changes cost_mv_ref() to do a lookup (LUT) into pre-calculated cost
arrays instead. Encode time of first 50 frames of bus (speed 0)
@ 1500kbps goes from 2min11.6 to 2min10.9, i.e. 0.5% faster overall.

Change-Id: If186e92c34c201b29cbbc058785a15c9c09e433a


Remove memcpy() in handle_inter_mode() filter selection. · fcf7998a

Ronald S. Bultje authored 11 years ago

Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from
2min4.9 to 2min3.1, i.e. a 1.4% speedup overall.

Change-Id: Ibe8b08d159797504c5d0c5122de1b6da3b6595e0


Make frame-wide filter-type decision fully RD-based. · ed995afb

Ronald S. Bultje authored 11 years ago

Overall, on all test sets, this gains about +0.2% on all metrics.
City is a clip where this really hurts (-1.0% on all metrics); I'm
not quite sure why yet. Maybe interesting to look into in the future.

Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78


Implements several heuristics to prune mode search · d9b62160

Deb Mukherjee authored 11 years ago

Skips mode searches for intra and compound inter modes depending
on the best mode so far and the reference frames. The various
heuristics to be used are selected by bits from a flag. The
previous direction based intra mode search pruning is also absorbed
in this framework.

Specifically the flags and their impact are:

1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
directional modes and TM_PRED if the best so far is
an inter mode)
derfraw300: -0.15%, 10% speedup

2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
mode search if the best so far is not one of the closest
hor/vert/diagonal directions)
derfraw300: -0.05%, about 9% speedup

3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
search if the best so far is an intra mode)
derfraw300: -0.06%, about 7-8% speedup

4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
if the best single ref inter mode does not have the same ref
as one of the two references being tested in the compound mode)
derfraw300: -0.56%, about 10% speedup
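
A minimal sketch of how such flag bits might gate the mode search. The flag names are taken from the list above, but the bit values, the `SearchState` struct, `BestKind`, and both function names are hypothetical. The sketch also assumes the ref-mismatch rule only applies once a best single-ref inter mode exists, which the commit message does not specify.

```c
#include <assert.h>
#include <stddef.h>

/* Flag names come from the commit message; the bit values are assumed. */
enum {
  FLAG_SKIP_INTRA_BESTINTER   = 1 << 0,
  FLAG_SKIP_INTRA_DIRMISMATCH = 1 << 1,
  FLAG_SKIP_COMP_BESTINTRA    = 1 << 2,
  FLAG_SKIP_COMP_REFMISMATCH  = 1 << 3
};

/* Hypothetical summary of the search state so far. */
typedef enum { BEST_NONE, BEST_INTRA, BEST_INTER } BestKind;

typedef struct {
  BestKind best_kind;   /* kind of the best mode found so far */
  int best_single_ref;  /* ref of the best single-ref inter mode, -1 if none */
} SearchState;

/* Returns 1 if a compound mode using (ref0, ref1) should be skipped. */
static int skip_compound_mode(unsigned flags, const SearchState *s,
                              int ref0, int ref1) {
  assert(s != NULL);
  if ((flags & FLAG_SKIP_COMP_BESTINTRA) && s->best_kind == BEST_INTRA)
    return 1;
  /* Assumption: only apply the mismatch rule once a single-ref best exists. */
  if ((flags & FLAG_SKIP_COMP_REFMISMATCH) && s->best_single_ref >= 0 &&
      s->best_single_ref != ref0 && s->best_single_ref != ref1)
    return 1;
  return 0;
}

/* Returns 1 if an oblique directional intra mode or TM_PRED should be skipped. */
static int skip_oblique_intra_mode(unsigned flags, const SearchState *s) {
  assert(s != NULL);
  return (flags & FLAG_SKIP_INTRA_BESTINTER) && s->best_kind == BEST_INTER;
}
```

Keeping each heuristic behind its own bit lets a speed setting combine them freely, for example `FLAG_SKIP_COMP_BESTINTRA | FLAG_SKIP_INTRA_BESTINTER`.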

Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495


04 Jul, 2013 - 1 commit

Refactoring setup_pre_planes function. · f72e0725

Dmitry Kovalev authored 11 years ago

Removing set_refs, adding set_ref function.

Change-Id: I5635c478b106ae4e57d317f1c83d929644307e63


03 Jul, 2013 - 3 commits

Enable early termination in rd search · 2bd6fe08

Jingning Han authored 11 years ago

This commit allows the encoder to track the cumulative rate-distortion
cost per transformed block inside a partition. If the cumulative
rd cost is already above the best rd value, it terminates the remaining
operations and continues to the next prediction mode test.

It reduces the runtime of bus at target bit-rate 2000 from 308 seconds
to 266 seconds, i.e., about 13% speed-up at no performance penalty.
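
The early-termination loop can be sketched as below. The function name and signature are hypothetical, and the per-block costs are passed in as a ready-made array purely to keep the sketch short; in an encoder each cost would be computed inside the loop, which is where the saving comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulates per-transform-block rd costs and stops as soon as the running
 * total exceeds best_rd. Returns the total, or INT64_MAX if the search was
 * abandoned. *blocks_visited reports how many blocks were actually examined. */
static int64_t accumulate_rd_cost(const int64_t *block_rd, size_t num_blocks,
                                  int64_t best_rd, size_t *blocks_visited) {
  int64_t total = 0;
  size_t i;
  assert(block_rd != NULL && blocks_visited != NULL);
  for (i = 0; i < num_blocks; ++i) {
    total += block_rd[i];
    if (total > best_rd) {  /* already worse than the best: give up early */
      *blocks_visited = i + 1;
      return INT64_MAX;
    }
  }
  *blocks_visited = num_blocks;
  return total;
}
```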

Change-Id: I5f15a3d8955d97031d5653006027866a00654e7a


Fix to comp_inter_joint_search_thresh feature. · f58b44ad

Paul Wilkins authored 11 years ago

When this is 0 (BLOCK_SIZE_AB4X4) we want to do
the inter joint search for all sizes.

Change-Id: Id40cd6fe7790e7e1165352b9cef5e12fa8c0bc88


Added two new skip experiments. · 72c5778e

Paul Wilkins authored 11 years ago

sf->unused_mode_skip_lvl. Tests modes as normal for all
sizes at or below the given level. At larger sizes it skips
all modes that were not chosen at any smaller size.
Hence setting BLOCK_SIZE_SB64X64 is in effect off.
Setting BLOCK_SIZE_AB4X4 will only consider modes that
were chosen for one or more 4x4 blocks at larger sizes.
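
One way to picture the unused_mode_skip_lvl behaviour is a bitmask of modes that have been chosen so far. This is only a sketch: the struct, the function names, and the integer encoding of block sizes (larger value means larger block) are assumptions, and it relies on block sizes being searched from small to large so that the mask only reflects smaller sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state: one bit per prediction mode chosen at a smaller size. */
typedef struct {
  uint32_t modes_chosen;
} ModeSkipState;

/* Record the mode that won the search for some block. */
static void note_chosen_mode(ModeSkipState *s, int mode) {
  assert(s != NULL && mode >= 0 && mode < 32);
  s->modes_chosen |= (uint32_t)1 << mode;
}

/* Returns 1 if `mode` should be skipped for a block of size `block_size`.
 * Sizes at or below `level` test every mode; larger sizes only test modes
 * that were chosen at least once before. */
static int skip_mode(const ModeSkipState *s, int block_size, int level, int mode) {
  assert(s != NULL && mode >= 0 && mode < 32);
  if (block_size <= level) return 0;
  return (s->modes_chosen & ((uint32_t)1 << mode)) == 0;
}
```

Setting `level` to the largest block size makes the first branch always taken, which matches the statement above that BLOCK_SIZE_SB64X64 effectively turns the feature off.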

sf->reference_masking.
Do a test encode of the NONE partition at one size and create
a reference frame mask based on the best rd choice. In the
full search only allow this reference frame.
Currently it is testing 64x64 and repeats this in the full search.
This does not work well with Jim's Partition code just now and
is disabled by default.

Change-Id: I8f8c52d2ef4a0c08100150b0ea4155d1aaab93dd


02 Jul, 2013 - 5 commits

Removing redundant struct from union b_mode_info. · be77f6bb
Dmitry Kovalev authored 11 years ago
```
Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83
```

Speed feature to binary search dir intramodes · 37501d68

Deb Mukherjee authored 11 years ago

This speed feature will skip searching the directional intra prediction
modes D63, D117, D27, D153 if the best intra mode so far is not one of
the diagonal, horizontal or vertical directions closest to the respective
directions being tested. In other words, this implements a sort of
binary search in the angular domain.
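
A rough sketch of the angular pruning rule. The enum is hypothetical (names follow the modes mentioned above, plus assumed `D45_PRED` and `D135_PRED` for the diagonals), and the neighbour pairs are inferred from the angle in each mode name rather than taken from the commit.

```c
#include <assert.h>

/* Hypothetical intra mode enum; the ordering is assumed. */
typedef enum {
  DC_PRED, V_PRED, H_PRED, D45_PRED, D135_PRED,
  D117_PRED, D153_PRED, D27_PRED, D63_PRED, TM_PRED, MODE_COUNT
} IntraMode;

/* Returns 1 if directional mode `mode` should be skipped because the best
 * intra mode so far is not one of its two assumed angular neighbours. */
static int skip_directional_mode(IntraMode mode, IntraMode best) {
  assert((int)mode < MODE_COUNT && (int)best < MODE_COUNT);
  switch (mode) {
    case D63_PRED:  return best != V_PRED && best != D45_PRED;
    case D117_PRED: return best != V_PRED && best != D135_PRED;
    case D153_PRED: return best != H_PRED && best != D135_PRED;
    case D27_PRED:  return best != H_PRED && best != D45_PRED;
    default:        return 0;  /* non-oblique modes are always searched */
  }
}
```

The coarse directions are searched first, and an in-between angle is only tried when one of its neighbours is currently winning, which is what makes this resemble a binary search over angle.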

Speedup: about 9-10%
Results: -0.05% only on derfraw300.

Change-Id: I413584c41f2a3e8dabfbdeb40718c8fc4b1d63a2


Tx size selection enhancements · 8d3d2b76

Deb Mukherjee authored 11 years ago

(1) Refines the modeling function and uses that to add some speed
features. Specifically, instead of using a flag use_largest_txfm as
a speed feature, an enum tx_size_search_method is used, of which
two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
new types are added:
USE_LARGESTINTRA (use largest only for intra)
USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
inter)

(2) Another change is that the framework for deciding transform type
is simplified to use a heuristic count based method rather than
an rd based method using txfm_cache. In practice the new method
is found to work just as well - with derf only -0.01 down.
The new method is more compatible with the new framework where
certain rd costs are based on full rd and certain others are
based on modeled rd or are not computed. In this patch the existing
rd based method is still kept for use in the USE_FULL_RD mode.
In the other modes, the count based method is used.
However the recommendation is to remove it eventually since the
benefit is limited, and removing it will eliminate a lot of
complications in the code.

(3) Finally a bug is fixed with the existing use_largest_txfm speed feature
that causes mismatches when the lossless mode and 4x4 WH transform are
forced.

Results on derf:
USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
pretty good compromise)
USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
(currently the benefit of modeling is limited for txfm size selection,
but keeping this enum as a placeholder).
USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
use_largest_txfm speed feature).
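
The four search methods listed above could be dispatched along these lines. The enum value names come from the commit message; their order, the `TxPick` type, and the function are hypothetical. The sketch also assumes that USE_LARGESTINTRA falls back to full RD for inter blocks, which is implied by "largest only for intra" but not stated outright.

```c
#include <assert.h>

/* Value names from the commit message; the ordering is assumed. */
typedef enum {
  USE_FULL_RD,
  USE_LARGESTINTRA,
  USE_LARGESTINTRA_MODELINTER,
  USE_LARGESTALL,
  TX_SEARCH_METHOD_COUNT
} TxSizeSearchMethod;

/* Hypothetical description of how the transform size gets picked. */
typedef enum { TX_PICK_FULL_RD, TX_PICK_MODEL, TX_PICK_LARGEST } TxPick;

static TxPick tx_pick_for(TxSizeSearchMethod method, int is_inter) {
  assert((int)method >= 0 && (int)method < TX_SEARCH_METHOD_COUNT);
  switch (method) {
    case USE_LARGESTALL:
      return TX_PICK_LARGEST;
    case USE_LARGESTINTRA:
      return is_inter ? TX_PICK_FULL_RD : TX_PICK_LARGEST;
    case USE_LARGESTINTRA_MODELINTER:
      return is_inter ? TX_PICK_MODEL : TX_PICK_LARGEST;
    default:
      return TX_PICK_FULL_RD;
  }
}
```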

Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936


Calculate rd cost per transformed block · b91a1586

Jingning Han authored 11 years ago

Compute the rate-distortion cost per transformed block, and accumulate
the cost through all blocks inside a partition. This allows encoder
to detect if the cumulative rd cost is already above the best rd cost,
thereby enabling early termination in the rate-distortion optimization
search.

Change-Id: I0a856367a9a7b6dd0b466e7b767f54d5018d09ac


Revert "New motion threshold factor - speed feature." · b7cd01ed

Paul Wilkins authored 11 years ago

This reverts commit 13772781.
Also fixes a spelling mistake.

Change-Id: I5be8aa4d8d3c0323d4a6f41968a7b2c048949c3f


01 Jul, 2013 - 3 commits

Make get_coef_context() branchless. · 26b6318d

Ronald S. Bultje authored 11 years ago

This should significantly speed up cost_coeffs(). Basically what the
patch does is to make the neighbour arrays padded by one item to
prevent an eob check in get_coef_context(), then it populates each
col/row scan and left/top edge coefficient with two times the same
neighbour - this prevents a single/double context branch in
get_coef_context(). Lastly, it populates neighbour arrays in pixel
order (rather than scan order), so we don't have to dereference the
scantable to get the correct neighbours.
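
The three tricks above (a padded table, duplicated neighbours on edges, and pixel-order indices) can be illustrated with a small sketch. Everything here is a simplification under stated assumptions: the 4x4 size, the names, and the rounded-average formula are not taken from the commit, and the table is indexed by raster position rather than scan position to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

#define BLK_N 4          /* hypothetical 4x4 transform block */
#define MAX_NEIGHBORS 2

/* Two neighbour indices per position, plus one padded entry at the end so
 * that looking up the position just past the last one needs no bounds check. */
static int16_t neighbors[MAX_NEIGHBORS * (BLK_N * BLK_N + 1)];

static void init_neighbors(void) {
  int pos;
  for (pos = 0; pos < BLK_N * BLK_N; ++pos) {
    const int r = pos / BLK_N, c = pos % BLK_N;
    int a, b;
    if (r > 0 && c > 0) { a = pos - BLK_N; b = pos - 1; } /* above and left */
    else if (r > 0)     { a = b = pos - BLK_N; }          /* left edge: above twice */
    else if (c > 0)     { a = b = pos - 1; }              /* top edge: left twice */
    else                { a = b = 0; }                    /* first position: unused */
    neighbors[MAX_NEIGHBORS * pos + 0] = (int16_t)a;
    neighbors[MAX_NEIGHBORS * pos + 1] = (int16_t)b;
  }
  neighbors[MAX_NEIGHBORS * BLK_N * BLK_N + 0] = 0;       /* padding entry */
  neighbors[MAX_NEIGHBORS * BLK_N * BLK_N + 1] = 0;
}

/* Branchless context: rounded average of two neighbour tokens. When both
 * indices point at the same neighbour, (1 + 2t) >> 1 == t, so edge positions
 * need no special case. */
static int coef_context_sketch(const uint8_t *token_cache, int pos) {
  assert(pos >= 0 && pos <= BLK_N * BLK_N);
  return (1 + token_cache[neighbors[MAX_NEIGHBORS * pos + 0]] +
              token_cache[neighbors[MAX_NEIGHBORS * pos + 1]]) >> 1;
}
```

Because the neighbour entries are raster positions, `token_cache` can be indexed directly without going through a scan table first.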

Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.

Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56


Quantize (64-bit only, for now) SSSE3 SIMD. · 7353ceab

Ronald S. Bultje authored 11 years ago

Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes from 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only, it needs some minor modifications to be 32bit compatible,
because it uses 15 xmm registers, whereas 32bit only has 8.

Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904


New motion threshold factor - speed feature. · 13772781

Paul Wilkins authored 11 years ago

Added a speed feature that focuses only on thresholds
for new motion modes.

Moved sf->comp_inter_joint_search_thresh into speed
1. This has a ~+0.4% impact on quality at speed 0, which
is our quality reference baseline.

Slight adjustment to baseline thresholds.

Change-Id: I7ebf104f1fe29af77ed4837b2e84be065621bbe5


29 Jun, 2013 - 1 commit
- fixed a bug where sse is not populated · f853e662
  Yaowu Xu authored 11 years ago
```
Change-Id: I692d800af1f976c84a76f8bd66864c4b39540abc
```
28 Jun, 2013 - 2 commits

Inline vp9_get_coef_context() (and remove vp9_ prefix). · d00b8e5f

Ronald S. Bultje authored 11 years ago

Makes cost_coeffs() a lot faster:
4x4: 236 -> 181 cycles
8x8: 888 -> 588 cycles
16x16: 3550 -> 2483 cycles
32x32: 17392 -> 12010 cycles

Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.

Change-Id: I16b8d595946393c8dc661599550b3f37f5718896


Minor change to prevent one level of dereference in cost_coeffs(). · e3ce2b2a

Ronald S. Bultje authored 11 years ago

4x4: 234 -> 236 cycles
8x8: 878 -> 888 cycles
16x16: 3664 -> 3550 cycles
32x32: 18134 -> 17392 cycles

Change-Id: I37a51bfbb0060a3a54f09c6045c14a989811ed78
