Commits · ed995afba18ec356fa72772d20d3e2f93635b1e3 · BC / public / external / libvpx

08 Jul, 2013 - 2 commits

Make frame-wide filter-type decision fully RD-based. · ed995afb

Ronald S. Bultje authored 11 years ago

Overall, on all test sets, this gains about +0.2% on all metrics.
City is a clip where this really hurts (-1.0% on all metrics); I'm
not quite sure why yet. It may be interesting to look into in the future.

Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78

ed995afb

Implements several heuristics to prune mode search · d9b62160

Deb Mukherjee authored 11 years ago

Skips mode searches for intra and compound inter modes depending
on the best mode so far and the reference frames. The various
heuristics to be used are selected by bits from a flag. The
previous direction based intra mode search pruning is also absorbed
in this framework.

Specifically the flags and their impact are:

1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
directional modes and TM_PRED if the best so far is
an inter mode)
derfraw300: -0.15%, 10% speedup

2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
mode search if the best so far is not one of the closest
hor/vert/diagonal directions)
derfraw300: -0.05%, about 9% speedup

3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
search if the best so far is an intra mode)
derfraw300: -0.06%, about 7-8% speedup

4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
if the best single ref inter mode does not have the same ref
as one of the two references being tested in the compound mode)
derfraw300: -0.56%, about 10% speedup
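
As a rough illustration of how flag bits like these could gate the compound-mode search, here is a minimal sketch in C. The flag names come from the message above. The bit positions, the `ref_frame_t` enum, and the `skip_compound_mode` helper are assumptions made for illustration and are not taken from the libvpx source.

```c
#include <stdbool.h>

/* Flag names are from the commit message; the bit positions are assumed. */
enum {
  FLAG_SKIP_INTRA_BESTINTER   = 1 << 0,
  FLAG_SKIP_INTRA_DIRMISMATCH = 1 << 1,
  FLAG_SKIP_COMP_BESTINTRA    = 1 << 2,
  FLAG_SKIP_COMP_REFMISMATCH  = 1 << 3
};

/* Hypothetical reference-frame ids. REF_INTRA doubles as
 * "no single-reference inter mode has been found yet". */
typedef enum { REF_INTRA = 0, REF_LAST, REF_GOLDEN, REF_ALTREF } ref_frame_t;

/* Returns true if the compound mode using (ref0, ref1) should be skipped.
 * best_is_intra:   the best mode so far is an intra mode.
 * best_single_ref: reference of the best single-ref inter mode so far. */
static bool skip_compound_mode(unsigned flags, bool best_is_intra,
                               ref_frame_t best_single_ref,
                               ref_frame_t ref0, ref_frame_t ref1) {
  if ((flags & FLAG_SKIP_COMP_BESTINTRA) && best_is_intra) return true;
  if ((flags & FLAG_SKIP_COMP_REFMISMATCH) &&
      best_single_ref != REF_INTRA &&
      best_single_ref != ref0 && best_single_ref != ref1)
    return true;
  return false;
}
```

Because each heuristic is a separate bit, any combination can be enabled by OR-ing flags together, which matches the "selected by bits from a flag" wording above.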

Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495

d9b62160

05 Jul, 2013 - 1 commit
- Merge "Refactor SSE2 8x8 functional units" · a38cf265
  Jingning Han authored 11 years ago
  
  a38cf265
04 Jul, 2013 - 1 commit
- Merge "Fix to comp_inter_joint_search_thresh feature." · ef0ca2de
  Paul Wilkins authored 11 years ago
  
  ef0ca2de
03 Jul, 2013 - 17 commits
- Merge "Adding write_skip_coeff function." · 2ce6b234
  Dmitry Kovalev authored 11 years ago
  
  2ce6b234
- Merge "Enable early termination in rd search" · 68172dbe
  Jingning Han authored 11 years ago
  
  68172dbe
- Merge "Replacing 64 / MI_SIZE with MI_BLOCK_SIZE." · 430bd0c9
  Dmitry Kovalev authored 11 years ago
  
  430bd0c9
- Adding write_skip_coeff function. · dda1835d
  Dmitry Kovalev authored 11 years ago
```
Change-Id: I221126f22ab9067348eb0efb8a73b15a8f49c3fd
```
  dda1835d
- Merge "Inline a few intra predictors" · c86b1889
  Yaowu Xu authored 11 years ago
  
  c86b1889
- Enable early termination in rd search · 2bd6fe08
  Jingning Han authored 11 years ago
```
This commit allows the encoder to track the cumulative rate-distortion
cost per transformed block inside a partition. If the cumulative
rd cost is already above the best rd value, it terminates the remaining
operations and continues to the next prediction mode test.

It reduces the runtime of bus at target bit-rate 2000 from 308 seconds
to 266 seconds, i.e., about 13% speed-up at no performance penalty.

Change-Id: I5f15a3d8955d97031d5653006027866a00654e7a
```
  2bd6fe08
- Replacing 64 / MI_SIZE with MI_BLOCK_SIZE. · 5a21de84
  Dmitry Kovalev authored 11 years ago
```
Change-Id: I32276552b3ea6dc1dce8e298be114cfe1019b31c
```
  5a21de84
- Merge "Adding write_selected_txfm_size function." · 60198a59
  Dmitry Kovalev authored 11 years ago
  
  60198a59
- Inline a few intra predictors · 0f02dc27
  Yaowu Xu authored 11 years ago
```
Change-Id: Ib41f0643fdcc088500e7420708f4e72f1f64c710
```
  0f02dc27
- Refactor SSE2 8x8 functional units · 2cb75c96
  Jingning Han authored 11 years ago
```
These serve as building blocks for SSE2 8x8 and 16x16 ADST/DCT
hybrid transform coding.

Change-Id: I4089a754c66e0c986f67d9b8ec4dfb9627ad430d
```
  2cb75c96
- Merge "Use pmovmskb to skip quantize loops over empty coefficients." · 61fe678f
  Ronald S. Bultje authored 11 years ago
  
  61fe678f
- Merge "Remove unused function vp9_build_inter4x4_predictors_mbuv()." · 98c493a1
  Ronald S. Bultje authored 11 years ago
  
  98c493a1
- Fix to comp_inter_joint_search_thresh feature. · f58b44ad
  Paul Wilkins authored 11 years ago
```
When this is 0 (BLOCK_SIZE_AB4X4) we want to do
the inter joint search for all sizes.

Change-Id: Id40cd6fe7790e7e1165352b9cef5e12fa8c0bc88
```
  f58b44ad
- Added two new skip experiments. · 72c5778e
  Paul Wilkins authored 11 years ago
```
sf->unused_mode_skip_lvl. Tests modes as normal for all
sizes at or below the given level. At larger sizes it skips
all modes that were not chosen at any smaller size.
Hence setting BLOCK_SIZE_SB64X64 is in effect off.
Setting BLOCK_SIZE_AB4X4 means that larger sizes only consider
modes that were chosen for one or more 4x4 blocks.

sf->reference_masking.
Do a test encode of the NONE partition at one size and create
a reference frame mask based on the best rd choice. In the
full search only allow this reference frame.
Currently it is testing 64x64 and repeats this in the full search.
This does not work well with Jim's Partition code just now and
is disabled by default.

Change-Id: I8f8c52d2ef4a0c08100150b0ea4155d1aaab93dd
```
  72c5778e
- Merge "Adjust Speed 0 settings." · b0a2871c
  Paul Wilkins authored 11 years ago
  
  b0a2871c
- Merge "Removing redundant struct from union b_mode_info." · 1f6e95e7
  Dmitry Kovalev authored 11 years ago
  
  1f6e95e7
- Merge "Added a speed feature use_square_partition_only" · 16147d4b
  Yaowu Xu authored 11 years ago
  
  16147d4b
02 Jul, 2013 - 19 commits

Removing redundant struct from union b_mode_info. · be77f6bb
Dmitry Kovalev authored 11 years ago
```
Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83
```
be77f6bb
Adding write_selected_txfm_size function. · edb060a7
Dmitry Kovalev authored 11 years ago
```
Change-Id: I143b430b7c24a964ccd0ebb75944cf317a072214
```
edb060a7

Added a speed feature use_square_partition_only · 0d7b7c09

Yaowu Xu authored 11 years ago

This commit adds a speed feature where only square partitions are
evaluated in partition picking. Enabling this feature at cpu-used 2
reduces encoding time by ~30%.

loss of compression:
-0.9% on cif set
-1.23% on stdhd
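
A minimal sketch of how such a feature might filter partition candidates is shown below. The `partition_t` enum and the `should_evaluate_partition` helper are assumptions for illustration; only the feature name `use_square_partition_only` comes from the message above.

```c
#include <stdbool.h>

/* Assumed partition types: NONE and SPLIT produce square blocks,
 * HORZ and VERT produce rectangular ones. */
typedef enum {
  PARTITION_NONE,
  PARTITION_HORZ,
  PARTITION_VERT,
  PARTITION_SPLIT
} partition_t;

/* Hypothetical helper: returns true if the partition type should be
 * evaluated during partition picking. */
static bool should_evaluate_partition(partition_t p,
                                      bool use_square_partition_only) {
  if (!use_square_partition_only) return true;
  return p == PARTITION_NONE || p == PARTITION_SPLIT;
}
```

With the feature on, the two rectangular candidates are never costed.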

Change-Id: Ia6fad11210f0b78365abb889f9245604513be5b9

0d7b7c09

Use pmovmskb to skip quantize loops over empty coefficients. · e5fb4b61

Ronald S. Bultje authored 11 years ago

If none of the 16 coefficients that we quantize per loop iteration
are larger than the zbin, directly skip to the next round of coeffs,
rather than doing a full quantize loop that will eventually result
in 16 zeroes. This incurs a jump cost, but saves a lot of other work.
32x32 quant goes from 1349 -> 1184 cycles. The same approach yielded
no significantly positive results for smaller transforms, so is not
used there (8x8: 103 -> 101 cycles; 16x16: 302 -> 306 cycles).
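
The idea can be sketched with SSE2 intrinsics, where `_mm_movemask_epi8` is the intrinsic form of pmovmskb. This is not the libvpx assembly: the function names, the simplified quantizer formula, and the scalar inner loop are assumptions for illustration, and the sketch requires an x86 target.

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns nonzero if any of the 16 coefficients has |c| > zbin.
 * Assumes no coefficient equals INT16_MIN (its negation overflows). */
static int any_above_zbin16(const int16_t *coeff, int16_t zbin) {
  const __m128i zero = _mm_setzero_si128();
  const __m128i z = _mm_set1_epi16(zbin);
  __m128i a = _mm_loadu_si128((const __m128i *)coeff);
  __m128i b = _mm_loadu_si128((const __m128i *)(coeff + 8));
  /* SSE2 has no pabsw, so take abs as max(x, -x). */
  a = _mm_max_epi16(a, _mm_sub_epi16(zero, a));
  b = _mm_max_epi16(b, _mm_sub_epi16(zero, b));
  const __m128i gt =
      _mm_or_si128(_mm_cmpgt_epi16(a, z), _mm_cmpgt_epi16(b, z));
  return _mm_movemask_epi8(gt) != 0; /* pmovmskb */
}

/* Simplified quantizer (hypothetical formula): q = (|c| * quant) >> 16
 * when |c| > zbin, else 0. n must be a multiple of 16.
 * Returns the number of 16-coefficient groups that were not skipped. */
static int quantize_skip_empty(const int16_t *coeff, int n, int16_t zbin,
                               uint16_t quant, int16_t *qcoeff) {
  int groups_done = 0;
  for (int i = 0; i < n; i += 16) {
    if (!any_above_zbin16(coeff + i, zbin)) {
      /* Every output in this group would be zero: skip the full loop. */
      memset(qcoeff + i, 0, 16 * sizeof(*qcoeff));
      continue;
    }
    ++groups_done;
    for (int j = i; j < i + 16; ++j) {
      const uint32_t a = (uint32_t)abs(coeff[j]);
      const int q = a > (uint32_t)zbin ? (int)((a * quant) >> 16) : 0;
      qcoeff[j] = (int16_t)(coeff[j] < 0 ? -q : q);
    }
  }
  return groups_done;
}
```

The check costs a branch per group, so it only pays off when many groups are empty. That fits the cycle counts above, where only the 32x32 case shows a clear gain.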

Change-Id: I8fca17dc2543fc8eed1dbcd5100145e3c3a9b647

e5fb4b61

Remove unused function vp9_build_inter4x4_predictors_mbuv(). · 5b872402
Ronald S. Bultje authored 11 years ago
```
Change-Id: Ibfd2def2c088f4bc541a1de25990d73480b53d4b
```
5b872402

new unit test for cpu-speed · b0520b61

Jim Bankoski authored 11 years ago

Tests q0 (lossless), very high bitrate, and low bitrates at cpu speed
0, 1 and 2.

Change-Id: I0c5cdca00acd8d01e7b13f124b3b08d4b1ae9f6d

b0520b61

Speed feature to binary search dir intramodes · 37501d68

Deb Mukherjee authored 11 years ago

This speed feature will skip searching the directional intra prediction
modes D63, D117, D27, D153 if the best intra mode so far is not one of
the diagonal, horizontal or vertical directions closest to the respective
directions being tested. In other words, this implements a sort of
binary search in the angular domain.
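
A minimal sketch of the check described above follows. The enum and function names are hypothetical, and the neighbour pairs are inferred from the angles in the mode names (for example, 63 degrees lies between the 45-degree diagonal and vertical), so treat them as assumptions.

```c
/* Hypothetical intra mode ids for illustration. */
typedef enum {
  MODE_DC, MODE_V, MODE_H, MODE_D45, MODE_D135,
  MODE_D117, MODE_D153, MODE_D27, MODE_D63, MODE_TM
} intra_mode_t;

/* Returns 1 if `mode` should be skipped given the best intra mode so far.
 * Each oblique mode is tested only when the best mode so far is one of
 * its two angular neighbours (assumed pairs). */
static int skip_directional_mode(intra_mode_t mode, intra_mode_t best) {
  switch (mode) {
    case MODE_D63:  return best != MODE_V && best != MODE_D45;
    case MODE_D117: return best != MODE_V && best != MODE_D135;
    case MODE_D153: return best != MODE_H && best != MODE_D135;
    case MODE_D27:  return best != MODE_H && best != MODE_D45;
    default:        return 0; /* all other modes are always searched */
  }
}
```

For this to act like a binary search, the coarse directions (horizontal, vertical, and the two diagonals) have to be evaluated before the four oblique modes.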

Speedup: about 9-10%
Results: -0.05% only on derfraw300.

Change-Id: I413584c41f2a3e8dabfbdeb40718c8fc4b1d63a2

37501d68

Merge "Clean-up in forward update to use mapping tables" · 66324d50
Deb Mukherjee authored 11 years ago

66324d50

Tx size selection enhancements · 8d3d2b76

Deb Mukherjee authored 11 years ago

(1) Refines the modeling function and uses that to add some speed
features. Specifically, instead of using a flag use_largest_txfm as
a speed feature, an enum tx_size_search_method is used, of which
two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
new types are added:
USE_LARGESTINTRA (use largest only for intra)
USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
inter)

(2) Another change is that the framework for deciding transform type
is simplified to use a heuristic count based method rather than
an rd based method using txfm_cache. In practice the new method
is found to work just as well - with derf only -0.01 down.
The new method is more compatible with the new framework where
certain rd costs are based on full rd and certain others are
based on modeled rd or are not computed. In this patch the existing
rd based method is still kept for use in the USE_FULL_RD mode.
In the other modes, the count based method is used.
However, the recommendation is to remove the rd based method eventually,
since its benefit is limited and removing it will eliminate a lot of
complications in the code.

(3) Finally, a bug is fixed in the existing use_largest_txfm speed feature
that caused mismatches when the lossless mode and 4x4 WH transform are
forced.
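
The enum in (1) might be dispatched roughly as in the sketch below. The four enumerator names come from this message. Their order, the `tx_pick_t` type, and the `pick_strategy` helper are assumptions. In particular, using full RD for inter blocks under USE_LARGESTINTRA is inferred from "use largest only for intra".

```c
/* Enumerator names are from the commit message; the order is assumed. */
typedef enum {
  USE_FULL_RD,
  USE_LARGESTINTRA,
  USE_LARGESTINTRA_MODELINTER,
  USE_LARGESTALL
} tx_size_search_method;

/* Hypothetical per-block strategies. */
typedef enum { TX_PICK_FULL_RD, TX_PICK_LARGEST, TX_PICK_MODEL } tx_pick_t;

/* Hypothetical helper: chooses how the transform size is selected
 * for a block, given the speed-feature setting and the block type. */
static tx_pick_t pick_strategy(tx_size_search_method m, int is_inter) {
  switch (m) {
    case USE_FULL_RD:
      return TX_PICK_FULL_RD;
    case USE_LARGESTINTRA:
      return is_inter ? TX_PICK_FULL_RD : TX_PICK_LARGEST;
    case USE_LARGESTINTRA_MODELINTER:
      return is_inter ? TX_PICK_MODEL : TX_PICK_LARGEST;
    case USE_LARGESTALL:
      return TX_PICK_LARGEST;
  }
  return TX_PICK_FULL_RD; /* unreachable for valid input */
}
```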

Results on derf:
USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
pretty good compromise)
USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
(currently the benefit of modeling is limited for txfm size selection,
but keeping this enum as a placeholder).
USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
use_largest_txfm speed feature).

Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936

8d3d2b76

Clean-up in forward update to use mapping tables · 9c20cedd

Deb Mukherjee authored 11 years ago

Uses mapping tables instead of complicated modulo/division
operations for prob mapping for forward updates.
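
The general technique can be sketched as follows. The mapping here (a nibble swap over 0..255) is a made-up stand-in for the real prob mapping, and all names are hypothetical. It only shows how a modulo/division computation can be replaced by a table lookup with identical output.

```c
#include <stdint.h>

enum { MAP_SIZE = 256, MODULUS = 16 }; /* hypothetical sizes */

/* Hypothetical mapping computed with modulo and division. */
static int slow_map(int v) {
  return (v % MODULUS) * MODULUS + v / MODULUS;
}

static uint8_t map_table[MAP_SIZE];

/* Fills the table once. A production version would more likely ship
 * the values as a static const array. */
static void init_map_table(void) {
  for (int v = 0; v < MAP_SIZE; ++v) map_table[v] = (uint8_t)slow_map(v);
}

/* Table-driven replacement: same result, no division at lookup time. */
static int fast_map(int v) {
  return map_table[v];
}
```

Since the table is generated from the original formula, the lookup is bit-exact with it, which is consistent with the "no bit-stream or output change" note.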

No bit-stream or output change.

Change-Id: Ifd9ce8ac1437835c305c94f64c18273c7a68f546

9c20cedd

Merge "Removing unused implicit segmentation code." · 904070ca
Dmitry Kovalev authored 11 years ago

904070ca
Merge "Make get_coef_context() branchless." · 3cc6eb7c
Ronald S. Bultje authored 11 years ago

3cc6eb7c
Merge "Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h." · 3140c443
Dmitry Kovalev authored 11 years ago

3140c443
Merge "Additional vp9_decodemv.c cleanup." · 18fd4360
Dmitry Kovalev authored 11 years ago

18fd4360
Removing unused implicit segmentation code. · a3d2e6c9
Dmitry Kovalev authored 11 years ago
```
Change-Id: I8a2983fb14274a6ac53681fa4cd5d4209cbd2905
```
a3d2e6c9
Merge "Add speed feature to disable splitmv" · f4bee75c
Yunqing Wang authored 11 years ago

f4bee75c

Add speed feature to disable splitmv · b12e060b

Yunqing Wang authored 11 years ago

Added a speed feature in speed 1 to disable splitmv for HD (>=720)
clips. Test result on stdhd set: 0.3% psnr loss and 0.07% ssim
loss. Encoding speedup is 36%.

(For reference: The test result on derf set showed 2% psnr loss
and 1.6% ssim loss. Encoding speedup is 34%. SPLITMV should be
enabled for small resolution videos.)

Change-Id: I54f72b94f506c6d404b47c42e71acaa5374d6ee6

b12e060b

Calculate rd cost per transformed block · b91a1586

Jingning Han authored 11 years ago

Compute the rate-distortion cost per transformed block, and accumulate
the cost over all blocks inside a partition. This allows the encoder
to detect if the cumulative rd cost is already above the best rd cost,
thereby enabling early termination in the rate-distortion optimization
search.
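
A minimal sketch of the accumulate-and-bail-out pattern is given below. The `block_cost_t` struct, the `rd_cost` formula (distortion plus lambda-weighted rate), and the function names are assumptions for illustration. In the encoder the per-block rate and distortion would be produced by the transform and quantization steps rather than passed in precomputed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-transform-block costs (both non-negative). */
typedef struct {
  int64_t rate;
  int64_t dist;
} block_cost_t;

/* Hypothetical RD cost: distortion plus lambda-weighted rate. */
static int64_t rd_cost(int64_t lambda, int64_t rate, int64_t dist) {
  return dist + lambda * rate;
}

/* Accumulates rate and distortion over the blocks of a partition.
 * Returns INT64_MAX as soon as the running cost exceeds best_rd,
 * otherwise the total cost. *visited receives the number of blocks
 * that were actually processed. */
static int64_t accumulate_rd(const block_cost_t *blocks, size_t n,
                             int64_t lambda, int64_t best_rd,
                             size_t *visited) {
  int64_t rate = 0, dist = 0;
  for (size_t i = 0; i < n; ++i) {
    rate += blocks[i].rate;
    dist += blocks[i].dist;
    if (rd_cost(lambda, rate, dist) > best_rd) {
      *visited = i + 1;
      return INT64_MAX; /* this mode cannot beat the best so far */
    }
  }
  *visited = n;
  return rd_cost(lambda, rate, dist);
}
```

The early exit is safe because per-block costs are non-negative, so the running total never decreases once it has passed the best value.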

Change-Id: I0a856367a9a7b6dd0b466e7b767f54d5018d09ac

b91a1586

Merge "Update quantize SSSE3 SIMD to cover 32x32 transform case also." · 9df24b41
Ronald S. Bultje authored 11 years ago

9df24b41