Commits · 37501d687c509c348bc2bdbe75aa050e2ab0437c · BC / public / external / libvpx

02 Jul, 2013 - 9 commits

Speed feature to binary search dir intramodes · 37501d68

Deb Mukherjee authored 11 years ago

This speed feature will skip searching the directional intra prediction
modes D63, D117, D27, D153 if the best intra mode so far is not one of
the diagonal, horizontal or vertical directions closest to the respective
directions being tested. In other words, this implements a sort of
binary search in the angular domain.
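
The skip rule described above can be sketched as follows. This is a minimal illustration, not the code from the patch: the enum, the function name `skip_directional_mode`, and the neighbour pairs (chosen from the angle each mode name suggests, with H treated as the horizontal neighbour of both D27 and D153) are assumptions. It also assumes the coarse modes have already been searched before the finer ones are considered.

```c
#include <stdbool.h>

/* Hypothetical mode enum for illustration; the encoder's own enum may differ. */
typedef enum {
  DC_PRED, V_PRED, H_PRED, D45_PRED, D135_PRED,
  D117_PRED, D153_PRED, D27_PRED, D63_PRED, TM_PRED
} intra_mode;

/* Returns true when a fine directional mode can be skipped because the best
 * intra mode found so far is not one of its two assumed angular neighbours. */
bool skip_directional_mode(intra_mode mode, intra_mode best_so_far) {
  switch (mode) {
    case D63_PRED:  return best_so_far != V_PRED && best_so_far != D45_PRED;
    case D117_PRED: return best_so_far != V_PRED && best_so_far != D135_PRED;
    case D27_PRED:  return best_so_far != H_PRED && best_so_far != D45_PRED;
    case D153_PRED: return best_so_far != H_PRED && best_so_far != D135_PRED;
    default:        return false; /* coarse modes are always searched */
  }
}
```

For example, if the best mode so far is H_PRED, D63 and D117 are skipped while D27 and D153 are still tested.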

Speedup: about 9-10%
Results: -0.05% only on derfraw300.

Change-Id: I413584c41f2a3e8dabfbdeb40718c8fc4b1d63a2

37501d68

Tx size selection enhancements · 8d3d2b76

Deb Mukherjee authored 11 years ago

(1) Refines the modeling function and uses that to add some speed
features. Specifically, instead of using a flag use_largest_txfm as
a speed feature, an enum tx_size_search_method is used, of which
two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
new types are added:
USE_LARGESTINTRA (use largest only for intra)
USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
inter)

(2) Another change is that the framework for deciding transform type
is simplified to use a heuristic count based method rather than
an rd based method using txfm_cache. In practice the new method
is found to work just as well - with derf only -0.01 down.
The new method is more compatible with the new framework where
certain rd costs are based on full rd and certain others are
based on modeled rd or are not computed. In this patch the existing
rd based method is still kept for use in the USE_FULL_RD mode.
In the other modes, the count based method is used.
However, the recommendation is to remove it eventually, since the
benefit is limited and removing it will eliminate a lot of
complications in the code.

(3) Finally, a bug is fixed in the existing use_largest_txfm speed feature
that caused mismatches when the lossless mode and 4x4 WH transform are
forced.
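
The four search methods from item (1) can be read as a small dispatch, sketched below. The enum member names come from the commit message; their order, the `tx_pick` type, the function `choose_tx_pick`, and the assumption that USE_LARGESTINTRA falls back to a full RD search for inter blocks are illustrative guesses, not taken from the patch.

```c
/* Member names as listed in the commit message; the order is an assumption. */
typedef enum {
  USE_FULL_RD,
  USE_LARGESTINTRA,
  USE_LARGESTINTRA_MODELINTER,
  USE_LARGESTALL
} tx_size_search_method;

/* Hypothetical: the way a block's transform size ends up being chosen. */
typedef enum { PICK_BY_FULL_RD, PICK_BY_MODEL, PICK_LARGEST } tx_pick;

/* Maps a search method and block type to a selection strategy. */
tx_pick choose_tx_pick(tx_size_search_method method, int is_inter) {
  switch (method) {
    case USE_LARGESTALL:
      return PICK_LARGEST;
    case USE_LARGESTINTRA:
      /* Assumed: inter blocks still get the full RD search. */
      return is_inter ? PICK_BY_FULL_RD : PICK_LARGEST;
    case USE_LARGESTINTRA_MODELINTER:
      return is_inter ? PICK_BY_MODEL : PICK_LARGEST;
    case USE_FULL_RD:
    default:
      return PICK_BY_FULL_RD;
  }
}
```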

Results on derf:
USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
pretty good compromise)
USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
(currently the benefit of modeling is limited for txfm size selection,
but keeping this enum as a placeholder).
USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
use_largest_txfm speed feature).

Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936

8d3d2b76

Clean-up in forward update to use mapping tables · 9c20cedd

Deb Mukherjee authored 11 years ago

Uses mapping tables instead of complicated modulo/division
operations for prob mapping for forward updates.

No bit-stream or output change.

Change-Id: Ifd9ce8ac1437835c305c94f64c18273c7a68f546

9c20cedd

Add speed feature to disable splitmv · b12e060b

Yunqing Wang authored 11 years ago

Added a speed feature in speed 1 to disable splitmv for HD (>=720)
clips. Test result on stdhd set: 0.3% psnr loss and 0.07% ssim
loss. Encoding speedup is 36%.

(For reference: The test result on derf set showed 2% psnr loss
and 1.6% ssim loss. Encoding speedup is 34%. SPLITMV should be
enabled for small resolution videos.)

Change-Id: I54f72b94f506c6d404b47c42e71acaa5374d6ee6

b12e060b

Calculate rd cost per transformed block · b91a1586

Jingning Han authored 11 years ago

Compute the rate-distortion cost per transformed block, and accumulate
the cost through all blocks inside a partition. This allows the encoder
to detect if the cumulative rd cost is already above the best rd cost,
thereby enabling early termination in the rate-distortion optimization
search.
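
A minimal sketch of the early-termination idea, assuming a simplified cost of `lambda * rate + distortion` rather than the encoder's own RD macro. The `block_cost` struct and the function `partition_rd` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-transform-block costs; in an encoder these would come
 * from the transform, quantization and entropy-cost estimation steps. */
typedef struct {
  int64_t rate;
  int64_t dist;
} block_cost;

/* Accumulates a simplified RD cost block by block. Returns the total, or
 * INT64_MAX as soon as the running cost rises above best_rd, so the
 * remaining blocks of the partition are never evaluated. */
int64_t partition_rd(const block_cost *blocks, size_t n, int64_t lambda,
                     int64_t best_rd) {
  int64_t running = 0;
  size_t i;
  for (i = 0; i < n; ++i) {
    running += lambda * blocks[i].rate + blocks[i].dist;
    if (running > best_rd) return INT64_MAX; /* early termination */
  }
  return running;
}
```

With three blocks costing 12 each and a best cost of 20, the loop stops after the second block.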

Change-Id: I0a856367a9a7b6dd0b466e7b767f54d5018d09ac

b91a1586

Revert "New motion threshold factor - speed feature." · b7cd01ed

Paul Wilkins authored 11 years ago

This reverts commit 13772781.
Also fixes a spelling mistake.

Change-Id: I5be8aa4d8d3c0323d4a6f41968a7b2c048949c3f

b7cd01ed

fix the mismatch again in cpu_used 2 · 9e408e35
Yaowu Xu authored 11 years ago
```
Change-Id: Icc4f70f0b0f91c9e7d5d00eedd67841afe2f2679
```
9e408e35

use partitioning from last frame · d4158283

Jim Bankoski authored 11 years ago


This CL changes the use-partition-from-last-frame feature to do the following:

if part is none, horz, or vert -> try split
if part != none and one of the children is not split -> try none

Change-Id: I5b6c659e35f3ac9f11c051b92ba98af6d7e8aa87
Signed-off-by: Jim Bankoski <jimbankoski@google.com>

d4158283

Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h. · 1ac05402
Dmitry Kovalev authored 11 years ago
```
Change-Id: Ia547a5dd7650b771fd00edd673ab9f920270731c
```
1ac05402

01 Jul, 2013 - 6 commits

Make get_coef_context() branchless. · 26b6318d

Ronald S. Bultje authored 11 years ago

This should significantly speed up cost_coeffs(). Basically what the
patch does is to make the neighbour arrays padded by one item to
prevent an eob check in get_coef_context(), then it populates each
col/row scan and left/top edge coefficient with two times the same
neighbour - this prevents a single/double context branch in
get_coef_context(). Lastly, it populates neighbour arrays in pixel
order (rather than scan order), so we don't have to dereference the
scantable to get the correct neighbours.
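
The branchless lookup can be sketched like this. It is a simplified reconstruction from the description above, so the exact signature and the `MAX_NEIGHBORS` constant are assumptions. Each position owns two neighbour slots; a position with a single real neighbour stores it twice, so the rounded average `(1 + a + a) >> 1` still yields `a` without a branch.

```c
#include <stdint.h>

#define MAX_NEIGHBORS 2 /* assumed: two neighbour slots per position */

/* neighbors: two position indices per coefficient, in the same order as c.
 * token_cache: small context value of the token already coded at each
 * position. Both layouts are assumptions made for this sketch. */
int get_coef_context(const int16_t *neighbors, const uint8_t *token_cache,
                     int c) {
  return (1 + token_cache[neighbors[MAX_NEIGHBORS * c + 0]] +
          token_cache[neighbors[MAX_NEIGHBORS * c + 1]]) >> 1;
}
```

Padding the neighbour array by one entry, as the commit describes, means the lookup stays valid for the position one past the last coefficient, so no end-of-block check is needed before the call.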

Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.

Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56

26b6318d

Update quantize SSSE3 SIMD to cover 32x32 transform case also. · c8defcfd

Ronald S. Bultje authored 11 years ago

Encode time of bus (speed 0) 50 frames @ 1500kbps goes from 2min14.4 to
2min10.1, i.e. a 2.3% overall speed increase.

Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87

c8defcfd

Quantize (64-bit only, for now) SSSE3 SIMD. · 7353ceab

Ronald S. Bultje authored 11 years ago

Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
goes from 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
x86-64 only; it needs some minor modifications to be 32bit compatible,
because it uses 15 xmm registers, whereas 32bit only has 8.

Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904

7353ceab

Removing vp9_modecont.{h, c}. · 2ab3bc88

Dmitry Kovalev authored 11 years ago

Moving vp9_default_inter_mode_probs array to vp9_entropymode.c.

Change-Id: I88ebda86ccc07f2a43c6c01d4b37898214cfb6de

2ab3bc88

fix a mismatch in cpuused 2 · 632289b3
Yaowu Xu authored 11 years ago
```
Change-Id: I921c9faba6386535aaf717a54301dd346a9b8540
```
632289b3

New motion threshold factor - speed feature. · 13772781

Paul Wilkins authored 11 years ago

Added a speed feature that focuses only on thresholds
for new motion modes.

Moved sf->comp_inter_joint_search_thresh into speed
1. This has ~+0.4% impact on quality at speed 0 as
our quality reference baseline.

Slight adjustment to baseline thresholds.

Change-Id: I7ebf104f1fe29af77ed4837b2e84be065621bbe5

13772781

29 Jun, 2013 - 4 commits

SSE2 version of vp9_short_fdct32x32_rd. · 466e0cf3

Christian Duvivier authored 11 years ago

43,000 -> 5,750 cycles, about 7.5x faster.

Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0

466e0cf3

Moving encoder subexp encoding functions to subexp.{h, c}. · bb8ccf1c
Dmitry Kovalev authored 11 years ago
```
Change-Id: I83ca53bf6def871f199a382a671f26ad7cbecbca
```
bb8ccf1c

Enable SSE2 4x4 ADST/DCT transform · 1109b6b8

Jingning Han authored 11 years ago

This commit enables SSE2 4x4 forward hybrid transform. The runtime
goes from 249 cycles down to 74 cycles. Overall around 2% speed-up
with no change in compression performance.

Change-Id: Iad4d526346e05c7be896466c05500711bb763660

1109b6b8

fixed a bug where sse is not populated · f853e662
Yaowu Xu authored 11 years ago
```
Change-Id: I692d800af1f976c84a76f8bd66864c4b39540abc
```
f853e662

28 Jun, 2013 - 9 commits

Fix switch statement in 8x8 transform · 9def7f72
Jingning Han authored 11 years ago
```
Change-Id: I7c46354c4983feb5f6202c3ab4a1d9534da7e30f
```
9def7f72

Inline vp9_get_coef_context() (and remove vp9_ prefix). · d00b8e5f

Ronald S. Bultje authored 11 years ago

Makes cost_coeffs() a lot faster:
4x4: 236 -> 181 cycles
8x8: 888 -> 588 cycles
16x16: 3550 -> 2483 cycles
32x32: 17392 -> 12010 cycles

Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.

Change-Id: I16b8d595946393c8dc661599550b3f37f5718896

d00b8e5f

Removing CONFIG_DEBUG checks on assertions. · 8e6ce6bb

Dmitry Kovalev authored 11 years ago

Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated
ones from vp9_onyx_int.h and vp9_onyxd_int.h.

Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6

8e6ce6bb

Minor change to prevent one level of dereference in cost_coeffs(). · e3ce2b2a

Ronald S. Bultje authored 11 years ago

4x4: 234 -> 236 cycles
8x8: 878 -> 888 cycles
16x16: 3664 -> 3550 cycles
32x32: 18134 -> 17392 cycles

Change-Id: I37a51bfbb0060a3a54f09c6045c14a989811ed78

e3ce2b2a

Some minor optimizations for cost_coeffs(). · 91d223bd

Ronald S. Bultje authored 11 years ago

Cycle timings for first 3 frames of bus (speed 0) at 1500kbps:
4x4: 298 -> 234 cycles
8x8: 1227 -> 878 cycles
16x16: 4906 -> 3664 cycles
32x32: 23426 -> 18134 cycles

Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes
from 3min0.7 to 2min51.6 seconds, i.e. 5.3% faster.

Change-Id: I68a0e1b530b0563b84a67342cca4b45146077e95

91d223bd

Make coefficient skip condition an explicit RD choice. · af660715

Ronald S. Bultje authored 11 years ago

This commit replaces zrun_zbin_boost, a method of biasing non-zero
coefficients following runs of zero-coefficients to be rounded towards
zero, with an explicit skip-block choice in the RD loop.

The logic is basically that if individual coefficients should be rounded
towards zero (from a RD point of view), the trellis/optimize loop should
take care of it. If whole blocks should be zero (from a RD point of
view), a single RD check is much more efficient than a complete
serialization of the quantization loop.
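
The single RD check for a whole block can be sketched as a comparison of two costs. The struct, the function name `should_skip_block`, and the simplified cost `lambda * rate + distortion` are assumptions for illustration; the patch itself may compute these quantities differently.

```c
#include <stdint.h>

/* Hypothetical inputs for one block. */
typedef struct {
  int64_t rate_coeffs; /* bits to code the quantized coefficients */
  int64_t dist_coeffs; /* distortion when the coefficients are coded */
  int64_t rate_skip;   /* bits to signal that the block is skipped */
  int64_t dist_skip;   /* distortion when the residual is dropped */
} block_rd_inputs;

/* Returns 1 when dropping all coefficients is at least as good, in the RD
 * sense, as coding them. */
int should_skip_block(const block_rd_inputs *in, int64_t lambda) {
  const int64_t rd_coded = lambda * in->rate_coeffs + in->dist_coeffs;
  const int64_t rd_skip = lambda * in->rate_skip + in->dist_skip;
  return rd_skip <= rd_coded;
}
```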

Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
SIMD for quantize will follow in a separate patch. Results for other
test sets pending.

Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4

af660715

Minor cleanups · 8b9eea0a
Yaowu Xu authored 11 years ago
```
Change-Id: I379617c1c731a686b3f7e032b8805860c1055b12
```
8b9eea0a

Optimize partition search order · 1374a06b

Yaowu Xu authored 11 years ago

This commit changes the partition search order so that rectangular
partitions are checked after square partitions. It also adds a speed
feature that skips the rectangular partition check when NONE is
better than SPLIT in the RD sense.
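
A rough sketch of the reordered search, with hypothetical names throughout. The RD costs are passed in precomputed to keep the example short; in an encoder the rectangular costs would only be computed when the check below lets them through.

```c
#include <stdint.h>

typedef enum {
  PARTITION_NONE, PARTITION_HORZ, PARTITION_VERT, PARTITION_SPLIT
} partition_type;

/* Square partitions (NONE, SPLIT) are evaluated first. Rectangular ones are
 * considered afterwards, unless the speed feature is on and NONE already
 * beats SPLIT. less_rect_search is an assumed flag name. */
partition_type pick_partition(int64_t none_rd, int64_t split_rd,
                              int64_t horz_rd, int64_t vert_rd,
                              int less_rect_search) {
  partition_type best = PARTITION_NONE;
  int64_t best_rd = none_rd;
  if (split_rd < best_rd) {
    best = PARTITION_SPLIT;
    best_rd = split_rd;
  }
  if (!(less_rect_search && none_rd < split_rd)) {
    if (horz_rd < best_rd) {
      best = PARTITION_HORZ;
      best_rd = horz_rd;
    }
    if (vert_rd < best_rd) {
      best = PARTITION_VERT;
      best_rd = vert_rd;
    }
  }
  return best;
}
```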

This feature speeds up the encoder by roughly 1.5X, at the cost of a compression loss:
-0.91% on cif set
-0.56% on stdhd set

Change-Id: I0d2d06993041aa9ea9073fcc39c54f73a127dfa4

1374a06b

Fix tile independence with both column tiling and static_thresh set. · fd4eed3b
Ronald S. Bultje authored 11 years ago
```
Change-Id: I0b2be0ec2c410a527f88b95a44f24ac967b2dac1
```
fd4eed3b

27 Jun, 2013 - 3 commits

Decoder's code cleanup. · 3231da0a

Dmitry Kovalev authored 11 years ago

Using vp9_set_pred_flag function instead of custom code, adding
decode_tokens function which is now called from decode_atom,
decode_sb_intra, and decode_sb.

Change-Id: Ie163a7106c0241099da9c5fe03069bd71f9d9ff8

3231da0a

Inline quantize so idiv instruction gets removed from inner loop. · 7a049be6

Ronald S. Bultje authored 11 years ago

Encoding time of first 50 frames of bus @ 1500kbps (speed 0) goes from
3min15.0 to 3min10.9, i.e. 2.1% faster overall.

Change-Id: If592ee99be09bcd34a7c8498347f44e7305e982c

7a049be6

Make intra predictor reference buffer configurable · 861cb06c

Jingning Han authored 11 years ago

This commit enables configurable reference buffer pointer for intra
predictor. This allows later removal of spatial dependency between
blocks inside a 64x64 superblock in the rate-distortion optimization
loop.

Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1

861cb06c

26 Jun, 2013 - 7 commits

Remove unused macro RDTRUNC_8x8 from encodemb.c. · b5468155
Ronald S. Bultje authored 11 years ago
```
Change-Id: I0c097567adab24215d807963ccb34810a2afe007
```
b5468155

Remove empty function vp9_build_block_offsets · bd9bac03
Jingning Han authored 11 years ago
```
This function is empty, hence is removed.

Change-Id: Ia9d01710806bffe0398a6dc9405f8a5a81b27d74
```
bd9bac03

Auto adapt step size feature. · 9f3ab834

Paul Wilkins authored 11 years ago

Also tweaks to other features and experiments with
what is on and off at different speed settings.

Change-Id: I3e1d0be0d195216bf17c2ac5df67f34ce0b306b2

9f3ab834

General cleanup in segmentation-related code. · be07485e

Dmitry Kovalev authored 11 years ago

Using consistent function and variable names.

Change-Id: I2deb3fded8797453a2081836c9ce2e79ade06eb7

be07485e

fixed a compiling problem with MSVC win32 build · 60dc7375

Yaowu Xu authored 11 years ago

The aligned array in the parameter list caused the win32 build to report
a C2719 error. This commit fixes the issue by making the parameter
type a pointer instead of an array.

Change-Id: I4ed654ce4eba2db4995d9cdc136c68e9a6acc992

60dc7375

Start adaptive threshold for each mode at max. · 689957e3

Paul Wilkins authored 11 years ago

Each frame we reset all adaptive thresholds to MAX
rather than base. As modes are picked their thresholds
drop down.

Change-Id: Ia37f03a73003c2d9bfcda57edea07205e9a0e5e8

689957e3

Change meaning of cpi->sf.first_step and rename. · e606cac0

Paul Wilkins authored 11 years ago

Renamed cpi->sf.first_step to cpi->sf.reduce_first_step_size
and changed its meaning such that it is a delta applied to
reduce the default first step size (>> x) in the motion search
rather than an absolute value.

The default first step size is already changed according to the image
dimensions (smaller for smaller images). cpi->sf.reduce_first_step_size
now applies a further correction from the default.
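
The change in meaning can be illustrated with a small sketch. The function name and the idea of passing the default step in as a plain integer are assumptions; only the "delta applied as a right shift" behaviour comes from the description above.

```c
#include <assert.h>

/* default_first_step: the first step size already chosen from the image
 * dimensions (how it is derived is not shown here).
 * reduce_first_step_size: the speed-feature delta, applied as a right shift
 * instead of replacing the default with an absolute value. */
int reduced_first_step(int default_first_step, int reduce_first_step_size) {
  assert(default_first_step >= 0);
  assert(reduce_first_step_size >= 0);
  return default_first_step >> reduce_first_step_size;
}
```

A delta of 0 leaves the default untouched, and each increment halves the first step.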

Change-Id: Ia94e08bc24c67b604831f980909af7e982fcd16d

e606cac0

25 Jun, 2013 - 2 commits

Refactor intra predictor block · d19ea386

Jingning Han authored 11 years ago

Remove vp9_intra4x4_predict(). Use the common intra prediction
function for all block sizes.

Change-Id: Ibd19d51dfa3da8bbdfb79ddeb81530b2e2089560

d19ea386

Renaming "nmv" to "mv". · 6fb10f2d
Dmitry Kovalev authored 11 years ago
```
Change-Id: I8299f55c3b930221e52c2237f2ddea65b94fd33b
```
6fb10f2d