Commits · 83c7e13a6bcd1535d9547ef3c89816bf993b458b · BC / public / external / libvpx

17 Jul, 2013 - 1 commit

Speed up motion estimation using small partitions' result(experiment) · df90d58f

Yunqing Wang authored 11 years ago

Current partition checking starts from small sizes, and then goes up
to large sizes. This experiment uses the small partitions' motion
estimation result, which is already available, to speed up the
large partition's motion estimation. We can decide to skip some
partition checks if they are unlikely choices. We could use the
motion vector (MV) result as the current partition's prediction MV, and
limit the search range and reference frame.

Current result at speed 1:
psnr loss: 1.19% for stdhd, 0.287% for derf.
speed gain: 14% for sunflower(hd), 11% for akiyo.

Further improvement will be done later.
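
A minimal sketch of the idea, not the code from this commit: the four sub-block motion vectors are averaged into a prediction MV, and the search range is shrunk when they agree. All type and function names here (`MotionVector`, `SearchHint`, `hint_from_subblocks`) are hypothetical, and the averaging rule is an assumption.

```c
#include <stdlib.h>

/* Hypothetical types; not taken from libvpx. */
typedef struct { int row, col; } MotionVector;

typedef struct {
  MotionVector pred_mv;  /* starting point for the large-block search */
  int search_range;      /* maximum displacement to test */
} SearchHint;

/* Derive a search hint for a large block from the motion vectors already
 * found for its four sub-blocks. If the sub-block vectors agree closely,
 * the large block starts from their average and searches a small window. */
static SearchHint hint_from_subblocks(const MotionVector sub[4],
                                      int full_range) {
  SearchHint hint;
  int sum_row = 0, sum_col = 0, max_dev = 0, i;

  for (i = 0; i < 4; ++i) {
    sum_row += sub[i].row;
    sum_col += sub[i].col;
  }
  hint.pred_mv.row = sum_row / 4;
  hint.pred_mv.col = sum_col / 4;

  /* Largest deviation of any sub-block vector from the average. */
  for (i = 0; i < 4; ++i) {
    const int dr = abs(sub[i].row - hint.pred_mv.row);
    const int dc = abs(sub[i].col - hint.pred_mv.col);
    if (dr > max_dev) max_dev = dr;
    if (dc > max_dev) max_dev = dc;
  }

  /* Search slightly beyond the observed spread, never past the full range. */
  hint.search_range = max_dev + 1;
  if (hint.search_range > full_range) hint.search_range = full_range;
  return hint;
}
```

When the sub-block vectors disagree strongly, the range falls back to `full_range`, so the hint never widens the search.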

Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab


16 Jul, 2013 - 2 commits

Cleaning up tile code. · 9482a0bf

Dmitry Kovalev authored 11 years ago

Removing tile_rows and tile_columns from VP9Common, removing redundant
constants MIN_TILE_WIDTH and MAX_TILE_WIDTH, changing signature of
vp9_get_tile_n_bits.

Change-Id: I8ff3104a38179b2c6900df965c144c1d6f602267


Rewriting vp9_set_pred_flag_{seg_id, mbskip}. · 863138a2

Dmitry Kovalev authored 11 years ago

Making implementation of vp9_set_pred_flag_{seg_id, mbskip} consistent
with vp9_get_segment_id without using confusing sub(a, b) macro. Passing
mi_row and mi_col to functions explicitly instead of relying on
mb_to_right_edge and mb_to_bottom_edge.

Change-Id: I54c1087dd2ba9036f8ba7eb165b073e807d00435


15 Jul, 2013 - 1 commit

Skip duplicate block encoding in the rd loop · faff6ed0

Jingning Han authored 11 years ago

This speed feature allows the encoder to largely remove the spatial
dependency between blocks inside a 64x64 superblock, thereby removing
the need to repeatedly encode superblocks per partition type in the
rate-distortion optimization loop.

A major challenge lies in the intra modes tested in the rate-distortion
optimization loop. The subsequent blocks do not have access to the
reconstructed boundary pixels without the intermediate coding steps.
This was resolved by using the original pixels for intra prediction
in the rd loop, followed by an appropriately designed distortion
modeling on the quantization parameters. Experiments also suggested
that the performance impact is more discernible at lower bit-rate/psnr
settings. Hence a quantizer dependent threshold is applied to deactivate
skip of block coding.

For bus_cif at 2000 kbps,
speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
         performance loss.

speed 1: runtime 65312ms  -> 61536ms, (7...


14 Jul, 2013 - 1 commit

vp9: remove frames_{since,till}.. from MACROBLOCKD · dc1d2331

James Zern authored 11 years ago

frames_since_golden / frames_till_alt_ref_frame are unused.

Change-Id: I348e7689d4d75412cf4de7703d885be942e4a26b


13 Jul, 2013 - 1 commit
- Using vp9_copy and vp9_zero instead of custom code. · 42907098
  Dmitry Kovalev authored 11 years ago
```
Change-Id: Id9b6ceeddca3f9b34bfada5c499b1e7a2f42c30b
```
12 Jul, 2013 - 2 commits
- Adding struct tx_probs and struct tx_counts to cleanup the code. · cc662dd7
  Dmitry Kovalev authored 11 years ago
```
Also removing unused declarations from vp9_entropymode.h file.

Change-Id: Ib9c5826db3584a32f6bb3297a76c522b99d83402
```
- Removing redundant code mostly from vp9_pred_common.{h, c}. · dd150e8e
  Dmitry Kovalev authored 11 years ago
```
Removing redundant function arguments and curly braces.

Change-Id: I46e02561f33fe02e84a3b19756f03b9504bd6a1b
```
11 Jul, 2013 - 1 commit

Moving segmentation related vars into separate struct. · c4ad3273

Dmitry Kovalev authored 11 years ago

Adding segmentation struct to vp9_seg_common.h. Struct members are from
macroblockd and VP9Common structs. Moving segmentation related constants
and enums to vp9_seg_common.h.

Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03


10 Jul, 2013 - 4 commits
- Removing unused TOKENEXTRA arg from pick_sb_modes function. · 544d8c33
  Dmitry Kovalev authored 11 years ago
```
Change-Id: I0543e72fa092eef3976b65e16bb597197c364873
```
- configure with internal stats not working · 68ef7a6b
  Jim Bankoski authored 11 years ago
```
Change-Id: I5dea4570cb05df27a522abf6e7b695998654284a
```
- remove warnings when NDEBUG is set · 6591cf2f
  Jim Bankoski authored 11 years ago
```
Change-Id: Ie0cb732fdcb98616a422c4463bff80642248d136
```
- removing case statements around prediction entropy coding · fb027a76
  Jim Bankoski authored 11 years ago
```
Removes SEG_ID
Removes MBSKIP
Removes SWITCHABLE_INTERP
Removes INTRA_INTER
Removes COMP_INTER_INTER
Removes COMP_REF_P
Removes SINGLE_REF_P1
Removes SINGLE_REF_P2
Removes TX_SIZE

Change-Id: Ie4520ae1f65c8cac312432c0616cc80dea5bf34b
```
08 Jul, 2013 - 4 commits

Don't call encode_sb() for the final of 4-split subpartitions. · a5062cc6

Ronald S. Bultje authored 11 years ago

The resulting reconstruction is never used, thus it just wastes CPU
cycles. Reduces encode time of first 50 frames of bus (speed 0) @
1500kbps from 2min2.0 to 2min1.2, i.e. a 0.65% overall speedup.

Change-Id: I74755ca3aadc21e2be220f486259060bd4088c45


Make frame-wide filter-type decision fully RD-based. · ed995afb

Ronald S. Bultje authored 11 years ago

Overall, on all test sets, this gains about +0.2% on all metrics.
City is a clip where this really hurts (-1.0% on all metrics); I'm
not quite sure why yet. Maybe interesting to look into in the future.

Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78


Using mi_cols instead of mb_cols. · b7559258

Dmitry Kovalev authored 11 years ago

Eliminating usage of mb-units, switching to mi-units. Adding
ALIGN_POWER_OF_TWO macro.
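
For illustration, a macro of this kind is commonly written as below. The body shown is the usual round-up-to-a-power-of-two idiom and is an assumption about the form, not a quote of the patch. `LOG2_MI_SIZE` (8x8-pixel mi units) and `mi_cols_for_width` are likewise assumed for the example.

```c
/* Round value up to the next multiple of 2^n (common idiom). */
#define ALIGN_POWER_OF_TWO(value, n) \
  (((value) + ((1 << (n)) - 1)) & ~((1 << (n)) - 1))

/* Assumed for illustration: one mode-info (mi) unit covers 8x8 pixels. */
#define LOG2_MI_SIZE 3

/* Number of mi columns needed to cover a frame of the given pixel width. */
static int mi_cols_for_width(int width) {
  return ALIGN_POWER_OF_TWO(width, LOG2_MI_SIZE) >> LOG2_MI_SIZE;
}
```

Under these assumptions a 352-pixel-wide frame needs 44 mi columns, and a 353-pixel-wide frame rounds up to 45.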

Change-Id: I2491c969f713207c062011878b57e4e531818607


Implements several heuristics to prune mode search · d9b62160

Deb Mukherjee authored 11 years ago

Skips mode searches for intra and compound inter modes depending
on the best mode so far and the reference frames. The various
heuristics to be used are selected by bits from a flag. The
previous direction based intra mode search pruning is also absorbed
in this framework.

Specifically the flags and their impact are:

1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
directional modes and TM_PRED if the best so far is
an inter mode)
derfraw300: -0.15%, 10% speedup

2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
mode search if the best so far is not one of the closest
hor/vert/diagonal directions)
derfraw300: -0.05%, about 9% speedup

3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
search if the best so far is an intra mode)
derfraw300: -0.06%, about 7-8% speedup

4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
if the best single ref inter mode does not have the same ref
as one of the two references being tested in the compound mode)
derfraw300: -0.56%, about 10% speedup
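
The flag-driven pruning above can be sketched roughly as follows. The flag names come from this message; the bit positions, the `Candidate` and `BestSoFar` structs, and `should_skip` are hypothetical. The direction-mismatch rule (flag 2) is left out to keep the sketch short.

```c
/* Flag names are from the commit message; bit values are assumed. */
enum {
  FLAG_SKIP_INTRA_BESTINTER   = 1 << 0,
  FLAG_SKIP_INTRA_DIRMISMATCH = 1 << 1,  /* not modeled in this sketch */
  FLAG_SKIP_COMP_BESTINTRA    = 1 << 2,
  FLAG_SKIP_COMP_REFMISMATCH  = 1 << 3
};

/* Hypothetical description of the mode about to be tested. */
typedef struct {
  int is_intra;     /* candidate is an intra mode */
  int is_oblique;   /* intra only: oblique directional mode or TM_PRED */
  int is_compound;  /* candidate predicts from two reference frames */
  int ref[2];       /* reference frame indices (inter only) */
} Candidate;

/* Hypothetical summary of the search state so far. */
typedef struct {
  int valid;        /* 0 until at least one mode has been evaluated */
  int is_intra;     /* best mode so far is intra */
  int single_ref;   /* ref of best single-reference inter mode, or -1 */
} BestSoFar;

/* Returns 1 if the candidate should be skipped under the enabled flags. */
static int should_skip(unsigned flags, const Candidate *c,
                       const BestSoFar *b) {
  if (!b->valid) return 0;

  if ((flags & FLAG_SKIP_INTRA_BESTINTER) &&
      c->is_intra && c->is_oblique && !b->is_intra)
    return 1;

  if (c->is_compound) {
    if ((flags & FLAG_SKIP_COMP_BESTINTRA) && b->is_intra) return 1;
    if ((flags & FLAG_SKIP_COMP_REFMISMATCH) && b->single_ref >= 0 &&
        c->ref[0] != b->single_ref && c->ref[1] != b->single_ref)
      return 1;
  }
  return 0;
}
```

Because each heuristic is a separate bit, a speed setting can enable any combination by OR-ing the flags together.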

Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495


04 Jul, 2013 - 1 commit

Refactoring setup_pre_planes function. · f72e0725

Dmitry Kovalev authored 11 years ago

Removing set_refs, adding set_ref function.

Change-Id: I5635c478b106ae4e57d317f1c83d929644307e63


03 Jul, 2013 - 2 commits

Replacing 64 / MI_SIZE with MI_BLOCK_SIZE. · 5a21de84
Dmitry Kovalev authored 11 years ago
```
Change-Id: I32276552b3ea6dc1dce8e298be114cfe1019b31c
```

Added two new skip experiments. · 72c5778e

Paul Wilkins authored 11 years ago

sf->unused_mode_skip_lvl. Tests modes as normal for all
sizes at or below the given level. At larger sizes it skips
all modes that were not chosen at any smaller size.
Hence setting BLOCK_SIZE_SB64X64 is in effect off.
Setting BLOCK_SIZE_AB4X4 will only consider modes that
were chosen for one or more 4x4 blocks at larger sizes.

sf->reference_masking.
Do a test encode of the NONE partition at one size and create
a reference frame mask based on the best rd choice. In the
full search only allow this reference frame.
Currently it is testing 64x64 and repeats this in the full search.
This does not work well with Jim's Partition code just now and
is disabled by default.

Change-Id: I8f8c52d2ef4a0c08100150b0ea4155d1aaab93dd


02 Jul, 2013 - 5 commits

Removing redundant struct from union b_mode_info. · be77f6bb
Dmitry Kovalev authored 11 years ago
```
Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83
```

Added a speed feature use_square_partition_only · 0d7b7c09

Yaowu Xu authored 11 years ago

This commit adds a speed feature where only square partitions are
evaluated in partition picking. Enabling this feature at cpu-used 2
reduces encoding time by ~30%.

loss of compression:
-0.9% on cif set
-1.23% on stdhd

Change-Id: Ia6fad11210f0b78365abb889f9245604513be5b9


Tx size selection enhancements · 8d3d2b76

Deb Mukherjee authored 11 years ago

(1) Refines the modeling function and uses that to add some speed
features. Specifically, instead of using a flag use_largest_txfm as
a speed feature, an enum tx_size_search_method is used, of which
two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
new types are added:
USE_LARGESTINTRA (use largest only for intra)
USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
inter)

(2) Another change is that the framework for deciding transform type
is simplified to use a heuristic count based method rather than
an rd based method using txfm_cache. In practice the new method
is found to work just as well - with derf only -0.01 down.
The new method is more compatible with the new framework where
certain rd costs are based on full rd and certain others are
based on modeled rd or are not computed. In this patch the existing
rd based method is still kept for use in the USE_FULL_RD mode.
In the other modes, the count based method is used.
However the recommendation is to remove it eventually since the
benefit is limited, and will remove a lot of complications in
the code.

(3) Finally a bug is fixed with the existing use_largest_txfm speed feature
that causes mismatches when the lossless mode and 4x4 WH transform is
forced.

Results on derf:
USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
pretty good compromise)
USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
(currently the benefit of modeling is limited for txfm size selection,
but keeping this enum as a placeholder).
USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
use_largest_txfm speed feature).
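
A rough sketch of how the four search methods could map to a per-block decision. The enum constant names come from this message; the enum ordering, the `TxDecision` type, and `tx_decision` are hypothetical. Using full RD for inter blocks under USE_LARGESTINTRA is an assumption read from "use largest only for intra".

```c
/* Constant names are from the commit message; the order is assumed. */
typedef enum {
  USE_FULL_RD = 0,
  USE_LARGESTINTRA,
  USE_LARGESTINTRA_MODELINTER,
  USE_LARGESTALL
} TxSizeSearchMethod;

/* Hypothetical: how the transform size is chosen for one block. */
typedef enum {
  TX_DECIDE_FULL_RD,  /* try every size with a full RD evaluation */
  TX_DECIDE_LARGEST,  /* take the largest size that fits the block */
  TX_DECIDE_MODEL     /* pick using a modeled (estimated) RD cost */
} TxDecision;

static TxDecision tx_decision(TxSizeSearchMethod method, int is_intra) {
  switch (method) {
    case USE_LARGESTALL:
      return TX_DECIDE_LARGEST;
    case USE_LARGESTINTRA:
      return is_intra ? TX_DECIDE_LARGEST : TX_DECIDE_FULL_RD;
    case USE_LARGESTINTRA_MODELINTER:
      return is_intra ? TX_DECIDE_LARGEST : TX_DECIDE_MODEL;
    case USE_FULL_RD:
    default:
      return TX_DECIDE_FULL_RD;
  }
}
```

Replacing the boolean flag with an enum leaves room for intermediate trade-offs such as USE_LARGESTINTRA, which the results above describe as a good compromise.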

Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936


use partitioning from last frame · d4158283

Jim Bankoski authored 11 years ago


This cl converts use partition from last frame to do the following:

if part is none, horz, vert -> try split
if part != none and one of the children is not split -> try none
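
One possible reading of these two rules, sketched with hypothetical names (`PartitionType`, `TRY_SPLIT`, `TRY_NONE`, `extra_partitions_to_try`). How "children" are defined for horz/vert partitions is not stated here, so the `child` array is an assumption.

```c
typedef enum {
  PARTITION_NONE,
  PARTITION_HORZ,
  PARTITION_VERT,
  PARTITION_SPLIT
} PartitionType;

/* Extra partition types to evaluate besides the one reused from the
 * last frame. */
enum { TRY_SPLIT = 1 << 0, TRY_NONE = 1 << 1 };

/* last:  partition type the co-located block used in the last frame.
 * child: partition types of its four sub-blocks (assumed layout); only
 *        read when last != PARTITION_NONE. */
static unsigned extra_partitions_to_try(PartitionType last,
                                        const PartitionType child[4]) {
  unsigned extra = 0;
  int i;

  /* Rule 1: none, horz or vert -> also try split. */
  if (last != PARTITION_SPLIT) extra |= TRY_SPLIT;

  /* Rule 2: not none, and some child is not split -> also try none. */
  if (last != PARTITION_NONE) {
    for (i = 0; i < 4; ++i) {
      if (child[i] != PARTITION_SPLIT) {
        extra |= TRY_NONE;
        break;
      }
    }
  }
  return extra;
}
```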


Change-Id: I5b6c659e35f3ac9f11c051b92ba98af6d7e8aa87
Signed-off-by: Jim Bankoski <jimbankoski@google.com>


Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h. · 1ac05402
Dmitry Kovalev authored 11 years ago
```
Change-Id: Ia547a5dd7650b771fd00edd673ab9f920270731c
```

01 Jul, 2013 - 1 commit
- fix a mismatch in cpuused 2 · 632289b3
  Yaowu Xu authored 11 years ago
```
Change-Id: I921c9faba6386535aaf717a54301dd346a9b8540
```
28 Jun, 2013 - 3 commits

Removing CONFIG_DEBUG checks on assertions. · 8e6ce6bb

Dmitry Kovalev authored 11 years ago

Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated
ones from vp9_onyx_int.h and vp9_onyxd_int.h.

Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6


Optimize partition search order · 1374a06b

Yaowu Xu authored 11 years ago

This commit changes the partition search order so that rectangular
partitions are checked after square partitions. It also adds a
speed feature to skip the rectangular partition check when
NONE is better than SPLIT in the RD sense.

This feature speeds up the encoder by roughly 1.5x, with a compression loss of:
-0.91% on cif set
-0.56% on stdhd set
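
The reordered search can be sketched as below, assuming "better in the RD sense" means a strictly lower RD cost. `FakeRd`, `rd_cost`, and `pick_partition` are hypothetical stand-ins; the table lookup replaces a real RD evaluation so that the number of evaluations is visible.

```c
#include <stdint.h>

typedef enum {
  PARTITION_NONE,
  PARTITION_HORZ,
  PARTITION_VERT,
  PARTITION_SPLIT,
  PARTITION_TYPES
} PartitionType;

/* Stand-in for a real RD evaluation: costs come from a table and each
 * lookup is counted. */
typedef struct {
  int64_t cost[PARTITION_TYPES];
  int evaluations;
} FakeRd;

static int64_t rd_cost(FakeRd *rd, PartitionType type) {
  ++rd->evaluations;
  return rd->cost[type];
}

/* Square partitions (NONE, SPLIT) are evaluated first. Rectangular ones
 * are evaluated afterwards, unless the speed feature is on and NONE
 * already beats SPLIT. */
static PartitionType pick_partition(FakeRd *rd, int skip_rect_when_none_wins) {
  static const PartitionType rect[2] = { PARTITION_HORZ, PARTITION_VERT };
  const int64_t none_cost = rd_cost(rd, PARTITION_NONE);
  const int64_t split_cost = rd_cost(rd, PARTITION_SPLIT);
  PartitionType best = PARTITION_NONE;
  int64_t best_cost = none_cost;
  int i;

  if (split_cost < best_cost) {
    best = PARTITION_SPLIT;
    best_cost = split_cost;
  }

  if (skip_rect_when_none_wins && none_cost < split_cost) return best;

  for (i = 0; i < 2; ++i) {
    const int64_t cost = rd_cost(rd, rect[i]);
    if (cost < best_cost) {
      best = rect[i];
      best_cost = cost;
    }
  }
  return best;
}
```

The early return is where the speed gain and the compression loss both come from: a rectangular partition that would have won is never evaluated.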

Change-Id: I0d2d06993041aa9ea9073fcc39c54f73a127dfa4


Fix tile independence with both column tiling and static_thresh set. · fd4eed3b
Ronald S. Bultje authored 11 years ago
```
Change-Id: I0b2be0ec2c410a527f88b95a44f24ac967b2dac1
```

27 Jun, 2013 - 1 commit

Decoder's code cleanup. · 3231da0a

Dmitry Kovalev authored 11 years ago

Using vp9_set_pred_flag function instead of custom code, adding
decode_tokens function which is now called from decode_atom,
decode_sb_intra, and decode_sb.

Change-Id: Ie163a7106c0241099da9c5fe03069bd71f9d9ff8


26 Jun, 2013 - 3 commits

Remove empty function vp9_build_block_offsets · bd9bac03
Jingning Han authored 11 years ago
```
This function is empty, hence is removed.

Change-Id: Ia9d01710806bffe0398a6dc9405f8a5a81b27d74
```

General cleanup in segmentation-related code. · be07485e

Dmitry Kovalev authored 11 years ago

Using consistent function and variable names.

Change-Id: I2deb3fded8797453a2081836c9ce2e79ade06eb7


Start adaptive threshold for each mode at max. · 689957e3

Paul Wilkins authored 11 years ago

Each frame we reset all adaptive thresholds to MAX
rather than base. As modes are picked their thresholds
drop down.
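
A minimal sketch of this behavior. The message only says thresholds start at MAX and drop as modes are picked; the halving rule, the floor at a per-mode base value, and all names (`ModeThresholds`, `reset_thresholds`, `on_mode_picked`, `NUM_MODES`) are assumptions.

```c
#include <limits.h>

#define NUM_MODES 4  /* hypothetical mode count for the sketch */

typedef struct {
  int thresh[NUM_MODES];
} ModeThresholds;

/* Called at the start of each frame: every mode starts at the maximum,
 * so a mode is hard to select until it has proven useful. */
static void reset_thresholds(ModeThresholds *t) {
  int i;
  for (i = 0; i < NUM_MODES; ++i) t->thresh[i] = INT_MAX;
}

/* Called when a mode is picked: its threshold drops (assumed rule:
 * halve it), but never below that mode's base value. */
static void on_mode_picked(ModeThresholds *t, int mode,
                           const int base[NUM_MODES]) {
  const int next = t->thresh[mode] / 2;
  t->thresh[mode] = next < base[mode] ? base[mode] : next;
}
```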

Change-Id: Ia37f03a73003c2d9bfcda57edea07205e9a0e5e8


24 Jun, 2013 - 1 commit

change to enable use_largest_txform feature · e371cd73

Yaowu Xu authored 11 years ago

for all regular inter frames at speed 1

Change-Id: I0a8b301273ecf2b8730ab1f6b7a05f89f4d498e0


21 Jun, 2013 - 3 commits

Removing find_seg_id and using vp9_get_pred_mi_segid instead. · 40141681
Dmitry Kovalev authored 11 years ago
```
Change-Id: Ia40229903c08f14020e90e94cfdf494aba1be827
```

Implement SSE2 block_error. · 54b2a596

Ronald S. Bultje authored 11 years ago

Change vp9_block_error() to return a 64bit error variable, change all
callers to expect a 64bit return value (this will prevent overflows,
which we basically don't check for at all right now). Remove duplicate
block_error() function, which fixed that through truncation. Remove
old (incompatible) mmx/sse2 block_error SIMD versions and replace with
a new one that returns a 64bit value.
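
For reference, a plain C version of a sum-of-squared-differences with a 64-bit accumulator might look like this. It is a sketch of the scalar computation only, not the SSE2 code; the function name and signature are assumptions.

```c
#include <stdint.h>

/* Sum of squared differences between original and dequantized
 * coefficients. The 64-bit accumulator avoids the overflow a 32-bit
 * sum can hit on large blocks with large coefficient differences. */
static int64_t block_error(const int16_t *coeff, const int16_t *dqcoeff,
                           int n) {
  int64_t error = 0;
  int i;
  for (i = 0; i < n; ++i) {
    const int diff = coeff[i] - dqcoeff[i];
    error += (int64_t)diff * diff;
  }
  return error;
}
```

With 1024 coefficients at the extremes of the int16_t range, the result is about 4.4e12, far beyond what a 32-bit return value can hold.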

Encoding time of first 50 frames of bus @ 1500kbps goes from 3min29 to
3min23, i.e. a 3% overall speedup.

Change-Id: Ib71ac5508b5ee8a80f1753cd85d72df1629abe68


rename variables to avoid build error in MSVC · ee07a261
Yaowu Xu authored 11 years ago
```
Change-Id: I7960178c95c54d5c4497e44cfc8c493566294b34
```

20 Jun, 2013 - 3 commits

adds force partitioning greater than or less than block size · 9f2a1ae2

Jim Bankoski authored 11 years ago

adds a new speed feature to force partitioning to be greater than
or less than a certain size

Change-Id: I8c048eeeef93700ae822eccf98f8751a45b2e7d0


adds a set partitioning to speed features · 18bdf708

Jim Bankoski authored 11 years ago

this feature lets you set a partitioning size to be used by the entire
frame.

Change-Id: I208a4c8c701375cbb054418266f677768b6f8f06


partition by variance using var from last frame · 476d73d2

Jim Bankoski authored 11 years ago

This uses variance to split partition. Variance is calculated using
nearest mv, always from last ref frame.
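
A rough sketch of a variance-based split decision, assuming the variance is taken over the difference between the source block and its prediction from the last frame. The function names, the square-block layout, and the threshold parameter are all hypothetical.

```c
#include <stdint.h>

/* Variance of the per-pixel difference between a source block and its
 * prediction. Both buffers share the same stride; the block is
 * size x size pixels. */
static int64_t residual_variance(const uint8_t *src, const uint8_t *pred,
                                 int stride, int size) {
  int64_t sum = 0, sse = 0;
  const int64_t n = (int64_t)size * size;
  int r, c;

  for (r = 0; r < size; ++r) {
    for (c = 0; c < size; ++c) {
      const int d = src[r * stride + c] - pred[r * stride + c];
      sum += d;
      sse += d * d;
    }
  }
  /* var = E[d^2] - E[d]^2, computed with integer arithmetic. */
  return (sse - sum * sum / n) / n;
}

/* Split the block when the residual variance exceeds the threshold. */
static int should_split(const uint8_t *src, const uint8_t *pred,
                        int stride, int size, int64_t threshold) {
  return residual_variance(src, pred, stride, size) > threshold;
}
```

A uniform offset between source and prediction gives zero variance and no split; a residual that varies within the block pushes the variance up and triggers a split.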

Change-Id: Idd015b4a9aa3bc82591759eac239680c07496896
