Commits · e378566060e2f962e03d727fd3f184b051d37f5c · BC / public / external / libvpx

07 Sep, 2013 - 1 commit

Fix overflow issue in 16x16 quantization SSSE3 · 09bc942b

Jingning Han authored 11 years ago

The 16x16 transform unit test suggested that the peak coefficient
value can reach 32639. This could cause an overflow in the SSSE3
implementation of 16x16 block quantization. This commit fixes the
issue by replacing addition with saturated addition.

Change-Id: I6d5bb7c5faad4a927be53292324bd2728690717e
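
A scalar C model of a single 16-bit lane shows why saturated addition matters. One helper wraps like SSE2 `paddw` (`_mm_add_epi16`) and the other clamps like `paddsw` (`_mm_adds_epi16`). The helper names are hypothetical, and the rounding term of 200 in the usage note is only illustrative, because the commit does not give the 16x16 rounding value.

```c
#include <stdint.h>

/* Like paddw (_mm_add_epi16): keep only the low 16 bits of the sum. */
static int16_t add_wrap_i16(int16_t a, int16_t b) {
  int32_t s = (int32_t)a + (int32_t)b; /* exact sum; cannot overflow 32 bits */
  /* Take the low 16 bits and reinterpret them as a signed value. */
  return (int16_t)((int32_t)((uint32_t)(s + 32768) & 0xFFFFu) - 32768);
}

/* Like paddsw (_mm_adds_epi16): clamp the sum to [INT16_MIN, INT16_MAX]. */
static int16_t add_sat_i16(int16_t a, int16_t b) {
  int32_t s = (int32_t)a + (int32_t)b;
  if (s > INT16_MAX) return INT16_MAX;
  if (s < INT16_MIN) return INT16_MIN;
  return (int16_t)s;
}
```

With a coefficient of 32639 and a rounding term of 200, `add_wrap_i16` returns -32697, so the sign flips. `add_sat_i16` returns 32767.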

06 Sep, 2013 - 2 commits

Support a constant quality mode in VP9 · e378a89b

Deb Mukherjee authored 11 years ago

Adds a new end-usage option for constant quality encoding in vpx. This
first version, implemented for VP9, encodes all regular inter frames
using the quality specified in the --cq-level= option, while encoding
all key frames and golden/altref frames at a better quality than that.

On derfraw300 the current performance is +0.910% over bitrate control,
achieved without multiple recode loops per frame.

The qp decision for each altref/golden/key frame will be improved
in subsequent patches through better use of stats from the first pass.
Further, the qp for regular inter frames may also be varied around the
provided cq-level.

Change-Id: I6c4a2a68563679d60e0616ebcb11698578615fb3
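
The per-frame q choice described above can be sketched as follows. This is a hypothetical illustration, not the vpx code. `FrameKind`, `cq_frame_q`, the fixed `boost`, and the `best_q` clamp are all assumptions. The commit only says that key/golden/altref frames get a better quality than the cq level. In VP9 a lower q index means better quality.

```c
/* Frame categories that matter for the constant quality mode. */
typedef enum { KEY_FRAME_T, GOLDEN_FRAME_T, ALTREF_FRAME_T, INTER_FRAME_T } FrameKind;

/* Pick a q index for one frame (lower q index = better quality).
 * cq_level: the value given with --cq-level=.
 * boost:    hypothetical fixed improvement for key/golden/altref frames.
 * best_q:   lowest q index the encoder may use. */
static int cq_frame_q(FrameKind kind, int cq_level, int boost, int best_q) {
  if (kind == INTER_FRAME_T)
    return cq_level;          /* regular inter frames: exactly the cq level */
  int q = cq_level - boost;   /* key/golden/altref: better than the cq level */
  return q < best_q ? best_q : q;
}
```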

New mode_info_context storage · dae17734

Scott LaVarnway authored 11 years ago

mode_info_context was stored as a grid of MODE_INFO structs.
The grid now consists of a pointer to a MODE_INFO struct and
an "in the image" flag. The MODE_INFO structs are now stored
as a stream, which eliminates unnecessary copies and is a little
more cache friendly.

For the test clips used, the decoder performance improved
by ~4.3% (1080p) and ~9.7% (720p).

Patch Set 2: Re-encoded the clips with the latest code. The improvement
is now ~1.7% (1080p) and ~5.9% (720p).

Change-Id: I846f29e88610fce2523ca697a9a9ef2a182e9256
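
A minimal sketch of the new layout under simplifying assumptions. `ModeInfo` stands in for `MODE_INFO` with only two fields, and `GridCell` and `set_block_mi` are hypothetical names. Each grid cell, one per 8x8 block, holds a pointer into a contiguous stream of `ModeInfo` entries plus the "in the image" flag. A large block therefore stores its mode info once, and every cell it covers points at that single entry instead of holding a copy.

```c
#include <stddef.h>

/* Simplified stand-in for MODE_INFO; the real struct has many more fields. */
typedef struct {
  int mode;
  int ref_frame;
} ModeInfo;

/* One grid cell per 8x8 block: a pointer into the ModeInfo stream plus a flag. */
typedef struct {
  ModeInfo *mi;
  int in_image;
} GridCell;

/* Point every cell covered by a bh x bw block (in 8x8 units) at one entry. */
static void set_block_mi(GridCell *grid, int stride, int mi_rows, int mi_cols,
                         int row, int col, int bh, int bw, ModeInfo *mi) {
  for (int r = row; r < row + bh; ++r)
    for (int c = col; c < col + bw; ++c) {
      GridCell *cell = &grid[r * stride + c];
      cell->in_image = (r < mi_rows && c < mi_cols);
      cell->mi = cell->in_image ? mi : NULL; /* shared pointer, no copy */
    }
}
```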

05 Sep, 2013 - 1 commit

Use saturated addition in SSSE3 of 32x32 quant · 458c2833

Jingning Han authored 11 years ago

The 32x32 forward transform can produce peak coefficient values
close to 32700, while the rounding factor can go up to 610. Since
32700 + 610 = 33310 exceeds the 16-bit signed maximum of 32767, this
could cause overflow in the SSSE3 implementation of the 32x32
quantization process.

This commit resolves this issue by replacing the addition operations
with saturated addition operations in 32x32 block quantization.

Change-Id: Id6b98996458e16c5b6241338ca113c332bef6e70
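
A sketch using SSE2 intrinsics, which any SSSE3 build can use, computes eight lanes both ways. `add_round_x8` is a hypothetical helper and not the libvpx routine. It compiles only for x86 targets.

```c
#include <emmintrin.h> /* SSE2: _mm_add_epi16 (paddw), _mm_adds_epi16 (paddsw) */
#include <stdint.h>

/* Add a rounding term to eight int16 coefficients, once with wrapping
 * addition and once with saturating addition. */
static void add_round_x8(const int16_t coeff[8], int16_t round,
                         int16_t wrapped[8], int16_t saturated[8]) {
  const __m128i c = _mm_loadu_si128((const __m128i *)coeff);
  const __m128i r = _mm_set1_epi16(round);
  _mm_storeu_si128((__m128i *)wrapped, _mm_add_epi16(c, r));    /* may wrap */
  _mm_storeu_si128((__m128i *)saturated, _mm_adds_epi16(c, r)); /* clamps */
}
```

For a coefficient of 32700 and a rounding factor of 610, the wrapping lane holds -32226 and the saturating lane holds 32767.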

04 Sep, 2013 - 2 commits

make vp9 postproc a config option · 79401542

Jim Bankoski authored 11 years ago

VP9 postproc is disabled for now, as it has not been shown to help and
may be merged with the VP8 postproc.

Change-Id: I25620d6cd34c6e10331b18c7b5ef7482e39c6057

faster accounting of inc_mv · 532179e8

Jim Bankoski authored 11 years ago

Moves counting of mv branches to where we have a new mv, instead of after
the whole frame is summed.

Change-Id: I945d9f6d9199ba2443fe816c92d5849340d17bbd

03 Sep, 2013 - 1 commit

Attempt to fix speed 4 · 49317cdd

Paul Wilkins authored 11 years ago

Speed 4 uses a fixed partition size. Use the fixed size unless it does
not fit inside the image, in which case use the largest size that does.

Change-Id: I250f7a80506750dd82ab355721624a1344247223
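
The rule can be sketched as follows, assuming square sizes only and counting in 8x8 (mode info) units. The function name is hypothetical. The commit does not say whether rectangular fallbacks are considered.

```c
/* Sizes are in 8x8 units: 8 = 64x64, 4 = 32x32, 2 = 16x16, 1 = 8x8. */
static int fixed_partition_size(int fixed_size, int mi_row, int mi_col,
                                int mi_rows, int mi_cols) {
  const int rows_left = mi_rows - mi_row; /* 8x8 rows remaining below */
  const int cols_left = mi_cols - mi_col; /* 8x8 columns remaining to the right */
  int size = fixed_size;
  /* Halve the square size until it fits inside the image (or reaches 8x8). */
  while (size > 1 && (size > rows_left || size > cols_left))
    size >>= 1;
  return size;
}
```

For example, with a fixed 32x32 size (4) and only two 8x8 rows left at the bottom edge, the sketch falls back to 16x16 (2).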

01 Sep, 2013 - 1 commit

Fix 32x32 forward transform SSE2 version · 3cf46fa5

Jingning Han authored 11 years ago

This commit fixes a potential overflow issue in the SSE2
implementation of the 32x32 forward DCT. It resolves the corrupted
coded frames at the borders of scenes.

Change-Id: If87eef2d46209269f74ef27e7295b6707fbf56f9

30 Aug, 2013 - 1 commit

Use correct bit cost while static-thresh is on · 0ca7855f

Yunqing Wang authored 11 years ago

While static-thresh is on, we only need to transmit the skip
flag if skip = 1. The cost of the skip bit is added to the
total rate cost.

Change-Id: I64e73e482bc297eba22907026298a15fa8cc3920

29 Aug, 2013 - 5 commits

Added per pixel inter rd hit count stats · 1f4bf79d

Paul Wilkins authored 11 years ago

Added some code to output normalized rd hit count stats.
In effect, this approximates the average number of rd
operations/tests per pixel for the sequence.

The results are not exact: I have not bothered to account
for partial SB64s at frame edges or for key frames.
However, they do give some idea of the number of modes /
prediction methods being tested for each pixel across the
different partition sizes. This indicates how much scope there
is for further gains, either by reducing the number of partitions
examined or the modes per partition through heuristics.

Patch 3 moved the place where the count is incremented, so partial rd
tests that are aborted with an INT_MAX return are also counted.

Example numbers for the first 50 frames of Akiyo:
Speed 0 ~84.4 rd operations / pixel
Speed 1 ~28.8
Speed 2 ~11.9

Change-Id: Ib956e787e12f7fa8b12d3a1a2f6cda19a65a6cb8
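
The normalization can be sketched as follows, assuming the statistic is the total number of counted rd tests divided by the number of pixels coded (frames x width x height). The function name is hypothetical. As the message notes, the real numbers ignore partial SB64s and key frames.

```c
#include <stdint.h>

/* Approximate rd tests per pixel: counted rd operations / pixels coded. */
static double rd_ops_per_pixel(uint64_t rd_hits, int frames, int width, int height) {
  return (double)rd_hits / ((double)frames * width * height);
}
```

For a hypothetical 352x288 clip of 50 frames, 10137600 counted tests give 2.0 tests per pixel.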

consistently name VP9_COMMON variables #3 · d765df27

James Zern authored 11 years ago

stragglers

Change-Id: Ib1e853f9a331b7b66639dc34d79568d84d1930f1

consistently name VP9_COMMON variables #1 · 924d7451

James Zern authored 11 years ago

pc -> cm

Change-Id: If3e83404f574316fdd3b9aace2487b64efdb66f3

Fix overflow issue in SSSE3 32x32 quantization · abff6788

Jingning Han authored 11 years ago

In the 32x32 quantization process, intermediate values can exceed the
16-bit range, thereby causing an enc/dec mismatch. This commit fixes
this overflow issue in the SSSE3 implementation, as well as the
prototype, of 32x32 quantization.

This fixes issue 607 from webm@googlecode.

Change-Id: I85635e6ca236b90c3dcfc40d449215c7b9caa806

Fixed potential overflows · aaa7b444

Yaowu Xu authored 11 years ago

The two arrays are typically initialized to INT64_MAX; if they are not
filled with valid values before the addition, the sum can overflow
and lead to wrong results.

Change-Id: I515de22cf3e8f55af4b74bdb2c8eb821a02d3059
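
One way to make such an addition safe is a saturating add that treats INT64_MAX as "not filled in". This is a sketch under the assumption that the costs are non-negative. The helper name is hypothetical, and the actual fix may instead check that the entries are valid before adding them.

```c
#include <stdint.h>

/* Add two non-negative rd costs where INT64_MAX means "not filled in yet".
 * The result saturates at INT64_MAX instead of overflowing, so an unset
 * entry can never wrap around and look like a very cheap (negative) cost. */
static int64_t add_rd_costs(int64_t a, int64_t b) {
  if (a > INT64_MAX - b) /* a + b would exceed INT64_MAX (valid for b >= 0) */
    return INT64_MAX;
  return a + b;
}
```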

28 Aug, 2013 - 3 commits

General code cleanup. · b62ddd5f

Dmitry Kovalev authored 11 years ago

Switching from mi_{width, height}_log2 and b_{width, height}_log2 to
num_8x8_blocks_{wide, high} and num_4x4_blocks_{wide, high}. Removing
redundant code, adding const.

Change-Id: Iaab2207590fd24d0b76999071778d1395dc5cd5d
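
The switch from log2 values to block counts can be sketched with two small tables. The enum order assumes VP9's BLOCK_4X4 ... BLOCK_64X64 ordering. The table and function names are hypothetical stand-ins for the num_8x8_blocks_* / num_4x4_blocks_* lookups named in the message. Sub-8x8 blocks count as one 8x8 unit.

```c
/* Block sizes, assumed to follow VP9's BLOCK_4X4 ... BLOCK_64X64 order. */
typedef enum {
  B4X4, B4X8, B8X4, B8X8, B8X16, B16X8, B16X16,
  B16X32, B32X16, B32X32, B32X64, B64X32, B64X64, NUM_BSIZES
} BSize;

/* Old style: log2 of the width in 8x8 units, expanded with a shift. */
static const int mi_width_log2_tab[NUM_BSIZES] = { 0, 0, 0, 0, 0, 1, 1,
                                                   1, 2, 2, 2, 3, 3 };
/* New style: the block counts themselves, read directly. */
static const int num_8x8_wide_tab[NUM_BSIZES] = { 1, 1, 1, 1, 1, 2, 2,
                                                  2, 4, 4, 4, 8, 8 };
static const int num_4x4_wide_tab[NUM_BSIZES] = { 1, 1, 2, 2, 2, 4, 4,
                                                  4, 8, 8, 8, 16, 16 };

static int width_in_8x8_old(BSize b) { return 1 << mi_width_log2_tab[b]; }
static int width_in_8x8_new(BSize b) { return num_8x8_wide_tab[b]; }
static int width_in_4x4(BSize b) { return num_4x4_wide_tab[b]; }
```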

Adds a speed feature for fast 1-loop forw updates · e02dc84c

Deb Mukherjee authored 11 years ago

Incorporates a speed feature for fast forward updates of
coefficients. This feature takes 3 values:
0 - use standard 2-loop version
1 - use a 1-loop version
2 - use a 1-loop version with reduced updates

Results: derfraw300 +0.007% (on speed 0) at feature value = 1
                    -0.160% (on speed 0) at feature value = 2

There is a substantial speed-up at speeds 2 and above for low-resolution
sequences, where the entropy updates are a big part of the overall
computation.

Change-Id: Ie96fc50777088a5bd441288bca6111e43d03bcae

Renaming txfm_size to tx_size. · 851a2fd7

Dmitry Kovalev authored 11 years ago

Change-Id: I752e374867d459960995b24d197301d65ad535e3

27 Aug, 2013 - 4 commits

Adding get_entropy_context function. · a93992e7

Dmitry Kovalev authored 11 years ago

Moving common code from encoder and decoder to this function.

Change-Id: I60fa643fb1ddf7ebbff5e83b6c4710137b0195ef

Renaming BLOCK_SIZE_TYPE to BLOCK_SIZE in the encoder. · 7b95f9bf

Dmitry Kovalev authored 11 years ago

Change-Id: I62bb07c377f947cb72fac68add7a6b199e42c6b9

Fix buf alignment in sub8x8 comp inter-inter pred · 2d6aadd7

Jingning Han authored 11 years ago

This commit resolves a misalignment issue in compound inter-inter
prediction for sub8x8 blocks. This patch follows a solution from dkovalev@.

Change-Id: I3cc0cf7e55b84110e0c42ef4b2e6ca7ac3f8f932

fixed the reading too many bytes · 9482c079

Yaowu Xu authored 11 years ago

In the subpel_avg_variance functions, code similar to the following

punpckldq m2, [addr]

actually reads 8 bytes. For functions that are supposed to work on
buffers with fewer than 8 bytes per line, this caused a valgrind error
for reading uninitialized memory.

Change-Id: I2a4c079dbdbc747829bd9e2ed85f0018ad2a3a34
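
The usual way to avoid the over-read is to load exactly 4 bytes per row, as `movd` does, and then interleave the two rows in registers. The sketch below uses SSE2 intrinsics and is x86 only. The helper name is hypothetical, and the actual patch is in assembly and may differ.

```c
#include <emmintrin.h> /* _mm_cvtsi32_si128 (movd), _mm_unpacklo_epi32 (punpckldq) */
#include <stdint.h>
#include <string.h>

/* Load two 4-byte rows without touching any byte past the 4th in either row. */
static __m128i load_two_rows_4bytes(const uint8_t *row0, const uint8_t *row1) {
  int32_t a, b;
  memcpy(&a, row0, 4); /* exactly 4 bytes per row */
  memcpy(&b, row1, 4);
  /* Register-to-register interleave: [row0, row1, 0, 0] as 32-bit lanes. */
  return _mm_unpacklo_epi32(_mm_cvtsi32_si128(a), _mm_cvtsi32_si128(b));
}
```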

26 Aug, 2013 - 3 commits

Cleaning up model_rd_for_sb_y_tx. · 657ee2d7

Dmitry Kovalev authored 11 years ago

Removing references to plane_block_width and plane_block_height (we are
going to delete these).

Change-Id: I7982da4d373aebb54d2209dc8886f6192df4d287

Using num_8x8_* lookup tables instead of mi_*_log2. · b25589c6

Dmitry Kovalev authored 11 years ago

Change-Id: I8a246b3d056c98be614d05a90bc261e2441ffc10

Fix the reading of too many input pixels · 6c5433c8

Yaowu Xu authored 11 years ago

in VP9_get4x4var_mmx

Change-Id: I4b4a8f45f25ebdfad281f169cc87aba5e2d6f227

24 Aug, 2013 - 2 commits

cosmetics: strip 'VP9_' from defines in vp9 only code · c8ba8c51

James Zern authored 11 years ago

Change-Id: I481d9bb2fa3ec72b6a83d5f04d545ad8013f295c

Renaming D27 to D207. · 50ee61db

Dmitry Kovalev authored 11 years ago

I've already renamed d27_predictor to d207_predictor but forgot about the
corresponding constant.

Change-Id: Id312aa80fc5b5a1ab8a709a33418a029552a6857

23 Aug, 2013 - 6 commits

Limit mv range to be based on partition size · 13930cf5

Yaowu Xu authored 11 years ago

Previous change c4048dbd limited the mv search range assuming a maximum
block size of 64x64; this commit changes the search range to use the
actual block size instead.

Change-Id: Ibe07ab02b62bf64bd9f8675d2b997af20a2c7e11
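
The column limits can be sketched for a block at `mi_col` that is `bw_mi` 8x8 units wide. The border of 160 pixels and the interpolation extension of 4 pixels are assumed placeholder values, and the function names are hypothetical. The sketch shows only why the block width matters. If every block is assumed to be 64x64 (`bw_mi = 8`), small blocks near the right edge get a tighter limit than they need.

```c
/* Assumed placeholder constants: border around each reference frame and the
 * extra pixels read by the sub-pixel interpolation filter. */
#define BORDER_PX 160
#define INTERP_EXTEND 4

/* Leftmost full-pixel MV column offset that stays inside the extended frame. */
static int mv_col_min(int mi_col) {
  return -(mi_col * 8 + BORDER_PX - INTERP_EXTEND);
}

/* Rightmost offset; uses the actual block width instead of always 64 pixels. */
static int mv_col_max(int mi_cols, int mi_col, int bw_mi) {
  return (mi_cols - mi_col - bw_mi) * 8 + (BORDER_PX - INTERP_EXTEND);
}
```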

Cleanup in mvref_common.{h, c}. · 21d8e859

Dmitry Kovalev authored 11 years ago

Making code more compact, adding consts, removing redundant arguments,
adding do/while(0) for macros.

Change-Id: Ic9ec0bc58cee0910a5450b7fb8cfbf35fa9d0d16
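
The do/while(0) idiom mentioned above wraps a multi-statement macro body so that the body behaves as one statement. `ADD_TO_LIST` and `append_if` are hypothetical and only illustrate the idiom. They are not the mvref_common macros.

```c
/* Append val to list if fewer than max entries are stored. */
#define ADD_TO_LIST(list, count, max, val) \
  do {                                     \
    if ((count) < (max))                   \
      (list)[(count)++] = (val);           \
  } while (0)

static int append_if(int cond, int *list, int *count, int max, int val) {
  if (cond)
    ADD_TO_LIST(list, *count, max, val); /* with a bare 'if' body the 'else'
                                            below would bind to the macro's
                                            inner 'if'; with a braced body the
                                            ';' before 'else' would not compile */
  else
    return -1;
  return *count;
}
```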

Added border extension · 656632b7

Yaowu Xu authored 11 years ago

To the source buffer to be encoded as an alt ref frame. This is to fix
the problem of using uninitialized memory in the encoder.

See https://code.google.com/p/webm/issues/detail?id=605

Change-Id: I97618a2fc207e08abcf5301b734aa9e3ad695e2c
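
Border extension replicates the outermost pixels into the border, so any read past the visible area sees defined values. The sketch below covers a single 8-bit plane. The function name and the layout are assumptions: `buf` points at the top-left visible pixel, and `stride` includes both side borders. It is not the libvpx routine.

```c
#include <stdint.h>
#include <string.h>

/* Replicate edge pixels of a w x h plane into a border of b pixels per side. */
static void extend_plane(uint8_t *buf, int stride, int w, int h, int b) {
  /* Left and right borders: repeat the first/last pixel of each row. */
  for (int y = 0; y < h; ++y) {
    uint8_t *row = buf + y * stride;
    memset(row - b, row[0], b);
    memset(row + w, row[w - 1], b);
  }
  /* Top and bottom borders: copy the (already extended) first/last rows. */
  const uint8_t *top = buf - b;
  const uint8_t *bottom = buf + (h - 1) * stride - b;
  for (int y = 1; y <= b; ++y) {
    memcpy(buf - y * stride - b, top, w + 2 * b);
    memcpy(buf + (h - 1 + y) * stride - b, bottom, w + 2 * b);
  }
}
```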

Changes to adaptive inter rd thresholds. · aa5b67ad

Paul Wilkins authored 11 years ago

Values are now carried over from frame to frame.
Changed the algorithm for decreasing the threshold after
a hit, and the max threshold (now based on speed).

Removed some old commented-out code relating to
VP8 adaptive thresholds.

The impact of these changes tested on Akiyo (50 frames)
and measured in terms of unit rd hits is as follows:

Speed 0 84.36 -> 84.67
Speed 1 29.48 -> 22.22
Speed 2 11.76 -> 8.21
Speed 3 12.32 -> 7.21

Encode speed impact is broadly in line with these.

Change-Id: I5b886efee3077a11553fa950d796fd6d00c8cb19
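
A generic adaptive-threshold update can be sketched as follows. It assumes a per-mode factor in which a larger value makes the mode more likely to be skipped. The factor would live in persistent encoder state, so it carries over from frame to frame. The 1/8 decrease on a hit, the increment of 1, and the cap of 32 per speed level are all hypothetical values. The commit states only that the decrease rule changed and that the cap now depends on speed.

```c
/* Hypothetical constants: only the speed dependence of the cap is from the commit. */
#define THRESH_INC 1
#define THRESH_CAP_PER_SPEED 32

/* Update one mode's threshold factor after a block has been coded.
 * was_best: the mode won the rd search for this block (a "hit"). */
static int update_thresh_fact(int fact, int was_best, int speed) {
  if (was_best)
    return fact - (fact >> 3);      /* hit: shrink by 1/8, mode tested more often */
  const int cap = speed * THRESH_CAP_PER_SPEED; /* speed-dependent maximum */
  fact += THRESH_INC;               /* miss: grow slowly */
  return fact > cap ? cap : fact;
}
```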

Limit Key frame Intra modes checks. · f76f52df

Paul Wilkins authored 11 years ago

Most of the focus so far has been on inter frames.

At high speed settings the key frame is now taking a high %
of the cycles.

This patch puts in some masking to reduce the number
of INTRA modes searched during key frame coding (as already
happens for inter frames) at higher speed settings.

TODO: Develop this further with either adaptive rd thresholds
when choosing which intra modes to consider or some other
heuristic.

Impact.
At high speed settings on some clips the key frame was starting
to dominate. In a coding of the first 50 frames of AKIYO at speed
2 limiting the key frame intra modes to DC or TM_PRED resulted in
~30% overall speedup. For Bus the number was lower at ~4-5%.

Change-Id: I7bde68aee04995f9d9beb13a1902143112e341e2
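
Mode masking can be sketched as a bitmask over the intra modes that the search loop consults before each rd test. The enum lists VP9's ten intra modes in the order we recall, but the sketch does not depend on that order. The mask names and `kf_modes_searched` are hypothetical.

```c
/* VP9 intra prediction modes (ten in total). */
typedef enum { DC_PRED, V_PRED, H_PRED, D45_PRED, D135_PRED, D117_PRED,
               D153_PRED, D207_PRED, D63_PRED, TM_PRED, INTRA_MODE_COUNT } IntraMode;

/* Hypothetical high-speed key frame mask: only DC_PRED and TM_PRED. */
#define KF_MODES_DC_TM ((1u << DC_PRED) | (1u << TM_PRED))
#define KF_MODES_ALL ((1u << INTRA_MODE_COUNT) - 1u)

/* Count how many intra modes a key frame search would evaluate under mask. */
static int kf_modes_searched(unsigned mask) {
  int searched = 0;
  for (int m = DC_PRED; m < INTRA_MODE_COUNT; ++m) {
    if (!(mask & (1u << m)))
      continue;  /* masked out: skip the rd test for this mode */
    ++searched;  /* the rd evaluation of mode m would go here */
  }
  return searched;
}
```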

Fix rectangular partition check flag · 84f3b76e

Jingning Han authored 11 years ago

Make the rectangular partition check flag change according to the rd
costs of the NONE and SPLIT partition types, under the speed feature.

Change-Id: If681e1e078a8d43d86961ea4b748da5cd1b6c331

22 Aug, 2013 - 8 commits

vp9_encodeframe.c cleanup. · 604022d4

Dmitry Kovalev authored 11 years ago

Removing unused get_sbuv_perpixel_variance function, using has_second_ref/
is_inter_block functions, organizing includes.

Change-Id: I016de4af12fbbb8b4ece26a70759b2392651b095

check_bsize_coverage cleanup. · 335b1d36

Dmitry Kovalev authored 11 years ago

Change-Id: Ib7803857b35c00e317c9deb8630e777e25eb278f

Checking scale factors on access. · 3c426572

Dmitry Kovalev authored 11 years ago

It is possible to have invalid scale factors that are never accessed
during decoding. An error is reported only if we actually try to use
invalid scale factors.

Change-Id: Ie532d3ea7325ee0c7a6ada08269f804350c80fdf
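
Checking on access can be sketched by marking factors invalid at setup time and returning an error only when they are used. All names here are hypothetical. The Q14 fixed-point format is an assumption. The validity rule (reference at most 2x larger and at most 16x smaller) follows the VP9 bitstream specification, and the commit does not state it. `is_scaled` simply reports whether the factors differ from 1:1.

```c
#include <stdint.h>

#define SCALE_SHIFT 14
#define NO_SCALE (1 << SCALE_SHIFT) /* 1.0 in Q14 fixed point */
#define INVALID_SCALE (-1)          /* hypothetical marker for unusable factors */

typedef struct {
  int x_scale_fp; /* reference width / frame width, Q14 */
  int y_scale_fp; /* reference height / frame height, Q14 */
} ScaleFactors;

/* Compute factors, or mark them invalid instead of failing right away. */
static void setup_scale(ScaleFactors *sf, int ref_w, int ref_h, int w, int h) {
  if (2 * w < ref_w || 2 * h < ref_h || w > 16 * ref_w || h > 16 * ref_h) {
    sf->x_scale_fp = sf->y_scale_fp = INVALID_SCALE;
    return;
  }
  sf->x_scale_fp = (ref_w << SCALE_SHIFT) / w;
  sf->y_scale_fp = (ref_h << SCALE_SHIFT) / h;
}

static int is_scaled(const ScaleFactors *sf) {
  return sf->x_scale_fp != NO_SCALE || sf->y_scale_fp != NO_SCALE;
}

/* Checked on access: only an actual use of invalid factors is an error. */
static int scale_x(const ScaleFactors *sf, int x, int *out) {
  if (sf->x_scale_fp == INVALID_SCALE)
    return -1; /* report the error here, not at setup time */
  *out = (int)(((int64_t)x * sf->x_scale_fp) >> SCALE_SHIFT);
  return 0;
}
```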

rename LOG2_* defines to *_LOG2 · 40ae02c2

James Zern authored 11 years ago

gets rid of a mix of styles

Change-Id: I3591d312157bc6f53a25438bf047765c671fd8a8

vp9/encoder: fix last_frame_seg_map mem leak · a5726ac4

James Zern authored 11 years ago

remove the duplicate allocation from vp9_create_compressor; it was added
to vp9_alloc_frame_buffers in:

d5bec522 Added resizing & initialization of last frame segment map

Change-Id: I996723226a16a62aff8f9a52ac74e0b73cc98fdf

Adding vp9_is_scaled function. · 640dea4d

Dmitry Kovalev authored 11 years ago

Change-Id: Ieb7077ca3586b9491912027eed450a4f6fd38d30

Refactor rd_pick_partition for parameter control · 01a37177

Jingning Han authored 11 years ago

This commit changes the partition search order of superblocks from
{SPLIT, NONE, HORZ, VERT} to {NONE, SPLIT, HORZ, VERT} for
consistency with that of the sub8x8 partition search. It enables the use
of early termination in partition search for all block sizes.

For ped_area_1080p, 50 frames coded at 4000 kbps, the runtime goes
down from 844305 ms to 818003 ms (a 3% speed-up) at speed 0.

This is a further step towards making the in-search partition types
configurable, hence unifying various speed-up approaches.

Some speed 1 and 2 features are turned off during the refactoring
process, including:
disable_split_var_thresh
using_small_partition_info

Stricter constraints are applied to use_square_partition_only for
right/bottom boundary blocks. These features will be brought back or
refined subsequently. At this point, the derf set at speed 1 is about
0.45% higher in compression performance, with 9% lower run-time.

Change-Id: I3db9f9d1d1a0d6cbe2e50e49bd9eda1cf705f37c
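
Trying NONE first allows early termination. If NONE is already cheap enough, the remaining types are never searched. In this sketch the costs are precomputed in an array for brevity, whereas a real search computes each cost only when it is reached. The early-termination threshold and all names are hypothetical.

```c
#include <stdint.h>

typedef enum { PART_NONE, PART_SPLIT, PART_HORZ, PART_VERT, PART_TYPES } PartType;

/* rd[p] is the rd cost of partition type p; *searched counts types examined. */
static PartType pick_partition(const int64_t rd[PART_TYPES],
                               int64_t early_term_rd, int *searched) {
  static const PartType order[PART_TYPES] = { PART_NONE, PART_SPLIT,
                                              PART_HORZ, PART_VERT };
  PartType best = PART_NONE;
  int64_t best_rd = INT64_MAX;
  *searched = 0;
  for (int i = 0; i < PART_TYPES; ++i) {
    const PartType p = order[i];
    ++*searched;
    if (rd[p] < best_rd) {
      best_rd = rd[p];
      best = p;
    }
    if (p == PART_NONE && rd[p] < early_term_rd)
      break; /* early termination: possible because NONE is tried first */
  }
  return best;
}
```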

Fixes on feature disabling split based on variance · 8b810c7a

Deb Mukherjee authored 11 years ago

Adds a couple of minor fixes, which may be absorbed in Jingning's
patch. Thanks to Guillaume for pointing these out.
Also adjusts the thresholds for speed 1 and 2 to 16 and 32
respectively, to keep quality drops small.

Results:
--------
derfraw300:  threshold = 16, psnr -0.082%, speedup 2-3%
             threshold = 32, psnr -0.218%, speedup 5-6%
stdhdraw250: threshold = 16, psnr -0.031%, speedup 2-3%
             threshold = 32, psnr -0.273%, speedup 5-6%

Change-Id: I4b11ae8296cca6c2a9f644be7e40de7c423b8330
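
The feature can be sketched as a block variance check against the speed-dependent threshold. The message gives the thresholds of 16 at speed 1 and 32 at speed 2. Everything else is an assumption: which pixels the variance is measured on, the per-pixel normalization, the feature being off at other speeds, and the function names.

```c
#include <stdint.h>

/* Integer per-pixel variance of an n x n block: (sum of squares - sum^2 / N) / N. */
static unsigned block_variance(const uint8_t *src, int stride, int n) {
  int64_t sum = 0, sse = 0;
  for (int r = 0; r < n; ++r)
    for (int c = 0; c < n; ++c) {
      const int v = src[r * stride + c];
      sum += v;
      sse += v * v;
    }
  return (unsigned)((sse - sum * sum / (n * n)) / (n * n));
}

/* Thresholds from the message: 16 at speed 1, 32 at speed 2 (others: off). */
static int split_disabled(unsigned variance, int speed) {
  const unsigned thresh = speed == 1 ? 16u : speed == 2 ? 32u : 0u;
  return variance < thresh; /* flat enough: do not search SPLIT */
}
```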
