Commits · 5cfd82bcaf666c1b9cacd8a4899fc703598aa5b0 · BC / public / external / libvpx

07 Feb, 2013 - 2 commits

Use fdct8x4 instead of fdct4x4 where the block size allows it. · 5cfd82bc
Ronald S. Bultje authored 12 years ago
```
This allows for faster SIMD implementations in the future (currently
there is no speed impact).

Change-Id: I732647e9148b5dcb44e6bc8728138f0141218329
```
5cfd82bc

Use configure checks for various inline keywords. · aac73df1
Ronald S. Bultje authored 12 years ago
```
Change-Id: I8508f1a3d3430f998bb9295f849e88e626a52a24
```
aac73df1

06 Feb, 2013 - 4 commits

Add sse2 versions of sub_pixel_variance{32x32,64x64}. · a788e0fe
Ronald S. Bultje authored 12 years ago
```
7.5% faster overall encoding.

Change-Id: Ie9bb7f9fdf93659eda106404cb342525df1ba02f
```
a788e0fe

Ronald S. Bultje authored 12 years ago

Indentation was off by 2 spaces for this particular block.

Change-Id: I1e587b7ad3eff77ade5521252d20c7bb2daa0f6d

55cafb61

Eliminate tautology · 31cbe2ed

John Koleszar authored 12 years ago

      Unreachable code
  that does nothing anyway
      removed forever.

Change-Id: I14105d2dd9dbc9d558f36464055e350dbeb45488

31cbe2ed

Fix mismatch after merge of the tiling patch. · 278df745
Ronald S. Bultje authored 12 years ago
```
Change-Id: I8ecc178b4d4069e721c7fec6d7631c00e4a3e5d5
```
278df745

05 Feb, 2013 - 5 commits

[WIP] Add column-based tiling. · 1407bdc2

Ronald S. Bultje authored 12 years ago

This patch adds column-based tiling. The idea is to make each tile
independently decodable (after reading the common frame header) and
also independently encodable (minus within-frame cost adjustments in
the RD loop) to speed up hardware & software en/decoders that use
multi-threading. Column-based tiling has the added advantage (over
other tiling methods) that it minimizes realtime use-case latency,
since all threads can start encoding data as soon as the first SB-row
worth of data is available to the encoder.

There is some test code that does random tile ordering in the decoder,
to confirm that each tile is indeed independently decodable from other
tiles in the same frame. At tile edges, all contexts assume default
values (i.e. 0, 0 motion vector, no coefficients, DC intra4x4 mode),
and motion vector search and ordering do not cross tiles in the same
frame.

Tile independence is not maintained between frames ATM, i.e. tile 0 of
frame 1 is free to use motion vectors that point into any tile of frame
0. We support 1 (i.e. no tiling), 2 or 4 column-tiles.

The loopfilter crosses tile boundaries. I discussed this briefly with Aki
and he says that's OK. An in-loop loopfilter would need to do some sync
between tile threads, but that shouldn't be a big issue.

Results: with tiling disabled, we go up slightly because of improved edge
use in the intra4x4 prediction. With 2 tiles, we lose about 1% on derf,
~0.35% on HD and ~0.55% on STD/HD. With 4 tiles, we lose another ~1.5%
on derf, ~0.77% on HD and ~0.85% on STD/HD. Most of this loss is
concentrated in the low-bitrate end of clips, and most of it is because
of the loss of edges at tile boundaries and the resulting loss of intra
predictors.

TODO:
- more tiles (perhaps allow row-based tiling also, and max. 8 tiles)?
- maybe optionally (for EC purposes), motion vectors themselves
  should not cross tile edges, or we should emulate such borders as
  if they were off-frame, to limit error propagation to within one
  tile only. This doesn't have to be the default behaviour but could
  be an optional bitstream flag.

Change-Id: I5951c3a0742a767b20bc9fb5af685d9892c2c96f

1407bdc2
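
The column split described in the tiling commit above can be sketched as follows. This is a minimal illustration, not code from libvpx: the helper names `tile_col_start` and `same_tile` are hypothetical, and the even integer split is an assumption (the commit only states that 1, 2 or 4 column tiles are supported and that contexts fall back to defaults at tile edges).

```c
#include <assert.h>

/* Hypothetical helper (not from libvpx): first superblock column of tile
 * `tile_idx` when `sb_cols` columns are split into `n_tiles` column tiles
 * of near-equal width. tile_idx == n_tiles yields the end of the last tile. */
static int tile_col_start(int tile_idx, int n_tiles, int sb_cols) {
  assert(n_tiles == 1 || n_tiles == 2 || n_tiles == 4);
  assert(tile_idx >= 0 && tile_idx <= n_tiles);
  return (tile_idx * sb_cols) / n_tiles;
}

/* Returns 1 if both superblock columns fall in the same tile. A neighbour
 * in another tile would be treated as unavailable, so its context takes
 * the default value. */
static int same_tile(int col_a, int col_b, int n_tiles, int sb_cols) {
  int t;
  for (t = 0; t < n_tiles; ++t) {
    const int start = tile_col_start(t, n_tiles, sb_cols);
    const int end = tile_col_start(t + 1, n_tiles, sb_cols);
    const int a_in = col_a >= start && col_a < end;
    const int b_in = col_b >= start && col_b < end;
    if (a_in || b_in) return a_in && b_in;
  }
  return 0; /* neither column lies inside the frame */
}
```

With 30 superblock columns and 4 tiles, this split starts the tiles at columns 0, 7, 15 and 22, so columns 6 and 7 sit on opposite sides of a tile edge.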

Add SSE3 versions for sad{32x32,64x64}x4d functions. · 58c983d1
Ronald S. Bultje authored 12 years ago
```
Overall encoding about 15% faster.

Change-Id: I176a775c704317509e32eee83739721804120ff2
```
58c983d1

rewrite 4x4 idct and fdct · fa36981e

Yaowu Xu authored 12 years ago

This commit changes the 4x4 iDCT to use same algorithm & constants as
other iDCTs. The 4x4 fDCT is also changed to be based on the new iDCT.

Change-Id: Ib1a902693228af903862e1f5a08078c36f2089b0

fa36981e

Change definition of NearestMV. · 81043e8d

Paul Wilkins authored 12 years ago

This commit makes the NearestMV match the chosen
best reference MV. It can be a 0,0 or non zero vector
which means the compound nearest mv mode can
combine a 0,0 and a non zero vector.

Change-Id: I2213d09996ae2916e53e6458d7d110350dcffd7a

81043e8d

Added vp9_short_idct1_32x32_c · 5780c4cb

Scott LaVarnway authored 12 years ago

and called this function in vp9_dequant_idct_add_32x32_c when
eob == 1.  For the test clip used, the decoder performance improved
by 21+%.  Based on Yaowu's 16 point idct work.

Change-Id: Ib579a90fed531d45777980e04bf0c9b23c093c43

5780c4cb
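
The eob == 1 shortcut in the commit above relies on the fact that a block whose only nonzero coefficient is the DC term inverse-transforms to a constant. A rough sketch of that idea follows; it assumes an orthonormal N x N DCT (so the constant is dc / N), whereas the scaling and rounding used by libvpx may differ. The name `dc_only_idct_add` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clamp_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical DC-only path: add the constant residual implied by `dc`
 * to an n x n prediction block. Assumes orthonormal DCT scaling, under
 * which every output sample equals dc / n. */
static void dc_only_idct_add(int dc, const uint8_t *pred, uint8_t *dst,
                             int stride, int n) {
  int r, c, offset;
  assert(n > 0 && stride >= n);
  /* Divide by n, rounding to nearest with halves away from zero. */
  offset = dc >= 0 ? (dc + n / 2) / n : -((-dc + n / 2) / n);
  for (r = 0; r < n; ++r) {
    for (c = 0; c < n; ++c) {
      dst[r * stride + c] = clamp_pixel(pred[r * stride + c] + offset);
    }
  }
}
```

This replaces a full 32x32 transform with one division and 1024 additions, which is consistent with the kind of speed-up the commit reports for blocks that carry only a DC coefficient.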

04 Feb, 2013 - 3 commits

Re-factor code for rd thresholds. · 3ab53876

Paul Wilkins authored 12 years ago

Separate out code to set the main encode speed
related rd thresholds. Some values changed from
the initial defaults for various new modes.

Quality test results pending but even the addition
of some further non-zero defaults helps encode speed
somewhat in limited testing on derf clips.

Adjustment of thresholds for quality / speed tradeoff
to follow.

Change-Id: I117ee473157e151a1b93193d5f393449328de20d

3ab53876

re-write 8 point idct · 1eb79dc1

Yaowu Xu authored 12 years ago

to be consistent with idct16 and idct32.

Change-Id: Ie89dbd32b65c33274b7fecb4b41160fcf1962204

1eb79dc1

a couple of minor fixes · ccaaeb4b

Yaowu Xu authored 12 years ago

fixed function prototypes to prevent compiler warnings;
removed a function not in use;
un-capitalized "Refstride" to ref_stride

Change-Id: Ib4472b6084f357d96328c6a06e795b6813a9edba

ccaaeb4b

01 Feb, 2013 - 1 commit

Changes 16 point idct · 91e0e801

Yaowu Xu authored 12 years ago

This commit changes the inverse 16 point dct to use the same algorithm
as the one for 32 point idct. In fact, the 16 point idct now uses the exact
source code of the even portion of the 32 point idct.

Tests showed the current implementation has significantly better accuracy
than the previous version. With this implementation and the minor bug
fix on forward 16 point dct, encoding tests showed about 0.2% better
compression on the CIF set; test results on the std-hd setting are pending.

Change-Id: I68224b60c816ba03434e9f08bee147c7e344fb63

91e0e801

31 Jan, 2013 - 2 commits

fix a small bug in 16 point forward dct · ab1cad9b

Yaowu Xu authored 12 years ago

The commit fixes a minor error in 16 point fdct wherein a rotation can
produce a result of -1 instead of 0.

Change-Id: I45aac4a52bcd06225c6d04e643547a13e1c1aade

ab1cad9b

A fix point implementation of 32x32 idct · 5149d7f7

Yaowu Xu authored 12 years ago

This commit changes the 32x32 idct to use integer arithmetic only. The
algorithm was taken directly from "A Fast Computational Algorithm for the
Discrete Cosine Transform" by W. Chen, et al., which was published in
IEEE Transactions on Communications, Vol. COM-25, No. 9, 1977. The signal
flow graph in the original paper is for a 32 point forward dct; the
current implementation of the inverse DCT was done by following the graph
in the reverse direction.

With this implementation, the 32 point inverse dct contains a 16 point
inverse dct in its even portion, similarly the 16 point idct further
contains 8 point and 4 point inverse dcts.

As of patch 4, encoding tests showed there is no compression loss when
compared against the floating point baseline. Numbers even showed very
small positives (cif: .01%, std-hd: .05%).

Change-Id: I2d2d17a424b0b04b42422ef33ec53f5802b0f378

5149d7f7
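
An integer-only transform of the kind described in the commit above is built from fixed-point rotations: each multiplication by a cosine becomes a multiplication by a scaled integer constant followed by a rounding shift. The sketch below shows one such butterfly for the angle pi/4. The 14-bit precision and the names `Q_BITS`, `COS_PI_4_Q14`, `round_shift` and `rotate_pi_4` are assumptions made for illustration, not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define Q_BITS 14          /* assumed fractional precision of the constants */
#define COS_PI_4_Q14 11585 /* round(cos(pi/4) * 2^14) */

/* Round-to-nearest division by 2^Q_BITS. Negative inputs are negated
 * first, so no negative value is ever right-shifted. */
static int round_shift(int64_t x) {
  const int64_t half = (int64_t)1 << (Q_BITS - 1);
  if (x >= 0) return (int)((x + half) >> Q_BITS);
  return -(int)((-x + half) >> Q_BITS);
}

/* Butterfly rotation by pi/4 in fixed point:
 *   out0 ~ (a + b) * cos(pi/4)
 *   out1 ~ (a - b) * cos(pi/4) */
static void rotate_pi_4(int a, int b, int *out0, int *out1) {
  assert(out0 && out1);
  *out0 = round_shift((int64_t)(a + b) * COS_PI_4_Q14);
  *out1 = round_shift((int64_t)(a - b) * COS_PI_4_Q14);
}
```

For a = b = 100 the exact values are 141.42 and 0, and the fixed-point butterfly returns 141 and 0. Widening to 64 bits before the multiply is a conservative choice for the sketch; a real implementation would bound the intermediate range instead.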

30 Jan, 2013 - 4 commits

don't code the branch for the predicted seg_id if that flag is false. · 3a4b18bc
Ronald S. Bultje authored 12 years ago
```
Change-Id: Icb6e21dc0c2d9918faa33c8bf70943660df7ad88
```
3a4b18bc

Default superblock skip flag to 32x32 for skip-blocks. · 3febf970

Ronald S. Bultje authored 12 years ago

This is identical to the later decisions made in encode_superblock().
This commit doesn't actually change anything, but makes the mbmi state
more consistent between the RD loop and the final encode result.

Change-Id: I9e735afb7c5a52e5b61728cb88c67ef9b9bf59be

3febf970

Reset skip flag in superblock RD loop. · b90996c5

Ronald S. Bultje authored 12 years ago

This is the superblock equivalent of commit 290b83ab.

Change-Id: Ib3945dd9e992fa9ec1fdea5a11e17a3cc0e37637

b90996c5

Write only visible area (for better comparison with rec.yuv). · 2f6fce3e
Ronald S. Bultje authored 12 years ago
```
Change-Id: I32bf4ee532a15af78619cbcd8a193224029fab50
```
2f6fce3e

29 Jan, 2013 - 3 commits

Fix block pointer corruption in intra8x8 prediction with 4x4 transform. · ffc2e4f4

Ronald S. Bultje authored 12 years ago

The RD loop would change the pointer after the first mode (DC) was tested,
leading to corrupt block objects being provided for the others. This
would essentially render the i8x8 predictor useless.

Change-Id: I16c5906ca64fb34878ac32ce59af8974e4582bb8

ffc2e4f4

Remove eob_max_offset markers. · 93762ca9

Paul Wilkins authored 12 years ago

Remove eob_max_offset markers and replace
with the generic skip_block flag to indicate
to the quantizer that all coeffs are to be set to 0
and the eob position set to 0.

Change-Id: Id477e8f8d4ec1a5562758904071013c24b76bfd7

93762ca9
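
The skip_block behaviour described in the commit above can be sketched with a toy quantizer. This is an illustration under stated assumptions: `quantize_block` is a hypothetical function using plain division by a step size, not one of the libvpx quantizers, and only the handling of the flag reflects the commit text.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar quantizer. Returns the eob (index of the last
 * nonzero quantized coefficient plus one). When skip_block is set, all
 * quantized coefficients are zero and the eob is 0. */
static int quantize_block(const int16_t *coeff, int16_t *qcoeff, int n,
                          int step, int skip_block) {
  int i, eob = 0;
  assert(n > 0 && step > 0);
  memset(qcoeff, 0, (size_t)n * sizeof(*qcoeff));
  if (skip_block) return 0;
  for (i = 0; i < n; ++i) {
    const int mag = coeff[i] < 0 ? -coeff[i] : coeff[i];
    const int q = mag / step; /* truncating quantization of the magnitude */
    qcoeff[i] = (int16_t)(coeff[i] < 0 ? -q : q);
    if (q) eob = i + 1;
  }
  return eob;
}
```

A single flag checked at the top of the quantizer removes the need for per-segment end-of-block markers, which appears to be the simplification the commit is after.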

Further improvement on compound inter-intra expt · 3b04d467

Deb Mukherjee authored 12 years ago

Adds a special combination mode specific to intra prediction
mode D45.

Current results with the compound inter/intra experiment:
derf: 0.2%
yt: 0.55%
std-hd: 0.75%
hd: 0.74%

Change-Id: I8976bdf3b9b0b66ab8c5c628bbc62c14fc72ca86

3b04d467

28 Jan, 2013 - 2 commits

Segment Skip Flag · 0ff9b033

Paul Wilkins authored 12 years ago

First step in simplifying the segment mode and
segment EOB flags into a simpler segment skip
flag that implies 0,0 mv and EOB at position 0.

Change-Id: Ib750cac31a7a02dc21082580498efd9f7d8d72a5

0ff9b033

Simplify Zero bin and zero bin run code. · 8e2c03fb

Paul Wilkins authored 12 years ago

Simplification to eliminate a number of very large
data structures. All zero run, zbin boosts for different
transform sizes are now limited to a maximum run length
of 15 before they max out the boost.

Some further work still needs to be done to refactor, rationalize
and optimize the multiple quantizer functions.

The simplification coupled with tweaks to the 16 element array
now used for all transform sizes, has minimal effect on quality.

Change-Id: I6f3948b8ca0418b60d4db9030ff19026a34ed423

8e2c03fb
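
The capped lookup described in the commit above (one 16-element array shared by all transform sizes, with the boost saturating at a zero run of 15) might look like the sketch below. The table values are placeholders invented for the example, and the names `zbin_boost_table` and `zbin_boost_for_run` are hypothetical.

```c
#include <assert.h>

#define MAX_ZERO_RUN 15

/* Placeholder boost values for illustration only; these are NOT the
 * values used by libvpx. */
static const int zbin_boost_table[MAX_ZERO_RUN + 1] = {
  0, 0, 0, 8, 8, 8, 10, 12, 14, 16, 20, 24, 28, 32, 36, 40
};

/* Zero-bin boost for a given run of preceding zero coefficients.
 * Runs longer than MAX_ZERO_RUN reuse the last table entry. */
static int zbin_boost_for_run(int zero_run) {
  assert(zero_run >= 0);
  return zbin_boost_table[zero_run > MAX_ZERO_RUN ? MAX_ZERO_RUN : zero_run];
}
```

Clamping the index is what lets one small table replace per-position tables sized to the largest transform.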

26 Jan, 2013 - 2 commits

Fix overread/write reported by valgrind if (mb_cols) & 3 != 0. · 9dc9f07f

Ronald S. Bultje authored 12 years ago

We'd backup and restore all cols for a 64x64 SB, but the array wouldn't
be big enough to hold all that data.

Change-Id: Ic68ea721bf07e0b2f3937bd16b0b734bcc743ce1

9dc9f07f
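
The overrun in the commit above occurs when the frame width in macroblocks is not a multiple of 4, since a 64x64 superblock spans four 16x16 macroblock columns. The commit does not say whether the fix enlarges the array or limits the copy; the sketch below shows only the first option, rounding the column count up to a whole number of superblocks. The names are hypothetical.

```c
#include <assert.h>

#define MBS_PER_SB64 4 /* a 64x64 superblock is four 16x16 macroblocks wide */

/* Round the macroblock column count up to a multiple of MBS_PER_SB64 so
 * that a per-column array can hold every column a superblock touches. */
static int aligned_mb_cols(int mb_cols) {
  assert(mb_cols > 0);
  return (mb_cols + MBS_PER_SB64 - 1) & ~(MBS_PER_SB64 - 1);
}
```

For example, a frame 45 macroblocks wide would need room for 48 columns under this scheme, while a width of 44 needs no padding.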

Adding a frame parallel decoding mode · dfd89f2e

Deb Mukherjee authored 12 years ago

Adds a flag to disable features that would inhibit frame parallel
decoding. This includes backward adaptation and MV sorting based
on search in ref frame buffer.

Also includes some minor clean-ups.

Change-Id: I434846717a47b7bcb244b37ea670c5cdf776f14d

dfd89f2e

25 Jan, 2013 - 2 commits

Added eob == 0 check to vp9_dequant_idct_add_32x32_c · 9d4c2653

Scott LaVarnway authored 12 years ago

Added a quick eob == 0 check.  Once the integer version of the dct32x32 is
complete, we can check for other eob cases.

For the 1080p clip used, the decoder performance improved by 4%.

Change-Id: I9390b6ed3c8be0c0c0a0c44c578d9a031d6e026e

9d4c2653
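
The eob == 0 check in the commit above amounts to an early exit: with no coefficients, the residual is zero and the reconstruction equals the prediction. A minimal sketch, assuming a hypothetical helper `try_skip_idct` that reports whether the caller still has to run the full inverse transform:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical early-exit helper. Returns 0 if the block was completed
 * here (eob == 0, so the output is just the prediction), or 1 if the
 * caller must still run the full dequantize + inverse transform path. */
static int try_skip_idct(const uint8_t *pred, uint8_t *dst, size_t n_pixels,
                         int eob) {
  assert(eob >= 0);
  if (eob != 0) return 1;
  memcpy(dst, pred, n_pixels); /* zero residual: copy prediction through */
  return 0;
}
```

The sketch treats the block as one contiguous run of pixels; a strided frame buffer would copy row by row instead.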

Remove "update_context" variable from VP9_COMP context. · 0a7b3953
Ronald S. Bultje authored 12 years ago
```
The variable is always zero.

Change-Id: Id5cdbecad543bca465a5b1d471badaec7e112c8d
```
0a7b3953

24 Jan, 2013 - 2 commits

Mvref speedup · fcb4a25c

Paul Wilkins authored 12 years ago

Quality / decode speed trade off changes.
Simpler insert method without sort. Quality impact small.

Change-Id: Id0c0941bc508d985405abd06a13ffe7489170b62

fcb4a25c

Adds an error-resilient mode with test · 01cafaab

Deb Mukherjee authored 12 years ago

Adds an error-resilient mode where decoding of frames can
continue even when there are errors (due to network losses)
on a prior frame. Specifically, backward updates are turned off
and probabilities of various symbols are reset to defaults at
the beginning of each frame. Further, the last frame's mvs are
not used for the mv reference list, and the sorting of the
initial list based on search on previous frames is turned off
as well.

Also adds a test where an arbitrary set of frames are skipped
from decoding to simulate errors. The test verifies (1) that if
the error frames are droppable - i.e. frame buffer updates have
been turned off - there are no mismatch errors for the remaining
frames after the error frames; and (2) if the error frames are
non-droppable, there are not only no decoding errors but the mismatch
PSNR between the decoder's version of the post-error frames and the
encoder's version is at least 20 dB.

Change-Id: Ie6e2bcd436b1e8643270356d3a930e8989ff52a5

01cafaab
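
The 20 dB criterion in the test described above can be checked without logarithms. For 8-bit samples, PSNR = 10 * log10(255^2 / MSE), so PSNR >= 20 dB is equivalent to MSE <= 255^2 / 100, or SSE * 100 <= 255^2 * n. The sketch below applies that inequality; `psnr_at_least_20db` is a hypothetical helper, not the code used by the actual test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the PSNR between two 8-bit buffers of n samples is at
 * least 20 dB, using the integer form SSE * 100 <= 255^2 * n.
 * Identical buffers (SSE == 0, infinite PSNR) pass. */
static int psnr_at_least_20db(const uint8_t *a, const uint8_t *b, size_t n) {
  uint64_t sse = 0;
  size_t i;
  assert(n > 0);
  for (i = 0; i < n; ++i) {
    const int d = (int)a[i] - (int)b[i];
    sse += (uint64_t)(d * d);
  }
  return sse * 100 <= (uint64_t)255 * 255 * n;
}
```

A uniform error of 10 per sample (MSE 100, about 28 dB) passes, while a uniform error of 30 (MSE 900, about 18.6 dB) fails.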

23 Jan, 2013 - 1 commit

Intrinsic version of loopfilter now matches C code · 6a997400

Scott LaVarnway authored 12 years ago

Updated the intrinsic code to match Yaowu's latest loopfilter change.
(I584393906c4f5f948a581d6590959522572743bb)

The decoder performance improved by ~30% for the test clip used.

Change-Id: I026cfc75d5bcb7d8d58be6f0440ac9e126ef39d2

6a997400

18 Jan, 2013 - 2 commits

Use alt-ref frame context for keyframes · 2f24ad9e

John Koleszar authored 12 years ago

This matches the behavior prior to generalizing the frame context
selection, and intuitively makes sense in that the first forward ref
is immediately after the keyframe, so its quality is improved a bit
by using the keyframe's entropy context rather than the default.

Change-Id: Ia82cef79382b9d8cfafdc44ba0533d4dc3e44053

2f24ad9e

a minor change to a portion of loop filtering · b95ed688

Yaowu Xu authored 12 years ago

The loop filtering used for MB edge or internal edge of a MB using 8x8
transform was reading 5 pixels each side and writing 3 pixels each side.
With suggestion from Aki and Scott on hardware & software performance,
this commit changed to read 4 pixels each side and write 3 pixels each
side.

Change-Id: I584393906c4f5f948a581d6590959522572743bb

b95ed688

16 Jan, 2013 - 5 commits

Preserve the previous golden frame on golden updates · 26bd81b9

John Koleszar authored 12 years ago

This commit restores the quality lost when the buffer-to-buffer copy
logic was removed. Note that this is specific to the current use of
golden frames and will need rework when RTC functionality is added.

Change-Id: I7324a75acd96eafd9e0f9b8633d782e390d5dc21

26bd81b9

Generalize and increase frame coding contexts · 4b65837b

John Koleszar authored 12 years ago

Previously there were two frame coding contexts tracked, one for normal
frames and one for alt-ref frames. Generalize this by signalling the
context to use in the bitstream, rather than tying it to the alt ref
refresh bit. Also increase the number of contexts available to 4, which
may be useful for temporal scalability.

Change-Id: I7b66daaddd55c535c20cd16713541fab182b1662

4b65837b
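
The commit above replaces two implicit contexts with four that are selected by an index carried in the bitstream. One way to picture this is a small bank of saved contexts, sketched below. The struct layout, the field `coef_probs` and the function names are placeholders for illustration; the real frame context holds far more state.

```c
#include <assert.h>

#define NUM_FRAME_CONTEXTS 4

/* Placeholder for the entropy state carried between frames. */
typedef struct {
  unsigned char coef_probs[8];
} frame_context;

typedef struct {
  frame_context saved[NUM_FRAME_CONTEXTS]; /* bank of stored contexts */
  frame_context current;                   /* context used for this frame */
} codec_state;

/* Load the context selected by the index read from the frame header. */
static void load_frame_context(codec_state *s, int idx) {
  assert(idx >= 0 && idx < NUM_FRAME_CONTEXTS);
  s->current = s->saved[idx];
}

/* Store the adapted context back into the selected slot. */
static void save_frame_context(codec_state *s, int idx) {
  assert(idx >= 0 && idx < NUM_FRAME_CONTEXTS);
  s->saved[idx] = s->current;
}
```

Because the index is explicit, an encoder could, for instance, dedicate one slot per temporal layer, which matches the temporal scalability use the commit mentions.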

Start to anonymize reference frames · da832a80

John Koleszar authored 12 years ago

Remove lst_fb_idx, gld_fb_idx, alt_fb_idx, refresh_last_frame,
refresh_golden_frame, refresh_alt_ref_frame from common. Gold/Alt are
encode side conventions. From the decoder's perspective, we want to be
dealing with numbered references.

Updates to active_ref 2 signal mode context switches, vestigial from
refresh_alt_ref_frame. This needs some clean up to make sense with
increased numbers of reference frames, as well as reimplementing the
swapping of alt/golden which was previously done using the
buffer-to-buffer copy mechanism removed in an earlier commit.

Change-Id: I7334445158b7666f9295d2a2dd22aa03f4485f58

da832a80

Update encoder to use fb_idx_ref_cnt · 394b0a6a

John Koleszar authored 12 years ago

Do reference counting the same way on the encoder as the decoder does,
rather than maintaining the 'flags' member of YV12_BUFFER_CONFIG.

Change-Id: I91dc210ffca081acaf9d5c09a06e7461b3c3139c

394b0a6a
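
Reference counting of frame buffers, as adopted by the encoder in the commit above, can be sketched as follows. The commit names the counter array fb_idx_ref_cnt; the helper names `retarget_ref` and `find_free_fb` and their exact behaviour are assumptions made for this example.

```c
#include <assert.h>

/* Point the reference slot *idx at buffer new_idx, releasing the buffer
 * it previously used. A slot value of -1 means "not yet assigned". */
static void retarget_ref(int *ref_cnt, int *idx, int new_idx) {
  assert(new_idx >= 0);
  if (*idx >= 0 && ref_cnt[*idx] > 0) --ref_cnt[*idx];
  *idx = new_idx;
  ++ref_cnt[new_idx];
}

/* Return the index of a buffer no reference points at, or -1 if all
 * n buffers are in use. */
static int find_free_fb(const int *ref_cnt, int n) {
  int i;
  for (i = 0; i < n; ++i) {
    if (ref_cnt[i] == 0) return i;
  }
  return -1;
}
```

With counts per buffer, several references can share one buffer without copying it, and a buffer becomes reusable as soon as its count returns to zero.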

Remove buffer-to-buffer copy logic · b8e02798

John Koleszar authored 12 years ago

This is the first in a series of commits to add additional reference
frames to the codec. Each frame will be able to update any of the
available references, but copying between references is not
supported.

Change-Id: I5945b5ce6cc3582c495102b4e7eed4f08c44d5a1

b8e02798