Commits · 6dfc95fe63a52945350a7c8a234e87a4a55645db · BC / public / external / libvpx

08 Feb, 2013 - 6 commits

Merge changes Icd1a2a5a,I204d17a1,I3ed92117 into experimental · 6dfc95fe

John Koleszar authored 12 years ago

* changes:
  Initial support for resolution changes on P-frames
  Avoid allocating memory when resizing frames
  Adds a test for the VP8E_SET_SCALEMODE control

6dfc95fe

Merge changes Ife0d8147,I7d469716,Ic9a5615f into experimental · 3de8ee6b

John Koleszar authored 12 years ago

* changes:
  Restore SSSE3 subpixel filters in new convolve framework
  Convert subpixel filters to use convolve framework
  Add 8-tap generic convolver

3de8ee6b

Initial support for resolution changes on P-frames · 393b4856

John Koleszar authored 12 years ago

Allows inter-frames to change resolution. Currently these are
almost equivalent to keyframes, as only intra prediction modes
are allowed, but without the other context resets that occur on
keyframes.

Change-Id: Icd1a2a5af0d9462cc792588427b0a1f5b12e40d3

393b4856

Avoid allocating memory when resizing frames · c03d45de

John Koleszar authored 12 years ago

As long as the new frame is smaller than the size that was originally
allocated, we don't need to free and reallocate the memory allocated.
Instead, do the allocation on the size of the first frame. We could
make this passed in from the application instead, if we wanted to
support external upscaling.
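
The reuse strategy described above can be sketched roughly as follows. This is a minimal illustration, not libvpx's actual buffer code: `FrameBuf`, `framebuf_resize`, and `framebuf_free` are hypothetical names, and the sketch assumes a single 8-bit plane sized by the first frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical frame buffer; not libvpx's actual structure. */
typedef struct {
  uint8_t *data;     /* pixel storage owned by this struct */
  size_t capacity;   /* bytes allocated when the first frame arrived */
  int width;
  int height;
} FrameBuf;

/* Returns 0 on success, -1 on failure. Memory is allocated only on the
 * first call; later calls reuse it as long as the new frame fits. */
int framebuf_resize(FrameBuf *fb, int width, int height) {
  assert(fb != NULL && width > 0 && height > 0);
  size_t needed = (size_t)width * (size_t)height;
  if (fb->data == NULL) {
    fb->data = malloc(needed);
    if (fb->data == NULL) return -1;
    fb->capacity = needed;
  } else if (needed > fb->capacity) {
    /* Larger than the first frame: this sketch simply rejects it. */
    return -1;
  }
  fb->width = width;
  fb->height = height;
  return 0;
}

/* Releases the storage and resets the struct to its empty state. */
void framebuf_free(FrameBuf *fb) {
  free(fb->data);
  fb->data = NULL;
  fb->capacity = 0;
  fb->width = 0;
  fb->height = 0;
}
```

Supporting external upscaling, as the message suggests, would amount to letting the application pass in the capacity instead of deriving it from the first frame.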

Change-Id: I204d17a130728bbd91155bb4bd863a99bb99b038

c03d45de

Adds a test for the VP8E_SET_SCALEMODE control · 88f99f4e

John Koleszar authored 12 years ago

Tests that the external interface to set the internal codec scaling
works as expected. Also updates the test to pull the height from
the decoded frame size rather than parsing the keyframe header,
in anticipation of allowing resolution changes on non-keyframes.

Change-Id: I3ed92117d8e5288fbbd1e7b618f2f233d0fe2c17

88f99f4e

Restore SSSE3 subpixel filters in new convolve framework · 29d47ac8

John Koleszar authored 12 years ago

This commit adds the 8 tap SSSE3 subpixel filters back into the code
underneath the convolve API. The C code is still called for 4x4
blocks, as well as compound prediction modes. This restores the
encode performance to be within about 8% of the baseline.

Change-Id: Ife0d81477075ae33c05b53c65003951efdc8b09c

29d47ac8

07 Feb, 2013 - 5 commits

move dct/idct constants to a header file · e6ad9ab0

Yaowu Xu authored 12 years ago

also removed some unused functions.

Change-Id: Ie363bcc8d94441d054137d2ef7c4fe59f56027e5

e6ad9ab0

Butterfly ADST based hybrid transform · d15e1da4

Jingning Han authored 12 years ago

Refactor the 8x8 inverse hybrid transform. It is now consistent
with the new inverse DCT. Overall performance loss (due to the
use of this variant ADST, and the rounding errors in the butterfly
implementation) for std-hd is -0.02.

Fixed BUILD warning.

Devise a variant of the original ADST, which allows butterfly
computation structure. This new transform has kernel of the
form: sin((2k+1)*(2n+1) / (4N)). One of its butterfly structures
using floating-point multiplications was reported in Z. Wang,
"Fast algorithms for the discrete W transform and for the discrete
Fourier transform", IEEE Trans. on ASSP, 1984.
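
For reference, the kernel above can be written out as a matrix. The commit message gives only the form of the sine argument; the factor of \pi and the \sqrt{2/N} normalization below are assumptions, chosen so that the result matches the standard DST-IV definition.

```latex
S_{k,n} = \sqrt{\frac{2}{N}} \,
          \sin\!\left( \frac{\pi\,(2k+1)(2n+1)}{4N} \right),
\qquad k, n = 0, \dots, N-1
```

Under those assumptions the matrix is symmetric and orthogonal, S = S^{T} and S S^{T} = I, so the same kernel serves for both the forward and the inverse transform.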

This patch includes the butterfly implementation of the inverse
ADST/DCT hybrid transform of dimension 8x8.

Change-Id: I3533cb715f749343a80b9087ce34b3e776d1581d

d15e1da4

Added skip switches for SB32 and SB64 · 29731308

Paul Wilkins authored 12 years ago

Added switches and code to skip/breakout from
doing SB32 and SB64 tests based on whether
the 16x16 MB tests used split modes. Also to
optionally skip 64x64 if 16x16 was chosen over
32x32.
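
The breakout conditions described above might look something like the following. This is only a sketch of the decision logic: `SbSkipFlags`, `should_test_sb32`, and `should_test_sb64` are hypothetical names, and the real encoder makes these decisions inside its RD loop with more context than two booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical switches mirroring the two breakouts in the message. */
typedef struct {
  bool skip_on_split;     /* skip SB tests if the 16x16 MB tests used split modes */
  bool skip_64_if_16_won; /* skip 64x64 if 16x16 was chosen over 32x32 */
} SbSkipFlags;

/* Decides whether the 32x32 superblock test should run. */
bool should_test_sb32(const SbSkipFlags *f, bool mb16_used_split) {
  assert(f != NULL);
  return !(f->skip_on_split && mb16_used_split);
}

/* Decides whether the 64x64 superblock test should run. */
bool should_test_sb64(const SbSkipFlags *f, bool mb16_used_split,
                      bool mb16_beat_sb32) {
  assert(f != NULL);
  if (f->skip_on_split && mb16_used_split) return false;
  if (f->skip_64_if_16_won && mb16_beat_sb32) return false;
  return true;
}
```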

Impact varies depending on clip from a few %
up to almost 50% on encode speed. Only the
split mode breakout is currently enabled.

Change-Id: Ib5836140b064b350ffa3057778ed2cadcc495cf8

29731308

Use fdct8x4 instead of fdct4x4 where the block size allows it. · 5cfd82bc

Ronald S. Bultje authored 12 years ago

This allows for faster SIMD implementations in the future (currently
there is no speed impact).

Change-Id: I732647e9148b5dcb44e6bc8728138f0141218329

5cfd82bc

Use configure checks for various inline keywords. · aac73df1

Ronald S. Bultje authored 12 years ago

Change-Id: I8508f1a3d3430f998bb9295f849e88e626a52a24

aac73df1

06 Feb, 2013 - 6 commits
- Add sse2 versions of sub_pixel_variance{32x32,64x64}. · a788e0fe
  Ronald S. Bultje authored 12 years ago
```
7.5% faster overall encoding.

Change-Id: Ie9bb7f9fdf93659eda106404cb342525df1ba02f
```
  a788e0fe
- Merge "Reindent segmentation code." into experimental · a001fe97
  Ronald S. Bultje authored 12 years ago
  
  a001fe97
- Reindent segmentation code. · 55cafb61
  Ronald S. Bultje authored 12 years ago
```
Indentation was off by 2 spaces for this particular block.

Change-Id: I1e587b7ad3eff77ade5521252d20c7bb2daa0f6d
```
  55cafb61
- Eliminate tautology · 31cbe2ed
  John Koleszar authored 12 years ago
```
      Unreachable code
  that does nothing anyway
      removed forever.

Change-Id: I14105d2dd9dbc9d558f36464055e350dbeb45488
```
  31cbe2ed
- Merge "Change definition of NearestMV." into experimental · 8b4e9c59
  Paul Wilkins authored 12 years ago
  
  8b4e9c59
- Fix mismatch after merge of the tiling patch. · 278df745
  Ronald S. Bultje authored 12 years ago
```
Change-Id: I8ecc178b4d4069e721c7fec6d7631c00e4a3e5d5
```
  278df745

05 Feb, 2013 - 13 commits

[WIP] Add column-based tiling. · 1407bdc2

Ronald S. Bultje authored 12 years ago

This patch adds column-based tiling. The idea is to make each tile
independently decodable (after reading the common frame header) and
also independently encodable (minus within-frame cost adjustments in
the RD loop) to speed up hardware & software en/decoders if they use
multi-threading. Column-based tiling has the added advantage (over
other tiling methods) that it minimizes realtime use-case latency,
since all threads can start encoding data as soon as the first SB-row
worth of data is available to the encoder.

There is some test code that does random tile ordering in the decoder,
to confirm that each tile is indeed independently decodable from other
tiles in the same frame. At tile edges, all contexts assume default
values (i.e. 0, 0 motion vector, no coefficients, DC intra4x4 mode),
and motion vector search and ordering do not cross tiles in the same
frame.
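
One way to picture the column split is the sketch below, which divides a frame's superblock columns as evenly as possible among tile columns. It is an illustration only: `TileCol` and `tile_col_range` are hypothetical names, the 64-pixel superblock width is an assumption, and the patch's actual boundary rule may differ.

```c
#include <assert.h>

#define SB_SIZE 64 /* assumed superblock width in pixels */

/* Half-open range [sb_start, sb_end) of superblock columns in one tile. */
typedef struct {
  int sb_start;
  int sb_end;
} TileCol;

/* Returns the superblock-column range covered by tile column tile_idx
 * when the frame is split into num_tile_cols column tiles. */
TileCol tile_col_range(int frame_width, int num_tile_cols, int tile_idx) {
  assert(frame_width > 0 && num_tile_cols > 0);
  assert(tile_idx >= 0 && tile_idx < num_tile_cols);
  /* Number of superblock columns, rounding a partial column up. */
  int sb_cols = (frame_width + SB_SIZE - 1) / SB_SIZE;
  TileCol t;
  t.sb_start = (tile_idx * sb_cols) / num_tile_cols;
  t.sb_end = ((tile_idx + 1) * sb_cols) / num_tile_cols;
  return t;
}
```

Because every tile spans the full frame height, each thread can begin work on its column as soon as the first superblock row arrives, which is the latency argument made above.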

Tile independence is not maintained between frames ATM, i.e. tile 0 of
frame 1 is free to use motion vect...

1407bdc2

Merge "Add SSE3 versions for sad{32x32,64x64}x4d functions." into experimental · 82286413
Ronald S. Bultje authored 12 years ago

82286413
Merge "fix a build issue with MSVC on windows" into experimental · 9e3e7439
Yaowu Xu authored 12 years ago

9e3e7439
Merge "rewrite 4x4 idct and fdct" into experimental · c9ae73b2
Yaowu Xu authored 12 years ago

c9ae73b2
Add SSE3 versions for sad{32x32,64x64}x4d functions. · 58c983d1
Ronald S. Bultje authored 12 years ago
```
Overall encoding about 15% faster.

Change-Id: I176a775c704317509e32eee83739721804120ff2
```
58c983d1

Convert subpixel filters to use convolve framework · 7a07eea1

John Koleszar authored 12 years ago

Update the code to call the new convolution functions to do subpixel
prediction rather than the existing functions. Remove the old C and
assembly code, since it is unused. This causes a 50% performance
reduction on the decoder, but that will be resolved when the asm for
the new functions is available.

There is no consensus for whether 6-tap or 2-tap predictors will be
supported in the final codec, so these filters are implemented in
terms of the 8-tap code, so that quality testing of these modes
can continue. Implementing the lower complexity algorithms is a
simple exercise, should it be necessary.
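
Expressing a shorter filter in terms of the 8-tap code amounts to zero-padding its kernel. The sketch below does this for a 2-tap bilinear filter; `bilinear_as_8tap` is a hypothetical helper, and the 1/16-pel positions and the tap sum of 128 are assumptions rather than details stated in the message.

```c
#include <assert.h>
#include <stdint.h>

#define SUBPEL_TAPS 8
#define FILTER_SUM 128 /* assumed: taps sum to 128 (7-bit precision) */

/* Writes the 2-tap bilinear kernel for 1/16-pel position frac (0..15)
 * into an 8-tap array. The two live taps sit in the middle and the
 * outer six taps are zero, so an 8-tap convolver can apply it as-is. */
void bilinear_as_8tap(int frac, int16_t kernel[SUBPEL_TAPS]) {
  assert(frac >= 0 && frac < 16);
  for (int i = 0; i < SUBPEL_TAPS; ++i) kernel[i] = 0;
  kernel[3] = (int16_t)(FILTER_SUM - 8 * frac);
  kernel[4] = (int16_t)(8 * frac);
}
```

A 6-tap kernel would be padded the same way, with one zero tap on each side.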

This code produces slightly better results in the EIGHTTAP_SMOOTH
case, since the filter is now applied in only one direction when
the subpel motion is only in one direction. Like the previous code,
the filtering is skipped entirely on full-pel MVs. This combination
seems to give the best quality gains, but this may be indicative of a
bug in the encoder's filter selection, since the encoder could
achieve the result of skipping the filtering on full-pel by selecting
one of the other filters. This should be revisited.

Quality gains on derf positive on almost all clips. The only clip
that seemed to be hurt at all datarates was football
(-0.115% PSNR average, -0.587% min). Overall averages 0.375% PSNR,
0.347% SSIM.

Change-Id: I7d469716091b1d89b4b08adde5863999319d69ff

7a07eea1

Add 8-tap generic convolver · 5ca6a366

John Koleszar authored 12 years ago

This commit introduces a new convolution function which will be used to
replace the existing subpixel interpolation functions. It is much the
same as the existing functions, but allows for changing the filter
kernel on a per-pixel basis, and doesn't bake in knowledge of the
filter to be applied or the size of the resulting block into the
function name.
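
A horizontal pass of such a convolver could be sketched as below. This is a simplified illustration under stated assumptions, not the function added by the commit: `convolve8_horiz` and its parameters are hypothetical, positions are assumed to be in 1/16-pel units, and kernels are assumed to sum to 128.

```c
#include <assert.h>
#include <stdint.h>

#define TAPS 8

/* Clamps an int to the 8-bit pixel range. */
static uint8_t clip_u8(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Filters w output pixels from src into dst.
 * kernels[p] is the 8-tap kernel for 1/16-pel phase p.
 * x0_q4 is the start position and x_step_q4 the per-pixel step, both in
 * 1/16-pel units (16 means one full pixel).
 * src needs 3 readable pixels left and 4 right of every sampled position. */
void convolve8_horiz(const uint8_t *src, uint8_t *dst, int w,
                     const int16_t kernels[16][TAPS],
                     int x0_q4, int x_step_q4) {
  assert(w > 0 && x0_q4 >= 0 && x_step_q4 > 0);
  int x_q4 = x0_q4;
  for (int x = 0; x < w; ++x) {
    const uint8_t *s = src + (x_q4 >> 4);   /* integer pixel position */
    const int16_t *k = kernels[x_q4 & 15];  /* kernel selected per pixel */
    int sum = 0;
    for (int t = 0; t < TAPS; ++t) sum += s[t - 3] * k[t];
    /* Round and scale by 128; relies on arithmetic right shift for
     * negative sums, which common compilers provide. */
    dst[x] = clip_u8((sum + 64) >> 7);
    x_q4 += x_step_q4;
  }
}
```

Neither the filter type nor the block size appears in the function name: both arrive as arguments, which is the design point the message makes.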

Replacing the existing subpel filters will come in a later commit.

Change-Id: Ic9a5615f2f456cb77f96741856fc650d6d78bb91

5ca6a366

fix a build issue with MSVC on windows · 77f889b2
Yaowu Xu authored 12 years ago
```
for idct 16x16 unit test

Change-Id: I51da9405c3a4d7bb3f4cdf062aaccaa90b33dca4
```
77f889b2

rewrite 4x4 idct and fdct · fa36981e

Yaowu Xu authored 12 years ago

This commit changes the 4x4 iDCT to use same algorithm & constants as
other iDCTs. The 4x4 fDCT is also changed to be based on the new iDCT.

Change-Id: Ib1a902693228af903862e1f5a08078c36f2089b0

fa36981e

Change definition of NearestMV. · 81043e8d

Paul Wilkins authored 12 years ago

This commit makes the NearestMV match the chosen
best reference MV. It can be a 0,0 or non-zero vector,
which means the compound nearest mv mode can
combine a 0,0 and a non-zero vector.

Change-Id: I2213d09996ae2916e53e6458d7d110350dcffd7a

81043e8d

Merge "Added vp9_short_idct1_32x32_c" into experimental · 77440d50
Scott LaVarnway authored 12 years ago

77440d50
Merge "Re-factor code for rd thresholds." into experimental · fb4b533d
Paul Wilkins authored 12 years ago

fb4b533d

Added vp9_short_idct1_32x32_c · 5780c4cb

Scott LaVarnway authored 12 years ago

and called this function in vp9_dequant_idct_add_32x32_c when
eob == 1.  For the test clip used, the decoder performance improved
by 21+%.  Based on Yaowu's 16 point idct work.
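
The idea behind the shortcut can be sketched as follows: when only the DC coefficient is present, the inverse transform output is a constant block, so the full 32x32 transform can be bypassed. The code is an illustration, not the actual vp9_short_idct1_32x32_c arithmetic: `idct1_dc_only`, `dequant_idct_32x32`, and `FullIdctFn` are hypothetical names, and the dc / 32 scaling assumes an orthonormal 2-D DCT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 32

/* Hypothetical signature for the full inverse transform. */
typedef void (*FullIdctFn)(const int16_t *coeffs, int16_t *out);

/* Divides dc by n, rounding to nearest with halves away from zero. */
static int round_div(int dc, int n) {
  return dc >= 0 ? (dc + n / 2) / n : -((-dc + n / 2) / n);
}

/* DC-only inverse transform: every output sample is the same value. */
void idct1_dc_only(int16_t dc, int16_t out[N * N]) {
  const int16_t v = (int16_t)round_div(dc, N);
  for (int i = 0; i < N * N; ++i) out[i] = v;
}

/* Takes the cheap path when the end-of-block position is 1, meaning
 * only the first (DC) coefficient can be nonzero. */
void dequant_idct_32x32(const int16_t *coeffs, int eob, int16_t *out,
                        FullIdctFn full_idct) {
  assert(coeffs != NULL && out != NULL);
  if (eob == 1) {
    idct1_dc_only(coeffs[0], out);
  } else {
    full_idct(coeffs, out);
  }
}
```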

Change-Id: Ib579a90fed531d45777980e04bf0c9b23c093c43

5780c4cb

04 Feb, 2013 - 5 commits

Re-factor code for rd thresholds. · 3ab53876

Paul Wilkins authored 12 years ago

Separate out code to set the main encode speed
related rd thresholds. Some values changed from
the initial defaults for various new modes.

Quality test results pending but even the addition
of some further non-zero defaults helps encode speed
somewhat in limited testing on derf clips.

Adjustment of thresholds for quality / speed tradeoff
to follow.

Change-Id: I117ee473157e151a1b93193d5f393449328de20d

3ab53876

Added INT16_MIN and INT16_MAX for MSVC builds · dea14332

Yaowu Xu authored 12 years ago

These macros were not defined in earlier versions of MSVC

Change-Id: I8270a3abb7c6e9ead1931a653d7e41f877a1017b

dea14332

enable 16x16 iDCT unit test · ebd58089

Yaowu Xu authored 12 years ago

the test for the forward transform will be enabled later, after the
forward transform is redone

Change-Id: Ie7c7cf88baf7ecbebbe52fe027e1c3b33d3b9d49

ebd58089

re-write 8 point idct · 1eb79dc1

Yaowu Xu authored 12 years ago

to be consistent with idct16 and idct32.

Change-Id: Ie89dbd32b65c33274b7fecb4b41160fcf1962204

1eb79dc1

a couple of minor fixes · ccaaeb4b

Yaowu Xu authored 12 years ago

fixed function prototypes to prevent compiler warnings;
removed a function not in use;
un-capitalize "Refstride" to ref_stride

Change-Id: Ib4472b6084f357d96328c6a06e795b6813a9edba

ccaaeb4b

01 Feb, 2013 - 3 commits

Merge "Changes 16 point idct" into experimental · af4c9d2f
Yaowu Xu authored 12 years ago

af4c9d2f
Merge "fix a small bug in 16 point forward dct" into experimental · c1f611be
Yaowu Xu authored 12 years ago

c1f611be

Changes 16 point idct · 91e0e801

Yaowu Xu authored 12 years ago

This commit changes the inverse 16 point dct to use the same algorithm
as the one for the 32 point idct. In fact, the 16 point idct now uses
exactly the same source code as the even portion of the 32 point idct.

Tests showed the current implementation has significantly better accuracy
than the previous version. With this implementation and the minor bug
fix on the forward 16 point dct, encoding tests showed about 0.2% better
compression on the CIF set; test results for the std-hd setting are pending.

Change-Id: I68224b60c816ba03434e9f08bee147c7e344fb63

91e0e801

31 Jan, 2013 - 2 commits
- fix a small bug in 16 point forward dct · ab1cad9b
  Yaowu Xu authored 12 years ago
```
The commit fixes a minor error in the 16 point fdct wherein a rotation
can produce a result of -1 instead of 0.

Change-Id: I45aac4a52bcd06225c6d04e643547a13e1c1aade
```
  ab1cad9b
- Merge "A fix point implementation of 32x32 idct" into experimental · c94e55ad
  Yaowu Xu authored 12 years ago
  
  c94e55ad