Commits · 83c7e13a6bcd1535d9547ef3c89816bf993b458b · BC / public / external / libvpx

03 Jul, 2013 - 1 commit

Refactor SSE2 8x8 functional units · 2cb75c96

Jingning Han authored 11 years ago

These serve as building blocks for SSE2 8x8 and 16x16 ADST/DCT
hybrid transform coding.

Change-Id: I4089a754c66e0c986f67d9b8ec4dfb9627ad430d

2cb75c96

29 Jun, 2013 - 1 commit

SSE2 version of vp9_short_fdct32x32_rd. · 466e0cf3

Christian Duvivier authored 11 years ago

43,000 -> 5,750 cycles, about 7.5x faster.

Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0

466e0cf3

25 Jun, 2013 - 1 commit

Add 8x8 dct/adst unit tests · ab362621

Jingning Han authored 11 years ago

This commit enables 8x8 DCT and hybrid transform unit tests. It
also tunes the forward hybrid transform rounding operations for
more precise round-trip performance.

Change-Id: If05c1ce59d75d641b9c6c91527d02d3a6ef498c3

ab362621

18 Jun, 2013 - 1 commit

Make fdct32 computation flow within 16bit range · a41a4860

Jingning Han authored 11 years ago

This commit makes use of dual fdct32x32 versions for rate-distortion
optimization loop and encoding process, respectively. The one for
rd loop requires only 16 bits precision for intermediate steps.
The original fdct32x32 that allows higher intermediate precision (18
bits) was retained for the encoding process only.

This allows speed-up for fdct32x32 in the rd loop. No performance
loss observed.

Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3

a41a4860

30 May, 2013 - 1 commit

Changed to use a new variant of WHT · 042e70e4

Yaowu Xu authored 11 years ago

The commit changed to use a new variant of the Walsh-Hadamard Transform
by Tim Terriberry. This new variant has the best compression among a
number of variants that Tim developed.

Change-Id: Icb3a88515463cfc644b17ca046fcd139db2557e9

042e70e4

27 May, 2013 - 1 commit

Reduce WHT complexity. · 95339d68

Timothy B. Terriberry authored 11 years ago

Saves 1 add, 3 shifts (and a shift bias) per 1-D transform.

Change-Id: I1104bb1679fe342b2f9677df8a9cdc0cb9699e7d

95339d68

16 Apr, 2013 - 2 commits

Faster vp9_short_fdct4x4 and vp9_short_fdct8x4. · 5b6d33f9

Christian Duvivier authored 12 years ago

Scalar path is about 1.3x faster (2.1% overall encoder speedup).
SSE2 path is about 5.0x faster (8.4% overall encoder speedup).

Change-Id: I360d167b5ad6f387bba00406129323e2fe6e7dda

5b6d33f9

Faster vp9_short_fdct4x4 and vp9_short_fdct8x4. · f13b69d0

Christian Duvivier authored 12 years ago

Scalar path is about 1.3x faster (2.1% overall encoder speedup).
SSE2 path is about 5.0x faster (8.4% overall encoder speedup).

Change-Id: I360d167b5ad6f387bba00406129323e2fe6e7dda

f13b69d0

15 Mar, 2013 - 1 commit

Faster vp9_short_fdct16x16. · 4418b790

Christian Duvivier authored 12 years ago

Scalar path is about 1.5x faster (3.1% overall encoder speedup).
SSE2 path is about 7.2x faster (7.8% overall encoder speedup).

Change-Id: I06da5ad0cdae2488431eabf002b0d898d66d8289

4418b790

13 Mar, 2013 - 1 commit

removed reference to "LLM" and "x8" · 00555263

Yaowu Xu authored 12 years ago

The commit changed the names of files and functions to remove obsolete
references to LLM and x8.

Change-Id: I973b20fc1a55149ed68b5408b3874768e6f88516

00555263

28 Feb, 2013 - 1 commit

Faster vp9_short_fdct8x8. · c129203f

Christian Duvivier authored 12 years ago

Scalar path is about 1.4x faster (4% overall encoder speedup).
SSE2 path is about 7x faster (13% overall encoder speedup).

Change-Id: I7e85d8225a914a74c61ea370210414696560094d

c129203f

27 Feb, 2013 - 1 commit

Code cleanup. · 347f3a0a

Dmitry Kovalev authored 12 years ago

Fixing code style, using array lookup instead of switch statements for
forward hybrid transforms (in the same way as for their inverses).
Consistent usage of ROUND_POWER_OF_TWO macro in appropriate places.
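
As a rough illustration of the two ideas in this message, the sketch below shows a round-to-nearest power-of-two division (what a macro named ROUND_POWER_OF_TWO would be expected to do) and a lookup table that replaces a switch over transform types. The Python names, the placeholder 1-D transforms, and the table order are assumptions made for illustration, not the libvpx code.

```python
def round_power_of_two(value, n):
    """Divide value by 2**n, rounding to nearest (ties toward +infinity)."""
    return (value + (1 << (n - 1))) >> n


# Hypothetical 1-D stand-ins; the real transforms are butterfly networks.
def dct_1d(x):
    return list(x)


def adst_1d(x):
    return list(reversed(x))


# A table indexed by transform type replaces a switch statement.
# The index order (DCT_DCT, ADST_DCT, DCT_ADST, ADST_ADST) is an assumption.
FHT_TABLE = [
    (dct_1d, dct_1d),
    (adst_1d, dct_1d),
    (dct_1d, adst_1d),
    (adst_1d, adst_1d),
]


def pick_transforms(tx_type):
    """Return the (column, row) 1-D transform pair for tx_type."""
    return FHT_TABLE[tx_type]
```

For example, `round_power_of_two(5, 1)` gives 3 and `round_power_of_two(-5, 1)` gives -2, since the added bias makes ties round upward.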

Change-Id: I0d3822ae11f928905fdbfbe4158f91d97c71015f

347f3a0a

26 Feb, 2013 - 2 commits

Improve 32x32 forward dct · 66d94ac1

Yaowu Xu authored 12 years ago

The commit improves the 32x32 forward dct implementation:
1. change to use same constants and rounding as other forward dcts
2. select rounding to specifically minimize the roundtrip error, which
improved the average from 19/block to .77/block using 100000 random inputs.

Test showed a small but consistent gain on all test sets, about .15%

Change-Id: If0afd6a71880a522f60c1c234be0462092c2eb53

66d94ac1

Changing pitch value meaning for fht and iht transforms. · 9bf3f751

Dmitry Kovalev authored 12 years ago

Pitch now means the number of elements, not the number of bytes.
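
A minimal sketch of what an element-based pitch means for indexing a flat coefficient buffer. The helper names are hypothetical, and the 2-byte element size assumes 16-bit coefficients.

```python
def element_at(buf, pitch, row, col):
    """Read the value at (row, col) from a flat buffer; pitch counts elements."""
    return buf[row * pitch + col]


def pitch_bytes_to_elements(pitch_bytes, element_size=2):
    """Convert a byte pitch to an element pitch (2 bytes per int16 assumed)."""
    if pitch_bytes % element_size != 0:
        raise ValueError("byte pitch is not a multiple of the element size")
    return pitch_bytes // element_size
```

A caller that used to pass a pitch of 16 bytes for rows of eight 16-bit values would pass 8 under the new meaning.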

Change-Id: Idb9f2f012e39b09d596a3cc1802305a80b7c13af

9bf3f751

25 Feb, 2013 - 3 commits

Improving the forward 16x16 ADST/DCT accuracy · 65821d66

Jingning Han authored 12 years ago

Increase the first stage dynamic range by 4 times, and reduce it
back with proper rounding before applying the second stage. Hence
it still fits in the given dynamic range and slightly improves
the key frame coding performance.
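
The sketch below only shows where the extra range and the rounding sit in a two-pass transform. It uses a hypothetical 2-point sum/difference butterfly on a 2x2 block instead of the real 16x16 ADST/DCT stages, so the scaling is exact here and no accuracy gain is visible; in a stage with fixed-point multiplies, the extra two bits reduce the rounding error carried into the second pass.

```python
def butterfly2(a, b):
    """Hypothetical 1-D stage: a 2-point sum/difference butterfly."""
    return [a + b, a - b]


def forward_2x2(block):
    """Two-pass 2x2 transform with a first stage run at 4x dynamic range."""
    # First stage (columns) runs on inputs scaled up by 4.
    cols = [butterfly2(block[0][c] * 4, block[1][c] * 4) for c in range(2)]
    # Scale back down by 4 with rounding before the second stage.
    mid = [[(cols[c][r] + 2) >> 2 for c in range(2)] for r in range(2)]
    # Second stage (rows) runs in the original dynamic range.
    return [butterfly2(row[0], row[1]) for row in mid]
```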

Change-Id: Ia4c5907446f20a95dc3de079c314b3ad1221d8aa

65821d66

clean up forward and inverse hybrid transform · 77a3becf

Jingning Han authored 12 years ago

Rebased.

Remove the old matrix multiplication transform computation. The 16x16
ADST/DCT can be switched on/off and evaluated by setting ACTIVE_HT16
300/0 in vp9/common/vp9_blockd.h.

Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f

77a3becf

optimize forward 16x16 DCT for accuracy · 499fe05d

Yaowu Xu authored 12 years ago

This commit added pre/post scaling for first half of fDCT16x16 to
reduce error, by simulation of 100,000 blocks for random inputs,
the average sse reduced from 2.1/block to 0.0498/block.
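
A measurement of this kind can be sketched as follows. The harness, the input range of [-255, 255], and the toy transform pairs are assumptions for illustration; the actual test feeds random blocks through the real fDCT16x16 and iDCT16x16.

```python
import random


def mean_roundtrip_sse(forward, inverse, n_blocks, block_len, seed=0):
    """Average sum of squared round-trip errors per block over random inputs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_blocks):
        block = [rng.randint(-255, 255) for _ in range(block_len)]
        restored = inverse(forward(block))
        total += sum((a - b) ** 2 for a, b in zip(block, restored))
    return total / n_blocks


# Toy transform pairs standing in for a real forward/inverse DCT.
def exact_forward(block):
    return list(block)


def exact_inverse(coeffs):
    return list(coeffs)


def lossy_forward(block):
    return [v >> 1 for v in block]  # drops the lowest bit


def lossy_inverse(coeffs):
    return [v << 1 for v in coeffs]
```

Comparing the value returned for two rounding choices of the same transform gives the kind of per-block figure quoted in the message.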

also enabled tests for 16x16 fDCT and iDCT

Change-Id: Id2a95f0464c6dd4118797d456237ae90274c0f02

499fe05d

23 Feb, 2013 - 1 commit

optimize 8x8 fdct rounding for accuracy · 22012ee9

Yaowu Xu authored 12 years ago

The commit added a final rounding choice for 8x8 forward dct to get
rid of a sign bias at DC position and improve the accuracy in terms
of round trip error for 8x8 fDCT/iDCT.

This commit also enabled forward 8x8 dct test.

Change-Id: Ib67f99b0a24d513e230c7812bc04569d472fdc50

22012ee9

22 Feb, 2013 - 1 commit

Forward butterfly hybrid transform · babbd5d1

Jingning Han authored 12 years ago

This patch includes 4x4, 8x8, and 16x16 forward butterfly ADST/DCT
hybrid transform. The kernel of 4x4 ADST is sin((2k+1)*(n+1)/(2N+1)).
The kernel of 8x8/16x16 ADST is of the form sin((2k+1)*(2n+1)/4N).
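
The two kernels can be written out in floating point as below. This is a sketch, not the integer butterfly code: it assumes the usual factor of pi inside the sine (the message omits it) and adds normalization constants so that each basis is orthonormal. Function names are hypothetical.

```python
import math


def adst4_kernel(n=4):
    """Rows k, columns j: sin(pi * (2k+1) * (j+1) / (2N+1)), normalized."""
    scale = 2.0 / math.sqrt(2 * n + 1)
    return [[scale * math.sin(math.pi * (2 * k + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for k in range(n)]


def adst_butterfly_kernel(n):
    """Rows k, columns j: sin(pi * (2k+1) * (2j+1) / (4N)), normalized."""
    scale = math.sqrt(2.0 / n)
    return [[scale * math.sin(math.pi * (2 * k + 1) * (2 * j + 1) / (4 * n))
             for j in range(n)] for k in range(n)]


def max_orthonormality_error(m):
    """Largest deviation of M * M^T from the identity matrix."""
    n = len(m)
    worst = 0.0
    for r in range(n):
        for s in range(n):
            dot = sum(m[r][j] * m[s][j] for j in range(n))
            target = 1.0 if r == s else 0.0
            worst = max(worst, abs(dot - target))
    return worst
```

With these scale factors both matrices are orthonormal up to floating-point error, which is the property the integer butterflies approximate.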

Change-Id: I8f1ab3843ce32eb287ab766f92e0611e1c5cb4c1

babbd5d1

20 Feb, 2013 - 1 commit

Merge lossless experiment · d262e26c

Yaowu Xu authored 12 years ago

Change-Id: I7b7b8d4fda3a23699e0c920d727f8c15d37d43aa

d262e26c

15 Feb, 2013 - 1 commit

Remove some Y2-related code. · 46dff5d2

Ronald S. Bultje authored 12 years ago

Change-Id: I4f46d142c2a8d1e8a880cfac63702dcbfb999b78

46dff5d2

14 Feb, 2013 - 1 commit

Rewrote fdct16x16 · 048b9d41

Yunqing Wang authored 12 years ago

Used same algorithm as others.

Change-Id: Ifdac560762aec9735cb4bb6f1dbf549e415c38a0

048b9d41

13 Feb, 2013 - 1 commit

Removal of Hybrid DWT/DCT experiment. · 649be94c

Paul Wilkins authored 12 years ago

Removal of experiment to simplify code base for other
changes.

Change-Id: If0a33952504558511926ad212bc311fc2bffb19a

649be94c

12 Feb, 2013 - 1 commit

Rewrote fdct8x8 · aa295918

Yunqing Wang authored 12 years ago

Use consistent algorithm.

Change-Id: Ib8484821ebc454b9d3380a3d6571798decd037f3

aa295918

08 Feb, 2013 - 1 commit

Integerization of dct32x32 · dbccffe2

Yunqing Wang authored 12 years ago

Test on derf set showed 0.047% overall psnr change.

Change-Id: Id16c276c251a3943850ac9b95e9b09a56cf42b19

dbccffe2

07 Feb, 2013 - 3 commits

move dct/idct constants to a header file · e6ad9ab0

Yaowu Xu authored 12 years ago

Also removed some unused functions.

Change-Id: Ie363bcc8d94441d054137d2ef7c4fe59f56027e5

e6ad9ab0

Butterfly ADST based hybrid transform · d15e1da4

Jingning Han authored 12 years ago

Refactor the 8x8 inverse hybrid transform. It is now consistent
with the new inverse DCT. Overall performance loss (due to the
use of this variant ADST, and the rounding errors in the butterfly
implementation) for std-hd is -0.02.

Fixed BUILD warning.

Devise a variant of the original ADST, which allows butterfly
computation structure. This new transform has kernel of the
form: sin((2k+1)*(2n+1) / (4N)). One of its butterfly structures
using floating-point multiplications was reported in Z. Wang,
"Fast algorithms for the discrete W transform and for the discrete
Fourier transform", IEEE Trans. on ASSP, 1984.

This patch includes the butterfly implementation of the inverse
ADST/DCT hybrid transform of dimension 8x8.

Change-Id: I3533cb715f749343a80b9087ce34b3e776d1581d

d15e1da4

Use configure checks for various inline keywords. · aac73df1

Ronald S. Bultje authored 12 years ago

Change-Id: I8508f1a3d3430f998bb9295f849e88e626a52a24

aac73df1

05 Feb, 2013 - 1 commit

rewrite 4x4 idct and fdct · fa36981e

Yaowu Xu authored 12 years ago

This commit changes the 4x4 iDCT to use same algorithm & constants as
other iDCTs. The 4x4 fDCT is also changed to be based on the new iDCT.

Change-Id: Ib1a902693228af903862e1f5a08078c36f2089b0

fa36981e

31 Jan, 2013 - 1 commit

fix a small bug in 16 point forward dct · ab1cad9b

Yaowu Xu authored 12 years ago

The commit fixes a minor error in the 16-point fdct where a rotation can
produce a result of -1 instead of 0.

Change-Id: I45aac4a52bcd06225c6d04e643547a13e1c1aade

ab1cad9b

14 Jan, 2013 - 2 commits

Fix compiler warnings · 113005b1

Yaowu Xu authored 12 years ago

The warnings caused verify failures with Gerrit for several commits.

Change-Id: I030df8638bd69b8783a3ac58e720ff9f0bfd546c

113005b1

Fix unused variable warnings · 76ac5b39

John Koleszar authored 12 years ago

Previous commit does not build cleanly on Jenkins with the DWT/DCT
hybrid experiment enabled (--enable-dwtdcthybrid).

Change-Id: Ia67e8f59d17ef2d5200ec6b90dfe6711ed6835a5

76ac5b39

13 Jan, 2013 - 1 commit

Further enhancements/fixes on dct/dwt hybrid txfm · 516db21c

Deb Mukherjee authored 12 years ago

Fixes some scaling issues. Adds an option to only compute the
dct on the low-low subband for 32x32 and 64x64 blocks using
only a single 16x16 dct after 1 and 2 wavelet decomposition
levels respectively. Also adds an option to use a 8x8 dct
as building block.

Currently with the 2/6 filter and with a single 16x16 dct on
the low-low band, the results compared to the full 32x32 dct are
as follows:
derf: -0.15%
yt: -0.29%
std-hd: -0.18%
hd: -0.6%
These are my current recommended settings, since the 2/6 filter
is very simple.

Results with 8x8 dct are about 0.3% worse.

Change-Id: I00100cdc96e32deced591985785ef0d06f325e44

516db21c

10 Jan, 2013 - 1 commit

Merge tx32x32 experiment. · aa2effa9

Ronald S. Bultje authored 12 years ago

Change-Id: I615651e4c7b09e576a341ad425cf80c393637833

aa2effa9

08 Jan, 2013 - 1 commit

Adds 64x64 hybrid dct/dwt transform · 4b7304ee

Deb Mukherjee authored 12 years ago

This is to add to the 64x64 transform experiment as an alternative to
a 64x64 DCT.
Two levels of wavelet decomposition are used on a 64x64 block, followed
by 16x16 DCT on the four lowest subbands. The highest three subbands
are left untransformed after the first level DWT.
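
The subband geometry described here can be sketched as follows. The function and the HL/LH/HH naming and placement are assumptions for illustration; only the sizes follow from the message (four 16x16 subbands after the second level, three 32x32 subbands from the first).

```python
def subband_layout(size, levels):
    """Return (name, x, y, side) for each subband of a dyadic 2-D DWT."""
    bands = []
    side = size
    for level in range(1, levels + 1):
        side //= 2  # each level halves the low-low band in both directions
        bands.append((f"HL{level}", side, 0, side))
        bands.append((f"LH{level}", 0, side, side))
        bands.append((f"HH{level}", side, side, side))
    bands.append((f"LL{levels}", 0, 0, side))
    return bands
```

For `subband_layout(64, 2)` the four 16x16 entries are the ones that would receive a 16x16 DCT, and the three 32x32 entries stay as first-level DWT output.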

Change-Id: I3d48d5800468d655191933894df6b46e15adca56

4b7304ee

26 Dec, 2012 - 1 commit

Build fixes to merge vp9-preview into master · 5ebe94f9

John Koleszar authored 12 years ago

Various fixups to resolve issues when building vp9-preview under the more stringent
checks placed on the experimental branch.

Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07

5ebe94f9

13 Dec, 2012 - 1 commit

Further improvements on the hybrid dwt/dct expt · 210dc5b2

Deb Mukherjee authored 12 years ago

Modifies the scanning pattern and uses a floating point 16x16
dct implementation for now to handle scaling better.
Also experiments are in progress with 2/6 and 9/7 wavelets.

Results have improved to within ~0.25% of 32x32 dct for std-hd
and about 0.03% for derf. This difference can probably be bridged by
re-optimizing the entropy stats for these transforms. Currently
the stats used are common between 32x32 dct and dwt/dct.

Experiments are in progress with various scan pattern - wavelet
combinations.

Ideally the subbands should be tokenized separately, and an
experiment will be conducted next on that.

Change-Id: Ia9cbfc2d63cb7a47e562b2cd9341caf962bcc110

210dc5b2

07 Dec, 2012 - 1 commit

32x32 transform for superblocks. · c456b35f

Ronald S. Bultje authored 12 years ago

This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds
code all over the place to wrap that in the bitstream/encoder/decoder/RD.

Some implementation notes (these probably need careful review):
- token range is extended by 1 bit, since the value range out of this
  transform is [-16384,16383].
- the coefficients coming out of the FDCT are manually scaled back by
  1 bit, or else they won't fit in int16_t (they are 17 bits). Because
  of this, the RD error scoring does not right-shift the MSE score by
  two (unlike for 4x4/8x8/16x16).
- to compensate for this loss in precision, the quantizer is halved
  also. This is currently a little hacky.
- FDCT and IDCT are double-only right now. They need a fixed-point impl.
- There are no default probabilities for the 32x32 transform yet; I'm
  simply using the 16x16 luma ones. A future commit will add newly
  generated probabilities for all transforms.
- No ADST version. I don't think we'll add one for this level; if an
  ADST is desired, transform-size selection can scale back to 16x16
  or lower, and use an ADST at that level.

Additional notes specific to Debargha's DWT/DCT hybrid:
- coefficient scale is different for the top/left 16x16 (DCT-over-DWT)
  block than for the rest (DWT pixel differences) of the block. Therefore,
  RD error scoring isn't easily scalable between coefficient and pixel
  domain. Thus, unfortunately, we need to compute the RD distortion in
  the pixel domain until we figure out how to scale these appropriately.
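
The bit-width bookkeeping in these notes can be checked with a small sketch. The helper names are hypothetical, and the truncating shift used to drop one bit is an assumption; the message does not say how the scaling is rounded.

```python
def signed_bits(value):
    """Minimum two's-complement width that can hold value."""
    magnitude = value if value >= 0 else -value - 1
    return magnitude.bit_length() + 1


def fits_int16(value):
    """True if value is representable as int16_t."""
    return -32768 <= value <= 32767


def scale_back_one_bit(coeff):
    """Drop one bit of range with an arithmetic right shift (assumed rounding)."""
    return coeff >> 1
```

A 17-bit coefficient such as 65535 does not fit in int16_t, while its scaled-back value 32767 does. Halving the coefficients also divides squared error by four, which is presumably why the RD scoring skips the right shift by two.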

Change-Id: I00386f20f35d7fabb19aba94c8162f8aee64ef2b

c456b35f

27 Nov, 2012 - 1 commit

Add vp9_ prefix to all vp9 files · fcccbcbb

John Koleszar authored 12 years ago

Support for gyp which doesn't support multiple objects in the same
static library having the same basename.

Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc

fcccbcbb

25 Nov, 2012 - 1 commit

removed the idct rtcd idct calls · 510557e2

Jim Bankoski authored 12 years ago

More cleanup to do after this, but this is a good chunk of removing rtcd.

Change-Id: I551db75e341a0a85c3ad650df1e9a60dc305681a

510557e2