Commits · 82d504b50f5dbc81ba1e1e1c1b07bb76dddde43f · BC / public / external / libvpx

25 Jun, 2013 - 2 commits

Use aligned buffer operations in 8x8/16x16 2D-DCT · 82d504b5

Jingning Han authored 11 years ago

This reduces 16x16 2D-DCT runtime from 865 cycles to 837 cycles.

Change-Id: I137758b81cd127b936175284310e81378db64552

82d504b5

Enable sse2 implementation of 8x8 ADST/DCT · a32a086d

Jingning Han authored 11 years ago

This commit uses the butterfly structure to enable an sse2
implementation of the 8x8 ADST/DCT hybrid transform.

The runtime of the hybrid transform module drops from 1170 cycles
to 245 cycles. Overall speed-up is around 1.5%.

Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f

a32a086d

21 Jun, 2013 - 5 commits
- Merge "Get some speed back for cpuused 1" · 869d7706
  Yaowu Xu authored 11 years ago
  
  869d7706
- Get some speed back for cpuused 1 · 45e25a78
  Yaowu Xu authored 11 years ago
```
and remove unused code.

Change-Id: If380440c4450294b5450b7a9eeb94a376846ec01
```
  45e25a78
- Merge "rename variables to avoid build error in MSVC" · 61721181
  Yaowu Xu authored 11 years ago
  
  61721181
- rename variables to avoid build error in MSVC · ee07a261
  Yaowu Xu authored 11 years ago
```
Change-Id: I7960178c95c54d5c4497e44cfc8c493566294b34
```
  ee07a261
- Merge "Implement sse2 and ssse3 versions for all sub_pixel_variance sizes." · e6cd5ed3
  Yaowu Xu authored 11 years ago
  
  e6cd5ed3
20 Jun, 2013 - 26 commits

Merge "clean out libvpx-srcs.txt if built" · 84490a1f
Jim Bankoski authored 11 years ago

84490a1f
clean out libvpx-srcs.txt if built · 975df8c7
Jim Bankoski authored 11 years ago
```
Change-Id: Idfd69e66e8982275eb00d8007a55efd1a4f86a98
```
975df8c7
Merge "Revert "test_libvpx: disable pthreads in gtest"" · 43d04ef9
James Zern authored 11 years ago

43d04ef9
Fix win64 warning. · c259af4f
Frank Galligan authored 11 years ago
```
- size_t vs int.

Change-Id: Ib47ebd932a4b69db9f52a43000bb69d0a96b9134
```
c259af4f

Revert "test_libvpx: disable pthreads in gtest" · f2dc3825

James Zern authored 11 years ago

This reverts commit 90a9900a

Seems to break the Mac build:
src/include/gtest/internal/gtest-port.h:1208:: pthread_mutex_lock(&mutex_)failed with error 22
Abort trap: 6

Change-Id: Icbe31161d7c27f1b0a28d33409e7712430bbf0ae

f2dc3825

Merge "Add unit tests for 4x4 ADST" · 4f4713b4
Jingning Han authored 11 years ago

4f4713b4
Merge "Cast value to avoid size_t/int warning on win64" · 0373e517
Johann authored 11 years ago

0373e517
Merge "Renaming 'nmv' to 'mv' for several functions." · 8283d893
Dmitry Kovalev authored 11 years ago

8283d893
Merge "Function decomposition inside vp9_decodemv.c file." · 77186ee6
Dmitry Kovalev authored 11 years ago

77186ee6

Improving model rd with variance and quant step · 7947a33d

Deb Mukherjee authored 11 years ago

Improves the rd modeling function and implements it using interpolation
from a table, which is a little faster. Also uses sse rather than var as
input to the modeling function: no dc prediction is used, so sse works
a little better.

derfraw300: +0.05%
Speedup: ~1%

Change-Id: I151353c6451e0e8fe3ae18ab9842f8f67e5151ff

7947a33d
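
The table-plus-interpolation idea in the message above can be illustrated with a minimal sketch. This is not the libvpx rd model: the table values, the unit sample spacing, and the names `kTab` and `interp_table` are hypothetical placeholders chosen only to show linear interpolation between table entries.

```c
#include <assert.h>

#define TAB_SIZE 5

/* Hypothetical table: some model function sampled at x = 0, 1, 2, 3, 4.
 * The values are placeholders, not taken from libvpx. */
static const double kTab[TAB_SIZE] = {1.0, 0.5, 0.25, 0.125, 0.0625};

/* Evaluate the tabulated function at x by linear interpolation,
 * clamping to the first and last entries outside the table range. */
double interp_table(double x) {
  assert(x == x); /* reject NaN */
  if (x <= 0.0) return kTab[0];
  if (x >= TAB_SIZE - 1) return kTab[TAB_SIZE - 1];
  const int i = (int)x;        /* index of the entry at or below x */
  const double frac = x - i;   /* position between entry i and i + 1 */
  return kTab[i] + frac * (kTab[i + 1] - kTab[i]);
}
```

A lookup like this replaces a direct evaluation of the model function with one table read and one multiply-add, which is the kind of saving the message alludes to.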

Cast value to avoid size_t/int warning on win64 · d94aee68

Johann authored 11 years ago

dboolhuff.c(50) : warning C4267: 'initializing' : conversion from
'size_t' to 'int'

Change-Id: I6b85759efb2fa19f362f406623d8a7583a55c036

d94aee68
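
For context, C4267 fires when a `size_t` value initializes an `int` without a cast. The sketch below shows one guarded way to make the conversion explicit. The helper `bytes_remaining` is hypothetical and is not the code in dboolhuff.c; the clamp to `INT_MAX` is an added assumption, since the commit itself only mentions a cast.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes left in a buffer, as an int.
 * The pointer difference is held in a size_t and converted with an
 * explicit cast, which avoids the implicit size_t -> int conversion
 * that triggers the warning. */
int bytes_remaining(const unsigned char *pos, const unsigned char *end) {
  assert(pos <= end);
  const size_t left = (size_t)(end - pos);
  /* Clamp rather than truncate if the count does not fit in an int. */
  return left > (size_t)INT_MAX ? INT_MAX : (int)left;
}
```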

adds force partitioning greater than or less than block size · 9f2a1ae2

Jim Bankoski authored 11 years ago

adds a new speed feature to force partitioning to be greater than
or less than a certain size

Change-Id: I8c048eeeef93700ae822eccf98f8751a45b2e7d0

9f2a1ae2

adds a set partitioning to speed features · 18bdf708

Jim Bankoski authored 11 years ago

this feature lets you set a partitioning size to be used by the entire
frame.

Change-Id: I208a4c8c701375cbb054418266f677768b6f8f06

18bdf708

partition by variance using var from last frame · 476d73d2

Jim Bankoski authored 11 years ago

This uses variance to split partitions. Variance is calculated using
the nearest mv, always from the last ref frame.

Change-Id: Idd015b4a9aa3bc82591759eac239680c07496896

476d73d2
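
A rough sketch of a variance-driven split decision follows. It assumes the variance is taken over the difference between the source block and a reference block (here simply passed in as a pointer, standing in for the last-frame block found with the nearest mv) and that a block is split when that variance exceeds a threshold. The function names and the threshold are hypothetical; the actual encoder logic is not shown in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Unnormalized variance of (src - ref) over a w x h block:
 * sse - sum^2 / (w * h). */
unsigned int block_diff_variance(const uint8_t *src, int src_stride,
                                 const uint8_t *ref, int ref_stride,
                                 int w, int h) {
  int64_t sum = 0;
  uint64_t sse = 0;
  assert(w > 0 && h > 0);
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const int d = src[r * src_stride + c] - ref[r * ref_stride + c];
      sum += d;
      sse += (uint64_t)(d * d);
    }
  }
  return (unsigned int)(sse - (uint64_t)((sum * sum) / (w * h)));
}

/* Hypothetical decision rule: split when the variance is above a threshold. */
int should_split(unsigned int variance, unsigned int threshold) {
  return variance > threshold;
}
```

A uniform offset between source and reference gives zero variance under this measure, so only blocks whose prediction error varies across the block are candidates for splitting.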

convert all speed things to speed features · 1f94b976
Jim Bankoski authored 11 years ago
```
Change-Id: Ie24489a4d39f3e53e816eeebf75a1c9c7d94515a
```
1f94b976
new partition via variance · 727fa7b1
Jim Bankoski authored 11 years ago
```
Change-Id: Ideee45cad8b38087c509cd404484728e85d0c427
```
727fa7b1

fix to set up new speed feature · 0fad6a9d

Jim Bankoski authored 11 years ago

This uses the speed feature functionality for code.

Change-Id: I9cd16c0c5f98520ae27ebba81aa2c178546587f8

0fad6a9d

don't copy partitions for key frames or altrefs · df2314cf

Jim Bankoski authored 11 years ago

force us to go through slow partitioning for keyframes, altref and
overlays.

Change-Id: I1a286361bf74083e71973575a7296be46eb98742

df2314cf

Implement sse2 and ssse3 versions for all sub_pixel_variance sizes. · 8fb6c581

Ronald S. Bultje authored 11 years ago

Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 ->
3min58). Specific changes to timings for each function compared to
original assembly-optimized versions (or just new version timings if
no previous assembly-optimized version was available):

sse2   4x4:    99 ->   82 cycles
sse2   4x8:           128 cycles
sse2   8x4:           121 cycles
sse2   8x8:   149 ->  129 cycles
sse2   8x16:  235 ->  245 cycles (?)
sse2  16x8:   269 ->  203 cycles
sse2  16x16:  441 ->  349 cycles
sse2  16x32:          641 cycles
sse2  32x16:          643 cycles
sse2  32x32: 1733 -> 1154 cycles
sse2  32x64:         2247 cycles
sse2  64x32:         2323 cycles
sse2  64x64: 6984 -> 4442 cycles

ssse3  4x4:           100 cycles (?)
ssse3  4x8:           103 cycles
ssse3  8x4:            71 cycles
ssse3  8x8:           147 cycles
ssse3  8x16:          158 cycles
ssse3 16x8:   188 ->  162 cycles
ssse3 16x16:  316 ->  273 cycles
ssse3 16x32:          535 cycles
ssse3 32x16:          564 cycles
ssse3 32x32:          973 cycles
ssse3 32x64:         1930 cycles
ssse3 64x32:         1922 cycles
ssse3 64x64:         3760 cycles

Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d

8fb6c581
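
For readers unfamiliar with what a sub-pixel variance function computes, here is a scalar reference sketch. It is not the sse2/ssse3 code from this commit, and the details are assumptions: 1/8-pel offsets, a 2-tap bilinear filter whose weights sum to 128, and the name `sub_pixel_variance_ref`. The reference block is interpolated at the fractional offset and then compared against the source block.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 2-tap bilinear filter: offset is in 1/8-pel units (0..7),
 * weights sum to 128, result is rounded. */
static int bilinear(int a, int b, int offset) {
  return (a * (128 - 16 * offset) + b * (16 * offset) + 64) >> 7;
}

/* Scalar reference: interpolate ref at (xoff, yoff), then return the
 * unnormalized variance of (prediction - src) and store the sse.
 * The ref buffer must provide one extra row and column beyond w x h. */
unsigned int sub_pixel_variance_ref(const uint8_t *ref, int ref_stride,
                                    int xoff, int yoff,
                                    const uint8_t *src, int src_stride,
                                    int w, int h, unsigned int *sse_out) {
  int64_t sum = 0;
  uint64_t sse = 0;
  assert(xoff >= 0 && xoff < 8 && yoff >= 0 && yoff < 8);
  assert(w > 0 && h > 0);
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      const uint8_t *p = ref + r * ref_stride + c;
      /* Horizontal filter on two adjacent rows, then vertical filter. */
      const int top = bilinear(p[0], p[1], xoff);
      const int bot = bilinear(p[ref_stride], p[ref_stride + 1], xoff);
      const int pred = bilinear(top, bot, yoff);
      const int d = pred - src[r * src_stride + c];
      sum += d;
      sse += (uint64_t)(d * d);
    }
  }
  *sse_out = (unsigned int)sse;
  return (unsigned int)(sse - (uint64_t)((sum * sum) / (w * h)));
}
```

The per-size SIMD versions timed above would need to produce the same results as a scalar loop of this kind for the filter they actually use.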

disable speed > 1 speed corrections in firstpass · f954490b
Jim Bankoski authored 11 years ago
```
need to rework these

Change-Id: I17dc2c88d2faadd2f8fb117c52c25f04ea2e9856
```
f954490b

new debug modes code · 2c6bdbbc

Jim Bankoski authored 11 years ago

The new print out includes skips and has prefixed sections so you can
grep to find things like transforms chosen on each frame.

Change-Id: I195043424647d9514cfc3ff6720a5b20d010fa1b

2c6bdbbc

Merge "copy partitioning from last frame" · fbcce4dd
Jim Bankoski authored 11 years ago

fbcce4dd
copy partitioning from last frame · f033b44e
Jim Bankoski authored 11 years ago
```
Change-Id: I26e80ede80cb4389378a95afa95d229092a9859a
```
f033b44e

Add unit tests for 4x4 ADST · 362809df

Jingning Han authored 11 years ago

Enable sign bias check and round-trip error unit tests for 4x4 hybrid
transform modules.

Change-Id: Icd3d839f098d4b92b00ff76eac146765b039d0d3

362809df

Merge "test_libvpx: disable pthreads in gtest" · db938c29
John Koleszar authored 11 years ago

db938c29

Removed a number of unnecessary checks on ref_frame · 6e3b34bd

Yaowu Xu authored 11 years ago

Since intra block decoding is handled by decode_sb_intra() separately.

Change-Id: I42d757884714084c92fc23ec5d35d4dc946f4b15

6e3b34bd

19 Jun, 2013 - 6 commits

Function decomposition inside vp9_decodemv.c file. · 15eaba10
Dmitry Kovalev authored 11 years ago
```
Change-Id: Iab96e6a50aec543c63e15cd134f9d5f01ca7ceff
```
15eaba10

test_libvpx: disable pthreads in gtest · 90a9900a

James Zern authored 11 years ago

Currently threading is internal to libvpx, so thread safety is unneeded
in libgtest. Visual Studio builds already operate this way, as they
do not have pthread.h available by default.

This removes an unconditional link to libpthread, relying on $(extralibs)
should libvpx require it.

Change-Id: Ieae1d693406653a54b54fba818c598836797d33b

90a9900a

Merge "Add two-pass quantization" · 36568357
Yunqing Wang authored 11 years ago

36568357

Add two-pass quantization · b5bf7b13

Yunqing Wang authored 11 years ago

Optimized the quantization function by making it a two-pass
process. The first pass quickly checks the transform
coefficients against the base ZBIN and keeps only the coefficients
that pass the check for quantization. A skip
check is added: if all coefficients are within the base ZBIN, no
quantization is needed. The second pass is the actual quantization
pass, which processes only the coefficient subset determined
in the first pass. This reduces the computation. Furthermore, an
alternative method is used for large transform sizes, which often
have sparse nonzero quantized coefficients.

Overall, the encoder speedup is about 4%. The quantization function
itself gets 20% faster.

Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22

b5bf7b13
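
The two-pass structure described in the message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the libvpx quantizer: the function name, the single `zbin` and `quant_step` parameters, the caller-provided scratch index buffer, and the round-to-nearest division are all hypothetical simplifications. The alternative method for large transforms is not covered.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical two-pass quantizer. Returns the number of nonzero
 * quantized coefficients. idx_scratch must hold at least n ints. */
int quantize_two_pass(const int16_t *coeff, int n, int zbin, int quant_step,
                      int16_t *qcoeff, int *idx_scratch) {
  int kept = 0;
  int nonzero = 0;
  assert(n > 0 && zbin >= 0 && quant_step > 0);

  /* Pass 1: zero the output and record the coefficients that reach
   * the zero bin threshold. */
  for (int i = 0; i < n; ++i) {
    qcoeff[i] = 0;
    if (abs(coeff[i]) >= zbin) idx_scratch[kept++] = i;
  }

  /* Skip check: everything fell inside the zero bin, nothing to quantize. */
  if (kept == 0) return 0;

  /* Pass 2: quantize only the recorded subset (assumed rounding rule). */
  for (int k = 0; k < kept; ++k) {
    const int i = idx_scratch[k];
    const int mag = (abs(coeff[i]) + quant_step / 2) / quant_step;
    qcoeff[i] = (int16_t)(coeff[i] < 0 ? -mag : mag);
    if (mag != 0) ++nonzero;
  }
  return nonzero;
}
```

The saving comes from the second loop touching only the surviving indices, and from the early return when a block quantizes entirely to zero.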

Remove unnecessary copying of probs. · 12180c83
Yaowu Xu authored 11 years ago
```
Change-Id: Ic924f07c6ab0c929c6cdf11880d3c625806e272c
```
12180c83
Renaming 'nmv' to 'mv' for several functions. · 87e1fa76
Dmitry Kovalev authored 11 years ago
```
Change-Id: I183a38997a9d01e4a1b869e92509f6915216fa09
```
87e1fa76

18 Jun, 2013 - 1 commit
- Merge "tests: clear system state after non-API calls" · 2319b7aa
  John Koleszar authored 11 years ago
  
  2319b7aa