Commits · 45e49e6e197b236e1fef4c51c3d28da0d6d421b8 · BC / public / external / libvpx

20 Sep, 2011 - 3 commits

Move neon only arm functions under arm/neon. · bd0c3409

Fritz Koenig authored 13 years ago

These files don't contain generic ARM code, so they should
only be compiled for NEON.

Change-Id: Ie712823aa04d4235e7cfe7a3b725e73ee4c3e564

bd0c3409

NEON FDCT updated to match current C code · 0c2529a8

Tero Rintaluoma authored 13 years ago

- Removed fast_fdct4x4_neon and fast_fdct8x4_neon
- Now uses short_fdct4x4 and short_fdct8x4
- Gives ~1-2% speed-up on Cortex-A8/A9

Change-Id: Ib62f2cb2080ae719f8fa1d518a3a5e71278a41ec

0c2529a8

Fixed armv5te multiplications · 3c19bc3f

Tero Rintaluoma authored 13 years ago

The Rd and Rm registers must be different in 'mul'. Using the same
register for both results in unpredictable behaviour. GCC will give
a warning and RVCT an error in this case.

The restriction applies only to ARMv5 targets, not to ARMv6 and above.

Change-Id: I378d17c51e1f16a6820814fbed43e115aaabb03e

3c19bc3f

19 Sep, 2011 - 2 commits

Updated ARMv6 forward transforms to match C · 4c3ad66b

Tero Rintaluoma authored 13 years ago

- Updated walsh transform to match C
  (based on Change Id24f3392)
- Changed fast_fdct4x4 and 8x4 to short_fdct4x4 and 8x4
  respectively

Change-Id: I704e862f40e315b0a79997633c7bd9c347166a8e

4c3ad66b

NEON walsh transform updated to match C · 2a4b2a00

Tero Rintaluoma authored 13 years ago

Modified original patch If2f07220885c4c3a0cae0dace34ea0e36124f001
according to comments. Scheduled code a little bit to prevent some
interlocks.

Change-Id: I338f02b881098782f82af63d97f042b85e63e902

2a4b2a00

16 Sep, 2011 - 3 commits

enable selecting and transmitting the intra mode entropy context · 1d44e7ce

Yaowu Xu authored 13 years ago

This commit adds a 3-bit index to the bitstream. The index is used to
look into the intra mode coding entropy context table. The commit uses
the mode stats to calculate the cost of transmitting modes using 8
possible entropy distributions, and selects the distribution that
provides the lowest cost to do the actual mode coding.
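
A minimal sketch of the selection step described above, in C. The names `pick_entropy_context`, `NUM_CONTEXTS` and `NUM_MODES`, the mode count of 5, and the flat per-mode cost table are illustrative assumptions, not the encoder's actual code; the real encoder would derive costs from its own probability tables.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CONTEXTS 8 /* 2^3 candidates, so the index fits in 3 bits */
#define NUM_MODES 5    /* hypothetical number of intra modes */

/* Returns the index of the candidate distribution with the lowest total
 * cost for the observed mode counts.
 * cost is a row-major NUM_CONTEXTS x NUM_MODES table (hypothetical units). */
int pick_entropy_context(const unsigned int counts[NUM_MODES],
                         const unsigned int *cost)
{
    int best = 0;
    uint64_t best_cost = UINT64_MAX;

    assert(counts != 0 && cost != 0);
    for (int c = 0; c < NUM_CONTEXTS; ++c) {
        uint64_t total = 0;
        for (int m = 0; m < NUM_MODES; ++m)
            total += (uint64_t)counts[m] * cost[c * NUM_MODES + m];
        if (total < best_cost) { /* ties keep the lowest index */
            best_cost = total;
            best = c;
        }
    }
    return best;
}
```

The returned value is in the range 0..7, which is why a 3-bit field is enough to signal it.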

Initial tests show this provides an additional 0.2%~0.3% gain over
quantizer adaptive intra mode coding, so adaptive intra mode coding
provides a combined gain of 0.5% (PSNR) to 0.6% (SSIM) for
all-key-frame encoding.

To build and test, configure with
--enable-experimental --enable-qimode

Change-Id: I7c41cd8bfb352bc1fe7c5da1848a58faea5ed74a

1d44e7ce

add quantizer adaptive intra mb mode encoding · aac2c126

Yaowu Xu authored 13 years ago

Make the intra mode coding entropy distribution adaptive to baseQindex.
An encoding test on HD clips with all key frames shows a gain on all
clips: 0.2% (PSNR) and 0.3% (SSIM).

To build and test, configure with
--enable-experimental --enable-qimode

Change-Id: Iaa69241b984d4fdd8baa6d77ee78c0140f5ac00a

aac2c126

add 8x8 intra prediction modes · ca6b85aa

Yaowu Xu authored 13 years ago

Patches 1 to 3 are an initial implementation of 8x8 intra prediction
modes, with the following assumptions:
a. 8x8 has 4 prediction modes DC, H, V and TM
b. UV 4x4 block use the same mode as corresponding 8x8 area
c. i8x8 modes are enabled for key frame only for now
Patch 4:
d. removed debug code from previous patches
Patch 5:
e. added stats code to collect entropy stats and further cleaned up
Patch 6:
f. changed mode stats code to collect finer stats of modes
Patch 7:
g. normalized i8x8 modes distribution to total at 256 (8bits).
Patch 8:
h. fixed a bug in decoder and removed debug printf output.
Patch 9:
i. more cleanups to address paul's comment
Patch 10:
j. messy rebase/merges to bring the commit up to date.
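
Item g above (normalizing the i8x8 mode distribution so that it totals 256) can be sketched roughly as follows. This is an illustration only: the function name, the minimum value of 1 per mode, and giving the rounding drift to the largest entry are assumptions, not details taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_I8X8_MODES 4 /* DC, H, V and TM, as listed in item a */

/* Scales raw mode counts so that the resulting entries sum to exactly 256.
 * Every mode keeps at least 1 so that no mode becomes impossible to code. */
void normalize_to_256(const unsigned int counts[NUM_I8X8_MODES],
                      unsigned int probs[NUM_I8X8_MODES])
{
    uint64_t total = 0;
    unsigned int sum = 0;
    int largest = 0;

    for (int i = 0; i < NUM_I8X8_MODES; ++i)
        total += counts[i];

    for (int i = 0; i < NUM_I8X8_MODES; ++i) {
        /* With no samples at all, fall back to a uniform distribution. */
        unsigned int p = total
            ? (unsigned int)(((uint64_t)counts[i] * 256) / total)
            : 256 / NUM_I8X8_MODES;
        if (p == 0)
            p = 1;
        probs[i] = p;
        sum += p;
        if (probs[i] > probs[largest])
            largest = i;
    }

    /* Rounding and the minimum of 1 can leave the sum a few units off 256;
     * absorb the difference in the largest entry. */
    probs[largest] = probs[largest] + 256 - sum;

    sum = 0;
    for (int i = 0; i < NUM_I8X8_MODES; ++i)
        sum += probs[i];
    assert(sum == 256);
}
```

For example, counts of {1, 2, 3, 4} scale to {25, 51, 76, 102}, which sums to 254, so the largest entry is bumped to 104.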

Tests on HD clips encoded as all key frames show a consistent gain
on all clips and all metrics: ~0.5% (PSNR) and 0.6% (SSIM):
http://www.corp.google.com/~yaowu/no_crawl/i8x8hd_allkey_fixedq.html

To build and test, configure with:
--enable-experimental --enable-i8x8

Change-Id: I9813fe07ae48cab5fdb5d904bca022514ad01e7f

ca6b85aa

15 Sep, 2011 - 1 commit

Segment Feature Signaling · ceb51742

Paul Wilkins authored 13 years ago

Plumbing for turning new segment features on and off.

Change-Id: If86cd6f103296b73030e8af7cf85c5b9bbffdbaf

ceb51742

13 Sep, 2011 - 4 commits

Reverse coding order for segment features: · 1741cc7a

Paul Wilkins authored 13 years ago

Code all the features for one segment (grouped together),
then all for the next, and so on, rather than grouping the
data by feature.
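
The ordering described above can be sketched as a segment-major loop over a [segment][feature] array. The sizes, the `signed char` element type and the function `pack_segment_features` are assumptions made for illustration; the actual bitstream writer also has to signal which features are enabled, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_SEGMENTS = 4, SEG_LVL_MAX = 2 }; /* hypothetical sizes */

/* Emits feature values segment by segment: all features of segment 0,
 * then all features of segment 1, and so on. Returns the number written. */
size_t pack_segment_features(signed char data[MAX_SEGMENTS][SEG_LVL_MAX],
                             signed char *out, size_t out_len)
{
    size_t n = 0;

    assert(out_len >= (size_t)MAX_SEGMENTS * SEG_LVL_MAX);
    for (int seg = 0; seg < MAX_SEGMENTS; ++seg)
        for (int feat = 0; feat < SEG_LVL_MAX; ++feat)
            out[n++] = data[seg][feat];
    return n;
}
```

Swapping the two loops would give the previous feature-major order.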

Change-Id: I2a65193b3a70aca78f92e855e35d8969d857b6dd

1741cc7a

Fixed encoder crash · 5bc7b3a6

Scott LaVarnway authored 13 years ago

caused by the "Removed bmi copy to/from BLOCKD" commit.

Change-Id: I9fae71bdc34c8ecc07bb81cd3ccf498b91ce3ec7

5bc7b3a6

Change to segment_feature_data[][] structure. · 1c24442a

Paul Wilkins authored 13 years ago

This data structure is now [Segment ID][Features]
rather than [Features][Segment ID].

As a separate modification, I propose making the experimental
bitstream reflect this, so that all the features for a segment
are coded together.

Change-Id: I581e4e3ca2033bdbdef3d9300977a8202f55b4fb

1c24442a

Segment Features: · dfbc61f3

Paul Wilkins authored 13 years ago

Some basic plumbing added for a range of segment level features.
MB_LVL_* changed to SEG_LVL_* to better reflect meaning.

Change-Id: Iac96da36990aa0e40afc0d86e990df337fd0c50b

dfbc61f3

31 Aug, 2011 - 1 commit

Skip computation of distortion in vp8_pick_inter_mode if active_map is used · 0e05f2c6

Alpha Lam authored 13 years ago

If a block is marked to be inactive then set distortion to 0.

Change-Id: Ib415f19642a2ff7b5cf5cfaedd60ebbd79732272

0e05f2c6

30 Aug, 2011 - 1 commit

Recalculate zbin_extra only if regular quantizer is being used · bc9293b8

Alpha Lam authored 13 years ago

vp8_update_zbin_extra() is called all the time even though the fast
quantizer doesn't use it. Skip this call if fast quantizer is used.

Change-Id: Ia711c38431930cc2486cf59b8466060ef0e9d9db

bc9293b8

25 Aug, 2011 - 1 commit

Minor modification on key frame decision · 1f20202e

Yunqing Wang authored 13 years ago

This change makes sure that no key frame recoding happens in real-time
mode, even if CONFIG_REALTIME_ONLY is not configured.

Change-Id: Ifc34141f3217a6bb63cc087d78b111fadb35eec2

1f20202e

24 Aug, 2011 - 2 commits

Quiet warning by removing unused variable. · 4797a972

Fritz Koenig authored 13 years ago

fwd_boost_score was not being computed or
referenced, so remove declaration.

Change-Id: Iece36cde1ec113e3c6afaff1407d24cdf12bd0a8

4797a972

Removed bmi copy to/from BLOCKD · b870947d

Scott LaVarnway authored 13 years ago

for SPLITMV and B_PRED modes.  Modified code to use the bmi
found in mode_info_context instead of BLOCKD.  On the decode
side, the uvmvs are calculated only when required, instead of
for every macroblock.  This is WIP (bmi should eventually be
removed from BLOCKD).
Small performance gains were noticed for RT encodes and decodes (VGA).

Change-Id: I2ed7f0fd5ca733655df684aa82da575c77a973e7

b870947d

23 Aug, 2011 - 1 commit

Use local labels for jumps/loops in x86 assembly. · c5f890af

Fritz Koenig authored 13 years ago

Prepend . to local labels in assembly code.  This
allows non-unique labels within a file.  It also
makes profiling information more informative
by keeping the function name with the loop name.

Change-Id: I7a983cb3a5ba2413d5dafd0a37936b268fb9e37f

c5f890af

22 Aug, 2011 - 2 commits

Reclassify optimized ssim calculations as SSE2. · 694d4e77

Fritz Koenig authored 13 years ago

Calculations were incorrectly classified as either
SSE3 or SSSE3, but they only use SSE2 instructions.
Clean up function names and make non-RTCD code work
as well.

Change-Id: I48ad0218af0cc51c5078070a08511dee43ecfe09

694d4e77

Revert "Reclassify optimized ssim calculations as SSE2." · 734b1b20

Fritz Koenig authored 13 years ago

This reverts commit 01376858.

734b1b20

19 Aug, 2011 - 2 commits

Reclassify optimized ssim calculations as SSE2. · 01376858

Fritz Koenig authored 13 years ago

Calculations were incorrectly classified as either
SSE3 or SSSE3, but they only use SSE2 instructions.
Clean up function names and make non-RTCD code work
as well.

Change-Id: I29f5c2ead342b2086a468029c15e2c1d948b5d97

01376858

Copy less when active map is in use · 4e8d35a4

Alpha Lam authored 13 years ago

When an active map is specified and the current frame is not a key
frame, golden frame, or altref frame, copy only the active regions.

This reduces encoding time by as much as 19% on the test system
where realtime encoding is used. This is particularly useful
when the frame size is large (e.g. 2560x1600) and there are only a few
active macroblocks.
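
A rough sketch of the idea, copying only the 16x16 macroblocks flagged in the active map. The function name, the one-byte-per-macroblock map layout and the single-plane treatment are assumptions for illustration; the frame-type check described above is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MB_SIZE 16 /* macroblock width and height in luma pixels */

/* Copies only the macroblocks whose active_map entry is non-zero.
 * active_map holds mb_rows * mb_cols entries in raster order. */
void copy_active_macroblocks(const unsigned char *src, unsigned char *dst,
                             int stride, int mb_rows, int mb_cols,
                             const unsigned char *active_map)
{
    assert(stride >= mb_cols * MB_SIZE);
    for (int r = 0; r < mb_rows; ++r) {
        for (int c = 0; c < mb_cols; ++c) {
            size_t off;
            if (!active_map[r * mb_cols + c])
                continue; /* inactive: leave the destination untouched */
            off = (size_t)r * MB_SIZE * (size_t)stride + (size_t)c * MB_SIZE;
            for (int y = 0; y < MB_SIZE; ++y)
                memcpy(dst + off + (size_t)y * (size_t)stride,
                       src + off + (size_t)y * (size_t)stride, MB_SIZE);
        }
    }
}
```

The saving grows with the share of inactive macroblocks, which matches the observation that large, mostly static frames benefit most.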

Change-Id: If394a813ec2df5a0201745d1348dbde4278f7ad4

4e8d35a4

17 Aug, 2011 - 1 commit

Small boost to every other frame. · 744f4823

Paul Wilkins authored 13 years ago

Instead of a single mid GF boost apply a few extra bits to
every other frame. This gives a very small average metrics
improvement on both derf and YT sets.

Also use min GF interval as min KF interval.

Change-Id: Iee238b8cae0ffaed850a5a944ac825cee18da485

744f4823

16 Aug, 2011 - 1 commit

Faster vp8_default_coef_probs · 19987dcb

Scott LaVarnway authored 13 years ago

Copies from a generated table instead of building the
default coeff probabilities during runtime.
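
In outline, the change swaps a computation for a table copy. The sketch below uses a tiny made-up table and hypothetical names (`default_probs`, `init_default_probs`); the real table is much larger and its values are not reproduced here.

```c
#include <assert.h>
#include <string.h>

enum { TYPES = 2, NODES = 3 }; /* hypothetical, much smaller than the real table */

/* Pre-generated defaults, stored as constant data. Values are placeholders. */
static const unsigned char default_probs[TYPES][NODES] = {
    {128, 64, 200},
    {10, 20, 30},
};

/* Initializes the working probabilities with a single copy instead of
 * recomputing them on every call. */
void init_default_probs(unsigned char probs[TYPES][NODES])
{
    assert(probs != 0);
    memcpy(probs, default_probs, sizeof(default_probs));
}
```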

Change-Id: I4d9551ea3a2d7d4a4f7ce9eda006495221a8de50

19987dcb

12 Aug, 2011 - 1 commit

Revert "Improved 1-pass CBR rate control" · e9613170

John Koleszar authored 13 years ago

This reverts commit b5ea2fbc. Further
testing showed noticeable keyframe popping in some cases. Reverting this
for now to give time for a proper fix.

Conflicts:

	vp8/encoder/onyx_if.c
	vp8/encoder/ratectrl.c

Change-Id: I159f53d1bf0e24c035754ab3ded8ccfd58fd04af

e9613170

03 Aug, 2011 - 2 commits

Fix source buffer selection · 238dae86

John Koleszar authored 13 years ago

This patch fixes a bug in the interaction between the recode loop and
spatial resampling. If the codec was in a spatial resampling state,
and a subsequent iteration of the recode loop disables resampling,
then the source buffer must be reset to the unscaled source.

Change-Id: I4e4cd47b943f6cd26a47449dc7f4255b38e27c77

238dae86

Adjust half-pixel only search · b9f19f89

Yunqing Wang authored 13 years ago

Changed motion search in vp8_find_best_half_pixel_step() to be the
same as in vp8_find_best_sub_pixel_step(), which checks 5 points
instead of 8 points. This only affects real-time mode with
cpu-used >= 9. Tests showed it gives a 2% encoding speedup with
a quality loss (PSNR) of up to 0.5%.

Change-Id: I16049cad1535002346d46cfdfad345bfc3dc5146

b9f19f89

01 Aug, 2011 - 1 commit

Fix building with --disable-postproc · 06c3d5bb

John Koleszar authored 13 years ago

Change-Id: I7e6bc28e7974a376da747300744e0dd5dc1d21e9

06c3d5bb

29 Jul, 2011 - 1 commit

Correctly track sharpness in vp8cx_pick_filter_level_fast · 1f71d2e2

John Koleszar authored 13 years ago

Make sure to update last_sharpness_level from the current
sharpness_level whenever it changes.

Change-Id: I0258d2f5b11a407abf6176a8d4c4994d925943f0

1f71d2e2

27 Jul, 2011 - 2 commits

Preload reference area in sub-pixel motion search (real-time mode) · 2f2302f8

Yunqing Wang authored 13 years ago

This change implements the same idea as the change "Preload reference
area to an intermediate buffer in sub-pixel motion search." The changes
were made to the vp8_find_best_sub_pixel_step() and
vp8_find_best_half_pixel_step() functions, which are called when
speed >= 5. Test results (using the tulip clip):

1. On Core2 Quad machine (Linux)
rt mode, speed (-5 ~ -8), encoding speed gain: 2% ~ 3%
rt mode, speed (-9 ~ -11), encoding speed gain: 1% ~ 2%
rt mode, speed (-12 ~ -14), no noticeable encoding speed gain

2. On Xeon machine (Linux)
Tests on speed (-5 ~ -14) didn't show a noticeable speed change.

Change-Id: I21bec2d6e7fbe541fcc0f4c0366bbdf3e2076aa2

2f2302f8

Fix range checks in motion search · bde2afbe

Yunqing Wang authored 13 years ago

In some situations the starting motion vectors were out of
range. This fix adjusts the range checks to make sure the
vectors are checked and clamped.
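
Clamping a starting vector into the allowed search window can be sketched as below. The `mv_t` type, the function names and the bound parameters are hypothetical; the encoder's own structures and limits are not shown in this message.

```c
#include <assert.h>

typedef struct {
    int row;
    int col;
} mv_t; /* hypothetical motion vector type */

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Forces the vector into [row_min, row_max] x [col_min, col_max]
 * before the search starts from it. */
void clamp_mv(mv_t *mv, int row_min, int row_max, int col_min, int col_max)
{
    assert(row_min <= row_max && col_min <= col_max);
    mv->row = clamp_int(mv->row, row_min, row_max);
    mv->col = clamp_int(mv->col, col_min, col_max);
}
```

A vector that is already inside the window is left unchanged, so the clamp is safe to apply unconditionally.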

Change-Id: Ife83b7fed0882bba6d1fa559b6e63c054fd5065d

bde2afbe

26 Jul, 2011 - 1 commit

cosmetics: consistently use [u]int64_t · b45065d3

James Zern authored 13 years ago

Removes mixed usage of (unsigned) long long and INT64.
Fixes Issue #208.
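
As an illustration of why a fixed-width 64-bit type matters in this kind of code, here is a small accumulator written with the `<stdint.h>` types. The function is a hypothetical example, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of squared differences between two 8-bit buffers.
 * The accumulator is uint64_t because large frames can exceed 32 bits. */
uint64_t sum_squared_error(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint64_t sse = 0;

    assert(a != 0 && b != 0);
    for (size_t i = 0; i < n; ++i) {
        int d = (int)a[i] - (int)b[i];
        sse += (uint64_t)(d * d);
    }
    return sse;
}
```

Unlike `long long` or a project-specific `INT64` typedef, `int64_t` and `uint64_t` have the same width on every platform that provides them.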

Change-Id: I220d3ed5ce4bb1280cd38bb3715f208ce23cf83a

b45065d3

25 Jul, 2011 - 1 commit

Specify size for argument pushed to stack · fe270dd5

Yunqing Wang authored 13 years ago

This change fixes a build error on Win64.

Change-Id: I63d25b26220c4da8a98ca2e36530cbb802468e6b

fe270dd5

22 Jul, 2011 - 2 commits

fix sharpness bug and clean up · a04ed0e8

Johann authored 13 years ago

sharpness was not recalculated in vp8cx_pick_filter_level_fast

remove last_filter_type. all values are calculated, don't need to update
the lfi data when it changes.

always use cm->sharpness_level. the extra indirection was annoying.

don't track last frame_type or sharpness_level manually. frame type
only matters for motion search and sharpness_level is taken care of in
frame_init

move function declarations to their proper header

Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db

a04ed0e8

Preload reference area to an intermediate buffer in sub-pixel motion search · 20bd1446

Yunqing Wang authored 13 years ago

In sub-pixel motion search, the search range is small (+/- 3 pixels).
Preload the whole search area from the reference buffer into a 32-byte
aligned buffer. Then, during the search, load reference data from this
buffer instead. This keeps the data in cache and reduces the cache-line
crossing penalty. For the tulip clip, tests on an Intel Core2 Quad
machine (Linux) showed these encoder speed improvements:
  3.4%   at --rt --cpu-used=-4
  2.8%   at --rt --cpu-used=-3
  2.3%   at --rt --cpu-used=-2
  2.2%   at --rt --cpu-used=-1

A test on an Atom notebook showed only a 1.1% speed improvement (speed=-4).
A test on a Xeon machine also showed less improvement, since unaligned
data access latency is greatly reduced in newer cores.

Next, I will apply a similar idea to the other 2 sub-pixel search
functions for encoding speed > 4.

Make this change exclusively for x86 platforms.
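
A simplified sketch of the preload step, assuming a 16x16 block and the +/- 3 pixel range mentioned above. The `search_buf` type, the buffer stride of 32 and the function name are assumptions; extra rows or columns that sub-pixel filtering may need are not modelled. It relies on C11 `_Alignas`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16
#define RANGE 3
#define REGION (BLOCK + 2 * RANGE) /* 22 rows and 22 columns */
#define BUF_STRIDE 32              /* keeps every row start 32-byte aligned */

typedef struct {
    _Alignas(32) unsigned char data[REGION * BUF_STRIDE];
} search_buf;

/* ref points at the top-left pixel of the block in the reference frame.
 * The caller must guarantee at least RANGE pixels of valid data around it. */
void preload_search_area(const unsigned char *ref, int ref_stride,
                         search_buf *buf)
{
    const unsigned char *src = ref - RANGE * ref_stride - RANGE;

    assert(((uintptr_t)buf->data % 32) == 0);
    for (int y = 0; y < REGION; ++y)
        memcpy(buf->data + y * BUF_STRIDE, src + y * ref_stride, REGION);
}
```

After the copy, the search reads from `buf->data` with stride `BUF_STRIDE` instead of from the reference frame, so repeated accesses hit a small, aligned region.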

Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f

20bd1446

21 Jul, 2011 - 1 commit

fix more merge issues · 8c31484e

Yaowu Xu authored 13 years ago

With this fix, the experimental branch now builds and encodes correctly
with the following two configure options respectively:
--enable-experimental --enable-t8x8
--enable-experimental

Change-Id: I3147c33c503fe713a85fd371e4f1a974805778bf

8c31484e

20 Jul, 2011 - 3 commits

fixed a number of problems caused by auto merges · 1c24eb2b

Yaowu Xu authored 13 years ago

The auto merge process pulls and merges commits from the public git or
master branch. These automerges have worked well most of the time, but
have created a few problems. This commit fixes several issues that
existed long before the latest 8x8 transform commit.

Change-Id: I895ca99713231b1aec521d57db5d9839f74aacfa

1c24eb2b

Increase chroma row alignment to 16 bytes. · 7d1b37cd

Timothy B. Terriberry authored 13 years ago

This is done by expanding luma row to 32-byte alignment, since
 there is currently a bunch of code that assumes that
 uv_stride == y_stride/2 (see, for example, vp8/common/postproc.c,
 common/reconinter.c, common/arm/neon/recon16x16mb_neon.asm,
 encoder/temporal_filter.c, and possibly others; I haven't done a
 full audit).
It also replaces the hardcoded border of 16 in a number of
 encoder buffers with VP8BORDERINPIXELS (currently 32), as the
 chroma rows start at an offset of border/2.
Together, these two changes have the nice advantage that simply
 dumping the frame memory as a contiguous blob produces a valid,
 if padded, image.
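
The stride arithmetic behind this change can be sketched as follows. `align_up` and `compute_strides` are hypothetical helpers, and the way the border is folded into the stride is an assumption for illustration; only the 32-byte luma alignment, the uv_stride == y_stride/2 relation and the border value of 32 come from the message above.

```c
#include <assert.h>

#define BORDER_IN_PIXELS 32 /* stands in for VP8BORDERINPIXELS */

/* Rounds value up to the next multiple of alignment (a power of two). */
static int align_up(int value, int alignment)
{
    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Aligning the luma stride to 32 bytes means that half of it,
 * the chroma stride, is automatically a multiple of 16. */
void compute_strides(int width, int *y_stride, int *uv_stride)
{
    *y_stride = align_up(width + 2 * BORDER_IN_PIXELS, 32);
    *uv_stride = *y_stride / 2;
    assert(*uv_stride % 16 == 0);
}
```

For a width of 176, this gives a luma stride of 256 and a chroma stride of 128.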

Change-Id: Iaf5ea722ae5c82d5daa50f6e2dade9de753f1003

7d1b37cd

Add 8x8 transform to experimental branch · 08f64718

Deb Mukherjee authored 14 years ago

Please refer to previous commit messages for detailed info:
https://on2-git.corp.google.com/g/#change,5940
https://on2-git.corp.google.com/g/#change,6045

Change-Id: I8b16992f2f69c5a808ad40a3e32ef589cce7c59d

08f64718