Commits · 9da67da04a1e150630196e5457133365bd9c8618 · BC / public / external / libvpx

18 Jul, 2013 - 4 commits

Fix bug where we don't choose any mode in RD selection. · 247197d5

Ronald S. Bultje authored 11 years ago

This could happen during golden overlay frame coding from a previous
alt-ref frame if the special overlay code was triggered.

Change-Id: I3056d0c547cd26903b260ef93c94026e96bd9868

247197d5

Fix horz loopfilter loops · 7fd5d8e6

Frank Galligan authored 11 years ago

If count was greater than 1, the src pointer would be off on
the second loop iteration.

Change-Id: I8e09037e68dc4ae92076a8067f7b6dacbbef8263

7fd5d8e6
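
The class of bug described above can be illustrated with a toy example. This is a minimal sketch, not the libvpx loop filter: `toy_filter_column`, `toy_horizontal_edge`, and the averaging "filter" are hypothetical stand-ins, and the group width of 8 columns is an assumption. It only shows why the source pointer has to advance between groups when `count` is greater than 1.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a real edge filter: replace the pixel above and the pixel
 * below a horizontal edge with their rounded average. `s` points at the
 * first row below the edge. */
static void toy_filter_column(uint8_t *s, int pitch) {
  const int avg = (s[-pitch] + s[0] + 1) >> 1;
  s[-pitch] = (uint8_t)avg;
  s[0] = (uint8_t)avg;
}

/* Filters `count` groups of 8 columns along one horizontal edge. */
void toy_horizontal_edge(uint8_t *src, int pitch, int count) {
  assert(count > 0);
  for (int group = 0; group < count; ++group) {
    for (int col = 0; col < 8; ++col) {
      toy_filter_column(src + col, pitch);
    }
    /* Without this advance, the second group would refilter columns 0..7
     * and leave columns 8..15 untouched. */
    src += 8;
  }
}
```

With `count` set to 2 and a 16-pixel-wide buffer, all 16 columns on both sides of the edge are filtered exactly once.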

Fix bug which skips zeromv even if near/nearest is not 0,0. · deb74560

Ronald S. Bultje authored 11 years ago

Change-Id: Id4f454831f3f11099f39c30246adeaa52857d08d

deb74560

Use mv_check_bounds in sub8x8 rd loop · ced3c201

Jingning Han authored 11 years ago

Make the use of mv_check_bounds consistent for mvs of both ref_frame[0]
and ref_frame[1].

Change-Id: I1ca24865cc7232ca9cbe5db566c53abad1592211

ced3c201

17 Jul, 2013 - 17 commits

Remove unnecessary calling of vp9_init_quantizer() · 3798db88

Yunqing Wang authored 11 years ago

vp9_init_quantizer() is called in vp9_create_compressor(), and
should not be called in vp9_set_speed_features().

Change-Id: Ic2f1f4b0531b9d46bb841d7e1d8da9812207dad6

3798db88

Add a best_yrd shortcut in splitmv mode search. · c6917528

Ronald S. Bultje authored 11 years ago

Encoding of first 50 frames of bus (speed 0) @ 1500kbps goes from
1min6.2 to 1min5.9, i.e. 0.5% faster overall.

Change-Id: I59d8a3b2f0a75010fa041d5e2646c8caac5bd683

c6917528

Remove unnecessary buffer copy in idct4x4. · bd6ce712

hkuang authored 11 years ago

Change-Id: I386066b9bcfb4bffb582e6827af36ca0181f6a83

bd6ce712

Skip redundant nearest/near/zero encodes in splitmv. · 161c9956

Ronald S. Bultje authored 11 years ago

Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from
1min7.3 to 1min6.2, i.e. 1.7% faster overall.

Change-Id: I19d2deacfbffadd61d32551cee9586757ab4a987

161c9956

changed mode checking order · 42facc29

Yaowu Xu authored 11 years ago

Change-Id: Ic4c4b363ed840935e42f495f13ea5e601a56f1b2

42facc29

Skip nearest/near/zero redundant encodes. · 8fea880b

Ronald S. Bultje authored 11 years ago

Encode of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min12.8
to 1min7.3, i.e. 8% faster.

Change-Id: Ia22d1c7b687316c553cc60eacae988b24e175b62

8fea880b
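
The commit message gives no implementation detail. One plausible reading is that the nearest, near and zero modes can resolve to the same motion vector, in which case encoding each of them repeats work. The following is a minimal sketch under that assumption; `ToyMv` and `toy_mark_unique` are hypothetical names, and the real encoder may also weigh the signalling cost of each mode, which is not modelled here.

```c
#include <assert.h>

typedef struct {
  int row;
  int col;
} ToyMv;

/* Sets evaluate[i] to 1 if cand[i] differs from every earlier candidate,
 * and to 0 if it repeats one. Returns the number of candidates that still
 * need a full encode. */
int toy_mark_unique(const ToyMv *cand, int n, int *evaluate) {
  int kept = 0;
  assert(n > 0);
  for (int i = 0; i < n; ++i) {
    evaluate[i] = 1;
    for (int j = 0; j < i; ++j) {
      if (cand[j].row == cand[i].row && cand[j].col == cand[i].col) {
        evaluate[i] = 0; /* same vector already tried; skip the encode */
        break;
      }
    }
    kept += evaluate[i];
  }
  return kept;
}
```

For candidates (0,0), (3,-1), (0,0), only the first two would be encoded.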

Enable disable_splitmv feature for other speeds · 10e83b07

Yunqing Wang authored 11 years ago

Added disable_splitmv feature at other speed levels. For speed 3 or
above, always turn it on.

Change-Id: Ibb36f0a7ef12a34b4f8d0f9cb6193eab43b34360

10e83b07

Removing experimental code from vp9_entropymv.c. · 8452c345

Dmitry Kovalev authored 11 years ago

Change-Id: I340d06e3bc32c78358654496503cccd4196cbe2e

8452c345

Best_rd breakout in rd partition search. · 9f427bfe

Ronald S. Bultje authored 11 years ago

About 15% faster for bus (speed 0) first 50 frames @ 1500kbps, which
goes from 1min36 to 1min24. Results become slightly better (+0.2% on
derf/yt, +0.4% on hd), probably because of a bugfix for skipmode in
super_block_yrd(). Overall speed change (on derfraw300) is roughly
-13%. This can probably be improved further by caching best_yrd
between partition searches. Also, we might be able to get more
speedups by always doing PARTITION_NONE before PARTITION_SPLIT, not
just at the sb8x8 level.

Change-Id: I83736949ebd5b4a3b400ee688d7661913fefc98b

9f427bfe
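
The general idea of a best_rd breakout can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: `toy_sum_rd` is a hypothetical helper, per-block costs are assumed to be non-negative (which is what makes stopping early safe), and `INT64_MAX` is used here as the "gave up" marker.

```c
#include <assert.h>
#include <stdint.h>

/* Sums the RD costs of the blocks in one candidate partitioning. As soon as
 * the running total can no longer beat best_rd, the search stops and
 * INT64_MAX is returned. *blocks_evaluated reports how many blocks were
 * actually costed. */
int64_t toy_sum_rd(const int64_t *block_rd, int n, int64_t best_rd,
                   int *blocks_evaluated) {
  int64_t total = 0;
  assert(n > 0);
  *blocks_evaluated = 0;
  for (int i = 0; i < n; ++i) {
    total += block_rd[i];
    ++*blocks_evaluated;
    if (total >= best_rd) {
      return INT64_MAX; /* cannot beat the best candidate; stop early */
    }
  }
  return total;
}
```

With block costs 40, 50, 30, 20 and a best_rd of 80, the loop stops after the second block. With a best_rd of 200 it costs all four blocks and returns 140.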

Do a skip-block check for sub8x8 partitions also. · 83c7e13a

Ronald S. Bultje authored 11 years ago

+0.2% SSIM and glbPSNR on derfraw300.

Change-Id: I9cba0bca55e606a22f557c7732b064f738efe84d

83c7e13a

Speed up motion estimation using small partitions' results (experiment) · df90d58f

Yunqing Wang authored 11 years ago

Current partition checking starts from small sizes and then goes up
to large sizes. This experiment uses the small partitions' motion
estimation results, which are already available, to speed up the
large partitions' motion estimation. We can skip some partition
checks if they are unlikely choices. We could use the motion
vector (MV) result as the current partition's prediction MV, and
limit the search range and reference frame.

Current result at speed 1:
PSNR loss: 1.19% for stdhd, 0.287% for derf.
Speed gain: 14% for sunflower (hd), 11% for akiyo.

Further improvement will be done later.

Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab

df90d58f

vp9_convolve8_neon placeholder · 59dc4e9c

Johann authored 11 years ago

Call the individually optimized horizontal and vertical functions. This
implementation abuses the temp buffer.

This will be replaced with a custom optimized function.

Over 2x speedup.

Change-Id: I5b908d2a73d264e9810d6022bbff73207a3055dd

59dc4e9c

Move uv intra mode selection in rd loop. · 2ee338ce

Paul Wilkins authored 11 years ago

Use an estimate based on DC_PRED for intra uv cost
within the rd loop then only do a full uv mode analysis
if an intra mode is chosen.

Significant speed gains in some cases. Currently only
enabled for speed 2 pending speed/quality tests.

Change-Id: Ie851a12400d5483bce47ec0e3ccb8516041e91c0

2ee338ce

Limit transform sizes searched for uv intra. · 6c667f0f

Paul Wilkins authored 11 years ago

Apply limit if search_method == USE_LARGESTALL
to the range of UV tx sizes searched.

Change-Id: I6db29f0dd237285ffc50d75a37e8b68151ad821c

6c667f0f

Adding read_comp_pred function. · 851a9111

Dmitry Kovalev authored 11 years ago

Removing old debug code from vp9_decodemv.c.

Change-Id: I51a6d5fe6a2f6583a1555e692bb1ee5a5b315d6c

851a9111

Skip redundant motion search in 4x4 level rd loop · a142d6fc

Jingning Han authored 11 years ago

This commit makes the encoder perform motion search only once
per reference frame type for each 4x4/4x8/8x4 block. For bus_cif
at 2000 kbps, the runtime goes from 253812 ms to 217817 ms
(14% speed-up) at speed 0.

Change-Id: I5f17599ccc8cfaf93ccb4f98fcb6008af6d79e92

a142d6fc

Removing two unused arguments from vp9_inc_mv signature. · 41ae3d02

Dmitry Kovalev authored 11 years ago

Change-Id: Ieffea49eb7a5e5092f21f8694c546aff69b07c6d

41ae3d02

16 Jul, 2013 - 17 commits

Changing signature of vp9_get_pred_probs_tx_size. · 5b65a71c

Dmitry Kovalev authored 11 years ago

Removing VP9_COMMON* argument and adding struct tx_probs* instead of
MACROBLOCKD*.

Change-Id: Idf61074631a90ec51eac22c8dcd977f44ac0757c

5b65a71c

Removing MV_GROUP_UPDATE define and corresponding code. · 3997da0d

Dmitry Kovalev authored 11 years ago

Change-Id: I4884cdc2557d25d50c7c4f7e19b1ad8bdb93cd63

3997da0d

Cleaning up tile code. · 9482a0bf

Dmitry Kovalev authored 11 years ago

Removing tile_rows and tile_columns from VP9Common, removing redundant
constants MIN_TILE_WIDTH and MAX_TILE_WIDTH, changing signature of
vp9_get_tile_n_bits.

Change-Id: I8ff3104a38179b2c6900df965c144c1d6f602267

9482a0bf

Loop filter code cleanup. · 2de3c8d2

Dmitry Kovalev authored 11 years ago

Cosmetic code changes, renaming 'flat' local var to 'mask', removing
unused field 'blim' from loopfilter_info_n and loop_filter_info structs.

Change-Id: I51e6ccf727fe361ad9a08e29e1201aa7abd4987f

2de3c8d2

use consistent framerate naming · 9581eb6e

James Zern authored 11 years ago

s/frame_rate/framerate/g

Change-Id: I6fc3e088e419c5f46e3a9390dd8a2cad2677a2fc

9581eb6e

delete vp9_loopfilter_sse2.asm · 50015f6e

James Zern authored 11 years ago

sse2 functions are provided by vp9_loopfilter_intrin_sse2.c

Change-Id: I40454d26034e3ef915eeaf889937fe7d1b519b9b

50015f6e

vp9_loopfilter_intrin_sse2: cosmetics: fix indent · 8f4787a3

James Zern authored 11 years ago

Change-Id: I892e76d5ad1443b2ea0d1a7839fe26afe9c68ffb

8f4787a3

delete x86/vp9_loopfilter_x86.h · af582542

James Zern authored 11 years ago

also remove prototype_loopfilter{,_block} defines from vp9_loopfilter.h

Change-Id: I865ab3f9436c7b1ca166f76630328abf01389405

af582542

SSE2 16x16 inverse ADST/DCT hybrid transform · d05f66aa

Jingning Han authored 11 years ago

This commit enables SSE2 implementation of 16x16 inverse ADST/DCT
hybrid transform. The runtime goes from 5742 cycles -> 1821 cycles.
This provides about 1% encoding speed-up at speed 0.

Change-Id: I1678d0988bf30b9efd524877705bbb3645edb17b

d05f66aa

Replace generated quant tables with static lookup tables. · e965cccc

Ronald S. Bultje authored 11 years ago

This prevents possible float rounding issues between architectures.

Change-Id: I6ed260aebd49feb4cfb5596a5370c44be5f72167

e965cccc

Moving vp9_kf_default_bmode_probs to vp9_entropymode.c. · baf0c959

Dmitry Kovalev authored 11 years ago

Removing vp9_modelcontext.c.

Change-Id: If2316c58dead2708d9f95b52d9494ba4c1dd7427

baf0c959

Rewriting vp9_set_pred_flag_{seg_id, mbskip}. · 863138a2

Dmitry Kovalev authored 11 years ago

Making implementation of vp9_set_pred_flag_{seg_id, mbskip} consistent
with vp9_get_segment_id without using confusing sub(a, b) macro. Passing
mi_row and mi_col to functions explicitly instead of relying on
mb_to_right_edge and mb_to_bottom_edge.

Change-Id: I54c1087dd2ba9036f8ba7eb165b073e807d00435

863138a2

Minor cleanup in code to find uv tx_size. · 30d2ea45

Paul Wilkins authored 11 years ago

Change-Id: I94b97a966b5efbc9a243048f1f5ddbbdc4b1846e

30d2ea45

Fix above context pointers · 5efd9609

John Koleszar authored 11 years ago

In the prior code, the above context pointers used for entropy
decoding were initialized on the first frame, and not updated when
the frame size changed. The per-frame code which initializes the
contexts assumes that the contexts are contiguous, leading to an
incomplete initialization when the frame is smaller. This commit
updates the pointers so that the context is contiguous whenever
the frame size changes.

Change-Id: I08b53e3a30c8289491212311682ff1b8028cff6c

5efd9609
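
The failure mode described above can be reproduced in miniature. The sketch below is a toy model, not the libvpx data structures: `ToyAboveContext`, `TOY_PLANES`, `TOY_MAX_COLS` and both functions are hypothetical. It assumes one context byte per column and a per-frame reset done with a single memset, which is only complete when the per-plane pointers are packed back to back for the current width.

```c
#include <assert.h>
#include <string.h>

#define TOY_PLANES 3    /* hypothetical plane count */
#define TOY_MAX_COLS 64 /* hypothetical maximum width, in context units */

typedef struct {
  unsigned char storage[TOY_PLANES * TOY_MAX_COLS];
  unsigned char *above[TOY_PLANES];
  int cols;
} ToyAboveContext;

/* Must run whenever the frame size changes, so that the planes stay packed
 * back to back for the current width. */
void toy_set_above_pointers(ToyAboveContext *ctx, int cols) {
  assert(cols > 0 && cols <= TOY_MAX_COLS);
  ctx->cols = cols;
  for (int p = 0; p < TOY_PLANES; ++p) {
    ctx->above[p] = ctx->storage + p * cols;
  }
}

/* Per-frame reset: one memset starting at plane 0, which covers every plane
 * only if the planes are contiguous. */
void toy_reset_above(ToyAboveContext *ctx) {
  memset(ctx->above[0], 0, (size_t)TOY_PLANES * ctx->cols);
}
```

If the pointers were left at their 8-column layout (offsets 0, 8, 16) while the frame shrank to 4 columns, the reset would clear only bytes 0 to 11, so plane 2 at offset 16 would keep stale values. Recomputing the pointers on every size change avoids that.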

Change to extend full border only when needed · 5b915ebd

Yaowu Xu authored 11 years ago

This is a short-term optimization until we work out a decoder
implementation requiring no frame border extension.

Change-Id: I02d15bfde4d926b50a4e58b393d8c4062d1be70f

5b915ebd

Removing and moving around constant definitions. · ca75f125

Dmitry Kovalev authored 11 years ago

Removing unused and duplicated constants, moving them from *.h to *.c
if possible.

Change-Id: Ief4d6b984a3ca2e9b38504f0d855ed072cf7133f

ca75f125

Inline vp9_quantize() in xform_quant(). · 1ff94fea

Ronald S. Bultje authored 11 years ago

Cycle times:
4x4:    151 to  131 cycles (15% faster)
8x8:    334 to  306 cycles (9% faster)
16x16: 1401 to 1368 cycles (2.5% faster)
32x32: 7403 to 7367 cycles (0.5% faster)

Total encode time of first 50 frames of bus @ 1500kbps (speed 0)
goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup.

Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f

1ff94fea
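
The commit message does not say how its percentages were derived. Two common conventions exist, and they give different numbers for the same measurement. The helpers below are a small sketch with hypothetical names; for the 4x4 case (151 to 131 cycles) the first gives roughly 15.3% and the second roughly 13.2%, so the quoted "15% faster" appears closer to the first.

```c
#include <assert.h>

/* Percentage by which throughput rises: old / new - 1. */
double percent_faster(double old_cost, double new_cost) {
  assert(old_cost > 0 && new_cost > 0);
  return (old_cost / new_cost - 1.0) * 100.0;
}

/* Percentage of the original cost that was removed: (old - new) / old. */
double percent_saved(double old_cost, double new_cost) {
  assert(old_cost > 0 && new_cost > 0);
  return (old_cost - new_cost) / old_cost * 100.0;
}
```

When comparing speedups across commits, it helps to check which convention each message uses before adding the figures up.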

15 Jul, 2013 - 2 commits

Consistent naming for loop-filter filters. · e973b4e2

Dmitry Kovalev authored 11 years ago

Renaming flatmask4 to flat_mask4, flatmask5 to flat_mask5, hevmask to
hev_mask, filter to filter4, mbfilter to filter8, wide_mbfilter to
filter16.

Change-Id: Ic61c73e59c2eee505257584867aafac99833cea1

e973b4e2

Inline xform_quant() in encode_block_intra(). · 6fb41874

Ronald S. Bultje authored 11 years ago

Also inline some of the block calculations to assist the compiler to
not do silly things like calculating the same offset (or converting
between raster/transform block offset or block, mi and pixel unit)
many, many, many times.

Cycle times:
4x4:     584 ->   505 cycles (16% faster)
8x8:    1651 ->  1560 cycles (6% faster)
16x16:  7897 ->  7704 cycles (2.5% faster)
32x32: 16096 -> 15852 cycles (1.5% faster)

Overall, this saves about 0.5 seconds (1min49.8 -> 1min49.3) on the
first 50 frames of bus (speed 0) @ 1500kbps, i.e. 0.5% overall.

Change-Id: If3dd62453f8e2ab9d4ee616bc4ea956fb8874b80

6fb41874