- 08 Oct, 2012 2 commits
-
-
Janne Grunau authored
-
Janne Grunau authored
Rename the called dsp init functions to *_init_x86.
-
- 10 Sep, 2012 1 commit
-
-
Diego Biurrun authored
-
- 28 Aug, 2012 3 commits
-
-
Diego Biurrun authored
-
Diego Biurrun authored
-
Diego Biurrun authored
-
- 27 Aug, 2012 1 commit
-
-
Diego Biurrun authored
-
- 24 Aug, 2012 3 commits
-
-
Diego Biurrun authored
-
Diego Biurrun authored
-
Diego Biurrun authored
-
- 21 Aug, 2012 1 commit
-
-
Diego Biurrun authored
-
- 16 Aug, 2012 1 commit
-
-
Diego Biurrun authored
-
- 15 Aug, 2012 1 commit
-
-
Diego Biurrun authored
-
- 12 Aug, 2012 1 commit
-
-
Diego Biurrun authored
-
- 02 Aug, 2012 1 commit
-
-
Mans Rullgard authored
These functions are not faster than other mmx implementations on any hardware I have been able to test on, and they are horribly inaccurate. There is thus no reason to ever use them.
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 01 Aug, 2012 1 commit
-
-
Ronald S. Bultje authored
64-bit CPUs always have SSE available, thus there is no need to compile in the 3dnow functions. This results in smaller binaries.
-
- 18 Jul, 2012 2 commits
-
-
Mans Rullgard authored
This moves all VP3-specific function pointers from dsputil to a new vp3dsp context. There is no reason to ever use the VP3 IDCT where an MPEG2 IDCT is expected or vice versa.
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
Mans Rullgard authored
Signed-off-by: Mans Rullgard <mans@mansr.com>
-
- 25 Jun, 2012 1 commit
-
-
Mans Rullgard authored
-
- 12 Apr, 2012 1 commit
-
-
Diego Biurrun authored
-
- 26 Mar, 2012 1 commit
-
-
Diego Biurrun authored
-
- 25 Mar, 2012 1 commit
-
-
Diego Biurrun authored
-
- 23 Feb, 2012 1 commit
-
-
Christophe GISQUET authored
The 32bits targets have been compiled with -mfpmath=sse for proper reference.

sbr_sum_square
C  /32bits: 82c (unrolled)/102c
C  /64bits: 69c (unrolled)/82c
SSE/32bits: 42c
SSE/64bits: 31c

Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling to perform 8 operations in a loop yields 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
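For readers unfamiliar with the operation being timed above, the following is a minimal C sketch of a sum of squares over complex samples, in a plain and a two-way unrolled form. The function names, the `float (*)[2]` layout for complex values, and the unroll factor are assumptions made for illustration; this is not the committed C or SSE code.

```c
#include <stddef.h>

/* Plain reference: sum of re^2 + im^2 over n complex samples.
 * Hypothetical name; x[i][0] is the real part, x[i][1] the imaginary part. */
static float sum_square_ref(float (*x)[2], size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i][0] * x[i][0] + x[i][1] * x[i][1];
    return sum;
}

/* Unrolled variant: two complex samples per iteration, accumulated into
 * two independent partial sums that are combined at the end.
 * Assumes n is even (an assumption of this sketch). */
static float sum_square_unrolled(float (*x)[2], size_t n)
{
    float sum0 = 0.0f, sum1 = 0.0f;
    for (size_t i = 0; i < n; i += 2) {
        sum0 += x[i][0] * x[i][0] + x[i][1] * x[i][1];
        sum1 += x[i + 1][0] * x[i + 1][0] + x[i + 1][1] * x[i + 1][1];
    }
    return sum0 + sum1;
}
```

Keeping independent partial sums shortens the dependency chain on the accumulator, which is one common reason an unrolled loop measures faster. Floating-point results can differ slightly between the two variants because the additions are reordered.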
-
- 02 Feb, 2012 1 commit
-
-
Ronald S. Bultje authored
This will be useful to test more aggressively for failures to mark XMM registers as clobbered in Win64 builds, and prevent regressions thereof. Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>
-
- 30 Jan, 2012 3 commits
-
-
Christophe Gisquet authored
Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are multiples of 512 (which is often the case when the values round up nicely).

*_TIMER report for the 16x16 and 8x8 cases:
C:
9015 decicycles in 16, 524257 runs, 31 skips
2656 decicycles in 8, 524271 runs, 17 skips
MMX:
4156 decicycles in 16, 262090 runs, 54 skips
1206 decicycles in 8, 262131 runs, 13 skips
MMX on fast-path:
2760 decicycles in 16, 524222 runs, 66 skips
995 decicycles in 8, 524252 runs, 36 skips
SSE2:
2163 decicycles in 16, 262131 runs, 13 skips
832 decicycles in 8, 262137 runs, 7 skips
SSE2 with fast path:
1783 decicycles in 16, 524276 runs, 12 skips
711 decicycles in 8, 524283 runs, 5 skips
SSSE3:
2117 decicycles in 16, 262136 runs, 8 skips
814 decicycles in 8, 262143 runs, 1 skips
SSSE3 with fast path:
1315 decicycles in 16, 524285 runs, 3 skips
578 decicycles in 8, 524286 runs, 2 skips

This means around a 4% speedup for some sequences.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
Diego Biurrun authored
-
Ronald S. Bultje authored
-
- 29 Jan, 2012 1 commit
-
-
Ronald S. Bultje authored
-
- 12 Jan, 2012 1 commit
-
-
Christophe GISQUET authored
When decoding coefficients, detect whether the block is DC-only, and take advantage of this knowledge to perform DC-only inverse transform.

This is achieved by:
  - first, changing the 108x4 element modulo_three_table into a 108 element table (kind of base4), and accessing each value using mask and shifts.
  - then, checking low bits for 0 (as they represent the presence of higher frequency coefficients)

Also provide x86 SIMD code for the DC-only inverse transform.
Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>
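The packing idea described above can be sketched in C as follows. This is only an illustration: the bit layout (two bits per value, with the DC value in the top bits) and the names `PACK4`, `unpack_coef` and `is_dc_only` are assumptions, not the actual contents or accessors of modulo_three_table.

```c
#include <stdint.h>

/* Pack four small values (each 0..3) into one byte, "base 4" style.
 * Assumed layout: c0 (DC) in bits 7..6, then c1, c2, c3 in bits 1..0. */
#define PACK4(c0, c1, c2, c3) \
    ((uint8_t)(((c0) << 6) | ((c1) << 4) | ((c2) << 2) | (c3)))

/* Extract value idx (0 = DC, 1..3 = higher frequencies) with a shift and a mask. */
static inline int unpack_coef(uint8_t packed, int idx)
{
    return (packed >> (6 - 2 * idx)) & 3;
}

/* With this layout the low 6 bits hold the three higher-frequency values,
 * so the entry is DC-only exactly when those bits are all zero. */
static inline int is_dc_only(uint8_t packed)
{
    return (packed & 0x3F) == 0;
}
```

A single byte per entry replaces a four-element row, and the DC-only test becomes one mask and compare instead of three separate checks.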
-
- 09 Jan, 2012 1 commit
-
-
Vitor Sessak authored
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 19 Dec, 2011 1 commit
-
-
Diego Biurrun authored
-
- 14 Dec, 2011 1 commit
-
-
Diego Biurrun authored
-
- 11 Oct, 2011 1 commit
-
-
Ronald S. Bultje authored
~3.0-3.5x as fast as original C version, 1.6x as fast overall.
-
- 11 Aug, 2011 1 commit
-
-
Kostya Shishkov authored
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 03 Jul, 2011 1 commit
-
-
Daniel Kang authored
Mainly ported from 8-bit H.264 qpel. Some code ported from x264. LGPL ok by author.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 21 Jun, 2011 1 commit
-
-
Daniel Kang authored
Mainly ported from 8-bit H.264 weight/biweight.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
- 18 Jun, 2011 1 commit
-
-
Daniel Kang authored
Mainly ported from 8-bit H.264 MC Chroma.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
-
- 05 Jun, 2011 1 commit
-
-
Daniel Kang authored
Parts are inspired from the 8-bit H.264 predict code in Libav. Other parts ported from x264 with relicensing permission from author.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
-
- 31 May, 2011 1 commit
-
-
Daniel Kang authored
Ports the majority of IDCT functions for 10-bit H.264. Parts are inspired from 8-bit IDCT code in Libav; other parts ported from x264 with relicensing permission from author.
Signed-off-by: Ronald S. Bultje <rbultje@google.com>
-
- 21 May, 2011 1 commit
-
-
Vitor Sessak authored
-