1. 11 Sep, 2012 1 commit
  2. 07 Sep, 2012 3 commits
  3. 06 Sep, 2012 1 commit
  4. 30 Aug, 2012 2 commits
  5. 28 Aug, 2012 1 commit
  6. 08 Aug, 2012 1 commit
  7. 07 Aug, 2012 1 commit
  8. 03 Aug, 2012 2 commits
    • x86: build: replace mmx2 by mmxext · 239fdf1b
      Diego Biurrun authored
      Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
      So switching to a consistent naming scheme beforehand is sensible.
      The name "mmxext" is more official and widespread and also the name
      of the CPU flag, as reported e.g. by the Linux kernel.
    • x86: Use consistent 3dnowext function and macro name suffixes · ca844b7b
      Diego Biurrun authored
      Currently there is a wild mix of 3dn2/3dnow2/3dnowext.  Switching to
      "3dnowext", which is a more common name of the CPU flag, as reported
      e.g. by the Linux kernel, unifies this.
  9. 02 Aug, 2012 1 commit
  10. 28 Jul, 2012 1 commit
  11. 25 Jul, 2012 2 commits
    • x86/dsputil: put inline asm under HAVE_INLINE_ASM. · 79195ce5
      Ronald S. Bultje authored
      
      
      This allows compiling with compilers that don't support gcc-style
      inline assembly.
      Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
      79195ce5
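The guard described in this commit can be sketched roughly as follows. This is only an illustration, not the dsputil code: `add_u32_c`, `add_u32_asm`, `add_u32` and `add_u32_selfcheck` are hypothetical names, and the sketch assumes the build system defines `HAVE_INLINE_ASM` to 0 or 1 (it defaults to 0 here so the file compiles anywhere).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a configure step defines HAVE_INLINE_ASM to 0 or 1.
 * Default to 0 so this sketch builds with any compiler. */
#ifndef HAVE_INLINE_ASM
#define HAVE_INLINE_ASM 0
#endif

/* Portable C version, always available (hypothetical example function). */
static uint32_t add_u32_c(uint32_t a, uint32_t b)
{
    return a + b; /* unsigned arithmetic wraps on overflow */
}

#if HAVE_INLINE_ASM && (defined(__i386__) || defined(__x86_64__))
/* gcc-style inline asm version, compiled only when the guard allows it. */
static uint32_t add_u32_asm(uint32_t a, uint32_t b)
{
    __asm__ ("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
}
#define add_u32 add_u32_asm
#else
/* Compilers without gcc-style inline asm never see the asm block. */
#define add_u32 add_u32_c
#endif

/* Checks that whichever version was selected agrees with the C version. */
static void add_u32_selfcheck(void)
{
    assert(add_u32(2u, 3u) == add_u32_c(2u, 3u));
    assert(add_u32(0xffffffffu, 1u) == add_u32_c(0xffffffffu, 1u));
}
```

Callers use `add_u32()` and never reference the asm version directly, so a compiler that lacks gcc-style inline assembly only ever parses the C path.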
    • dsputil_mmx: fix incorrect assembly code · 845e92fd
      Yang Wang authored
      
      
      In ff_put_pixels_clamped_mmx(), there are two assembly code blocks.
      In the first block (the unrolled loop), instructions such as
      "movq 8%3, %%mm1 \n\t" are incorrect.
      
      The intent of this instruction is clear: a load from p + 8. But the
      assembly code doesn’t guarantee that. It only works if the compiler puts
      p in a register and emits operand 3 as a bare "(%edi)", producing
      "movq 8(%edi), %mm1". During optimization, the compiler may
      constant-propagate into p. Suppose p = &x[10000]. Then operand 3 can
      become 10000(%edi), where %edi holds &x, and the instruction becomes
      "movq 810000(%edi), %mm1". That is, the offset is 810000 instead of 8.
      
      This will cause a segmentation fault.
      
      This error was fixed in the second block of the assembly code, but not in
      the unrolled loop.
      
      How to reproduce:
          This error is exposed when building with the Intel C++ Compiler
          with IPO+PGO optimization enabled. It crashed when decoding an
          MJPEG video.
      Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
      Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
      845e92fd
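The failure mode described in this commit message comes down to text pasting: the operand string is substituted for "%3" in the template, so a literal "8" in front of it fuses with whatever displacement the compiler chose. The sketch below simulates that pasting with ordinary string handling. `expand_operand3` is a hypothetical helper, not a compiler or FFmpeg function, and the templates are simplified (a single "%" before register names).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: mimics the textual expansion of an inline-asm
 * template by pasting `operand` in place of the first "%3".
 * Real compilers do far more; only the pasting matters here.
 */
static void expand_operand3(const char *tmpl, const char *operand,
                            char *out, size_t out_size)
{
    const char *slot = strstr(tmpl, "%3");
    assert(slot != NULL); /* the template must reference operand 3 */
    snprintf(out, out_size, "%.*s%s%s",
             (int)(slot - tmpl), tmpl, operand, slot + 2);
}
```

With the memory operand printed as "(%edi)", the template "movq 8%3, %mm1" expands to "movq 8(%edi), %mm1" as intended. With the operand printed as "10000(%edi)", the same template expands to "movq 810000(%edi), %mm1". One common way to avoid the problem, not necessarily the exact change made in this commit, is to pass the pointer in a register and spell out the displacement in the template: "movq 8(%3), %mm1" with the operand "%edi" always expands to "movq 8(%edi), %mm1".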
  12. 19 Jul, 2012 1 commit
  13. 18 Jul, 2012 1 commit
  14. 23 Jun, 2012 2 commits
  15. 08 Jun, 2012 1 commit
  16. 21 May, 2012 1 commit
  17. 10 May, 2012 1 commit
    • rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC · 110d0cdc
      Christophe Gisquet authored
      
      
      Code mostly inspired by vp8's MC, however:
      - its MMX2 horizontal filter is worse because it can't take advantage of
        the coefficient redundancy
      - that same coefficient redundancy allows better code for non-SSSE3 versions
      
      Benchmark (rounded to tens of units):
              V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
      C        445   358    985    1785    1559     3280
      MMX*     219   271    478     714     929     1443
      SSE2     131   158    294     425     515      892
      SSSE3    120   122    248     387     390      763
      
      End result is overall around a 15% speedup for SSSE3 version (on 6 sequences);
      all loop filter functions now take around 55% of decoding time, while luma MC
      dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%.
      Signed-off-by: Diego Biurrun <diego@biurrun.de>
  18. 28 Apr, 2012 1 commit
  19. 21 Apr, 2012 2 commits
  20. 04 Apr, 2012 1 commit
  21. 25 Mar, 2012 4 commits
  22. 05 Mar, 2012 1 commit
  23. 15 Feb, 2012 1 commit
  24. 30 Jan, 2012 1 commit
    • x86 dsputil: provide SSE2/SSSE3 versions of bswap_buf · 6b039003
      Christophe Gisquet authored
      
      
      While pshufb allows emulating bswap on XMM registers for SSSE3, more
      shuffling is needed for SSE2. Alignment is critical, so specific codepaths
      are provided for this case.
      
      For the huffyuv sequence "angels_480-huffyuvcompress.avi":
      C (using bswap instruction): ~ 55k cycles
      SSE2:                        ~ 40k cycles
      SSSE3 using unaligned loads: ~ 35k cycles
      SSSE3 using aligned loads:   ~ 30k cycles
      Signed-off-by: Diego Biurrun <diego@biurrun.de>
      6b039003
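To illustrate why a byte shuffle can stand in for bswap, here is a portable sketch. It is not the SIMD code from this commit: `bswap_buf_ref`, `shuffle16`, `bswap32_mask` and `bswap_block_shuffle` are hypothetical names, and `shuffle16` only emulates in plain C what pshufb does to one 16-byte register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: reverse the byte order of each 32-bit word. */
static void bswap_buf_ref(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t v = src[i];
        dst[i] = (v >> 24) | ((v >> 8) & 0x0000ff00u) |
                 ((v << 8) & 0x00ff0000u) | (v << 24);
    }
}

/* Shuffle mask that reverses the bytes inside each 4-byte group. */
static const uint8_t bswap32_mask[16] = {
    3, 2, 1, 0,  7, 6, 5, 4,  11, 10, 9, 8,  15, 14, 13, 12
};

/* Plain-C emulation of pshufb on one 16-byte block:
 * a mask byte with the high bit set yields 0, otherwise its low
 * four bits select the source byte. */
static void shuffle16(uint8_t out[16], const uint8_t in[16],
                      const uint8_t mask[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (mask[i] & 0x80) ? 0 : in[mask[i] & 0x0f];
}

/* Byte-swap four 32-bit words with a single emulated shuffle. */
static void bswap_block_shuffle(uint32_t dst[4], const uint32_t src[4])
{
    uint8_t in[16], out[16];
    memcpy(in, src, sizeof in);
    shuffle16(out, in, bswap32_mask);
    memcpy(dst, out, sizeof out);
}
```

Reversing the bytes of each 4-byte group in memory gives the same result as the scalar swap regardless of host endianness, which is why one fixed mask suffices for the shuffle-based path.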
  25. 29 Jan, 2012 1 commit
  26. 25 Jan, 2012 1 commit
  27. 14 Dec, 2011 1 commit
  28. 22 Nov, 2011 1 commit
  29. 11 Nov, 2011 1 commit
  30. 07 Nov, 2011 1 commit