  1. 30 Oct, 2012 1 commit
  2. 05 Oct, 2012 1 commit
  3. 13 Sep, 2012 1 commit
  4. 12 Sep, 2012 1 commit
  5. 11 Sep, 2012 1 commit
  6. 07 Sep, 2012 3 commits
  7. 06 Sep, 2012 1 commit
  8. 30 Aug, 2012 2 commits
  9. 28 Aug, 2012 1 commit
  10. 08 Aug, 2012 1 commit
  11. 07 Aug, 2012 1 commit
  12. 03 Aug, 2012 2 commits
    • x86: build: replace mmx2 by mmxext · 239fdf1b
      Diego Biurrun authored
      Refactoring mmx2/mmxext YASM code with cpuflags will force renames,
      so switching to a consistent naming scheme beforehand is sensible.
      The name "mmxext" is more official and widespread, and is also the
      name of the CPU flag as reported e.g. by the Linux kernel.
    • x86: Use consistent 3dnowext function and macro name suffixes · ca844b7b
      Diego Biurrun authored
      Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to
      "3dnowext", which is a more common name of the CPU flag, as reported
      e.g. by the Linux kernel, unifies this.
  13. 02 Aug, 2012 1 commit
  14. 28 Jul, 2012 1 commit
  15. 25 Jul, 2012 2 commits
    • x86/dsputil: put inline asm under HAVE_INLINE_ASM. · 79195ce5
      Ronald S. Bultje authored
      This allows compiling with compilers that don't support gcc-style
      inline assembly.
      Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
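The guard pattern the commit describes can be sketched as follows. This is a minimal, hypothetical example, not the actual dsputil code: `HAVE_INLINE_ASM` is normally defined by configure, and `add32()` is a made-up function used only to show the shape of the guard.

```c
#include <stdint.h>

/* Sketch of the guard pattern, not the actual dsputil code: configure
 * normally defines HAVE_INLINE_ASM; this fallback definition is an
 * assumption made so the example is self-contained. */
#ifndef HAVE_INLINE_ASM
#  if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#    define HAVE_INLINE_ASM 1
#  else
#    define HAVE_INLINE_ASM 0
#  endif
#endif

static int32_t add32(int32_t a, int32_t b)
{
#if HAVE_INLINE_ASM
    /* gcc-style inline assembly path */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b));
    return a;
#else
    /* portable C fallback for compilers without gcc-style inline asm */
    return a + b;
#endif
}
```

The point of the fallback branch is exactly the one made above: compilers such as MSVC can still build the file, just without the asm-optimized path.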
    • dsputil_mmx: fix incorrect assembly code · 845e92fd
      Yang Wang authored
      In ff_put_pixels_clamped_mmx(), there are two assembly code blocks.
      In the first block (in the unrolled loop), instructions such as
      "movq 8%3, %%mm1 \n\t" are problematic.
      
      The programmer's intent is clear: a load from p + 8. But this assembly
      code doesn't guarantee that. It only works if the compiler puts p in a
      register, so the operand expands to something like
      "movq 8(%edi), %mm1". During optimization, the compiler may constant
      propagate into p. Suppose p = &x[10000]. Then operand 3 can become
      10000(%edi), where %edi holds &x, and the instruction becomes
      "movq 810000(%edi)": the literal "8" is concatenated onto the
      displacement, so the load uses offset 810000 instead of 10008.
      
      This causes a segmentation fault.
      
      This error was fixed in the second assembly block, but not in the
      unrolled loop.
      
      How to reproduce:
          The error is exposed when building with the Intel C++ Compiler
          with IPO+PGO optimization enabled; it crashed when decoding an
          MJPEG video.
      Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
      Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
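The safe pattern can be illustrated with a small sketch (a hypothetical helper, not the actual commit's code): give each offset its own "m" operand, so the compiler emits a correct address for every load, instead of textually prefixing "8" to a single operand.

```c
#include <stdint.h>

/* Hypothetical illustration of the fix's principle, not the actual patch:
 * each offset gets its own "m" operand, so the compiler generates a valid
 * address for both loads. A "8%3"-style textual prefix breaks as soon as
 * the operand expands to disp(%reg) instead of (%reg). */
static void load_two_qwords(const int64_t *p, int64_t *a, int64_t *b)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile(
        "movq %2, %0 \n\t"   /* load p[0] */
        "movq %3, %1 \n\t"   /* load p[1], i.e. p + 8 bytes */
        : "=r"(*a), "=r"(*b)
        : "m"(p[0]), "m"(p[1]));
#else
    *a = p[0];               /* portable fallback */
    *b = p[1];
#endif
}
```

With separate operands, constant propagation into p is harmless: whatever addressing form the compiler chooses for p[1] is correct by construction.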
  16. 19 Jul, 2012 1 commit
  17. 18 Jul, 2012 1 commit
  18. 23 Jun, 2012 2 commits
  19. 08 Jun, 2012 1 commit
  20. 21 May, 2012 1 commit
  21. 10 May, 2012 1 commit
    • rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC · 110d0cdc
      Christophe Gisquet authored
      Code mostly inspired by vp8's MC; however:
      - its MMX2 horizontal filter is worse because it can't take advantage of
        the coefficient redundancy
      - that same coefficient redundancy allows better code for the non-SSSE3
        versions
      
      Benchmark (rounded to tens of units):
              V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
      C       445    358   985    1785    1559    3280
      MMX*    219    271   478     714     929    1443
      SSE2    131    158   294     425     515     892
      SSSE3   120    122   248     387     390     763
      
      The end result is an overall speedup of around 15% for the SSSE3 version
      (on 6 sequences); all loop filter functions now take around 55% of
      decoding time, while luma MC dsp functions are around 6%, chroma ones
      1.3% and biweight around 2.3%.
      Signed-off-by: Diego Biurrun <diego@biurrun.de>
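The coefficient-redundancy point can be illustrated with a generic scalar sketch. The taps below ({1, -5, 20, 20, -5, 1}) are an assumption for illustration, not RV40's actual coefficients: when a 6-tap filter is symmetric, mirrored samples can be summed before multiplying, halving the multiply count, which is exactly the kind of sharing the SIMD versions exploit.

```c
#include <stdint.h>

/* Generic sketch, not RV40's actual filter: a symmetric 6-tap horizontal
 * filter with illustrative taps {1, -5, 20, 20, -5, 1} (sum 32). Symmetry
 * lets us add mirrored samples first: 3 multiplies instead of 6. */
static int filter6_sym(const uint8_t *src)
{
    int sum = 1  * (src[0] + src[5])
            - 5  * (src[1] + src[4])
            + 20 * (src[2] + src[3]);
    sum = (sum + 16) >> 5;                       /* round, normalize by 32 */
    return sum < 0 ? 0 : sum > 255 ? 255 : sum;  /* clamp to 8 bits */
}
```

A filter without that symmetry (like the case the MMX2 horizontal path hits) has to keep all six products separate, which costs extra multiplies and registers.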
  22. 28 Apr, 2012 1 commit
  23. 21 Apr, 2012 2 commits
  24. 04 Apr, 2012 1 commit
  25. 25 Mar, 2012 4 commits
  26. 05 Mar, 2012 1 commit
  27. 15 Feb, 2012 1 commit
  28. 30 Jan, 2012 1 commit
    • x86 dsputil: provide SSE2/SSSE3 versions of bswap_buf · 6b039003
      Christophe Gisquet authored
      While pshufb allows emulating bswap on XMM registers for SSSE3, more
      shuffling is needed for SSE2. Alignment is critical, so specific
      codepaths are provided for the aligned case.
      
      For the huffyuv sequence "angels_480-huffyuvcompress.avi":
      C (using bswap instruction): ~ 55k cycles
      SSE2:                        ~ 40k cycles
      SSSE3 using unaligned loads: ~ 35k cycles
      SSSE3 using aligned loads:   ~ 30k cycles
      Signed-off-by: Diego Biurrun <diego@biurrun.de>
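What bswap_buf computes can be shown with a scalar reference sketch (the name and signature here are assumptions, not the actual SIMD code): each 32-bit word of the buffer is byte-swapped. The SSSE3 version does the same for 16 bytes at a time with a single pshufb, while SSE2, lacking a byte shuffle, needs a short sequence of word shuffles and shifts per vector.

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar reference for the operation, a sketch rather than the actual
 * x86 code: byte-swap every 32-bit word from src into dst. */
static void bswap_buf_c(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v = src[i];
        dst[i] = (v >> 24)
               | ((v >> 8) & 0x0000FF00u)
               | ((v << 8) & 0x00FF0000u)
               | (v << 24);
    }
}
```

The aligned-vs-unaligned split in the figures above comes purely from the load side: the swap itself is identical, but movdqu loads of unaligned data cost more than movdqa loads of aligned data on the CPUs of that era.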
  29. 29 Jan, 2012 1 commit
  30. 25 Jan, 2012 1 commit