1. 20 Dec, 2012 1 commit
  2. 26 Nov, 2012 1 commit
  3. 25 Nov, 2012 2 commits
  4. 16 Nov, 2012 1 commit
  5. 13 Nov, 2012 1 commit
  6. 31 Oct, 2012 3 commits
  7. 30 Oct, 2012 1 commit
  8. 05 Oct, 2012 1 commit
  9. 13 Sep, 2012 1 commit
  10. 12 Sep, 2012 1 commit
  11. 11 Sep, 2012 1 commit
  12. 07 Sep, 2012 3 commits
  13. 06 Sep, 2012 1 commit
  14. 30 Aug, 2012 2 commits
  15. 28 Aug, 2012 1 commit
  16. 08 Aug, 2012 1 commit
  17. 07 Aug, 2012 1 commit
  18. 03 Aug, 2012 2 commits
    • x86: build: replace mmx2 by mmxext · 239fdf1b
      Diego Biurrun authored
      Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
      So switching to a consistent naming scheme beforehand is sensible.
      The name "mmxext" is more official and widespread and also the name
      of the CPU flag, as reported e.g. by the Linux kernel.
      239fdf1b
    • x86: Use consistent 3dnowext function and macro name suffixes · ca844b7b
      Diego Biurrun authored
      Currently there is a wild mix of 3dn2/3dnow2/3dnowext.  Switching to
      "3dnowext", which is a more common name of the CPU flag, as reported
      e.g. by the Linux kernel, unifies this.
      ca844b7b
  19. 02 Aug, 2012 1 commit
  20. 28 Jul, 2012 1 commit
  21. 25 Jul, 2012 2 commits
    • x86/dsputil: put inline asm under HAVE_INLINE_ASM. · 79195ce5
      Ronald S. Bultje authored
      This allows compiling with compilers that don't support gcc-style
      inline assembly.
      Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
      79195ce5
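      The guard described in 79195ce5 can be illustrated with a minimal sketch. It assumes a build system that defines HAVE_INLINE_ASM to 0 or 1; the function add_ints and the fallback default are hypothetical and are not taken from dsputil.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: the build system normally defines HAVE_INLINE_ASM as 0 or 1.
 * The default below only exists so that this sketch compiles on its own. */
#ifndef HAVE_INLINE_ASM
#define HAVE_INLINE_ASM 0
#endif

/* Hypothetical example function: adds two ints, using gcc-style inline
 * assembly only when it is enabled and the target is x86. */
static int add_ints(int a, int b)
{
#if HAVE_INLINE_ASM && (defined(__i386__) || defined(__x86_64__))
    /* AT&T syntax: addl src, dst  =>  a += b */
    __asm__ ("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    /* Portable fallback for compilers without gcc-style inline asm. */
    return a + b;
#endif
}

/* Prints one result so the behaviour can be checked by hand. */
void demo_add_ints(void)
{
    printf("%d\n", add_ints(2, 3));
}
```

      Compilers that do not understand the __asm__ syntax never see that branch, because the preprocessor removes it before compilation.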
    • dsputil_mmx: fix incorrect assembly code · 845e92fd
      Yang Wang authored
      In ff_put_pixels_clamped_mmx(), there are two assembly code blocks.
      In the first block (in the unrolled loop), the instructions
      "movq 8%3, %%mm1 \n\t", and so forth, have problems.
      
      From the above instruction, it is clear what the programmer wants: a load from
      p + 8. But this assembly code doesn't guarantee that. It only works if the
      compiler puts p in a register to produce an instruction like this:
      "movq 8(%edi), %mm1". During optimization, the compiler may be able to
      constant-propagate into p. Suppose p = &x[10000]. Then operand 3 can become
      10000(%edi), where %edi holds &x, and the instruction becomes
      "movq 810000(%edi), %mm1". That is, the offset becomes 810000 instead of 10008.
      
      This will cause a segmentation fault.
      
      This error was fixed in the second block of the assembly code, but not in
      the unrolled loop.
      
      How to reproduce:
          This error is exposed when building with the Intel C++ Compiler with
          IPO+PGO optimization enabled. The crash occurs when decoding an MJPEG video.
      Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
      Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
      845e92fd
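      The failure described in 845e92fd comes from the compiler pasting the operand text into the template string. The sketch below imitates that textual substitution in plain C so the effect can be seen without an assembler. The helper expand_template is hypothetical and handles only "%3" and "%%". The last template, which passes the pointer in a register and writes the displacement explicitly as 8(%3), is one common way to avoid the problem and is not necessarily the exact change made in the commit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: mimics how operand 3 is pasted, as text, into an
 * inline-asm template. Only "%3" (the operand) and "%%" (a literal '%')
 * are recognised; everything else is copied unchanged. */
static void expand_template(const char *tmpl, const char *operand3,
                            char *out, size_t out_size)
{
    size_t n = 0;

    if (out_size == 0)
        return;
    while (*tmpl != '\0' && n + 1 < out_size) {
        if (tmpl[0] == '%' && tmpl[1] == '%') {
            out[n++] = '%';
            tmpl += 2;
        } else if (tmpl[0] == '%' && tmpl[1] == '3') {
            size_t len = strlen(operand3);
            if (len > out_size - 1 - n)      /* truncate instead of overflowing */
                len = out_size - 1 - n;
            memcpy(out + n, operand3, len);
            n += len;
            tmpl += 2;
        } else {
            out[n++] = *tmpl++;
        }
    }
    out[n] = '\0';
}

/* Prints the three cases discussed in the commit message. */
void demo_expand_template(void)
{
    char buf[64];

    /* Memory operand with no displacement: works by luck. */
    expand_template("movq 8%3, %%mm1", "(%edi)", buf, sizeof(buf));
    puts(buf);

    /* Memory operand that already has a displacement: digits are glued. */
    expand_template("movq 8%3, %%mm1", "10000(%edi)", buf, sizeof(buf));
    puts(buf);

    /* Register operand with an explicit displacement in the template. */
    expand_template("movq 8(%3), %%mm1", "%edi", buf, sizeof(buf));
    puts(buf);
}
```

      With a register operand the compiler must materialise the full address of p in the register first, so the displacement 8 in the template is always added to the right base.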
  22. 19 Jul, 2012 1 commit
  23. 18 Jul, 2012 1 commit
  24. 23 Jun, 2012 2 commits
  25. 08 Jun, 2012 1 commit
  26. 21 May, 2012 1 commit
  27. 10 May, 2012 1 commit
    • rv40dsp x86: MMX/MMX2/3DNow/SSE2/SSSE3 implementations of MC · 110d0cdc
      Christophe Gisquet authored
      Code mostly inspired by vp8's MC, however:
      - its MMX2 horizontal filter is worse because it can't take advantage of
        the coefficient redundancy
      - that same coefficient redundancy allows better code for non-SSSE3 versions
      
      Benchmark (rounded to tens of units):
             V8x8  H8x8  2D8x8  V16x16  H16x16  2D16x16
      C       445   358    985    1785    1559     3280
      MMX*    219   271    478     714     929     1443
      SSE2    131   158    294     425     515      892
      SSSE3   120   122    248     387     390      763
      
      End result is overall around a 15% speedup for SSSE3 version (on 6 sequences);
      all loop filter functions now take around 55% of decoding time, while luma MC
      dsp functions are around 6%, chroma ones are 1.3% and biweight around 2.3%.
      Signed-off-by: Diego Biurrun <diego@biurrun.de>
      110d0cdc
  28. 28 Apr, 2012 1 commit
  29. 21 Apr, 2012 2 commits
  30. 04 Apr, 2012 1 commit