1. 30 Apr, 2013 1 commit
  2. 19 Apr, 2013 1 commit
  3. 10 Apr, 2013 1 commit
  4. 08 Apr, 2013 1 commit
  5. 18 Feb, 2013 1 commit
  6. 06 Feb, 2013 2 commits
  7. 27 Jan, 2013 1 commit
  8. 24 Jan, 2013 1 commit
  9. 23 Jan, 2013 1 commit
  10. 20 Jan, 2013 1 commit
  11. 06 Jan, 2013 1 commit
  12. 20 Dec, 2012 1 commit
  13. 25 Nov, 2012 1 commit
  14. 08 Oct, 2012 3 commits
  15. 10 Sep, 2012 1 commit
  16. 28 Aug, 2012 3 commits
  17. 27 Aug, 2012 1 commit
  18. 24 Aug, 2012 3 commits
  19. 21 Aug, 2012 1 commit
  20. 16 Aug, 2012 1 commit
  21. 15 Aug, 2012 1 commit
  22. 12 Aug, 2012 1 commit
  23. 02 Aug, 2012 1 commit
  24. 01 Aug, 2012 1 commit
  25. 18 Jul, 2012 2 commits
  26. 25 Jun, 2012 1 commit
  27. 12 Apr, 2012 1 commit
  28. 26 Mar, 2012 1 commit
  29. 25 Mar, 2012 1 commit
  30. 23 Feb, 2012 1 commit
  31. 02 Feb, 2012 1 commit
  32. 30 Jan, 2012 1 commit
    • rv40: x86 SIMD for biweight · e5c9de2a
      Christophe Gisquet authored
      
      
      Provide MMX, SSE2 and SSSE3 versions, with a fast path when the weights are
      multiples of 512 (which is often the case when the values round up nicely).
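
To illustrate why weights that are multiples of 512 allow a cheaper code path, here is a minimal C sketch. It assumes the scalar biweight formula scales each product down by 9 bits before summing, rounding and shifting by 5. That formula, the function names (`biweight_general`, `biweight_fast`, `weights_allow_fast_path`) and the `size` parameter are illustrative assumptions, not the code from this commit. Weights are assumed non-negative and to sum to about 16384, so results fit in 8 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* General path (assumed formula): each product is shifted right by 9
 * individually, so two multiplies and two shifts are needed per pixel. */
static void biweight_general(uint8_t *dst, const uint8_t *src1,
                             const uint8_t *src2, int w1, int w2,
                             int size, ptrdiff_t stride)
{
    for (int j = 0; j < size; j++) {
        for (int i = 0; i < size; i++)
            dst[i] = (uint8_t)((((w2 * src1[i]) >> 9) +
                                ((w1 * src2[i]) >> 9) + 0x10) >> 5);
        src1 += stride;
        src2 += stride;
        dst  += stride;
    }
}

/* The fast path is only valid when the low 9 bits of both weights are zero. */
static int weights_allow_fast_path(int w1, int w2)
{
    return ((w1 | w2) & 0x1FF) == 0;
}

/* Fast path: when w is a multiple of 512, (w * x) >> 9 equals (w >> 9) * x
 * exactly, so the shift is hoisted out of the loop and the per-pixel
 * products stay small. */
static void biweight_fast(uint8_t *dst, const uint8_t *src1,
                          const uint8_t *src2, int w1, int w2,
                          int size, ptrdiff_t stride)
{
    const int v1 = w1 >> 9;
    const int v2 = w2 >> 9;

    for (int j = 0; j < size; j++) {
        for (int i = 0; i < size; i++)
            dst[i] = (uint8_t)((v2 * src1[i] + v1 * src2[i] + 0x10) >> 5);
        src1 += stride;
        src2 += stride;
        dst  += stride;
    }
}
```

A caller would test `weights_allow_fast_path(w1, w2)` once per block and dispatch accordingly. With pre-shifted weights the products fit in 16 bits, which is presumably what makes the SIMD fast path cheaper, although the sketch does not model the actual assembly.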
      
      *_TIMER report for the 16x16 and 8x8 cases:
      C:
      9015 decicycles in 16, 524257 runs, 31 skips
      2656 decicycles in 8, 524271 runs, 17 skips
      MMX:
      4156 decicycles in 16, 262090 runs, 54 skips
      1206 decicycles in 8, 262131 runs, 13 skips
      MMX with fast path:
      2760 decicycles in 16, 524222 runs, 66 skips
      995 decicycles in 8, 524252 runs, 36 skips
      SSE2:
      2163 decicycles in 16, 262131 runs, 13 skips
      832 decicycles in 8, 262137 runs, 7 skips
      SSE2 with fast path:
      1783 decicycles in 16, 524276 runs, 12 skips
      711 decicycles in 8, 524283 runs, 5 skips
      SSSE3:
      2117 decicycles in 16, 262136 runs, 8 skips
      814 decicycles in 8, 262143 runs, 1 skips
      SSSE3 with fast path:
      1315 decicycles in 16, 524285 runs, 3 skips
      578 decicycles in 8, 524286 runs, 2 skips
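
The per-function gains implied by the report above can be worked out by dividing the C timing by each SIMD timing. The following C sketch does only that arithmetic on the decicycle figures quoted above (a decicycle is a tenth of a cycle); the `struct timing` layout and the helper names are illustrative assumptions.

```c
#include <stdio.h>

/* Decicycle counts copied from the timer report: 16x16 and 8x8 blocks. */
struct timing {
    const char *name;
    int dc16;
    int dc8;
};

static const struct timing timings[] = {
    { "C",                 9015, 2656 },
    { "MMX",               4156, 1206 },
    { "MMX fast path",     2760,  995 },
    { "SSE2",              2163,  832 },
    { "SSE2 fast path",    1783,  711 },
    { "SSSE3",             2117,  814 },
    { "SSSE3 fast path",   1315,  578 },
};

/* Speedup factor of an optimized version relative to a baseline. */
static double speedup(int baseline, int optimized)
{
    return (double)baseline / (double)optimized;
}

/* Print the speedup of every version relative to the C reference (entry 0). */
static void print_speedups(void)
{
    const size_t n = sizeof(timings) / sizeof(timings[0]);

    for (size_t i = 1; i < n; i++)
        printf("%-16s 16x16: %.2fx  8x8: %.2fx\n", timings[i].name,
               speedup(timings[0].dc16, timings[i].dc16),
               speedup(timings[0].dc8,  timings[i].dc8));
}
```

For the SSSE3 fast path this gives roughly 6.9x on 16x16 blocks and 4.6x on 8x8 blocks. These ratios apply to the biweight function alone, which is consistent with a much smaller gain for decoding as a whole.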
      
      This means around a 4% speedup for some sequences.
      Signed-off-by: Diego Biurrun <diego@biurrun.de>