1. 24 Mar, 2011 1 commit
    • use asm_offsets with vp8_regular_quantize_b_sse2 · 8edaf6e2
      Johann authored
      remove helper function and avoid shadowing all the arguments to the
      stack on 64bit systems
      
      when running with --good --cpu-used=0:
      ~2% on linux x86 and x86_64
      ~2% on win32 x86 msys and visual studio
      more on darwin10 x86_64
      significantly more on x86_64-win64-vs9
      
      Change-Id: Ib7be12edf511fbf2922f191afd5b33b19a0c4ae6
  2. 21 Mar, 2011 1 commit
    • Remove unused vp8_get4x4sse_cs_mmx declaration · 2cbd9620
      John Koleszar authored
      This declaration did not match the prototype_sad() prototype, but was
      unused in this translation unit, so it is removed instead. Fixes
      issue 290.
      
      Change-Id: I168854f88a85f73ca9aaf61d1e5dc0f43fc3fdb3
  3. 18 Mar, 2011 1 commit
    • Increase static linkage, remove unused functions · 429dc676
      John Koleszar authored
      A large number of functions were defined with external linkage, even
      though they were only used from within one file. This patch changes
      their linkage to static and removes the vp8_ prefix from their names,
      which should make it more obvious to the reader that the function is
      contained within the current translation unit. Functions that were
      not referenced were removed.
      
      These symbols were identified by:
      
        $ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
          | sort | grep '^ *1 '
      
      Change-Id: I59609f58ab65312012c047036ae1e0634f795779
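The linkage change described above can be illustrated with a minimal sketch. Both function names below are hypothetical and chosen only for illustration; they are not taken from libvpx.

```c
#include <assert.h>

/* File-local helper: `static` gives it internal linkage, so it is not
 * exported from the object file and needs no vp8_ prefix.
 * (clamp_to_byte is a hypothetical name.) */
static int clamp_to_byte(int v)
{
    if (v < 0) return 0;
    if (v > 255) return 255;
    return v;
}

/* Externally visible entry point keeps the library prefix.
 * (vp8_example_add_clamped is a hypothetical name.) */
int vp8_example_add_clamped(int a, int b)
{
    assert(a >= 0 && a <= 255); /* a is expected to be a pixel value */
    return clamp_to_byte(a + b);
}
```

With this layout, a symbol listing of the object file would show only the prefixed entry point as a global symbol, which is the property the nm pipeline above checks for.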
  4. 11 Mar, 2011 2 commits
  5. 09 Mar, 2011 2 commits
  6. 08 Mar, 2011 3 commits
  7. 28 Feb, 2011 1 commit
  8. 22 Feb, 2011 1 commit
  9. 18 Feb, 2011 1 commit
  10. 14 Feb, 2011 1 commit
    • Improve vp8_sad16x16_sse3 function · 2debd5b5
      Yunqing Wang authored
      In real-time mode, the vp8_sad16x16 function is called heavily during
      motion search. Improving this function gives a 1.2% encoding
      performance gain (real-time mode, tulip clip).
      
      Change-Id: I23c401fc40c061f732a9767e8d383737a179bd58
  11. 10 Feb, 2011 1 commit
    • Fix relative include paths · 02321de0
      John Koleszar authored
      Allow compiling without adding vp8/{common,encoder,decoder} to the
      include paths.
      
      Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
  12. 21 Jan, 2011 1 commit
  13. 18 Jan, 2011 1 commit
    • Fix encoder real-time only configuration. · cb791aaa
      Attila Nagy authored
      Remove allocation/deallocation of stats storage.
      Remove full search functions in machine specific encoder inits.
      Remove last pass validation in validate_config.
      
      Change-Id: I7f29be69273981a4fef6e80ecdb6217c68cbad4e
  14. 14 Jan, 2011 1 commit
    • update sse2 regular quantizer · 15f9bea7
      Johann authored
      ~5% gain on 32bit. disabled for 64bit
      
      unset executable bit on ssse3 version (cosmetic)
      
      Change-Id: I1a5860839eb294ce4261f819caea2dcfa78e57ca
  15. 11 Jan, 2011 1 commit
    • use unaligned load · f50f2fd2
      Johann authored
      source buffer is not guaranteed to be aligned for odd size buffers
      
      Change-Id: Id0b1fd40ba3bd6c994bcfada788feccd2b53c5a9
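The alignment concern above can be shown with a portable analogue. This is only a sketch: the commit itself changes an assembly load, whereas the code below uses memcpy, which makes no alignment assumption. The helper name load_u64_unaligned is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read 8 bytes starting at any address. memcpy does not require the
 * source to be aligned, so this is safe for pointers into buffers with
 * odd sizes or odd offsets. (load_u64_unaligned is a hypothetical name.) */
static uint64_t load_u64_unaligned(const unsigned char *p)
{
    uint64_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

A cast such as `*(const uint64_t *)p` would assume alignment, which a pointer into an odd-sized buffer may not satisfy.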
  16. 06 Jan, 2011 1 commit
    • x86 sse2 temporal_filter_apply · 8b0cf5f7
      Johann authored
      count can be reduced to short because the max number of filtered frames
      is set to 15. the max value for any frame is 32 (modifier = 16,
      filter_weight = 2). 15*32 = 480 which requires 9 bits
      
      this function goes from about 7000 us / 1000 iterations for the C code
      to < 275 us / 1000 iterations for sse2 for block_size = 16 and from
      about 1800 us / 1000 iters to < 100 us / 1000 iters for block_size = 8
      
      Change-Id: I64a32607f58a2d33c39286f468b04ccd457d9e6e
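The bit-width argument in the message above can be checked with a short sketch. The constants come from the commit message; the helper names are hypothetical.

```c
#include <assert.h>

/* Limits quoted in the commit message. */
enum { MAX_FILTERED_FRAMES = 15, MAX_PER_FRAME = 32 };

/* Number of bits needed to represent an unsigned value. */
static int bits_needed(unsigned v)
{
    int bits = 0;
    while (v) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Worst-case accumulated count: every frame contributes the maximum. */
static unsigned max_count(void)
{
    return (unsigned)MAX_FILTERED_FRAMES * MAX_PER_FRAME;
}

/* A signed 16-bit short has 15 value bits, so the count fits if the
 * worst case needs no more than that. */
static int count_fits_in_short(void)
{
    unsigned worst = max_count();
    assert(worst > 0);
    return bits_needed(worst) <= 15;
}
```

Here max_count() is 480 and bits_needed(480) is 9, which matches the figures in the message and leaves ample headroom in a 16-bit lane.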
  17. 28 Dec, 2010 1 commit
    • Use the fast quantizer for inter mode selection · 516ea846
      Scott LaVarnway authored
      Use the fast quantizer for inter mode selection and the
      regular quantizer for the rest of the encode for good quality,
      speed 1.  Both performance and quality were improved.  The
      quality gains will make up for the quality loss mentioned in
      I9dc089007ca08129fb6c11fe7692777ebb8647b0.
      
      Change-Id: Ia90bc9cf326a7c65d60d31fa32f6465ab6984d21
  18. 13 Dec, 2010 1 commit
    • remove unused temporal preproc code · b1aa54ab
      John Koleszar authored
      This code is unused, as the current preproc implementation uses the
      same spatial filter that postproc uses.
      
      Change-Id: Ia06d5664917d67283f279e2480016bebed602ea7
  19. 09 Dec, 2010 1 commit
  20. 15 Nov, 2010 1 commit
  21. 11 Nov, 2010 1 commit
  22. 10 Nov, 2010 1 commit
    • FDCT optimizations. · 5f0e0617
      Fritz Koenig authored
      Fixed up the fdct for mmx and 8x4 sse2 to match the
      most recent changes.
      
      Change-Id: Ibee2d6c536fe14dcf75cd6eb1c73f4848a56d719
  23. 01 Nov, 2010 1 commit
    • SSSE3 version of fast quantizer · ff4a71f4
      Scott LaVarnway authored
      (test clip: tulip)
      For good quality mode with speed=1, this gave the encoder
      a small (2 - 3%) performance boost.
      
      Change-Id: I8a1d4269465944ac0819986c2f0be4b0a2ee0b35
  24. 28 Oct, 2010 2 commits
  25. 27 Oct, 2010 2 commits
    • Full search SAD function optimization in SSE4.1 · 71ecb5d7
      Yunqing Wang authored
      Use mpsadbw, and calculate 8 SADs at once. Function list:
      vp8_sad16x16x8_sse4
      vp8_sad16x8x8_sse4
      vp8_sad8x16x8_sse4
      vp8_sad8x8x8_sse4
      vp8_sad4x4x8_sse4
      
      (test clip: tulip)
      For best quality mode, this gave the encoder a 5% performance boost.
      For good quality mode with speed=1, this gave the encoder a 3%
      performance boost.
      
      Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
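What "8 SADs at once" computes can be sketched in scalar C. This is a reference model only, not the SSE4.1 code: mpsadbw itself works on 4-byte groups, and the function name sad_x8 and its signature are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum of absolute differences of a width x height source block against
 * 8 reference candidates starting at consecutive horizontal offsets
 * 0..7. The reference rows must hold at least width + 7 bytes.
 * (sad_x8 is a hypothetical name and signature.) */
static void sad_x8(const unsigned char *src, int src_stride,
                   const unsigned char *ref, int ref_stride,
                   int width, int height, unsigned int sad[8])
{
    int k, r, c;
    assert(width > 0 && height > 0);
    for (k = 0; k < 8; k++) {
        unsigned int acc = 0;
        for (r = 0; r < height; r++)
            for (c = 0; c < width; c++)
                acc += (unsigned int)abs(src[r * src_stride + c] -
                                         ref[r * ref_stride + c + k]);
        sad[k] = acc;
    }
}
```

A full search evaluates every candidate position in a window, so producing eight neighbouring results per call cuts the number of passes over the source block.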
    • Fix half-pixel variance RTCD functions · a0ae3682
      John Koleszar authored
      This patch fixes the system dependent entries for the half-pixel
      variance functions in both the RTCD and non-RTCD cases:
      
        - The generic C versions of these functions are now correct.
          Before, all three cases called the hv code.
      
        - Wire up the ARM functions in RTCD mode
      
        - Created stubs for x86 to call the optimized subpixel functions
          with the correct parameters, rather than falling back to C
          code.
      
      Change-Id: I1d937d074d929e0eb93aacb1232cc5e0ad1c6184
  26. 26 Oct, 2010 1 commit
    • add missing GET_GOT/RESTORE_GOT pairs · b523dd51
      John Koleszar authored
      These functions made global references but did not set up the GOT,
      causing compilation failures in PIC mode.
      
      Change-Id: Iac473bf46733f87eb2e001cd736af4acf73fa51d
  27. 22 Oct, 2010 1 commit
    • Convert [4][4] matrices to [16] arrays. · 8f75ea6b
      Timothy B. Terriberry authored
      Most of the code that actually uses these matrices indexes them as
       if they were a single contiguous array, and coverity produces
       reports about the resulting accesses that overflow the static
       bounds of the first row.
      This is perfectly legal in C, but converting them to actual [16]
       arrays should eliminate the report, and removes a good deal of
       extraneous indexing and address operators from the code.
      
      Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
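The flat layout described above might look like the following sketch. The helper names are hypothetical; the only assumption is the usual row-major mapping, where element (row, col) of the former [4][4] matrix sits at index row * 4 + col.

```c
#include <assert.h>

/* Element access on the flat 16-entry layout.
 * (coeff_at is a hypothetical name.) */
static int coeff_at(const short block[16], int row, int col)
{
    assert(row >= 0 && row < 4 && col >= 0 && col < 4);
    return block[row * 4 + col];
}

/* Code that walks all 16 coefficients needs one loop and no extra
 * indexing or address-of operators. (sum_coeffs is hypothetical.) */
static int sum_coeffs(const short block[16])
{
    int i, sum = 0;
    for (i = 0; i < 16; i++)
        sum += block[i];
    return sum;
}
```

Because the declared bound is now 16, a static analyzer sees every index in the loop as within bounds.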
  28. 21 Oct, 2010 3 commits
    • Add MMWORD PTR/XMMWORD PTR in subtract_sse2.asm · 4cefb443
      Yunqing Wang authored
      Change-Id: Ia649b500ef020225d8bbf611799d0f47658dc2ac
    • Remove stack shadowing for x86-64 · 15acc84f
      Fritz Koenig authored
      x86-64 passes most arguments in registers.  There is no need to
      push them to the stack before using them.
      
      Change-Id: I13c683f1358782682ecafaf1df3fb0af23b978ea
    • Rewrite vp8_short_walsh4x4_sse2() · fc94ffce
      Yunqing Wang authored
      This rewrite reflects the changes made in commit "Improve the
      accuracy of forward walsh-hadamard transform". Since this function
      is not called much, only a small encoder performance gain (~0.5%)
      is seen.
      
      Change-Id: Ie9df58a43028a11fd5b115c4bbe3141f7596578b
  29. 18 Oct, 2010 1 commit
    • Add SSE2 subtract functions · 4db20765
      Yunqing Wang authored
      Instead of doing 8-bit data unpack and 16-bit subtraction, use
      psubb to do 16 8-bit subtractions and pcmpgtb to preserve the
      sign information. This does not bring a noticeable gain since
      these functions are not called frequently.
      
      Change-Id: I90a0dfaa3db9d422e4ada324076596ffb178548e
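The psubb/pcmpgtb idea can be modelled for a single byte lane in scalar C. This is a sketch, not the assembly: the bias step (XOR with 0x80 before the signed compare) is an assumption about how an unsigned comparison would be obtained from pcmpgtb, and the function name sub_u8_widen is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one byte lane.
 * low  : what psubb produces, the difference modulo 256.
 * high : the all-ones or all-zeros byte a compare yields when a < b.
 * Interleaving low and high gives the 16-bit signed difference.
 * (sub_u8_widen is a hypothetical name.) */
static int16_t sub_u8_widen(uint8_t a, uint8_t b)
{
    uint8_t low = (uint8_t)(a - b);
    /* pcmpgtb is a signed compare. XOR with 0x80 maps 0..255 onto
     * -128..127 (value - 128), so a signed compare of the biased
     * bytes orders them like the unsigned originals (assumed step). */
    int sa = (int)a - 128;
    int sb = (int)b - 128;
    uint8_t high = (sb > sa) ? 0xFF : 0x00;
    int v = low | (high << 8);
    if (v >= 0x8000)
        v -= 0x10000; /* reinterpret the 16-bit pattern as signed */
    assert(v >= -255 && v <= 255);
    return (int16_t)v;
}

/* Exhaustive check against plain integer subtraction. */
static int check_all_pairs(void)
{
    int a, b;
    for (a = 0; a < 256; a++)
        for (b = 0; b < 256; b++)
            if (sub_u8_widen((uint8_t)a, (uint8_t)b) != a - b)
                return 0;
    return 1;
}
```

The point of the technique is that the subtraction runs on 16 packed bytes before any widening, instead of unpacking both inputs to 16 bits first.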
  30. 14 Oct, 2010 1 commit
  31. 13 Oct, 2010 1 commit
  32. 12 Oct, 2010 1 commit