  1. 22 Jul, 2011 1 commit
    • Preload reference area to an intermediate buffer in sub-pixel motion search · 20bd1446
      Yunqing Wang authored
      In sub-pixel motion search, the search range is small (+/- 3 pixels).
      Preload the whole search area from the reference buffer into a 32-byte
      aligned buffer, then load reference data from this buffer during the
      search. This keeps the data in cache and reduces the penalty for
      accesses that cross a cache line. For the tulip clip, tests on an
      Intel Core2 Quad machine (Linux) showed these encoder speed improvements:
        3.4%   at --rt --cpu-used=-4
        2.8%   at --rt --cpu-used=-3
        2.3%   at --rt --cpu-used=-2
        2.2%   at --rt --cpu-used=-1

      A test on an Atom notebook showed only a 1.1% speed improvement (speed=-4).
      A test on a Xeon machine also showed less improvement, since unaligned
      data access latency is greatly reduced in newer cores.

      Next, I will apply a similar idea to the other 2 sub-pixel search functions
      for encoding speed > 4.
      
      Make this change exclusively for x86 platforms.
      
      Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
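The preloading idea in the commit above can be sketched roughly as follows. This is not the code from the commit: the block size, margin, buffer stride and both function names are assumptions chosen for illustration, and the caller is assumed to provide a suitably aligned buffer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes: a 16x16 block plus a 3-pixel margin on every side.
 * BUF_STRIDE is a multiple of 32 so each row of an aligned buffer starts
 * on a 32-byte boundary. */
enum { BLOCK = 16, MARGIN = 3, AREA = BLOCK + 2 * MARGIN, BUF_STRIDE = 32 };

/* Copy the AREA x AREA search window around block origin (x, y) from a
 * strided reference frame into a compact buffer. The caller must ensure
 * the window lies inside the frame (or its extended border). */
static void preload_search_area(const uint8_t *ref, int ref_stride,
                                int x, int y, uint8_t *buf) {
  const uint8_t *src = ref + (y - MARGIN) * ref_stride + (x - MARGIN);
  for (int r = 0; r < AREA; ++r) {
    memcpy(buf + r * BUF_STRIDE, src + r * ref_stride, AREA);
  }
}

/* Read the reference pixel at offset (dx, dy) from the block origin,
 * with dx and dy in [-MARGIN, BLOCK + MARGIN), from the preloaded buffer. */
static uint8_t preloaded_pixel(const uint8_t *buf, int dx, int dy) {
  return buf[(dy + MARGIN) * BUF_STRIDE + (dx + MARGIN)];
}
```

After the one-time copy, every candidate position in the +/- 3 pixel range reads from the small buffer rather than from the frame.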
  2. 08 Jul, 2011 1 commit
  3. 30 Jun, 2011 1 commit
    • Bug fix in motion vector limit calculation · b7480454
      Yunqing Wang authored
      Motion vector limits are calculated using right shifts, which
      can give wrong results for negative numbers. James Berry's
      test on one clip showed that the encoder produced some artifacts.
      This change fixes that.
      
      Change-Id: I035fc02280b10455b7f6eb388f7c2e33b796b018
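As background for the fix above: in C, right-shifting a negative signed integer is implementation-defined. On common compilers it is an arithmetic shift that rounds toward negative infinity, whereas integer division truncates toward zero, so the two disagree for negative values that are not exact multiples. The commit message does not say which rounding the limit calculation needed, so the helpers below (hypothetical names) only illustrate the difference.

```c
/* Divide by 2^shift, truncating toward zero (well defined for negatives). */
static int scale_trunc(int v, int shift) {
  return v / (1 << shift);
}

/* Divide by 2^shift, rounding toward negative infinity, without relying on
 * the implementation-defined result of right-shifting a negative int. */
static int scale_floor(int v, int shift) {
  const int d = 1 << shift;
  const int q = v / d;
  return (v % d != 0 && v < 0) ? q - 1 : q;
}
```

For example, with `v = -5` and `shift = 3`, `scale_trunc` gives 0 and `scale_floor` gives -1. For non-negative inputs the two agree.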
  4. 17 Jun, 2011 1 commit
    • Remove unnecessary bounds checking in motion search · 2cd1c285
      Yunqing Wang authored
      The starting points are always within the limits, so bounds
      checking on these points is not needed. For speed < 5, the
      encoded result changes slightly because a starting point that
      lies exactly on a bound is now treated differently.
      
      Change-Id: I09a402d310f51e305a3519f1601b1d17b05c6152
  5. 06 Jun, 2011 1 commit
  6. 01 Jun, 2011 2 commits
    • further clean up of errorperbit and sadperbit · 5b2fb329
      Yaowu Xu authored
      This commit makes the usage of errorperbit and sadperbit consistent
      across encoding modes and passes. It removes all the different magic
      weight factors associated with errorperbit. A factor of 1/2 is now used
      for both sadperbit16 and sadperbit4, and the /2 operation is merged into
      the initialization of the 2 variables.

      Tests on the cif set show gains of 0.23%, 0.18% and 0.19% by avg psnr,
      overall psnr and ssim respectively.
      
      Change-Id: Ifa285c3e065ce0a5a77addfc9f95aabf54ee270d
    • remove some magic weights associated with sad_per_bit · 50916c6a
      Yaowu Xu authored
      sad_per_bit has been used in a number of motion vector search routines
      with different magic weights: 1, 1/2 and 1/4. This commit removes these
      magic numbers and uses 1/2 for all motion search routines. It also
      reformats a number of source code lines to fit the 80-column limit.

      Tests on the cif set show the overall effect is neutral on all metrics
      (<= 0.01%).
      
      Change-Id: I8a382821fa4cffc9c0acf8e8431435a03df74885
  7. 25 May, 2011 1 commit
    • fix the mix use of errorperbit and sadperbit · d8c525b8
      Yaowu Xu authored
      error_per_bit and sad_per_bit were designed as estimates of one bit's
      worth of sum squared error and sum absolute difference respectively.
      Under this assumption, error_per_bit should be used in combination with
      2nd order errors (variance or sum squared error), while sad_per_bit
      should be used in combination with 1st order SADs in motion estimation.
      There were a few places where sad_per_bit was misused with variances.
      This commit changes those places to use error_per_bit, and also renames
      parameters to properly indicate which constant is being used.
      
      On cif set, the change has a universal gain by all metrics: 0.13% by
      average/overall psnr and 0.1% by ssim.
      
      Change-Id: I4850fdcc3fd6886b30f784bd843f13dd401215fb
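The pairing rule described above (SAD with sad_per_bit, variance or SSE with error_per_bit) can be illustrated with a small sketch. The function names, the 1/256-bit rate units and the rounding shift are assumptions made for this example, not details taken from the commit.

```c
/* Convert a rate (assumed here to be in 1/256-bit units) into distortion
 * units using a per-bit weight, with rounding. */
static int rate_cost(int rate_q8, int per_bit) {
  return (rate_q8 * per_bit + 128) >> 8;
}

/* First-order metric: a SAD is paired with sad_per_bit. */
static int sad_rd(int sad, int mv_rate_q8, int sad_per_bit) {
  return sad + rate_cost(mv_rate_q8, sad_per_bit);
}

/* Second-order metric: a variance or SSE is paired with error_per_bit. */
static int var_rd(int variance, int mv_rate_q8, int error_per_bit) {
  return variance + rate_cost(mv_rate_q8, error_per_bit);
}
```

Mixing the two, for instance adding a sad_per_bit-weighted rate to a variance, would compare quantities on different scales, which is the kind of misuse the commit describes.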
  8. 23 May, 2011 1 commit
  9. 12 May, 2011 3 commits
    • Removed mv_bits_sadcost · 71a7501b
      Scott LaVarnway authored
      This sad cost is being generated but never used.
      
      Change-Id: I562eebdcb792b743770954feca365b5b37491ecd
    • Using int_mv instead of MV · 6b25501b
      Scott LaVarnway authored
      The compiler produces better assembly when using int_mv
      for assignments. When assigning an MV, the compiler shifts
      and ORs the two 16-bit values.
      
      Change-Id: I52ce4bc2bfbfaf3f1151204b2f21e1e0654f960f
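A minimal sketch of the idea behind int_mv, assuming it is a union that overlays the two 16-bit components with a single 32-bit integer. The field names and the two helper functions are illustrative assumptions.

```c
#include <stdint.h>

/* A motion vector as two 16-bit components. */
typedef struct {
  int16_t row;
  int16_t col;
} MV;

/* The same four bytes viewed either as one 32-bit word or as an MV. */
typedef union {
  uint32_t as_int;
  MV as_mv;
} int_mv;

/* Copy both components with a single 32-bit assignment. */
static void copy_mv(int_mv *dst, const int_mv *src) {
  dst->as_int = src->as_int;
}

/* Compare both components with a single 32-bit comparison. */
static int same_mv(const int_mv *a, const int_mv *b) {
  return a->as_int == b->as_int;
}
```

Because the whole vector travels as one word, the compiler does not need to move or combine the two halves separately.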
    • Modification and issue fix in full-pixel refining search · b4da1f83
      Yunqing Wang authored
      Further modification, plus a fix for an incorrect implementation that
      caused the results of refining_search and refining_searchx4 to mismatch.
      
      Change-Id: I80cb3a44bf5824413fd50c972e383eebb75f9b6f
  10. 09 May, 2011 1 commit
    • Use diamond search to replace full search in full-pixel refining search · cb7b1fb1
      Yunqing Wang authored
      In NEWMV mode, full search is currently used as the refining search
      after n-step search. Replacing it with an iterative diamond search
      of radius 1 largely reduces the computational complexity while
      maintaining the same encoding quality, since the refining search is
      now done for every macroblock instead of only for a small percentage
      of macroblocks, as was the case with full search.

      Tests on the test set showed a 3.4% encoding speed increase with no
      PSNR or SSIM loss.
      
      Change-Id: Ife907d7eb9544d15c34f17dc6e4cfd97cb743d41
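An iterative radius-1 diamond refinement of the kind described above might look like the sketch below. It is a generic illustration, not the encoder's routine: the callback type, the function names and the toy cost are assumptions, and the MV bounds checks a real encoder needs are omitted.

```c
/* Cost of a candidate position; ctx carries whatever the cost needs. */
typedef int (*cost_fn)(int row, int col, void *ctx);

/* Repeatedly move to the best of the four radius-1 neighbours until the
 * centre is the best point or max_iters is reached. Returns the best cost;
 * *row and *col are updated in place. */
static int refine_diamond(int *row, int *col, int max_iters,
                          cost_fn cost, void *ctx) {
  static const int step[4][2] = { { -1, 0 }, { 0, -1 }, { 0, 1 }, { 1, 0 } };
  int best = cost(*row, *col, ctx);
  for (int it = 0; it < max_iters; ++it) {
    int best_site = -1;
    for (int k = 0; k < 4; ++k) {
      const int c = cost(*row + step[k][0], *col + step[k][1], ctx);
      if (c < best) {
        best = c;
        best_site = k;
      }
    }
    if (best_site < 0) break; /* centre is already the best match */
    *row += step[best_site][0];
    *col += step[best_site][1];
  }
  return best;
}

/* Toy cost for demonstration: squared distance to a target held in ctx. */
static int toy_cost(int row, int col, void *ctx) {
  const int *t = (const int *)ctx;
  return (row - t[0]) * (row - t[0]) + (col - t[1]) * (col - t[1]);
}
```

Each iteration evaluates only four points, so the cost per macroblock stays small even when the refinement runs for every macroblock.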
  11. 03 May, 2011 1 commit
    • Modify HEX search · 04ec930a
      Yunqing Wang authored
      Changed 8-neighbor searching to 4-neighbor searching, and continued
      searching until the center point is the best match.
      
      Test on test set showed 1.3% encoding speed improvement as well as
      0.1% PSNR and SSIM improvement at speed=-5 (rt mode).
      
      Will continue to improve it.
      
      Change-Id: If4993b1907dd742b906fd3f86fee77cc5932ee9a
  12. 18 Apr, 2011 1 commit
    • Use sub-pixel search's SSE in mode selection · b8f0b599
      Yunqing Wang authored
      Passed SSE from sub-pixel search back to pick_inter_mode
      function, which is compared with the encode_breakout to
      see if we could skip evaluating the remaining modes.
      
      Change-Id: I4a86442834f0d1b880a19e21ea52d17d505f941d
  13. 14 Apr, 2011 1 commit
    • Reduce unnecessary distortion computation · 918fb548
      Yunqing Wang authored
      In vp8_pick_inter_mode(), for NEWMV mode, use the error result obtained
      from motion search as the distortion. This helps performance in
      real-time mode.
      
      Change-Id: I398c4e46cc5381f7d874e748cf78827ef0e0860c
  14. 11 Apr, 2011 1 commit
  15. 06 Apr, 2011 1 commit
  16. 01 Apr, 2011 1 commit
    • Use full-pixel MV in mvsadcost calculation · 3d681581
      Yunqing Wang authored
      MV SAD cost error is only used in full-pixel motion search,
      which only needs full-pixel resolution instead of quarter-pixel
      resolution. This change reduces the mvsadcost table size and
      removes unnecessary parameter passing, since the table is
      constant once it is generated.
      
      Change-Id: I9f931e55f6abc3c99011321f1dfb2f3562e6f6b0
  17. 18 Mar, 2011 1 commit
    • Increase static linkage, remove unused functions · 429dc676
      John Koleszar authored
      A large number of functions were defined with external linkage, even
      though they were only used from within one file. This patch changes
      their linkage to static and removes the vp8_ prefix from their names,
      which should make it more obvious to the reader that the function is
      contained within the current translation unit. Functions that were
      not referenced were removed.
      
      These symbols were identified by:
      
        $ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
          | sort | grep '^ *1 '
      
      Change-Id: I59609f58ab65312012c047036ae1e0634f795779
  18. 11 Mar, 2011 1 commit
  19. 10 Feb, 2011 1 commit
    • Improve motion search in real-time mode · 41e6eceb
      Yunqing Wang authored
      Applied better MV prediction in real-time mode, which improves
      the encoding quality.

      Used quarter-pixel search instead of iterative sub-pixel search
      for speed >= 5 to improve encoding performance.

      Tests on the test set showed:
      1. For speed=-5, quality improvement: 1.7% on AvgPSNR and 2.1%
      on SSIM; performance improvement: 3.6% (this includes the
      performance loss caused by the MV prediction calculation in "Improve
      MV prediction in vp8_pick_inter_mode() for speed>3").
      2. For speed=-8, quality improvement: 2.1% on AvgPSNR and 2.5%
      on SSIM, but a 6.9% performance decrease because of the MV prediction
      calculation. This should be improved later.
      
      Change-Id: I349a96c452bd691081d8c8e3e54419e7f477bebd
  20. 18 Jan, 2011 1 commit
    • Fix encoder real-time only configuration. · cb791aaa
      Attila Nagy authored
      Remove allocation/deallocation of stats storage.
      Remove full search functions in machine specific encoder inits.
      Remove last pass validation in validate_config.
      
      Change-Id: I7f29be69273981a4fef6e80ecdb6217c68cbad4e
  21. 14 Dec, 2010 2 commits
    • Fix a bug in motion search code(2) · 08706a3e
      Yunqing Wang authored
      This fix added MV range checks for NEWMV mode as suggested by Jim.
      To reduce unnecessary MV range checks, I tried Yaowu's suggestion.
      Update UMV borders in NEWMV mode to also cover MV range check.
      Also, in this way, every MV that is valid gets checked in diamond
      search function.
      
      Change-Id: I95a89ce0daf6f178c454448f13d4249f19b30f3a
    • Fix a bug in motion search code · 7fb0f868
      Yunqing Wang authored
      The MV's range is 256. Since the new motion search uses a different
      starting MV than the center ref MV, an MV range check needs to
      be done to avoid corruption.
      
      Change-Id: I8ae0721d1bd203639e13891e2e54a2e87276f306
  22. 03 Dec, 2010 1 commit
    • Improve MV prediction accuracy to achieve performance gain · c3bbb291
      Yunqing Wang authored
      Add vp8_mv_pred() to better predict the starting MV for NEWMV
      mode in vp8_rd_pick_inter_mode(). Set different search
      ranges according to MV prediction accuracy, which improves
      encoder performance without hurting the quality. Also,
      as Yaowu suggested, using the diamond search result as the full
      search starting point, and reducing the full search range
      accordingly, helps performance.
      
      Change-Id: Ie4a3c8df87e697c1f4f6e2ddb693766bba1b77b6
  23. 27 Oct, 2010 2 commits
    • Full search SAD function optimization in SSE4.1 · 71ecb5d7
      Yunqing Wang authored
      Use mpsadbw to calculate 8 SADs at once. Function list:
      vp8_sad16x16x8_sse4
      vp8_sad16x8x8_sse4
      vp8_sad8x16x8_sse4
      vp8_sad8x8x8_sse4
      vp8_sad4x4x8_sse4
      
      (test clip: tulip)
      For best quality mode, this gave encoder a 5% performance boost.
      For good quality mode with speed=1, this gave encoder a 3%
      performance boost.
      
      Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
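To make the "8 SADs at once" contract concrete, here is a plain scalar reference for what such an x8 function computes: the SAD of one source block against eight reference blocks at consecutive horizontal offsets. The function name and signature are assumptions for illustration. The SSE4.1 versions listed above get their speed from mpsadbw, which produces eight 4-byte-wide SADs at successive offsets in one instruction.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar reference for an "x8" SAD: compare one w x h source block against
 * eight reference blocks that start at ref + 0, ref + 1, ..., ref + 7.
 * The reference rows must therefore be readable for w + 7 bytes. */
static void sad_x8_ref(const uint8_t *src, int src_stride,
                       const uint8_t *ref, int ref_stride,
                       int w, int h, unsigned int sad[8]) {
  for (int k = 0; k < 8; ++k) {
    unsigned int sum = 0;
    for (int r = 0; r < h; ++r) {
      for (int c = 0; c < w; ++c) {
        const int d = src[r * src_stride + c] - ref[r * ref_stride + c + k];
        sum += (unsigned int)abs(d);
      }
    }
    sad[k] = sum;
  }
}
```

A reference like this is also handy for checking an optimized implementation against known-good results.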
    • Add half-pixel variance RTCD functions · 209d82ad
      John Koleszar authored
      NEON has optimized 16x16 half-pixel variance functions, but they
      were not part of the RTCD framework. Add these functions to RTCD,
      so that other platforms can make use of this optimization in the
      future and special-case ARM code can be removed.
      
      A number of functions were taking two variance functions as
      parameters. These functions were changed to take a single
      parameter, a pointer to a struct containing all the variance
      functions for that block size. This provides additional flexibility
      for calling additional variance functions (the half-pixel special
      case, for example) and by initializing the table for all block sizes,
      we don't have to construct this function pointer table for each
      macroblock.
      
      Change-Id: I78289ff36b2715f9a7aa04d5f6fbe3d23acdc29c
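The change from "two variance function parameters" to "one pointer to a table of variance functions" can be sketched as below. All type, field and function names here are hypothetical, and the 4x4 kernel is a deliberately simple stand-in for the real optimized functions.

```c
typedef unsigned int (*variance_fn)(const unsigned char *src, int src_stride,
                                    const unsigned char *ref, int ref_stride,
                                    unsigned int *sse);

/* Hypothetical table: one entry per variant, built once per block size. */
typedef struct {
  variance_fn vf;        /* full-pixel variance */
  variance_fn vf_half_h; /* horizontal half-pixel special case */
} variance_fn_table;

/* Shared 4x4 kernel. With half_h set, the predictor is the rounded average
 * of each reference pixel and its right-hand neighbour. */
static unsigned int var4x4_kernel(const unsigned char *src, int src_stride,
                                  const unsigned char *ref, int ref_stride,
                                  int half_h, unsigned int *sse) {
  int sum = 0;
  unsigned int sq = 0;
  for (int r = 0; r < 4; ++r) {
    for (int c = 0; c < 4; ++c) {
      const unsigned char *p = ref + r * ref_stride + c;
      const int pred = half_h ? (p[0] + p[1] + 1) >> 1 : p[0];
      const int d = src[r * src_stride + c] - pred;
      sum += d;
      sq += (unsigned int)(d * d);
    }
  }
  *sse = sq;
  return sq - (unsigned int)((sum * sum) >> 4); /* SSE - sum^2 / 16 */
}

static unsigned int var4x4(const unsigned char *s, int ss,
                           const unsigned char *r, int rs, unsigned int *sse) {
  return var4x4_kernel(s, ss, r, rs, 0, sse);
}

static unsigned int var4x4_half_h(const unsigned char *s, int ss,
                                  const unsigned char *r, int rs,
                                  unsigned int *sse) {
  return var4x4_kernel(s, ss, r, rs, 1, sse);
}

static const variance_fn_table var_table_4x4 = { var4x4, var4x4_half_h };

/* A caller takes the whole table instead of separate function pointers. */
static unsigned int block_variance(const variance_fn_table *t, int use_half_h,
                                   const unsigned char *src, int src_stride,
                                   const unsigned char *ref, int ref_stride,
                                   unsigned int *sse) {
  const variance_fn f = use_half_h ? t->vf_half_h : t->vf;
  return f(src, src_stride, ref, ref_stride, sse);
}
```

Adding another variant later means adding a field to the table rather than another parameter to every search function.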
  24. 26 Oct, 2010 2 commits
    • make arm hex search the generic implementation · 96cf6588
      John Koleszar authored
      The ARM version of vp8_hex_search() is a faster implementation
      of the same algorithm. Since it doesn't use any ARM specific
      code, it can be made the default implementation. This removes
      a linking error.
      
      Change-Id: I77d10f2c16b2515bff4522c350004e03b7659934
    • arm: remove duplicate functions · d330a587
      John Koleszar authored
      These functions were true duplicates of functions present in the
      generic code. This fixes some of the link errors when building
      with --enable-shared --enable-pic.
      
      Change-Id: Idff26599d510d954e439207883607ad6b74df20c
  25. 14 Oct, 2010 1 commit
    • Improve bounds checking in vp8_diamond_search_sadx4() · d6da7b8e
      Yunqing Wang authored
      To determine whether all 4/8 neighbor points are within the bounds,
      4 bounds checks are enough, instead of checking 4 bounds for
      each point (16/32 checks). This improvement reduces the cost of
      vp8_diamond_search_sadx4() by 30%, and gives the encoder a 1.5%
      performance gain (test options: 1 pass, good, speed=4).
      
      Change-Id: Ie8da29d18a6ecfc9829e74ac02f6fa70e042331a
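The observation above is that when every neighbour lies within a fixed distance of the centre along each axis, checking the four extreme positions covers all of them. A minimal sketch, with a hypothetical limits struct and function name:

```c
/* Inclusive motion vector limits (hypothetical layout). */
typedef struct {
  int row_min, row_max, col_min, col_max;
} mv_limits;

/* Returns 1 when every point within `radius` of (row, col) along each axis
 * is inside the limits. Four comparisons replace a per-point check. */
static int all_neighbours_inside(int row, int col, int radius,
                                 const mv_limits *lim) {
  return row - radius >= lim->row_min && row + radius <= lim->row_max &&
         col - radius >= lim->col_min && col + radius <= lim->col_max;
}
```

When the test passes, the search can evaluate all neighbours in one batch with no further checks. When it fails, the code can fall back to checking each point individually.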
  26. 09 Sep, 2010 1 commit
  27. 24 Jun, 2010 2 commits
    • vp8cx : bestsad declared and initialized incorrectly. · a5906668
      Fritz Koenig authored
      bestsad needs to be an int and set to INT_MAX because at the end
      of the function it is compared to INT_MAX to determine if there
      was a match in the function.
      
      Change-Id: Ie80e88e4c4bb4a1ff9446079b794d14d5a219788
    • vp8cx : bestsad declared and initialized incorrectly. · cecdd73d
      Fritz Koenig authored
      bestsad should be an int initialized to INT_MAX.  The optimized
      SAD function expects a signed value for bestsad to use for comparison
      and early loop termination.  When no match is made, which is
      determined by a comparison of bestsad to INT_MAX, INT_MAX is returned.
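The INT_MAX sentinel pattern the two commits above rely on can be shown in a few lines. The function below is a hypothetical stand-in for a search loop, not the encoder's code: the running best starts at INT_MAX, and a result still equal to INT_MAX means no candidate was accepted.

```c
#include <limits.h>

/* Return the smallest SAD among the candidates that were evaluated
 * (evaluated[i] != 0), or INT_MAX when none was. *best_index is set to
 * the winning candidate, or -1 when there was no match. */
static int find_best_sad(const int *sad, const int *evaluated, int n,
                         int *best_index) {
  int bestsad = INT_MAX; /* signed, so comparisons with int SADs are safe */
  *best_index = -1;
  for (int i = 0; i < n; ++i) {
    if (evaluated[i] && sad[i] < bestsad) {
      bestsad = sad[i];
      *best_index = i;
    }
  }
  return bestsad;
}
```

The sentinel only works if the variable's type and initial value match what the final comparison expects, which is the mismatch the commits correct.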
  28. 18 Jun, 2010 1 commit
    • cosmetics: trim trailing whitespace · 94c52e4d
      John Koleszar authored
      When the license headers were updated, they accidentally contained
      trailing whitespace, so unfortunately we have to touch all the files
      again.
      
      Change-Id: I236c05fade06589e417179c0444cb39b09e4200d
  29. 04 Jun, 2010 1 commit
  30. 24 May, 2010 1 commit
  31. 18 May, 2010 1 commit