  1. 07 Sep, 2013 - 1 commit
    • Fix overflow issue in 16x16 quantization SSSE3 · 09bc942b
      Jingning Han authored
      The 16x16 transform unit test suggested that the peak coefficient
      value can reach 32639. This could cause an overflow in the SSSE3
      implementation of 16x16 block quantization. This commit fixes the
      issue by replacing addition with saturated addition.
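
      As a rough illustration of the saturated-addition idea (not the code
      from this commit), the sketch below adds a rounding value to 16-bit
      coefficients with clamping. The function names and the rounding value
      used in the usage note are hypothetical; _mm_adds_epi16 is the SSE2
      intrinsic for a saturating packed 16-bit add, and a scalar loop covers
      any tail or non-SSE2 builds.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: signed 16-bit addition that clamps instead of wrapping. */
static int16_t sat_add_s16(int16_t a, int16_t b) {
  int32_t sum = (int32_t)a + (int32_t)b;
  if (sum > INT16_MAX) return INT16_MAX;
  if (sum < INT16_MIN) return INT16_MIN;
  return (int16_t)sum;
}

/* Adds `round` to each of the n coefficients with saturation.
 * Eight lanes are processed at a time when SSE2 is available. */
void add_round_saturated(const int16_t *coeff, int16_t round, int16_t *out,
                         int n) {
  int i = 0;
  assert(n >= 0);
#if defined(__SSE2__)
  const __m128i r = _mm_set1_epi16(round);
  for (; i + 8 <= n; i += 8) {
    const __m128i c = _mm_loadu_si128((const __m128i *)(coeff + i));
    _mm_storeu_si128((__m128i *)(out + i), _mm_adds_epi16(c, r));
  }
#endif
  /* Scalar tail (or the whole array when SSE2 is not available). */
  for (; i < n; ++i) out[i] = sat_add_s16(coeff[i], round);
}
```

      With a coefficient of 32639 and an assumed rounding value of 200, each
      output lane clamps to 32767 instead of wrapping to a negative number.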
      
      Change-Id: I6d5bb7c5faad4a927be53292324bd2728690717e
  2. 06 Sep, 2013 - 2 commits
    • Support a constant quality mode in VP9 · e378a89b
      Deb Mukherjee authored
      Adds a new end-usage option for constant quality encoding in vpx. This
      first version, implemented for VP9, encodes all regular inter frames
      using the quality specified in the --cq-level= option, while encoding
      all key frames and golden/altref frames at a better quality than that.
      
      The current performance on derfraw300 is +0.910% up from bitrate control,
      but achieved without multiple recode loops per frame.
      
      The decision for qp for each altref/golden/key frame will be improved
      in subsequent patches based on better use of stats from the first pass.
      Further, the qp for regular inter frames may also be varied around the
      provided cq-level.
      
      Change-Id: I6c4a2a68563679d60e0616ebcb11698578615fb3
    • New mode_info_context storage · dae17734
      Scott LaVarnway authored
      mode_info_context was stored as a grid of MODE_INFO structs.
      Each grid entry now consists of a pointer to a MODE_INFO struct and
      an "in the image" flag. The MODE_INFO structs are now stored
      as a stream, which eliminates unnecessary copies and is a little
      more cache friendly.
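
      The layout described above could look roughly like the sketch below.
      ModeInfo, GridEntry and build_grid are hypothetical stand-ins, and the
      raster order of the stream is chosen only for simplicity; the commit
      message does not say in which order the stream is filled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real MODE_INFO has many more fields. */
typedef struct { int mode; } ModeInfo;

typedef struct {
  ModeInfo *mi;  /* points into the stream, or NULL outside the image */
  int in_image;  /* 1 if this grid cell lies inside the visible image */
} GridEntry;

/* Fills a grid_rows x grid_cols grid. Cells inside the img_rows x img_cols
 * image receive consecutive stream entries; all other cells are marked as
 * outside the image. Returns the number of stream entries used. */
size_t build_grid(GridEntry *grid, int grid_rows, int grid_cols,
                  ModeInfo *stream, int img_rows, int img_cols) {
  size_t used = 0;
  assert(img_rows <= grid_rows && img_cols <= grid_cols);
  for (int r = 0; r < grid_rows; ++r) {
    for (int c = 0; c < grid_cols; ++c) {
      GridEntry *e = &grid[r * grid_cols + c];
      e->in_image = (r < img_rows && c < img_cols);
      e->mi = e->in_image ? &stream[used++] : NULL;
    }
  }
  return used;
}
```

      Because each grid cell holds only a pointer and a flag, moving or
      sharing a MODE_INFO means updating a pointer rather than copying a
      whole struct.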
      
      For the test clips used, the decoder performance improved
      by ~4.3% (1080p) and ~9.7% (720p).
      
      Patch Set 2: Re-encoded clips with latest. Now ~1.7% (1080p)
      and 5.9% (720p).
      
      Change-Id: I846f29e88610fce2523ca697a9a9ef2a182e9256
  3. 05 Sep, 2013 - 1 commit
    • Use saturated addition in SSSE3 of 32x32 quant · 458c2833
      Jingning Han authored
      The 32x32 forward transform can potentially reach a peak coefficient
      value close to 32700, while the rounding factor can go up to 610.
      This could cause an overflow in the SSSE3 implementation of the 32x32
      quantization process.
      
      This commit resolves this issue by replacing the addition operations
      with saturated addition operations in 32x32 block quantization.
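
      To make the arithmetic concrete: 32700 + 610 = 33310, which exceeds
      the signed 16-bit maximum of 32767, so a wrapping add yields
      33310 - 65536 = -32226. The helper names below are hypothetical; the
      sketch only models what a wrapping 16-bit lane does with the two
      values quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Models a wrapping signed 16-bit add (modulo 2^16), the behaviour of a
 * plain, non-saturating packed add in each lane. */
int16_t wrap_add_s16(int16_t a, int16_t b) {
  uint16_t u = (uint16_t)((uint16_t)a + (uint16_t)b);
  return (u & 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* Returns 1 if a + b does not fit in a signed 16-bit value. */
int add_overflows_s16(int16_t a, int16_t b) {
  int32_t sum = (int32_t)a + (int32_t)b;
  return sum > INT16_MAX || sum < INT16_MIN;
}

/* Worked example using the peak and rounding values from the text. */
void check_wrap_example(void) {
  assert(add_overflows_s16(32700, 610));
  assert(wrap_add_s16(32700, 610) == -32226);
}
```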
      
      Change-Id: Id6b98996458e16c5b6241338ca113c332bef6e70
  4. 04 Sep, 2013 - 2 commits
    • make vp9 postproc a config option · 79401542
      Jim Bankoski authored
      VP9 postproc is disabled for now as it has not been shown to help and
      may be merged with vp8.
      
      Change-Id: I25620d6cd34c6e10331b18c7b5ef7482e39c6057
    • faster accounting of inc_mv · 532179e8
      Jim Bankoski authored
      Moves counting of mv branches to where we have a new mv, instead of after
      the whole frame is summed.
      
      Change-Id: I945d9f6d9199ba2443fe816c92d5849340d17bbd
  5. 03 Sep, 2013 - 1 commit
    • Attempt to fix speed 4 · 49317cdd
      Paul Wilkins authored
      Speed 4 fixed partition size. Use fixed size unless it does not
      fit inside image, in which case use the largest size that does.
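
      A minimal sketch of that rule, assuming square power-of-two block
      sizes measured in pixels (the function name and the min_size
      parameter are hypothetical, and the real encoder also has rectangular
      sizes):

```c
#include <assert.h>

/* Returns fixed_size if it fits in the remaining rows_left x cols_left
 * area; otherwise halves it until it fits or min_size is reached. */
int pick_partition_size(int fixed_size, int rows_left, int cols_left,
                        int min_size) {
  int size = fixed_size;
  assert(min_size > 0 && fixed_size >= min_size);
  while (size > min_size && (size > rows_left || size > cols_left)) {
    size >>= 1;
  }
  return size;
}
```

      For example, a fixed size of 64 with only 40 rows left in the image
      falls back to 32.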
      
      Change-Id: I250f7a80506750dd82ab355721624a1344247223
  6. 01 Sep, 2013 - 1 commit
    • Fix 32x32 forward transform SSE2 version · 3cf46fa5
      Jingning Han authored
      This commit fixed the potential overflow issue in the SSE2
      implementation of 32x32 forward DCT. It resolved the corrupted
      coded frames in the border of scenes.
      
      Change-Id: If87eef2d46209269f74ef27e7295b6707fbf56f9
  7. 30 Aug, 2013 - 1 commit
  8. 29 Aug, 2013 - 5 commits
    • Added per pixel inter rd hit count stats · 1f4bf79d
      Paul Wilkins authored
      Added some code to output normalized rd hit count stats.
      In effect this approximates to the average number of rd
      operations/tests per pixel for the sequence.
      
      The results are not quite accurate, and I have not bothered
      to account for partial SB64s at frame edges or for key frames.
      However, they do give some idea of the number of modes /
      prediction methods being tested for each pixel across the
      different partition sizes. This indicates how much scope there
      is for further gains, either by reducing the number of partitions
      examined or the modes per partition through heuristics.

      Patch 3 moved the place where the count is incremented so that
      partial rd tests that are aborted with an INT_MAX return are also counted.
      
      Example numbers for first 50 frames of Akiyo.
      Speed 0 ~84.4 rd operations / pixel
      Speed 1 ~28.8
      Speed 2 ~11.9
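
      One way such a normalized count could be computed is sketched below.
      The struct and function names are hypothetical, and weighting each rd
      test by the pixel area of its block is an assumption about the
      normalization, not something stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  uint64_t weighted_hits; /* sum over rd tests of the pixels each one covered */
} RdHitStats;

/* Records one rd test on a block of block_w x block_h pixels. */
void rd_hit_record(RdHitStats *s, int block_w, int block_h) {
  assert(block_w > 0 && block_h > 0);
  s->weighted_hits += (uint64_t)block_w * (uint64_t)block_h;
}

/* Average number of rd tests per pixel over `frames` frames. */
double rd_hits_per_pixel(const RdHitStats *s, int frame_w, int frame_h,
                         int frames) {
  const double pixels = (double)frame_w * frame_h * frames;
  return pixels > 0.0 ? (double)s->weighted_hits / pixels : 0.0;
}
```

      Under this weighting, two tests on a 64x64 block plus four tests on
      32x32 blocks within one 64x64 frame average out to 3 rd tests per
      pixel.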
      
      Change-Id: Ib956e787e12f7fa8b12d3a1a2f6cda19a65a6cb8
    • consistently name VP9_COMMON variables #3 · d765df27
      James Zern authored
      stragglers
      
      Change-Id: Ib1e853f9a331b7b66639dc34d79568d84d1930f1
    • consistently name VP9_COMMON variables #1 · 924d7451
      James Zern authored
      pc -> cm
      
      Change-Id: If3e83404f574316fdd3b9aace2487b64efdb66f3
    • Fix overflow issue in SSSE3 32x32 quantization · abff6788
      Jingning Han authored
      The 32x32 quantization process can potentially produce intermediate
      values that exceed the 16-bit range, thereby causing an enc/dec
      mismatch. This commit fixes this overflow issue in the SSSE3
      implementation, as well as the prototype, of 32x32 quantization.
      
      This fixes issue 607 from webm@googlecode.
      
      Change-Id: I85635e6ca236b90c3dcfc40d449215c7b9caa806
    • Fixed potential overflows · aaa7b444
      Yaowu Xu authored
      The two arrays are typically initialized to INT64_MAX. If they are not
      filled with valid values before the addition, the values can overflow
      and lead to wrong results.
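
      A guarded accumulation along these lines avoids the problem. This is
      a sketch only: add_cost_checked is a hypothetical helper, it assumes
      non-negative costs, and the actual change in the commit may take a
      different form.

```c
#include <assert.h>
#include <stdint.h>

/* Adds `delta` to *acc. If either value is still the INT64_MAX "unset"
 * sentinel, or the sum would exceed INT64_MAX, *acc is set to INT64_MAX
 * instead of overflowing. Assumes delta is non-negative. */
void add_cost_checked(int64_t *acc, int64_t delta) {
  assert(delta >= 0);
  if (*acc == INT64_MAX || delta == INT64_MAX) {
    *acc = INT64_MAX;
    return;
  }
  if (*acc > INT64_MAX - delta) {
    *acc = INT64_MAX;
    return;
  }
  *acc += delta;
}
```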
      
      Change-Id: I515de22cf3e8f55af4b74bdb2c8eb821a02d3059
  9. 28 Aug, 2013 - 3 commits
    • General code cleanup. · b62ddd5f
      Dmitry Kovalev authored
      Switching from mi_{width, height}_log2 and b_{width, height}_log2 to
      num_8x8_blocks_{wide, high} and num_4x4_blocks_{wide, high}. Removing
      redundant code, adding const.
      
      Change-Id: Iaab2207590fd24d0b76999071778d1395dc5cd5d
    • Adds a speed feature for fast 1-loop forw updates · e02dc84c
      Deb Mukherjee authored
      Incorporates a speed feature for fast forward updates of
      coefficients. This feature takes 3 values:
      0 - use standard 2-loop version
      1 - use a 1-loop version
      2 - use a 1-loop version with reduced updates
      
      Results: derfraw300 +0.007% (on speed 0) at feature value = 1
                          -0.160% (on speed 0) at feature value = 2
      
      There is substantial speed up at speeds 2 and above for low
      resolution sequences where the entropy updates are a big part
      of the overall computations.
      
      Change-Id: Ie96fc50777088a5bd441288bca6111e43d03bcae
    • Renaming txfm_size to tx_size. · 851a2fd7
      Dmitry Kovalev authored
      Change-Id: I752e374867d459960995b24d197301d65ad535e3
  10. 27 Aug, 2013 - 4 commits
  11. 26 Aug, 2013 - 3 commits
  12. 24 Aug, 2013 - 2 commits
  13. 23 Aug, 2013 - 6 commits
    • Limit mv range to be based on partition size · 13930cf5
      Yaowu Xu authored
      Previous change c4048dbd limits the mv search range assuming a max block
      size of 64x64. This commit changes the search range to use the actual
      block size instead.
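
      The effect of the block size on the range can be seen in a simplified
      model. The sketch below is not the encoder's formula; MvRange,
      mv_col_range and the border parameter are hypothetical, and
      displacements are in whole pixels. It only requires the reference
      block to stay inside the frame plus a border.

```c
#include <assert.h>

typedef struct {
  int min; /* most negative horizontal displacement allowed */
  int max; /* most positive horizontal displacement allowed */
} MvRange;

/* Range for a block of width block_w whose left edge is at column x, such
 * that the displaced block lies within [-border, frame_w + border). */
MvRange mv_col_range(int x, int block_w, int frame_w, int border) {
  MvRange r;
  assert(block_w > 0 && block_w <= frame_w && border >= 0);
  r.min = -(x + border);
  r.max = frame_w + border - block_w - x;
  return r;
}
```

      In this model, a 16-pixel-wide block at x = 64 in a 128-pixel-wide
      frame with a 16-pixel border may move up to 64 pixels to the right,
      whereas assuming a 64-pixel block would cap it at 16.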
      
      Change-Id: Ibe07ab02b62bf64bd9f8675d2b997af20a2c7e11
    • Cleanup in mvref_common.{h, c}. · 21d8e859
      Dmitry Kovalev authored
      Making code more compact, adding consts, removing redundant arguments,
      adding do/while(0) for macros.
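
      For readers unfamiliar with the do/while(0) idiom: it makes a
      multi-statement macro behave like a single statement, so it is safe
      in an unbraced if/else. The macro and function below are illustrative
      and are not taken from mvref_common.

```c
#include <assert.h>
#include <stddef.h>

/* Multi-statement macro wrapped in do/while(0) so that it expands to
 * exactly one statement and needs the trailing semicolon. */
#define SWAP_INT(a, b) \
  do {                 \
    int tmp_ = (a);    \
    (a) = (b);         \
    (b) = tmp_;        \
  } while (0)

/* Puts the smaller value in *lo and the larger in *hi. With a bare
 * { ... } macro body, the `else` below would be a syntax error. */
void order_pair(int *lo, int *hi) {
  assert(lo != NULL && hi != NULL);
  if (*lo > *hi)
    SWAP_INT(*lo, *hi);
  else
    (void)0; /* already ordered */
}
```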
      
      Change-Id: Ic9ec0bc58cee0910a5450b7fb8cfbf35fa9d0d16
    • Added border extension · 656632b7
      Yaowu Xu authored
      Adds border extension to the source buffer to be encoded as an alt ref
      frame. This fixes the use of uninitialized memory in the encoder.
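
      Border extension generally means replicating edge pixels into the
      padding around a plane, so that reads slightly outside the image see
      defined values. The sketch below shows the general technique for one
      8-bit plane; the function name and buffer layout are assumptions and
      do not reproduce the libvpx routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* `img` points at the top-left visible pixel of a w x h plane stored inside
 * a larger buffer with at least `border` bytes of padding on every side;
 * `stride` is the distance in bytes between rows (>= w + 2 * border). */
void extend_plane_border(uint8_t *img, int stride, int w, int h, int border) {
  assert(w > 0 && h > 0 && border >= 0 && stride >= w + 2 * border);

  /* Replicate the first and last pixel of each row sideways. */
  for (int r = 0; r < h; ++r) {
    uint8_t *row = img + (ptrdiff_t)r * stride;
    memset(row - border, row[0], (size_t)border);
    memset(row + w, row[w - 1], (size_t)border);
  }

  /* Copy the already widened top and bottom rows outward. */
  const size_t full = (size_t)(w + 2 * border);
  uint8_t *top = img - border;
  uint8_t *bottom = img + (ptrdiff_t)(h - 1) * stride - border;
  for (int r = 1; r <= border; ++r) {
    memcpy(top - (ptrdiff_t)r * stride, top, full);
    memcpy(bottom + (ptrdiff_t)r * stride, bottom, full);
  }
}
```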
      
      See https://code.google.com/p/webm/issues/detail?id=605
      
      Change-Id: I97618a2fc207e08abcf5301b734aa9e3ad695e2c
    • Changes to adaptive inter rd thresholds. · aa5b67ad
      Paul Wilkins authored
      Values are now carried over from frame to frame.
      Changed the algorithm for decreasing the threshold after
      a hit, and the max threshold (now based on speed).
      
      Removed some old commented out code relating to
      VP8 adaptive thresholds.
      
      The impact of these changes tested on Akiyo (50 frames)
      and measured in terms of unit rd hits is as follows:
      
      Speed 0 84.36 -> 84.67
      Speed 1 29.48 -> 22.22
      Speed 2 11.76 -> 8.21
      Speed 3 12.32 -> 7.21
      
      Encode speed impact is broadly in line with these.
      
      Change-Id: I5b886efee3077a11553fa950d796fd6d00c8cb19
    • Limit Key frame Intra modes checks. · f76f52df
      Paul Wilkins authored
      Most of the focus so far has been on inter frames.
      
      At high speed settings the key frame is now taking a high %
      of the cycles.
      
      This patch puts in some masking to reduce the number
      of INTRA modes searched during key frame coding (as already
      happens for inter frames) at higher speed settings.
      
      TODO: Develop this further with either adaptive rd thresholds
      when choosing which intra modes to consider or some other
      heuristic.
      
      Impact.
      At high speed settings on some clips the key frame was starting
      to dominate. In a coding of the first 50 frames of AKIYO at speed
      2 limiting the key frame intra modes to DC or TM_PRED resulted in
      ~30% overall speedup. For Bus the number was lower at ~4-5%.
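
      Mode masking of this kind can be sketched with a bit mask that the
      search loop consults before each rd test. The enum below is a
      hypothetical subset of the intra modes, and using set bits to mean
      "allowed" is an assumption; only the DC_PRED and TM_PRED names come
      from the text above.

```c
#include <assert.h>

/* Hypothetical subset of intra prediction modes, for illustration only. */
enum { DC_PRED, V_PRED, H_PRED, D45_PRED, TM_PRED, INTRA_MODE_COUNT };

/* Bit i set means mode i may be searched. */
#define ALL_INTRA_MASK ((1u << INTRA_MODE_COUNT) - 1u)
#define KEY_FRAME_FAST_MASK ((1u << DC_PRED) | (1u << TM_PRED))

/* Returns how many modes the search loop would evaluate under `mask`. */
int count_modes_searched(unsigned mask) {
  int searched = 0;
  assert((mask & ~ALL_INTRA_MASK) == 0); /* no bits for unknown modes */
  for (int mode = 0; mode < INTRA_MODE_COUNT; ++mode) {
    if (!(mask & (1u << mode))) continue; /* masked out: skip this mode */
    ++searched; /* a real encoder would run the rd test here */
  }
  return searched;
}
```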
      
      Change-Id: I7bde68aee04995f9d9beb13a1902143112e341e2
    • Fix rectangular partition check flag · 84f3b76e
      Jingning Han authored
      Place the update of the rectangular partition check flag, which depends
      on the rd costs of the NONE and SPLIT partition types, under the speed
      feature.
      
      Change-Id: If681e1e078a8d43d86961ea4b748da5cd1b6c331
  14. 22 Aug, 2013 - 8 commits
    • vp9_encodeframe.c cleanup. · 604022d4
      Dmitry Kovalev authored
      Removing unused get_sbuv_perpixel_variance function, using has_second_ref/
      is_inter_block functions, organizing includes.
      
      Change-Id: I016de4af12fbbb8b4ece26a70759b2392651b095
    • check_bsize_coverage cleanup. · 335b1d36
      Dmitry Kovalev authored
      Change-Id: Ib7803857b35c00e317c9deb8630e777e25eb278f
    • Checking scale factors on access. · 3c426572
      Dmitry Kovalev authored
      It is possible to have invalid scale factors and not access them
      during decoding. Error is reported if we really try to use invalid scale
      factors.
      
      Change-Id: Ie532d3ea7325ee0c7a6ada08269f804350c80fdf
    • rename LOG2_* defines to *_LOG2 · 40ae02c2
      James Zern authored
      gets rid of a mix of styles
      
      Change-Id: I3591d312157bc6f53a25438bf047765c671fd8a8
    • vp9/encoder: fix last_frame_seg_map mem leak · a5726ac4
      James Zern authored
      remove duplicate allocation from vp9_create_compressor, it was added to
      vp9_alloc_frame_buffers in:
      
      d5bec522 Added resizing & initialization of last frame segment map
      
      Change-Id: I996723226a16a62aff8f9a52ac74e0b73cc98fdf
    • Adding vp9_is_scaled function. · 640dea4d
      Dmitry Kovalev authored
      Change-Id: Ieb7077ca3586b9491912027eed450a4f6fd38d30
    • Refactor rd_pick_partition for parameter control · 01a37177
      Jingning Han authored
      This commit changes the partition search order of superblocks from
      {SPLIT, NONE, HORZ, VERT} to {NONE, SPLIT, HORZ, VERT} for
      consistency with that of sub8x8 partition search. It enables the use
      of early termination in partition search for all block sizes.

      For ped_area_1080p, 50 frames coded at 4000 kbps, the runtime
      goes down from 844305 ms to 818003 ms (a 3% speed-up) at speed 0.
      
      This will further move towards making the in-search partition types
      configurable, hence unifying various speed-up approaches.
      
      Some speed 1 and 2 features are turned off during the refactoring
      process, including:
      disable_split_var_thresh
      using_small_partition_info
      
      Stricter constraints are applied to use_square_partition_only for
      right/bottom boundary blocks. Will bring back/refine these features
      subsequently. At this point, it makes derf set at speed 1 about
      0.45% higher in compression performance, and 9% down in run-time.
      
      Change-Id: I3db9f9d1d1a0d6cbe2e50e49bd9eda1cf705f37c
    • Fixes on feature disabling split based on variance · 8b810c7a
      Deb Mukherjee authored
      Adds a couple of minor fixes, which may be absorbed in Jingning's
      patch. Thanks to Guillaume for pointing these out.
      Also adjusts the thresholds for speed 1 and 2 to 16 and 32
      respectively, to keep quality drops small.
      
      Results:
      --------
      derfraw300:  threshold = 16, psnr -0.082%, speedup 2-3%
                   threshold = 32, psnr -0.218%, speedup 5-6%
      stdhdraw250: threshold = 16, psnr -0.031%, speedup 2-3%
                   threshold = 32, psnr -0.273%, speedup 5-6%
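
      How such a threshold might gate the split search is sketched below.
      The thresholds 16 and 32 for speeds 1 and 2 come from the text above;
      the function names, the "feature off" value of 0 for other speeds,
      and the rule that a block with variance below the threshold skips the
      SPLIT search are assumptions.

```c
#include <assert.h>

/* Variance threshold per speed level (0 disables the feature). */
unsigned split_var_thresh(int speed) {
  assert(speed >= 0);
  switch (speed) {
    case 1: return 16;
    case 2: return 32;
    default: return 0;
  }
}

/* Returns 1 if the SPLIT partition should still be searched for a block
 * with the given variance, 0 if it is skipped (assumed rule). */
int split_search_enabled(unsigned variance, int speed) {
  return variance >= split_var_thresh(speed);
}
```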
      
      Change-Id: I4b11ae8296cca6c2a9f644be7e40de7c423b8330