1. 29 Nov, 2017 1 commit
  2. 27 Nov, 2017 1 commit
    • vp9-svc: Fix to the layer buffer settings. · cbe62b9c
      Marco authored
      For the case when the number of temporal layers > 1,
      the buffer levels (starting/optimal_buffer_level,
      and maximum_buffer_size) were not scaled properly.
      
      In vp9_update_layer_context_change_config():
      when setting the layer-buffer levels, fix is to scale
      the layer-target_bandwidth by the target_bandwidth
      (which is the full stream bandwidth) instead of the
      spatial_layer_target.
      
      This is needed because prior to the call
      vp9_update_layer_context_change_config(), set_rc_buffer_sizes()
      is called which sets the buffer levels based on target bandwidth
      (which is the full bandwidth for the SVC stream).
      
      This fix properly sets the layer-buffer levels based on the
      layer-bandwidth, and leads to better rate targeting.
      
      Small/neutral change in avgPSNR/SSIM metrics on RTC set.
      
      Change-Id: Ic0f4f7f3487c37b9a9adb4781ae5edfed7140a57
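The scaling described above can be sketched as follows. This is a minimal illustration, not the libvpx code: the `BufferLevels` struct and `scale_layer_buffers()` are hypothetical names, and the only point shown is that each layer's buffer levels are the full-stream levels multiplied by the ratio of the layer bandwidth to the full stream bandwidth.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the encoder's buffer state. */
typedef struct {
  int64_t starting_buffer_level;
  int64_t optimal_buffer_level;
  int64_t maximum_buffer_size;
} BufferLevels;

/* Scale buffer levels that were computed for the full SVC stream down to
 * one layer's share.
 *   layer_bw:  target bandwidth of this layer
 *   stream_bw: target bandwidth of the full stream (must be > 0) */
BufferLevels scale_layer_buffers(BufferLevels full, int64_t layer_bw,
                                 int64_t stream_bw) {
  const double ratio = (double)layer_bw / (double)stream_bw;
  BufferLevels out;
  out.starting_buffer_level = (int64_t)(full.starting_buffer_level * ratio);
  out.optimal_buffer_level = (int64_t)(full.optimal_buffer_level * ratio);
  out.maximum_buffer_size = (int64_t)(full.maximum_buffer_size * ratio);
  return out;
}
```

Dividing by the spatial-layer target instead of the full stream bandwidth would give a ratio that is too large whenever the spatial layer carries only part of the stream, which is the mis-scaling the commit message describes.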
  3. 21 Nov, 2017 1 commit
  4. 17 Nov, 2017 1 commit
    • vp9-svc: Enable scale partition for reference frames. · 559166ac
      Marco authored
      For reference frames: enable scale partition for
      superblocks with low source sad or if bsize on lower-resoln
      is at least 32x32.
      
      Keep feature disabled for base temporal layer.
      
      Small regression in avgPSNR/SSIM metrics, ~0.5-1%.
      Speedup ~2-3% on mac for SVC (3 spatial/3 temporal layers) at speed 7.
      
      Change-Id: I5987eb7763845b680059128b538bb5188be0cca5
  5. 16 Nov, 2017 2 commits
    • Disable allow_partition_search_skip for speed 2. · 44473e7e
      paulwilkins authored
      When allow_partition_search_skip is set the two pass code
      can optionally skip the partition search in the rd loop if the image
      appears static (based on selection of 0,0 motion).
      
      Unfortunately 0,0 motion does not necessarily mean that there are
      no meaningful changes or that motion or intra modes will not be selected
      in the second pass.
      
      Disabling "allow_partition_search_skip" may hurt the encode speed a little
      for a small number of clips but can have a big impact on compression.
      The most notable example of this in our test sets is "bridge_close_cif"
      where this change gives gains of 18%, 12% and 16% in opsnr, ssim and
      psnr-hvs.
      
      Change-Id: I765e288b5c0cd82bce00a148e7653a21e9203024
    • vp9 svc: Rework/fix scale partitioning on boundary. · 1aea1675
      Jerome Jiang authored
      Enable partition copy on boundary and scale blocks along the boundary.
      Rename copy_partition_svc to scale_partition_svc.
      
      Do not copy if the block crosses the boundary.
      
      Change-Id: I37a04d48f11b15c4ea67facd7631193ec2f62150
  6. 15 Nov, 2017 4 commits
    • Code cleanup. · 05302360
      paulwilkins authored
      Removal of parameters to and code in calc_frame_boost() that is no
      longer required.
      
      No change to results from previous patch.
      
      Change-Id: Ic92da35613fdc247d22fddf24d09679fc5329017
    • Remove decay_accumulator clause from alt ref breakout. · 03c1a827
      paulwilkins authored
      The decay accumulator clause covers similar ground to the
      new clause that tests the accumulated second reference error
      so it has been removed to reduce complexity.
      
      Change-Id: I4ec1cce32d72bd4ee463ad7def2831a68447d525
    • Add clause to alt ref group breakout. · 607e45f4
      paulwilkins authored
      Add a clause to the breakout test for alt ref groups that
      examines the size of the accumulated second reference
      frame error compared to the cost of intra coding.
      
      This clause causes a reduction in the average group length for many
      clips. Alongside the change to the group length the minimum
      boost is increased.
      
      On balance the results are positive for psnr and psnr-hvs
      but negative for ssim/fast ssim for the smaller image formats.
      
      Strong gains on some harder clips (e.g. ducks take off (midres) ~20%,
      husky (lowres) 6-17%). Most of the negative cases are lower motion
      clips. A subsequent patch will hopefully help with those.
      
      Change-Id: Ic1f5dbb9153d5089e58b1540470e799f91a65dc4
    • vp9-svc: Fix flag for usage of reuse-lowres partition · b3c93d60
      Marco authored
      Fix/cleanup the conditioning for usage of the reuse-lowres
      partition feature.
      
      Replace the non-reference condition with the top temporal
      layer, and put this condition in the speed feature.
      
      This prevents doing update_partition_svc() on every
      VGA frame; instead, the update is now only done for VGA in
      the top temporal layer frames.
      
      Also this makes it easier to test/enable this feature
      for lower layer temporal frames.
      
      Change-Id: Ia897afbc6fe5c84c5693e310bcaa6a87ce017be5
  7. 13 Nov, 2017 2 commits
    • New content type to improve grain retention. · a73cee28
      paulwilkins authored
      For the new VP9-only content type, adjust the rate distortion and ARF
      filter based on the relative spatial variance of the source and
      reconstruction.
      
      In regards to the RD loop the method favors modes where the
      reconstruction variance is similar to the source variance. However it
      is currently only applied to regions where the source variance is quite
      low.
      
      For very low variance blocks it applies a further bias against intra
      coding and large prediction block sizes (the latter in particular limit
      the usefulness of the loop filter).
      
      The final part of this change is to lower the strength of the ARF
      filter for blocks where the source has very low spatial variance, to
      encourage some low amplitude texture or noise to pass through
      the filter.
      
      This change improves the retention of film grain and fine noise /
      texture in spatially flat regions, but causes a significant
      drop in PSNR on many clips. This is to be expected because similar
      but misaligned noise or texture will give a lower PSNR than a flat
      noise free reconstruction. However, it is worth noting that most clips
      show a strong gain in FAST SSIM.
      
      The features are enabled on the vpxenc command line by setting
      --tune-content=film.
      
      VPX_ENCODER_ABI_VERSION bumped for this change and cvbr.
      
      Change-Id: I26a4e4edfa3dc5cacead82fa701fe7a9118ccd0a
    • Small parameter clean up. · 55fc4d95
      paulwilkins authored
      Removed three parameters that are no longer needed in calls
      to calc_arf_boost() and associated minor changes.
      
      No impact on encode results.
      
      Change-Id: Ieaf31d0d2e1990b99cf69647170145a1bbfbb9fb
  8. 10 Nov, 2017 1 commit
    • vp9-svc: Avoid minmax variance for non-reference frames. · 6c0011a2
      Marco authored
      For choose_partitioning (speed >= 6): avoid computation
      of minmax variance for non-reference frames in SVC.
      
      Existing condition only avoided this for speed >= 8.
      Combine that existing logic with non-reference condition.
      
      Small speedup (~0.5-1%) for 3 layer SVC,
      neutral change on avgPSNR/SSIM metrics.
      
      Change-Id: I3e9f3a1af0647b15e475cf170d9402908d672ee5
  9. 09 Nov, 2017 3 commits
    • vp9: SVC feature to use partition from lower resolution. · fdb054a0
      Jerome Jiang authored
      For SVC with 3 spatial layers:
      Add feature to copy/upscale partition from middle spatial layer
      to the upper/highest resolution, when superblock sad is not high.
      
      Enabled for speed >= 7 and only for non-reference frames.
      
      Speedup ~3-4%, small loss in avgPSNR/SSIM of ~1%.
      
      Change-Id: I7f0a2716c0fde28bade0f86159d11b7e31d6ab8d
    • vpx: [x86] add vp9_block_error_fp_avx2() · 62ab5e99
      Scott LaVarnway authored
      SSE2 asm vs AVX2 intrinsics speed gains:
      blocksize   16: ~1.00
      blocksize   64: ~1.17
      blocksize  256: ~1.67
      blocksize 1024: ~1.81
      
      Change-Id: I2a86db239cf57e3ff617890ccb2d236aba83ad5e
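For context, a scalar reference for this kind of kernel might look like the sketch below. It assumes that the block error is the sum of squared differences between the original and dequantized coefficients; the commit message does not state this, and `block_error_fp_ref()` is a hypothetical name, not the libvpx function.

```c
#include <stdint.h>

/* Scalar sketch: sum of squared differences between two coefficient
 * arrays of length block_size. Assumed behaviour, for illustration only. */
int64_t block_error_fp_ref(const int16_t *coeff, const int16_t *dqcoeff,
                           int block_size) {
  int64_t error = 0;
  for (int i = 0; i < block_size; ++i) {
    const int diff = coeff[i] - dqcoeff[i];
    error += (int64_t)diff * diff; /* widen before multiply to avoid overflow */
  }
  return error;
}
```

A loop of this shape has a fixed cost per call and a per-element cost that vectorizes well, which is consistent with the gains above growing with block size.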
    • Fix to frames considered in arf boost calculation. · d6e29868
      paulwilkins authored
      For a chosen interval "i" the existing arf boost calculation examined frames
      +/- (i-1) frames from the current location in the second pass.
      
      This change checks to make sure that the forward search does not extend
      beyond the next key frame in the event that the distance to the next key
      frame is < (i - 1).
      
      Small metrics gains on all our test sets, but these are localized to a few clips
      (e.g. midres set psnr-hvs sintel -2.59% but overall average was only -0.185%).
      
      Change-Id: I26fc9ce582b6d58fa1113a238395e12ad3123cf6
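The bound on the forward search can be sketched as a simple clamp. The function name and the exact meaning of `frames_to_key` (whether it counts the key frame itself) are assumptions made for illustration; the actual second-pass code may differ.

```c
/* Number of frames the forward search may examine for a chosen interval.
 *   interval:      the chosen interval "i"
 *   frames_to_key: assumed distance, in frames, to the next key frame */
int arf_forward_search_frames(int interval, int frames_to_key) {
  int span = interval - 1;
  if (span < 0) span = 0;
  /* Do not look past the next key frame. */
  if (frames_to_key < span) span = frames_to_key;
  return span;
}
```

With an interval of 16 and the next key frame 5 frames away, this limits the forward search to 5 frames instead of 15.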
  10. 08 Nov, 2017 1 commit
    • CVBR command line option. · 93e83fd7
      paulwilkins authored
      Added command line control of Corpus VBR.
      
      The new corpus vbr mode is a variant of standard
      VBR (end-usage=0) where the complexity distribution
      mid point is passed in rather than calculated for a specific
      clip or chunk.
      
      The new variant is enabled by setting a new command line
      parameter --corpus-complexity to a non-zero value. Omitting
      this parameter or setting it to 0 will cause the codec to use
      standard vbr mode.
      
      The correct value for a given corpus needs to be derived
      experimentally using a training set such that the average
      rate for the corpus is close to the target value.
      
      For example, using our low res test set with upper and lower
      vbr limits of 50%-150% and a corpus complexity value of 650
      gives a similar average data rate across the set to using standard
      vbr. However, with the corpus mode easier clips will be allocated
      fewer bits and harder clips more bits rather than having the same
      rate target for all.
      
      Change-Id: I03f0fc8c6fb0ee32dc03720fea6a3f1949118589
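The selection between the two modes can be sketched as below. `vbr_midpoint()` and `clip_mean_error` are hypothetical names; the sketch only shows that a non-zero corpus complexity replaces the per-clip value as the distribution mid point, while 0 falls back to standard VBR behaviour.

```c
/* Pick the mid point of the complexity distribution.
 *   corpus_complexity: value passed on the command line, 0 if unset
 *   clip_mean_error:   mid point derived from the clip itself
 *                      (hypothetical input for this sketch) */
double vbr_midpoint(int corpus_complexity, double clip_mean_error) {
  return corpus_complexity > 0 ? (double)corpus_complexity : clip_mean_error;
}
```

Because the corpus value is fixed across clips, a clip that is easier than the corpus mid point ends up with fewer bits and a harder clip with more, rather than each clip being normalized against itself.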
  11. 06 Nov, 2017 1 commit
    • Nonrd_pickmode: avoid computing UV cost when early_term is set. · 6fbc354c
      Marco authored
      For nonrd_pickmode: if early_term is set there should be
      no need to include UV in rdcost (when color_sensitivity is set).
      
      Neutral change on RTC and RTC_derf metrics, for speed >= 5.
      No change for ytlive metrics.
      
      Very small speed gain (~0.5%) on some clips with strong color content.
      
      Change-Id: Ifc00928ecd935fc71e94935ceef0ae7481249f07
  12. 03 Nov, 2017 1 commit
    • Compound prediction mode for nonrd pickmode. · eb7d431c
      Marco authored
      Allow for compound prediction mode in nonrd_pickmode for ZEROMV.
      For real-time encoding, 1 pass with non-zero lag-in-frames.
      
      Added speed feature to control the feature.
      Enabled for speed >=6 for now, under VBR mode.
      
      avgPSNR/SSIM metrics positive on ytlive set, for speed 6:
      some clips up by ~3-5%, some clips neutral gain, average gain
      across clips is ~1%.
      
      Small/negligible decrease in speed.
      
      Change-Id: I7a60c7596e69b9a928410c5ee2f9141eecd8613d
  13. 01 Nov, 2017 1 commit
  14. 31 Oct, 2017 1 commit
  15. 30 Oct, 2017 1 commit
    • vp9: Reduce stack usage of choose_partitioning. · cc472311
      Jerome Jiang authored
      Change type of sum_square_error from int64_t to uint32_t.
      Change type of sum_error from int64_t to int32_t.
      
      This reduces the stack usage from ~131K to ~87K.
      
      BUG=b/68362457
      
      Change-Id: I147d7c7b226bceb4f0817bb86848e1fa9d9ac149
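The effect of narrowing the two fields can be seen with a small size comparison. The two structs below are illustrative only: the `log2_count` and `variance` members are assumed companions added for this sketch, and the real structure used by the partitioning code may be laid out differently.

```c
#include <stdint.h>

/* Before: both accumulators are 64-bit. */
typedef struct {
  int64_t sum_square_error;
  int64_t sum_error;
  int log2_count; /* assumed extra field */
  int variance;   /* assumed extra field */
} VarWide;

/* After: accumulators narrowed to 32-bit. */
typedef struct {
  uint32_t sum_square_error;
  int32_t sum_error;
  int log2_count; /* assumed extra field */
  int variance;   /* assumed extra field */
} VarNarrow;
```

On common platforms with a 32-bit `int`, `sizeof(VarWide)` is 24 bytes and `sizeof(VarNarrow)` is 16 bytes. When many such elements are held in stack-allocated trees, a per-element saving of this kind adds up to a reduction on the order of the one reported above.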
  16. 23 Oct, 2017 1 commit
    • vp9-svc: Allow for adapt_rd_thresh with row-mt. · 0738d901
      Marco authored
      Set adaptive_row_thresh_mt = 1 at speed >= 7,
      for svc when multi-threading is used with row-mt.
      This allows the adaptive_rd_thresh feature to be used
      in the nonrd-pickmode.
      
      ~1-2% speedup for SVC encoding with small quality
      loss (< 0.6%) on RTC set.
      
      Change-Id: Iab9878dff117bccdaef3e4d0645165db9808cdfc
  17. 16 Oct, 2017 1 commit
    • Add 4 to 3 scaling SSSE3 optimization · 580d3224
      Linfeng Zhang authored
      Note that this change will trigger a different C version on SSSE3 and
      generate different scaled output.
      
      Its speed is 2x compared with the version calling vpx_scaled_2d_ssse3().
      
      Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
  18. 13 Oct, 2017 2 commits
    • Adjust threshold in gf_boost for 1 pass vbr · a9248457
      Marco authored
      Small increase in sad_thresh1, which avoids some false
      detection of possible scene changes within lag.
      
      Small improvement in few clips on ytlive, otherwise neutral change.
      
      Change-Id: Ia79b53bb657bbce65a7aac7d20666b6373d5af8b
    • Corpus VBR tweak for undershoot. · 8842ee0b
      paulwilkins authored
      In cases of strong undershoot adjust Q range down faster.
      
      Change-Id: I84982beceb3c9b6dc50e52e4a6e891c7dd395d03
  19. 12 Oct, 2017 3 commits
  20. 11 Oct, 2017 1 commit
    • Prevent double application of min rate in two pass. · 416b7051
      paulwilkins authored
      The initial allocation of bits in the two pass code to each frame
      should be within the min max limits on the command line. However,
      when forming an ARF group the cost of the ARF is shared by frames
      in that group such that the residual bits for a frame could drop below
      the min value. This change prevents the minimum being re-applied
      after the cost of the ARF has been deducted as this may otherwise
      cause low rate sections to overshoot their target.
      
      Test runs comparing to a baseline run with min and max section pct
      0-2000% vs one closer to the YT use case (50-150%) suggest that
      this fix not only results in better rate control but also gives a better
      rd outcome.
      
      For example, the HD set vs 0-2000% baseline (opsnr, ssim):
      Old code (50-150): +0.751, +1.099
      New code (50-150): +0.241, -0.009
      
      Change-Id: I715da7b130bf53ba8aa609532aa9e18b84f5e2ef
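The ordering issue can be sketched as follows. All names here are hypothetical and the real two pass code is more involved; the sketch only shows the min/max limits being applied once to the initial allocation, with no second clamp after the ARF cost has been deducted.

```c
/* Clamp v into [lo, hi]. */
int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Per-frame target after sharing the ARF cost within a group.
 *   initial_bits: first-cut allocation for the frame
 *   arf_share:    this frame's share of the ARF cost
 *   min_bits, max_bits: limits derived from the command line settings */
int frame_target_after_arf(int initial_bits, int arf_share, int min_bits,
                           int max_bits) {
  /* Limits are applied once, to the initial allocation. */
  int target = clamp_int(initial_bits, min_bits, max_bits);
  /* Deduct the ARF share; the result may fall below min_bits and is
   * deliberately not clamped to the minimum again. */
  target -= arf_share;
  return target < 0 ? 0 : target;
}
```

Re-applying the minimum after the deduction would hand back the bits that were just moved to the ARF, so the group as a whole would spend more than its budget.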
  21. 10 Oct, 2017 4 commits
    • Add 4 to 1 scaling x86 optimization · 16166bfd
      Linfeng Zhang authored
      Change-Id: I51c190f0a88685867df36912522e67bdae58a673
    • Adjustment to scene detection and key frame. · 017257a3
      Marco authored
      For 1 pass vbr: use higher threshold on avg_sad
      and force key frame under scene cut detection if
      above the threshold. Allow it for speed >= 6 for now,
      since it does not use the full nonrd_pickmode partition
      (as in speed 5).
      
      Improves quality somewhat on scene cut frames.
      Neutral on overall metrics and fps for speed 6 on
      ytlive set.
      
      Change-Id: I12626f7627419ca14f9d0d249df86c7104438162
    • Further Corpus VBR change. · 06d231c9
      paulwilkins authored
      Change to the bit allocation within a GF/ARF group.
      
      Normal VBR and CQ mode allocate bits to a GF/ARF group based on the mean
      complexity score of the frames in that group but then share bits evenly between
      the "normal" frames in that group regardless of the individual frame complexity
      scores (with the exception of the middle and last frames).
      
      This patch alters the behavior for the experimental "Corpus VBR" mode such that
      the allocation is always based on the individual complexity scores.
      
      Change-Id: I5045a143eadeb452302886cc5ccffd0906b75708
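The two allocation strategies can be contrasted with a short sketch. Both functions are hypothetical simplifications: they ignore the special handling of the middle and last frames mentioned above and only show even sharing versus sharing in proportion to each frame's complexity score.

```c
#include <stddef.h>
#include <stdint.h>

/* Even sharing: every normal frame in the group gets the same budget. */
void allocate_even(int64_t group_bits, size_t n, int64_t *out) {
  for (size_t i = 0; i < n; ++i) out[i] = group_bits / (int64_t)n;
}

/* Proportional sharing: each frame's budget follows its own score. */
void allocate_by_complexity(int64_t group_bits, const double *score, size_t n,
                            int64_t *out) {
  double total = 0.0;
  for (size_t i = 0; i < n; ++i) total += score[i];
  for (size_t i = 0; i < n; ++i) {
    out[i] = total > 0.0 ? (int64_t)(group_bits * (score[i] / total))
                         : group_bits / (int64_t)n; /* fall back to even */
  }
}
```

For scores of 1, 3 and 4 and a group budget of 8000 bits, the proportional version gives 1000, 3000 and 4000 bits, whereas even sharing would give every frame the same amount.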
    • Corpus Wide VBR test implementation. · 741bd6df
      paulwilkins authored
      This patch makes further changes to support an experimental
      corpus wide VBR mode that uses a corpus complexity
      number as the midpoint of the distribution used to allocate bits
      within a clip, rather than some average error score derived from the
      clip itself.
      
      At the moment the midpoint number is hard wired for testing and
      the mode is enabled or disabled through an #ifdef. Ultimately this
      would need to be controlled by command line parameters.
      
      Change-Id: I9383b76ac9fc646eb35a5d2c5b7d8bc645bfa873
  22. 09 Oct, 2017 1 commit
  23. 06 Oct, 2017 3 commits
    • Revert "Speed >=5 real-time: add TM intra mode for high_source_sad." · bcbc6ed8
      Marco Paniconi authored
      This reverts commit 9311ef18.
      
      Reason for revert:
      Noticed a small regression in some clips.
      Will revisit in another change.
      
      Original change's description:
      > Speed >=5 real-time: add TM intra mode for high_source_sad.
      > 
      > Small/neutral change in metrics or speed for ytlive.
      > Some improvement in quality on frames with big content change.
      > 
      > Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d
      
      TBR=marpan@google.com,builds@webmproject.org,jianj@google.com
      
      Change-Id: I9d8ec5195bb05ddf329d325699355185affb9b13
      No-Presubmit: true
      No-Tree-Checks: true
      No-Try: true
    • Adjust threshold in scene detection · e405eb06
      Marco authored
      For 1 pass vbr: increase min_thresh slightly, and also add
      condition on golden/arf update for using full nonrd_pick_partition.
      
      Reduces possible false detection for scene cut detection.
      
      Neutral/small change in metrics or speed for speed 5.
      
      Change-Id: I388f4d9a56e3cc763e0148338c1bc0381e58ad76
    • Speed >=5 real-time: add TM intra mode for high_source_sad. · 9311ef18
      Marco authored
      Small/neutral change in metrics or speed for ytlive.
      Some improvement in quality on frames with big content change.
      
      Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d
  24. 05 Oct, 2017 1 commit
    • Adjust threshold for adapt_partition for speed 6. · 18262a85
      Marco authored
      Lower SAD threshold to select non_rd pickmode partition
      at superblock level more often.
      Small gain in metrics, small/negligible decrease in speed.
      
      Change-Id: I0f728236b91a604e4ca7e02039adc54d5985c4dc
  25. 04 Oct, 2017 1 commit
    • Avoid nonrd_pick_partition for speed >= 6. · 4bc1fc58
      Marco authored
      For 1 pass vbr speed >= 6: when REFERENCE_PARTITION is selected,
      avoid doing the full nonrd_pickmode based partition.
      No change in overall metrics or speed.
      Reduces encode times on scene cuts by 10-20%.
      
      Change-Id: I0310b1610cc1c83793a509e0a9059840e8f18308