1. 16 Nov, 2017 1 commit
    • Disable allow_partition_search_skip for speed 2. · 44473e7e
      paulwilkins authored
      When allow_partition_search_skip is set, the two pass code
      can optionally skip the partition search in the rd loop if the image
      appears static (based on selection of 0,0 motion).
      Unfortunately 0,0 motion does not necessarily mean that there are
      no meaningful changes or that motion or intra modes will not be selected
      in the second pass.
      Disabling "allow_partition_search_skip" may hurt the encode speed a little
      for a small number of clips but can have a big impact on compression.
      The most notable example of this in our test sets is "bridge_close_cif"
      where this change gives gains of 18%, 12% and 16% in opsnr, ssim and
      Change-Id: I765e288b5c0cd82bce00a148e7653a21e9203024
  2. 15 Nov, 2017 3 commits
    • Code cleanup. · 05302360
      paulwilkins authored
      Removal of parameters to and code in calc_frame_boost() that is no
      longer required.
      No change to results from previous patch.
      Change-Id: Ic92da35613fdc247d22fddf24d09679fc5329017
    • Remove decay_accumulator clause from alt ref breakout. · 03c1a827
      paulwilkins authored
      The decay accumulator clause covers similar ground to the
      new clause that tests the accumulated second reference error
      so it has been removed to reduce complexity.
      Change-Id: I4ec1cce32d72bd4ee463ad7def2831a68447d525
    • Add clause to alt ref group breakout. · 607e45f4
      paulwilkins authored
      Add a clause to the breakout test for alt ref groups that
      examines the size of the accumulated second reference
      frame error compared to the cost of intra coding.
      This clause causes a reduction in the average group length for many
      clips. Alongside the change to the group length the minimum
      boost is increased.
      On balance the results are positive for psnr and psnr-hvs
      but negative for ssim/fast ssim for the smaller image formats.
      Strong gains on some harder clips (e.g. ducks take off (midres) ~20%,
      husky (lowres) 6-17%). Most of the negative cases are lower motion
      clips. A subsequent patch will hopefully help with those.
      Change-Id: Ic1f5dbb9153d5089e58b1540470e799f91a65dc4
  3. 13 Nov, 2017 2 commits
    • New content type to improve grain retention. · a73cee28
      paulwilkins authored
      For a new VP9-only content type, adjust the rate distortion and ARF
      filter based on the relative spatial variance of the source and
      With regard to the RD loop, the method favors modes where the
      reconstruction variance is similar to the source variance. However it
      is currently only applied to regions where the source variance is quite
      For very low variance blocks it applies a further bias against intra
      coding and large prediction block sizes (the latter in particular limit
      the usefulness of the loop filter).
      The final part of this change is to lower the strength of the ARF
      filter for blocks where the source has very low spatial variance, to
      encourage some low amplitude texture or noise to pass through
      the filter.
      This change improves the retention of film grain and fine noise /
      texture in spatially flat regions, but causes a significant
      drop in PSNR on many clips. This is to be expected because similar
      but misaligned noise or texture will give a lower PSNR than a flat
      noise-free reconstruction. However, it is worth noting that most clips
      show a strong gain in FAST SSIM.
      The features are enabled on the vpxenc command line by setting
      VPX_ENCODER_ABI_VERSION bumped for this change and cvbr.
      Change-Id: I26a4e4edfa3dc5cacead82fa701fe7a9118ccd0a
    • Small parameter clean up. · 55fc4d95
      paulwilkins authored
      Removed three parameters that are no longer needed in calls
      to calc_arf_boost() and associated minor changes.
      No impact on encode results.
      Change-Id: Ieaf31d0d2e1990b99cf69647170145a1bbfbb9fb
  4. 10 Nov, 2017 1 commit
    • vp9-svc: Avoid minmax variance for non-reference frames. · 6c0011a2
      Marco authored
      For choose_partitioning (speed >= 6): avoid computation
      of minmax variance for non-reference frames in SVC.
      Existing condition only avoided this for speed >= 8.
      Combine that existing logic with non-reference condition.
      Small speedup (~0.5-1%) for 3 layer SVC,
      neutral change on avgPSNR/SSIM metrics.
      Change-Id: I3e9f3a1af0647b15e475cf170d9402908d672ee5
  5. 09 Nov, 2017 3 commits
    • vp9: SVC feature to use partition from lower resolution. · fdb054a0
      Jerome Jiang authored
      For SVC with 3 spatial layers:
      Add feature to copy/upscale partition from middle spatial layer
      to the upper/highest resolution, when superblock sad is not high.
      Enabled for speed >= 7 and only for non-reference frames.
      Speedup ~3-4%, small loss in avgPSNR/SSIM of ~1%.
      Change-Id: I7f0a2716c0fde28bade0f86159d11b7e31d6ab8d
    • vpx: [x86] add vp9_block_error_fp_avx2() · 62ab5e99
      Scott LaVarnway authored
      SSE2 asm vs AVX2 intrinsics speed gains:
      blocksize   16: ~1.00
      blocksize   64: ~1.17
      blocksize  256: ~1.67
      blocksize 1024: ~1.81
      Change-Id: I2a86db239cf57e3ff617890ccb2d236aba83ad5e
    • Fix to frames considered in arf boost calculation. · d6e29868
      paulwilkins authored
      For a chosen interval "i" the existing arf boost calculation examined frames
      +/- (i-1) frames from the current location in the second pass.
      This change checks to make sure that the forward search does not extend
      beyond the next key frame in the event that the distance to the next key
      frame is < (i - 1).
      Small metrics gains on all our test sets but these are localized to a few clips
      (e.g. midres set psnr-hvs sintel -2.59% but overall average was only -0.185%)
      Change-Id: I26fc9ce582b6d58fa1113a238395e12ad3123cf6
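      The clamping described in this commit can be sketched roughly as follows.
      This is a minimal illustration, not the libvpx code: the helper name
      forward_search_frames() and the parameter frames_to_key are assumptions
      made for the example, as is the exact treatment of the key frame itself.

```c
/* Hypothetical helper (not a libvpx function): number of frames to examine
 * in the forward direction when computing the ARF boost for interval `i`.
 * `frames_to_key` is an assumed input: the distance, in frames, from the
 * current position to the next key frame. */
static int forward_search_frames(int i, int frames_to_key) {
  int span = i - 1; /* the existing +/- (i - 1) window */
  if (span < 0) span = 0;
  /* Do not let the forward search run past the next key frame. */
  if (frames_to_key < span) span = frames_to_key;
  return span;
}
```

      With i = 16 and a key frame 5 frames ahead, the forward search in this
      sketch covers 5 frames rather than 15.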
  6. 08 Nov, 2017 1 commit
    • CVBR command line option. · 93e83fd7
      paulwilkins authored
      Added command line control of Corpus VBR.
      The new corpus vbr mode is a variant of standard
      VBR (end-usage=0) where the complexity distribution
      mid point is passed in rather than calculated for a specific
      clip or chunk.
      The new variant is enabled by setting a new command line
      parameter --corpus-complexity to a non-zero value. Omitting
      this parameter or setting it to 0 will cause the codec to use
      standard vbr mode.
      The correct value for a given corpus needs to be derived
      experimentally using a training set such that the average
      rate for the corpus is close to the target value.
      For example, using our low res test set with upper and lower
      vbr limits of 50%-150% and a corpus complexity value of 650
      gives a similar average data rate across the set to using standard
      vbr. However, with the corpus mode easier clips will be allocated
      fewer bits and harder clips more bits rather than having the same
      rate target for all.
      Change-Id: I03f0fc8c6fb0ee32dc03720fea6a3f1949118589
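      The selection between the two modes might look something like the
      sketch below. The function name complexity_midpoint() is hypothetical,
      and the plain mean of per-frame scores is only a stand-in for whatever
      per-clip value standard VBR actually derives.

```c
/* Hypothetical sketch (not the libvpx implementation): choose the midpoint
 * of the complexity distribution used for bit allocation.
 * A non-zero `corpus_complexity` selects corpus VBR and is used directly;
 * zero falls back to a value derived from the clip itself, assumed here to
 * be the mean of the per-frame scores. */
static double complexity_midpoint(const double *frame_scores, int n,
                                  int corpus_complexity) {
  double sum = 0.0;
  int i;
  if (corpus_complexity != 0) return (double)corpus_complexity;
  for (i = 0; i < n; ++i) sum += frame_scores[i];
  return n > 0 ? sum / n : 0.0;
}
```

      Because the corpus value is the same for every clip, clips whose own
      scores sit below it are treated as easy and clips above it as hard,
      which matches the bit redistribution described above.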
  7. 06 Nov, 2017 1 commit
    • Nonrd_pickmode: avoid computing UV cost when early_term is set. · 6fbc354c
      Marco authored
      For nonrd_pickmode: if early_term is set there should be
      no need to include UV in rdcost (when color_sensitivity is set).
      Neutral change on RTC and RTC_derf metrics, for speed >= 5.
      No change for ytlive metrics.
      Very small speed gain (~0.5%) on some clips with strong color content.
      Change-Id: Ifc00928ecd935fc71e94935ceef0ae7481249f07
  8. 03 Nov, 2017 1 commit
    • Compound prediction mode for nonrd pickmode. · eb7d431c
      Marco authored
      Allow for compound prediction mode in nonrd_pickmode for ZEROMV.
      For real-time encoding, 1 pass with non-zero lag-in-frames.
      Added speed feature to control the feature.
      Enabled for speed >=6 for now, under VBR mode.
      avgPSNR/SSIM metrics positive on ytlive set, for speed 6:
      some clips up by ~3-5%, some clips neutral gain, average gain
      across clips is ~1%.
      Small/negligible decrease in speed.
      Change-Id: I7a60c7596e69b9a928410c5ee2f9141eecd8613d
  9. 01 Nov, 2017 1 commit
  10. 31 Oct, 2017 1 commit
  11. 30 Oct, 2017 1 commit
    • vp9: Reduce stack usage of choose_partitioning. · cc472311
      Jerome Jiang authored
      Change type of sum_square_error from int64_t to uint32_t.
      Change type of sum_error from int64_t to int32_t.
      This reduces the stack usage from ~131K to ~87K.
      Change-Id: I147d7c7b226bceb4f0817bb86848e1fa9d9ac149
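      The effect of the type change on a single accumulator node can be
      illustrated as below. The struct layouts and the names var_wide and
      var_narrow are assumptions for the example; only the two field names and
      their old and new types come from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: 64-bit accumulators. */
typedef struct {
  int64_t sum_square_error;
  int64_t sum_error;
  int log2_count;
  int variance;
} var_wide;

/* Hypothetical "after" layout: 32-bit accumulators. */
typedef struct {
  uint32_t sum_square_error;
  int32_t sum_error;
  int log2_count;
  int variance;
} var_narrow;

/* Bytes saved per node by narrowing the two accumulators. */
static size_t bytes_saved_per_node(void) {
  return sizeof(var_wide) - sizeof(var_narrow);
}
```

      With a 4-byte int these layouts would typically occupy 24 and 16 bytes,
      a ratio close to the reported drop from ~131K to ~87K. The narrower
      types are only safe if the accumulated sums fit in 32 bits.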
  12. 23 Oct, 2017 1 commit
    • vp9-svc: Allow for adapt_rd_thresh with row-mt. · 0738d901
      Marco authored
      Set adaptive_row_thresh_mt = 1 at speed >= 7,
      for svc when multi-threading is used with row-mt.
      This allows the adaptive_rd_thresh feature to be used
      in the nonrd-pickmode.
      ~1-2% speedup for SVC encoding with small quality
      loss (< 0.6%) on RTC set.
      Change-Id: Iab9878dff117bccdaef3e4d0645165db9808cdfc
  13. 16 Oct, 2017 1 commit
    • Add 4 to 3 scaling SSSE3 optimization · 580d3224
      Linfeng Zhang authored
      Note that this change triggers a different C version on SSSE3 and
      generates different scaled output.
      It is 2x as fast as the version calling vpx_scaled_2d_ssse3().
      Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
  14. 13 Oct, 2017 2 commits
    • Adjust threshold in gf_boost for 1 pass vbr · a9248457
      Marco authored
      Small increase to sad_thresh1; avoids some false
      detection of possible scene changes within lag.
      Small improvement in few clips on ytlive, otherwise neutral change.
      Change-Id: Ia79b53bb657bbce65a7aac7d20666b6373d5af8b
    • Corpus VBR tweak for undershoot. · 8842ee0b
      paulwilkins authored
      In cases of strong undershoot adjust Q range down faster.
      Change-Id: I84982beceb3c9b6dc50e52e4a6e891c7dd395d03
  15. 12 Oct, 2017 3 commits
  16. 11 Oct, 2017 1 commit
    • Prevent double application of min rate in two pass. · 416b7051
      paulwilkins authored
      The initial allocation of bits in the two pass code to each frame
      should be within the min max limits on the command line. However,
      when forming an ARF group the cost of the ARF is shared by frames
      in that group such that the residual bits for a frame could drop below
      the min value. This change prevents the minimum being re-applied
      after the cost of the ARF has been deducted as this may otherwise
      cause low rate sections to overshoot their target.
      Test runs comparing to a baseline run with min and max section pct
      0-2000% vs one closer to the YT use case (50-150%) suggest that
      this fix not only results in better rate control but also gives a better
      rd outcome.
      For example the HD set vs 0-2000% baseline (opsnr, ssim).
      Old code (50-150):  +0.751, +1.099
      New code (50-150):  +0.241, -0.009
      Change-Id: I715da7b130bf53ba8aa609532aa9e18b84f5e2ef
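      The difference between the old and new behavior can be sketched as
      follows. All names here are hypothetical and the arithmetic is
      simplified; the sketch only shows why re-applying the minimum after the
      ARF deduction can push a low rate frame back above its intended size.

```c
/* Clamp v to the inclusive range [lo, hi]. */
static int clamp_int(int v, int lo, int hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Assumed old behavior: the min/max limits are applied to the initial
 * allocation and then again after the frame's share of the ARF cost has
 * been deducted. */
static int frame_bits_reclamped(int target, int arf_share, int min_bits,
                                int max_bits) {
  int bits = clamp_int(target, min_bits, max_bits);
  bits -= arf_share;
  return clamp_int(bits, min_bits, max_bits);
}

/* Assumed new behavior: the limits are applied once, to the initial
 * allocation only; the residual may legitimately fall below min_bits. */
static int frame_bits_single_clamp(int target, int arf_share, int min_bits,
                                   int max_bits) {
  int bits = clamp_int(target, min_bits, max_bits);
  bits -= arf_share;
  return bits < 0 ? 0 : bits;
}
```

      For a frame with a 1000 bit target, a 400 bit ARF share and an 800 bit
      minimum, the first function returns 800 and the second 600. The extra
      200 bits in the first case are the kind of overshoot the fix removes.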
  17. 10 Oct, 2017 4 commits
    • Add 4 to 1 scaling x86 optimization · 16166bfd
      Linfeng Zhang authored
      Change-Id: I51c190f0a88685867df36912522e67bdae58a673
    • Adjustment to scene detection and key frame. · 017257a3
      Marco authored
      For 1 pass vbr: use higher threshold on avg_sad
      and force key frame under scene cut detection if
      above the threshold. Allow it for speed >= 6 for now,
      since it does not use the full nonrd_pickmode partition
      (as in speed 5).
      Improves quality somewhat on scene cut frames.
      Neutral on overall metrics and fps for speed 6 on
      ytlive set.
      Change-Id: I12626f7627419ca14f9d0d249df86c7104438162
    • Further Corpus VBR change. · 06d231c9
      paulwilkins authored
      Change to the bit allocation within a GF/ARF group.
      Normal VBR and CQ mode allocate bits to a GF/ARF group based on the mean
      complexity score of the frames in that group but then share bits evenly between
      the "normal" frames in that group regardless of the individual frame complexity
      scores (with the exception of the middle and last frames).
      This patch alters the behavior for the experimental "Corpus VBR" mode such that
      the allocation is always based on the individual complexity scores.
      Change-Id: I5045a143eadeb452302886cc5ccffd0906b75708
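      The two allocation policies can be contrasted with a small sketch. The
      function allocate_group_bits() and its parameters are hypothetical, and
      the special handling of the middle and last frames mentioned above is
      deliberately left out.

```c
/* Hypothetical sketch: split `group_bits` across the n "normal" frames of a
 * GF/ARF group. With per_frame_complexity == 0 the bits are shared evenly
 * (the normal VBR / CQ behavior described above); otherwise each frame gets
 * a share proportional to its own complexity score (the Corpus VBR
 * behavior). Results are written to out[0..n-1]. */
static void allocate_group_bits(const double *scores, int n, long group_bits,
                                int per_frame_complexity, long *out) {
  double total = 0.0;
  int i;
  if (n <= 0) return;
  for (i = 0; i < n; ++i) total += scores[i];
  for (i = 0; i < n; ++i) {
    if (per_frame_complexity && total > 0.0)
      out[i] = (long)((double)group_bits * scores[i] / total);
    else
      out[i] = group_bits / n;
  }
}
```

      For two frames with scores 1.0 and 3.0 and a 4000 bit group budget, the
      even split gives 2000 bits each, while the per-frame split gives 1000
      and 3000.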
    • Corpus Wide VBR test implementation. · 741bd6df
      paulwilkins authored
      This patch makes further changes to support an experimental
      corpus wide VBR mode that uses a corpus complexity
      number as the midpoint of the distribution used to allocate bits
      within a clip, rather than some average error score derived from the
      clip itself.
      At the moment the midpoint number is hard wired for testing and
      the mode is enabled or disabled through a #ifdef.  Ultimately this
      would need to be controlled by command line parameters.
      Change-Id: I9383b76ac9fc646eb35a5d2c5b7d8bc645bfa873
  18. 09 Oct, 2017 1 commit
  19. 06 Oct, 2017 3 commits
    • Revert "Speed >=5 real-time: add TM intra mode for high_source_sad." · bcbc6ed8
      Marco Paniconi authored
      This reverts commit 9311ef18.
      Reason for revert:
      Notice small regression in some clips.
      Will revisit in another change.
      Original change's description:
      > Speed >=5 real-time: add TM intra mode for high_source_sad.
      > Small/neutral change in metrics or speed for ytlive.
      > Some improvement in quality on frames with big content change.
      > Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d
      Change-Id: I9d8ec5195bb05ddf329d325699355185affb9b13
      No-Presubmit: true
      No-Tree-Checks: true
      No-Try: true
    • Adjust threshold in scene detection · e405eb06
      Marco authored
      For 1 pass vbr: increase min_thresh slightly, and also add
      condition on golden/arf update for using full nonrd_pick_partition.
      Reduces possible false detection for scene cut detection.
      Neutral/small change in metrics or speed for speed 5.
      Change-Id: I388f4d9a56e3cc763e0148338c1bc0381e58ad76
    • Speed >=5 real-time: add TM intra mode for high_source_sad. · 9311ef18
      Marco authored
      Small/neutral change in metrics or speed for ytlive.
      Some improvement in quality on frames with big content change.
      Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d
  20. 05 Oct, 2017 1 commit
    • Adjust threshold for adapt_partition for speed 6. · 18262a85
      Marco authored
      Lower SAD threshold to select non_rd pickmode partition
      at superblock level more often.
      Small gain in metrics, small/negligible decrease in speed.
      Change-Id: I0f728236b91a604e4ca7e02039adc54d5985c4dc
  21. 04 Oct, 2017 5 commits
  22. 03 Oct, 2017 2 commits
    • vp9: 1 pass vbr: Limit qpdelta on high_source_sad. · ab2bd340
      Marco authored
      For 1 pass vbr: when significant content/scene change is detected
      (high_source_sad = 1) reduce/turn off the additional qdelta on the
      active_worst_quality. This helps somewhat to reduce the occurrence
      of large frame sizes and large encode times.
      Allow it only when use_altref_onepass is enabled.
      Neutral/no change on metrics.
      Change-Id: I1dd97dd2ab892d65f707b841b27a5de300b714ea
    • Use adapt_partition for ARF in 1 pass. · c8678fb7
      Marco authored
      For speed 6 real-time mode: use adapt_partition
      on ARF frame instead of REFERENCE_PARTITION (which is slower).
      This requires enabling compute_source_sad_onepass for no-show_frames.
      Speedup of ~3-5% on some clips that heavily use ARF,
      small loss (~0.2%) in quality on ytlive set.
      Change-Id: Ib50acc97df06458244a6ac55d2bd882c30012536