  1. 09 Nov, 2017 1 commit
    • vp9: SVC feature to use partition from lower resolution. · fdb054a0
      Jerome Jiang authored
      For SVC with 3 spatial layers:
      Add feature to copy/upscale partition from middle spatial layer
      to the upper/highest resolution, when superblock sad is not high.
      Enabled for speed >= 7 and only for non-reference frames.
      Speedup ~3-4%, small loss in avgPSNR/SSIM of ~1%.
      Change-Id: I7f0a2716c0fde28bade0f86159d11b7e31d6ab8d
  2. 06 Nov, 2017 1 commit
    • Nonrd_pickmode: avoid computing UV cost when early_term is set. · 6fbc354c
      Marco authored
      For nonrd_pickmode: if early_term is set there should be
      no need to include UV in rdcost (when color_sensitivity is set).
      Neutral change on RTC and RTC_derf metrics, for speed >= 5.
      No change for ytlive metrics.
      Very small speed gain (~0.5%) on some clips with strong color content.
      Change-Id: Ifc00928ecd935fc71e94935ceef0ae7481249f07
  3. 03 Nov, 2017 2 commits
    • Support building AVX-512 and implement sadx4 for AVX-512 · b383a17f
      Kyle Siefring authored
      The added AVX-512 support requires the subset of AVX-512 added in Skylake-X.
      Change-Id: I39666b00d10bf96d06c709823663eb09b89265b7
    • Compound prediction mode for nonrd pickmode. · eb7d431c
      Marco authored
      Allow for compound prediction mode in nonrd_pickmode for ZEROMV.
      For real-time encoding, 1 pass with non-zero lag-in-frames.
      Added speed feature to control the feature.
      Enabled for speed >=6 for now, under VBR mode.
      avgPSNR/SSIM metrics positive on ytlive set, for speed 6:
      some clips up by ~3-5%, some clips neutral; the average gain
      across clips is ~1%.
      Small/negligible decrease in speed.
      Change-Id: I7a60c7596e69b9a928410c5ee2f9141eecd8613d
  4. 01 Nov, 2017 1 commit
  5. 31 Oct, 2017 1 commit
  6. 30 Oct, 2017 1 commit
    • vp9: Reduce stack usage of choose_partitioning. · cc472311
      Jerome Jiang authored
      Change type of sum_square_error from int64_t to uint32_t.
      Change type of sum_error from int64_t to int32_t.
      This reduces the stack usage from ~131K to ~87K.
      Change-Id: I147d7c7b226bceb4f0817bb86848e1fa9d9ac149
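
To illustrate why narrowing the two accumulator types shrinks the stack footprint, here is a minimal sketch using Python's ctypes to measure struct sizes. The record layout is an assumption made for illustration: the commit message names only sum_square_error and sum_error, and the two extra 32-bit fields are hypothetical, not taken from the libvpx source.

```python
import ctypes


# Hypothetical accumulator record (NOT copied from libvpx): the two sums named
# in the commit message plus two assumed 32-bit bookkeeping fields.
class VarWide(ctypes.Structure):
    _fields_ = [
        ("sum_square_error", ctypes.c_int64),  # type before the change
        ("sum_error", ctypes.c_int64),         # type before the change
        ("log2_count", ctypes.c_int32),        # assumed field
        ("variance", ctypes.c_int32),          # assumed field
    ]


class VarNarrow(ctypes.Structure):
    _fields_ = [
        ("sum_square_error", ctypes.c_uint32),  # type after the change
        ("sum_error", ctypes.c_int32),          # type after the change
        ("log2_count", ctypes.c_int32),         # assumed field
        ("variance", ctypes.c_int32),           # assumed field
    ]


wide = ctypes.sizeof(VarWide)
narrow = ctypes.sizeof(VarNarrow)
print(f"wide: {wide} bytes, narrow: {narrow} bytes, ratio: {narrow / wide:.2f}")
```

When a function keeps a large tree of such records on its stack, the per-record saving is multiplied by the number of records, so a change of two field types can plausibly account for tens of kilobytes. The actual saving depends on the real layout, which the commit message does not show.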
  7. 23 Oct, 2017 1 commit
    • vp9-svc: Allow for adapt_rd_thresh with row-mt. · 0738d901
      Marco authored
      Set adaptive_row_thresh_mt = 1 at speed >= 7,
      for svc when multi-threading is used with row-mt.
      This allows the adaptive_rd_thresh feature to be used
      in the nonrd-pickmode.
      ~1-2% speedup for SVC encoding with small quality
      loss (< 0.6%) on RTC set.
      Change-Id: Iab9878dff117bccdaef3e4d0645165db9808cdfc
  8. 16 Oct, 2017 1 commit
    • Add 4 to 3 scaling SSSE3 optimization · 580d3224
      Linfeng Zhang authored
      Note that this change triggers a different C version on SSSE3 and
      generates different scaled output.
      It is 2x as fast as the version calling vpx_scaled_2d_ssse3().
      Change-Id: I17fff122cd0a5ac8aa451d84daa606582da8e194
  9. 13 Oct, 2017 2 commits
    • Adjust threshold in gf_boost for 1 pass vbr · a9248457
      Marco authored
      Small increase to sad_thresh1, which avoids some false
      detections of possible scene changes within lag.
      Small improvement in a few clips on ytlive, otherwise neutral change.
      Change-Id: Ia79b53bb657bbce65a7aac7d20666b6373d5af8b
    • Corpus VBR tweak for undershoot. · 8842ee0b
      paulwilkins authored
      In cases of strong undershoot adjust Q range down faster.
      Change-Id: I84982beceb3c9b6dc50e52e4a6e891c7dd395d03
  10. 12 Oct, 2017 3 commits
  11. 11 Oct, 2017 1 commit
    • Prevent double application of min rate in two pass. · 416b7051
      paulwilkins authored
      The initial allocation of bits in the two pass code to each frame
      should be within the min max limits on the command line. However,
      when forming an ARF group the cost of the ARF is shared by frames
      in that group such that the residual bits for a frame could drop below
      the min value. This change prevents the minimum being re-applied
      after the cost of the ARF has been deducted as this may otherwise
      cause low rate sections to overshoot their target.
      Test runs comparing to a baseline run with min and max section pct
      0-2000% vs one closer to the YT use case (50-150%) suggest that
      this fix not only results in better rate control but also gives a better
      rd outcome.
      For example the HD set vs 0-2000% baseline (opsnr, ssim).
      Old code (50-150): +0.751, +1.099
      New code (50-150): +0.241, -0.009
      Change-Id: I715da7b130bf53ba8aa609532aa9e18b84f5e2ef
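
The overshoot described in this commit can be sketched with a toy allocation. Everything below is hypothetical: the function name, the numbers, and the even split of the ARF cost are assumptions for illustration (the message says only that the cost is shared by the frames in the group), and none of it is the libvpx two pass code.

```python
def clamp(bits, lo, hi):
    """Limit a per-frame bit target to the [lo, hi] range."""
    return max(lo, min(hi, bits))


def allocate_group(initial_targets, arf_cost, min_bits, max_bits, reapply_min):
    """Return per-frame bits left after the ARF cost has been deducted."""
    # Step 1: the initial per-frame allocation respects the min/max limits.
    targets = [clamp(t, min_bits, max_bits) for t in initial_targets]
    # Step 2: the ARF cost is shared by the frames in the group
    # (an even split is assumed here).
    share = arf_cost // len(targets)
    residual = [t - share for t in targets]
    # Step 3 (old behaviour only): the minimum is applied a second time.
    if reapply_min:
        residual = [max(r, min_bits) for r in residual]
    return residual


targets = [1000, 1000, 1000, 1000]  # group budget: 4000 bits
arf_cost = 1600
old = allocate_group(targets, arf_cost, 800, 3000, reapply_min=True)
new = allocate_group(targets, arf_cost, 800, 3000, reapply_min=False)
print("old total:", sum(old) + arf_cost)
print("new total:", sum(new) + arf_cost)
```

With these made-up numbers, re-applying the minimum lifts every residual back to 800 bits, so the group spends more than its 4000-bit budget once the ARF is counted. Skipping the second clamp keeps the total on budget.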
  12. 10 Oct, 2017 4 commits
    • Add 4 to 1 scaling x86 optimization · 16166bfd
      Linfeng Zhang authored
      Change-Id: I51c190f0a88685867df36912522e67bdae58a673
    • Adjustment to scene detection and key frame. · 017257a3
      Marco authored
      For 1 pass vbr: use higher threshold on avg_sad
      and force key frame under scene cut detection if
      above the threshold. Allow it for speed >= 6 for now,
      since it does not use the full nonrd_pickmode partition
      (as in speed 5).
      Improves quality somewhat on scene cut frames.
      Neutral on overall metrics and fps for speed 6 on
      ytlive set.
      Change-Id: I12626f7627419ca14f9d0d249df86c7104438162
    • Further Corpus VBR change. · 06d231c9
      paulwilkins authored
      Change to the bit allocation within a GF/ARF group.
      Normal VBR and CQ mode allocate bits to a GF/ARF group based on the mean
      complexity score of the frames in that group but then share bits evenly between
      the "normal" frames in that group regardless of the individual frame complexity
      scores (with the exception of the middle and last frames).
      This patch alters the behavior for the experimental "Corpus VBR" mode such that
      the allocation is always based on the individual complexity scores.
      Change-Id: I5045a143eadeb452302886cc5ccffd0906b75708
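
The difference between the two allocation policies can be sketched as follows. This is a simplified illustration: the function names are hypothetical, the proportional rule is an assumption (the message says only that allocation is based on the individual complexity scores), and the special handling of the middle and last frames is ignored.

```python
def even_split(group_bits, scores):
    """Normal VBR/CQ style: share the group budget evenly between frames."""
    n = len(scores)
    return [group_bits / n] * n


def by_complexity(group_bits, scores):
    """Corpus VBR style (assumed proportional): weight by each frame's score."""
    total = sum(scores)
    return [group_bits * s / total for s in scores]


scores = [10, 30, 20, 40]  # made-up per-frame complexity scores
group_bits = 10000         # made-up budget for the "normal" frames

even = even_split(group_bits, scores)
weighted = by_complexity(group_bits, scores)
print("even:    ", even)
print("weighted:", weighted)
```

Both policies spend the same group budget; only the distribution inside the group changes, with harder frames receiving more bits under the per-frame policy.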
    • Corpus Wide VBR test implementation. · 741bd6df
      paulwilkins authored
      This patch makes further changes to support an experimental
      corpus wide VBR mode that uses a corpus complexity
      number as the midpoint of the distribution used to allocate bits
      within a clip, rather than some average error score derived from the
      clip itself.
      At the moment the midpoint number is hard wired for testing and
      the mode is enabled or disabled through a #ifdef.  Ultimately this
      would need to be controlled by command line parameters.
      Change-Id: I9383b76ac9fc646eb35a5d2c5b7d8bc645bfa873
  13. 09 Oct, 2017 1 commit
  14. 06 Oct, 2017 3 commits
    • Revert "Speed >=5 real-time: add TM intra mode for high_source_sad." · bcbc6ed8
      Marco Paniconi authored
      This reverts commit 9311ef18.
      Reason for revert:
      Noticed a small regression in some clips.
      Will revisit in another change.
      Original change's description:
      > Speed >=5 real-time: add TM intra mode for high_source_sad.
      > Small/neutral change in metrics or speed for ytlive.
      > Some improvement in quality on frames with big content change.
      > Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d
      Change-Id: I9d8ec5195bb05ddf329d325699355185affb9b13
      No-Presubmit: true
      No-Tree-Checks: true
      No-Try: true
    • Adjust threshold in scene detection · e405eb06
      Marco authored
      For 1 pass vbr: increase min_thresh slightly, and also add
      condition on golden/arf update for using full nonrd_pick_partition.
      Reduces possible false detection for scene cut detection.
      Neutral/small change in metrics or speed for speed 5.
      Change-Id: I388f4d9a56e3cc763e0148338c1bc0381e58ad76
    • Speed >=5 real-time: add TM intra mode for high_source_sad. · 9311ef18
      Marco authored
      Small/neutral change in metrics or speed for ytlive.
      Some improvement in quality on frames with big content change.
      Change-Id: Ib3b0703a5f28ea6710e90324436e27598ab7384d
  15. 05 Oct, 2017 1 commit
    • Adjust threshold for adapt_partition for speed 6. · 18262a85
      Marco authored
      Lower SAD threshold to select non_rd pickmode partition
      at superblock level more often.
      Small gain in metrics, small/negligible decrease in speed.
      Change-Id: I0f728236b91a604e4ca7e02039adc54d5985c4dc
  16. 04 Oct, 2017 5 commits
  17. 03 Oct, 2017 3 commits
    • vp9: 1 pass vbr: Limit qpdelta on high_source_sad. · ab2bd340
      Marco authored
      For 1 pass vbr: when significant content/scene change is detected
      (high_source_sad = 1) reduce/turn off the additional qdelta on the
      active_worst_quality. This helps somewhat to reduce the occurrence
      of large frame sizes and large encode times.
      Allow it only when use_altref_onepass is enabled.
      Neutral/no change on metrics.
      Change-Id: I1dd97dd2ab892d65f707b841b27a5de300b714ea
    • Use adapt_partition for ARF in 1 pass. · c8678fb7
      Marco authored
      For speed 6 real-time mode: use adapt_partition
      on ARF frame instead of REFERENCE_PARTITION (which is slower).
      This requires enabling compute_source_sad_onepass for no-show_frames.
      Speedup of ~3-5% on some clips that heavily use ARF,
      small loss (~0.2%) in quality on ytlive set.
      Change-Id: Ib50acc97df06458244a6ac55d2bd882c30012536
    • ARF in 1 pass vbr: modify skip ref_frame in nonrd_pickmode. · 33e10dfa
      Marco authored
      Speedup of ~2-3% on 1080p clips speed 6.
      Neutral/negligible loss in metrics on ytlive.
      Change-Id: I7ac47a4d8b58c566920bae29a94a0e8d59c36dee
  18. 02 Oct, 2017 2 commits
    • Add 4 to 3 scaling NEON optimization · 0e55b0b0
      Linfeng Zhang authored
      Speed compared with the version calling vpx_scaled_2d_neon():
        ~1.7x in general
        ~2.8x for BILINEAR filter
      Change-Id: I8f0a54c2013e61ea086033010f97c19ecf47c7c6
    • Specialize 4 to 3 frame scaling in C · 2c560c3c
      Linfeng Zhang authored
      Scale 3x3 block instead of 16x16 block in each loop. Disabled by
      1. Reduced number of different phase_scaler from 16 to 3.
         Optimization code will be smaller and faster.
      2. Maximum phase_scaler drifting will be reduced from 5/16 to 1/24.
         (The drifting is 1/(3*16) in each step.)
      Change-Id: I59a1f7496d89a1b090498c935d30cfcf1d0c282b
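
The drift figures quoted in this commit can be checked with exact fractions. The sketch below rests on two assumptions that the message does not state: that the per-step drift of 1/(3*16) comes from truncating the 4:3 step to a whole number of 1/16-pixel phases, and that the phase is re-synchronised at the start of every block. The names are hypothetical.

```python
from fractions import Fraction

SUBPEL = 16  # phase positions per source pixel (1/16-pixel precision)

# Scaling 4 source pixels to 3 output pixels advances 4/3 source pixels
# per output pixel, i.e. 64/3 phase units.
exact_step = Fraction(4, 3) * SUBPEL
used_step = int(exact_step)  # 21, assuming the step is truncated

# Error per output pixel, expressed in source pixels.
drift_per_step = (exact_step - used_step) / SUBPEL


def max_drift(block_size):
    """Drift accumulated over the (block_size - 1) steps inside one block."""
    return drift_per_step * (block_size - 1)


print("per step:", drift_per_step)
print("16x16 block:", max_drift(16))
print("3x3 block:", max_drift(3))
```

Under these assumptions the 16x16 loop accumulates 15 steps of drift and the 3x3 loop only 2, which reproduces the 5/16 and 1/24 figures given in the message.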
  19. 29 Sep, 2017 2 commits
    • Fix partition selection in speed features for arf overlay frame. · c8f6e7b9
      Marco authored
      For real-time mode. Move the switch to fixed partition
      for is_src_frame_alt_ref so all speeds may use it
      if use_altref_onepass is set.
      Improves metrics by ~2% for ytlive set at speed 4
      (where use_altref_onepass is currently used).
      Change-Id: I033240386598c9dbd0364da89ccbcca64bc663ee
    • Enable use_altref_onepass for speed 4 real-time mode. · f2c3d0a7
      Marco authored
      Used for VBR mode with lag-in-frames > 0.
      On ytlive set at speed 4: ~3% average gain.
      Change-Id: I45dad1700bf8be9d8f177815dc062774f6f2f0de
  20. 28 Sep, 2017 2 commits
    • Set rc->high_source_sad = 0 before scene detection. · a2ef180d
      Marco authored
      Only has effect when sf->use_altref_onepass is enabled,
      as in that case scene detection is skipped for non-show frame
      and so high_source_sad does not get reset to 0.
      No change in metrics or speed.
      Change-Id: I421f066d239341449c18826089e1810b9fc5967f
    • vp9: Modification to adapt the ARF usage for 1 pass vbr · 03e8f133
      Marco authored
      Add stats for past ARF usage, and use it to disable
      ARF usage based on some conditions.
      Overall improvement on ytlive set, reduces the regression
      on the problem clips for this feature.
      Only has an effect when sf->use_altref_onepass is enabled
      (currently off by default).
      Change-Id: I66267f227ea132dc86acb730e9882f85bead2cdb
  21. 27 Sep, 2017 2 commits
    • Add use_svc condition to the scene detection in 1 pass. · c493ea1a
      Marco authored
      Scene detection is not currently used in SVC 1 pass code.
      Speedup of ~0.4%.
      Change-Id: I0ab769300919de710cd2da1402014fa3f22a1f86
    • Revert "Remove the speed condition on scene detection in 1 pass code." · 8d438dc3
      Marco Paniconi authored
      This reverts commit 535b7b91.
      This is actually used in CBR to reset the rate control if high source sad is detected.
      Original change's description:
      > Remove the speed condition on scene detection in 1 pass code.
      > Scene detection is used for VBR mode and for screen_content mode.
      > It was also enabled for CBR mode via the speed condition,
      > but currently the analysis in the scene detection is not used
      > in CBR mode (similar computations are done locally at superblock level
      > when the source_sad feature is enabled).
      > For 1 pass code.
      > No change in behavior. Small speed gain, ~0.5%.
      > Change-Id: I59991d7ef2af320bea7af4b907596e057affa42f
      Change-Id: Ib4e6b02047f75632503e7b0fc870af97fa9291c3
      No-Presubmit: true
      No-Tree-Checks: true
      No-Try: true