  1. 02 Jul, 2013 - 9 commits
    • Speed feature to binary search dir intramodes · 37501d68
      Deb Mukherjee authored
      This speed feature will skip searching the directional intra prediction
      modes D63, D117, D27, D153 if the best intra mode so far is not one of
      the diagonal, horizontal or vertical directions closest to the respective
      directions being tested. In other words, this implements a sort of
      binary search in the angular domain.
      
      Speedup: about 9-10%
      Results: -0.05% only on derfraw300.
      
      Change-Id: I413584c41f2a3e8dabfbdeb40718c8fc4b1d63a2
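      The skip rule described in this commit can be sketched roughly as below. This is a minimal illustration, not the encoder's actual code: the enum, the function name skip_directional_mode, and the exact neighbour pairs (in particular for D27) are assumptions inferred from the mode names in the commit message.

```c
#include <assert.h>

/* Illustrative mode list; names follow the commit text, not the real enum. */
typedef enum {
  DC_PRED, V_PRED, H_PRED, D45_PRED, D135_PRED,
  D117_PRED, D153_PRED, D27_PRED, D63_PRED, TM_PRED,
  NUM_INTRA_MODES
} intra_mode;

/* Returns 1 if `mode` can be skipped given the best intra mode so far.
 * Assumed rule: an in-between direction is only tested when the current
 * best is one of the two coarse directions next to it. */
static int skip_directional_mode(intra_mode mode, intra_mode best) {
  assert(mode < NUM_INTRA_MODES && best < NUM_INTRA_MODES);
  switch (mode) {
    case D63_PRED:  return best != V_PRED && best != D45_PRED;
    case D117_PRED: return best != V_PRED && best != D135_PRED;
    case D153_PRED: return best != H_PRED && best != D135_PRED;
    case D27_PRED:  return best != H_PRED && best != D45_PRED; /* assumed pair */
    default:        return 0; /* coarse modes are always searched */
  }
}
```

      With this shape, the coarse directions are searched first and each finer direction is only evaluated when one of its assumed neighbours is currently winning.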
    • Tx size selection enhancements · 8d3d2b76
      Deb Mukherjee authored
      (1) Refines the modeling function and uses that to add some speed
      features. Specifically, instead of using a flag use_largest_txfm as
      a speed feature, an enum tx_size_search_method is used, of which
      two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
      new types are added:
      USE_LARGESTINTRA (use largest only for intra)
      USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
      inter)
      
      (2) Another change is that the framework for deciding transform type
      is simplified to use a heuristic count based method rather than
      an rd based method using txfm_cache. In practice the new method
      is found to work just as well - with derf only -0.01 down.
      The new method is more compatible with the new framework where
      certain rd costs are based on full rd and certain others are
      based on modeled rd or are not computed. In this patch the existing
      rd based method is still kept for use in the USE_FULL_RD mode.
      In the other modes, the count based method is used.
      However, the recommendation is to remove the rd based method eventually,
      since its benefit is limited and removing it will eliminate a lot of
      complications in the code.
      
      (3) Finally a bug is fixed with the existing use_largest_txfm speed feature
      that causes mismatches when the lossless mode and 4x4 WH transform is
      forced.
      
      Results on derf:
      USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
      USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
      pretty good compromise)
      USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
      (currently the benefit of modeling is limited for txfm size selection,
      but keeping this enum as a placeholder).
      USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
      use_largest_txfm speed feature).
      
      Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936
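      One way to picture the enum described in (1) is the sketch below. The four enum names come from the commit message; their numeric order, the tx_strategy type, and the function choose_tx_strategy are hypothetical. The sketch also assumes that USE_LARGESTINTRA falls back to a full RD search for inter blocks, which the message does not state explicitly.

```c
#include <assert.h>

/* Names from the commit message; the ordering here is assumed. */
typedef enum {
  USE_FULL_RD,
  USE_LARGESTINTRA,
  USE_LARGESTINTRA_MODELINTER,
  USE_LARGESTALL
} tx_size_search_method;

/* Hypothetical per-block strategies, for illustration only. */
typedef enum { TX_SEARCH_FULL_RD, TX_SEARCH_MODEL, TX_PICK_LARGEST } tx_strategy;

static tx_strategy choose_tx_strategy(tx_size_search_method method,
                                      int is_inter) {
  assert(is_inter == 0 || is_inter == 1);
  switch (method) {
    case USE_LARGESTALL:
      return TX_PICK_LARGEST;
    case USE_LARGESTINTRA:
      /* Assumption: inter blocks still get the full RD search. */
      return is_inter ? TX_SEARCH_FULL_RD : TX_PICK_LARGEST;
    case USE_LARGESTINTRA_MODELINTER:
      return is_inter ? TX_SEARCH_MODEL : TX_PICK_LARGEST;
    case USE_FULL_RD:
    default:
      return TX_SEARCH_FULL_RD;
  }
}
```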
    • Clean-up in forward update to use mapping tables · 9c20cedd
      Deb Mukherjee authored
      Uses mapping tables instead of complicated modulo/division
      operations for prob mapping for forward updates.
      
      No bit-stream or output change.
      
      Change-Id: Ifd9ce8ac1437835c305c94f64c18273c7a68f546
    • Add speed feature to disable splitmv · b12e060b
      Yunqing Wang authored
      Added a speed feature in speed 1 to disable splitmv for HD (>=720)
      clips. Test result on stdhd set: 0.3% psnr loss and 0.07% ssim
      loss. Encoding speedup is 36%.
      
      (For reference: The test result on derf set showed 2% psnr loss
      and 1.6% ssim loss. Encoding speedup is 34%. SPLITMV should be
      enabled for small resolution videos.)
      
      Change-Id: I54f72b94f506c6d404b47c42e71acaa5374d6ee6
    • Calculate rd cost per transformed block · b91a1586
      Jingning Han authored
      Compute the rate-distortion cost per transformed block, and accumulate
      the cost through all blocks inside a partition. This allows the encoder
      to detect if the cumulative rd cost is already above the best rd cost,
      thereby enabling early termination in the rate-distortion optimization
      search.
      
      Change-Id: I0a856367a9a7b6dd0b466e7b767f54d5018d09ac
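      The early-termination idea can be sketched as a running sum that bails out once it passes the best cost seen so far. The function name accumulate_block_rd and its signature are hypothetical, and the per-block costs are assumed to be already computed, which simplifies what a real encoder would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums per-block RD costs and stops as soon as the running total exceeds
 * best_rd. Returns 1 if every block was accumulated (candidate is still
 * viable), 0 if the loop was cut short. *total receives the cost
 * accumulated up to the stopping point. */
static int accumulate_block_rd(const int64_t *block_rd, int num_blocks,
                               int64_t best_rd, int64_t *total) {
  int64_t sum = 0;
  int i;
  assert(block_rd != NULL && total != NULL && num_blocks >= 0);
  for (i = 0; i < num_blocks; ++i) {
    sum += block_rd[i];
    if (sum > best_rd) { /* already worse than the best: give up early */
      *total = sum;
      return 0;
    }
  }
  *total = sum;
  return 1;
}
```

      The saving comes from the blocks that are never visited after the early return; in an encoder those would be transforms and quantizations that are not run at all.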
    • Revert "New motion threshold factor - speed feature." · b7cd01ed
      Paul Wilkins authored
      This reverts commit 13772781.
      Also fixes a spelling mistake.
      
      Change-Id: I5be8aa4d8d3c0323d4a6f41968a7b2c048949c3f
    • fix the mismatch again in cpu_used 2 · 9e408e35
      Yaowu Xu authored
      Change-Id: Icc4f70f0b0f91c9e7d5d00eedd67841afe2f2679
    • use partitioning from last frame · d4158283
      Jim Bankoski authored
      
      This CL changes the use-partition-from-last-frame path to do the following:
      
      if the partition is none, horz, or vert -> try split
      if the partition != none and one of the children is not split -> try none
      
      
      Change-Id: I5b6c659e35f3ac9f11c051b92ba98af6d7e8aa87
      Signed-off-by: Jim Bankoski <jimbankoski@google.com>
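      The two rules in this commit can be written down as a small decision function. This is only a sketch: the type and function names are hypothetical, and "one of the children is not split" is reduced to a single flag supplied by the caller.

```c
#include <assert.h>

/* Illustrative partition types, named after the commit text. */
typedef enum { PART_NONE, PART_HORZ, PART_VERT, PART_SPLIT } partition_type;

/* Which extra partitionings to evaluate besides the one from last frame. */
typedef struct {
  int try_split;
  int try_none;
} extra_checks;

static extra_checks extra_partition_checks(partition_type last_part,
                                           int any_child_not_split) {
  extra_checks out = { 0, 0 };
  assert(any_child_not_split == 0 || any_child_not_split == 1);
  /* Rule 1: none, horz, or vert -> also try split. */
  if (last_part == PART_NONE || last_part == PART_HORZ ||
      last_part == PART_VERT)
    out.try_split = 1;
  /* Rule 2: not none, and some child is not split -> also try none. */
  if (last_part != PART_NONE && any_child_not_split)
    out.try_none = 1;
  return out;
}
```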
    • Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h. · 1ac05402
      Dmitry Kovalev authored
      Change-Id: Ia547a5dd7650b771fd00edd673ab9f920270731c
  2. 01 Jul, 2013 - 6 commits
    • Make get_coef_context() branchless. · 26b6318d
      Ronald S. Bultje authored
      This should significantly speed up cost_coeffs(). Basically what the
      patch does is to make the neighbour arrays padded by one item to
      prevent an eob check in get_coef_context(), then it populates each
      col/row scan and left/top edge coefficient with two times the same
      neighbour - this prevents a single/double context branch in
      get_coef_context(). Lastly, it populates neighbour arrays in pixel
      order (rather than scan order), so we don't have to dereference the
      scantable to get the correct neighbours.
      
      Total encoding time of first 50 frames of bus (speed 0) at 1500kbps
      goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase.
      
      Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56
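      The duplicated-neighbour trick can be sketched as follows. This is an illustration of the idea rather than the patched function itself: the array layout (two neighbour indices per coefficient position) and the NEIGHBOURS_PER_COEF constant are assumptions, and the padding that removes the eob check is left out.

```c
#include <assert.h>
#include <stdint.h>

#define NEIGHBOURS_PER_COEF 2 /* assumed layout: two entries per position */

/* Context is the rounded average of two neighbour tokens. For edge
 * positions the table stores the same neighbour twice, so
 * (1 + a + a) >> 1 == a and no single/double branch is needed. */
static int get_coef_context(const int16_t *neighbours,
                            const uint8_t *token_cache, int c) {
  assert(c >= 0);
  return (1 + token_cache[neighbours[NEIGHBOURS_PER_COEF * c + 0]] +
          token_cache[neighbours[NEIGHBOURS_PER_COEF * c + 1]]) >> 1;
}
```

      For example, with token_cache = {3, 1, 0}, a position whose two table entries are both 0 gets context 3, while a position with entries 0 and 1 gets (1 + 3 + 1) >> 1 = 2.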
    • Update quantize SSSE3 SIMD to cover 32x32 transform case also. · c8defcfd
      Ronald S. Bultje authored
      Encode time of bus (speed 0) 50 frames @ 1500kbps goes from 2min14.4 to
      2min10.1, i.e. a 2.3% overall speed increase.
      
      Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87
    • Quantize (64-bit only, for now) SSSE3 SIMD. · 7353ceab
      Ronald S. Bultje authored
      Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
      goes from 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
      x86-64 only; it needs some minor modifications to be 32bit compatible,
      because it uses 15 xmm registers, whereas 32bit only has 8.
      
      Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
    • Removing vp9_modecont.{h, c}. · 2ab3bc88
      Dmitry Kovalev authored
      Moving vp9_default_inter_mode_probs array to vp9_entropymode.c.
      
      Change-Id: I88ebda86ccc07f2a43c6c01d4b37898214cfb6de
    • fix a mismatch in cpuused 2 · 632289b3
      Yaowu Xu authored
      Change-Id: I921c9faba6386535aaf717a54301dd346a9b8540
    • New motion threshold factor - speed feature. · 13772781
      Paul Wilkins authored
      Added a speed feature that focuses only on thresholds
      for new motion modes.
      
      Moved sf->comp_inter_joint_search_thresh into speed
      1.  This has ~+0.4% impact on quality at speed 0 as
      our quality reference baseline.
      
      Slight adjustment to baseline thresholds.
      
      Change-Id: I7ebf104f1fe29af77ed4837b2e84be065621bbe5
  3. 29 Jun, 2013 - 4 commits
  4. 28 Jun, 2013 - 9 commits
    • Fix switch statement in 8x8 transform · 9def7f72
      Jingning Han authored
      Change-Id: I7c46354c4983feb5f6202c3ab4a1d9534da7e30f
    • Inline vp9_get_coef_context() (and remove vp9_ prefix). · d00b8e5f
      Ronald S. Bultje authored
      Makes cost_coeffs() a lot faster:
      4x4: 236 -> 181 cycles
      8x8: 888 -> 588 cycles
      16x16: 3550 -> 2483 cycles
      32x32: 17392 -> 12010 cycles
      
      Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes
      from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup.
      
      Change-Id: I16b8d595946393c8dc661599550b3f37f5718896
    • Removing CONFIG_DEBUG checks on assertions. · 8e6ce6bb
      Dmitry Kovalev authored
      Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated
      ones from vp9_onyx_int.h and vp9_onyxd_int.h.
      
      Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6
    • Minor change to prevent one level of dereference in cost_coeffs(). · e3ce2b2a
      Ronald S. Bultje authored
      4x4: 234 -> 236 cycles
      8x8: 878 -> 888 cycles
      16x16: 3664 -> 3550 cycles
      32x32: 18134 -> 17392 cycles
      
      Change-Id: I37a51bfbb0060a3a54f09c6045c14a989811ed78
    • Some minor optimizations for cost_coeffs(). · 91d223bd
      Ronald S. Bultje authored
      Cycle timings for first 3 frames of bus (speed 0) at 1500kbps:
      4x4: 298 -> 234 cycles
      8x8: 1227 -> 878 cycles
      16x16: 4906 -> 3664 cycles
      32x32: 23426 -> 18134 cycles
      
      Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes
      from 3min0.7 to 2min51.6 seconds, i.e. 5.3% faster.
      
      Change-Id: I68a0e1b530b0563b84a67342cca4b45146077e95
    • Make coefficient skip condition an explicit RD choice. · af660715
      Ronald S. Bultje authored
      This commit replaces zrun_zbin_boost, a method of biasing non-zero
      coefficients following runs of zero-coefficients to be rounded towards
      zero, with an explicit skip-block choice in the RD loop.
      
      The logic is basically that if individual coefficients should be rounded
      towards zero (from a RD point of view), the trellis/optimize loop should
      take care of it. If whole blocks should be zero (from a RD point of
      view), a single RD check is much more efficient than a complete
      serialization of the quantization loop.
      
      Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
      SIMD for quantize will follow in a separate patch. Results for other
      test sets pending.
      
      Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
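      The whole-block decision described in this commit amounts to a single cost comparison. The sketch below assumes the common form cost = distortion + lambda * rate; the function names, the integer lambda, and the tie-breaking in favour of skipping are all assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RD cost: distortion plus lambda-weighted rate. */
static int64_t rd_cost(int64_t rate, int64_t distortion, int64_t lambda) {
  return distortion + lambda * rate;
}

/* Returns 1 if zeroing the whole block is at least as good, in RD terms,
 * as coding its coefficients. One comparison per block replaces the
 * per-coefficient bias that zrun_zbin_boost applied during quantization. */
static int choose_skip_block(int64_t rate_coded, int64_t dist_coded,
                             int64_t rate_skip, int64_t dist_skip,
                             int64_t lambda) {
  assert(lambda >= 0);
  return rd_cost(rate_skip, dist_skip, lambda) <=
         rd_cost(rate_coded, dist_coded, lambda);
}
```

      With a large lambda the rate saved by skipping dominates and the block is zeroed; with a small lambda the lower distortion of the coded block wins.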
    • Minor cleanups · 8b9eea0a
      Yaowu Xu authored
      Change-Id: I379617c1c731a686b3f7e032b8805860c1055b12
    • Optimize partition search order · 1374a06b
      Yaowu Xu authored
      This commit changes the partition search order so that rectangular
      partitions are checked after square partitions. It also adds a speed
      feature to skip the rectangular partition check when NONE is better
      than SPLIT in the RD sense.
      
      This feature speeds up the encoder by roughly 1.5X, with a compression loss of:
      -0.91% on cif set
      -0.56% on stdhd set
      
      Change-Id: I0d2d06993041aa9ea9073fcc39c54f73a127dfa4
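      The reordered search can be sketched as below. The names are hypothetical, and for simplicity the RD costs of all four partition types are passed in precomputed; in an encoder the point of the speed feature is that the rectangular costs are never computed when the check is skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative partition types; also used as indices into rd[]. */
typedef enum { P_NONE, P_SPLIT, P_HORZ, P_VERT } partition;

/* Evaluates square partitions first, then rectangular ones unless the
 * speed feature is on and NONE already beats SPLIT.
 * *rect_checked reports whether HORZ/VERT were considered. */
static partition pick_partition(const int64_t rd[4],
                                int skip_rect_if_none_wins,
                                int *rect_checked) {
  partition best = rd[P_NONE] <= rd[P_SPLIT] ? P_NONE : P_SPLIT;
  assert(rect_checked != NULL);
  *rect_checked = 0;
  if (skip_rect_if_none_wins && rd[P_NONE] < rd[P_SPLIT])
    return best; /* NONE better than SPLIT: skip rectangular checks */
  *rect_checked = 1;
  if (rd[P_HORZ] < rd[best]) best = P_HORZ;
  if (rd[P_VERT] < rd[best]) best = P_VERT;
  return best;
}
```

      The compression loss quoted in the commit corresponds to cases like rd = {100, 120, 90, 95}: with the feature on, NONE is returned even though HORZ would have been cheaper.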
    • Fix tile independence with both column tiling and static_thresh set. · fd4eed3b
      Ronald S. Bultje authored
      Change-Id: I0b2be0ec2c410a527f88b95a44f24ac967b2dac1
  5. 27 Jun, 2013 - 3 commits
    • Decoder's code cleanup. · 3231da0a
      Dmitry Kovalev authored
      Using vp9_set_pred_flag function instead of custom code, adding
      decode_tokens function which is now called from decode_atom,
      decode_sb_intra, and decode_sb.
      
      Change-Id: Ie163a7106c0241099da9c5fe03069bd71f9d9ff8
    • Inline quantize so idiv instruction gets removed from inner loop. · 7a049be6
      Ronald S. Bultje authored
      Encoding time of first 50 frames of bus @ 1500kbps (speed 0) goes from
      3min15.0 to 3min10.9, i.e. 2.1% faster overall.
      
      Change-Id: If592ee99be09bcd34a7c8498347f44e7305e982c
    • Make intra predictor reference buffer configurable · 861cb06c
      Jingning Han authored
      This commit enables configurable reference buffer pointer for intra
      predictor. This allows later removal of spatial dependency between
      blocks inside a 64x64 superblock in the rate-distortion optimization
      loop.
      
      Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1
  6. 26 Jun, 2013 - 7 commits
  7. 25 Jun, 2013 - 2 commits