1. 17 Jul, 2013 - 2 commits
    • Best_rd breakout in rd partition search. · 9f427bfe
      Ronald S. Bultje authored
      About 15% faster for the first 50 frames of bus (speed 0) @ 1500kbps,
      which goes from 1min36 to 1min24. Results become slightly better (+0.2%
      on derf/yt, +0.4% on hd), probably because of a bugfix for skipmode in
      super_block_yrd(). Overall speed change (on derfraw300) is roughly
      -13%. This can probably be improved further by caching best_yrd
      between partition searches. Also, we might be able to get more
      speedups by always doing PARTITION_NONE before PARTITION_SPLIT, not
      just at the sb8x8 level.
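      A minimal C sketch of the breakout idea, assuming a hypothetical
      per-sub-partition cost callback (subblock_rd_fn) and illustrative
      helper names; it is not the libvpx implementation. The split's cost is
      accumulated one sub-partition at a time, and the search stops once the
      running total can no longer beat best_rd.

```c
#include <stdint.h>

/* Hypothetical callback returning the RD cost of sub-partition `index`;
 * in an encoder this would run the full mode search for that block. */
typedef int64_t (*subblock_rd_fn)(int index, void *ctx);

/* Accumulates the RD cost of the four sub-partitions of a split, breaking
 * out as soon as the running sum reaches best_rd, since the split can no
 * longer win. Returns INT64_MAX on breakout; *searched reports how many
 * sub-partitions were actually evaluated. */
static int64_t split_rd_with_breakout(subblock_rd_fn rd, void *ctx,
                                      int64_t best_rd, int *searched) {
  int64_t sum = 0;
  *searched = 0;
  for (int i = 0; i < 4; ++i) {
    sum += rd(i, ctx);
    ++*searched;
    if (sum >= best_rd) return INT64_MAX; /* breakout: skip the rest */
  }
  return sum;
}

/* Example callback: reads precomputed costs from an int64_t array. */
static int64_t array_rd(int index, void *ctx) {
  return ((const int64_t *)ctx)[index];
}
```

      With costs {10, 20, 30, 40} and best_rd = 25, only two of the four
      sub-partitions are evaluated before the breakout.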
      
      Change-Id: I83736949ebd5b4a3b400ee688d7661913fefc98b
    • Speed up motion estimation using small partitions' result (experiment) · df90d58f
      Yunqing Wang authored
      Current partition checking starts from small sizes, and then goes up
      to large sizes. This experiment uses the small partitions' motion
      estimation result, which is already available, to speed up the
      large partition's motion estimation. We can decide to skip some
      partition checks if they are unlikely choices. We could use the
      motion vector (MV) result as the current partition's prediction MV, and
      limit the search range and reference frames.
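      A rough C sketch of how already-computed sub-partition results could
      seed the larger partition's search. Averaging the sub-block MVs, the
      fixed margin around their bounding box, and all type and function
      names here are illustrative assumptions, not the experiment's actual
      rules.

```c
/* Hypothetical motion vector type. */
typedef struct { int row, col; } mv_t;

typedef struct {
  mv_t pred;         /* prediction (starting) MV for the large partition */
  mv_t min, max;     /* limited search window */
  unsigned ref_mask; /* bit r set => reference frame r is worth searching */
} me_hint_t;

/* Derives a search hint for a large partition from the n (n >= 1) MVs and
 * reference frames already found for its sub-partitions. */
static me_hint_t hint_from_subblocks(const mv_t *mvs, const int *refs, int n,
                                     int margin) {
  me_hint_t h;
  int sum_row = 0, sum_col = 0;
  h.min = h.max = mvs[0];
  h.ref_mask = 0;
  for (int i = 0; i < n; ++i) {
    sum_row += mvs[i].row;
    sum_col += mvs[i].col;
    if (mvs[i].row < h.min.row) h.min.row = mvs[i].row;
    if (mvs[i].row > h.max.row) h.max.row = mvs[i].row;
    if (mvs[i].col < h.min.col) h.min.col = mvs[i].col;
    if (mvs[i].col > h.max.col) h.max.col = mvs[i].col;
    h.ref_mask |= 1u << refs[i];
  }
  /* Average of the sub-block MVs (integer division truncates toward zero). */
  h.pred.row = sum_row / n;
  h.pred.col = sum_col / n;
  /* Search only the bounding box of the sub-block MVs, plus a margin. */
  h.min.row -= margin;
  h.min.col -= margin;
  h.max.row += margin;
  h.max.col += margin;
  return h;
}
```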
      
      Current result at speed 1:
      PSNR loss: 1.19% for stdhd, 0.287% for derf.
      Speed gain: 14% for sunflower (hd), 11% for akiyo.
      
      Further improvement will be done later.
      
      Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab
  2. 16 Jul, 2013 - 3 commits
    • Changing signature of vp9_get_pred_probs_tx_size. · 5b65a71c
      Dmitry Kovalev authored
      Removing VP9_COMMON* argument and adding struct tx_probs* instead of
      MACROBLOCKD*.
      
      Change-Id: Idf61074631a90ec51eac22c8dcd977f44ac0757c
    • Cleaning up tile code. · 9482a0bf
      Dmitry Kovalev authored
      Removing tile_rows and tile_columns from VP9Common, removing redundant
      constants MIN_TILE_WIDTH and MAX_TILE_WIDTH, changing signature of
      vp9_get_tile_n_bits.
      
      Change-Id: I8ff3104a38179b2c6900df965c144c1d6f602267
    • Rewriting vp9_set_pred_flag_{seg_id, mbskip}. · 863138a2
      Dmitry Kovalev authored
      Making the implementation of vp9_set_pred_flag_{seg_id, mbskip}
      consistent with vp9_get_segment_id, without using the confusing
      sub(a, b) macro. Passing mi_row and mi_col to the functions explicitly
      instead of relying on mb_to_right_edge and mb_to_bottom_edge.
      
      Change-Id: I54c1087dd2ba9036f8ba7eb165b073e807d00435
  3. 15 Jul, 2013 - 1 commit
    • Skip duplicate block encoding in the rd loop · faff6ed0
      Jingning Han authored
      This speed feature allows the encoder to largely remove the spatial
      dependency between blocks inside a 64x64 superblock, thereby removing
      the need to repeatedly encode superblocks per partition type in the
      rate-distortion optimization loop.
      
      A major challenge lies in the intra modes tested in the rate-distortion
      optimization loop. The subsequent blocks do not have access to the
      reconstructed boundary pixels without the intermediate coding steps.
      This was resolved by using the original pixels for intra prediction
      in the rd loop, followed by distortion modeling appropriately designed
      around the quantization parameters. Experiments also suggested that
      the performance impact is more discernible at lower bit-rate/PSNR
      settings. Hence a quantizer-dependent threshold is applied to disable
      the skipping of block encoding.
      
      For bus_cif at 2000 kbps,
      speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
               performance loss.
      
      speed 1: runtime 65312ms -> 61536ms (7...
  4. 14 Jul, 2013 - 1 commit
  5. 13 Jul, 2013 - 1 commit
  6. 12 Jul, 2013 - 2 commits
  7. 11 Jul, 2013 - 1 commit
    • Moving segmentation related vars into separate struct. · c4ad3273
      Dmitry Kovalev authored
      Adding segmentation struct to vp9_seg_common.h. Struct members are from
      macroblockd and VP9Common structs. Moving segmentation related constants
      and enums to vp9_seg_common.h.
      
      Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03
  8. 10 Jul, 2013 - 4 commits
  9. 08 Jul, 2013 - 4 commits
    • Don't call encode_sb() for the final of 4-split subpartitions. · a5062cc6
      Ronald S. Bultje authored
      The resulting reconstruction is never used, thus it just wastes CPU
      cycles. Reduces encode time of first 50 frames of bus (speed 0) @
      1500kbps from 2min2.0 to 2min1.2, i.e. a 0.65% overall speedup.
      
      Change-Id: I74755ca3aadc21e2be220f486259060bd4088c45
    • Make frame-wide filter-type decision fully RD-based. · ed995afb
      Ronald S. Bultje authored
      Overall, on all test sets, this gains about +0.2% on all metrics.
      City is a clip where this really hurts (-1.0% on all metrics); I'm
      not quite sure why yet. It may be worth looking into in the future.
      
      Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78
    • Using mi_cols instead of mb_cols. · b7559258
      Dmitry Kovalev authored
      Eliminating usage of mb-units, switching to mi-units. Adding
      ALIGN_POWER_OF_TWO macro.
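      For reference, a C sketch of a power-of-two alignment macro and how a
      frame width might be converted to mi units. The macro body is the usual
      round-up-and-mask idiom; the 8x8 mi unit (LOG2_MI_SIZE = 3) and the
      helper names are assumptions here, not copied from the patch.

```c
/* Assumed: one mode-info (mi) unit covers 8x8 pixels. */
#define LOG2_MI_SIZE 3

/* Rounds value up to the next multiple of 2^n (n small enough that
 * 1 << n does not overflow). */
#define ALIGN_POWER_OF_TWO(value, n) \
  (((value) + ((1 << (n)) - 1)) & ~((1 << (n)) - 1))

/* Width in mi units (8 px), rounding partial units up. */
static int width_to_mi_cols(int width) {
  return ALIGN_POWER_OF_TWO(width, LOG2_MI_SIZE) >> LOG2_MI_SIZE;
}

/* Width in legacy 16x16 macroblock units, for comparison. */
static int width_to_mb_cols(int width) { return (width + 15) >> 4; }
```

      For a 352-pixel-wide (CIF) frame this gives 44 mi columns versus 22
      macroblock columns.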
      
      Change-Id: I2491c969f713207c062011878b57e4e531818607
    • Implements several heuristics to prune mode search · d9b62160
      Deb Mukherjee authored
      Skips mode searches for intra and compound inter modes depending
      on the best mode so far and the reference frames. The heuristics
      to use are selected by bits in a flag. The previous direction-based
      intra mode search pruning is also absorbed into this framework.
      
      Specifically the flags and their impact are:
      
      1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
      directional modes and TM_PRED if the best so far is
      an inter mode)
      derfraw300: -0.15%, 10% speedup
      
      2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
      mode search if the best so far is not one of the closest
      hor/vert/diagonal directions)
      derfraw300: -0.05%, about 9% speedup
      
      3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
      search if the best so far is an intra mode)
      derfraw300: -0.06%, about 7-8% speedup
      
      4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
      if the best single ref inter mode does not have the same ref
      as one of the two references being tested in the compound mode)
      derfraw300: -0.56%, about 10% speedup
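      A C sketch of how such bit flags might gate the search, covering three
      of the four heuristics. The flag names come from the list above, but
      their bit values, the best_so_far_t summary, and the helper functions
      are assumptions made for illustration.

```c
/* Flag names from the commit message; bit positions are assumed. */
enum {
  FLAG_SKIP_INTRA_BESTINTER = 1 << 0,
  FLAG_SKIP_INTRA_DIRMISMATCH = 1 << 1, /* not modeled in this sketch */
  FLAG_SKIP_COMP_BESTINTRA = 1 << 2,
  FLAG_SKIP_COMP_REFMISMATCH = 1 << 3
};

/* Hypothetical summary of the best mode found so far. */
typedef struct {
  int is_intra;  /* nonzero if the best mode so far is intra */
  int ref_frame; /* reference frame of the best single-ref inter mode */
} best_so_far_t;

/* Nonzero if an intra candidate can be skipped; oblique_or_tm marks oblique
 * directional modes and TM_PRED. */
static int skip_intra(unsigned flags, const best_so_far_t *best,
                      int oblique_or_tm) {
  return (flags & FLAG_SKIP_INTRA_BESTINTER) && !best->is_intra &&
         oblique_or_tm;
}

/* Nonzero if a compound candidate with references (ref0, ref1) can be
 * skipped. */
static int skip_compound(unsigned flags, const best_so_far_t *best,
                         int ref0, int ref1) {
  if ((flags & FLAG_SKIP_COMP_BESTINTRA) && best->is_intra) return 1;
  if ((flags & FLAG_SKIP_COMP_REFMISMATCH) && !best->is_intra &&
      best->ref_frame != ref0 && best->ref_frame != ref1)
    return 1;
  return 0;
}
```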
      
      Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495
  10. 04 Jul, 2013 - 1 commit
  11. 03 Jul, 2013 - 2 commits
    • Replacing 64 / MI_SIZE with MI_BLOCK_SIZE. · 5a21de84
      Dmitry Kovalev authored
      Change-Id: I32276552b3ea6dc1dce8e298be114cfe1019b31c
    • Added two new skip experiments. · 72c5778e
      Paul Wilkins authored
      sf->unused_mode_skip_lvl. Tests modes as normal for all
      sizes at or below the given level. At larger sizes it skips
      all modes that were not chosen at any smaller size.
      Hence setting it to BLOCK_SIZE_SB64X64 effectively turns it off.
      Setting it to BLOCK_SIZE_AB4X4 means that larger sizes only consider
      modes that were chosen for one or more 4x4 blocks.
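      A C sketch of the unused_mode_skip_lvl idea, assuming blocks are
      searched from small to large sizes and chosen modes are tracked in a
      bitmask. The block-size enum, the state struct, and the helpers are
      hypothetical stand-ins for the encoder's own types.

```c
#include <stdint.h>

/* Hypothetical block-size order, smallest first. */
enum { BS_4X4, BS_8X8, BS_16X16, BS_32X32, BS_64X64 };

typedef struct {
  int skip_lvl;        /* analogue of sf->unused_mode_skip_lvl */
  uint32_t used_modes; /* bit m set => mode m was chosen for some block */
} mode_skip_state_t;

/* Record the mode chosen for a block (mode < 32 assumed). */
static void record_mode(mode_skip_state_t *s, int mode) {
  s->used_modes |= 1u << mode;
}

/* Sizes at or below skip_lvl test every mode; larger sizes only test modes
 * already chosen at a smaller size. With skip_lvl == BS_64X64 every size
 * tests every mode, i.e. the feature is effectively off. */
static int should_test_mode(const mode_skip_state_t *s, int bsize, int mode) {
  if (bsize <= s->skip_lvl) return 1;
  return (int)((s->used_modes >> mode) & 1u);
}
```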
      
      sf->reference_masking.
      Do a test encode of the NONE partition at one size and create
      a reference frame mask based on the best rd choice. In the
      full search only allow this reference frame.
      Currently it tests 64x64 and repeats this in the full search.
      This does not work well with Jim's partition code yet, so it
      is disabled by default.
      
      Change-Id: I8f8c52d2ef4a0c08100150b0ea4155d1aaab93dd
  12. 02 Jul, 2013 - 5 commits
    • Removing redundant struct from union b_mode_info. · be77f6bb
      Dmitry Kovalev authored
      Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83
    • Added a speed feature use_square_partition_only · 0d7b7c09
      Yaowu Xu authored
      This commit adds a speed feature where only square partitions are
      evaluated in partition picking. Enabling this feature at cpu-used 2
      reduces encoding time by ~30%.
      
      loss of compression:
      -0.9% on cif set
      -1.23% on stdhd
      
      Change-Id: Ia6fad11210f0b78365abb889f9245604513be5b9
    • Tx size selection enhancements · 8d3d2b76
      Deb Mukherjee authored
      (1) Refines the modeling function and uses that to add some speed
      features. Specifically, instead of using a flag use_largest_txfm as
      a speed feature, an enum tx_size_search_method is used, of which
      two of the types are USE_FULL_RD and USE_LARGESTALL. Two other
      new types are added:
      USE_LARGESTINTRA (use largest only for intra)
      USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for
      inter)
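      A C sketch of how the enum might drive the per-block decision. The
      four enumerator names come from the commit message; their order, the
      tx_eval_t result type, and the reading of USE_LARGESTINTRA as "full
      RD for inter blocks" are assumptions.

```c
/* Enumerator names from the commit message; order and values are assumed. */
enum tx_size_search_method {
  USE_FULL_RD,
  USE_LARGESTINTRA,
  USE_LARGESTINTRA_MODELINTER,
  USE_LARGESTALL
};

/* Hypothetical: how a block's transform size would be chosen. */
typedef enum { TX_EVAL_FULL_RD, TX_EVAL_MODEL, TX_EVAL_LARGEST } tx_eval_t;

static tx_eval_t tx_eval_for_block(enum tx_size_search_method method,
                                   int is_intra) {
  switch (method) {
    case USE_LARGESTALL:
      return TX_EVAL_LARGEST;
    case USE_LARGESTINTRA:
      return is_intra ? TX_EVAL_LARGEST : TX_EVAL_FULL_RD;
    case USE_LARGESTINTRA_MODELINTER:
      return is_intra ? TX_EVAL_LARGEST : TX_EVAL_MODEL;
    case USE_FULL_RD:
    default:
      return TX_EVAL_FULL_RD;
  }
}
```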
      
      (2) Another change is that the framework for deciding transform type
      is simplified to use a heuristic count-based method rather than
      an rd-based method using txfm_cache. In practice the new method
      is found to work just as well, with only a -0.01 change on derf.
      The new method is more compatible with the new framework where
      certain rd costs are based on full rd and certain others are
      based on modeled rd or are not computed. In this patch the existing
      rd-based method is still kept for use in the USE_FULL_RD mode.
      In the other modes, the count-based method is used.
      However, the recommendation is to remove the rd-based method
      eventually, since its benefit is limited and removing it will
      eliminate a lot of complications in the code.
      
      (3) Finally, a bug is fixed in the existing use_largest_txfm speed
      feature that caused mismatches when lossless mode is used and the 4x4
      Walsh-Hadamard transform (WHT) is forced.
      
      Results on derf:
      USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction
      USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a
      pretty good compromise)
      USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction
      (currently the benefit of modeling is limited for txfm size selection,
      but this enum is kept as a placeholder).
      USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing
      use_largest_txfm speed feature).
      
      Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936
    • use partitioning from last frame · d4158283
      Jim Bankoski authored
      
      This CL changes the use-partition-from-last-frame feature to do the
      following:

      if part is none, horz, or vert -> try split
      if part != none and one of the children is not split -> try none
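      The two rules can be sketched in C as below. The partition enum and
      result struct are hypothetical, and the sketch assumes "children"
      means the four sub-blocks of a SPLIT partition, which the message does
      not spell out.

```c
/* Hypothetical partition types mirroring VP9's PARTITION_* kinds. */
typedef enum { PART_NONE, PART_HORZ, PART_VERT, PART_SPLIT } part_t;

typedef struct {
  int try_split; /* also search SPLIT */
  int try_none;  /* also search NONE */
} part_decision_t;

/* last: partition of the co-located block in the previous frame.
 * kids: partitions of its four sub-blocks, consulted only when last is
 * PART_SPLIT (assumed meaning of "children"). */
static part_decision_t from_last_frame(part_t last, const part_t kids[4]) {
  part_decision_t d = {0, 0};
  if (last == PART_NONE || last == PART_HORZ || last == PART_VERT) {
    d.try_split = 1;
  } else {
    for (int i = 0; i < 4; ++i) {
      if (kids[i] != PART_SPLIT) {
        d.try_none = 1;
        break;
      }
    }
  }
  return d;
}
```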
      
      
      Change-Id: I5b6c659e35f3ac9f11c051b92ba98af6d7e8aa87
      Signed-off-by: Jim Bankoski <jimbankoski@google.com>
    • Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h. · 1ac05402
      Dmitry Kovalev authored
      Change-Id: Ia547a5dd7650b771fd00edd673ab9f920270731c
  13. 01 Jul, 2013 - 1 commit
  14. 28 Jun, 2013 - 3 commits
    • Removing CONFIG_DEBUG checks on assertions. · 8e6ce6bb
      Dmitry Kovalev authored
      Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated
      ones from vp9_onyx_int.h and vp9_onyxd_int.h.
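      A sketch of the general shape of such a macro: assign an allocation
      result and report an error if it is NULL. The argument order,
      error_info_t, and report_mem_error() are assumptions; in libvpx the
      failure goes through the codec's internal error mechanism instead.

```c
#include <stdlib.h> /* malloc, free used at call sites */

/* Hypothetical error record standing in for the codec's error state. */
typedef struct {
  int failed;
  const char *msg;
} error_info_t;

/* Hypothetical reporter; a real codec would raise a memory error here. */
static void report_mem_error(error_info_t *err, const char *msg) {
  err->failed = 1;
  err->msg = msg;
}

/* Assigns expr to lval and reports an error naming lval if it is NULL.
 * do { } while (0) makes the macro expand to a single statement, so it is
 * safe inside an unbraced if/else. */
#define CHECK_MEM_ERROR(err, lval, expr)                               \
  do {                                                                 \
    (lval) = (expr);                                                   \
    if (!(lval)) report_mem_error((err), "Failed to allocate " #lval); \
  } while (0)
```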
      
      Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6
    • Optimize partition search order · 1374a06b
      Yaowu Xu authored
      This commit changes the partition search order so that rectangular
      partitions are checked after square partitions. It also adds a
      speed feature to skip the rectangular partition check when NONE is
      better than SPLIT in the RD sense.

      This feature speeds up the encoder by roughly 1.5x, with a
      compression loss of:
      -0.91% on cif set
      -0.56% on stdhd set
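      A C sketch of the reordered search with the skip enabled. Candidate
      costs are passed in precomputed only to keep the example small; in an
      encoder each would come from its own search, which is where skipping
      HORZ/VERT saves time. Names and the 0..3 return encoding are
      illustrative.

```c
#include <stdint.h>

/* Hypothetical RD costs of each candidate partition for one block. */
typedef struct { int64_t none, split, horz, vert; } part_costs_t;

/* Returns 0 = NONE, 1 = SPLIT, 2 = HORZ, 3 = VERT. Square candidates are
 * evaluated first; with skip_rect set, HORZ/VERT are not evaluated when
 * NONE beats SPLIT. *evaluated counts the candidates looked at. */
static int pick_partition(const part_costs_t *c, int skip_rect,
                          int *evaluated) {
  int best = 0;
  int64_t best_rd = c->none;
  *evaluated = 2;
  if (c->split < best_rd) { best = 1; best_rd = c->split; }
  if (skip_rect && c->none < c->split) return best;
  *evaluated = 4;
  if (c->horz < best_rd) { best = 2; best_rd = c->horz; }
  if (c->vert < best_rd) best = 3;
  return best;
}
```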
      
      Change-Id: I0d2d06993041aa9ea9073fcc39c54f73a127dfa4
    • Fix tile independence with both column tiling and static_thresh set. · fd4eed3b
      Ronald S. Bultje authored
      Change-Id: I0b2be0ec2c410a527f88b95a44f24ac967b2dac1
  15. 27 Jun, 2013 - 1 commit
    • Decoder's code cleanup. · 3231da0a
      Dmitry Kovalev authored
      Using vp9_set_pred_flag function instead of custom code, adding
      decode_tokens function which is now called from decode_atom,
      decode_sb_intra, and decode_sb.
      
      Change-Id: Ie163a7106c0241099da9c5fe03069bd71f9d9ff8
  16. 26 Jun, 2013 - 3 commits
  17. 24 Jun, 2013 - 1 commit
  18. 21 Jun, 2013 - 3 commits
    • Removing find_seg_id and using vp9_get_pred_mi_segid instead. · 40141681
      Dmitry Kovalev authored
      Change-Id: Ia40229903c08f14020e90e94cfdf494aba1be827
    • Implement SSE2 block_error. · 54b2a596
      Ronald S. Bultje authored
      Change vp9_block_error() to return a 64-bit error variable, and change
      all callers to expect a 64-bit return value (this will prevent
      overflows, which we basically don't check for at all right now).
      Remove the duplicate block_error() function, which worked around that
      through truncation. Remove the old (incompatible) mmx/sse2 block_error
      SIMD versions and replace them with a new one that returns a 64-bit
      value.
      
      Encoding time of first 50 frames of bus @ 1500kbps goes from 3min29 to
      3min23, i.e. a 3% overall speedup.
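      A plain C sketch of a 64-bit block error (sum of squared differences
      between original and dequantized coefficients). The function name and
      parameter list are assumptions; the point is the int64_t accumulator,
      since a single difference of 65535 already squares to 4294836225,
      more than a 32-bit signed int can hold.

```c
#include <stdint.h>

/* Sum of squared differences between original (coeff) and dequantized
 * (dqcoeff) transform coefficients, accumulated in 64 bits so large blocks
 * cannot overflow. Name and parameters are illustrative. */
static int64_t block_error64(const int16_t *coeff, const int16_t *dqcoeff,
                             int block_size) {
  int64_t error = 0;
  for (int i = 0; i < block_size; ++i) {
    const int64_t diff = (int64_t)coeff[i] - dqcoeff[i];
    error += diff * diff;
  }
  return error;
}
```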
      
      Change-Id: Ib71ac5508b5ee8a80f1753cd85d72df1629abe68
    • rename variables to avoid build error in MSVC · ee07a261
      Yaowu Xu authored
      Change-Id: I7960178c95c54d5c4497e44cfc8c493566294b34
  19. 20 Jun, 2013 - 1 commit