1. 26 Jul, 2013 - 1 commit
    • Auto min and max partition size experiment. · fe5e2a91
      Paul Wilkins authored
      Speed feature experiment to set an upper and lower
      partition size limit based on what has been seen
      in spatial neighbors.
      
      This seems to give quite reasonable speed gains in local testing
      (10-15%), and when used with speed 0 the losses are small
      (0.25% derf, 0.35% stdhd). However, for now I am only
      enabling it on speed 1 as there may be clashes with the existing
      temporal partition selection in speed 2.
      
      Using a tighter min / max around the range derived from the
      neighbors increases speed further but at the cost of a
      bigger quality loss. However, I think this spatial method could
      be combined with data from either the last frame or a variance
      method (or both) to refine the range of minimum and maximum
      partition size. I.e. consider the min and max from spatial and
      temporal neighbors and the variance recommendation.
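      A minimal sketch of deriving a partition-size range from spatial
      neighbors, assuming a hypothetical BlockSize enum ordered from
      smallest to largest and a hypothetical helper
      neighbor_partition_range(). The one-step widening on each side is
      an assumed margin, not necessarily what the commit uses; it only
      illustrates the tighter-is-faster trade-off described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical square block sizes, ordered from smallest to largest. */
typedef enum {
  BLOCK_4X4,
  BLOCK_8X8,
  BLOCK_16X16,
  BLOCK_32X32,
  BLOCK_64X64
} BlockSize;

/* Derive a [min, max] partition-size range for the current superblock from
 * the sizes already chosen in its spatial neighbors. The neighbor range is
 * widened by one size step on each side (an assumed margin); with no
 * neighbors available, the full range is allowed. */
static void neighbor_partition_range(const BlockSize *nbr_sizes, size_t n,
                                     BlockSize *min_size,
                                     BlockSize *max_size) {
  BlockSize lo = BLOCK_64X64;
  BlockSize hi = BLOCK_4X4;
  size_t i;

  if (n == 0) {
    *min_size = BLOCK_4X4;
    *max_size = BLOCK_64X64;
    return;
  }
  for (i = 0; i < n; ++i) {
    if (nbr_sizes[i] < lo) lo = nbr_sizes[i];
    if (nbr_sizes[i] > hi) hi = nbr_sizes[i];
  }
  /* A tighter margin is faster but costs more quality. */
  *min_size = (lo > BLOCK_4X4) ? (BlockSize)(lo - 1) : BLOCK_4X4;
  *max_size = (hi < BLOCK_64X64) ? (BlockSize)(hi + 1) : BLOCK_64X64;
  assert(*min_size <= *max_size);
}
```

      The partition search would then evaluate only block sizes inside
      [min_size, max_size] for the current superblock.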
      
      Change-Id: I1b96bf8b84368d6aad0c7aa600fe141b4f07435f
  2. 25 Jul, 2013 - 4 commits
    • Add encoding option --static-thresh · d36852b7
      Yunqing Wang authored
      This option exists in VP8, and it was rewritten in VP9 to support
      skipping on different partition levels. After prediction is done,
      we can check if the residuals in the partition block will be all
      quantized to 0. If this is true, the skip flag is set, and only
      prediction data are needed in reconstruction. Based on DCT's energy
      conservation property, the skipping check can be estimated in
      spatial domain.
      
      The prediction error is calculated and compared to a threshold.
      The threshold is determined by the dequant values, and also
      adjusted by partition sizes. To be precise, the DC and AC parts
      for Y, U, and V planes are checked to decide skipping or not.
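      A minimal sketch of the energy-conservation argument, not the libvpx
      implementation. It assumes an orthonormal 2-D DCT (VP9's integer
      transforms only approximate this) and round-to-nearest quantization
      with step sizes dc_dq and ac_dq; the function name
      block_quantizes_to_zero() is hypothetical, and the static-thresh
      scaling and per-partition-size adjustment are not modeled. The DC
      coefficient of an orthonormal n x n DCT is sum / n, and by Parseval
      the AC energy is SSE - sum^2 / (n*n), so both checks can be done on
      spatial-domain residuals.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if every coefficient of the n x n residual block (row-major)
 * is guaranteed to quantize to 0, 0 otherwise. A coefficient c rounds to
 * zero when |c| < dq / 2. */
static int block_quantizes_to_zero(const int16_t *res, int n, int dc_dq,
                                   int ac_dq) {
  const int64_t N = (int64_t)n * n;
  int64_t sum = 0, sse = 0;
  int64_t i;

  assert(n > 0 && dc_dq > 0 && ac_dq > 0);
  for (i = 0; i < N; ++i) {
    sum += res[i];
    sse += (int64_t)res[i] * res[i];
  }
  /* DC = sum / n; it rounds to 0 iff |sum| / n < dc_dq / 2. */
  if (2 * llabs(sum) >= (int64_t)n * dc_dq) return 0;
  /* AC energy = sse - sum^2 / N. If it is below (ac_dq / 2)^2, no single
   * AC coefficient can reach ac_dq / 2. Both sides multiplied by 4N. */
  if (4 * (N * sse - sum * sum) >= N * ac_dq * ac_dq) return 0;
  return 1;
}
```

      Because the energy bound is a sufficient condition, the check can
      miss some blocks that would still quantize to zero, but it never
      marks a block as skippable when one of its coefficients survives
      (under the stated assumptions).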
      
      Tests showed that:
      1. derf set:
      when static-thresh = 1, psnr loss is 0.666%;
      when static-thresh = 500, psnr loss is 1.162%;
      2. stdhd set:
      when static-thresh = 1, psnr loss is 1.249%;
      when static-thresh = 500, psnr loss is 1.668%;
      
      For different clips, the encoding speedup ranges from a few
      percent to 20+% when static-thresh <= 500. For example,
      clip            bitrate  static-thresh psnr    time
      akiyo(cif)       500        0          48.923  5.635s(50f)
      akiyo            500        500        48.863  4.402s(50f)
      
      parkjoy(1080p)   4000       0          30.380  77.54s(30f)
      parkjoy          4000       500        30.384  69.59s(30f)
      
      sunflower(1080p) 4000       0          44.461  85.2s(30f)
      sunflower        4000       500        44.418  78.1s(30f)
      
      Higher static-thresh values give larger speedup with larger
      quality loss.
      
      Change-Id: I857031ceb466ff314ab580ac5ec5d18542203c53
    • General cleanups. · 7131cb0e
      Dmitry Kovalev authored
      Removing unused constants, macros, and function declarations. Using
      ROUND_POWER_OF_TWO macro, vp9_zero, vp9_copy where possible. Moving
      #include from *.h to *.c. Merging for loops for motion vectors.
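      For reference, a rounding macro along these lines (the exact libvpx
      macro text may differ) replaces hand-written patterns such as
      (x + 8) >> 4. The avg_of_4() helper is a hypothetical usage example.

```c
#include <assert.h>

/* Divide value by 2^n, rounding halves up (requires n >= 1). */
#define ROUND_POWER_OF_TWO(value, n) \
  (((value) + (1 << ((n) - 1))) >> (n))

/* Hypothetical example: rounded average of four samples. */
static int avg_of_4(int a, int b, int c, int d) {
  return ROUND_POWER_OF_TWO(a + b + c + d, 2);
}
```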
      
      Change-Id: Ic3bf841764a2bb177128bb3a6d7aa8f68229cd13
    • Simplify handling of sub-partition motion vectors · be700e14
      Adrian Grange authored
      Simplified the code that extracts and uses the motion
      vectors for the 4 sub-partitions in rd_pick_partition.
      
      Change-Id: Iaf698ef7ee3aef9edd59015e1ae065dd359b17d9
    • fix a bug where flags are not reset · 3e386aef
      Yaowu Xu authored
      The feature that uses small partition results as a measure to skip
      mode evaluation at larger partition requires the flags to be reset.
      The reset was missing in the code path that calls rd_use_partition().
      
      Change-Id: Ia0a3a0aee1a862b6e2333d596808db7c48033d50
  3. 24 Jul, 2013 - 2 commits
  4. 23 Jul, 2013 - 2 commits
    • Rolled-up several for loops into one · 646edbc1
      Adrian Grange authored
      Several consecutive for loops executed over the same
      index range, so I rolled them into one.
      
      Change-Id: I5cfcc8c38c738478965768409cca9d09adf224e1
    • clean up bw, bh · 86a9dec7
      Jim Bankoski authored
      Many structures use bw and bh, and they have different meanings. This CL
      attempts to start this cleanup and remove the unnecessary two-step
      operation of looking up a log value and then shifting.

      Also removed partition type multiple operation code in bitstream.c.
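      A sketch of the two-step versus one-step lookup described above,
      using hypothetical table and function names (not the libvpx
      identifiers) and assuming bw means block width in 4-pixel units.

```c
#include <assert.h>

/* Hypothetical square block sizes. */
typedef enum {
  BLOCK_4X4,
  BLOCK_8X8,
  BLOCK_16X16,
  BLOCK_32X32,
  BLOCK_64X64,
  BLOCK_SIZE_COUNT
} BlockSize;

/* Two-step form: look up log2(width in 4-pixel units), then shift. */
static const int width_log2_in_4px[BLOCK_SIZE_COUNT] = { 0, 1, 2, 3, 4 };

static int block_width_two_step(BlockSize b) {
  return 1 << width_log2_in_4px[b];
}

/* One-step form: store the width in 4-pixel units directly. */
static const int width_in_4px[BLOCK_SIZE_COUNT] = { 1, 2, 4, 8, 16 };

static int block_width_direct(BlockSize b) { return width_in_4px[b]; }
```

      Storing the final value removes the shift from every call site and
      makes the unit of bw explicit in the table name.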
      
      Change-Id: I7e03e552bdfc0939738e430862e3073d30fdd5db
  5. 22 Jul, 2013 - 2 commits
    • Adding update_tx_counts function. · b2fc6fa9
      Dmitry Kovalev authored
      Moving common encoder/decoder code to update_tx_counts. Also renaming
      vp9_get_pred_probs_tx_size to get_tx_probs2 and adding get_tx_probs to
      call vp9_get_pred_context_tx_size inside read_selected_tx_size only once
      (twice before).
      
      Change-Id: Ia50247f3893de88ef8e9041b0d44be44a40aaa4d
    • fix left over overflow · 2ac8b50c
      Jim Bankoski authored
      This CL fixes issues rbultje brought up that I somehow neglected when I
      submitted yaowu's patch.
      
      Change-Id: I07ad18796317822510b96e951c88d29f194a3c2e
  6. 20 Jul, 2013 - 1 commit
    • added checks to prevent rate/distortion overflow · ea284d62
      Yaowu Xu authored
      At speed 2, due to the threshold scheme used, the rate and distortion
      may be assigned the value INT_MAX. The patch added checks to prevent
      INT_MAX values from being used in further calculation of RD scores.
      The patch also changed the assertion in rd_use_partition() to mirror
      the similar assertion in rd_pick_partition().
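      A minimal sketch of the guard, assuming INT_MAX acts as a "not
      evaluated" sentinel. rd_score() and sum_scores() are hypothetical,
      and the score formula lambda * rate + dist is a simplified stand-in
      for the encoder's actual RD cost. Scores are assumed non-negative.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* A rate or distortion of INT_MAX means "not evaluated" (e.g. skipped by
 * a speed threshold). Such a candidate yields INT64_MAX, so it can never
 * win, and no arithmetic is done on the sentinel value. */
static int64_t rd_score(int64_t lambda, int rate, int dist) {
  if (rate == INT_MAX || dist == INT_MAX) return INT64_MAX;
  return lambda * rate + dist;
}

/* Sum non-negative sub-partition scores, saturating at INT64_MAX instead
 * of overflowing. */
static int64_t sum_scores(const int64_t *scores, int n) {
  int64_t total = 0;
  int i;
  for (i = 0; i < n; ++i) {
    if (scores[i] == INT64_MAX || total > INT64_MAX - scores[i])
      return INT64_MAX;
    total += scores[i];
  }
  return total;
}
```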
      
      Change-Id: Idb52c543cc1e10abdf6e6a5d6e9cb535a42214dc
  7. 19 Jul, 2013 - 5 commits
  8. 18 Jul, 2013 - 2 commits
    • Merge scale_factors and scale_factors_uv. · 5ebe503f
      Ronald S. Bultje authored
      This prevents a duplicate memcpy of a 128-byte struct every time
      set_scale_factors() is called (which is a lot), thus leading to a
      decrease from 3.7 MB to 1.85 MB of struct copying per 64x64 block
      RD/partition loop.
      
      Overall, this decreases encoding time of the first 50 frames of bus
      @ 1500kbps (speed 0) from 1min5.9 to 1min4.9, i.e. about a 1.5%
      overall speedup. We can likely get more gains by removing the copy
      of the other struct (and replacing it with an indexing) as well.
      
      Change-Id: I3dceb7e79f71e6fe911b11cc994cf89a869dde7a
    • Remove motion vectors from PARTITION_INFO. · 2d4929e3
      Ronald S. Bultje authored
      The same information already exists in union b_mode_info.
      
      Change-Id: Iac5086b99a3c3cc270380138062bb693e58f9e6d
  9. 17 Jul, 2013 - 2 commits
    • Ronald S. Bultje's avatar
      Best_rd breakout in rd partition search. · 9f427bfe
      Ronald S. Bultje authored
      About 15% faster for bus (speed 0) first 50 frames @ 1500kbps, which
      goes from 1min36 to 1min24. Results become slightly better (+0.2% on
      derf/yt, +0.4% on hd), probably because of a bugfix for skipmode in
      super_block_yrd(). Overall speed change (on derfraw300) is roughly
      -13%. This can probably be improved further by caching best_yrd
      between partition searches. Also, we might be able to get more
      speedups by always doing PARTITION_NONE before PARTITION_SPLIT, not
      just at the sb8x8 level.
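      A minimal sketch of a best_rd breakout for a 4-way split, with a
      hypothetical per-sub-block cost callback standing in for the real
      per-block RD search. Once the accumulated cost reaches best_rd, the
      split can no longer beat the best candidate so far, so the remaining
      sub-blocks are not searched. costs_from_array() is a hypothetical
      example callback.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback: RD cost of sub-block i. It also receives the
 * remaining budget so it could stop early internally. */
typedef int64_t (*SubblockRdFn)(int i, int64_t budget, void *ctx);

/* Returns the split cost, or INT64_MAX if the split was abandoned.
 * *evaluated reports how many sub-blocks were actually searched. */
static int64_t split_rd_with_breakout(SubblockRdFn rd_fn, void *ctx,
                                      int64_t best_rd, int *evaluated) {
  int64_t sum = 0;
  int i;
  *evaluated = 0;
  for (i = 0; i < 4; ++i) {
    const int64_t c = rd_fn(i, best_rd - sum, ctx);
    ++*evaluated;
    /* Compare against the remaining budget to avoid overflow in sum + c. */
    if (c == INT64_MAX || c >= best_rd - sum) return INT64_MAX;
    sum += c;
  }
  return sum;
}

/* Example callback: costs precomputed in an int64_t array. */
static int64_t costs_from_array(int i, int64_t budget, void *ctx) {
  (void)budget;
  return ((const int64_t *)ctx)[i];
}
```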
      
      Change-Id: I83736949ebd5b4a3b400ee688d7661913fefc98b
    • Speed up motion estimation using small partitions' result (experiment) · df90d58f
      Yunqing Wang authored
      Current partition checking starts from small sizes, and then goes up
      to large sizes. This experiment uses the small partitions' motion
      estimation result, which is already available, to speed up the
      large partition's motion estimation. We can decide to skip some
      partition checks if they are unlikely choices. We could use the
      motion vector (MV) result as the current partition's prediction MV,
      and limit the search range and reference frame.
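      A hypothetical sketch of reusing the four sub-block MVs for the
      larger block: the predictor is their component-wise mean, and the
      search range is reduced to the largest per-component distance from
      that predictor to any sub-block MV, plus an assumed margin, capped
      at the normal full range. The MV struct and
      predict_from_subblocks() are illustrative, not the experiment's
      actual code.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
  int row;
  int col;
} MV; /* hypothetical MV, arbitrary units */

static MV predict_from_subblocks(const MV sub[4], int margin, int full_range,
                                 int *range) {
  int sum_row = 0, sum_col = 0, max_dev = 0, i;
  MV pred;
  for (i = 0; i < 4; ++i) {
    sum_row += sub[i].row;
    sum_col += sub[i].col;
  }
  /* Component-wise mean (C integer division truncates toward zero). */
  pred.row = sum_row / 4;
  pred.col = sum_col / 4;
  /* Largest per-component distance from the predictor to a sub-block MV. */
  for (i = 0; i < 4; ++i) {
    const int dr = abs(sub[i].row - pred.row);
    const int dc = abs(sub[i].col - pred.col);
    if (dr > max_dev) max_dev = dr;
    if (dc > max_dev) max_dev = dc;
  }
  *range = max_dev + margin;
  if (*range > full_range) *range = full_range;
  return pred;
}
```

      When the sub-blocks agree, the range collapses to the margin, which
      is where most of the speed gain would come from.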
      
      Current result at speed 1:
      psnr loss: 1.19% for stdhd, 0.287% for derf.
      speed gain: 14% for sunflower(hd), 11% for akiyo.
      
      Further improvement will be done later.
      
      Change-Id: I5abfd070e9cace2e91e2a0247d1325df313887ab
  10. 16 Jul, 2013 - 3 commits
    • Changing signature of vp9_get_pred_probs_tx_size. · 5b65a71c
      Dmitry Kovalev authored
      Removing VP9_COMMON* argument and adding struct tx_probs* instead of
      MACROBLOCKD*.
      
      Change-Id: Idf61074631a90ec51eac22c8dcd977f44ac0757c
    • Cleaning up tile code. · 9482a0bf
      Dmitry Kovalev authored
      Removing tile_rows and tile_columns from VP9Common, removing redundant
      constants MIN_TILE_WIDTH and MAX_TILE_WIDTH, changing signature of
      vp9_get_tile_n_bits.
      
      Change-Id: I8ff3104a38179b2c6900df965c144c1d6f602267
    • Rewriting vp9_set_pred_flag_{seg_id, mbskip}. · 863138a2
      Dmitry Kovalev authored
      Making implementation of vp9_set_pred_flag_{seg_id, mbskip} consistent
      with vp9_get_segment_id without using confusing sub(a, b) macro. Passing
      mi_row and mi_col to functions explicitly instead of relying on
      mb_to_right_edge and mb_to_bottom_edge.
      
      Change-Id: I54c1087dd2ba9036f8ba7eb165b073e807d00435
  11. 15 Jul, 2013 - 1 commit
    • Skip duplicate block encoding in the rd loop · faff6ed0
      Jingning Han authored
      This speed feature allows the encoder to largely remove the spatial
      dependency between blocks inside a 64x64 superblock, thereby removing
      the need to repeatedly encode superblocks per partition type in the
      rate-distortion optimization loop.
      
      A major challenge lies in the intra modes tested in the rate-distortion
      optimization loop. The subsequent blocks do not have access to the
      reconstructed boundary pixels without the intermediate coding steps.
      This was resolved by using the original pixels for intra prediction
      in the rd loop, followed by appropriately designed distortion
      modeling based on the quantization parameters. Experiments also
      suggested that the performance impact is more discernible at lower
      bit-rate/psnr settings. Hence a quantizer-dependent threshold is
      applied to deactivate the skipping of block coding.
      
      For bus_cif at 2000 kbps,
      speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
               performance loss.
      
      speed 1: runtime 65312ms  -> 61536ms, (7...
  12. 14 Jul, 2013 - 1 commit
  13. 13 Jul, 2013 - 1 commit
  14. 12 Jul, 2013 - 2 commits
  15. 11 Jul, 2013 - 1 commit
    • Moving segmentation related vars into separate struct. · c4ad3273
      Dmitry Kovalev authored
      Adding segmentation struct to vp9_seg_common.h. Struct members are from
      macroblockd and VP9Common structs. Moving segmentation related constants
      and enums to vp9_seg_common.h.
      
      Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03
  16. 10 Jul, 2013 - 4 commits
  17. 08 Jul, 2013 - 4 commits
    • Don't call encode_sb() for the final of 4-split subpartitions. · a5062cc6
      Ronald S. Bultje authored
      The resulting reconstruction is never used, thus it just wastes CPU
      cycles. Reduces encode time of first 50 frames of bus (speed 0) @
      1500kbps from 2min2.0 to 2min1.2, i.e. a 0.65% overall speedup.
      
      Change-Id: I74755ca3aadc21e2be220f486259060bd4088c45
    • Make frame-wide filter-type decision fully RD-based. · ed995afb
      Ronald S. Bultje authored
      Overall, on all test sets, this gains about +0.2% on all metrics.
      City is a clip where this really hurts (-1.0% on all metrics); I'm
      not quite sure why yet. Maybe interesting to look into in the future.
      
      Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78
    • Using mi_cols instead of mb_cols. · b7559258
      Dmitry Kovalev authored
      Eliminating usage of mb-units, switching to mi-units. Adding
      ALIGN_POWER_OF_TWO macro.
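      A sketch of an alignment macro along these lines (the exact libvpx
      macro text may differ) and of counting columns in mi units versus
      mb units. The constants assume one mode-info unit covers 8x8 pixels
      and one legacy macroblock covers 16x16; the helper names are
      hypothetical.

```c
#include <assert.h>

/* Round value up to the next multiple of 2^n (n >= 0). */
#define ALIGN_POWER_OF_TWO(value, n) \
  (((value) + ((1 << (n)) - 1)) & ~((1 << (n)) - 1))

#define MI_SIZE_LOG2 3 /* assumed: 8x8-pixel mode-info unit */
#define MB_SIZE_LOG2 4 /* 16x16-pixel macroblock */

static int mi_cols_for_width(int width) {
  return ALIGN_POWER_OF_TWO(width, MI_SIZE_LOG2) >> MI_SIZE_LOG2;
}

static int mb_cols_for_width(int width) {
  return ALIGN_POWER_OF_TWO(width, MB_SIZE_LOG2) >> MB_SIZE_LOG2;
}
```

      For a 1366-pixel-wide frame this gives 171 mi columns but 86 mb
      columns, so mi_cols is not always 2 * mb_cols; deriving positions
      from mb-units can therefore be off by one mi column at the frame
      edge.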
      
      Change-Id: I2491c969f713207c062011878b57e4e531818607
    • Implements several heuristics to prune mode search · d9b62160
      Deb Mukherjee authored
      Skips mode searches for intra and compound inter modes depending
      on the best mode so far and the reference frames. The various
      heuristics to be used are selected by bits from a flag. The
      previous direction based intra mode search pruning is also absorbed
      in this framework.
      
      Specifically the flags and their impact are:
      
      1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique
      directional modes and TM_PRED if the best so far is
      an inter mode)
      derfraw300: -0.15%, 10% speedup
      
      2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153
      mode search if the best so far is not one of the closest
      hor/vert/diagonal directions)
      derfraw300: -0.05%, about 9% speedup
      
      3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode
      search if the best so far is an intra mode)
      derfraw300: -0.06%, about 7-8% speedup
      
      4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search
      if the best single ref inter mode does not have the same ref
      as one of the two references being tested in the compound mode)
      derfraw300: -0.56%, about 10% speedup
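      A sketch of how such bit flags might gate the searches. The flag
      names come from the message above, but their bit values, the
      IntraMode order, the BestMode struct, and the "closest direction"
      pairs used for FLAG_SKIP_INTRA_DIRMISMATCH are assumptions for
      illustration, not the libvpx code.

```c
#include <assert.h>

/* Flag names from the commit message; bit values assumed. */
enum {
  FLAG_SKIP_INTRA_BESTINTER = 1 << 0,
  FLAG_SKIP_INTRA_DIRMISMATCH = 1 << 1,
  FLAG_SKIP_COMP_BESTINTRA = 1 << 2,
  FLAG_SKIP_COMP_REFMISMATCH = 1 << 3
};

/* Hypothetical intra mode list. */
typedef enum {
  DC_PRED, V_PRED, H_PRED, D45_PRED, D135_PRED,
  D117_PRED, D153_PRED, D27_PRED, D63_PRED, TM_PRED
} IntraMode;

/* Hypothetical summary of the search so far. */
typedef struct {
  int best_is_intra;         /* overall best mode is intra */
  IntraMode best_intra_mode; /* best intra mode tried so far */
  int best_single_ref;       /* ref frame of best single-ref inter mode */
} BestMode;

/* Assumed neighboring directions for the in-between oblique modes. */
static int near_dir(IntraMode m, IntraMode best) {
  switch (m) {
    case D27_PRED: return best == H_PRED || best == D45_PRED;
    case D63_PRED: return best == V_PRED || best == D45_PRED;
    case D117_PRED: return best == V_PRED || best == D135_PRED;
    case D153_PRED: return best == H_PRED || best == D135_PRED;
    default: return 1; /* other modes are not pruned by this rule */
  }
}

static int skip_intra_mode(unsigned flags, const BestMode *b, IntraMode m) {
  /* Best so far is inter: test only DC, V and H. */
  if ((flags & FLAG_SKIP_INTRA_BESTINTER) && !b->best_is_intra &&
      m != DC_PRED && m != V_PRED && m != H_PRED)
    return 1;
  if ((flags & FLAG_SKIP_INTRA_DIRMISMATCH) && !near_dir(m, b->best_intra_mode))
    return 1;
  return 0;
}

static int skip_compound(unsigned flags, const BestMode *b, int ref0, int ref1) {
  if ((flags & FLAG_SKIP_COMP_BESTINTRA) && b->best_is_intra) return 1;
  if ((flags & FLAG_SKIP_COMP_REFMISMATCH) && b->best_single_ref != ref0 &&
      b->best_single_ref != ref1)
    return 1;
  return 0;
}
```

      Keeping each heuristic behind its own bit lets speed settings enable
      any combination and lets each one be measured in isolation, as in
      the per-flag results above.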
      
      Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495
  18. 04 Jul, 2013 - 1 commit
  19. 03 Jul, 2013 - 1 commit