1. 27 Jul, 2013 - 2 commits
    • Inverse dimension order in token_cost array. · 118ccdcd
      Ronald S. Bultje authored
      This allows us to increment the position at the band-level only as
      we go from one band to the next; more importantly, that allows us to
      use an add instead of multiply instruction, and omit the instruction
      altogether if the band doesn't change from one coef to the next, thus
      being slightly faster (probably more noticeable on systems where a
      multiply is expensive, like arm).
      
      Change-Id: I4343fe35b9f9a47fa00b217bdcbf5f91ff96c381
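The effect of making the band the outermost dimension can be sketched as follows. This is a minimal illustration, not the code from the commit: the array sizes, the flat layout, and the name `sum_costs` are assumptions, and it relies on the band index never decreasing along a scan.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table dimensions, chosen only for illustration. */
enum { BANDS = 6, CTXS = 6, TOKENS = 11 };

/* Sum token costs from a flat table laid out as [band][ctx][token].
 * The band base pointer advances by a fixed stride (an add) only when
 * the band changes; coefficients that stay in the same band reuse it. */
static unsigned sum_costs(const unsigned *cost, const int *band,
                          const int *ctx, const int *tok, size_t n) {
  const size_t band_stride = (size_t)CTXS * TOKENS;
  const unsigned *band_cost = cost; /* points at band 0 */
  int cur_band = 0;
  unsigned total = 0;
  size_t i;
  for (i = 0; i < n; ++i) {
    /* Assumption: bands are visited in non-decreasing order. */
    assert(band[i] >= cur_band && band[i] < BANDS);
    while (cur_band < band[i]) {
      band_cost += band_stride; /* add, not multiply */
      ++cur_band;
    }
    total += band_cost[ctx[i] * TOKENS + tok[i]];
  }
  return total;
}
```

With the band as an inner dimension, the offset would have to be recomputed from the band index for every coefficient.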
    • Shortcut 8x8/16x16 inverse 2D-DCT · 38fa4871
      Jingning Han authored
      This commit brought back the shortcut implementation of the 8x8/16x16
      inverse 2D-DCT. When eob <= 10, it skips the inverse transform
      operations on rows 4:7 (8x8) or 4:15 (16x16) in the first pass. For
      bus_cif at 1000 kbps, this provides about 2% speed-up at speed 0.
      
      Change-Id: I453e2d72956467d75be4ad8c04b4482ab889d572
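The row-skipping idea can be sketched for the 8x8 case as below. This is an illustration under stated assumptions: `transform_1d` is a stand-in linear transform (a running sum), not the real inverse DCT, and the sketch assumes the scan order places the first 10 coefficients within the top four rows, as the commit message implies.

```c
#include <assert.h>
#include <string.h>

enum { N = 8 };

/* Stand-in 1-D linear transform (running sum), NOT the real inverse DCT.
 * Any linear transform maps an all-zero row to an all-zero row, which is
 * the only property the shortcut relies on. */
static void transform_1d(const int *in, int *out) {
  int acc = 0, i;
  for (i = 0; i < N; ++i) {
    acc += in[i];
    out[i] = acc;
  }
}

/* Separable 2-D transform: rows first, then columns. When eob <= 10 the
 * row pass covers rows 0..3 only; rows 4..7 are known to be zero. */
static void inverse_2d(const int *coef, int *dst, int eob) {
  int tmp[N * N];
  int rows, r, c;
  assert(eob >= 0 && eob <= N * N);
  rows = (eob <= 10) ? 4 : N;
  memset(tmp, 0, sizeof(tmp)); /* skipped rows stay zero */
  for (r = 0; r < rows; ++r) transform_1d(coef + r * N, tmp + r * N);
  for (c = 0; c < N; ++c) {
    int col_in[N], col_out[N];
    for (r = 0; r < N; ++r) col_in[r] = tmp[r * N + c];
    transform_1d(col_in, col_out);
    for (r = 0; r < N; ++r) dst[r * N + c] = col_out[r];
  }
}
```

The shortcut path yields the same output as the full path whenever rows 4..7 of the input really are zero.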
  2. 26 Jul, 2013 - 3 commits
    • Special handle on DC only inverse 8x8 2D-DCT · 325e0aa6
      Jingning Han authored
      This commit adds special handling for the 8x8 inverse 2D-DCT when
      only the DC coefficient is quantized to a non-zero value. For bus_cif
      at 2000 kbps, it provides about 1% speed-up at speed 0.
      
      Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
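When only the DC coefficient is non-zero, every output sample of the inverse transform is the same value, so the full transform collapses to one scalar computation plus a fill. A minimal sketch, assuming a fixed-point convention in which 11585 approximates 2^14 * cos(pi/4) and the final rounding shift is 5 bits; these constants and the function names are assumptions for illustration, not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Round-to-nearest right shift. Relies on arithmetic right shift for
 * negative values, which common compilers provide. */
#define ROUND_SHIFT(x, n) (((x) + (1 << ((n) - 1))) >> (n))

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* DC-only 8x8 inverse transform, added to the prediction in dst. */
static void idct8x8_dc_only_add(int16_t dc, uint8_t *dst, int stride) {
  int r, c;
  int v = ROUND_SHIFT(dc * 11585, 14); /* row pass, DC term only */
  v = ROUND_SHIFT(v * 11585, 14);      /* column pass, DC term only */
  v = ROUND_SHIFT(v, 5);               /* assumed final scaling */
  assert(stride >= 8);
  for (r = 0; r < 8; ++r)
    for (c = 0; c < 8; ++c)
      dst[r * stride + c] = clip_pixel(dst[r * stride + c] + v);
}
```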
    • Auto min and max partition size experiment. · fe5e2a91
      Paul Wilkins authored
      Speed feature experiment to set an upper and lower
      partition size limit based on what has been seen
      in spatial neighbors.
      
      This seems to give quite reasonable speed gains in local tests
      (10-15%), and when used with speed 0 the losses are small
      (0.25% derf, 0.35% stdhd). However, for now I am only
      enabling it on speed 1 as there may be clashes with the existing
      temporal partition selection in speed 2.
      
      Using a tighter min / max around the range derived from the
      neighbors increases speed further but at the cost of a
      bigger quality loss. However, I think this spatial method could
      be combined with data from either the last frame or a variance
      method (or both) to refine the range of minimum and maximum
      partition size. I.e. consider the min and max from spatial and
      temporal neighbors and the variance recommendation.
      
      Change-Id: I1b96bf8b84368d6aad0c7aa600fe141b4f07435f
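Deriving a partition size range from spatial neighbors might look like the sketch below. The simplified square-only `BlockSize` enum, the `slack` parameter (how many size steps to relax the range by), and the function name are all assumptions for illustration; the commit does not state how far the range is widened.

```c
#include <assert.h>

/* Simplified, square-only block sizes in increasing order (hypothetical). */
typedef enum {
  BLOCK_4X4, BLOCK_8X8, BLOCK_16X16, BLOCK_32X32, BLOCK_64X64
} BlockSize;

/* Set [*min_size, *max_size] from the sizes seen in already-coded
 * neighbors, widened by `slack` steps on each side. With no neighbors
 * available, the full range is allowed. */
static void partition_size_range(const BlockSize *neighbors, int count,
                                 int slack, BlockSize *min_size,
                                 BlockSize *max_size) {
  int lo = BLOCK_64X64, hi = BLOCK_4X4, i;
  assert(count >= 0 && slack >= 0);
  if (count == 0) {
    *min_size = BLOCK_4X4;
    *max_size = BLOCK_64X64;
    return;
  }
  for (i = 0; i < count; ++i) {
    if ((int)neighbors[i] < lo) lo = (int)neighbors[i];
    if ((int)neighbors[i] > hi) hi = (int)neighbors[i];
  }
  lo -= slack;
  hi += slack;
  if (lo < BLOCK_4X4) lo = BLOCK_4X4;
  if (hi > BLOCK_64X64) hi = BLOCK_64X64;
  *min_size = (BlockSize)lo;
  *max_size = (BlockSize)hi;
}
```

A smaller `slack` corresponds to the tighter range the message describes: fewer sizes to search, at the risk of excluding the best one.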
    • Modify static threshold calculation · 52256cdb
      Yunqing Wang authored
      Used 3 * standard_deviation in the internal threshold calculation
      instead of a fitted curve. This approximates the algorithm
      better.
      For comparison, similar tests were run, and the overall PSNR loss
      is smaller than before:
      1. derf set:
      when static-thresh = 1, psnr loss is 0.329%;
      when static-thresh = 500, psnr loss is 0.970%;
      2. stdhd set:
      when static-thresh = 1, psnr loss is 0.922%;
      when static-thresh = 500, psnr loss is 1.307%;
      
      Similar speedup is achieved. For example,
      clip            bitrate  static-thresh psnr    time
      akiyo(cif)       500        0          48.952  5.077s(50f)
      akiyo            500        500        48.866  4.169s(50f)
      
      parkjoy(1080p)   4000       0          30.388  78.20s(30f)
      parkjoy          4000       500        30.367  70.85s(30f)
      
      sunflower(1080p) 4000       0          44.402  74.55s(30f)
      sunflower        4000       500        44.414  68.69s(30f)
      
      Change-Id: Ic78833642ce1911dbbd1cb6c899a2d7e2dfcc1f3
  3. 25 Jul, 2013 - 7 commits
    • Add encoding option --static-thresh · d36852b7
      Yunqing Wang authored
      This option exists in VP8, and it was rewritten in VP9 to support
      skipping on different partition levels. After prediction is done,
      we can check if the residuals in the partition block will be all
      quantized to 0. If this is true, the skip flag is set, and only
      prediction data are needed in reconstruction. Based on DCT's energy
      conservation property, the skipping check can be estimated in
      spatial domain.
      
      The prediction error is calculated and compared to a threshold.
      The threshold is determined by the dequant values, and also
      adjusted by partition sizes. To be precise, the DC and AC parts
      for Y, U, and V planes are checked to decide skipping or not.
      
      Test showed that
      1. derf set:
      when static-thresh = 1, psnr loss is 0.666%;
      when static-thresh = 500, psnr loss is 1.162%;
      2. stdhd set:
      when static-thresh = 1, psnr loss is 1.249%;
      when static-thresh = 500, psnr loss is 1.668%;
      
      Across clips, the encoding speedup ranges from a few percent to
      over 20% when static-thresh <= 500. For example,
      clip            bitrate  static-thresh psnr    time
      akiyo(cif)       500        0          48.923  5.635s(50f)
      akiyo            500        500        48.863  4.402s(50f)
      
      parkjoy(1080p)   4000       0          30.380  77.54s(30f)
      parkjoy          4000       500        30.384  69.59s(30f)
      
      sunflower(1080p) 4000       0          44.461  85.2s(30f)
      sunflower        4000       500        44.418  78.1s(30f)
      
      Higher static-thresh values give larger speedup with larger
      quality loss.
      
      Change-Id: I857031ceb466ff314ab580ac5ec5d18542203c53
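The spatial-domain check can be illustrated with a conservative sufficient condition. For an orthonormal transform, the squared DC coefficient equals sum^2 / n and the total AC energy equals sse - sum^2 / n, so both can be bounded without transforming. The thresholds below (half a quantizer step, round-to-nearest quantization) and the function name are assumptions for illustration; they are not the thresholds the commit uses, and partition-size adjustment is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every coefficient of the residual is guaranteed to
 * quantize to zero, assuming an orthonormal transform and quantization
 * that rounds to nearest with step dc_q (DC) or ac_q (AC). */
static int residual_quantizes_to_zero(const int16_t *diff, size_t n,
                                      int dc_q, int ac_q) {
  int64_t sum = 0, sse = 0, count = (int64_t)n;
  size_t i;
  assert(n > 0 && dc_q > 0 && ac_q > 0);
  for (i = 0; i < n; ++i) {
    sum += diff[i];
    sse += (int64_t)diff[i] * diff[i];
  }
  /* DC: sum^2 / n < (dc_q / 2)^2, written without division. */
  if (4 * sum * sum >= count * dc_q * dc_q) return 0;
  /* AC: sse - sum^2 / n < (ac_q / 2)^2 bounds every AC coefficient. */
  if (4 * (count * sse - sum * sum) >= count * ac_q * ac_q) return 0;
  return 1;
}
```

In an encoder this would be evaluated per plane (Y, U, V), and the skip flag set only if all planes pass.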
    • General cleanups. · 7131cb0e
      Dmitry Kovalev authored
      Removing unused constants, macros, and function declarations. Using
      ROUND_POWER_OF_TWO macro, vp9_zero, vp9_copy where possible. Moving
      #include from *.h to *.c. Merging for loops for motion vectors.
      
      Change-Id: Ic3bf841764a2bb177128bb3a6d7aa8f68229cd13
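For readers unfamiliar with the helpers named above, the sketch below shows what macros of this kind typically look like. The definitions are written for illustration and may differ from those in the tree (for example, the tree may wrap its own memory helpers rather than calling `memset` and `memcpy` directly).

```c
#include <assert.h>
#include <string.h>

/* Divide by 2^n with rounding to nearest; n must be at least 1. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

/* Zero an object (array or struct) in place. */
#define vp9_zero(dest) memset(&(dest), 0, sizeof(dest))

/* Copy between two arrays of identical size; the size check catches
 * mismatched operands in debug builds. */
#define vp9_copy(dest, src)              \
  do {                                   \
    assert(sizeof(dest) == sizeof(src)); \
    memcpy(dest, src, sizeof(src));      \
  } while (0)
```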
    • Adding lookup table for size group. · 08fd41cc
      Dmitry Kovalev authored
      Change-Id: Ia6144d77ebed66e0739b62e4d673e26a95aa9550
    • Simplify handling of sub-partition motion vectors · be700e14
      Adrian Grange authored
      Simplified the code that extracts and uses the motion
      vectors for the 4 sub-partitions in rd_pick_partition.
      
      Change-Id: Iaf698ef7ee3aef9edd59015e1ae065dd359b17d9
    • Make coeff_optimize initialized per-plane · 2f58faff
      Jingning Han authored
      This commit makes the initialization of trellis coeff optimization
      a per-plane operation, thereby eliminating the redundant steps in
      encode_sby and encode_sbuv. It makes the encoder at speed 0 slightly
      faster.
      
      Change-Id: Iffe9faca6a109dafc0dd69dc7273cbdec19b17cd
    • Removing vp9_adapt_mode_context function. · 47d61f00
      Dmitry Kovalev authored
      Moving code from vp9_adapt_mode_context to vp9_adapt_mode_probs.
      
      Change-Id: I60829c30b28968cd813551ef3a206dfb98d323c9
    • fix a bug where flags are not reset · 3e386aef
      Yaowu Xu authored
      The feature that uses small partition results as a measure to skip
      mode evaluation at larger partition requires the flags to be reset.
      The reset was missing in the code path that calls rd_use_partition().
      
      Change-Id: Ia0a3a0aee1a862b6e2333d596808db7c48033d50
  4. 24 Jul, 2013 - 7 commits
  5. 23 Jul, 2013 - 10 commits
    • Unify the use of encode_b_args/optimize_block_args · ab77828b
      Jingning Han authored
      The struct optimize_block_args is defined identically to encode_b_args.
      Remove this redundant definition, and use encode_b_args consistently.
      
      Change-Id: I1703aeeb3bacf92e98a34f4355202712110173d9
    • Removing LOW_PRECISION_MV_UPDATE define. · 8d13b0d1
      Dmitry Kovalev authored
      Change-Id: I78d16ee758e1fae0200b746f00031f6d9c6d6ce7
    • Rolled-up several for loops into one · 646edbc1
      Adrian Grange authored
      Several consecutive for loops executed over the same
      index range, so I rolled them into one.
      
      Change-Id: I5cfcc8c38c738478965768409cca9d09adf224e1
    • Removing vp9_is_interpolating_filter array. · db7f5d28
      Dmitry Kovalev authored
      All filters are interpolating now, so this array is no longer needed:
      every value in it evaluates to true.
      
      Change-Id: I9af6d8219ae0eb984063cd15e4e2296374ae4961
    • Make xform_quant operations tx_type independent · e9e2fe8e
      Jingning Han authored
      The xform_quant() module is only used by inter modes, hence removing
      the redundant switches therein conditioned on tx_type.
      
      Change-Id: Ib87ce5b2f2e4cbf3ceb133a1108afa173c933a3f
    • Skip inverse transform when eob is zero · 0359ad7f
      Jingning Han authored
      When all the transform coefficients were quantized to zero, skip
      the inverse transform operation. For bus_cif at 1000 kbps, the
      runtime goes from 154967ms -> 149842ms, i.e., about 3% speed-up,
      at speed 0.
      
      Change-Id: Ic0a813fff5e28972d4888ee42d8747846a6c3cc6
    • pack_inter_mode_mvs cleanup · 7bc294a3
      Scott LaVarnway authored
      xd->mode_info_context is set to m prior to this call.
      
      Change-Id: Ibc442529961750c29ccf0c6cae08cb2b0431415f
    • clean up bw, bh · 86a9dec7
      Jim Bankoski authored
      Many structures use bw and bh, and they have different meanings. This CL
      starts to clean this up and removes unnecessary two-step operations that
      look up a log value and then shift.

      Also removed partition type multiple operation code in bitstream.c.
      
      Change-Id: I7e03e552bdfc0939738e430862e3073d30fdd5db
    • Renaming of segment constants. · 32042af1
      Paul Wilkins authored
      Renamed:
        MAX_MB_SEGMENTS to MAX_SEGMENTS
        MB_SEG_TREE_PROBS to SEG_TREE_PROBS
      
      The minimum unit for segmentation in the segment map
      is now 8x8 so it is misleading to use MB_ as macro-block
      traditionally refers to a 16x16 region.
      
      Change-Id: I0b55a6f0426bb46dd13435fcfa5bae0a30a7fa22
    • vp9: make some static tables const · 3c8cce35
      James Zern authored
      Change-Id: I8bcae51271673da8755c66a51aea005dfe6a3739
  6. 22 Jul, 2013 - 8 commits
    • More optimizations for cost_coeffs(). · e20fcd95
      Ronald S. Bultje authored
      4x4:    163 ->  123 cycles (33% faster)
      8x8:    491 ->  399 cycles (23% faster)
      16x16: 1889 -> 1763 cycles (7% faster)
      32x32: 8311 -> 8180 cycles (1.6% faster)
      
      Overall encoding time of first 50 frames of bus (speed 0) @ 1500kbps
      goes from 1min4.33 to 1min3.00, i.e. 2.11% faster.
      
      Change-Id: Ib52d1dbb5649b14de769d3e7a74af67440b5284f
    • Adding update_tx_counts function. · b2fc6fa9
      Dmitry Kovalev authored
      Moving common encoder/decoder code to update_tx_counts. Also renaming
      vp9_get_pred_probs_tx_size to get_tx_probs2 and adding get_tx_probs to
      call vp9_get_pred_context_tx_size inside read_selected_tx_size only once
      (twice before).
      
      Change-Id: Ia50247f3893de88ef8e9041b0d44be44a40aaa4d
    • fix a build error · fc186dca
      Yaowu Xu authored
      Change-Id: I3b05687f439ff6a7c426d2c97a6c58c831fa51ac
    • Diamond search change to accelerate movement · a1e2d50b
      Deb Mukherjee authored
      Optional change in diamond search to continue in the best move
      direction until that move turns worse.
      
      This is still WIP since the exact way the new method is to be used is
      under investigation. One option is to make it an option in diamond
      search and use it only when motion is large.
      
      Overall slightly positive on derfraw300 (+0.02%) and stdhdraw (+0.13%),
      but works a lot better for high motion sequences (e.g. football: +1%).
      
      Change-Id: If88e01a6021daa0cda934680cdc70be1ee04f798
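The idea of continuing in the best move direction can be sketched as one search step on an integer grid. This is an illustration only: the four-point pattern, the `CostFn` callback, and `toy_cost` are hypothetical, and the real search uses its own site patterns and error metric.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int row, col; } MV;
typedef unsigned (*CostFn)(MV mv);

/* One diamond step: test the four neighbors at distance `step`, then keep
 * moving in the winning direction until the cost stops improving. */
static MV search_step(MV center, int step, CostFn cost) {
  static const MV dirs[4] = { { -1, 0 }, { 1, 0 }, { 0, -1 }, { 0, 1 } };
  MV best_mv = center;
  unsigned best = cost(center);
  int best_dir = -1, d;
  assert(step > 0);
  for (d = 0; d < 4; ++d) {
    MV cand = { center.row + step * dirs[d].row,
                center.col + step * dirs[d].col };
    unsigned c = cost(cand);
    if (c < best) { best = c; best_mv = cand; best_dir = d; }
  }
  /* Accelerated movement: repeat the best move while it keeps helping.
   * The loop ends because the cost strictly decreases each iteration. */
  while (best_dir >= 0) {
    MV cand = { best_mv.row + step * dirs[best_dir].row,
                best_mv.col + step * dirs[best_dir].col };
    unsigned c = cost(cand);
    if (c >= best) break;
    best = c;
    best_mv = cand;
  }
  return best_mv;
}

/* Toy cost for demonstration: L1 distance to the point (0, 7). */
static unsigned toy_cost(MV mv) {
  return (unsigned)(abs(mv.row) + abs(mv.col - 7));
}
```

Starting from (0, 0) with step 1 and `toy_cost`, a single call walks all the way to (0, 7), whereas a plain diamond step would stop at (0, 1). This matches the observation that the change helps most when motion is large.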
    • Optimize operation flow in sub8x8 rd loop · 409e77f2
      Jingning Han authored
      Stack the rate-distortion statistics in the sub8x8 rd loop. This allows
      the encoder to skip the forward transform, quantization, and coeff cost
      estimation, in the sub8x8 rd optimization search, if the motion
      vector(s) are of integer pixel value, and have been tested in the
      previous prediction filter type rd loops of the same block.
      
      This gives about 2% speed-up for bus_cif at 2000 kbps, for speed 0.
      Its efficacy depends on how frequently the motion search will select an
      integer motion vector.
      
      Change-Id: Iee15d4283ad4adea05522c1d40b198b127e6dd97
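The reuse described above can be sketched as a one-entry cache keyed on the motion vector. For an integer-pel vector the interpolation filter does not change the prediction, so the statistics computed under one filter type remain valid for the next. The types, the 1/8-pel unit assumption, and `toy_rd` (a counting stand-in for transform, quantization, and cost estimation) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int row, col; } MV; /* assumed to be in 1/8-pel units */
typedef struct { int rate; int64_t dist; } RdStats;
typedef struct { int valid; MV mv; RdStats stats; } RdCache;
typedef RdStats (*RdFn)(MV mv, int filter);

static int is_integer_mv(MV mv) {
  return mv.row % 8 == 0 && mv.col % 8 == 0;
}

/* Return RD statistics for (mv, filter), reusing the cached result when
 * the vector is integer-pel and was already evaluated. */
static RdStats rd_with_cache(RdCache *cache, MV mv, int filter, RdFn compute) {
  RdStats s;
  assert(cache != 0 && compute != 0);
  if (is_integer_mv(mv) && cache->valid && cache->mv.row == mv.row &&
      cache->mv.col == mv.col)
    return cache->stats; /* skip transform, quantization, cost estimation */
  s = compute(mv, filter);
  if (is_integer_mv(mv)) {
    cache->valid = 1;
    cache->mv = mv;
    cache->stats = s;
  }
  return s;
}

/* Counting stand-in for the expensive path, for demonstration only. */
static int g_rd_calls;
static RdStats toy_rd(MV mv, int filter) {
  RdStats s;
  (void)filter;
  ++g_rd_calls;
  s.rate = 100;
  s.dist = (int64_t)mv.row * mv.row + (int64_t)mv.col * mv.col;
  return s;
}
```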
    • Re-order mode search in rd. · 1d189d64
      Paul Wilkins authored
      Mode search order in rd loop changed to better reflect
      observed hit counts.
      
      Also some adjustment of the baseline mode rd thresholds
      to reflect the order change and observed frequencies.
      
      Change-Id: I47a131cc83e11551df8add6d6d8d413d78d3a63c
    • fix left over overflow · 2ac8b50c
      Jim Bankoski authored
      This CL fixes issues rbultje brought up that I somehow neglected when I
      submitted yaowu's patch.
      
      Change-Id: I07ad18796317822510b96e951c88d29f194a3c2e
    • Fix build error. · 888375d2
      Paul Wilkins authored
      When CONFIG_POSTPROC is set there was a now
      invalid reference to cm->filter_level.
      
      Changed to cpi->mb.e_mbd.lf.filter_level in line with
      change Iaf5fb71c33719cdfa1b991f671caf071be9ea035
      
      Change-Id: If746e60044903f7ba8d0d346225b3d015226c7d0
  7. 21 Jul, 2013 - 1 commit
    • Skip buffer update in sub8x8 rd loop · c725502b
      Jingning Han authored
      This commit allows the encoder to skip a few buffer update steps in
      rd_pick_best_mbsegmentation, when early breakout has been triggered
      in the rd_check_segment_txsize. It provides about 1% speed-up for
      bus_cif at 2000 kbps, in the settings of speed 0.
      
      Change-Id: Ica034f10a24dec572b397d8389a2b81020ebc0b9
  8. 20 Jul, 2013 - 2 commits
    • added checks to prevent rate/distortion overflow · ea284d62
      Yaowu Xu authored
      At speed 2, due to the threshold scheme used, it is possible for the
      rate and distortion to be assigned the value INT_MAX. The patch adds
      checks to prevent INT_MAX from being used in further calculation of RD
      scores. The patch also changes the assertion in rd_use_partition() to
      mirror the similar assertion in rd_pick_partition().
      
      Change-Id: Idb52c543cc1e10abdf6e6a5d6e9cb535a42214dc
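A guard of the kind described might look like the sketch below. The helper names and the simplified cost formula (rdmult * rate + dist) are assumptions for illustration; the encoder's real RD cost uses its own scaling.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* INT_MAX marks "not evaluated" (for example, a mode pruned by a
 * threshold); such values must not enter RD arithmetic. */
static int rd_stats_valid(int rate, int64_t dist) {
  return rate != INT_MAX && dist != INT_MAX;
}

/* Simplified RD score that propagates "invalid" as INT64_MAX instead of
 * computing with the sentinel. */
static int64_t rd_cost_checked(int rdmult, int rate, int64_t dist) {
  assert(rdmult >= 0);
  if (!rd_stats_valid(rate, dist)) return INT64_MAX;
  return (int64_t)rdmult * rate + dist;
}
```

Callers can then compare scores as usual: an invalid candidate never wins a minimum-cost comparison and never overflows.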
    • Removing pre probabilities from FRAME_CONTEXT. · 7e703de7
      Dmitry Kovalev authored
      Using cm->frame_contexts[cm->frame_context_idx] as source of previous
      probabilities.
      
      Change-Id: Ie03778acf0e7bebdc3a1f6a51854d4a0712f24a1