1. 16 Jul, 2013 - 1 commit
    • Inline vp9_quantize() in xform_quant(). · 1ff94fea
      Ronald S. Bultje authored
      Cycle times:
      4x4:    151 to  131 cycles (15% faster)
      8x8:    334 to  306 cycles (9% faster)
      16x16: 1401 to 1368 cycles (2.5% faster)
      32x32: 7403 to 7367 cycles (0.5% faster)
      
      Total encode time of first 50 frames of bus @ 1500kbps (speed 0)
      goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup.
      
      Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f
  2. 11 Jul, 2013 - 1 commit
    • Moving segmentation related vars into separate struct. · c4ad3273
      Dmitry Kovalev authored
      Adding segmentation struct to vp9_seg_common.h. Struct members are from
      macroblockd and VP9Common structs. Moving segmentation related constants
      and enums to vp9_seg_common.h.
      
      Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03
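      A minimal sketch of the kind of grouping described above, collecting the
      segmentation fields that previously lived in macroblockd and VP9Common
      into one struct. The field names here are illustrative assumptions, not
      the exact members of the real struct in vp9_seg_common.h:

      /* Illustrative grouping only; field names are assumptions. */
      #include <stdint.h>

      #define MAX_SEGMENTS_SKETCH 8
      #define SEG_LVL_MAX_SKETCH 4

      struct segmentation_sketch {
        uint8_t enabled;          /* segmentation on/off for the frame */
        uint8_t update_map;       /* segment map is coded in this frame */
        uint8_t update_data;      /* segment feature data is coded */
        uint8_t abs_delta;        /* feature data is absolute vs. a delta */
        uint8_t temporal_update;  /* segment map predicted from the last frame */

        int16_t feature_data[MAX_SEGMENTS_SKETCH][SEG_LVL_MAX_SKETCH];
        unsigned int feature_mask[MAX_SEGMENTS_SKETCH];
      };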
  3. 01 Jul, 2013 - 2 commits
    • Update quantize SSSE3 SIMD to cover 32x32 transform case also. · c8defcfd
      Ronald S. Bultje authored
      Encode time of bus (speed 0) 50 frames @ 1500kbps goes from 2min14.4 to
      2min10.1, i.e. a 2.3% overall speed increase.
      
      Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87
    • Quantize (64-bit only, for now) SSSE3 SIMD. · 7353ceab
      Ronald S. Bultje authored
      Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps
      goes from 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is
      x86-64 only, it needs some minor modifications to be 32bit compatible,
      because it uses 15 xmm registers, whereas 32bit only has 8.
      
      Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904
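      A tiny illustrative fragment of the multiply-high idiom such SIMD
      quantizers are built on. It is deliberately plain SSE2 rather than the
      commit's SSSE3 code, handles only eight coefficients, and the
      (|coeff| + round) * quant >> 16 formula is a simplification of the real
      per-coefficient math:

      #include <emmintrin.h>  /* SSE2 */
      #include <stdint.h>

      /* Sketch: qcoeff = sign(coeff) * (((|coeff| + round) * quant) >> 16). */
      static void quantize8_sketch(const int16_t *coeff, const int16_t *round,
                                   const int16_t *quant, int16_t *qcoeff) {
        const __m128i c = _mm_loadu_si128((const __m128i *)coeff);
        const __m128i r = _mm_loadu_si128((const __m128i *)round);
        const __m128i q = _mm_loadu_si128((const __m128i *)quant);
        const __m128i zero = _mm_setzero_si128();
        const __m128i sign = _mm_cmpgt_epi16(zero, c);  /* all ones where coeff < 0 */
        const __m128i abs_c = _mm_sub_epi16(_mm_xor_si128(c, sign), sign);
        const __m128i tmp = _mm_mulhi_epi16(_mm_adds_epi16(abs_c, r), q);
        _mm_storeu_si128((__m128i *)qcoeff,
                         _mm_sub_epi16(_mm_xor_si128(tmp, sign), sign));
      }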
  4. 28 Jun, 2013 - 1 commit
    • Make coefficient skip condition an explicit RD choice. · af660715
      Ronald S. Bultje authored
      This commit replaces zrun_zbin_boost, a method of biasing non-zero
      coefficients following runs of zero-coefficients to be rounded towards
      zero, with an explicit skip-block choice in the RD loop.
      
      The logic is basically that if individual coefficients should be rounded
      towards zero (from an RD point of view), the trellis/optimize loop should
      take care of it. If whole blocks should be zero (from an RD point of
      view), a single RD check is much more efficient than a complete
      serialization of the quantization loop.
      
      Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim.
      SIMD for quantize will follow in a separate patch. Results for other
      test sets pending.
      
      Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4
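      A minimal sketch of the decision this commit introduces: compare the RD
      cost of coding the block's coefficients against the cost of signalling an
      all-zero (skip) block, and take the cheaper one. The RDCOST_SKETCH macro
      and the rate/distortion inputs below are placeholders, not the libvpx API:

      #include <stdint.h>

      /* Placeholder lambda-style cost: rate weighted by rdmult, distortion
       * scaled by a shift. */
      #define RDCOST_SKETCH(rdmult, dshift, rate, dist) \
        ((((int64_t)(rate) * (rdmult) + 128) >> 8) + ((int64_t)(dist) << (dshift)))

      static int should_skip_block(int rdmult, int dshift,
                                   int rate_coeffs, int64_t dist_coeffs,
                                   int rate_skip_flag, int64_t dist_all_zero) {
        const int64_t rd_coded =
            RDCOST_SKETCH(rdmult, dshift, rate_coeffs, dist_coeffs);
        const int64_t rd_skip =
            RDCOST_SKETCH(rdmult, dshift, rate_skip_flag, dist_all_zero);
        /* One block-level comparison replaces per-coefficient zbin boosting. */
        return rd_skip < rd_coded;
      }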
  5. 27 Jun, 2013 - 1 commit
  6. 19 Jun, 2013 - 1 commit
    • Add two-pass quantization · b5bf7b13
      Yunqing Wang authored
      Optimized the quantization function by making it a two-pass
      process. The first pass does a quick check of the transform
      coefficients against the base ZBIN and keeps only the good
      enough set of coefficients for quantization. A skip check is
      added: if all coefficients are within the base ZBIN, no
      quantization is needed. The second pass is the actual quantization
      pass, which only processes the coefficient subset determined
      in the first pass. This reduces the computation. Furthermore, an
      alternative method is used for large transform sizes, which often
      have sparse non-zero quantized coefficients.
      
      Overall, the encoder speedup is about 4%. The quantization function
      itself gets 20% faster.
      
      Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22
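      A simplified scalar sketch of the two-pass idea. The zbin/round/quant/
      dequant names mirror common libvpx parameter roles (two entries each:
      DC then AC), but this is not the actual vp9_quantize code, and the scan
      order is omitted for brevity:

      #include <stdint.h>
      #include <stdlib.h>
      #include <string.h>

      static void quantize_two_pass_sketch(const int16_t *coeff, int n_coeffs,
                                           const int16_t *zbin,
                                           const int16_t *round,
                                           const int16_t *quant,
                                           const int16_t *dequant,
                                           int16_t *qcoeff, int16_t *dqcoeff,
                                           int *eob) {
        int idx[32 * 32];  /* positions that survived the zbin check */
        int n_cand = 0, i;

        memset(qcoeff, 0, n_coeffs * sizeof(*qcoeff));
        memset(dqcoeff, 0, n_coeffs * sizeof(*dqcoeff));
        *eob = 0;

        /* Pass 1: cheap check against the base zbin. */
        for (i = 0; i < n_coeffs; ++i)
          if (abs(coeff[i]) >= zbin[i != 0]) idx[n_cand++] = i;
        if (n_cand == 0) return;  /* everything inside zbin: block is skipped */

        /* Pass 2: actual quantization, only over the surviving subset. */
        for (i = 0; i < n_cand; ++i) {
          const int rc = idx[i];
          const int sign = coeff[rc] < 0 ? -1 : 1;
          const int tmp =
              ((abs(coeff[rc]) + round[rc != 0]) * quant[rc != 0]) >> 16;
          qcoeff[rc] = (int16_t)(sign * tmp);
          dqcoeff[rc] = (int16_t)(qcoeff[rc] * dequant[rc != 0]);
          if (tmp) *eob = rc + 1;
        }
      }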
  7. 23 May, 2013 - 1 commit
  8. 17 May, 2013 - 1 commit
    • Initial version of alpha channel support · 679e4abd
      John Koleszar authored
      This is a mostly-working implementation of an extra channel in the
      bitstream. Configure with --enable-alpha to test. Notable TODOs:
      
       - Add extra channel to all mismatch tests, PSNR, SSIM, etc
       - Configurable subsampling
       - Variable number of planes (currently always uses all 4)
       - Loop filtering
       - Per-plane lossless quantizer
       - ARNR support
      
      This implementation just uses the same contents as the Y channel
      for the A channel, due to a lack of content and the general pain of
      playing back 4-channel content. A later patch will use the actual
      alpha channel passed in from outside the codec.
      
      Change-Id: Ibf81f023b1c570bd84b3064e9b4b8ae52e087592
  9. 07 May, 2013 - 3 commits
  10. 03 May, 2013 - 1 commit
    • Separate transform and quant from vp9_encode_sb · 4529c68b
      John Koleszar authored
      This allows removing a large number of transform size specific functions,
      as well as supporting 444/alpha by routing all code through the
      subsampling-aware path.
      
      Change-Id: Ieb085cebe9f37f24fc24de179898b22abfda08a4
  11. 02 May, 2013 - 1 commit
    • Create common vp9_encode_sb{,y} · 3f4e8063
      John Koleszar authored
      Creates a common encode (subtract, transform, quantize, optimize,
      inverse transform, reconstruct) function for all sb sizes, including
      the old 16x16 path.
      
      Change-Id: I964dff1ea7a0a5c378046a069ad83495f54df007
  12. 01 May, 2013 - 1 commit
  13. 30 Apr, 2013 - 2 commits
    • sb8x8 integration in rd loop. · d068d869
      Ronald S. Bultje authored
      Work-in-progress, not yet ready for review. TODO items:
      - bitstream writing (encoder) and reading (decoder)
      - decoder reconstruction
      
      Change-Id: I5afb7284e7e0480847b47cd0097cb469433c9081
    • Adding vp9_get_qindex function. · 3f6c6ffc
      Dmitry Kovalev authored
      Moving common code from encoder and decoder to vp9_get_qindex function.
      Also moving quant-related constants from vp9_onyxc_int.h to
      vp9_quant_common.h.
      
      Change-Id: I70c5bfbaa1c8bf00fde0bfc459d077f88b6d46c8
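      A self-contained sketch of what such a shared helper can look like: if
      segmentation is enabled and the segment has an alternate-Q feature
      active, use (or add) the segment's Q value, otherwise fall back to
      base_qindex. The struct and names here are illustrative, not the real
      ones:

      #define MAXQ_SKETCH 255

      struct seg_sketch {
        int enabled;
        int abs_delta;        /* nonzero: data is absolute, not a delta */
        int alt_q_active[8];  /* per-segment ALT_Q feature flag */
        int alt_q_data[8];    /* per-segment ALT_Q value or delta */
      };

      static int clamp_q(int q) {
        return q < 0 ? 0 : (q > MAXQ_SKETCH ? MAXQ_SKETCH : q);
      }

      static int get_qindex_sketch(const struct seg_sketch *seg, int segment_id,
                                   int base_qindex) {
        if (seg->enabled && seg->alt_q_active[segment_id]) {
          const int data = seg->alt_q_data[segment_id];
          return clamp_q(seg->abs_delta ? data : base_qindex + data);
        }
        return base_qindex;
      }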
  14. 26 Apr, 2013 - 1 commit
  15. 25 Apr, 2013 - 3 commits
  16. 24 Apr, 2013 - 2 commits
  17. 23 Apr, 2013 - 1 commit
    • Convert coeff to per-plane MACROBLOCK data · 138ec38c
      John Koleszar authored
      This commit moves the coeff storage from the MACROBLOCK struct to its
      per-plane part. The next commit will remove the coeff member from the
      BLOCK structure so that it is consistently accessed per-plane.
      
      Also refactors vp9_sb_block_error_c and vp9_sb_uv_block_error_c to be
      variable subsampling aware.
      
      Change-Id: I18c30f87f27c3a012119b6c1970d5fa499804455
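      An illustrative per-plane layout in the spirit of this migration (field
      names are assumptions, not the actual MACROBLOCK members): each plane
      carries its own coefficient buffers instead of one block-wide array.

      #include <stdint.h>

      #define MAX_PLANES_SKETCH 4  /* Y, U, V and an optional alpha plane */

      struct mb_plane_sketch {
        int16_t *coeff;    /* transform coefficients for this plane */
        int16_t *qcoeff;   /* quantized coefficients */
        uint16_t *eobs;    /* end-of-block markers, one per transform block */
      };

      struct mb_sketch {
        struct mb_plane_sketch plane[MAX_PLANES_SKETCH];
      };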
  18. 22 Apr, 2013 - 2 commits
  19. 16 Apr, 2013 - 1 commit
  20. 15 Apr, 2013 - 1 commit
  21. 11 Apr, 2013 - 1 commit
    • Remove unused macroblock versions of reconstruction functions. · 13e41ba4
      Ronald S. Bultje authored
      More specifically, remove vp9_quantize_mb*, vp9_optimize_mb*,
      vp9_inverse_transform_mb* and vp9_transform_mb*. Instead, use the
      generic _sb* functions that take a size argument, and call them with
      BLOCK_SIZE_MB16X16.
      
      Change-Id: I33024afea95d3a23ffbc1df7da426e4645110f29
  22. 10 Apr, 2013 - 1 commit
    • Make SB coding size-independent. · a3874850
      Ronald S. Bultje authored
      Merge sb32x32 and sb64x64 functions; allow for rectangular sizes. Code
      gives identical encoder results before and after. There are a few
      macros for rectangular block sizes under the sbsegment experiment; this
      experiment is not yet functional and should not yet be used.
      
      Change-Id: I71f93b5d2a1596e99a6f01f29c3f0a456694d728
  23. 09 Apr, 2013 - 1 commit
    • Fixing upper case names. · c34f6fcb
      Dmitry Kovalev authored
      Renaming Y1dequant to y_dequant, UVdequant to uv_dequant, QIndex to qindex.
      
      Change-Id: I1c356e5f886deb3f8807dc212de9799b55b09d58
  24. 05 Apr, 2013 - 1 commit
    • Move EOB to per-plane data · 05a79f2f
      John Koleszar authored
      Continue migrating data from BLOCKD/MACROBLOCKD to the per-plane
      structures.
      
      Change-Id: Ibbfa68d6da438d32dcbe8df68245ee28b0a2fa2c
  25. 04 Apr, 2013 - 1 commit
  26. 26 Mar, 2013 - 2 commits
    • Add col/row-based coefficient scanning patterns for 1D 8x8/16x16 ADSTs. · d9094d8f
      Ronald S. Bultje authored
      These are mostly just for experimental purposes. I saw small gains (in
      the 0.1% range) when playing with this on derf.
      
      Change-Id: Ib21eed477bbb46bddcd73b21c5c708a5b46abedc
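      A hypothetical sketch of how row- and column-major scans for an n x n
      block can be generated. The scans added by the commit are hand-written
      constant tables, and which scan pairs with which 1D transform
      orientation is left to the real code:

      static void build_row_scan_sketch(int *scan, int n) {
        int r, c, k = 0;
        for (r = 0; r < n; ++r)  /* sweep row by row */
          for (c = 0; c < n; ++c)
            scan[k++] = r * n + c;
      }

      static void build_col_scan_sketch(int *scan, int n) {
        int r, c, k = 0;
        for (c = 0; c < n; ++c)  /* sweep column by column */
          for (r = 0; r < n; ++r)
            scan[k++] = r * n + c;
      }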
    • Modeling default coef probs with distribution · fd18d5df
      Deb Mukherjee authored
      Replaces the default tables for single coefficient magnitudes with
      those obtained from an appropriate distribution. The EOB node
      is left unchanged. The model is represented as a 256-entry codebook
      where the index corresponds to the probability of the Zero or the
      One node. Two variations are implemented, corresponding to whether
      the Zero node or the One node is used as the peg. The main advantage
      is that the default prob tables become considerably smaller and more
      manageable. Besides, there is substantially less risk of over-fitting
      to a training set.
      
      Various distributions are tried and the one that gives the best
      results is the family of Generalized Gaussian distributions with
      shape parameter 0.75. The results are within about 0.2% of fully
      trained tables for the Zero peg variant, and within 0.1% for the
      One peg variant.
      
      The forward updates are optionally (controlled by a macro)
      model-based, i.e. restricted to only convey probabilities from the
      codebook. Backward updates can also optionally (controlled by
      another macro) be model-based, but this is turned off by default.
      Currently, model-based forward updates work about the same as
      unconstrained updates, but there is a drop in performance when
      backward updates are model-based.
      
      The model based approach also allows the probabilities for the key
      frames to be adjusted from the defaults based on the base_qindex of
      the frame. Currently the adjustment function is a placeholder that
      adjusts the prob of EOB and Zero node from the nominal one at higher
      quality (lower qindex) or lower quality (higher qindex) ends of the
      range. The rest of the probabilities are then derived based on the
      model from the adjusted prob of zero.
      
      Change-Id: Iae050f3cbcc6d8b3f204e8dc395ae47b3b2192c9
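      A numerical sketch of the modeling idea: assume coefficient magnitudes
      follow a generalized Gaussian with shape 0.75 and derive the Zero/One
      node probabilities from its mass in the corresponding magnitude bins.
      The scale value and bin edges below are arbitrary illustrations, not the
      codebook shipped in this commit:

      #include <math.h>
      #include <stdio.h>

      /* Unnormalized generalized Gaussian density, shape beta, scale s. */
      static double gg_density(double x, double beta, double s) {
        return exp(-pow(fabs(x) / s, beta));
      }

      /* Mass of the magnitude bin [lo, hi) via midpoint integration. */
      static double bin_mass(double lo, double hi, double beta, double s) {
        const int steps = 1000;
        const double dx = (hi - lo) / steps;
        double sum = 0.0;
        int i;
        for (i = 0; i < steps; ++i)
          sum += gg_density(lo + (i + 0.5) * dx, beta, s) * dx;
        return sum;
      }

      int main(void) {
        const double beta = 0.75;  /* shape parameter quoted above */
        const double s = 4.0;      /* arbitrary illustrative scale */
        const double total = bin_mass(0.0, 256.0, beta, s);
        const double p_zero = bin_mass(0.0, 0.5, beta, s) / total;
        const double p_one = bin_mass(0.5, 1.5, beta, s) / total;
        printf("P(zero) = %f, P(one | nonzero) = %f\n",
               p_zero, p_one / (1.0 - p_zero));
        return 0;
      }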
  27. 07 Mar, 2013 - 3 commits
    • Consistent usage of ROUND_POWER_OF_TWO macro. · 3603dfb6
      Dmitry Kovalev authored
      Change-Id: I44660975e9985310d8c654c158ee7a61291b5a08
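      For reference, the macro implements a rounding (rather than truncating)
      right shift; the definition below is the commonly used form and is shown
      here only as an illustration:

      /* Example: ROUND_POWER_OF_TWO(14, 2) == 4, whereas 14 >> 2 == 3. */
      #define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))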
    • Re-add support for ADST in superblocks. · d3724abe
      Ronald S. Bultje authored
      This also changes the RD search to take account of the correct block
      index when searching (this is required for ADST positioning to work
      correctly in combination with tx_select).
      
      Change-Id: Ie50d05b3a024a64ecd0b376887aa38ac5f7b6af6
    • Coding non-zero count rather than EOB for coeffs · eb6ef241
      Deb Mukherjee authored
      This patch revamps the entropy coding of coefficients to code first
      a non-zero count per coded block and correspondingly remove the EOB
      token from the token set.
      
      STATUS:
      Main encode/decode code achieving encode/decode sync - done.
      Forward and backward probability updates to the nzcs - done.
      RD costing updates for nzcs - done.
      Note: The dynamic programming approach used in trellis quantization
      is not exactly compatible with nzcs. A suboptimal approach has been
      used instead, where branch costs are updated to account for changes
      in the nzcs.
      
      TODO:
      Training the default probs/counts for nzcs
      
      Change-Id: I951bc1e22f47885077a7453a09b0493daa77883d
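      A toy illustration of the two ways of signalling the same quantized
      block: the EOB (one past the last non-zero coefficient in scan order)
      versus the per-block non-zero count (nzc) this patch codes instead.
      Helper names are illustrative and assume qcoeff is already in scan order:

      #include <stdint.h>

      static int block_eob_sketch(const int16_t *qcoeff, int n) {
        int i, eob = 0;
        for (i = 0; i < n; ++i)
          if (qcoeff[i] != 0) eob = i + 1;
        return eob;  /* what the old EOB token conveyed */
      }

      static int block_nzc_sketch(const int16_t *qcoeff, int n) {
        int i, nzc = 0;
        for (i = 0; i < n; ++i) nzc += (qcoeff[i] != 0);
        return nzc;  /* what gets coded per block under the nzc scheme */
      }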
  28. 05 Mar, 2013 - 1 commit
    • Make superblocks independent of macroblock code and data. · 111ca421
      Ronald S. Bultje authored
      Split macroblock and superblock tokenization and detokenization
      functions and coefficient-related data structs so that the bitstream
      layout and related code of superblock coefficients looks less like it's
      a hack to fit macroblocks in superblocks.
      
      In addition, unify chroma transform size selection with the luma transform
      size (i.e. always use the same size, as long as it fits the predictor);
      in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma
      transform will now use the 16x16 (instead of the 8x8) chroma transform,
      and 64x64 superblocks using the 32x32 luma transform will now use the
      32x32 (instead of the 16x16) chroma transform.
      
      Lastly, add a trellis optimize function for 32x32 transform blocks.
      
      HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There are
      a few negative points here and there that I might want to analyze
      a little more closely.
      
      Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
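      A minimal sketch of the unified chroma transform size rule described
      above: chroma uses the luma transform size, capped so it still fits the
      subsampled chroma prediction block. The encoding of tx sizes and the
      helper name are assumptions:

      /* tx size index: 0 = 4x4, 1 = 8x8, 2 = 16x16, 3 = 32x32 (width = 4 << tx). */
      static int get_uv_tx_size_sketch(int luma_tx_size, int uv_width_log2) {
        const int max_uv_tx = uv_width_log2 - 2;  /* largest tx that fits */
        return luma_tx_size < max_uv_tx ? luma_tx_size : max_uv_tx;
      }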
  29. 27 Feb, 2013 - 1 commit