1. 08 Apr, 2013 1 commit
    • Jingning Han's avatar
      Clamp inferred motion vectors only · 12bf0796
      Jingning Han authored
      Clamp only the motion vectors inferred from neighboring reference
      macroblocks. The motion vectors obtained through motion search in
      NEWMV mode are constrained during the search process, which allows
      a relatively larger referencing region than the inferred mvs.
      Hence further clamping the best mv provided by the motion search may
      affect the efficacy of NEWMV mode.
      
      Synchronized the decoding process. The decoded mvs in NEWMV modes
      should be guaranteed to fit in the effective range. Put a mv range
      clamping function there for security purpose.
      
      This improves the coding performance of high motion sequences, e.g.,
      derf set:
      foreman 0.233%
      husky   0.175%
      icd     0.135%
      mother_daughter 0.337%
      pamphlet        0.561%
      
      stdhd set:
      blue_sky 0.408%
      city     0.455%
      also saw sunflower goes down by -0.469%.
      
      Change-Id: I3fcbba669e56dab779857a8126a91b926e899cb5
      12bf0796
  2. 05 Apr, 2013 2 commits
    • Ronald S. Bultje's avatar
      Remove full-pixel-related code. · 36c3a67c
      Ronald S. Bultje authored
      This is a VP8-only feature (part of profile 3) that is unsupported in
      VP9.
      
      Change-Id: I78016eede8d9c834d44d4c517f3e8b8fc2a378b1
      36c3a67c
    • John Koleszar's avatar
      Move EOB to per-plane data · 05a79f2f
      John Koleszar authored
      Continue migrating data from BLOCKD/MACROBLOCKD to the per-plane
      structures.
      
      Change-Id: Ibbfa68d6da438d32dcbe8df68245ee28b0a2fa2c
      05a79f2f
  3. 04 Apr, 2013 2 commits
  4. 28 Mar, 2013 2 commits
    • Deb Mukherjee's avatar
      Framework changes in nzc to allow more flexibility · fe9b5143
      Deb Mukherjee authored
      The patch adds the flexibility to use standard EOB based coding
      on smaller block sizes and nzc based coding on larger blocksizes.
      The tx-sizes that use nzc based coding and those that use EOB based
      coding are controlled by a function get_nzc_used().
      By default, this function uses nzc based coding for 16x16 and 32x32
      transform blocks, which seem to bridge the performance gap
      substantially.
      
      All sets are now lower by 0.5% to 0.7%, as opposed to ~1.8% before.
      
      Change-Id: I06abed3df57b52d241ea1f51b0d571c71e38fd0b
      fe9b5143
    • Ronald S. Bultje's avatar
      Fix mix-up in pt token indexing. · 9eea9fa2
      Ronald S. Bultje authored
      This fixes uninitialized reads in the trellis, and probably makes the
      trellis do something again.
      
      Change-Id: Ifac8dae9aa77574bde0954a71d4571c5c556df3c
      9eea9fa2
  5. 27 Mar, 2013 1 commit
    • Dmitry Kovalev's avatar
      Removing redundant function arguments. · 17cddb4e
      Dmitry Kovalev authored
      Almost all arguments for vp9_build_inter32x32_predictors_sb and
      vp9_build_inter64x64_predictors_sb can be deduced from the first macroblock
      argument.
      
      Change-Id: I5d477a607586d05698d5b3b9b9bc03891dd3fe83
      17cddb4e
  6. 26 Mar, 2013 4 commits
    • Deb Mukherjee's avatar
      Implicit weighted prediction experiment · 23144d23
      Deb Mukherjee authored
      Adds an experiment to use a weighted prediction of two INTER
      predictors, where the weight is one of (1/4, 3/4), (3/8, 5/8),
      (1/2, 1/2), (5/8, 3/8) or (3/4, 1/4), and is chosen implicitly
      based on consistency of the predictors to the already
      reconstructed pixels to the top and left of the current macroblock
      or superblock.
      
      Currently the weighting is not applied to SPLITMV modes, which
      default to the usual (1/2, 1/2) weighting. However the code is in
      place controlled by a macro. The same weighting is used for Y and
      UV components, where the weight is derived from analyzing the Y
      component only.
      
      Results (over compound inter-intra experiment)
      derf: +0.18%
      yt: +0.34%
      hd: +0.49%
      stdhd: +0.23%
      
      The experiment suggests bigger benefit for explicitly signaled weights.
      
      Change-Id: I5438539ff4485c5752874cd1eb078ff14bf5235a
      23144d23
    • Ronald S. Bultje's avatar
      Add col/row-based coefficient scanning patterns for 1D 8x8/16x16 ADSTs. · d9094d8f
      Ronald S. Bultje authored
      These are mostly just for experimental purposes. I saw small gains (in
      the 0.1% range) when playing with this on derf.
      
      Change-Id: Ib21eed477bbb46bddcd73b21c5c708a5b46abedc
      d9094d8f
    • Ronald S. Bultje's avatar
      Redo banding for all transforms. · 3120dbdd
      Ronald S. Bultje authored
      Now that the first AC coefficient in both directions use the same DC
      as their context, there no longer is a purpose in letting both have
      their own band. Merging these two bands allows us to split bands for
      some of the very high-frequency AC bands.
      
      In addition, I'm redoing the banding for the 1D-ADST col/row scans. I
      don't think the old banding made any sense at all (it merged the last
      coefficient of the first row/col in the same band as the first two of
      the second row/col), which was clearly an oversight from the band being
      applied in scan-order (rather than in their actual position). Now,
      coefficients at the same position will be in the same band, regardless
      what scan order is used. I think this makes most sense for the purpose
      of banding, which is basically "predict energy for this coefficient
      depending on the energy of context coefficients" (i.e. pt).
      
      After full re-training, together with previous patch, derf gains about
      1.2-1.3%, and hd/stdhd gain about 0.9-1.0%.
      
      Change-Id: I7a0cc12ba724e88b278034113cb4adaaebf87e0c
      3120dbdd
    • Ronald S. Bultje's avatar
      Use above/left (instead of previous in scan-order) as token context. · 790fb132
      Ronald S. Bultje authored
      Pearson correlation for above or left is significantly higher than for
      previous-in-scan-order (absolute values depend on position in scan, but
      in general, we gain about 0.1-0.2 by using either above or left; using
      both basically just makes this even better). For eob branch skipping,
      we continue to use the previous token in scan order.
      
      This helps about 0.9% on derf after re-training on a limited data set.
      Full re-training and results on larger-resolution clips are pending.
      
      Note that this commit breaks trellis, so we can probably get further
      gains out of it by fixing trellis at some later point.
      
      Change-Id: Iead68e296fc3a105cca746b5e3da9555d6010cfe
      790fb132
  7. 20 Mar, 2013 1 commit
  8. 18 Mar, 2013 2 commits
    • Paul Wilkins's avatar
      Changes to rd error_per_bit calculation. · d8ffee45
      Paul Wilkins authored
      Specifically changes to retain more precision
      especially at low Q through to the point of use.
      
      Change-Id: Ief5f010f2ca4daaabef49520e7edb46c35daf397
      d8ffee45
    • Paul Wilkins's avatar
      Adapt ARNR filter length and strength. · cdb322dd
      Paul Wilkins authored
      Adjust the filter length and strength for each
      ARF group based on a measure of difficulty (the boost)
      and the active q range.
      
      Remove lower limit on RDMULT value.
      
      Average gains on the different sets in range 0.4%-0.9%.
      However the ARNR changes give a very big boost on a
      few clips.
      
      Eg. Soccer ~5%, in derf set and Cyclist ~ 10% in the std-hd set
      
      Change-Id: I2078d78798e27ad2bcc2b32d703ea37b67412ec4
      cdb322dd
  9. 16 Mar, 2013 1 commit
    • Deb Mukherjee's avatar
      Context-pred fix to not use top/left on edges · b1921b2f
      Deb Mukherjee authored
      This fix resolves some of the mismatch issues being seen
      recently. While this is the right thing to do when tiling
      is used for this experiment, it is not the underlying cause
      of the the mismatches.
      Something else is causing writing outside of the allowable
      frame area in the encoder leading to this mismatch.
      
      Change-Id: If52c6f67555aa18ab8762865384e323b47237277
      b1921b2f
  10. 14 Mar, 2013 1 commit
    • John Koleszar's avatar
      Fix pulsing issue with scaling · 9b7be888
      John Koleszar authored
      Updates the YV12_BUFFER_CONFIG structure to be crop-aware. The
      exiting width/height parameters are left unchanged, storing the
      width and height algined to a 16 byte boundary. The cropped
      dimensions are added as new fields.
      
      This fixes a nasty visual pulse when switching between scaled and
      unscaled frame dimensions due to a mismatch between the scaling
      ratio and the 16-byte aligned sizes.
      
      Change-Id: Id4a3f6aea6b9b9ae38bdfa1b87b7eb2cfcdd57b6
      9b7be888
  11. 09 Mar, 2013 1 commit
    • Deb Mukherjee's avatar
      Continued experiment with nonzero count · a28139c8
      Deb Mukherjee authored
      Adds probability updates for extra bits for the nzcs, code for
      getting nzc stats, plus some minor cleanups and fixes.
      
      Change-Id: If2814e7f04fb52f5025ad9f400f3e6c50a00b543
      a28139c8
  12. 08 Mar, 2013 2 commits
    • Jingning Han's avatar
      Extend diff MV limit from +/-256 to +/-1024 · 2a5278bd
      Jingning Han authored
      Increase the motion search range by 4x. Change MV_CLASS tree of the
      entropy coding to allow two additional mv classes to cover the
      extended motion vector limit. The codec determines the effective
      motion search range conditioned on the actual frame dimension.
      
      It provides coding gains:
      
      stdhd 0.39%
      yt    0.56%
      hd    0.47%
      
      Major coding performance gains are packed in several sequences with
      intense motion activities, e.g., ped_1080p gains 7% at high bit-rates,
      and on average 3%.
      
      TODO: Need to further tune the rate control and motion search units.
      
      Change-Id: Ib842540a6796fbee5a797809433ef6a477c6d78d
      2a5278bd
    • Ronald S. Bultje's avatar
      Add support for tx_select in i8x8 encoding in keyframes. · b41dee84
      Ronald S. Bultje authored
      Also enable tx_select for keyframes.
      
      Change-Id: Iadb1231d9fa7af0c8dce3d9b41830b93a302479e
      b41dee84
  13. 07 Mar, 2013 2 commits
    • Ronald S. Bultje's avatar
      Re-add support for ADST in superblocks. · d3724abe
      Ronald S. Bultje authored
      This also changes the RD search to take account of the correct block
      index when searching (this is required for ADST positioning to work
      correctly in combination with tx_select).
      
      Change-Id: Ie50d05b3a024a64ecd0b376887aa38ac5f7b6af6
      d3724abe
    • Deb Mukherjee's avatar
      Coding con-zero count rather than EOB for coeffs · eb6ef241
      Deb Mukherjee authored
      This patch revamps the entropy coding of coefficients to code first
      a non-zero count per coded block and correspondingly remove the EOB
      token from the token set.
      
      STATUS:
      Main encode/decode code achieving encode/decode sync - done.
      Forward and backward probability updates to the nzcs - done.
      Rd costing updates for nzcs - done.
      Note: The dynamic progrmaming apporach used in trellis quantization
      is not exactly compatible with nzcs. A suboptimal approach has been
      used instead where branch costs are updated to account for changes
      in the nzcs.
      
      TODO:
      Training the default probs/counts for nzcs
      
      Change-Id: I951bc1e22f47885077a7453a09b0493daa77883d
      eb6ef241
  14. 05 Mar, 2013 1 commit
    • Ronald S. Bultje's avatar
      Make superblocks independent of macroblock code and data. · 111ca421
      Ronald S. Bultje authored
      Split macroblock and superblock tokenization and detokenization
      functions and coefficient-related data structs so that the bitstream
      layout and related code of superblock coefficients looks less like it's
      a hack to fit macroblocks in superblocks.
      
      In addition, unify chroma transform size selection from luma transform
      size (i.e. always use the same size, as long as it fits the predictor);
      in practice, this means 32x32 and 64x64 superblocks using the 16x16 luma
      transform will now use the 16x16 (instead of the 8x8) chroma transform,
      and 64x64 superblocks using the 32x32 luma transform will now use the
      32x32 (instead of the 16x16) chroma transform.
      
      Lastly, add a trellis optimize function for 32x32 transform blocks.
      
      HD gains about 0.3%, STDHD about 0.15% and derf about 0.1%. There's
      a few negative points here and there that I might want to analyze
      a little closer.
      
      Change-Id: Ibad7c3ddfe1acfc52771dfc27c03e9783e054430
      111ca421
  15. 02 Mar, 2013 1 commit
    • Dmitry Kovalev's avatar
      Code cleanup. · 135428e9
      Dmitry Kovalev authored
      Removing redundant 'extern' keyword, lowercase variable names.
      
      Change-Id: I608e8d8579aba8981f5fac3493f77b4481b13808
      135428e9
  16. 27 Feb, 2013 4 commits
    • Ronald S. Bultje's avatar
      Move eob from BLOCKD to MACROBLOCKD. · e8c74e2b
      Ronald S. Bultje authored
      Consistent with VP8.
      
      Change-Id: I8c316ee49f072e15abbb033a80e9c36617891f07
      e8c74e2b
    • John Koleszar's avatar
      Use ref_frame_map vice active_ref_idx on the encoder · 800ad0b8
      John Koleszar authored
      This patch makes the encoder's use of ref_frame_map and active_ref_idx
      consistent with the decoder. ref_frame_map[] maps a reference buffer
      index to its actual location in the yv12_fb array, since many
      references may share an underlying buffer. active_ref_idx[] mirrors
      cpi->{lst,gld,alt}_fb_idx, holding the active references in each
      slot.
      
      This also fixes a bug in setup_buffer_inter() where the incorrect
      reference was used to populate the scaling factors.
      
      Change-Id: Id3728f6d77cffcd27c248903bf51f9c3e594287e
      800ad0b8
    • John Koleszar's avatar
      Combined motion compensation with scaled predictors · 77f88e97
      John Koleszar authored
      This patch extends the previous support for using references of a
      different resolution in ZEROMV mode to all inter prediction modes.
      Subpixel based best-mv scoring is disabled when the reference frame
      differs in resolution from the current frame.
      
      Change-Id: Id4dc3e5e6692de98d9857fd56bfad3ac57e944ac
      77f88e97
    • John Koleszar's avatar
      Spatial resamping of ZEROMV predictors · eb939f45
      John Koleszar authored
      This patch allows coding frames using references of different
      resolution, in ZEROMV mode. For compound prediction, either
      reference may be scaled.
      
      To test, I use the resize_test and enable WRITE_RECON_BUFFER
      in vp9_onyxd_if.c. It's also useful to apply this patch to
      test/i420_video_source.h:
      
        --- a/test/i420_video_source.h
        +++ b/test/i420_video_source.h
        @@ -93,6 +93,7 @@ class I420VideoSource : public VideoSource {
      
           virtual void FillFrame() {
             // Read a frame from input_file.
        +    if (frame_ != 3)
             if (fread(img_->img_data, raw_sz_, 1, input_file_) == 0) {
               limit_ = frame_;
             }
      
      This forces the frame that the resolution changes on to be coded
      with no motion, only scaling, and improves the quality of the
      result.
      
      Change-Id: I1ee75d19a437ff801192f767fd02a36bcbd1d496
      eb939f45
  17. 26 Feb, 2013 5 commits
  18. 25 Feb, 2013 1 commit
    • Jingning Han's avatar
      clean up forward and inverse hybrid transform · 77a3becf
      Jingning Han authored
      Rebased.
      
      Remove the old matrix multiplication transform computation. The 16x16
      ADST/DCT can be switched on/off and evaluated by setting ACTIVE_HT16
      300/0 in vp9/common/vp9_blockd.h.
      
      Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f
      77a3becf
  19. 23 Feb, 2013 2 commits
    • Ronald S. Bultje's avatar
      Split coefficient token tables intra vs. inter. · 0c9e2e9a
      Ronald S. Bultje authored
      Change-Id: I5416455f8f129ca0f450d00e48358d2012605072
      0c9e2e9a
    • Paul Wilkins's avatar
      Further changes to coefficient contexts. · c17672a3
      Paul Wilkins authored
      This patch alters the balance of context between the
      coefficient bands (reflecting the position of coefficients
      within a transform blocks) and the energy of the previous
      token (or tokens) within a block.
      
      In this case the number of coefficient bands is reduced
      but more previous token energy bands are supported.
      
      Some initial rebalancing of the default tables has been
      by running multiple derf clips at multiple data rates using
      the ENTOPY_STATS macro. Further balancing needs to be
      done using larger image formatsd especially in regard to
      the bigger transform sizes which are not as well represented
      in encodings of smaller image formats.
      
      Change-Id: If9736e95c391e711b04aef6393d26f60f36e1f8a
      c17672a3
  20. 22 Feb, 2013 2 commits
    • Paul Wilkins's avatar
      Experimental removal of over quant code · dbf49420
      Paul Wilkins authored
      The over quant code was added in VP8 post
      bitstream freeze to allow compression to lower
      data rates
      
      In VP9 the real qualtizer range has been greatly
      extended anyway.
      
      Change-Id: I5d384fa5e9a83ef75a3df34ee30627bd21901526
      dbf49420
    • Jingning Han's avatar
      Forward butterfly hybrid transform · babbd5d1
      Jingning Han authored
      This patch includes 4x4, 8x8, and 16x16 forward butterfly ADST/DCT
      hybrid transform. The kernel of 4x4 ADST is sin((2k+1)*(n+1)/(2N+1)).
      The kernel of 8x8/16x16 ADST is of the form sin((2k+1)*(2n+1)/4N).
      
      Change-Id: I8f1ab3843ce32eb287ab766f92e0611e1c5cb4c1
      babbd5d1
  21. 21 Feb, 2013 1 commit
    • Deb Mukherjee's avatar
      Refactoring of switchable filter search for speed · 28b1db92
      Deb Mukherjee authored
      Refactors the switchable filter search in the rd loop to
      improve encode speed.
      
      Uses a piecewise approximation to a closed form expression to estimate
      rd cost for a Laplacian source with a given variance and quantization
      step-size.
      
      About 40% encode time reduction is achieved.
      
      Results (on a feb 12 baseline) show a slight drop:
      
      derf: -0.019%
      yt: +0.010%
      std-hd: -0.162%
      hd: -0.050%
      
      Change-Id: Ie861badf5bba1e3b1052e29a0ef1b7e256edbcd0
      28b1db92
  22. 16 Feb, 2013 1 commit