1. 07 Feb, 2013 - 2 commits
  2. 06 Feb, 2013 - 4 commits
  3. 05 Feb, 2013 - 5 commits
    • [WIP] Add column-based tiling. · 1407bdc2
      Ronald S. Bultje authored
      This patch adds column-based tiling. The idea is to make each tile
      independently decodable (after reading the common frame header) and
      also independently encodable (minus within-frame cost adjustments in
      the RD loop) to speed up hardware and software en/decoders that use
      multi-threading. Column-based tiling has the added advantage (over
      other tiling methods) that it minimizes latency in realtime use cases,
      since all threads can start encoding data as soon as the first SB-row
      worth of data is available to the encoder.
      
      There is some test code that does random tile ordering in the decoder,
      to confirm that each tile is indeed independently decodable from other
      tiles in the same frame. At tile edges, all contexts assume default
      values (i.e. 0, 0 motion vector, no coefficients, DC intra4x4 mode),
      and motion vector search and ordering do not cross tiles in the same
      frame.
      
      Tile independence is not maintained between frames ATM, i.e. tile 0 of
      frame 1 is free to use motion vectors that point into any tile of frame
      0. We support 1 (i.e. no tiling), 2 or 4 column-tiles.
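
      As a rough illustration of the column split described above, the
      sketch below divides a frame's superblock (SB) columns among 1, 2 or 4
      tiles. The 64-pixel SB width, the helper name tile_col_bounds and the
      even-split rule are assumptions made for illustration; they are not
      taken from this patch.

```python
def tile_col_bounds(frame_width, num_tiles, sb_size=64):
    """Return one (start, end) pixel-column pair per tile, SB-aligned.

    Hypothetical helper: sb_size=64 and the even split are assumptions.
    """
    if num_tiles not in (1, 2, 4):
        raise ValueError("this sketch only handles 1, 2 or 4 column-tiles")
    # Number of superblock columns, rounding up for a partial last column.
    sb_cols = (frame_width + sb_size - 1) // sb_size
    bounds = []
    for t in range(num_tiles):
        start_sb = (t * sb_cols) // num_tiles
        end_sb = ((t + 1) * sb_cols) // num_tiles
        # Clamp the last tile to the real frame width.
        bounds.append((start_sb * sb_size, min(end_sb * sb_size, frame_width)))
    return bounds


if __name__ == "__main__":
    for start, end in tile_col_bounds(1920, 4):
        print(start, end)
```

      Each tile covers a contiguous run of SB columns, so a worker thread
      could be handed one (start, end) pair and process it top to bottom.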
      
      The loopfilter crosses tile boundaries. I discussed this briefly with Aki
      and he says that's OK. An in-loop loopfilter would need to do some sync
      between tile threads, but that shouldn't be a big issue.
      
      Results: with tiling disabled, we go up slightly because of improved edge
      use in the intra4x4 prediction. With 2 tiles, we lose ~1% on derf,
      ~0.35% on HD and ~0.55% on STD/HD. With 4 tiles, we lose another ~1.5%
      on derf, ~0.77% on HD and ~0.85% on STD/HD. Most of this loss is
      concentrated in the low-bitrate end of clips, and most of it is because
      of the loss of edges at tile boundaries and the resulting loss of intra
      predictors.
      
      TODO:
      - more tiles (perhaps allow row-based tiling also, and max. 8 tiles)?
      - maybe optionally (for EC purposes), motion vectors themselves
        should not cross tile edges, or we should emulate such borders as
        if they were off-frame, to limit error propagation to within one
        tile only. This doesn't have to be the default behaviour but could
        be an optional bitstream flag.
      
      Change-Id: I5951c3a0742a767b20bc9fb5af685d9892c2c96f
    • Add SSE3 versions for sad{32x32,64x64}x4d functions. · 58c983d1
      Ronald S. Bultje authored
      Overall encoding about 15% faster.
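
      For readers unfamiliar with the x4d naming: these functions compute
      the sum of absolute differences (SAD) of one source block against four
      candidate reference blocks in a single call. The pure-Python sketch
      below only illustrates that contract; the names sad and sad_x4d and
      the list-of-rows block layout are illustrative assumptions, and
      nothing here reflects the SSE3 code itself.

```python
def sad(src, ref):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(s - r)
        for src_row, ref_row in zip(src, ref)
        for s, r in zip(src_row, ref_row)
    )


def sad_x4d(src, refs):
    """Return the four SADs of src against each of four reference blocks."""
    if len(refs) != 4:
        raise ValueError("x4d variants take exactly four reference blocks")
    return [sad(src, ref) for ref in refs]


if __name__ == "__main__":
    src = [[1, 2], [3, 4]]
    refs = [src, [[0, 0], [0, 0]], [[2, 3], [4, 5]], [[1, 2], [3, 0]]]
    print(sad_x4d(src, refs))
```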
      
      Change-Id: I176a775c704317509e32eee83739721804120ff2
    • rewrite 4x4 idct and fdct · fa36981e
      Yaowu Xu authored
      This commit changes the 4x4 iDCT to use the same algorithm & constants as
      the other iDCTs. The 4x4 fDCT is also changed to be based on the new iDCT.
      
      Change-Id: Ib1a902693228af903862e1f5a08078c36f2089b0
    • Change definition of NearestMV. · 81043e8d
      Paul Wilkins authored
      This commit makes the NearestMV match the chosen
      best reference MV. It can be a 0,0 or non-zero vector,
      which means the compound nearest mv mode can
      combine a 0,0 and a non-zero vector.
      
      Change-Id: I2213d09996ae2916e53e6458d7d110350dcffd7a
    • Added vp9_short_idct1_32x32_c · 5780c4cb
      Scott LaVarnway authored
      and called this function in vp9_dequant_idct_add_32x32_c when
      eob == 1.  For the test clip used, the decoder performance improved
      by 21+%.  Based on Yaowu's 16 point idct work.
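
      The shortcut works because a block whose only non-zero coefficient is
      the DC term (eob == 1) inverse-transforms to a constant block. The
      floating-point sketch below checks this for an orthonormal 2-D inverse
      DCT. It is not the integer arithmetic used by
      vp9_short_idct1_32x32_c, and all function names in it are hypothetical.

```python
import math


def idct_1d(coeffs):
    """Orthonormal 1-D inverse DCT (DCT-III), written for clarity, not speed."""
    n = len(coeffs)
    out = []
    for i in range(n):
        acc = coeffs[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            acc += (coeffs[k] * math.sqrt(2.0 / n)
                    * math.cos(math.pi * (2 * i + 1) * k / (2 * n)))
        out.append(acc)
    return out


def idct_2d(block):
    """Separable 2-D inverse DCT: rows first, then columns."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def idct_2d_dc_only(dc, n):
    """Shortcut for a DC-only block: every output sample equals dc / n."""
    value = dc / n
    return [[value] * n for _ in range(n)]


if __name__ == "__main__":
    n = 32
    block = [[0.0] * n for _ in range(n)]
    block[0][0] = 640.0
    full = idct_2d(block)
    fast = idct_2d_dc_only(640.0, n)
    print(max(abs(a - b) for ra, rb in zip(full, fast) for a, b in zip(ra, rb)))
```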
      
      Change-Id: Ib579a90fed531d45777980e04bf0c9b23c093c43
  4. 04 Feb, 2013 - 3 commits
    • Re-factor code for rd thresholds. · 3ab53876
      Paul Wilkins authored
      Separate out code to set the main encode speed
      related rd thresholds. Some values changed from
      the initial defaults for various new modes.
      
      Quality test results pending but even the addition
      of some further non-zero defaults helps encode speed
      somewhat in limited testing on derf clips.
      
      Adjustment of thresholds for quality / speed tradeoff
      to follow.
      
      Change-Id: I117ee473157e151a1b93193d5f393449328de20d
    • re-write 8 point idct · 1eb79dc1
      Yaowu Xu authored
      to be consistent with idct16 and idct32.
      
      Change-Id: Ie89dbd32b65c33274b7fecb4b41160fcf1962204
    • a couple of minor fixes · ccaaeb4b
      Yaowu Xu authored
      fixed a function prototype to prevent compiler warnings;
      removed a function not in use;
      un-capitalized "Refstride" to ref_stride
      
      Change-Id: Ib4472b6084f357d96328c6a06e795b6813a9edba
  5. 01 Feb, 2013 - 1 commit
    • Changes 16 point idct · 91e0e801
      Yaowu Xu authored
      This commit changes the inverse 16 point dct to use the same algorithm
      as the one for the 32 point idct. In fact, the 16 point idct now uses
      exactly the same source code as the even portion of the 32 point idct.

      Tests showed the current implementation has significantly better accuracy
      than the previous version. With this implementation and the minor bug
      fix on the forward 16 point dct, encoding tests showed about 0.2% better
      compression on the CIF set; test results on the std-hd setting are pending.
      
      Change-Id: I68224b60c816ba03434e9f08bee147c7e344fb63
  6. 31 Jan, 2013 - 2 commits
    • fix a small bug in 16 point forward dct · ab1cad9b
      Yaowu Xu authored
      This commit fixes a minor error in the 16 point fdct, where a rotation
      could produce a result of -1 instead of 0.
      
      Change-Id: I45aac4a52bcd06225c6d04e643547a13e1c1aade
    • A fix point implementation of 32x32 idct · 5149d7f7
      Yaowu Xu authored
      This commit changes the 32x32 idct to use integer arithmetic only. The
      algorithm was taken directly from "A Fast Computational Algorithm for
      the Discrete Cosine Transform" by W. Chen, et al., which was published
      in IEEE Transactions on Communications, Vol. COM-25, No. 9, 1977. The
      signal flow graph in the original paper is for a 32 point forward dct;
      the current implementation of the inverse DCT was derived by following
      the graph in the reverse direction.
      
      With this implementation, the 32 point inverse dct contains a 16 point
      inverse dct in its even portion; similarly, the 16 point idct in turn
      contains 8 point and 4 point inverse dcts.
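
      The nesting described above follows from the DCT basis itself: the
      even-indexed coefficients of an N point inverse DCT form an N/2 point
      inverse DCT. The floating-point sketch below demonstrates this with an
      unnormalized transform. It is not the fixed-point code from this
      commit; idct_direct and idct_recursive are hypothetical names, and the
      odd half is left as a plain sum rather than the paper's butterfly
      network.

```python
import math


def idct_direct(coeffs):
    """Unnormalized inverse DCT, evaluated straight from the definition."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n))
        for i in range(n)
    ]


def idct_recursive(coeffs):
    """Same transform, with the even half computed by a half-size IDCT."""
    n = len(coeffs)
    if n == 1:
        return [coeffs[0]]
    # Even-indexed coefficients: an n/2 point inverse DCT.
    even = idct_recursive(coeffs[0::2])
    # Odd-indexed coefficients: direct sum over the first half of outputs.
    odd = [
        sum(coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(1, n, 2))
        for i in range(n // 2)
    ]
    out = [0.0] * n
    for i in range(n // 2):
        out[i] = even[i] + odd[i]          # even part is symmetric
        out[n - 1 - i] = even[i] - odd[i]  # odd part is antisymmetric
    return out


if __name__ == "__main__":
    coeffs = [float((7 * k) % 11 - 5) for k in range(32)]
    a = idct_direct(coeffs)
    b = idct_recursive(coeffs)
    print(max(abs(x - y) for x, y in zip(a, b)))
```

      A 32 point call recurses through 16, 8, 4, 2 and 1 point transforms,
      mirroring the containment the commit message describes.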
      
      As of patch 4, encoding tests showed there is no compression loss when
      compared against the floating point baseline. The numbers even showed
      very small gains (cif: .01%, std-hd: .05%).
      
      Change-Id: I2d2d17a424b0b04b42422ef33ec53f5802b0f378
  7. 30 Jan, 2013 - 4 commits
  8. 29 Jan, 2013 - 3 commits
    • Fix block pointer corruption in intra8x8 prediction with 4x4 transform. · ffc2e4f4
      Ronald S. Bultje authored
      The RD loop would change the pointer after the first mode (DC) was tested,
      leading to corrupt block objects being provided for the others. This
      would essentially render the i8x8 predictor useless.
      
      Change-Id: I16c5906ca64fb34878ac32ce59af8974e4582bb8
    • Remove eob_max_offset markers. · 93762ca9
      Paul Wilkins authored
      Remove eob_max_offset markers and replace them
      with the generic skip_block flag, which indicates
      to the quantizer that all coeffs are to be set to 0
      and the eob position set to 0.
      
      Change-Id: Id477e8f8d4ec1a5562758904071013c24b76bfd7
    • Further improvement on compound inter-intra expt · 3b04d467
      Deb Mukherjee authored
      Adds a special combination mode specific to intra prediction
      mode D45.
      
      Current results with the compound inter/intra experiment:
      derf: 0.2%
      yt: 0.55%
      std-hd: 0.75%
      hd: 0.74%
      
      Change-Id: I8976bdf3b9b0b66ab8c5c628bbc62c14fc72ca86
  9. 28 Jan, 2013 - 2 commits
    • Segment Skip Flag · 0ff9b033
      Paul Wilkins authored
      First step in simplifying the segment mode and
      segment EOB flags into a simpler segment skip
      flag that implies 0,0 mv and EOB at position 0.
      
      Change-Id: Ib750cac31a7a02dc21082580498efd9f7d8d72a5
    • Simplify Zero bin and zero bin run code. · 8e2c03fb
      Paul Wilkins authored
      Simplification to eliminate a number of very large
      data structures. All zero run, zbin boosts for different
      transform sizes are now limited to a maximum run length
      of 15 before they max out the boost.
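
      The capping described above can be pictured as a single 16-entry table
      indexed by the current zero run, clamped at 15. The table values and
      the names ZBIN_BOOST and zbin_boost_for_run below are placeholders
      invented for illustration; the real values live in the encoder source.

```python
MAX_ZERO_RUN = 15

# Placeholder values only (assumption): one entry per run length 0..15.
ZBIN_BOOST = [0, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]


def zbin_boost_for_run(zero_run):
    """Look up the boost for a zero run; runs past 15 reuse the last entry."""
    if zero_run < 0:
        raise ValueError("zero_run must be non-negative")
    return ZBIN_BOOST[min(zero_run, MAX_ZERO_RUN)]


if __name__ == "__main__":
    print(zbin_boost_for_run(3), zbin_boost_for_run(15), zbin_boost_for_run(40))
```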
      
      Some further work still needs to be done to refactor, rationalize
      and optimize the multiple quantizer functions.
      
      The simplification, coupled with tweaks to the 16 element array
      now used for all transform sizes, has minimal effect on quality.
      
      Change-Id: I6f3948b8ca0418b60d4db9030ff19026a34ed423
  10. 26 Jan, 2013 - 2 commits
  11. 25 Jan, 2013 - 2 commits
  12. 24 Jan, 2013 - 2 commits
    • Mvref speedup · fcb4a25c
      Paul Wilkins authored
      Quality / decode speed trade off changes.
      Simpler insert method without sort. Quality impact small.
      
      Change-Id: Id0c0941bc508d985405abd06a13ffe7489170b62
    • Adds an error-resilient mode with test · 01cafaab
      Deb Mukherjee authored
      Adds an error-resilient mode in which decoding of frames can
      continue even when there are errors (due to network losses)
      on a prior frame. Specifically, backward updates are turned off
      and probabilities of various symbols are reset to defaults at
      the beginning of each frame. Further, the last frame's mvs are
      not used for the mv reference list, and the sorting of the
      initial list based on search on previous frames is turned off
      as well.
      
      Also adds a test where an arbitrary set of frames is skipped
      from decoding to simulate errors. The test verifies (1) that if
      the error frames are droppable - i.e. frame buffer updates have
      been turned off - there are no mismatch errors for the remaining
      frames after the error frames; and (2) that if the error frames are
      non-droppable, there are not only no decoding errors but the mismatch
      PSNR between the decoder's version of the post-error frames and the
      encoder's version is at least 20 dB.
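
      The 20 dB criterion in (2) can be sketched with the standard PSNR
      definition for 8-bit samples. The function names, the flat sample
      lists and the threshold_db parameter below are assumptions made for
      illustration; the actual test harness is not shown in this log.

```python
import math


def psnr(frame_a, frame_b, peak=255):
    """PSNR in dB between two equally sized sequences of samples."""
    if len(frame_a) != len(frame_b) or not frame_a:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / len(frame_a)
    if mse == 0:
        return math.inf  # identical frames: no mismatch at all
    return 10.0 * math.log10(peak * peak / mse)


def passes_mismatch_check(encoder_frame, decoder_frame, threshold_db=20.0):
    """True when the decoder's frame is within the PSNR threshold."""
    return psnr(encoder_frame, decoder_frame) >= threshold_db


if __name__ == "__main__":
    enc = [100, 100, 100, 100]
    dec = [110, 110, 110, 110]
    print(round(psnr(enc, dec), 2), passes_mismatch_check(enc, dec))
```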
      
      Change-Id: Ie6e2bcd436b1e8643270356d3a930e8989ff52a5
  13. 23 Jan, 2013 - 1 commit
    • Intrinsic version of loopfilter now matches C code · 6a997400
      Scott LaVarnway authored
      Updated the intrinsic code to match Yaowu's latest loopfilter change.
      (I584393906c4f5f948a581d6590959522572743bb)
      
      The decoder performance improved by ~30% for the test clip used.
      
      Change-Id: I026cfc75d5bcb7d8d58be6f0440ac9e126ef39d2
  14. 18 Jan, 2013 - 2 commits
    • Use alt-ref frame context for keyframes · 2f24ad9e
      John Koleszar authored
      This matches the behavior prior to generalizing the frame context
      selection, and intuitively makes sense in that the first forward ref
      is immediately after the keyframe, so its quality is improved a bit
      by using the keyframe's entropy context rather than the default.
      
      Change-Id: Ia82cef79382b9d8cfafdc44ba0533d4dc3e44053
    • a minor change to a portion of loop filtering · b95ed688
      Yaowu Xu authored
      The loop filtering used for MB edges or internal edges of a MB using the 8x8
      transform was reading 5 pixels on each side and writing 3 pixels on each side.
      Following suggestions from Aki and Scott on hardware & software performance,
      this commit changes it to read 4 pixels on each side and write 3 pixels on
      each side.
      
      Change-Id: I584393906c4f5f948a581d6590959522572743bb
  15. 16 Jan, 2013 - 5 commits
    • Preserve the previous golden frame on golden updates · 26bd81b9
      John Koleszar authored
      This commit restores the quality lost when the buffer-to-buffer copy
      logic was removed. Note that this is specific to the current use of
      golden frames and will need rework when RTC functionality is added.
      
      Change-Id: I7324a75acd96eafd9e0f9b8633d782e390d5dc21
    • Generalize and increase frame coding contexts · 4b65837b
      John Koleszar authored
      Previously there were two frame coding contexts tracked, one for normal
      frames and one for alt-ref frames. Generalize this by signalling the
      context to use in the bitstream, rather than tying it to the alt ref
      refresh bit. Also increase the number of contexts available to 4, which
      may be useful for temporal scalability.
      
      Change-Id: I7b66daaddd55c535c20cd16713541fab182b1662
    • Start to anonymize reference frames · da832a80
      John Koleszar authored
      Remove lst_fb_idx, gld_fb_idx, alt_fb_idx, refresh_last_frame,
      refresh_golden_frame, refresh_alt_ref_frame from common. Gold/Alt are
      encode side conventions. From the decoder's perspective, we want to be
      dealing with numbered references.
      
      Updates to active_ref 2 signal mode context switches, vestigial from
      refresh_alt_ref_frame. This needs some clean up to make sense with
      increased numbers of reference frames, as well as reimplementing the
      swapping of alt/golden which was previously done using the
      buffer-to-buffer copy mechanism removed in an earlier commit.
      
      Change-Id: I7334445158b7666f9295d2a2dd22aa03f4485f58
    • Update encoder to use fb_idx_ref_cnt · 394b0a6a
      John Koleszar authored
      Do reference counting the same way on the encoder as the decoder does,
      rather than maintaining the 'flags' member of YV12_BUFFER_CONFIG.
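
      A minimal sketch of index-based reference counting for a frame buffer
      pool, assuming each reference slot stores a buffer index and a
      parallel array holds the per-buffer counts. The fb_idx_ref_cnt name
      comes from the commit title; FramePool, assign_ref and
      find_free_buffer are hypothetical helpers, not the encoder's actual
      functions.

```python
class FramePool:
    """Toy frame buffer pool with per-buffer reference counts."""

    def __init__(self, num_buffers, num_refs):
        self.fb_idx_ref_cnt = [0] * num_buffers  # references held per buffer
        self.ref_idx = [None] * num_refs         # buffer index per ref slot

    def find_free_buffer(self):
        """Return the index of a buffer that no reference slot points at."""
        for idx, cnt in enumerate(self.fb_idx_ref_cnt):
            if cnt == 0:
                return idx
        raise RuntimeError("no free frame buffer")

    def assign_ref(self, slot, new_idx):
        """Point a reference slot at new_idx, releasing its old buffer."""
        old_idx = self.ref_idx[slot]
        if old_idx is not None and self.fb_idx_ref_cnt[old_idx] > 0:
            self.fb_idx_ref_cnt[old_idx] -= 1
        self.ref_idx[slot] = new_idx
        self.fb_idx_ref_cnt[new_idx] += 1


if __name__ == "__main__":
    pool = FramePool(num_buffers=4, num_refs=3)
    first = pool.find_free_buffer()
    pool.assign_ref(0, first)
    pool.assign_ref(1, first)
    print(pool.fb_idx_ref_cnt)
```

      A buffer becomes reusable once its count drops back to zero, so no
      per-buffer flag needs to be kept in sync by hand.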
      
      Change-Id: I91dc210ffca081acaf9d5c09a06e7461b3c3139c
    • Remove buffer-to-buffer copy logic · b8e02798
      John Koleszar authored
      This is the first in a series of commits to add additional reference
      frames to the codec. Each frame will be able to update any of the
      available references, but copying between references is not
      supported.
      
      Change-Id: I5945b5ce6cc3582c495102b4e7eed4f08c44d5a1