  1. 16 Jul, 2013 - 4 commits
    • Increase border size from 96 to 160. · b02c4d36
      Ronald S. Bultje authored
      This is required because, upon downscaling, a motion vector that points
      partially into the UMV (e.g. all but 1 of 64+7 pixels, i.e. 70) can
      point up to 140 pixels into the UMV of the larger-resolution (2x)
      reference buffer. The UMV for reference buffers used in downscaling
      therefore needs to be 140 rounded up to the nearest multiple of 32,
      i.e. 160.
      
      Longer-term, we should probably handle the UMV differently by detecting
      edge coverage on-the-fly and using a temporary buffer for edge extensions
      instead of adding 160 pixels on all sides of the image (which means a
      CIF image uses 3x its own area size for borders).
      
      Change-Id: I5184443e6731cd6721fc6a5d430a53e7d91b4f7e
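The border arithmetic in commit b02c4d36 can be sketched in a few lines of C. This is only an illustration of the reasoning in the message, not code from the commit: the function names `round_up_to_multiple`, `scaled_reference_border` and `border_to_image_area_ratio` are hypothetical, and the CIF dimensions (352x288) are assumed from the usual definition of that format.

```c
#include <assert.h>

/* Round value up to the next multiple of align (align must be positive). */
int round_up_to_multiple(int value, int align) {
  assert(align > 0);
  return ((value + align - 1) / align) * align;
}

/* Hypothetical helper: border needed in a reference buffer that is `scale`
 * times larger than the current frame, when a motion vector may reach
 * `max_overhang` pixels past the frame edge at the current resolution. */
int scaled_reference_border(int max_overhang, int scale, int align) {
  return round_up_to_multiple(max_overhang * scale, align);
}

/* Ratio of border area to visible image area when `border` pixels are
 * added on all four sides of a width x height image. */
double border_to_image_area_ratio(int width, int height, int border) {
  const double image_area = (double)width * height;
  const double padded_area =
      (double)(width + 2 * border) * (height + 2 * border);
  return (padded_area - image_area) / image_area;
}
```

With the numbers from the message, `scaled_reference_border(64 + 7 - 1, 2, 32)` gives 160, and `border_to_image_area_ratio(352, 288, 160)` is a little over 3, which matches the "3x its own area" remark for a CIF image.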
    • Inline vp9_quantize() in xform_quant(). · 1ff94fea
      Ronald S. Bultje authored
      Cycle times:
      4x4:    151 to  131 cycles (15% faster)
      8x8:    334 to  306 cycles (9% faster)
      16x16: 1401 to 1368 cycles (2.5% faster)
      32x32: 7403 to 7367 cycles (0.5% faster)
      
      Total encode time of first 50 frames of bus @ 1500kbps (speed 0)
      goes from 1min39.2 to 1min38.6, i.e. roughly a 0.6% overall speedup.
      
      Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f
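The "% faster" figures in the cycle-time lists of these commits appear to follow the ratio old/new - 1, while an overall time saving is more naturally expressed as (old - new)/old. That reading is an assumption, and the helper names below are hypothetical. A minimal sketch of both calculations:

```c
/* Speedup expressed as "percent faster": how much more work per cycle
 * the new code does relative to the old code. */
double percent_faster(double old_cost, double new_cost) {
  return (old_cost / new_cost - 1.0) * 100.0;
}

/* Saving expressed as the share of the old run time that was removed. */
double percent_time_saved(double old_cost, double new_cost) {
  return (old_cost - new_cost) / old_cost * 100.0;
}
```

For example, `percent_faster(151, 131)` is a little over 15, in line with the 4x4 entry, and `percent_time_saved(99.2, 98.6)` is about 0.6 for the 1min39.2 to 1min38.6 encode times.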
    • 7e684e20
      Ronald S. Bultje authored
  2. 15 Jul, 2013 - 5 commits
    • Inline xform_quant() in encode_block_intra(). · 6fb41874
      Ronald S. Bultje authored
      Also inline some of the block calculations to help the compiler avoid
      silly things like calculating the same offset (or converting between
      raster/transform block offsets, or between block, mi and pixel units)
      many, many, many times.
      
      Cycle times:
      4x4:     584 ->   505 cycles (16% faster)
      8x8:    1651 ->  1560 cycles (6% faster)
      16x16:  7897 ->  7704 cycles (2.5% faster)
      32x32: 16096 -> 15852 cycles (1.5% faster)
      
      Overall, this saves about 0.5 seconds (1min49.8 -> 1min49.3) on the
      first 50 frames of bus (speed 0) @ 1500kbps, i.e. 0.5% overall.
      
      Change-Id: If3dd62453f8e2ab9d4ee616bc4ea956fb8874b80
    • Code cleanup inside vp9_decodeframe.c. · 2c317298
      Dmitry Kovalev authored
      Removing unused DEC_DEBUG define and dec_debug variable. Changing function
      signatures to eliminate code duplication and renaming function
      mb_init_dequantizer to init_dequantizer. Also removing redundant curly
      braces and comments.

      Change-Id: Ia56ee1b0be5f24abb0e878581845be8a4773c298
    • Neon: Update mbfilter if all vectors follow one branch. · f4f60f60
      Frank Galligan authored
      Change the mbfilter Neon code so that it no longer executes both
      branches when all vectors follow only one branch.
      
      The code is about 5% faster when executing only one branch and about
      1% slower when executing both branches.
      
      -PS5: Remove local stack space from mbfilter.
      
      Change-Id: I6a23f9b318a9f4568a2718b4c9348db988fe2182
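The idea behind commit f4f60f60 can be illustrated without Neon intrinsics. SIMD code normally evaluates both sides of a per-lane condition and blends the results, so an early check for "all lanes agree" lets it run a single branch. The sketch below is a scalar emulation under that assumption: `LANES`, `filter_strong`, `filter_weak` and `filter_lanes` are hypothetical stand-ins, not the actual mbfilter code.

```c
#include <stdint.h>

#define LANES 16

/* Hypothetical stand-ins for the two filter branches. */
static uint8_t filter_strong(uint8_t p) { return (uint8_t)(p >> 1); }
static uint8_t filter_weak(uint8_t p) { return (uint8_t)(p - (p >> 2)); }

/* Filters px in place. mask[i] != 0 selects the strong branch for lane i.
 * Returns the number of branches that had to be evaluated (1 or 2). */
int filter_lanes(const uint8_t mask[LANES], uint8_t px[LANES]) {
  int any_set = 0, any_clear = 0;
  int i;

  /* Cheap up-front test: do all lanes take the same branch? */
  for (i = 0; i < LANES; ++i) {
    if (mask[i]) any_set = 1; else any_clear = 1;
  }

  if (!any_clear) {  /* every lane takes the strong branch */
    for (i = 0; i < LANES; ++i) px[i] = filter_strong(px[i]);
    return 1;
  }
  if (!any_set) {    /* every lane takes the weak branch */
    for (i = 0; i < LANES; ++i) px[i] = filter_weak(px[i]);
    return 1;
  }

  /* Mixed case: evaluate both branches and select per lane. */
  for (i = 0; i < LANES; ++i) {
    const uint8_t s = filter_strong(px[i]);
    const uint8_t w = filter_weak(px[i]);
    px[i] = mask[i] ? s : w;
  }
  return 2;
}
```

The up-front test is extra work in the mixed case, which is consistent with the message reporting a small slowdown when both branches still have to run.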
    • Skip duplicate block encoding in the rd loop · faff6ed0
      Jingning Han authored
      This speed feature allows the encoder to largely remove the spatial
      dependency between blocks inside a 64x64 superblock, thereby removing
      the need to repeatedly encode superblocks per partition type in the
      rate-distortion optimization loop.
      
      A major challenge lies in the intra modes tested in the rate-distortion
      optimization loop: without the intermediate coding steps, subsequent
      blocks do not have access to the reconstructed boundary pixels.
      This was resolved by using the original pixels for intra prediction
      in the rd loop, followed by appropriately designed distortion
      modeling based on the quantization parameters. Experiments also suggested
      that the performance impact is more discernible at lower bit-rate/PSNR
      settings. Hence a quantizer-dependent threshold is applied to disable
      the skipping of block encoding.
      
      For bus_cif at 2000 kbps,
      speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB
               performance loss.
      
      speed 1: runtime 65312ms  -> 61536ms, (7...
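The quantizer-dependent threshold described in commit faff6ed0 could look roughly like the gate below. This is a sketch under stated assumptions: the function name and parameters are hypothetical, the threshold value is not given in the message, and it is assumed that a larger quantizer index means coarser quantization (i.e. lower bit-rate).

```c
/* Hypothetical gate for the "skip duplicate block encoding" speed feature.
 * feature_enabled: whether the speed feature is switched on at all.
 * q_index:         quantizer index of the current frame (assumed: larger
 *                  means coarser quantization and lower bit-rate).
 * q_thresh:        placeholder threshold; the real value is not stated. */
int allow_skip_encode(int feature_enabled, int q_index, int q_thresh) {
  /* Skip the intermediate encode only at finer quantizers, where the
   * quality impact was reported to be less noticeable. */
  return feature_enabled && q_index < q_thresh;
}
```

Passing the threshold as a parameter keeps the sketch free of any invented constant; a real encoder would more likely fix it at build time or derive it from the speed setting.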
  3. 13 Jul, 2013 - 3 commits
  4. 12 Jul, 2013 - 24 commits
  5. 11 Jul, 2013 - 4 commits