1. 03 Aug, 2011 1 commit
    • Adjust half-pixel only search · b9f19f89
      Yunqing Wang authored
      Changed the motion search in vp8_find_best_half_pixel_step() to match
      the one in vp8_find_best_sub_pixel_step(), which checks 5 points
      instead of 8. This only affects real-time mode with
      cpu-used >= 9. Tests showed a 2% encoding speedup with
      a quality loss (PSNR) of up to 0.5%.
      
      Change-Id: I16049cad1535002346d46cfdfad345bfc3dc5146
      b9f19f89
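      A minimal sketch of a five-point half-pixel refinement, for readers
      unfamiliar with the idea. The commit only states that 5 points are
      checked instead of 8; the rule used here to pick the single diagonal
      (between the better horizontal and the better vertical neighbour) is an
      assumption, and mv_t, cost_fn, half_pel_search_5pt() and sq_dist_cost()
      are illustrative names, not libvpx identifiers.

```c
#include <stddef.h>

/* Motion vector in half-pixel units (illustrative type, not libvpx's). */
typedef struct { int row, col; } mv_t;

/* Hypothetical cost callback: matching error of a candidate position. */
typedef unsigned int (*cost_fn)(mv_t mv, void *ctx);

/* Five-point refinement: evaluate left, right, up and down, then only the
   diagonal between the better horizontal and the better vertical neighbour.
   An eight-point search would evaluate all four diagonals instead. */
static mv_t half_pel_search_5pt(mv_t centre, cost_fn cost, void *ctx)
{
    mv_t cand[5] = {
        { centre.row,     centre.col - 1 },  /* left  */
        { centre.row,     centre.col + 1 },  /* right */
        { centre.row - 1, centre.col     },  /* up    */
        { centre.row + 1, centre.col     },  /* down  */
        { 0, 0 }                             /* diagonal, chosen below */
    };
    unsigned int err[5];
    mv_t best = centre;
    unsigned int best_err = cost(centre, ctx);
    size_t i;

    for (i = 0; i < 4; ++i) err[i] = cost(cand[i], ctx);

    /* Assumed selection rule: follow the cheaper side on each axis. */
    cand[4].row = centre.row + (err[2] < err[3] ? -1 : 1);
    cand[4].col = centre.col + (err[0] < err[1] ? -1 : 1);
    err[4] = cost(cand[4], ctx);

    for (i = 0; i < 5; ++i) {
        if (err[i] < best_err) { best_err = err[i]; best = cand[i]; }
    }
    return best;
}

/* Example cost for experimentation: squared distance to a target vector. */
static unsigned int sq_dist_cost(mv_t mv, void *ctx)
{
    const mv_t *target = (const mv_t *)ctx;
    int dr = mv.row - target->row, dc = mv.col - target->col;
    return (unsigned int)(dr * dr + dc * dc);
}
```

      Skipping three of the four diagonals saves cost evaluations per block,
      which fits the speed/quality trade-off reported above.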
  2. 27 Jul, 2011 2 commits
    • Preload reference area in sub-pixel motion search (real-time mode) · 2f2302f8
      Yunqing Wang authored
      This change implements the same idea as the change "Preload reference
      area to an intermediate buffer in sub-pixel motion search." The changes
      were made to the vp8_find_best_sub_pixel_step() and
      vp8_find_best_half_pixel_step() functions, which are called when
      speed >= 5. Test results (using the tulip clip):
      
      1. On a Core2 Quad machine (Linux)
      rt mode, speed (-5 ~ -8), encoding speed gain: 2% ~ 3%
      rt mode, speed (-9 ~ -11), encoding speed gain: 1% ~ 2%
      rt mode, speed (-12 ~ -14), no noticeable encoding speed gain
      
      2. On a Xeon machine (Linux)
      Tests at speeds (-5 ~ -14) showed no noticeable speed change.
      
      Change-Id: I21bec2d6e7fbe541fcc0f4c0366bbdf3e2076aa2
      2f2302f8
    • Fix range checks in motion search · bde2afbe
      Yunqing Wang authored
      In some situations the start motion vectors were out of range.
      This fix adjusts the range checks to make sure the vectors are
      checked and clamped.
      
      Change-Id: Ife83b7fed0882bba6d1fa559b6e63c054fd5065d
      bde2afbe
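      As an illustration of the kind of check described, a minimal sketch
      that clamps a start vector into a search window before the search
      begins. The types and the clamp_start_mv() helper are hypothetical and
      do not reproduce the encoder's actual limit fields.

```c
/* Illustrative types; the real encoder keeps its limits elsewhere. */
typedef struct { int row, col; } mv_t;
typedef struct { int row_min, row_max, col_min, col_max; } mv_limits_t;

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Clamp a start vector into the search window.
   Returns 1 if the vector had to be adjusted, 0 otherwise. */
static int clamp_start_mv(mv_t *mv, const mv_limits_t *lim)
{
    const mv_t before = *mv;
    mv->row = clamp_int(mv->row, lim->row_min, lim->row_max);
    mv->col = clamp_int(mv->col, lim->col_min, lim->col_max);
    return mv->row != before.row || mv->col != before.col;
}
```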
  3. 26 Jul, 2011 2 commits
  4. 25 Jul, 2011 3 commits
  5. 22 Jul, 2011 2 commits
    • fix sharpness bug and clean up · a04ed0e8
      Johann authored
      sharpness was not recalculated in vp8cx_pick_filter_level_fast
      
      remove last_filter_type. all values are calculated, don't need to update
      the lfi data when it changes.
      
      always use cm->sharpness_level. the extra indirection was annoying.
      
      don't track last frame_type or sharpness_level manually. frame type
      only matters for motion search and sharpness_level is taken care of in
      frame_init
      
      move function declarations to their proper header
      
      Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db
      a04ed0e8
    • Preload reference area to an intermediate buffer in sub-pixel motion search · 20bd1446
      Yunqing Wang authored
      In sub-pixel motion search, the search range is small (+/- 3 pixels).
      Preload the whole search area from the reference buffer into a 32-byte
      aligned buffer, then load reference data from this buffer during the
      search. This keeps the data in cache and reduces the penalty for
      crossing cache lines. For the tulip clip, tests on an Intel Core2 Quad
      machine (Linux) showed encoder speed improvements of:
        3.4%   at --rt --cpu-used=-4
        2.8%   at --rt --cpu-used=-3
        2.3%   at --rt --cpu-used=-2
        2.2%   at --rt --cpu-used=-1
      
      A test on an Atom notebook showed only a 1.1% speed improvement (speed=-4).
      A test on a Xeon machine also showed less improvement, since unaligned
      data access latency is greatly reduced in newer cores.
      
      Next, I will apply a similar idea to the other 2 sub-pixel search
      functions for encoding speed > 4.
      
      This change is made exclusively for x86 platforms.
      
      Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
      20bd1446
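      The preload step can be sketched roughly as follows. This assumes a
      16x16 block and the +/- 3 pixel range mentioned above; the buffer
      layout, the constants and preload_ref_area() are illustrative choices,
      not the code from the commit. It also assumes the reference frame has
      a border, so that reading 3 pixels outside the block is valid.

```c
#include <stdint.h>
#include <string.h>

enum {
    BLOCK = 16,                          /* assumed block size */
    SEARCH_MARGIN = 3,                   /* +/- 3 pixels, per the commit message */
    AREA = BLOCK + 2 * SEARCH_MARGIN,    /* 22 rows and 22 columns */
    BUF_STRIDE = 32                      /* one 32-byte row per line */
};

/* Intermediate buffer aligned to 32 bytes (C11 _Alignas). */
typedef struct {
    _Alignas(32) uint8_t data[AREA * BUF_STRIDE];
} ref_area_t;

/* Copy the search window around `block` (top-left pixel of the block in
   the reference frame) into the aligned buffer. Returns a pointer to the
   block's top-left pixel inside the buffer; index it with BUF_STRIDE. */
static const uint8_t *preload_ref_area(ref_area_t *dst, const uint8_t *block,
                                       int ref_stride)
{
    const uint8_t *src = block - SEARCH_MARGIN * ref_stride - SEARCH_MARGIN;
    for (int i = 0; i < AREA; ++i)
        memcpy(dst->data + i * BUF_STRIDE, src + i * ref_stride, AREA);
    return dst->data + SEARCH_MARGIN * BUF_STRIDE + SEARCH_MARGIN;
}
```

      Because every row of the copy starts on a 32-byte boundary, later
      loads at small offsets stay within a few cache lines, regardless of
      how the original frame rows were aligned.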
  6. 20 Jul, 2011 2 commits
    • Increase chroma row alignment to 16 bytes. · 7d1b37cd
      Timothy B. Terriberry authored
      This is done by expanding luma row to 32-byte alignment, since
       there is currently a bunch of code that assumes that
       uv_stride == y_stride/2 (see, for example, vp8/common/postproc.c,
       common/reconinter.c, common/arm/neon/recon16x16mb_neon.asm,
       encoder/temporal_filter.c, and possibly others; I haven't done a
       full audit).
      It also replaces the hardcoded border of 16 in a number of
       encoder buffers with VP8BORDERINPIXELS (currently 32), as the
       chroma rows start at an offset of border/2.
      Together, these two changes have the nice advantage that simply
       dumping the frame memory as a contiguous blob produces a valid,
       if padded, image.
      
      Change-Id: Iaf5ea722ae5c82d5daa50f6e2dade9de753f1003
      7d1b37cd
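      A small sketch of the stride arithmetic this implies, assuming the luma
      stride is simply the bordered width rounded up to a multiple of 32.
      The exact rounding used in the commit is not shown in the message, so
      compute_layout() and plane_layout_t are illustrative only.

```c
/* Illustrative plane layout; field names are not libvpx's. */
typedef struct {
    int y_stride;    /* luma row pitch in bytes */
    int uv_stride;   /* chroma row pitch, kept at y_stride / 2 */
    int uv_border;   /* chroma rows start at an offset of border / 2 */
} plane_layout_t;

/* Round the bordered luma width up to 32 bytes so that the halved
   chroma stride is automatically a multiple of 16. */
static plane_layout_t compute_layout(int width, int border)
{
    plane_layout_t l;
    l.y_stride = (width + 2 * border + 31) & ~31;
    l.uv_stride = l.y_stride / 2;
    l.uv_border = border / 2;
    return l;
}
```

      With a border of 32, the chroma border of 16 also keeps the first
      visible chroma pixel of each row on a 16-byte boundary.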
    • encoder: don't set the fragment bit for the last partition · 0afcc769
      Attila Nagy authored
      Change-Id: Icb4e4f0d7c3074a8507852178be87541a1cb5bac
      0afcc769
  7. 19 Jul, 2011 2 commits
    • remove old armv5 code · 6afafc31
      Johann authored
      armv5 dequantizer is not referenced
      
      Change-Id: Id1cc617dcee35ebd6a406816ec6aaa26e8bbc8ad
      6afafc31
    • Moved vp8_encode_bool into boolhuff.h · a25f6a9c
      Scott LaVarnway authored
      allowing the compiler to inline this function.  For real-time
      encodes, this gave a boost of 1% to 2.5%, depending on the
      speed setting.
      
      Change-Id: I3929d176cca086b4261267b848419d5bcff21c02
      a25f6a9c
  8. 18 Jul, 2011 1 commit
    • Improved 1-pass CBR rate control · b5ea2fbc
      John Koleszar authored
      This patch attempts to improve the handling of CBR streams with
      respect to the short term buffering requirements. The "buffer level"
      is changed to be an average over the rc buffer, rather than a long
      running average. Overshoot is also tracked over the same interval
      and the golden frame targets suppressed accordingly to correct for
      overly aggressive boosting.
      
      Testing shows that this is fairly consistently positive in one
      metric or another -- some clips that show significant decreases
      in quality have better buffering characteristics, others show
      improvements in both.
      
      Change-Id: I924c89aa9bdb210271f2e03311e63de3f1f8f920
      b5ea2fbc
  9. 15 Jul, 2011 1 commit
    • Tokenize MB optimized · 4e82f015
      Tero Rintaluoma authored
      Optimized C-code of the following functions:
       - vp8_tokenize_mb
       - tokenize1st_order_b
       - tokenize2nd_order_b
      Gives ~1-5% speed-up for RT encoding on Cortex-A8/A9
      depending on encoding parameters.
      
      Change-Id: I6be86104a589a06dcbc9ed3318e8bf264ef4176c
      4e82f015
  10. 14 Jul, 2011 2 commits
  11. 13 Jul, 2011 2 commits
    • Fix unnecessary casting of B_PREDICTION_MODE (issue 349) · 139577f9
      Yunqing Wang authored
      Minor fix.
      
      Change-Id: Iaf93f6e47e882a33c479e57c7a0d0bf321e291c0
      139577f9
    • Add improvements made in good-quality mode to real-time mode · 0e9a6ed7
      Yunqing Wang authored
      Several improvements made in good-quality mode can also be applied
      to real-time mode to speed up encoding at speeds 1, 2, and 3
      with a small quality loss. Tests using the tulip clip showed:
      
      --rt --cpu-used=-1
      (before change)
      PSNR: 38.028
      time: 1m33.195s
      (after change)
      PSNR: 38.014
      time: 1m20.851s
      
      --rt --cpu-used=-2
      (before change)
      PSNR: 37.773
      time: 0m57.650s
      (after change)
      PSNR: 37.759
      time: 0m54.594s
      
      --rt --cpu-used=-3
      (before change)
      PSNR: 37.392
      time: 0m42.865s
      (after change)
      PSNR: 37.375
      time: 0m41.949s
      
      Change-Id: I76ab2a38d72bc5efc91f6fe20d332c472f6510c9
      0e9a6ed7
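      To put the figures above on a common scale, the relative time saved
      and the PSNR drop can be computed as below. The helper names are
      illustrative. Applied to the numbers in this message, the time
      reduction comes to roughly 13.2%, 5.3% and 2.1% for cpu-used -1, -2
      and -3, with PSNR drops of 0.014, 0.014 and 0.017.

```c
/* Encoding-time reduction in percent: 100 * (before - after) / before.
   Both arguments are wall-clock times in seconds. */
static double time_reduction_pct(double before_s, double after_s)
{
    return 100.0 * (before_s - after_s) / before_s;
}

/* Quality loss as the plain difference of the two PSNR readings. */
static double psnr_drop(double before, double after)
{
    return before - after;
}
```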
  12. 12 Jul, 2011 2 commits
  13. 11 Jul, 2011 1 commit
  14. 08 Jul, 2011 4 commits
    • Minor change in pick_inter_mode() · 587ca06d
      Yunqing Wang authored
      Scott suggested moving vp8_mv_pred() under "case NEWMV" to save
      extra checks.
      
      Change-Id: I09e69892f34a08dd425a4d81cfcc83674e344a20
      587ca06d
    • Adjust full-pixel clamping and motion vector limit calculation · 40991fae
      Yunqing Wang authored
      Do mvp clamping in full-pixel precision instead of 1/8-pixel
      precision to avoid the error caused by the right-shift operation.
      Also, further fixed the motion vector limit calculation from change
      b7480454.
      
      Change-Id: Ied88a4f7ddfb0476eb9f7afc6ceeddbf209fffd7
      40991fae
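      The message does not spell out the failing case. One way such an error
      can arise, shown purely as an illustration with made-up limits: a
      right shift by 3 rounds negative values toward minus infinity, so
      clamping in 1/8-pixel units and shifting afterwards can land one full
      pixel outside the intended window. The helpers below are hypothetical,
      and floor_div8() stands in for the shift so the sketch does not depend
      on implementation-defined behaviour.

```c
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Floor division by 8: the result an arithmetic `v8 >> 3` would give. */
static int floor_div8(int v8)
{
    return v8 >= 0 ? v8 / 8 : -((-v8 + 7) / 8);
}

/* Clamp in 1/8-pixel units first, then convert to full pixels. */
static int clamp_eighth_then_shift(int v8, int lo8, int hi8)
{
    return floor_div8(clamp_int(v8, lo8, hi8));
}

/* Convert to full pixels first, then clamp against full-pixel limits. */
static int shift_then_clamp_full(int v8, int lo, int hi)
{
    return clamp_int(floor_div8(v8), lo, hi);
}
```

      With a 1/8-pixel window of [-21, 21], whose valid full-pixel positions
      are -2..2, a value of -30 clamps to -21 and then shifts to -3, while
      clamping the already converted value against [-2, 2] yields -2.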
    • update x86 asm for loopfilter · 01433c50
      Johann authored
      Change-Id: I1ed739522db7c00c189851c7095c1b64ef6412ce
      01433c50
    • New loop filter interface · 62295844
      Attila Nagy authored
      Separate simple filter with a reduced number of parameters.
      MB filter level picking is based on a precalculated table. The level table is
      updated for each frame. Inside and edge limits are precalculated and updated
      only when sharpness changes. The HEV threshold is constant.
      ARM targets use scalars; other targets use vectors.
      
      Change works only with --target=generic-gnu
      All other targets have to be updated!
      
      Change-Id: I6b73aca6b525075b20129a371699b2561bd4d51c
      62295844
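      A sketch of what precalculating the inside and edge limits per filter
      level could look like. The limit formulas follow the loop filter
      description in RFC 6386 as I understand it; the table layout, the
      caching on sharpness and the names lf_limits_t and update_lf_limits()
      are assumptions, not the interface introduced by this commit.

```c
#include <stdint.h>

enum { LF_LEVELS = 64 };   /* loop filter levels 0..63 */

typedef struct {
    int sharpness;                      /* value the tables were built for; set to -1 initially */
    uint8_t interior[LF_LEVELS];        /* inside limit per level */
    uint8_t mb_edge[LF_LEVELS];         /* macroblock edge limit per level */
    uint8_t sub_block_edge[LF_LEVELS];  /* sub-block edge limit per level */
} lf_limits_t;

/* Rebuild the tables only when sharpness differs from the cached value.
   Returns 1 if the tables were rebuilt, 0 if they were already current. */
static int update_lf_limits(lf_limits_t *t, int sharpness)
{
    if (t->sharpness == sharpness) return 0;

    for (int lvl = 0; lvl < LF_LEVELS; ++lvl) {
        int ilimit = lvl;
        if (sharpness) {
            ilimit >>= (sharpness > 4) ? 2 : 1;
            if (ilimit > 9 - sharpness) ilimit = 9 - sharpness;
        }
        if (ilimit < 1) ilimit = 1;

        t->interior[lvl] = (uint8_t)ilimit;
        t->mb_edge[lvl] = (uint8_t)((lvl + 2) * 2 + ilimit);
        t->sub_block_edge[lvl] = (uint8_t)(lvl * 2 + ilimit);
    }
    t->sharpness = sharpness;
    return 1;
}
```

      Per-macroblock filtering then reduces to table lookups by level, and
      the tables are touched again only when the sharpness setting changes.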
  15. 07 Jul, 2011 1 commit
    • Set VPX_FRAME_IS_DROPPABLE · 37de0b8b
      John Koleszar authored
      Allow the encoder to inform the application that the encoded frame will not
      be used as a reference.
      
      Change-Id: I90e41962325ef73d44da03327deb340d6f7f4860
      37de0b8b
  16. 30 Jun, 2011 2 commits
  17. 29 Jun, 2011 3 commits
    • Change to arf boost calculation. · 11694aab
      Paul Wilkins authored
      In this commit I have added an experimental function
      that tests prediction quality either side of a central position
      to calculate a suggested boost number for an ARF frame.
      
      The function is passed an offset from the current position and
      a number of frames to search forwards and backwards.
      It returns a forward, backward and compound boost number.
      
      The new code can be deactivated using #define NEW_BOOST 0
      
      In its current default state the code searches forwards and backwards
      from the proposed position of the next alt ref.
      
      The old code used a boost number calculated by scanning forward
      from the previous GF up to the proposed alt ref frame position.
      
      I have also added some code to try and prevent placement of a gf/arf
      where there is a brief flash.
      
      Change-Id: I98af789a5181148659f10dd5dd2ff2d4250cd51c
      11694aab
    • remove incorrect initialization · fe53107f
      Johann authored
      Values were set, then reset. Only set them once.
      
      Change-Id: Iaf43c8467129f2f261f04fa9188b603aa46216b5
      fe53107f
    • clean up warnings when building arm with rtcd · 6611f669
      Johann authored
      Change-Id: I3683cb87e9cb7c36fc22c1d70f0799c7c46a21df
      6611f669
  18. 28 Jun, 2011 5 commits
    • Use MAX_ENTROPY_TOKENS and ENTROPY_NODES more consistently · b32da7c3
      John Koleszar authored
      There were many instances in the code of vp8_coef_tokens and
      vp8_coef_tokens-1, which was a preprocessor macro despite the naming
      convention. Replace these with MAX_ENTROPY_TOKENS and ENTROPY_NODES,
      respectively.
      
      Change-Id: I72c4f6c7634c94e1fa066cd511471e5592c748da
      b32da7c3
    • Simplify decode_macroblock. · 81c05464
      Gaute Strokkenes authored
      Change-Id: Ieb2f3827ae7896ae594203b702b3e8fa8fb63d37
      81c05464
    • New ways of passing encoded data between encoder and decoder. · 7296b3f9
      Stefan Holmer authored
      With this commit frames can be received partition-by-partition
      from the encoder and passed partition-by-partition to the
      decoder.
      
      At the encoder-side this makes it easier to split encoded
      frames at partition boundaries, useful when packetizing
      frames. When VPX_CODEC_USE_OUTPUT_PARTITION is enabled,
      several VPX_CODEC_CX_FRAME_PKT packets will be returned
      from vpx_codec_get_cx_data(), containing one partition
      each. The partition_id (starting at 0) specifies the decoding
      order of the partitions. All partitions but the last have
      the VPX_FRAME_IS_FRAGMENT flag set.
      
      At the decoder this opens up the possibility of decoding partition
      N even though partition N-1 was lost (given that independent
      partitioning has been enabled in the encoder) if more info
      about the missing parts of the stream is available through
      external signaling.
      
      Each partition is passed to the decoder through the
      vpx_codec_decode() function, with the data pointer pointing
      to the start of the partition, and with data_sz equal to the
      size of the partition. Missing partitions can be signaled to
      the decoder by setting data != NULL and data_sz = 0. When
      all partitions have been given to the decoder "end of data"
      should be signaled by calling vpx_codec_decode() with
      data = NULL and data_sz = 0.
      
      The first partition is the first partition according to the
      VP8 bitstream + the uncompressed data chunk + DCT address
      offsets if multiple residual partitions are used.
      
      Change-Id: I5bc0682b9e4112e0db77904755c694c3c7ac6e74
      7296b3f9
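      The decoder-side calling sequence described above can be sketched as
      follows. To keep the example self-contained, the decoder is passed in
      as a callback that takes only the data pointer and size; the real
      vpx_codec_decode() additionally takes the codec context, a user_priv
      pointer and a deadline. decode_fn, partition_t, feed_partitions() and
      the recording stub are hypothetical names used for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the decode call: returns 0 on success. */
typedef int (*decode_fn)(void *ctx, const uint8_t *data, unsigned int data_sz);

typedef struct {
    const uint8_t *data;   /* start of the partition, ignored if lost */
    unsigned int size;     /* partition size in bytes */
    int lost;              /* nonzero if the partition never arrived */
} partition_t;

/* Feed one frame partition by partition:
   - received partition: data pointer and its size
   - missing partition:  data != NULL and data_sz == 0
   - end of data:        data == NULL and data_sz == 0 */
static int feed_partitions(decode_fn decode, void *ctx,
                           const partition_t *parts, size_t n)
{
    static const uint8_t placeholder = 0;  /* any non-NULL address will do */
    for (size_t i = 0; i < n; ++i) {
        int err = parts[i].lost
                      ? decode(ctx, &placeholder, 0)
                      : decode(ctx, parts[i].data, parts[i].size);
        if (err) return err;
    }
    return decode(ctx, NULL, 0);
}

/* Recording stub used to observe the sequence of calls. */
typedef struct { int partitions, missing, end_of_data; } call_log_t;

static int log_decode(void *ctx, const uint8_t *data, unsigned int data_sz)
{
    call_log_t *log = (call_log_t *)ctx;
    if (data == NULL && data_sz == 0) log->end_of_data++;
    else if (data_sz == 0) log->missing++;
    else log->partitions++;
    return 0;
}
```

      Signalling a lost partition explicitly, rather than skipping the call,
      lets the decoder keep its partition count in step with the encoder's
      partition_id order.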
    • Adding support for independent partitions · 4cb0ebe5
      Stefan Holmer authored
      Adding support in the encoder for generating
      independent residual partitions by forcing
      equal probabilities over the previous-coefficient
      entropy contexts.
      
      Change-Id: I402f5c353255f3ca20eae2620af739f6a498cd21
      4cb0ebe5
    • Avoid text relocations in ARM vp8 decoder · e3f850ee
      Mike Hommey authored
      The current code stores pointers to coefficient tables and loads them to
      access the tables' contents. As these pointers are stored in the code
      sections, it means we end up with text relocations. eu-findtextrel will
      thus complain about code not compiled with -fpic/-fPIC.
      
      Since the pointers are stored in the code sections, we can actually cheat
      and let the assembler generate relative addressing when accessing the
      coefficient tables, and just load their location with adr.
      
      Change-Id: Ib74ae2d3f2bab80b29991355f2dbe6955f38f6ae
      e3f850ee
  19. 27 Jun, 2011 2 commits