1. 18 Dec, 2014 1 commit
  2. 12 Dec, 2014 3 commits
  3. 11 Dec, 2014 7 commits
    • Remove unnecessary dqcoeff memset. · 3c7a06c3
      hkuang authored
      dqcoeff is set to 0 on initialization and is reset to 0 every time
      after it is used.
      Change-Id: I32b8e149bba40a8d707849f737a8e49a691f319c
    • Replace division with bit shift in choose_partitioning · d5c396a9
      Jingning Han authored
      This commit explicitly uses the bit shift operation instead of
      division for computing block variance.
      Change-Id: Id19c0ff27dd1d1ae4aceee6657e1aad0d406bd74
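The substitution described in this commit can be sketched as follows. This is a minimal illustration, not the code from the commit: the `var_acc` struct and both function names are hypothetical, and it assumes the block's sample count is a power of two, stored as `log2_count`. Both intermediate values are non-negative, so the right shifts give the same results as the divisions.

```c
#include <stdint.h>

/* Hypothetical accumulator for one block (names are illustrative only). */
typedef struct {
  int64_t sum_square_error; /* sum of squared differences */
  int64_t sum_error;        /* sum of differences */
  int log2_count;           /* log2 of the number of samples in the block */
} var_acc;

/* Reference form: variance scaled by count, computed with divisions. */
static int64_t variance_div(const var_acc *v) {
  const int64_t count = (int64_t)1 << v->log2_count;
  return (v->sum_square_error - v->sum_error * v->sum_error / count) / count;
}

/* Same quantity with each division by count replaced by a right shift.
 * sum_error^2 and the difference are both non-negative, so the shifts
 * round exactly like the integer divisions above. */
static int64_t variance_shift(const var_acc *v) {
  return (v->sum_square_error -
          ((v->sum_error * v->sum_error) >> v->log2_count)) >>
         v->log2_count;
}
```

For example, eight differences {3, -1, 4, -1, 5, -9, 2, 6} give a sum of 9 and a sum of squares of 173, and both functions return the same value for `log2_count = 3`.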
    • Corrected optimization of 8x8 DCT code · 5c22224e
      Peter de Rivaz authored
      The 8x8 DCT uses a fast version whenever possible.
      There was a mistake in the checking code which
      meant sometimes the fast version was used when it
      was not safe to do so.
      Change-Id: I154c84c9e2d836764768a11082947ca30f4b5ab7
      (cherry picked from commit fd05fb0c)
    • Refactor choose_partitioning computing scheme · 377d2f02
      Jingning Han authored
      This commit refactors the choose_partitioning function. It removes
      redundant memset calls and makes the encoder calculate the
      per-block variance only when it is needed. It reduces the
      average runtime cost of choose_partitioning by 60%. Overall it
      reduces speed -6 runtime by 2-5%.
      Change-Id: I951922c50d901d0fff77a3bafc45992179bacef9
    • Multiframe Quality Enhancement (MFQE) in VP9. · 7ac3e3c1
      JackyChen authored
      This is the first version of MFQE in VP9. There are a few TODOs included
      in this version.
      Usage: Add the flag --enable-vp9-postproc when configuring the project.
      In the decoder, use the flag --mfqe on the command line to enable
      MFQE in postproc.
      Note: A low-quality key frame is needed to see the effect of this
      patch. In my experiment, I fixed the qindex to 200 in the key frame.
      Change-Id: I021f9ce4616ed3574c81e48d968662994b56a396
    • VP9 common for ARMv8 by using NEON intrinsics 18 · 3f7c12da
      James Yu authored
      Add vp9_idct32x32_add_neon.c
      - vp9_idct32x32_1024_add_neon
      Change-Id: Ic598b772c28bd3487a8ead7a4598a66b25f9b00f
      Signed-off-by: James Yu <james.yu@linaro.org>
    • VP9 common for ARMv8 by using NEON intrinsics 14 · 3cfed4bf
      James Yu authored
      Add vp9_idct16x16_add_neon.c
      - vp9_idct16x16_256_add_neon_pass1
      - vp9_idct16x16_256_add_neon_pass2
      - vp9_idct16x16_10_add_neon_pass1
      - vp9_idct16x16_10_add_neon_pass2
      Change-Id: I54d25b54a36f4371760f54e4036693aaea40a5de
      Signed-off-by: James Yu <james.yu@linaro.org>
  4. 10 Dec, 2014 11 commits
  5. 09 Dec, 2014 8 commits
    • Refactor update_state_rt · e728678c
      Jingning Han authored
      Update the frame motion vectors only if the previous frame's motion
      vectors are needed as reference motion vectors for the next frame.
      Change-Id: Ica50f9d7b46ad4f815bba0d9e30f5546df29546f
    • Fix clang ioc warning due to NULL src_mi pointer. · 4eee74d6
      hkuang authored
      The warning only happens in the VP9 encoder's first pass, because src_mi
      is not set up yet. It does not cause the encoder to fail, because left_mi
      and above_mi are not used in the first pass and are set up again
      in the second pass.
      Change-Id: I12dffcd5fb1002b2b2dabb083c8726650e4b5f08
    • VP9 common for ARMv8 by using NEON intrinsics 01 · 5b098b18
      James Yu authored
      Add vp9_loopfilter_neon.c
      - vp9_lpf_horizontal_4_neon
      - vp9_lpf_vertical_4_neon
      - vp9_lpf_horizontal_8_neon
      - vp9_lpf_vertical_8_neon
      Change-Id: I97a0d7b399a431c21ee77396be3d5f5a1f7ebccb
      Signed-off-by: James Yu <james.yu@linaro.org>
    • Make RTC coding flow support sub8x8 in key frame coding · 225cdef6
      Jingning Han authored
      This commit enables the use of sub8x8 blocks in RTC key frame
      encoding. It requires the block size to be preset and will decide
      the coding mode and encode the bit-stream.
      Change-Id: I35aaf8ee2d4d6085432410c7963f339f85a2c19b
    • Cosmetic naming change · 4bacaab4
      Jingning Han authored
      Rename set_modeinfo_offsets to set_mode_info_offsets, to be more
      consistent with the naming convention.
      Change-Id: I68ca1f36c4a78127d9439a50c1506a2afd07927d
    • Take out redundant setting of mode_info from set_block_size · f051a7be
      Jingning Han authored
      The later encoding process will take the top-left block's
      mode_info for pre-determined block size.
      Change-Id: I76a90f9ce7f3b2dbc2975b52442114e461c465b5
    • Substantial restructuring of AQ mode 2. · e68c8dcf
      Paul Wilkins authored
      The restructure moves the decision into the rd pick
      modes loop and makes the decision at the 16x16
      block level instead of only at the 64x64 level.
      This gives finer granularity and better visual results
      on the clips I have tested. Metrics results are worse
      than the old AQ2 especially for PSNR and this mode
      now falls between AQ0 and AQ1 in terms of visual
      impact and metrics results.
      Further tuning of this to follow.
      It should be noted that if there are multiple iterations
      of the recode loop the segment for a MB could change
      in each loop if the previous loop causes a change in the
      complexity / variance bin of the block. Also where a block
      gets a delta Q this will alter the rd multiplier for this block
      in subsequent recode iterations and frames where the
      segmentation is applied.
      Change-Id: I20256c125daa14734c16f7cc9aefab656ab808f7
    • Remove unused rd cost calculation from nonrd_use_partition · 1395ded2
      Jingning Han authored
      The per block rd cost calculation is not needed when partition
      size is preset.
      Change-Id: Ie5575248bbffb584e908aa13097f697ace6ec747
  6. 08 Dec, 2014 1 commit
    • SSSE3 Optimization for Atom processors using new instruction selection and ordering · 8f9d94ec
      levytamar82 authored
      The function vp9_filter_block1d16_h8_ssse3 uses the PSHUFB instruction, which has a 3-cycle latency and slows execution when done in blocks of 5 or more on Atom processors.
      By replacing the PSHUFB instructions with other, more efficient single-cycle instructions (PUNPCKLBW + PUNPCKHBW + PALIGNR), performance can be improved.
      In the original code, PSHUFB uses every source byte and copies consecutive bytes.
      This is done more efficiently by PUNPCKLBW and PUNPCKHBW, using PALIGNR to concatenate the intermediate results and then shift right to the next consecutive 16 bytes for the final result.
      For example:
      filter = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8
      Reg = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
      REG1 = PUNPCKLBW Reg, Reg = 0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7
      REG2 = PUNPCKHBW Reg, Reg = 8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15
      PALIGNR REG2, REG1, 1 = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8
      This optimization improved the function performance by 23% and produced a 3% user level gain on 1080p content on Atom processors.
      As expected, no performance impact was observed on Core processors.
      Change-Id: I3cec701158993d95ed23ff04516942b5a4a461c0
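The byte-level equivalence in the worked example above can be checked with a scalar model. The sketch below is plain C rather than SSSE3 code, and all function names in it are hypothetical. It models each instruction on 16-byte arrays and compares the PSHUFB result for the filter mask with the unpack-and-align sequence; it says nothing about latency or throughput.

```c
#include <stdint.h>
#include <string.h>

/* PSHUFB model: each output byte selects src[mask & 15]; a set high bit
 * in the mask byte yields 0. */
static void model_pshufb(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t mask[16]) {
  uint8_t tmp[16];
  for (int i = 0; i < 16; ++i)
    tmp[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0f];
  memcpy(dst, tmp, 16);
}

/* PUNPCKLBW (base = 0) / PUNPCKHBW (base = 8) model: interleave eight
 * bytes of a and b, starting at index base. */
static void model_punpck(uint8_t dst[16], const uint8_t a[16],
                         const uint8_t b[16], int base) {
  uint8_t tmp[16];
  for (int i = 0; i < 8; ++i) {
    tmp[2 * i] = a[base + i];
    tmp[2 * i + 1] = b[base + i];
  }
  memcpy(dst, tmp, 16);
}

/* PALIGNR model: concatenate hi:lo into 32 bytes, shift right by n bytes
 * (0 <= n <= 16), and keep the low 16 bytes. */
static void model_palignr(uint8_t dst[16], const uint8_t hi[16],
                          const uint8_t lo[16], int n) {
  uint8_t cat[32];
  memcpy(cat, lo, 16);
  memcpy(cat + 16, hi, 16);
  memcpy(dst, cat + n, 16);
}

/* Returns 1 when the unpack + align sequence reproduces the PSHUFB result
 * for the filter mask 0,1,1,2,...,7,7,8 used in the example. */
static int unpack_align_matches_pshufb(const uint8_t reg[16]) {
  static const uint8_t filter[16] = { 0, 1, 1, 2, 2, 3, 3, 4,
                                      4, 5, 5, 6, 6, 7, 7, 8 };
  uint8_t want[16], reg1[16], reg2[16], got[16];
  model_pshufb(want, reg, filter);   /* original single-instruction form */
  model_punpck(reg1, reg, reg, 0);   /* REG1 = PUNPCKLBW Reg, Reg */
  model_punpck(reg2, reg, reg, 8);   /* REG2 = PUNPCKHBW Reg, Reg */
  model_palignr(got, reg2, reg1, 1); /* PALIGNR REG2, REG1, 1 */
  return memcmp(want, got, 16) == 0;
}
```

Calling `unpack_align_matches_pshufb` with any 16-byte input should return 1, since the two sequences select the same source bytes in the same order.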
  7. 06 Dec, 2014 4 commits
  8. 05 Dec, 2014 5 commits