1. 11 Dec, 2014 3 commits
    • Multiframe Quality Enhancement(MFQE) in VP9. · 7ac3e3c1
      JackyChen authored
      This is the first version of MFQE in VP9. It includes a few TODOs.
      Usage: Add the flag --enable-vp9-postproc when configuring the project.
      In the decoder, pass --mfqe on the command line to enable
      MFQE in postproc.
      Note: A low-quality key frame is needed to see the effect of this
      patch. In my experiment, I fixed the qindex to 200 in the key frame.
      
      Change-Id: I021f9ce4616ed3574c81e48d968662994b56a396
    • VP9 common for ARMv8 by using NEON intrinsics 18 · 3f7c12da
      James Yu authored
      Add vp9_idct32x32_add_neon.c
      - vp9_idct32x32_1024_add_neon
      
      Change-Id: Ic598b772c28bd3487a8ead7a4598a66b25f9b00f
      Signed-off-by: James Yu <james.yu@linaro.org>
    • VP9 common for ARMv8 by using NEON intrinsics 14 · 3cfed4bf
      James Yu authored
      Add vp9_idct16x16_add_neon.c
      - vp9_idct16x16_256_add_neon_pass1
      - vp9_idct16x16_256_add_neon_pass2
      - vp9_idct16x16_10_add_neon_pass1
      - vp9_idct16x16_10_add_neon_pass2
      
      Change-Id: I54d25b54a36f4371760f54e4036693aaea40a5de
      Signed-off-by: James Yu <james.yu@linaro.org>
  2. 10 Dec, 2014 10 commits
  3. 09 Dec, 2014 1 commit
  4. 23 Sep, 2014 1 commit
  5. 18 Sep, 2014 1 commit
  6. 16 Sep, 2014 1 commit
  7. 05 Sep, 2014 1 commit
    • Removing postproc mmx code. · 1100e262
      Dmitry Kovalev authored
      Removed functions:
      * vp9_post_proc_down_and_across_mmx
      * vp9_mbpost_proc_down_mmx
      * vp9_plane_add_noise_mmx
      
      They all have SSE2 equivalents.
      
      Change-Id: I59c1fac12b7c96ca4538d455e4400c2b7875feff
  8. 05 Aug, 2014 1 commit
    • Remove vp9_postproc_x86.h · 7516abc7
      Johann authored
      This configuration has moved to vp9_rtcd_defs.pl
      
      Change-Id: I71a31dbb8d79df226b60dd834324a5af69956c51
  9. 07 Jul, 2014 1 commit
    • Move vp9_thread.* to common. · 337e8015
      hkuang authored
      To prepare for frame parallel decoding, the reference-counted buffers
      need to be protected by a mutex. Move vp9_thread.* to the common
      folder so that those buffers can use the cross-platform mutex
      from vp9_thread.*.
      
      Change-Id: I541277cf15eefed6641555944f67f4a0bcdc8154
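The idea of guarding buffer reference counts with a mutex can be sketched as follows. This is only an illustration under stated assumptions: it uses POSIX `pthread` calls directly rather than the wrapper in vp9_thread.*, and every `sketch_` name and the pool size are hypothetical, not taken from libvpx.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SKETCH_NUM_BUFS 8 /* hypothetical pool size */

/* Hypothetical pool: one mutex guards all reference counts. */
typedef struct {
  pthread_mutex_t lock;
  int ref_count[SKETCH_NUM_BUFS];
} sketch_buffer_pool;

void sketch_pool_init(sketch_buffer_pool *pool) {
  assert(pool != NULL);
  pthread_mutex_init(&pool->lock, NULL);
  for (int i = 0; i < SKETCH_NUM_BUFS; ++i) pool->ref_count[i] = 0;
}

/* Take a reference on buffer idx; returns the new count. */
int sketch_ref(sketch_buffer_pool *pool, int idx) {
  assert(idx >= 0 && idx < SKETCH_NUM_BUFS);
  pthread_mutex_lock(&pool->lock);
  const int count = ++pool->ref_count[idx];
  pthread_mutex_unlock(&pool->lock);
  return count;
}

/* Drop a reference on buffer idx; the count never goes below zero.
 * Returns the new count, so 0 means the buffer is free for reuse. */
int sketch_unref(sketch_buffer_pool *pool, int idx) {
  assert(idx >= 0 && idx < SKETCH_NUM_BUFS);
  pthread_mutex_lock(&pool->lock);
  if (pool->ref_count[idx] > 0) --pool->ref_count[idx];
  const int count = pool->ref_count[idx];
  pthread_mutex_unlock(&pool->lock);
  return count;
}
```

With several decoder threads sharing one pool, each thread would call `sketch_ref` before reading a reference frame and `sketch_unref` afterwards, so the counts stay consistent without any thread touching them unlocked.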
  10. 23 May, 2014 1 commit
    • Inverse 16x16 2D-DCT SSSE3 implementation · 48b08913
      Jingning Han authored
      This commit enables the SSSE3 implementation of full inverse 16x16
      2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles,
      about 7% speed-up.
      
      Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
  11. 22 May, 2014 1 commit
  12. 21 May, 2014 1 commit
    • Renames x86_64 specific asm files · e2722734
      Deb Mukherjee authored
      Renames all x86_64 specific assembly files to consistently
      end in _x86_64.asm. This will be useful for build systems to
      handle these files differently.
      All new 64-bit specific assembly files should use the new
      naming convention.
      
      Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
  13. 12 May, 2014 1 commit
  14. 05 May, 2014 1 commit
    • SSSE3 implementation of full inverse 8x8 2D-DCT · 52ae97b6
      Jingning Han authored
      This commit enables the SSSE3 version of the full inverse 8x8 2D-DCT and
      reconstruction. It brings the runtime of vp9_idct8x8_64_add down
      from 256 cycles (SSE2) to 246 cycles.
      
      Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3
  15. 06 Mar, 2014 1 commit
  16. 03 Mar, 2014 1 commit
    • build: convert rtcd.sh to perl · 805078a1
      James Zern authored
      significantly speeds up file generation.
      
      the goal of this change is to convert rtcd.sh to perl as directly as
      possible to allow for simple comparison. future changes can make it more
      perl-like.
      
      ---
      Linux
          [CREATE] vpx_scale_rtcd.h
      real    0m0.485s ->    0m0.022s
          [CREATE] vp8_rtcd.h
      real    0m4.619s ->    0m0.060s
          [CREATE] vp9_rtcd.h
      real    0m10.102s ->    0m0.087s
      
      Windows
          [CREATE] vpx_scale_rtcd.h
      real    0m8.360s ->    0m0.080s
          [CREATE] vp8_rtcd.h
      real    1m8.083s ->    0m0.160s
          [CREATE] vp9_rtcd.h
      real    2m6.489s ->    0m0.233s
      
      Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee
  17. 27 Feb, 2014 1 commit
  18. 14 Feb, 2014 1 commit
    • SSSE3 convolution optimization · 3068d7d9
      levytamar82 authored
      Optimizing all SSSE3 assembly for convolution:
      1. vp9_filter_block1d4_h8_sse2
      2. vp9_filter_block1d8_h8_sse2
      3. vp9_filter_block1d16_h8_sse2
      4. vp9_filter_block1d4_v8_sse2
      5. vp9_filter_block1d8_v8_sse2
      6. vp9_filter_block1d16_v8_sse2
      My optimizations include:
      - processing 2x8 elements in one 128-bit register instead of processing
        8 elements in one 128-bit register.
      - removing unnecessary loads.
      This optimization gives a 2.4% user-level gain for 480p input
      and a 1.6% user-level gain for 720p.
      This optimization is done only for 64-bit.
      
      Change-Id: Ic07fce2f9360329b4f2d956efda1480ae958766b
  19. 13 Feb, 2014 1 commit
    • AVX2 Convolve Optimization · 876c72a0
      levytamar82 authored
      Two convolve functions were optimized for AVX2:
      1. vp9_filter_block1d16_h8
      2. vp9_filter_block1d16_v8
      vp9_filter_block1d16_h8 was optimized for AVX2 by halving the number of
      loop strides: two strides are processed in parallel.
      vp9_filter_block1d16_v8 was optimized in the same way; in addition, some of the
      loads are now done outside the loop, which prevents redundant
      loads.
      This optimization gives a 43% function-level gain and a 1.3% user-level gain.
      It can now be compiled on Windows.
      
      Change-Id: I2714124cfb0c14a77d7a0ce126a20db92ffbf92c
  20. 10 Feb, 2014 1 commit
    • Add get release decoder frame buffer functions. · e8e15279
      Frank Galligan authored
      This CL changes libvpx to call a function when a frame buffer
      is needed for decode. Libvpx will call a release callback when
      no other frames reference the frame buffer. This CL adds a
      default implementation of the frame buffer callbacks. Currently
      only VP9 is supported. A future CL will add support for
      applications to supply their own frame buffer callbacks.
      
      Change-Id: I1405a320118f1cdd95f80c670d52b085a62cb10d
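The get/release callback pattern described in this commit can be sketched roughly as below. This is a minimal illustration, not the libvpx implementation: the `sketch_` types and functions, the fixed pool size, and the index-based return value are all hypothetical stand-ins chosen to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_POOL_SIZE 4 /* hypothetical number of buffers */

/* Hypothetical frame buffer record owned by the pool. */
typedef struct {
  uint8_t *data;
  size_t size;
  int in_use;
} sketch_frame_buffer;

typedef struct {
  sketch_frame_buffer bufs[SKETCH_POOL_SIZE];
} sketch_pool;

/* "Get" callback: called when the decoder needs a buffer of at least
 * min_size bytes. Returns the buffer index, or -1 on failure. */
int sketch_get_frame_buffer(sketch_pool *pool, size_t min_size) {
  assert(pool != NULL);
  for (int i = 0; i < SKETCH_POOL_SIZE; ++i) {
    sketch_frame_buffer *fb = &pool->bufs[i];
    if (fb->in_use) continue;
    if (fb->size < min_size) {
      /* Grow the buffer only when the current one is too small. */
      uint8_t *p = (uint8_t *)realloc(fb->data, min_size);
      if (p == NULL) return -1;
      fb->data = p;
      fb->size = min_size;
    }
    fb->in_use = 1;
    return i;
  }
  return -1; /* every buffer is still referenced */
}

/* "Release" callback: called once no frame references the buffer.
 * The memory is kept so a later "get" can reuse it. */
int sketch_release_frame_buffer(sketch_pool *pool, int idx) {
  assert(pool != NULL);
  if (idx < 0 || idx >= SKETCH_POOL_SIZE || !pool->bufs[idx].in_use) return -1;
  pool->bufs[idx].in_use = 0;
  return 0;
}
```

Keeping released memory in the pool, rather than freeing it, is one plausible design choice here: consecutive frames tend to need buffers of the same size, so reuse avoids repeated allocation.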
  21. 05 Feb, 2014 1 commit
    • *.mk: s/\bUSE_X86INC/CONFIG_USE_X86INC/ · 7cf0c783
      James Zern authored
      CONFIG_USE_X86INC is available to every makefile, there's no need to
      duplicate its value with USE_X86INC
      
      Change-Id: Id12bd5f09cba78abba56ab5a8f56351562e5b8b6
  22. 04 Feb, 2014 1 commit
  23. 03 Feb, 2014 1 commit
    • Optimize bilinear sub-pixel filters in sse2 · 2488cb34
      Yunqing Wang authored
      Using bilinear filters could speed up the codec in real-time mode.
      This patch added sse2 optimizations of bilinear filters that
      operate on different-sized blocks.
      
      Tests showed that the real-time encoder was sped up by 3%.
      
      Change-Id: If99a7ee4385fcc225c3ee7445d962d5752e57c3f
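For readers unfamiliar with bilinear sub-pixel filtering, a scalar reference version of a horizontal 2-tap filter might look like the sketch below. It is not the SSE2 code from this patch. The function name is hypothetical, and the sketch assumes 1/16-pel offsets with two weights that sum to 128 and 7-bit rounding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Horizontal 2-tap bilinear filter (scalar sketch).
 * offset is the sub-pixel position in 1/16-pel steps, 0..15.
 * src must hold at least width + 1 samples. */
void sketch_bilinear_h(const uint8_t *src, uint8_t *dst, int width,
                       int offset) {
  assert(src != NULL && dst != NULL);
  assert(offset >= 0 && offset < 16);
  const int w1 = 8 * offset; /* weight of the right neighbour */
  const int w0 = 128 - w1;   /* weight of the current sample */
  for (int x = 0; x < width; ++x) {
    /* Weighted average with rounding, then scale back by 128. */
    dst[x] = (uint8_t)((src[x] * w0 + src[x + 1] * w1 + 64) >> 7);
  }
}
```

Because only two taps are involved, each output pixel costs two multiplies, which is why a bilinear filter is cheaper than a longer interpolation filter; a SIMD version would compute many such outputs per instruction.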
  24. 01 Feb, 2014 2 commits
  25. 17 Jan, 2014 1 commit
  26. 13 Jan, 2014 1 commit
  27. 10 Jan, 2014 1 commit
  28. 09 Jan, 2014 1 commit
    • SSSE3 convolution optimization · 511d218c
      levytamar82 authored
      Optimizing all SSSE3 assembly for convolution:
      1. vp9_filter_block1d4_h8_sse2
      2. vp9_filter_block1d8_h8_sse2
      3. vp9_filter_block1d16_h8_sse2
      4. vp9_filter_block1d4_v8_sse2
      5. vp9_filter_block1d8_v8_sse2
      6. vp9_filter_block1d16_v8_sse2
      My optimizations include:
      - processing 2x8 elements in one 128-bit register instead of processing
        8 elements in one 128-bit register.
      - removing unnecessary loads.
      This optimization gives a 2.4% user-level gain for 480p input
      and a 1.6% user-level gain for 720p.
      This optimization is done only for 64-bit.
      
      Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969