1. 28 Jul, 2015 2 commits
  2. 26 Jul, 2015 1 commit
    • Refactor vp9_idct.h file · 5ebc8feb
      Jingning Han authored
      Separate the common coefficient constants into vpx_dsp/txfm_common.h.
      Move the SSE2 macro definitions to vpx_dsp/x86/txfm_common_sse2.h.
      This removes the use of vp9_idct.h from the vpx_dsp folder.
      
      Change-Id: I319735a2abf42888e5080ac14cfbcde34be7b121
  3. 21 Jul, 2015 1 commit
  4. 08 Jul, 2015 1 commit
  5. 15 May, 2015 2 commits
  6. 04 Dec, 2014 1 commit
  7. 02 Dec, 2014 1 commit
  8. 05 Nov, 2014 1 commit
  9. 01 May, 2014 1 commit
  10. 24 Oct, 2013 1 commit
  11. 17 Oct, 2013 1 commit
  12. 04 Oct, 2013 1 commit
  13. 03 Oct, 2013 1 commit
    • Simplifying and inlining k_cvtlo_epi16 and k_cvthi_epi16 · 5215b83a
      A.Mahfoodh authored
      Simplify k_cvtlo_epi16 and k_cvthi_epi16 to only two
      instructions, then inline them.
      
      Quoting from Intel's MMX_App_Compute_16bit_Vector.pdf:
      "The PMADDWD instruction multiplies four
      pairs of 16-bit numbers and produces partial sums of the results
      and can do so once per clock (with a three-clock latency)."
      So I am assuming there will be a three-clock overhead after the
      last _mm_madd_pi16 command.
      Even with that overhead, the total number of clocks should generally
      be smaller. I am not sure, though, because I could not find information
      about the number of clocks required for the instructions in k_cvtlo_epi16
      and k_cvthi_epi16. I will run a test and compare the execution times.
      
      Change-Id: Ieda4aa338f69ad3dd196ac6e7892da3cf1b47ea7
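The commit message above does not spell out which two instructions replace k_cvtlo_epi16 and k_cvthi_epi16. One plausible reading, given the PMADDWD quote, is an unpack against zero followed by a multiply-add against ones. The following is a minimal scalar sketch of that idea, not the commit's actual code: `madd_pair`, `cvtlo_model` and `cvthi_model` are hypothetical names, and the pairing with `_mm_unpacklo_epi16` / `_mm_unpackhi_epi16` / `_mm_madd_epi16` is an assumption.

```c
#include <stdint.h>

/* Scalar model of one 32-bit output lane of PMADDWD: multiply two pairs of
 * signed 16-bit values and add the two 32-bit products. */
static int32_t madd_pair(int16_t a0, int16_t a1, int16_t b0, int16_t b1) {
  return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* Hypothetical model of widening the low four 16-bit lanes to 32 bits.
 * Step 1 (assumed _mm_unpacklo_epi16(a, zero)): form the pairs (a[i], 0).
 * Step 2 (assumed _mm_madd_epi16(pairs, ones)): a[i] * 1 + 0 * 1,
 * which is a[i] sign-extended to 32 bits. */
static void cvtlo_model(const int16_t a[8], int32_t out[4]) {
  for (int i = 0; i < 4; ++i) {
    out[i] = madd_pair(a[i], 0, 1, 1);
  }
}

/* Same idea for the high four lanes (assumed _mm_unpackhi_epi16). */
static void cvthi_model(const int16_t a[8], int32_t out[4]) {
  for (int i = 0; i < 4; ++i) {
    out[i] = madd_pair(a[4 + i], 0, 1, 1);
  }
}
```

Because the multiply-add treats its inputs as signed, negative lanes come out correctly sign-extended without a separate sign-mask step, which is presumably where the instruction savings come from.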
  14. 01 Sep, 2013 1 commit
    • Fix 32x32 forward transform SSE2 version · 3cf46fa5
      Jingning Han authored
      This commit fixed the potential overflow issue in the SSE2
      implementation of the 32x32 forward DCT. It resolved the corrupted
      coded frames at the border of scenes.
      
      Change-Id: If87eef2d46209269f74ef27e7295b6707fbf56f9
  15. 12 Aug, 2013 1 commit
    • SSE2 high precision 32x32 forward DCT · 78136edc
      Jingning Han authored
      Enable the SSE2 implementation of the high precision 32x32 forward
      DCT. The intermediate stacks are 32 bits wide. The run time drops from
      32126 cycles to 13442 cycles.
      
      Change-Id: Ib5ccafe3176c65bd6f2dbdef790bd47bbc880e56
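To illustrate why 32-bit intermediates matter in a multi-stage transform, here is a small scalar sketch. It is not taken from the commit: `fits_int16`, `butterfly32` and `first_overflowing_stage` are hypothetical names, and the chain of sum-only butterflies is a deliberately simplified stand-in for the real DCT stages.

```c
#include <stdint.h>

/* Returns 1 if v can be stored in a signed 16-bit lane without loss. */
static int fits_int16(int32_t v) {
  return v >= INT16_MIN && v <= INT16_MAX;
}

/* One butterfly (sum and difference) computed with 32-bit intermediates. */
static void butterfly32(int32_t a, int32_t b, int32_t *sum, int32_t *diff) {
  *sum = a + b;
  *diff = a - b;
}

/* Hypothetical demo: feed the sum output of each butterfly back into the
 * next one, starting from x, and return the first stage (1-based) whose
 * result no longer fits in 16 bits. Returns 0 if every stage fits. */
static int first_overflowing_stage(int16_t x, int stages) {
  int32_t v = x;
  for (int s = 1; s <= stages; ++s) {
    int32_t sum, diff;
    butterfly32(v, v, &sum, &diff);
    (void)diff; /* only the growing sum path is tracked in this demo */
    v = sum;
    if (!fits_int16(v)) return s;
  }
  return 0;
}
```

For example, a starting value of 4096 already exceeds the 16-bit range after three such stages, while the 32-bit intermediate still holds it exactly. A 16-bit lane would have wrapped at that point.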
  16. 06 Aug, 2013 2 commits