    Improve sad3x16 SSE2 function · e7cd8071
    Yunqing Wang authored
    vp9_sad3x16_sse2() is called heavily in the decoder, where its
    unaligned reads consume many CPU cycles. When CONFIG_SUBPELREFMV
    is off, src_ptr is offset by 1 byte from a 4-byte boundary. In
    that case we can adjust src_ptr to be 4-byte aligned and do
    aligned reads instead, which reduces the read time significantly.
    Tests on a 1080p clip showed over 2% decoder performance gain
    with CONFIG_SUBPELREFMV off.
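
    The idea can be sketched in portable scalar C as below. This is
    only an illustration, not the SSE2 code from this change: the
    names load3() and sad3x16_sketch() are hypothetical, and the
    shift assumes a little-endian target (as on x86). When a row
    pointer sits 1 byte past a 4-byte boundary, the sketch steps back
    one byte, does a single aligned 32-bit load, and shifts the extra
    leading byte out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the 3 pixels at p packed into the low
 * 24 bits of a word (p[0] in bits 0..7). Assumes little-endian. */
static uint32_t load3(const uint8_t *p) {
  uint32_t w;
  if (((uintptr_t)p & 3) == 1) {
    /* p - 1 is 4-byte aligned: one aligned 32-bit load covers
     * p[-1], p[0], p[1], p[2]. */
    memcpy(&w, p - 1, sizeof(w));
    w >>= 8; /* drop the extra leading byte p[-1] */
  } else {
    /* Fallback for any other offset: assemble the 3 bytes one by one. */
    w = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
  }
  return w;
}

/* Hypothetical scalar stand-in for a 3x16 SAD: sum of absolute
 * differences over 16 rows of 3 pixels each. */
unsigned int sad3x16_sketch(const uint8_t *src, int src_stride,
                            const uint8_t *ref, int ref_stride) {
  unsigned int sad = 0;
  for (int r = 0; r < 16; r++) {
    const uint32_t s = load3(src + r * src_stride);
    const uint32_t f = load3(ref + r * ref_stride);
    for (int i = 0; i < 3; i++) {
      const int a = (int)((s >> (8 * i)) & 0xff);
      const int b = (int)((f >> (8 * i)) & 0xff);
      sad += (unsigned int)(a > b ? a - b : b - a);
    }
  }
  return sad;
}
```

    The alignment check is done per row, so the sketch stays correct
    even if a stride is not a multiple of 4; only rows whose pointer
    has offset 1 take the aligned path.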
    
    Change-Id: I953afe3ac5406107933ef49d0b695eafba9a6507