Commit 3ce88e13 authored by Allan Sandfeld Jensen

Add AVX2 autovectorized versions of premultiply


Following up on the use of GCC's autovectorization for a faster SSE4.1
premultiply, this patch adds specialized autovectorized versions
of premultiply for AVX2, which almost doubles the speed again.
To make the AVX2 and SSE4.1 speedups available to non-GCC
compilers as well, the target-specific methods have been moved to
separate files.
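
As a rough illustration of the approach described above (not the code from
this patch), the sketch below writes the premultiply loop once in plain C++
and compiles a second copy for AVX2 so the compiler can autovectorize it with
wider registers. All function names here are hypothetical, and the
target attribute and __builtin_cpu_supports are GCC/Clang extensions. A
compiler without such an attribute would instead need the AVX2 copy in its own
source file built with AVX2 enabled, which is presumably why the patch moves
the target-specific methods to separate files.

```cpp
#include <cstddef>
#include <cstdint>

// Premultiply one ARGB32 pixel: scale R, G and B by alpha / 255 with rounding.
// R and B are handled together in one 32-bit word, G on its own.
static inline uint32_t premultiply_pixel(uint32_t p)
{
    const uint32_t a = p >> 24;
    uint32_t rb = (p & 0x00ff00ffu) * a;
    rb = (rb + ((rb >> 8) & 0x00ff00ffu) + 0x00800080u) >> 8;
    rb &= 0x00ff00ffu;
    uint32_t g = ((p >> 8) & 0xffu) * a;
    g = (g + ((g >> 8) & 0xffu) + 0x80u) & 0xff00u;
    return (a << 24) | rb | g;
}

// Baseline loop, vectorized for whatever the default target allows.
void premultiply_generic(uint32_t *buffer, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        buffer[i] = premultiply_pixel(buffer[i]);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// Same loop, but the compiler may use AVX2 instructions when vectorizing it.
__attribute__((target("avx2")))
void premultiply_avx2(uint32_t *buffer, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        buffer[i] = premultiply_pixel(buffer[i]);
}
#endif

// Hypothetical dispatcher: pick the AVX2 copy only if the CPU supports it.
void premultiply(uint32_t *buffer, std::size_t count)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx2")) {
        premultiply_avx2(buffer, count);
        return;
    }
#endif
    premultiply_generic(buffer, count);
}
```

Whether the loop is actually vectorized depends on the compiler version and
optimization level (for GCC, typically -O3 or -O2 with -ftree-vectorize), so
the generated code is worth inspecting before relying on any speedup.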

Change-Id: I97ce05be67f4adeeb9a096eef80fd5fb662099f3
Reviewed-by: Gunnar Sletta <gunnar@sletta.org>
parent 1fc6056f
Showing with 321 additions and 90 deletions