devdocs/gcc~13/vector-extensions.html


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86

<div class="section-level-extent" id="Vector-Extensions"> <div class="nav-panel"> <p> Next: <a href="offsetof" accesskey="n" rel="next">Support for <code class="code">offsetof</code></a>, Previous: <a href="return-address" accesskey="p" rel="prev">Getting the Return or Frame Address of a Function</a>, Up: <a href="c-extensions" accesskey="u" rel="up">Extensions to the C Language Family</a> [<a href="index#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="indices" title="Index" rel="index">Index</a>]</p> </div>  <h1 class="section" id="Using-Vector-Instructions-through-Built-in-Functions"><span>6.52 Using Vector Instructions through Built-in Functions<a class="copiable-link" href="#Using-Vector-Instructions-through-Built-in-Functions"> ¶</a></span></h1> <p>On some targets, the instruction set contains SIMD vector instructions which operate on multiple values contained in one large register at the same time. For example, on the x86 the MMX, 3DNow! and SSE extensions can be used this way. </p> <p>The first step in using these extensions is to provide the necessary data types. This should be done using an appropriate <code class="code">typedef</code>: </p> <div class="example smallexample"> <pre class="example-preformatted" data-language="cpp">typedef int v4si __attribute__ ((vector_size (16)));</pre>
</div> <p>The <code class="code">int</code> type specifies the <em class="dfn">base type</em>, while the attribute specifies the vector size for the variable, measured in bytes. For example, the declaration above causes the compiler to set the mode for the <code class="code">v4si</code> type to be 16 bytes wide and divided into <code class="code">int</code> sized units. For a 32-bit <code class="code">int</code> this means a vector of 4 units of 4 bytes, and the corresponding mode of <code class="code">foo</code> is <abbr class="acronym">V4SI</abbr>. </p> <p>The <code class="code">vector_size</code> attribute is only applicable to integral and floating scalars, although arrays, pointers, and function return values are allowed in conjunction with this construct. Only sizes that are positive power-of-two multiples of the base type size are currently allowed. </p> <p>All the basic integer types can be used as base types, both as signed and as unsigned: <code class="code">char</code>, <code class="code">short</code>, <code class="code">int</code>, <code class="code">long</code>, <code class="code">long long</code>. In addition, <code class="code">float</code> and <code class="code">double</code> can be used to build floating-point vector types. </p> <p>Specifying a combination that is not valid for the current architecture causes GCC to synthesize the instructions using a narrower mode. For example, if you specify a variable of type <code class="code">V4SI</code> and your architecture does not allow for this specific SIMD type, GCC produces code that uses 4 <code class="code">SIs</code>. </p> <p>The types defined in this manner can be used with a subset of normal C operations. Currently, GCC allows using the following operators on these types: <code class="code">+, -, *, /, unary minus, ^, |, &amp;, ~, %</code>. </p> <p>The operations behave like C++ <code class="code">valarrays</code>. Addition is defined as the addition of the corresponding elements of the operands. For example, in the code below, each of the 4 elements in <var class="var">a</var> is added to the corresponding 4 elements in <var class="var">b</var> and the resulting vector is stored in <var class="var">c</var>. </p> <div class="example smallexample"> <pre class="example-preformatted" data-language="cpp">typedef int v4si __attribute__ ((vector_size (16)));

v4si a, b, c;

c = a + b;</pre>
</div> <p>Subtraction, multiplication, division, and the logical operations operate in a similar manner. Likewise, the result of using the unary minus or complement operators on a vector type is a vector whose elements are the negative or complemented values of the corresponding elements in the operand. </p> <p>It is possible to use shifting operators <code class="code">&lt;&lt;</code>, <code class="code">&gt;&gt;</code> on integer-type vectors. The operation is defined as following: <code class="code">{a0,
a1, …, an} &gt;&gt; {b0, b1, …, bn} == {a0 &gt;&gt; b0, a1 &gt;&gt; b1,
…, an &gt;&gt; bn}</code>. Vector operands must have the same number of elements. </p> <p>For convenience, it is allowed to use a binary vector operation where one operand is a scalar. In that case the compiler transforms the scalar operand into a vector where each element is the scalar from the operation. The transformation happens only if the scalar could be safely converted to the vector-element type. Consider the following code. </p> <div class="example smallexample"> <pre class="example-preformatted" data-language="cpp">typedef int v4si __attribute__ ((vector_size (16)));

v4si a, b, c;
long l;

a = b + 1;    /* a = b + {1,1,1,1}; */
a = 2 * b;    /* a = {2,2,2,2} * b; */

a = l + a;    /* Error, cannot convert long to int. */</pre>
</div> <p>Vectors can be subscripted as if the vector were an array with the same number of elements and base type. Out of bound accesses invoke undefined behavior at run time. Warnings for out of bound accesses for vector subscription can be enabled with <samp class="option">-Warray-bounds</samp>. </p> <p>Vector comparison is supported with standard comparison operators: <code class="code">==, !=, &lt;, &lt;=, &gt;, &gt;=</code>. Comparison operands can be vector expressions of integer-type or real-type. Comparison between integer-type vectors and real-type vectors are not supported. The result of the comparison is a vector of the same width and number of elements as the comparison operands with a signed integral element type. </p> <p>Vectors are compared element-wise producing 0 when comparison is false and -1 (constant of the appropriate type where all bits are set) otherwise. Consider the following example. </p> <div class="example smallexample"> <pre class="example-preformatted" data-language="cpp">typedef int v4si __attribute__ ((vector_size (16)));

v4si a = {1,2,3,4};
v4si b = {3,2,1,4};
v4si c;

c = a &gt;  b;     /* The result would be {0, 0,-1, 0}  */
c = a == b;     /* The result would be {0,-1, 0,-1}  */</pre>
</div> <p>In C++, the ternary operator <code class="code">?:</code> is available. <code class="code">a?b:c</code>, where <code class="code">b</code> and <code class="code">c</code> are vectors of the same type and <code class="code">a</code> is an integer vector with the same number of elements of the same size as <code class="code">b</code> and <code class="code">c</code>, computes all three arguments and creates a vector <code class="code">{a[0]?b[0]:c[0], a[1]?b[1]:c[1], …}</code>. Note that unlike in OpenCL, <code class="code">a</code> is thus interpreted as <code class="code">a != 0</code> and not <code class="code">a &lt; 0</code>. As in the case of binary operations, this syntax is also accepted when one of <code class="code">b</code> or <code class="code">c</code> is a scalar that is then transformed into a vector. If both <code class="code">b</code> and <code class="code">c</code> are scalars and the type of <code class="code">true?b:c</code> has the same size as the element type of <code class="code">a</code>, then <code class="code">b</code> and <code class="code">c</code> are converted to a vector type whose elements have this type and with the same number of elements as <code class="code">a</code>. </p> <p>In C++, the logic operators <code class="code">!, &amp;&amp;, ||</code> are available for vectors. <code class="code">!v</code> is equivalent to <code class="code">v == 0</code>, <code class="code">a &amp;&amp; b</code> is equivalent to <code class="code">a!=0 &amp; b!=0</code> and <code class="code">a || b</code> is equivalent to <code class="code">a!=0 | b!=0</code>. For mixed operations between a scalar <code class="code">s</code> and a vector <code class="code">v</code>, <code class="code">s &amp;&amp; v</code> is equivalent to <code class="code">s?v!=0:0</code> (the evaluation is short-circuit) and <code class="code">v &amp;&amp; s</code> is equivalent to <code class="code">v!=0 &amp; (s?-1:0)</code>. </p>  <p>Vector shuffling is available using functions <code class="code">__builtin_shuffle (vec, mask)</code> and <code class="code">__builtin_shuffle (vec0, vec1, mask)</code>. Both functions construct a permutation of elements from one or two vectors and return a vector of the same type as the input vector(s). The <var class="var">mask</var> is an integral vector with the same width (<var class="var">W</var>) and element count (<var class="var">N</var>) as the output vector. </p> <p>The elements of the input vectors are numbered in memory ordering of <var class="var">vec0</var> beginning at 0 and <var class="var">vec1</var> beginning at <var class="var">N</var>. The elements of <var class="var">mask</var> are considered modulo <var class="var">N</var> in the single-operand case and modulo <em class="math">2*<var class="var">N</var></em> in the two-operand case. </p> <p>Consider the following example, </p> <div class="example smallexample"> <pre class="example-preformatted" data-language="cpp">typedef int v4si __attribute__ ((vector_size (16)));

v4si a = {1,2,3,4};
v4si b = {5,6,7,8};
v4si mask1 = {0,1,1,3};
v4si mask2 = {0,4,2,5};
v4si res;

res = __builtin_shuffle (a, mask1);       /* res is {1,2,2,4}  */
res = __builtin_shuffle (a, b, mask2);    /* res is {1,5,3,6}  */</pre>
</div> <p>Note that <code class="code">__builtin_shuffle</code> is intentionally semantically compatible with the OpenCL <code class="code">shuffle</code> and <code class="code">shuffle2</code> functions. </p> <p>You can declare variables and use them in function calls and returns, as well as in assignments and some casts. You can specify a vector type as a return type for a function. Vector types can also be used as function arguments. It is possible to cast from one vector type to another, provided they are of the same size (in fact, you can also cast vectors to and from other datatypes of the same size). </p> <p>You cannot operate between vectors of different lengths or different signedness without a cast. </p>  <p>Vector shuffling is available using the <code class="code">__builtin_shufflevector (vec1, vec2, index...)</code> function. <var class="var">vec1</var> and <var class="var">vec2</var> must be expressions with vector type with a compatible element type. The result of <code class="code">__builtin_shufflevector</code> is a vector with the same element type as <var class="var">vec1</var> and <var class="var">vec2</var> but that has an element count equal to the number of indices specified. </p> <p>The <var class="var">index</var> arguments are a list of integers that specify the elements indices of the first two vectors that should be extracted and returned in a new vector. These element indices are numbered sequentially starting with the first vector, continuing into the second vector. An index of -1 can be used to indicate that the corresponding element in the returned vector is a don’t care and can be freely chosen to optimized the generated code sequence performing the shuffle operation. </p> <p>Consider the following example, </p>
<div class="example smallexample"> <pre class="example-preformatted" data-language="cpp">typedef int v4si __attribute__ ((vector_size (16)));
typedef int v8si __attribute__ ((vector_size (32)));

v8si a = {1,-2,3,-4,5,-6,7,-8};
v4si b = __builtin_shufflevector (a, a, 0, 2, 4, 6); /* b is {1,3,5,7} */
v4si c = {-2,-4,-6,-8};
v8si d = __builtin_shufflevector (c, b, 4, 0, 5, 1, 6, 2, 7, 3); /* d is a */</pre>
</div>  <p>Vector conversion is available using the <code class="code">__builtin_convertvector (vec, vectype)</code> function. <var class="var">vec</var> must be an expression with integral or floating vector type and <var class="var">vectype</var> an integral or floating vector type with the same number of elements. The result has <var class="var">vectype</var> type and value of a C cast of every element of <var class="var">vec</var> to the element type of <var class="var">vectype</var>. </p> <p>Consider the following example, </p>
<div class="example smallexample"> <pre class="example-preformatted" data-language="cpp">typedef int v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));
typedef double v4df __attribute__ ((vector_size (32)));
typedef unsigned long long v4di __attribute__ ((vector_size (32)));

v4si a = {1,-2,3,-4};
v4sf b = {1.5f,-2.5f,3.f,7.f};
v4di c = {1ULL,5ULL,0ULL,10ULL};
v4sf d = __builtin_convertvector (a, v4sf); /* d is {1.f,-2.f,3.f,-4.f} */
/* Equivalent of:
   v4sf d = { (float)a[0], (float)a[1], (float)a[2], (float)a[3] }; */
v4df e = __builtin_convertvector (a, v4df); /* e is {1.,-2.,3.,-4.} */
v4df f = __builtin_convertvector (b, v4df); /* f is {1.5,-2.5,3.,7.} */
v4si g = __builtin_convertvector (f, v4si); /* g is {1,-2,3,7} */
v4si h = __builtin_convertvector (c, v4si); /* h is {1,5,0,10} */</pre>
</div>  <p>Sometimes it is desirable to write code using a mix of generic vector operations (for clarity) and machine-specific vector intrinsics (to access vector instructions that are not exposed via generic built-ins). On x86, intrinsic functions for integer vectors typically use the same vector type <code class="code">__m128i</code> irrespective of how they interpret the vector, making it necessary to cast their arguments and return values from/to other vector types. In C, you can make use of a <code class="code">union</code> type: </p>
<div class="example smallexample"> <pre class="example-preformatted" data-language="cpp">#include &lt;immintrin.h&gt;

typedef unsigned char u8x16 __attribute__ ((vector_size (16)));
typedef unsigned int  u32x4 __attribute__ ((vector_size (16)));

typedef union {
        __m128i mm;
        u8x16   u8;
        u32x4   u32;
} v128;</pre>
</div> <p>for variables that can be used with both built-in operators and x86 intrinsics: </p> <div class="example smallexample"> <pre class="example-preformatted" data-language="cpp">v128 x, y = { 0 };
memcpy (&amp;x, ptr, sizeof x);
y.u8  += 0x80;
x.mm  = _mm_adds_epu8 (x.mm, y.mm);
x.u32 &amp;= 0xffffff;

/* Instead of a variable, a compound literal may be used to pass the
   return value of an intrinsic call to a function expecting the union: */
v128 foo (v128);
x = foo ((v128) {_mm_adds_epu8 (x.mm, y.mm)});</pre>
</div> </div>  <div class="nav-panel"> <p> Next: <a href="offsetof">Support for <code class="code">offsetof</code></a>, Previous: <a href="return-address">Getting the Return or Frame Address of a Function</a>, Up: <a href="c-extensions">Extensions to the C Language Family</a> [<a href="index#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="indices" title="Index" rel="index">Index</a>]</p> </div><div class="_attribution">
  <p class="_attribution-p">
    &copy; Free Software Foundation<br>Licensed under the GNU Free Documentation License, Version 1.3.<br>
    <a href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Vector-Extensions.html" class="_attribution-link">https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Vector-Extensions.html</a>
  </p>
</div>