Diffstat (limited to 'devdocs/gcc~13/half-precision.html')
| -rw-r--r-- | devdocs/gcc~13/half-precision.html | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/devdocs/gcc~13/half-precision.html b/devdocs/gcc~13/half-precision.html new file mode 100644 index 00000000..d10c787c --- /dev/null +++ b/devdocs/gcc~13/half-precision.html @@ -0,0 +1,6 @@ +<div class="section-level-extent" id="Half-Precision"> <div class="nav-panel"> <p> Next: <a href="decimal-float" accesskey="n" rel="next">Decimal Floating Types</a>, Previous: <a href="floating-types" accesskey="p" rel="prev">Additional Floating Types</a>, Up: <a href="c-extensions" accesskey="u" rel="up">Extensions to the C Language Family</a> [<a href="index#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="indices" title="Index" rel="index">Index</a>]</p> </div> <h1 class="section" id="Half-Precision-Floating-Point"><span>6.13 Half-Precision Floating Point<a class="copiable-link" href="#Half-Precision-Floating-Point"> ¶</a></span></h1> <p>On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating point via the <code class="code">__fp16</code> type defined in the ARM C Language Extensions. On ARM systems, you must enable this type explicitly with the <samp class="option">-mfp16-format</samp> command-line option in order to use it. On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit) floating point via the <code class="code">_Float16</code> type. For C++, x86 provides a built-in type named <code class="code">_Float16</code> that uses the same data format as in C. </p> <p>ARM targets support two incompatible representations for half-precision floating-point values. You must choose one of the representations and use it consistently in your program. </p> <p>Specifying <samp class="option">-mfp16-format=ieee</samp> selects the IEEE 754-2008 format. This format can represent normalized values in the range of <em class="math">2^{-14}</em> to 65504. There are 11 bits of significand precision, approximately 3 decimal digits. 
</p> <p>Specifying <samp class="option">-mfp16-format=alternative</samp> selects the ARM alternative format. This representation is similar to the IEEE format, but does not support infinities or NaNs. Instead, the range of exponents is extended, so that this format can represent normalized values in the range of <em class="math">2^{-14}</em> to 131008. </p> <p>The GCC port for AArch64 supports only the IEEE 754-2008 format, and does not require use of the <samp class="option">-mfp16-format</samp> command-line option. </p> <p>The <code class="code">__fp16</code> type may only be used as an argument to intrinsics defined in <code class="code"><arm_fp16.h></code>, or as a storage format. For purposes of arithmetic and other operations, <code class="code">__fp16</code> values in C or C++ expressions are automatically promoted to <code class="code">float</code>. </p> <p>The ARM target provides hardware support for conversions between <code class="code">__fp16</code> and <code class="code">float</code> values as an extension to VFP and NEON (Advanced SIMD), and, from ARMv8-A onward, provides hardware support for conversions between <code class="code">__fp16</code> and <code class="code">double</code> values. GCC generates code using these hardware instructions if you compile with options to select an FPU that provides them; for example, <samp class="option">-mfpu=neon-fp16 -mfloat-abi=softfp</samp>, in addition to the <samp class="option">-mfp16-format</samp> option to select a half-precision format. </p> <p>Language-level support for the <code class="code">__fp16</code> data type is independent of whether GCC generates code using hardware floating-point instructions. In cases where hardware support is not specified, GCC implements conversions between <code class="code">__fp16</code> and other types as library calls. </p> <p>It is recommended that portable code use the <code class="code">_Float16</code> type defined by ISO/IEC TS 18661-3:2015. 
See <a class="xref" href="floating-types">Additional Floating Types</a>. </p> <p>On x86 targets with SSE2 enabled, without <samp class="option">-mavx512fp16</samp>, all operations are emulated in software using <code class="code">float</code> instructions. The default behavior for <code class="code">FLT_EVAL_METHOD</code> is to keep the intermediate result of each operation at 32-bit precision. This may lead to inconsistent behavior between software emulation and AVX512-FP16 instructions. Using <samp class="option">-fexcess-precision=16</samp> forces rounding back to 16-bit precision after each operation. </p> <p>Using <samp class="option">-mavx512fp16</samp> generates AVX512-FP16 instructions instead of software emulation. The default behavior of <code class="code">FLT_EVAL_METHOD</code> is to round after each operation. The same is true with <samp class="option">-fexcess-precision=standard</samp> and <samp class="option">-mfpmath=sse</samp>. Without <samp class="option">-mfpmath=sse</samp>, <samp class="option">-fexcess-precision=standard</samp> alone behaves as it did before, which is useful for code that does not use <code class="code">_Float16</code> and runs on the x87 FPU. </p> </div> <div class="nav-panel"> <p> Next: <a href="decimal-float">Decimal Floating Types</a>, Previous: <a href="floating-types">Additional Floating Types</a>, Up: <a href="c-extensions">Extensions to the C Language Family</a> [<a href="index#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="indices" title="Index" rel="index">Index</a>]</p> </div><div class="_attribution"> + <p class="_attribution-p"> + © Free Software Foundation<br>Licensed under the GNU Free Documentation License, Version 1.3.<br> + <a href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Half-Precision.html" class="_attribution-link">https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Half-Precision.html</a> + </p> +</div> |
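The range limits quoted in the page above for the two ARM half-precision representations (65504 for IEEE 754-2008, 131008 for the ARM alternative format, and 2^{-14} as the smallest normalized value) can be checked by decoding the 16-bit patterns by hand. The following is a minimal sketch in portable C that deliberately avoids `__fp16` and `_Float16`, so it does not depend on any target option. The function names `half_bits_to_double` and `scale_by_pow2` are hypothetical and are not part of GCC or the ARM C Language Extensions. The layout (1 sign bit, 5 exponent bits with bias 15, 10 stored fraction bits) and the identical treatment of exponent 0 in both formats are assumptions drawn from the usual description of these formats rather than from the page itself.

```c
#include <math.h>    /* INFINITY and NAN macros only */
#include <stdint.h>

/* Hypothetical helper: multiply value by 2^exp2 using exact doublings/halvings. */
static double scale_by_pow2(double value, int exp2)
{
    for (; exp2 > 0; --exp2) value *= 2.0;
    for (; exp2 < 0; ++exp2) value /= 2.0;
    return value;
}

/* Hypothetical helper: decode a 16-bit half-precision bit pattern.
 * alternative == 0: IEEE 754-2008 layout, exponent field 31 means Inf/NaN.
 * alternative != 0: ARM alternative layout, exponent field 31 is an
 *                   ordinary exponent, so there are no infinities or NaNs. */
double half_bits_to_double(uint16_t bits, int alternative)
{
    int sign     = (bits >> 15) & 0x1;
    int exponent = (bits >> 10) & 0x1F;   /* biased by 15 */
    int fraction = bits & 0x3FF;          /* 10 stored bits */
    double magnitude;

    if (exponent == 0) {
        /* Zero or subnormal: no implicit leading 1 (assumed same in both formats). */
        magnitude = scale_by_pow2((double)fraction, -24);
    } else if (exponent == 31 && !alternative) {
        magnitude = (fraction == 0) ? INFINITY : NAN;
    } else {
        /* Normalized: (1 + fraction/1024) * 2^(exponent - 15),
         * i.e. an 11-bit significand including the implicit leading 1. */
        magnitude = scale_by_pow2((double)(1024 + fraction), exponent - 25);
    }
    return sign ? -magnitude : magnitude;
}
```

Under these assumptions, `half_bits_to_double(0x7BFF, 0)` gives the largest finite IEEE value, `half_bits_to_double(0x7FFF, 1)` gives the largest alternative-format value, and `half_bits_to_double(0x0400, 0)` gives the smallest normalized value in either format.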
