Skip to content

Latest commit

 

History

History
54 lines (39 loc) · 2.46 KB

FP_error.md

File metadata and controls

54 lines (39 loc) · 2.46 KB

Function to bound

The function we compute for atomic and reduction use case is

f(N,M)=\sum^{N \cdot M}\frac{1}{M}

Where N is know and M in in [1,M_{max}] and u is the machine epsilon

Relative upper bound error

The upper bound of the Relative error is given in "The Accuracy of Floating Point Summation" (2.6) from Nicholas J. Higham https://doi.org/10.1137/0914050. Applied to f(N,M) it give:

\begin{align*} | RE(f(N,M)) | & \le \frac{(N \cdot M-1) \cdot u \cdot f(N,M)}{N} \ & \le \frac{(N \cdot M-1) \cdot u \cdot N}{N} \ & \le N \cdot M \cdot u \end{align*}

Maximun Relative Error

\begin{align*} \max_{M \in [1,M_{max}]} R_E(N,M) = & \max_{M \in [1,M_{max}]}  N \cdot M \cdot u \ = & N \cdot M_{max} \cdot u \end{align*}

For FP32, maximun problem size to get less than 1%

\begin{align*} 1% & \ge NM \cdot 2^{-24} \ NM & \le \frac{1%}{2^{-24}} \ & \le 167772 \le 55 \cdot 55 \cdot 55 \end{align*}

Empirical analysis

import numpy as np
import sys

def sum_dec32(N,M):
	inc = np.float32(1) / np.float32(M)
	s = np.float32(0)
	for i in range(N*M):
		s += inc

	return abs(N-s)/N

N, M_max = map(int,sys.argv[1:])
m_error,m = max((sum_dec32(N,M),M) for M in range(1,N_max+1))
print (m, f"{m_error:.3%}".format(m_error))

This give for N = 55*55 and M = 55 a error of 0.187%.