30
Introduction to floating point arithmetic Matthias Petschow and Paolo Bientinesi AICES, RWTH Aachen [email protected] October 24th, 2013 Aachen, Germany Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 1 / 21

Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Introduction to floating point arithmetic

Matthias Petschow and Paolo Bientinesi

AICES, RWTH [email protected]

October 24th, 2013Aachen, Germany

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 1 / 21

Page 2: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

“Disclaimer”

Muller et al. - Handbook ofFloating-Point Arithmetic -595 pages!

Many topics not covered in thislecture (e.g., hardware/softwareimplementation of FPA, languagesupport, cleverly using FPA)

This lecture: basics of floatingpoint representation and arithmetic

Goal: Make you aware of the issuesarising in finite precisioncomputations

See the references below for a morethorough treatment of the topic

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 2 / 21

Page 3: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

References

Article: “What every computer scientist should know

about floating-point arithmetic”, by David Goldberg

Article: “Lecture Notes on the Status of IEEE

Standard 754 for Binary Floating-Point

Arithmetic”, by William Kahan

Book: “Accuracy and Stability of Numerical

Algorithms”, by Nick Higham

Book: “Numerical Computing with IEEE Floating

Point Arithmetic”, by Michael Overton

IEEE 754-1985 and IEEE 754-2008:“Standard for Floating-Point Arithmetic”

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 3 / 21

Page 4: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Numerical Representation

Numbers123

29= (first 40 digits)

4.241379310344827586206896551724137931034 . . .

π =3.141592653589793238462643383279502884197 . . .

In general: Infinite number of digits

Computers

Finite memory ⇒ Approximated numbers

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 4 / 21

Page 5: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Numerical Representation

Numbers123

29= (first 40 digits)

4.241379310344827586206896551724137931034 . . .

π =3.141592653589793238462643383279502884197 . . .

In general: Infinite number of digits

Computers

Finite memory ⇒ Approximated numbers

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 4 / 21

Page 6: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

FP representation

Floating point system F\{0}

(−1)s × d0.d1d2d3 . . . dp−1 × βewith base β, precision p, and

s ∈ {0, 1}di ∈ {0, . . . , β − 1}d0 6= 0

emin ≤ e ≤ emax

IEEE-754 specifications (β = 2, d0 = 1)

Single: p = 24, emin = −126, emax = 127Double: p = 53, emin = −1022, emax = 1023Additionally: ±0, ±∞, subnormal numbers, NaNs

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 5 / 21

Page 7: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

IEEE single precision

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 6 / 21

Page 8: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

IEEE single precision

(−1)s × (1 + f)× 2E−127

f = d1 · 2−1 + d2 · 2−2 + d3 · 2−3 + . . .+ d23 · 2−23

e = E − 127, biased exponent 1 ≤ E ≤ 254

Special values for E = 0:F = 0 : ±0F 6= 0 : subnormal numbers with d0 = 0 and implicit

exponent e = −126

Special values for E = 255:F = 0 : ±∞F 6= 0 : NaNs

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 7 / 21

Page 9: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

IEEE double precision

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 8 / 21

Page 10: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Example questions

Question 1What is the largest finite floating point number is IEEE singleprecision?

Question 2What is the smallest positive normalized floating point number isIEEE single precision? What is the smallest positive number?

Question 3How many normalized IEEE single precision numbers and how manysubnormal numbers are there?

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 9 / 21

Page 11: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Example questions

Question 4

Given IEEE single/double precision (p = 24, 53), what is the gapbetween 1 and the next larger number?

Question 5Between an adjacent pair of nonzero IEEE single precision realnumbers, how many IEEE double precision numbers are there?

Question 6What is the largest integer u such that all integer in the interval[−u, u] are exactly representable in IEEE single precision format?What is the corresponding u for double precision?

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 10 / 21

Page 12: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Representation error

Relative rounding error

Let x ∈ R and |x| ∈ [ωmin, ωmax], then

x̄ = x(1 + δ) where |δ| ≤ u

andx̄ = x/(1 + δ̃) where |δ̃| ≤ u

with u denoting the unit roundoff.

ωmin = smallest normalized positive floating point number

ωmax = largest normalized positive floating point number

x̄ = [x] = floating point representation of x

If rounded to nearest float, x̄ = RN(x) and u = 2−p

For other rounding modes, u = 2−p+1

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 11 / 21

Page 13: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Example questions

TaskGiven a base 2 floating point format with precision p, show that ifx ∈ R lies in the normalized range thanRN(x) = x(1 + δ), with |δ| ≤ 2−p,where RN() rounds to the nearest floating point number.

Question

What if RN(x) is subnormal? Find an example where the above isnot true.

Question

What if we truncate – that is round to zero RZ() – instead ofrounding to the nearest number RN()?

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 12 / 21

Page 14: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Finite precision arithmetic4-digit representation

Inexact Arithmetic

123.4 +.5678 =

123.9678 =

Truncated 123.9Rounded 124.0

Associativity?

No!

Exact arithmetic:(123.4 + .5678) + .5432 =123.4 + (.5678 + .5432)

Inexact arithmetic:(123.4 + .5678) + .5432 = 124.4123.4 + (.5678 + .5432) = 124.5

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 13 / 21

Page 15: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Finite precision arithmetic4-digit representation

Inexact Arithmetic

123.4 +.5678 =

123.9678 =

Truncated 123.9Rounded 124.0

Associativity?

No!

Exact arithmetic:(123.4 + .5678) + .5432 =123.4 + (.5678 + .5432)

Inexact arithmetic:(123.4 + .5678) + .5432 = 124.4123.4 + (.5678 + .5432) = 124.5

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 13 / 21

Page 16: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Finite precision arithmetic4-digit representation

Inexact Arithmetic

123.4 +.5678 =

123.9678 =Truncated 123.9

Rounded 124.0

Associativity?

No!

Exact arithmetic:(123.4 + .5678) + .5432 =123.4 + (.5678 + .5432)

Inexact arithmetic:(123.4 + .5678) + .5432 = 124.4123.4 + (.5678 + .5432) = 124.5

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 13 / 21

Page 17: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Finite precision arithmetic4-digit representation

Inexact Arithmetic

123.4 +.5678 =

123.9678 =Truncated 123.9

Rounded 124.0

Associativity?

No!

Exact arithmetic:(123.4 + .5678) + .5432 =123.4 + (.5678 + .5432)

Inexact arithmetic:(123.4 + .5678) + .5432 = 124.4123.4 + (.5678 + .5432) = 124.5

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 13 / 21

Page 18: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Finite precision arithmetic4-digit representation

Inexact Arithmetic

123.4 +.5678 =

123.9678 =Truncated 123.9

Rounded 124.0

Associativity?

No!

Exact arithmetic:(123.4 + .5678) + .5432 =123.4 + (.5678 + .5432)

Inexact arithmetic:(123.4 + .5678) + .5432 = 124.4

123.4 + (.5678 + .5432) = 124.5

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 13 / 21

Page 19: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Finite precision arithmetic4-digit representation

Inexact Arithmetic

123.4 +.5678 =

123.9678 =Truncated 123.9

Rounded 124.0

Associativity? No!

Exact arithmetic:(123.4 + .5678) + .5432 =123.4 + (.5678 + .5432)

Inexact arithmetic:(123.4 + .5678) + .5432 = 124.4123.4 + (.5678 + .5432) = 124.5

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 13 / 21

Page 20: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

IEEE FP arithmetic

The standard floating point model

Given operation � ∈ {+,−,×, /} and x, y ∈ F0,

[x � y] = RN(x � y) = (x � y)(1 + δ)

with |δ| ≤ u, provided no underflow/overflow occurred.

RN() can be replaced by other rounding modes.

For RN() we have seen that u = 2−p.

Also, (x � y)(1 + δ) can be replaced by (x � y)/(1 + δ).

Similarly,√x = RN(x).

What about sin(x), cos(x), exp(x), log(x), ...?

Notation: Assuming a left-to-right evaluation, it holds

[x+ y + z/w] =[[

[x] + [y]]

+[[z]/[w]

]]Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 14 / 21

Page 21: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Example questions

Question 1

If x is a floating point number, is the floating point product [1× x]equal to x?

Question 2

If x 6= 0 is a (finite) floating point number, is the floating pointquotient [x/x] equal to 1?

Question 3If x is a floating point number, is the floating point product[0.5× x] equal to floating point quotient [x/2]?

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 15 / 21

Page 22: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Example questions

Question 4

Is it true that for all a, b ∈ R we have [a+ b] = [b+ a] and[a× b] = [b× a]?

Question 5

Let a = −1, b = 1, and c = 2−25. In IEEE single precisionarithmetic, what are the results of [a+ [b+ c]] and [[a+ b] + c]?

Question 6

Let a = b = 2513, and c = 2−1022. In IEEE double precisionarithmetic, what are the results of [a× [b× c]] and [[a× b]× c]?

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 16 / 21

Page 23: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Example question

Question 7

What is the result calling the function fun(2.0,0.0) defined below(using IEEE floating point values)?

fun(a, b) {res = 1/(1/a+ 1/b)return(res) }

Question 8What is a potential problem of the following program? How can itbe fixed?

hypot(a, b) {c = a2 + b2

c = sqrt(c)return(c) }

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 17 / 21

Page 24: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Example questions

Task

Calculating the roots of a quadratic polynomial with roots() asbelow can cause inaccurate results (e.g. a = c = 1, b = −108 fordouble precision). Write a code that avoids cancellation as much aspossible, e.g. by using r1r2 = c/a. Are there other potentialproblems, such as overflow etc.?

roots(a, b, c) {r1 = −b+

√b2−4ac2a

r2 = −b−√b2−4ac2a

return(r1, r2) }

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 18 / 21

Page 25: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Error Analysis

f : A→ B, y = f(x)

Es.: f(x) = x2 + sin(2 ∗ x)

x =π

123, f(x) =?

Exact arithmetic:( π

123

)2+ sin

(2 ∗ π

123

)= ...

Inexact arithmetic:

x x̂, f f̂ f̂(x̂) instead of f(x)

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 19 / 21

Page 26: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Error Analysis

f : A→ B, y = f(x)

Es.: f(x) = x2 + sin(2 ∗ x)

x =π

123, f(x) =?

Exact arithmetic:( π

123

)2+ sin

(2 ∗ π

123

)= ...

Inexact arithmetic:

x x̂, f f̂ f̂(x̂) instead of f(x)

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 19 / 21

Page 27: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Error Analysis

f : A→ B, y = f(x)

Es.: f(x) = x2 + sin(2 ∗ x)

x =π

123, f(x) =?

Exact arithmetic:( π

123

)2+ sin

(2 ∗ π

123

)= ...

Inexact arithmetic:

x x̂, f f̂ f̂(x̂) instead of f(x)

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 19 / 21

Page 28: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Example: Dot Product

x, y ∈ Rn; κ := xT y

κ :=((

(χ0ψ0 + χ1ψ1) + · · ·)

+ χn−2ψn−2

)+ χn−1ψn−1

κ̌ =

(((χ0ψ0(1 + ε

(0)∗ ) + χ1ψ1(1 + ε

(1)∗ ))(1 + ε

(1)+ ) + · · ·

)(1 + ε

(n−2)+ )

+χn−1ψn−1(1 + ε(n−1)∗ )

)(1 + ε

(n−1)+ )

=

n−1∑i=0

χiψi(1 + ε(i)∗ )

n−1∏j=i

(1 + ε(j)+ )

where ε

(0)+ = 0 and |ε(0)∗ |, |ε(j)∗ |, |ε(j)+ | ≤ u for j = 1, . . . , n− 1

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 20 / 21

Page 29: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Backward Stability

Let f : D → R be a map from the domain D to the range R.Let f̂ : D → R represent the execution in floating point arithmetic of a givenalgorithm A that computes f .

A is said to be backward stable iffor all x ∈ D there exists a perturbed input x̄ ∈ D, close to x, such thatf̂(x) = f(x̄).

I.e., the result computed in floating point arithmetic (f̂(x)) equals the resultobtained when the mathematically exact function (f) is applied to slightlyperturbed data (x̄).

The difference between x̄ and x, is the perturbation to the original input x.

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 21 / 21

Page 30: Matthias Petschow and Paolo Bientinesi - umu.sehpac.cs.umu.se/teaching/lsc-13/lect2-fpa.pdf · 2021. 2. 6. · Introduction to oating point arithmetic Matthias Petschow and Paolo

Backward Stability

Let f : D → R be a map from the domain D to the range R.Let f̂ : D → R represent the execution in floating point arithmetic of a givenalgorithm A that computes f .

A is said to be backward stable iffor all x ∈ D there exists a perturbed input x̄ ∈ D, close to x, such thatf̂(x) = f(x̄).

I.e., the result computed in floating point arithmetic (f̂(x)) equals the resultobtained when the mathematically exact function (f) is applied to slightlyperturbed data (x̄).

The difference between x̄ and x, is the perturbation to the original input x.

Matthias Petschow (AICES, RWTH Aachen) Floating Point October 24th, 2013 21 / 21