Visual Basic 2008 9.0 .NET Examples and Ebook
Floating Point Notation - Single Double Decimal


Floating Point Notation - Problems


The following example tries to split a Euro amount into units ( bills and coins ).


Module Example1
    Sub Main()
        Dim units As Single() = _
           {500, 200, 100, 50, 20, 10, 5, 2, 1, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01}
        '
        Dim amount As Single = 0.06
        Console.Write(amount & " : ")
        '
        Dim index As Integer
        Do While amount > 0
            Do While amount - units(index) >= 0
                Console.Write(units(index) & " ")
                amount -= units(index)
            Loop
            index += 1
        Loop
        Console.WriteLine()
        '
        Console.ReadLine()
    End Sub
End Module

An IndexOutOfRangeException ( a runtime error ) occurs at index 15. Looking at the above example, you would expect amount to have reached 0 by index 14, so that index 15, which is indeed out of range, would never be reached.
When 'index' reaches 12 and 'units(index)' evaluates to 0.05, the subtraction 0.06 - 0.05 is performed. The result is not 0.01 but 0.009999998, which is slightly less than the stored value of 0.01, so the last unit is never subtracted and amount never reaches 0.
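
One possible way to avoid the exception, sketched here under the assumption that all amounts are exact base-10 values, is to use the Decimal datatype instead of Single. The module name Example1Decimal and the extra bounds check on 'index' are additions for illustration and are not part of the original example.

```vbnet
Module Example1Decimal
    Sub Main()
        ' The D suffix makes each literal a Decimal, which stores base-10 digits exactly.
        Dim units As Decimal() = _
           {500D, 200D, 100D, 50D, 20D, 10D, 5D, 2D, 1D, _
            0.5D, 0.2D, 0.1D, 0.05D, 0.02D, 0.01D}
        '
        Dim amount As Decimal = 0.06D
        Console.Write(amount.ToString() & " : ")
        '
        Dim index As Integer
        ' The bounds check is a safety net; with Decimal the amount reaches exactly 0.
        Do While amount > 0 AndAlso index < units.Length
            Do While amount >= units(index)
                Console.Write(units(index).ToString() & " ")
                amount -= units(index)
            Loop
            index += 1
        Loop
        Console.WriteLine()
    End Sub
End Module
```

With Decimal, 0.06 - 0.05 is exactly 0.01, so the loop ends after writing the units 0.05 and 0.01.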

Certain floating point operations have results that look strange, especially when you are not familiar with how floating point values are stored.

These results are usually caused by the internal representation of the values. Some decimal values ( base 10 ) can never be represented exactly in the floating point datatypes. Such values have to be approximated, so round-off errors can arise when operations are performed on them.

For instance, the value 1/3 can never be represented exactly in decimal notation ( base 10 ) : 0.333...
Every 3 you add makes it more precise, but it will never be completely accurate.
In the same way, 1/10 can never be represented exactly in binary notation ( base 2 ) : 0.00011001100110011... ( the 0011 part repeats infinitely ).
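
The binary expansion of 1/10 can be reproduced by repeated doubling. The following is a minimal sketch; the function name BinaryFraction is hypothetical, and Decimal is used only so that the doubling itself introduces no rounding.

```vbnet
Module BinaryExpansionExample
    ' Hypothetical helper : the first 'digits' binary digits of a fraction in [0, 1).
    Function BinaryFraction(ByVal fraction As Decimal, _
                            ByVal digits As Integer) As String
        Dim result As New System.Text.StringBuilder("0.")
        For position As Integer = 1 To digits
            ' Doubling shifts the next binary digit in front of the point.
            fraction *= 2
            If fraction >= 1 Then
                result.Append("1")
                fraction -= 1
            Else
                result.Append("0")
            End If
        Next
        Return result.ToString()
    End Function
    Sub Main()
        Console.WriteLine(BinaryFraction(0.1D, 20))   ' 0.00011001100110011001
        Console.WriteLine(BinaryFraction(0.5D, 20))   ' 0.10000000000000000000
    End Sub
End Module
```

The remainder for 0.1 cycles through 0.2, 0.4, 0.8 and 0.6 without ever reaching 0, which is why the 0011 group keeps repeating, while 0.5 terminates after one digit.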

Whatever base you use, there will always be values that cannot be represented exactly. Irrational numbers ( numbers that cannot be expressed as a fraction of two integers ) can never be written out completely in any base, for instance some square roots, the constant pi, the constant e, ... .

All rational numbers could be represented exactly if both the dividend and the divisor were stored. Irrational numbers cannot be stored this way.

No scheme with finite capacity can represent all values exactly. It is impossible to represent an infinite number of values in a finite number of bits.

Most environments ( including .NET ) use floating point notation to represent fractional values. This is not a perfect system, but by implementing the IEEE 754 standard for floating point arithmetic, .NET at least guarantees that standardised techniques are used to approximate values and to perform operations on approximated values.

The following example produces other surprising results.


Module Example2
    Sub Main()
        Console.WriteLine(2.0 Mod 0.2 = 0)
        Console.WriteLine(2.0 Mod 0.2)
        '
        Dim someSingle As Single = 4.99
        Console.WriteLine(someSingle * 17 = 84.83)
        Console.WriteLine(someSingle * 17)
        '
        someSingle = 1 / 107.0
        Console.WriteLine(someSingle * 107 = 1)
        Console.WriteLine(someSingle * 107)
        '
        Console.ReadLine()
    End Sub
End Module

Output ( produced with Dutch regional settings, so the decimal separator is a comma ) :

 False
 0,2
 False
 84,82999
 False
 0,9999999
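
Because exact equality tests fail as in the output above, floating point results are often compared with a tolerance instead. The following is a minimal sketch of that idea; the helper name NearlyEqual and the tolerance 0.00001 are assumptions chosen for illustration, and a suitable tolerance depends on the calculation at hand.

```vbnet
Module ToleranceExample
    ' Hypothetical helper : True when a and b differ by at most
    ' relativeTolerance times the larger of their magnitudes.
    Function NearlyEqual(ByVal a As Double, ByVal b As Double, _
                         ByVal relativeTolerance As Double) As Boolean
        Dim magnitude As Double = Math.Max(Math.Abs(a), Math.Abs(b))
        Return Math.Abs(a - b) <= relativeTolerance * magnitude
    End Function
    Sub Main()
        Dim someSingle As Single = 4.99F
        Console.WriteLine(NearlyEqual(someSingle * 17, 84.83, 0.00001))   ' True
        '
        someSingle = CSng(1 / 107.0)
        Console.WriteLine(NearlyEqual(someSingle * 107, 1, 0.00001))      ' True
    End Sub
End Module
```

A relative tolerance scales with the size of the operands, which matters because the gap between neighbouring Single values grows with their magnitude.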



Floating Point Notation - Representation


IEEE 754 Single Precision ( like Single in .NET ) :

1 bit for the sign (s) + 8 bits for the exponent (e) + 23 bits for the mantissa (m) = 32 bits

Some remarks about the following notations :
- binary values are between square brackets ( for instance [0101] )
- symbol ~ is used for approximation

Binary format :


 seee eeee emmm mmmm mmmm mmmm mmmm mmmm
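
To see the three fields of a concrete value, the 32 bits of a Single can be read as an integer and split at the positions shown above. This is a small sketch; the helper name GetFields is hypothetical.

```vbnet
Module BitFieldsExample
    ' Hypothetical helper : sign, exponent and mantissa bits separated by spaces.
    Function GetFields(ByVal value As Single) As String
        ' Reinterpret the 4 bytes of the Single as one 32-bit integer.
        Dim bits As Integer = BitConverter.ToInt32(BitConverter.GetBytes(value), 0)
        ' Convert.ToString(..., 2) drops leading zeros, so pad to 32 digits.
        Dim binary As String = Convert.ToString(bits, 2).PadLeft(32, "0"c)
        Return binary.Substring(0, 1) & " " & _
               binary.Substring(1, 8) & " " & _
               binary.Substring(9, 23)
    End Function
    Sub Main()
        Console.WriteLine("s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm")
        Console.WriteLine(GetFields(0.6F))    ' 0 01111110 00110011001100110011010
        Console.WriteLine(GetFields(-0.5F))   ' 1 01111110 00000000000000000000000
    End Sub
End Module
```

Reading the bytes back with BitConverter.ToInt32 uses the same byte order as BitConverter.GetBytes, so this sketch does not depend on the endianness of the machine.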

Different representations are used within floating point :

- normalised
- zero ( negative and positive zero )
- subnormal ( denormalised )
- infinity ( positive and negative infinity )
- not-a-number ( NaN )

Normalised Representation :

This representation is used for most values.

General formula :


 (-1)^[s] * [1.mmmm mmmm mmmm mmmm mmmm mmm] * 2^([eeee eeee] - 127)

Sign :


 [0]	(-1)^0 =  1
 or
 [1]	(-1)^1 = -1

Exponent :

The exponent is stored as an unsigned byte value. To be able to represent small values ( with a negative exponent ), an offset ( also called bias ) of 127 is subtracted from the stored value.

Some possible representations :


 [0000 0000] = 0                -> reserved for other representations
 [0000 0001] = 1   - 127 = -126 -> minimum exponent
 ...
 [0111 1110] = 126 - 127 =   -1
 [0111 1111] = 127 - 127 =    0
 [1000 0000] = 128 - 127 =    1
 ...
 [1111 1110] = 254 - 127 =  127 -> maximum exponent
 [1111 1111] = 255              -> reserved for other representations

Exponents 0 and 255 are reserved for other representations; more about these reserved values later.

Mantissa :

The number 0.5 could be represented as 1 * 2^-1 or as 0.5 * 2^0 or as 0.25 * 2^1 or as 0.125 * 2^2 or as ... . Dividing the mantissa by 2 and adding 1 to the exponent gives the same value.
In other words, one value could have several representations, and room ( read : bit patterns ) for other values would be lost.
To avoid this, and to maximize the number of values that can be represented, the normalized representation maximizes the significand and minimizes the exponent. This process is called normalization.

The significand is always preceded by an implicit [1.] that is not stored, so all 23 stored bits can be used for the digits after the binary point. This gives 24 significant binary digits in total.

Minimum value for the mantissa is :


 [1.0000 0000 0000 0000 0000 000] or 1

Maximum value for the mantissa is :


 [1.1111 1111 1111 1111 1111 111] or 1.99999988079071044921875 or 2^1 - 2^-23

Generally :


 1 <= mantissa < 2

The minimum normalized value uses mantissa 1 and exponent -126 :


 1 * 2^-126 or ~ 1.1754E-38

The maximum normalized value uses mantissa 1.99999988079071044921875 and exponent 127 :


 1.99999988079071044921875 * 2^127 or ~ 3.4028E+38
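
These two limits can be checked against the values .NET reports. The sketch below assumes 23 stored mantissa bits, so the largest mantissa is 2 - 2^-23; the helper name TwoToThe is hypothetical and builds powers of 2 by repeated doubling or halving, which is exact in a Double for the exponents used here.

```vbnet
Module NormalizedRangeExample
    ' Hypothetical helper : 2^exponent, built step by step so that it stays exact.
    Function TwoToThe(ByVal exponent As Integer) As Double
        Dim result As Double = 1.0
        For counter As Integer = 1 To Math.Abs(exponent)
            If exponent > 0 Then
                result *= 2.0
            Else
                result /= 2.0
            End If
        Next
        Return result
    End Function
    Sub Main()
        Dim minimumNormalized As Double = 1.0 * TwoToThe(-126)
        Dim maximumNormalized As Double = (2.0 - TwoToThe(-23)) * TwoToThe(127)
        ' Widening a Single to a Double is exact, so = is a fair test here.
        Console.WriteLine(minimumNormalized = CDbl(1.17549435E-38F))   ' True
        Console.WriteLine(maximumNormalized = CDbl(Single.MaxValue))   ' True
    End Sub
End Module
```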

Some possible representations of normalized values :


 [0000 0000 1000 0000 0000 0000 0000 0000] or  ~ 1.1754E-38
 -> minimum positive value
 ...
 [0011 1111 0000 0000 0000 0000 0000 0000] or    0.5
 ...
 [0011 1111 0001 1001 1001 1001 1001 1010] or  ~ 0.6
 ...
 [0111 1111 0111 1111 1111 1111 1111 1111] or ~  3.4028E+38
 -> maximum positive value -> 'Single.MaxValue'
 ...
 [1000 0000 1000 0000 0000 0000 0000 0000] or ~ -1.1754E-38
 -> minimum negative value ( in magnitude )
 ...
 [1111 1111 0111 1111 1111 1111 1111 1111] or ~ -3.4028E+38
 -> maximum negative value ( in magnitude ) -> 'Single.MinValue'

0.5 is represented with sign +1 ( or [0] ), mantissa 1 ( or [1.0000 0000 0000 0000 0000 000] ) and exponent -1 ( or [0111 1110] ), or 1 * 1 * 2^-1.

0.6 is represented with sign +1 ( or [0] ), mantissa ~ 1.2000000477 ( or [1.0011 0011 0011 0011 0011 010] ) and exponent -1 ( or [0111 1110] ), or 1 * ~ 1.2000000477 * 2^-1. The stored value is therefore ~ 0.6000000238 rather than exactly 0.6.
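
The two worked examples can be verified by rebuilding a value from its sign, exponent and mantissa bits. This is a sketch for normalized values only; the function name DecodeNormalized is hypothetical.

```vbnet
Module DecodeExample
    ' Hypothetical helper : rebuilds a normalized Single from its three bit fields.
    Function DecodeNormalized(ByVal value As Single) As Double
        Dim bits As Integer = BitConverter.ToInt32(BitConverter.GetBytes(value), 0)
        Dim signBit As Integer = (bits >> 31) And 1
        Dim exponentBits As Integer = (bits >> 23) And &HFF
        Dim mantissaBits As Integer = bits And &H7FFFFF
        ' Implicit leading 1 plus the 23 stored bits ( 8388608 = 2^23 ).
        Dim mantissa As Double = 1.0 + mantissaBits / 8388608.0
        Dim sign As Double = If(signBit = 0, 1.0, -1.0)
        ' The stored exponent carries a bias of 127.
        Return sign * mantissa * Math.Pow(2.0, exponentBits - 127)
    End Function
    Sub Main()
        Console.WriteLine(DecodeNormalized(0.5F) = 0.5)          ' True
        Console.WriteLine(DecodeNormalized(0.6F) = CDbl(0.6F))   ' True
        ' The stored Single only approximates the decimal value 0.6.
        Console.WriteLine(DecodeNormalized(0.6F) = 0.6)          ' False
    End Sub
End Module
```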

Representation of Zero :

How is zero represented? In the normalized representation the mantissa is at least 1 and a power of 2 is never zero, so their product can never be zero.

For zero two representations are reserved, one for positive zero and one for negative zero.

Both the mantissa bits and the exponent bits are 0 in the representation of zero.


 [0000 0000 0000 0000 0000 0000 0000 0000] -> +0
 [1000 0000 0000 0000 0000 0000 0000 0000] -> -0

Subnormal ( Denormalized ) Representation :

General formula :


 (-1)^[s] * [0.mmmm mmmm mmmm mmmm mmmm mmm] * 2^-126

These representations are used to represent very small values.

Exponent :

The exponent is always [0000 0000] or 0, this 0 has no meaning ( except for being part of this representation ). The value used for the exponent in this representation is always -126 ( equal to the minimum exponent in the normalized representation ).

Mantissa :

The mantissa is not normalized. A prefix [0.] is always presumed.

The minimum mantissa is :


 [0.0000 0000 0000 0000 0000 001] or 0.00000011920928955078125 or 2^-23

The maximum mantissa is :


 [0.1111 1111 1111 1111 1111 111] or 0.99999988079071044921875 or 2^0 - 2^-23

The minimum denormalized value is :


 0.00000011920928955078125 * 2^-126 or ~ 1.4012E-45

The maximum denormalized value is :


 0.99999988079071044921875 * 2^-126 or ~ 1.1754E-38

Some possible representations of denormalized values :


 [0000 0000 0000 0000 0000 0000 0000 0001] or ~  1.4012E-45
 -> minimum positive value -> 'Single.Epsilon'
 ...
 [0000 0000 0111 1111 1111 1111 1111 1111] or ~  1.1754E-38
 -> maximum positive value
 ...
 [1000 0000 0000 0000 0000 0000 0000 0001] or ~ -1.4012E-45
 -> minimum negative value ( in magnitude )
 ...
 [1000 0000 0111 1111 1111 1111 1111 1111] or ~ -1.1754E-38
 -> maximum negative value ( in magnitude )
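
The transition from normalized to denormalized values, and from the smallest denormalized value to zero, can be observed by halving. The sketch below prints raw bit patterns in hexadecimal; the helper name ToHex is hypothetical.

```vbnet
Module SubnormalExample
    ' Hypothetical helper : the raw 32 bits of a Single as 8 hexadecimal digits.
    Function ToHex(ByVal value As Single) As String
        Return BitConverter.ToInt32(BitConverter.GetBytes(value), 0).ToString("X8")
    End Function
    Sub Main()
        Dim smallestNormalized As Single = 1.17549435E-38F
        Console.WriteLine(ToHex(smallestNormalized))          ' 00800000
        ' Halving leaves the normalized range : the exponent bits become 0.
        Console.WriteLine(ToHex(smallestNormalized / 2.0F))   ' 00400000
        Console.WriteLine(ToHex(Single.Epsilon))              ' 00000001
        ' Below Single.Epsilon nothing is left, the result rounds to +0.
        Console.WriteLine(ToHex(Single.Epsilon / 2.0F))       ' 00000000
    End Sub
End Module
```

Passing the result through BitConverter.GetBytes stores it as a genuine 32-bit value, so the printed pattern does not depend on any extra precision the processor may use for intermediate results.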

Representation of Infinities :

Exponent :

The exponent is always [1111 1111] or 255, this 255 has no meaning ( except for being part of this representation ).

Mantissa :

The mantissa is always [000 0000 0000 0000 0000 0000] or 0, this 0 has no meaning ( except for being part of this representation ).

Sign :

The sign bit indicates positive or negative infinity.


 [0111 1111 1000 0000 0000 0000 0000 0000] -> 'Single.PositiveInfinity'
 [1111 1111 1000 0000 0000 0000 0000 0000] -> 'Single.NegativeInfinity'

Module Example3
    Public Sub Main()
        Console.WriteLine("seeeeeeeemmmmmmmmmmmmmmmmmmmmmmm")
        Console.WriteLine(GetBinary(1.17549435E-38F) & " : " & _
                          1.17549435E-38F.ToString())
        Console.WriteLine(GetBinary(0.5F) & " : " & 0.5F.ToString())
        Console.WriteLine(GetBinary(0.6F) & " : " & 0.6F.ToString())
        Console.WriteLine(GetBinary(Single.MaxValue) & " : " & _
                          Single.MaxValue.ToString())
        Console.WriteLine(GetBinary(-1.17549435E-38F) & " : " & _
                          (-1.17549435E-38F).ToString())
        Console.WriteLine(GetBinary(Single.MinValue) & " : " & _
                          Single.MinValue.ToString())
        Console.WriteLine(GetBinary(0.0F) & " : " & 0.0F.ToString())
        Console.WriteLine(GetBinary(-0.0F) & " : " & (-0.0F).ToString())
        Console.WriteLine(GetBinary(Single.Epsilon) & " : " & _
                          Single.Epsilon.ToString())
        Console.WriteLine(GetBinary(-1.401298E-45F) & " : " & _
                          (-1.401298E-45F).ToString())
        Console.WriteLine(GetBinary(Single.PositiveInfinity) & " : " & _
                          Single.PositiveInfinity.ToString())
        Console.WriteLine(GetBinary(Single.NegativeInfinity) & " : " & _
                          Single.NegativeInfinity.ToString())
        '
        Console.ReadLine()
    End Sub
    Public Function GetBinary(ByVal value As Byte) As String
        For counter As Integer = 1 To 8
            GetBinary = (value Mod 2).ToString() & GetBinary
            value >>= 1
        Next
    End Function
    Public Function GetBinary(ByVal value As Single) As String
        If BitConverter.IsLittleEndian Then
            For Each byteElement As Byte In BitConverter.GetBytes(value)
                GetBinary = GetBinary(byteElement) & GetBinary
            Next
        Else
            Throw New ApplicationException("Only Little Endian supported.")
        End If
    End Function
End Module

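Output ( produced with Dutch regional settings : the decimal separator is a comma and 'oneindig' means infinity ) :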

 seeeeeeeemmmmmmmmmmmmmmmmmmmmmmm
 00000000100000000000000000000000 : 1,175494E-38
 00111111000000000000000000000000 : 0,5
 00111111000110011001100110011010 : 0,6
 01111111011111111111111111111111 : 3,402823E+38
 10000000100000000000000000000000 : -1,175494E-38
 11111111011111111111111111111111 : -3,402823E+38
 00000000000000000000000000000000 : 0
 10000000000000000000000000000000 : 0
 00000000000000000000000000000001 : 1,401298E-45
 10000000000000000000000000000001 : -1,401298E-45
 01111111100000000000000000000000 : oneindig
 11111111100000000000000000000000 : -oneindig
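
The bit patterns above suggest how a value can be classified from its exponent and mantissa fields alone. The following sketch does so; the function name Classify is hypothetical, and the rule for NaN ( exponent 255 with a non-zero mantissa ) is taken from the IEEE 754 standard rather than from the text above.

```vbnet
Module ClassifyExample
    ' Hypothetical helper : names the representation used for a Single.
    Function Classify(ByVal value As Single) As String
        Dim bits As Integer = BitConverter.ToInt32(BitConverter.GetBytes(value), 0)
        Dim exponent As Integer = (bits >> 23) And &HFF
        Dim mantissa As Integer = bits And &H7FFFFF
        If exponent = 0 Then
            If mantissa = 0 Then Return "zero"
            Return "subnormal"
        ElseIf exponent = 255 Then
            If mantissa = 0 Then Return "infinity"
            Return "NaN"
        End If
        Return "normalized"
    End Function
    Sub Main()
        Console.WriteLine(Classify(0.6F))                      ' normalized
        Console.WriteLine(Classify(0.0F))                      ' zero
        Console.WriteLine(Classify(Single.Epsilon))            ' subnormal
        Console.WriteLine(Classify(Single.NegativeInfinity))   ' infinity
        Console.WriteLine(Classify(Single.NaN))                ' NaN
    End Sub
End Module
```

The sign bit plays no role in the classification, which is why positive and negative zero, and positive and negative infinity, fall into the same class.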

Operations on Zero, NaN and Infinity :

According to the IEEE 754 standard, operations on special values ( zero, NaN and infinity ) have specific, well-defined results.

Every arithmetic operation with a NaN operand results in NaN.
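
A short sketch of this behaviour follows. It also shows a consequence that is easy to overlook : a comparison is not an arithmetic operation, and NaN never compares equal to anything, not even to itself, so Single.IsNaN is the way to test for it. The module and variable names are chosen for illustration.

```vbnet
Module NaNExample
    Sub Main()
        Dim notANumber As Single = Single.NaN
        ' Arithmetic with a NaN operand yields NaN again.
        Console.WriteLine(Single.IsNaN(notANumber + 123.0F))   ' True
        Console.WriteLine(Single.IsNaN(notANumber * 0.0F))     ' True
        ' An equality test with NaN is always False.
        Console.WriteLine(notANumber = notANumber)             ' False
    End Sub
End Module
```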

Other operations are illustrated by the following example :


Module Example4
    Sub Main()
        Dim singleOperands As Single() = {Single.PositiveInfinity, _
                                          Single.NegativeInfinity, _
                                          123.0F, -123.0F, 0.0F, -0.0F}
        Dim operatorSymbols As String() = {"*", "/", "+", "-"}
        '
        For Each operatorSymbol As String In operatorSymbols
            Console.WriteLine("OPERATOR " & operatorSymbol.ToString())
            Console.WriteLine()
            For Each singleOperand1 As Single In singleOperands
                For Each singleOperand2 As Single In singleOperands
                    PrintCalculation(singleOperand1, operatorSymbol, _
                                     singleOperand2)
                Next
                Console.WriteLine()
            Next
            Console.WriteLine()
        Next
        '
        Console.ReadLine()
    End Sub
    Sub PrintCalculation(ByVal operand1 As Single, _
                         ByVal operatorSymbol As String, _
                         ByVal operand2 As Single)
        Console.Write(GetString(operand1) & " " & operatorSymbol & " " & _
                      GetString(operand2) & " = ")
        Select Case operatorSymbol
            Case "*"
                Console.WriteLine(GetString(operand1 * operand2))
            Case "/"
                Console.WriteLine(GetString(operand1 / operand2))
            Case "+"
                Console.WriteLine(GetString(operand1 + operand2))
            Case "-"
                Console.WriteLine(GetString(operand1 - operand2))
        End Select
    End Sub
    Function GetString(ByVal value As Single) As String
        If IsPositiveZero(value) Then
            GetString = "+0"
        ElseIf IsNegativeZero(value) Then
            GetString = "-0"
        ElseIf Single.IsNegativeInfinity(value) Then
            GetString = "-Infinity"
        ElseIf Single.IsPositiveInfinity(value) Then
            GetString = "+Infinity"
        ElseIf Single.IsNaN(value) Then
            GetString = "NaN"
        Else
            GetString = value.ToString()
        End If
    End Function
    Public Function IsPositiveZero(ByVal value As Single) As Boolean
        If BitConverter.GetBytes(value)(0) = 0 AndAlso _
           BitConverter.GetBytes(value)(1) = 0 AndAlso _
           BitConverter.GetBytes(value)(2) = 0 AndAlso _
           BitConverter.GetBytes(value)(3) = 0 Then _
           IsPositiveZero = True
    End Function
    Public Function IsNegativeZero(ByVal value As Single) As Boolean
        If BitConverter.GetBytes(value)(0) = 0 AndAlso _
           BitConverter.GetBytes(value)(1) = 0 AndAlso _
           BitConverter.GetBytes(value)(2) = 0 AndAlso _
           BitConverter.GetBytes(value)(3) = 128 Then _
           IsNegativeZero = True
    End Function
End Module

Output :

 OPERATOR *

 +Infinity * +Infinity = +Infinity
 +Infinity * -Infinity = -Infinity
 +Infinity * 123 = +Infinity
 +Infinity * -123 = -Infinity
 +Infinity * +0 = NaN
 +Infinity * -0 = NaN

 -Infinity * +Infinity = -Infinity
 -Infinity * -Infinity = +Infinity
 -Infinity * 123 = -Infinity
 -Infinity * -123 = +Infinity
 -Infinity * +0 = NaN
 -Infinity * -0 = NaN

 123 * +Infinity = +Infinity
 123 * -Infinity = -Infinity
 123 * 123 = 15129
 123 * -123 = -15129
 123 * +0 = +0
 123 * -0 = -0

 -123 * +Infinity = -Infinity
 -123 * -Infinity = +Infinity
 -123 * 123 = -15129
 -123 * -123 = 15129
 -123 * +0 = -0
 -123 * -0 = +0

 +0 * +Infinity = NaN
 +0 * -Infinity = NaN
 +0 * 123 = +0
 +0 * -123 = -0
 +0 * +0 = +0
 +0 * -0 = -0

 -0 * +Infinity = NaN
 -0 * -Infinity = NaN
 -0 * 123 = -0
 -0 * -123 = +0
 -0 * +0 = -0
 -0 * -0 = +0


 OPERATOR /

 +Infinity / +Infinity = NaN
 +Infinity / -Infinity = NaN
 +Infinity / 123 = +Infinity
 +Infinity / -123 = -Infinity
 +Infinity / +0 = +Infinity
 +Infinity / -0 = -Infinity

 -Infinity / +Infinity = NaN
 -Infinity / -Infinity = NaN
 -Infinity / 123 = -Infinity
 -Infinity / -123 = +Infinity
 -Infinity / +0 = -Infinity
 -Infinity / -0 = +Infinity

 123 / +Infinity = +0
 123 / -Infinity = -0
 123 / 123 = 1
 123 / -123 = -1
 123 / +0 = +Infinity
 123 / -0 = -Infinity

 -123 / +Infinity = -0
 -123 / -Infinity = +0
 -123 / 123 = -1
 -123 / -123 = 1
 -123 / +0 = -Infinity
 -123 / -0 = +Infinity

 +0 / +Infinity = +0
 +0 / -Infinity = -0
 +0 / 123 = +0
 +0 / -123 = -0
 +0 / +0 = NaN
 +0 / -0 = NaN

 -0 / +Infinity = -0
 -0 / -Infinity = +0
 -0 / 123 = -0
 -0 / -123 = +0
 -0 / +0 = NaN
 -0 / -0 = NaN


 OPERATOR +

 +Infinity + +Infinity = +Infinity
 +Infinity + -Infinity = NaN
 +Infinity + 123 = +Infinity
 +Infinity + -123 = +Infinity
 +Infinity + +0 = +Infinity
 +Infinity + -0 = +Infinity

 -Infinity + +Infinity = NaN
 -Infinity + -Infinity = -Infinity
 -Infinity + 123 = -Infinity
 -Infinity + -123 = -Infinity
 -Infinity + +0 = -Infinity
 -Infinity + -0 = -Infinity

 123 + +Infinity = +Infinity
 123 + -Infinity = -Infinity
 123 + 123 = 246
 123 + -123 = +0
 123 + +0 = 123
 123 + -0 = 123

 -123 + +Infinity = +Infinity
 -123 + -Infinity = -Infinity
 -123 + 123 = +0
 -123 + -123 = -246
 -123 + +0 = -123
 -123 + -0 = -123

 +0 + +Infinity = +Infinity
 +0 + -Infinity = -Infinity
 +0 + 123 = 123
 +0 + -123 = -123
 +0 + +0 = +0
 +0 + -0 = +0

 -0 + +Infinity = +Infinity
 -0 + -Infinity = -Infinity
 -0 + 123 = 123
 -0 + -123 = -123
 -0 + +0 = +0
 -0 + -0 = -0


 OPERATOR -

 +Infinity - +Infinity = NaN
 +Infinity - -Infinity = +Infinity
 +Infinity - 123 = +Infinity
 +Infinity - -123 = +Infinity
 +Infinity - +0 = +Infinity
 +Infinity - -0 = +Infinity

 -Infinity - +Infinity = -Infinity
 -Infinity - -Infinity = NaN
 -Infinity - 123 = -Infinity
 -Infinity - -123 = -Infinity
 -Infinity - +0 = -Infinity
 -Infinity - -0 = -Infinity

 123 - +Infinity = -Infinity
 123 - -Infinity = +Infinity
 123 - 123 = +0
 123 - -123 = 246
 123 - +0 = 123
 123 - -0 = 123

 -123 - +Infinity = -Infinity
 -123 - -Infinity = +Infinity
 -123 - 123 = -246
 -123 - -123 = +0
 -123 - +0 = -123
 -123 - -0 = -123

 +0 - +Infinity = -Infinity
 +0 - -Infinity = +Infinity
 +0 - 123 = -123
 +0 - -123 = 123
 +0 - +0 = +0
 +0 - -0 = +0

 -0 - +Infinity = -Infinity
 -0 - -Infinity = +Infinity
 -0 - 123 = -123
 -0 - -123 = 123
 -0 - +0 = -0
 -0 - -0 = +0

Updated On : 2008-10-25


Published On : 2008-11-06
