Notably, during subtraction of values of similar magnitude a large number of valid digits can be cancelled out. On a machine using 6 BCD digits consider:
31416.0 - 10000.0*Pi = 31416.0 - 31415.9 = 0.1
This is nowhere near the true 6-digit-precision result of 0.0734641 because of the cancellation of the leading valid digits.
From a mathematical viewpoint an infinitely precise value X can be written as:
X = X_{m} + e
where X_{m} is the limited-precision machine number and e is a small error value that corresponds to the precision of the calculator.
One (or more) of these numbers X are converted by the algorithm A into a result R:
R = A(X_{m} + e)
and the precision of the result R strongly depends on whether A "amplifies" the inaccuracy introduced by e.
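The subtraction example above can be replayed in software. The following is a minimal sketch that assumes a hypothetical 6-digit decimal machine modelled with Python's `decimal` module; the names `SIX`, `machine_difference` and `true_difference` are illustrative and do not come from any calculator firmware.

```python
from decimal import Decimal, Context, ROUND_HALF_UP

# Hypothetical 6-digit decimal machine (assumption: round-half-up on every step).
SIX = Context(prec=6, rounding=ROUND_HALF_UP)

PI = Decimal("3.14159265358979323846")  # Pi to 21 significant digits


def machine_difference() -> Decimal:
    """31416.0 - 10000.0*Pi with every intermediate rounded to 6 digits."""
    pi_m = SIX.plus(PI)                            # X_m: Pi as the machine stores it
    product = SIX.multiply(Decimal(10000), pi_m)   # rounded again to 6 digits
    return SIX.subtract(Decimal("31416.0"), product)


def true_difference() -> Decimal:
    """Same expression at high precision, rounded to 6 digits only at the end."""
    exact = Decimal("31416.0") - Decimal(10000) * PI  # default 28-digit context
    return SIX.plus(exact)


print("6-digit machine:", machine_difference())
print("true value     :", true_difference())
```

The error e in the stored value of Pi is tiny, but the subtraction removes the five leading digits that both operands share, so almost nothing of the result is left to trust.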
Finding an acceptable algorithm for a specific problem (e.g. integration, solving differential equations, matrix operations etc.) is often a complicated issue. This is not our topic. What does concern us here is assessing the precision of the built-in algorithms of calculators.
Example: Decimal number 10.5. In binary this is 1010.1. Split into mantissa and exponent: 1.0101 E 3dec
So the mantissa would be 10101 and the exponent +3dec.
Since the mantissa is required to always have a leading 1 it is often omitted to gain space for an additional bit of precision. In this case the leading 1 is implicitly assumed to sit to the left of the most significant digit of the mantissa.
Advantage: The big advantage of binary floating point numbers is that most computers have powerful and fast built-in instructions to directly manipulate long integer numbers (usually 16, 32 or 64 bit). For example, multiplication and division are often implemented in hardware units. Most computers therefore use binary coding. There are of course computer software packages that perform BCD arithmetic as explained below.
Disadvantage: Whenever a number has to be displayed it must be converted into decimal format. Similarly, decimal input must be converted to binary before calculation can start. This is a rather time-consuming process, but of course it only occurs once before and once after a (possibly lengthy) calculation.
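The mantissa/exponent split of 10.5 can be checked with a few lines of Python. This is only a sketch: `binary_scientific` is a hypothetical helper written for this illustration, it handles positive inputs only, and it prints the leading 1 explicitly even though real formats usually leave it implicit.

```python
import math


def binary_scientific(x: float, frac_bits: int = 4):
    """Return (mantissa_bits, exponent) such that x = 1.fff...b * 2**exponent (x > 0)."""
    m, e = math.frexp(x)     # x == m * 2**e with 0.5 <= m < 1
    m, e = m * 2, e - 1      # renormalise so that 1 <= m < 2
    frac = m - 1.0           # the part behind the leading 1
    bits = "1."
    for _ in range(frac_bits):
        frac *= 2
        bit = int(frac)      # next binary digit
        bits += str(bit)
        frac -= bit
    return bits, e


print(binary_scientific(10.5))
print((10.5).hex())  # the same split, with the fraction shown in hexadecimal
```

For 10.5 the helper returns the mantissa bits 1.0101 together with the exponent 3, i.e. 1.0101b * 2^3 = 1010.1b.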
Example: Decimal number 10.5. The binary representation of the BCD digits is 0001 0000 0101. So the mantissa would be 000100000101 and the exponent +1dec.
Advantage: No complicated conversion between textual and internal representation of a BCD number is needed. This is especially useful for calculators where after each operation the result has to be displayed. In fact, I don't know of any handheld calculator that doesn't use BCD arithmetic!
Disadvantage: BCD arithmetic is slow because it lacks dedicated hardware support. This usually doesn't matter because handheld calculators don't have high-performance CPUs anyway. Rather, their processors are optimized for low power consumption.
Another disadvantage of BCD numbers is their increased memory requirement. An n-digit decimal number needs n*4 bits in BCD and n*ln(10)/ln(2) = n*3.322 bits in binary mode. So the BCD representation requires about 20% more storage.
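Both the BCD digit encoding and the storage estimate are easy to reproduce. The sketch below uses two hypothetical helpers, `bcd_bits` and `storage_bits`, and assumes one 4-bit nibble per decimal digit with no sign or exponent field.

```python
import math


def bcd_bits(digits: str) -> str:
    """Encode a string of decimal digits as 4-bit BCD nibbles."""
    return " ".join(format(int(d), "04b") for d in digits)


def storage_bits(n: int):
    """Bits needed for n decimal digits: (BCD, ideal binary encoding)."""
    return 4 * n, n * math.log2(10)


print(bcd_bits("105"))            # the digits of 10.5 without the decimal point
bcd, binary = storage_bits(12)    # a 12-digit mantissa
print(bcd, round(binary, 1), round(bcd / binary, 3))
```

The ratio 4 / log2(10) is roughly 1.204, which is where the 20% figure comes from.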
Consider the decimal number 0.1 which has an exact representation in BCD. However, in binary mode it must be expressed as a sum of powers of 2:
0.1 = 2^{-4} + 2^{-5} + 2^{-8} + 2^{-9} + 2^{-12} + 2^{-13} + 2^{-16} + 2^{-17} + ... = 0.000110011001100110... (binary)
It turns out that the binary digit sequence "1100" repeats infinitely. But since every binary representation uses a limited number of bits the value 0.1 cannot be expressed exactly. To distinguish between binary and BCD usage the idea is to "amplify" this inaccuracy so that it can be observed:
Calculate: R = (0.1 * 1024 - 102) * 10 - 4
In theory as well as on a BCD machine this should give 0. However, on a binary machine 0.1 will have some error e that is amplified by a factor of 10240:
((0.1 + e)*1024 - 102)*10 - 4 = (102.4 + 1024*e - 102) * 10 - 4 = (0.4 + 1024*e)*10 - 4 = 4 + 10240*e - 4 = 10240*e
As a result, calculating R on a machine using binary representation will yield a non-zero value.
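The test for R can be tried on any computer. The sketch below assumes that Python's `float` is an IEEE 754 double (true on common platforms) and uses the `decimal` module as a stand-in for a BCD machine; `r_binary` and `r_decimal` are illustrative names.

```python
from decimal import Decimal


def r_binary() -> float:
    """R evaluated in binary floating point."""
    return (0.1 * 1024 - 102) * 10 - 4


def r_decimal() -> Decimal:
    """R evaluated in decimal arithmetic, standing in for a BCD machine."""
    return (Decimal("0.1") * 1024 - 102) * 10 - 4


print("binary :", r_binary())
print("decimal:", r_decimal())
```

With doubles every step of the calculation happens to be exact, so the residue is exactly 10240*e = 2^-44, about 5.7E-14. The decimal version returns zero.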
Side note 1: Although some limited-length BCD numbers have no exact (= no limited-length) representation in binary mode, the reverse is not true. All limited-length binary numbers do have a limited-length representation in BCD. The reason is that all powers of 2 have a limited-length representation in BCD (e.g. 0.5, 0.25, 0.125, 0.0625 etc.) but not all powers of 10 have a limited-length representation in binary mode.
Side note 2: Try calculating R = (0.1+e)*10240 - 1024 = 1024 + 10240*e - 1024 = 10240*e.
Surprisingly, this will yield 0 even on a binary machine! The reason is that due to rounding (0.1+e)*10240 will in fact result in the exact value 1024.
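Side note 2 can be verified with exact rational arithmetic. The following sketch again assumes IEEE 754 doubles; `Fraction(0.1)` and `Decimal(0.1)` return the exact value of the stored double, not the decimal 0.1.

```python
from decimal import Decimal
from fractions import Fraction

stored = Decimal(0.1)                     # exact decimal expansion of the stored double
e = Fraction(0.1) - Fraction(1, 10)       # the representation error e
exact_product = Fraction(0.1) * 10240     # (0.1 + e) * 10240 without any rounding
rounded_product = 0.1 * 10240             # what the floating point unit delivers

print("stored 0.1      :", stored)
print("10240 * e       :", float(10240 * e))
print("rounded product :", rounded_product)
print("R               :", rounded_product - 1024)
```

The exact product is 1024 + 2^-44, but neighbouring doubles near 1024 are 2^-42 apart, so the product is rounded back to 1024 and the error vanishes. In the first test the subtraction of 102 takes place before the error is pushed below the rounding threshold, which is why it survives there.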
There are three solutions to this dilemma:
But the ultimate goal is to calculate results that are always correct to the number of displayed digits.
The correct rounding scheme examines the digit n+1 following the least significant digit n (and thus needs at least one additional hidden digit). If digit n+1 is in the range 5..9 then digit n is incremented to minimize the difference. Otherwise digit n is left unchanged.
Examples of correct rounding:
Correct value | Expanded correct value | Rounded value, to 5 digits after decimal point |
1/3 | 0.333333... | 0.33333 |
2/3 | 0.666666... | 0.66667 |
1/18 | 0.055555... | 0.05556 |
4/9 | 0.444444... | 0.44444 |
1/11 | 0.090909... | 0.09091 |
To determine whether your calculator uses correct rounding simply calculate 1/18 and examine the last digit: It should be 6.
Besides correct rounding there are of course other, less desirable methods, e.g. truncation, which simply cuts off all digits beyond the least significant one.
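The difference between correct rounding and truncation can be sketched with Python's `decimal` module. `to_places` is a hypothetical helper; `ROUND_HALF_UP` implements the rule described above (digit n+1 in the range 5..9 increments digit n) and `ROUND_DOWN` is plain truncation.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP, localcontext


def to_places(numerator: int, denominator: int, places: int = 5,
              mode: str = ROUND_HALF_UP) -> Decimal:
    """Divide with plenty of guard digits, then reduce to `places` decimals."""
    with localcontext() as ctx:
        ctx.prec = 30                                  # hidden guard digits
        quotient = Decimal(numerator) / Decimal(denominator)
    return quotient.quantize(Decimal(10) ** -places, rounding=mode)


for num, den in [(1, 3), (2, 3), (1, 18), (4, 9), (1, 11)]:
    print(f"{num}/{den}:",
          to_places(num, den),                         # correctly rounded
          to_places(num, den, mode=ROUND_DOWN))        # truncated
```

For 1/18 the two methods differ in the last digit (6 versus 5), which is exactly what the quick calculator test looks for.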
Expression | Result |
100/18 | +5.555555555555555556E+0 |
40/9 | +4.444444444444444444E+0 |
5555555*5555555 | +3.086419135802500000E+13 |
40/9+50/9000 | +4.450000000000000000E+0 |
sqrt(2E6) | +1.414213562373095049E+3 |
exp(0.005) | +1.005012520859401063E+0 |
exp(0.999999) | +2.718279110178575917E+0 |
exp(1.000001) | +2.718284546742232836E+0 |
exp(100) | +2.688117141816135448E+43 |
ln(1E-6) | -1.381551055796427410E+1 |
ln(0.9995) | -5.001250416822979193E-4 |
ln(1.0005) | +4.998750416510479141E-4 |
pow10(0.005) | +1.011579454259898524E+0 |
pow10(0.99999) | +9.999769744141629304E+0 |
pow10(1.00001) | +1.000023026116026881E+1 |
pow10(80.1) | +1.258925411794167210E+80 |
log10(2E-8) | -7.698970004336018805E+0 |
log10(0.9995) | -2.172015458642557997E-4 |
log10(1.0005) | +2.170929722302082819E-4 |
log10(1000100) | +6.000043427276862670E+0 |
2^40 | +1.099511627776000000E+12 |
2^1.443 | +2.718856483813477575E+0 |
1.000001^1E6 | +2.718280469319376884E+0 |
sin(0.01 rad) | +9.999833334166664683E-3 |
sin(1 rad) | +8.414709848078965067E-1 |
sin(1.5608 rad) | +9.999500371413582332E-1 |
sin(800 rad) | +8.939696481970214179E-1 |
tan(0.01 rad) | +1.000033334666720637E-2 |
tan(1 rad) | +1.557407724654902231E+0 |
tan(1.5708 rad) | -2.722418084073540959E+5 |
tan(800 rad) | -1.994900160845839293E+0 |
asin(0.01),rad | +1.000016667416711313E-2 |
asin(0.5),rad | +5.235987755982988731E-1 |
asin(0.999),rad | +1.526071239626163188E+0 |
asin(0.99999),rad | +1.566324187113108692E+0 |
atan(0.01),rad | +9.999666686665238206E-3 |
atan(0.9999),rad | +7.853481608973649763E-1 |
atan(1.0001),rad | +7.854481608975316429E-1 |
atan(1E4),rad | +1.570696326795229953E+0 |
sin(0.01 deg) | +1.745329243133368033E-4 |
sin(50 deg) | +7.660444431189780352E-1 |
sin(89.9 deg) | +9.999984769132876988E-1 |
sin(5000 deg) | -6.427876096865393263E-1 |
tan(0.01 deg) | +1.745329269716252907E-4 |
tan(50 deg) | +1.191753592594209959E+0 |
tan(89.99 deg) | +5.729577893130590236E+3 |
tan(5000 deg) | -8.390996311772800118E-1 |
asin(0.01 deg) | +5.729673448571526491E-1 |
asin(0.5 deg) | +3.000000000000000000E+1 |
asin(0.999 deg) | +8.743744126687686209E+1 |
asin(0.99999 deg) | +8.974376527084057279E+1 |
atan(0.01 deg) | +5.729386976834859268E-1 |
atan(0.9999 deg) | +4.499713506778012245E+1 |
atan(1.0001 deg) | +4.500286464574097998E+1 |
atan(1E4 deg) | +8.999427042206779036E+1 |
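Some of these values can be cross-checked without special software. The sketch below uses Python's `decimal` module with a 19-digit context, which is an assumption chosen to match the number of digits shown above; it is not necessarily the tool that produced the list. Only the arithmetic, square root, exponential and logarithm entries are covered, because `decimal` offers no trigonometric functions.

```python
from decimal import Context, Decimal

# Assumption: 19 significant digits, default rounding mode (round-half-even).
REF = Context(prec=19)


def reference_values() -> dict:
    """Recompute a handful of the test values with 19-digit decimal arithmetic."""
    return {
        "100/18": REF.divide(Decimal(100), Decimal(18)),
        "5555555*5555555": REF.multiply(Decimal(5555555), Decimal(5555555)),
        "sqrt(2E6)": REF.sqrt(Decimal("2E6")),
        "exp(0.005)": REF.exp(Decimal("0.005")),
        "ln(1E-6)": REF.ln(Decimal("1E-6")),
        "2^40": REF.power(Decimal(2), Decimal(40)),
    }


for expression, value in reference_values().items():
    print(f"{expression:16s} {value}")
```

Comparing a calculator's display against such values shows directly how many of its digits can be trusted.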
Important Notes
R = asin( acos( atan( tan( cos( sin(9) ) ) ) ) )
Of course the correct result is 9 (with all angles in degrees). At the various steps the intermediate results are:
Step | X |
sin(9) | 0.156 434 465 040 230 869 010... |
cos(x) | 0.999 996 272 742 885 024 117... |
tan(x) | 0.017 454 999 855 488 660 791... |
atan(x) | 0.999 996 272 742 885 024 117... |
acos(x) | 0.156 434 465 040 230 869 010... |
asin(x) | 9.000 000 000 000 000 000 000... |
Now consider a calculator that uses built-in algorithms that are correct up to the 12th digit. And of course the 12-digit precision result of one step is taken as the input for the next step of the calculation:
Step | X rounded to 12 digits |
sin(9) | 0.156 434 465 040 |
cos(x) | 0.999 996 272 743 |
tan(x) | 0.174 549 998 555 E -1 |
atan(x) | 0.999 996 272 744 |
acos(x) | 0.156 434 441 642 |
asin(x) | 0.899 999 864 267 E 1 |
This is a perfect example of a badly chosen algorithm because it amplifies the inaccuracies of the 12th digit to a considerable error. Compared to the table with precise results the first noticeable deviation occurs when the arc tangent is calculated. The resulting small error of 1.12E-12 (absolute) is then tremendously amplified by more than 10^{4} by the arc cosine to 2.34E-8 (absolute).
By looking at the derivative of the arc cosine near the value of 1 it is immediately clear why this amplification occurs:
d/dx acos(x) = -1/sqrt(1-x²)
and for x -> 1 the derivative tends to infinity.
It must be strongly emphasised that the above result of 8.99999864267 is the correct result for a calculator using 12 digits of precision and perfect built-in algorithms for trigonometric functions! Naturally, similar arguments apply for calculators of different precision.
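The 12-digit chain can be imitated on a computer. The following is only a sketch: it assumes that Python's double-precision `math` functions are accurate enough to stand in for "perfect" algorithms and that each result is rounded to 12 significant digits before the next step. Because the doubles themselves carry a small error, the last digit or two may differ from the table. `r12`, `chain` and `acos_amplification` are illustrative names.

```python
import math


def r12(x: float) -> float:
    """Round to 12 significant digits, like storing a result in a 12-digit register."""
    return float(f"{x:.11e}")


def chain(round_step=r12) -> float:
    """asin(acos(atan(tan(cos(sin(9)))))) in degrees, rounding after every step."""
    x = round_step(math.sin(math.radians(9.0)))
    x = round_step(math.cos(math.radians(x)))
    x = round_step(math.tan(math.radians(x)))
    x = round_step(math.degrees(math.atan(x)))
    x = round_step(math.degrees(math.acos(x)))
    x = round_step(math.degrees(math.asin(x)))
    return x


def acos_amplification(x: float) -> float:
    """Magnitude of d/dx acos(x), expressed in degrees per unit of x."""
    return math.degrees(1.0 / math.sqrt(1.0 - x * x))


print("12-digit chain :", chain())
print("unrounded chain:", chain(lambda v: v))
print("amplification  :", acos_amplification(0.999996272743))
```

With the result expressed in degrees the derivative picks up a factor of 180/Pi, which gives an amplification of roughly 2.1E4 at x = 0.999996272743. That agrees with the observed growth of the error from 1.12E-12 to 2.34E-8.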
If those hidden digits are in fact always correct then why not present them to the user? The display may not be able to show all the digits. But usually one can safely suspect that the hidden digits are sometimes correct and sometimes not.
Together with the above mentioned confusion that arises from hidden digits in conjunction with manually entered vs. calculated values it is obvious that storing results with hidden digits is not a good idea.
This "pulls out" the hidden digits. There may be none, one or more, depending on the calculator.