How can I avoid a small value being ignored during calculation?
z is a small value, and when z is added to a1, the result shows no difference. Why is that? And how can I avoid this?
Accepted Answer
John D'Errico
10 Nov 2022
Welcome to the wonderful, wacky world of floating point arithmetic. You need to understand that floating point numbers (doubles) are stored using an IEEE standard, where only 52 binary bits are used to represent the mantissa. Effectively, that gives you around 16 significant digits when represented as a decimal. So if you add 1e-19 to the number 1, MATLAB rounds the result to the nearest value representable by a double. Effectively, as far as MATLAB is concerned,
1 == (1 + 1e-19)
ans =
  logical
   1
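A minimal sketch of that rounding, using the built-in eps function to show the spacing of doubles near 1. The variable names are illustrative only and do not come from the question.

```matlab
% Spacing between 1 and the next larger double: 2^-52, about 2.22e-16.
gap = eps(1);

% 1e-19 is far smaller than half that spacing, so the sum rounds back to 1.
tooSmall = (1 + 1e-19) == 1;

% An increment of one full spacing does change the stored value.
bigEnough = (1 + gap) ~= 1;

fprintf('eps(1)          = %g\n', gap);
fprintf('1 + 1e-19 == 1  : %d\n', tooSmall);
fprintf('1 + eps(1) ~= 1 : %d\n', bigEnough);
```

Anything much smaller than eps(x)/2 simply vanishes when added to x.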
In MATLAB, even if you do this:
format long
a1 =
all of those digits are in general not stored. Only the first 16 or so. Beyond that, you generally lose all of the extra digits. However, the number you provided is, in fact, EXACTLY representable as a double (one implicit leading bit plus the 52 stored bits), since we can write it exactly as:
-sum(2.^[-5 -6 -7 -9 -10 -11 -15 -20 -21 -23 -28 -32 -33 -36 -37 -40 -42 -43 -45 -46 -47 -48 -49 -50 -53 -55 -57])
ans =
ans =
If you want to know the stride between two adjacent numbers, such that they differ by one unit in the least significant bit of a1, that is given by
ans =
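The command and its output did not survive on this page. One way to compute such a spacing is the built-in eps function, sketched here under the assumption that a1 and z hold the values quoted in the BigDecimal answer in this thread.

```matlab
% Assumed values: a1 and z are copied from the BigDecimal example in this thread.
a1 = -0.0581375401465599531136696498379023978486657142639160156250000000000000;
z  = 1.111111e-19;

% Positive distance from abs(a1) to the next larger double.
stride = eps(a1);

fprintf('eps(a1)     = %g\n', stride);
fprintf('z / eps(a1) = %g\n', z / stride);

% z is well below half the stride, so the sum rounds back to a1.
unchanged = (a1 + z) == a1;

% A step of one full stride does change the stored value.
changed = (a1 + stride) ~= a1;

fprintf('a1 + z == a1       : %d\n', unchanged);
fprintf('a1 + eps(a1) ~= a1 : %d\n', changed);
```

Since abs(a1) lies between 2^-5 and 2^-4, the stride is 2^-57, roughly 6.9e-18, which is more than sixty times larger than z.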
Walter Roberson
11 Nov 2022
Edited: Walter Roberson
11 Nov 2022
MATLAB does not have a long data type.
The terminology of long is often associated with C and C++.
In C, long does not have a fixed width. A long int is at least as wide as a regular int, and possibly wider; the C standard only guarantees that it is at least 32 bits (a plain int could be as little as 16 bits).
In C++ a long int is at least 32 bits.
In C, double is not required to be IEEE 754 (the standard only sets minimum precision and range), but in practice it is almost always IEEE 754 double, 64 bits (that is, 52 stored mantissa bits). In C, long double is at least as big as double but could be wider.
In C++, double is "double precision floating-point type. Matches IEEE-754 binary64 format if supported." and long double is "extended precision floating-point type. Matches IEEE-754 binary128 format if supported, otherwise matches IEEE-754 binary64-extended format if supported, otherwise matches some non-IEEE-754 extended floating-point format as long as its precision is better than binary64 and range is at least as good as binary64, otherwise matches IEEE-754 binary64 format."
MATLAB does not use long as a type name at all. It has single (IEEE 754 binary single precision, 32 bit word) and double (IEEE 754 binary double precision, 64 bit word). For integers it has int8, int16, int32, int64 and their unsigned versions such as uint32.
MATLAB never calculates floating point at more than 64 bits, not unless you are using Symbolic Toolbox (or are interfacing to some external class such as in Java or Python).
Note that the command
format long
has nothing to do with how calculations are done and only affects how results are displayed. The long does not have to do with any kind of extended precision mathematics: it just means more digits, where short means fewer digits.
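A quick way to convince yourself of this is to look at the stored bit pattern under both display settings. This is only a sketch; x and the hex variables are illustrative names.

```matlab
x = 1/3;

format short
disp(x)                    % fewer digits shown
hexShort = num2hex(x);     % stored bits as a hexadecimal string

format long
disp(x)                    % more digits shown
hexLong = num2hex(x);      % same stored bits as before

% The display changed, the stored double did not.
sameValue = strcmp(hexShort, hexLong);
fprintf('stored value unchanged: %d\n', sameValue);

format                     % restore the default display format
```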
Walter Roberson
11 Nov 2022
If you have an input parameter and you are not certain whether the user passed in a single or a double, then just convert it yourself,
gamma_factor = double(gamma_factor) %for example
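The conversion matters because MATLAB returns single when a single is combined with a double. A small sketch, assuming a hypothetical input of single(0.1):

```matlab
gamma_factor = single(0.1);             % hypothetical input that arrived as single

mixedClass = class(gamma_factor + 1);   % 'single': the double operand is narrowed
epsSingle  = eps(gamma_factor);         % spacing near 0.1 in single, about 7.45e-9

gamma_factor = double(gamma_factor);    % widen before doing further arithmetic
epsDouble  = eps(gamma_factor);         % spacing near 0.1 in double, about 1.39e-17

% Widening does not bring back digits that were already lost in single.
recovered = (gamma_factor == 0.1);

fprintf('class of single + double: %s\n', mixedClass);
fprintf('eps as single: %g, eps as double: %g\n', epsSingle, epsDouble);
fprintf('double(single(0.1)) == 0.1 : %d\n', recovered);
```

So the conversion protects later arithmetic, but it cannot restore precision that the caller already threw away.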
More Answers (1)
Patrik Forssén
20 Nov 2022
Edited: Patrik Forssén
20 Nov 2022
@John D'Errico explained why this happens. If you really need to avoid this, you must therefore use an arbitrary-precision numerical class for your calculations. Base MATLAB does not have one, but you can interface Java’s. Here is what your example would look like,
z1 = java.math.BigDecimal('1.111111e-19');
a1 = java.math.BigDecimal('-0.0581375401465599531136696498379023978486657142639160156250000000000000');
a2 = a1.add(z1);
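To check that the small term really was kept, something like the following could be used. It repeats the three lines above so that it runs on its own, and relies only on the standard toString, compareTo and doubleValue methods of java.math.BigDecimal; the names kept and roundTrip are illustrative.

```matlab
z1 = java.math.BigDecimal('1.111111e-19');
a1 = java.math.BigDecimal('-0.0581375401465599531136696498379023978486657142639160156250000000000000');
a2 = a1.add(z1);

% Show the exact decimal result as text.
disp(char(a2.toString()))

% compareTo is positive when a2 is greater than a1, so the small term was kept.
kept = a2.compareTo(a1) > 0;

% Converting back to double discards the extra digits again.
roundTrip = a2.doubleValue() == a1.doubleValue();

fprintf('a2 > a1 as BigDecimal : %d\n', kept);
fprintf('equal again as double : %d\n', roundTrip);
```

The extra precision only survives as long as the values stay inside BigDecimal objects, so every arithmetic step has to go through methods such as add, subtract and multiply.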