Maximizing precision for limited bits
The ability to have a negative fraction length or a negative integer length is a very good thing. For a given number of bits, it allows a fixed-point type to give maximum precision for very big numbers and for very small numbers, respectively.
To see this benefit, look at the amount of error in the examples provided in the question.
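a = 5.6632765314184e+15;
aFi = fi(a,1,4);   % signed, 4-bit word length, best-precision scaling
b = 0.00037;
bFi = fi(b,0,3);   % unsigned, 3-bit word length, best-precision scaling
% Relative error of the very big value
error_a = double(aFi) - a;
relative_abs_error_a = abs(error_a) / abs(a)
relative_abs_error_a = 0.0060
% Relative error of the very small value
error_b = double(bFi) - b;
relative_abs_error_b = abs(error_b) / abs(b)
relative_abs_error_b = 0.0102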
A relative error of under 1% and just over 1% is "pretty darn good" for just 4 bits and 3 bits of word length, respectively.
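The same two error figures can be cross-checked without any fixed-point toolbox. The sketch below assumes the best-precision scalings are 2^50 for a and 2^-14 for b. Those exponents are worked out by hand here, not read back from the fi objects, and the quantization is done with round-to-nearest in plain double arithmetic.

```matlab
% Assumed scalings, worked out by hand (not read back from fi):
%   a: signed 4-bit word,   stored integer range -8..7, scaling 2^50
%   b: unsigned 3-bit word, stored integer range  0..7, scaling 2^-14
a = 5.6632765314184e+15;
b = 0.00037;

storedInt_a = round(a / 2^50);   % integer mantissa that fits in 4 signed bits
storedInt_b = round(b * 2^14);   % integer mantissa that fits in 3 unsigned bits

quantized_a = storedInt_a * 2^50;
quantized_b = storedInt_b / 2^14;

rel_a = abs(quantized_a - a) / abs(a);
rel_b = abs(quantized_b - b) / abs(b);

fprintf('a: mantissa %d, relative error %.4f\n', storedInt_a, rel_a);
fprintf('b: mantissa %d, relative error %.4f\n', storedInt_b, rel_b);
```

If the assumed exponents match what fi picks, the printed relative errors should agree with the 0.0060 and 0.0102 shown above.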
Fraction Length and Integer Length are Intuitive when "scaling is small".
When thinking of fixed-point numbers using binary-point notation, fraction length and integer length are intuitive.
Type                  Real World Value    Notation: Binary Point
numerictype(0,3,0)    5                 = 101.
numerictype(0,3,1)    2.5               = 10.1
numerictype(0,3,2)    1.25              = 1.01
numerictype(0,3,3)    0.625             = .101
FALSE Belief that binary-point must be adjacent to the bits.
If you only look at examples of binary-point displays, it is natural to form the FALSE belief that the binary point must be adjacent to the bits that make up the variable's word length. But that is not true.
Binary-point notation is too limiting. There is an extremely useful and well-known generalization that removes the needless constraint that the binary point be adjacent to the bits.
Scientific Notation Removes Needless Limits
Decimal-point notation is intuitive, but limiting. As you know, power of 10 scientific notation allows you to represent very big and very small numbers with good accuracy using far fewer digits than decimal-point notation would require.
Likewise, binary-point notation can be generalized with power of 2 scientific-style notation. Same concept, same benefits: more accuracy with fewer symbols for very big and very small numbers.
A flavor of binary scientific notation that uses integer-valued mantissas is shown here. Notice that the integer mantissas are multiplied by two raised to an exponent.
Type                   Real World Value    Notation: Integer Mantissa in Binary and Pow2 Exponent
numerictype(0,3,-2)    20                = 101 * 2^2
numerictype(0,3,-1)    10                = 101 * 2^1
numerictype(0,3,0)     5                 = 101 * 2^0
numerictype(0,3,1)     2.5               = 101 * 2^-1
numerictype(0,3,2)     1.25              = 101 * 2^-2
numerictype(0,3,3)     0.625             = 101 * 2^-3
numerictype(0,3,4)     0.3125            = 101 * 2^-4
numerictype(0,3,5)     0.15625           = 101 * 2^-5
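The table can be regenerated with plain double arithmetic. This is only a sketch of the "integer mantissa times power of two" idea, and the variable names are made up for illustration. It does not use fi or numerictype.

```matlab
% The same 3-bit pattern 101 is reused for every row; only the scaling changes.
storedInt      = bin2dec('101');    % integer mantissa, equals 5
fractionLength = -2:5;              % the eight scalings from the table

% Real-world value = integer mantissa * 2^(-fraction length)
realWorldValue = storedInt .* 2.^(-fractionLength);

for k = 1:numel(fractionLength)
    fprintf('numerictype(0,3,%d)  %g = 101 * 2^%d\n', ...
        fractionLength(k), realWorldValue(k), -fractionLength(k));
end
```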
The attributes of these eight types are:
Word length:       3  3  3  3  3  3  3  3
Fraction length:  -2 -1  0  1  2  3  4  5
Integer length:    5  4  3  2  1  0 -1 -2
Notice that the first two fraction lengths are negative and the last two integer lengths are negative.
But none of that is a problem. What really lives in memory of the microcontroller or FPGA or ASIC are the bits that make up the word length. The word length is the only thing that needs to be positive. Let's have some fun and call that the "Law of Bit Conservation".
If you embrace the binary scientific notation way of thinking about fixed-point types, then thinking about the type's fixed exponent will become a more natural description. The fixed exponents of the eight types are:
Fixed exponent:    2  1  0 -1 -2 -3 -4 -5
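The rows of attributes above are tied together by simple arithmetic. The sketch below assumes the relationships implied by those numbers: integer length equals word length minus fraction length, and fixed exponent equals the negated fraction length. It uses plain vectors rather than numerictype objects.

```matlab
wordLength     = 3 * ones(1, 8);               % the only attribute that must be positive
fractionLength = -2:5;
integerLength  = wordLength - fractionLength;  % 5 4 3 2 1 0 -1 -2
fixedExponent  = -fractionLength;              % 2 1 0 -1 -2 -3 -4 -5

% "Law of Bit Conservation": the two lengths always add up to the word length,
% even when one of them is negative.
bitsConserved = all(integerLength + fractionLength == wordLength);

disp([wordLength; fractionLength; integerLength; fixedExponent]);
fprintf('Bits conserved for every type: %d\n', bitsConserved);
```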
Conceptual padding bits
If you were really keen to see a binary-point display even when the integer length or fraction length is negative, what could you do? Just as you can convert a decimal scientific notation number to a traditional decimal-point display, you can jam in padding slots on the left end or the right end as needed. For example, for 3e16, you would jam in 16 zeros after the 3 to get a decimal-point display. The same concept applies when taking binary scientific notation to a binary-point display.
For example, if the fraction length is negative, then you need to jam in -1 * FractionLength bits on the least significant end. These pad bits are always zero.
Likewise, if the integer length is negative, then you need to jam in -1 * IntegerLength bits on the most significant end. If the value is unsigned, then these pad bits are always zero. If the value is two's complement and non-negative, then these pad bits are all zero. If the value is two's complement and negative, then these pad bits are all one. A simplified way to describe all these cases is to say that the pad bits are just a sign extension. For unsigned types, the conceptual sign bit is always implicitly zero.
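The padding rules can be tried out on the 3-bit pattern 101. This is a minimal sketch with made-up variable names that builds the padded binary-point strings by hand with character operations. It is not how fi displays values.

```matlab
wordLength = 3;
bits = dec2bin(5, wordLength);                      % '101'

% Negative fraction length: pad zeros on the least significant end.
fractionLength = -2;
bigDisplay = [bits, repmat('0', 1, -fractionLength), '.'];    % '10100.'
bigValue   = bin2dec(bigDisplay(1:end-1));                    % 20 = 5 * 2^2

% Negative integer length: pad sign-extension bits on the most significant end.
fractionLength = 5;
integerLength  = wordLength - fractionLength;                 % -2
unsignedDisplay = ['.', repmat('0', 1, -integerLength), bits];     % '.00101'
signedDisplay   = ['.', repmat(bits(1), 1, -integerLength), bits]; % '.11101'

% Check the padded strings by summing bit weights 2^-1, 2^-2, ...
weights = 2.^-(1:numel(unsignedDisplay) - 1);
unsignedValue = sum((unsignedDisplay(2:end) - '0') .* weights);   % 5 * 2^-5

% In two's complement the leading (sign) bit carries a negative weight.
signedWeights = weights;
signedWeights(1) = -signedWeights(1);
signedValue = sum((signedDisplay(2:end) - '0') .* signedWeights); % -3 * 2^-5

fprintf('%s = %g\n%s = %g\n%s = %g\n', bigDisplay, bigValue, ...
    unsignedDisplay, unsignedValue, signedDisplay, signedValue);
```

In the signed case the pattern 101 is read as -3, so the ones jammed in on the most significant end act as an ordinary sign extension and leave the value unchanged.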
These slides have shown how to reconcile the interconnected concepts of word length, fraction length, and integer length when one of the latter two is negative. The math is all consistent. The "Law of Bit Conservation" is satisfied in all cases.
Negative fraction lengths are great for maximized accuracy of very large numbers.
Negative integer lengths are great for maximized accuracy of very small numbers.
Binary-point thinking is more intuitive, but for a limited range of scalings.
Scientific notation is more general and gives the power of maximum accuracy for a limited number of bits.