Floating point representation


KasunL


Hi!

 

Could anyone please explain to me how to represent a negative binary number (like -1.11) as a 32-bit floating point number? I'd really appreciate an explanation with a small example.

 

Thanks for any help.


 

In general, digital computers perform subtraction by a form of addition. The technique used is called "two's complement". You can find a lot of information at http://en.wikipedia.org/wiki/Two's_complement
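A minimal C sketch of that idea, purely illustrative (8-bit values; the names are my own): subtracting b is the same as adding the two's complement of b, i.e. ~b + 1, and letting the sum wrap.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t a = 13, b = 5;
    /* Two's complement of b: flip every bit, then add 1. */
    uint8_t neg_b = (uint8_t)(~b + 1);    /* 5 -> 11111011 = 251 */
    /* Subtraction performed as addition; the sum wraps mod 256. */
    uint8_t diff = (uint8_t)(a + neg_b);  /* 13 + 251 = 264 mod 256 = 8 */
    printf("%d - %d = %d\n", a, b, diff); /* prints 13 - 5 = 8 */
    return 0;
}
```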


The technique used is called "two's complement".

No, it isn't. The question was about floating point representation.

 

Before I talk about floating point representations, it is worth noting that two's complement is but one of several alternative schemes for representing integers (positive and negative) in a computer. Other techniques include one's complement, excess-n, and signed magnitude. Each of these is in a sense easier to understand than two's complement.

 

One's complement: To represent a negative number, write its magnitude in binary, sans the sign, and flip each bit. Voila! One's complement.

 

Signed magnitude: Reserve one bit as the sign bit, typically 0=positive, 1=negative. The remaining bits represent the magnitude of the number. One problem with this scheme: Negative zero and positive zero are different numbers as far as representation goes. They are of course the same number as far as mathematics is concerned.

 

Excess-n: To represent a number in excess-n notation, just add the offset n. For example, with n=1023, 0 becomes 1023, 1 becomes 1024, ..., and 1024 becomes 2047. For negative numbers, -1 becomes 1022, -2 becomes 1021, ..., and -1023 becomes 0. Note that this excess-1023 notation lets us represent numbers from -1023 to 1024 in eleven bits (0 to 2047).
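To make the three schemes concrete, here is a small sketch (my own illustration, not taken from any standard) encoding -5 in eight bits under each one; I picked excess-127 since that offset shows up again below.

```c
#include <stdio.h>
#include <stdint.h>

/* Print the low 8 bits of v as binary. */
static void print8(const char *label, uint8_t v) {
    printf("%-12s", label);
    for (int i = 7; i >= 0; --i) putchar(((v >> i) & 1) ? '1' : '0');
    putchar('\n');
}

int main(void) {
    int value = -5;
    uint8_t mag = (uint8_t)(-value);              /* magnitude: 00000101 */

    print8("one's comp", (uint8_t)~mag);          /* flip bits: 11111010 */
    print8("signed mag", (uint8_t)(0x80 | mag));  /* sign bit + magnitude: 10000101 */
    print8("excess-127", (uint8_t)(value + 127)); /* add offset: -5+127 = 122 = 01111010 */
    return 0;
}
```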

 

 

Both signed magnitude and excess-n are employed in the most common representation of floating point numbers, the IEEE floating point standard. Before I delve into that, I'm going to review scientific notation first. 1.1 in scientific notation is 1.1×10^0, and 9,876.54321 is 9.87654321×10^3. This representation is not unique. The number 9,876.54321 can be represented in scientific notation a number of ways: 9,876.54321×10^0, 987.654321×10^1, etc. The representation with a single digit to the left of the decimal point, 9.87654321×10^3, is the normalized representation. One last thing to note: Not all fractions can be represented so cleanly. Some fractions such as 243/7 have a repeating decimal. In scientific notation, 243/7 is 3.4714285(714285)...×10^1, where the (714285)... indicates that the 714285 keeps on repeating in the decimal representation.

 

There is nothing special about base 10. We could use base 2 instead of base 10. With this, one becomes 1.0×2^0 in binary notation, while 2 (base 10) = 10 (base 2) becomes 1.0×2^1 and 3 (base 10) = 11 (base 2) becomes 1.1×2^1. What about fractions? 2½ (base 10) is 1.01×2^1. Just as some fractions have a repeating decimal, some fractions have a repeating binary. 1/10 (base 10) is one such number: 1/1010 in binary is 0.000110011(0011)... Thus 1.1 (base 10) can also be represented as 1.000110011(0011)...×2^0.
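You can generate those repeating binary digits yourself with the usual doubling algorithm; a quick sketch (illustrative only): at each step, the integer part of twice the remaining fraction is the next bit.

```c
#include <stdio.h>

int main(void) {
    /* Expand 1/10 in binary: double the fraction, emit the integer part. */
    int num = 1, den = 10;
    printf("0.");
    for (int i = 0; i < 20; ++i) {
        num *= 2;
        if (num >= den) { putchar('1'); num -= den; }
        else            { putchar('0'); }
    }
    printf("...\n");  /* prints 0.00011001100110011001... */
    return 0;
}
```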

 

What about 5.1 (base 10)? This can be represented in binary as 101.000110011(0011)...×2^0, 10.1000110011(0011)...×2^1, or 1.01000110011(0011)...×2^2. The same normalization concept used in base 10 scientific notation can also be used for base 2. In base 10, the leading digit was 1, 2, ..., or 9. In base 2, the leading digit of any non-zero number represented in normalized binary notation is always 1. This means we don't really need to represent that leading digit, assuming the number is normalized. The leading 1 is implied by the fact that the number is in normalized form.

 

So, finally, how to represent floating point numbers on a computer, in steps (a short decoding sketch in C follows the list):

  1. Represent the number in normalized binary notation, ±1 × 1.(fractional part) × 2^exponent. One of the bits in the computer representation will be used for the sign, a number of bits for the exponent, and the remaining bits for the fractional part.

  2. The exponent is represented in excess-n notation. Excess-127 is used for single precision numbers, excess-1023 for double precision.
  3. The fractional part is a string of bits, with the initial bit representing 1/2, the next 1/4, and so on. Single precision uses 23 bits, double precision, 52 bits.
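Assuming a machine where C's float is an IEEE single precision number (true of essentially all current hardware), here is a sketch that pulls those three fields back out of a stored value; the variable names are mine:

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    float f = 1.1f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the bytes safely */

    unsigned sign     = bits >> 31;           /* 1 bit              */
    unsigned exponent = (bits >> 23) & 0xFF;  /* 8 bits, excess-127 */
    unsigned fraction = bits & 0x7FFFFF;      /* 23 bits, implied leading 1 */

    printf("sign=%u  exponent=%u (unbiased %d)  fraction=0x%06X\n",
           sign, exponent, (int)exponent - 127, fraction);
    return 0;
}
```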

 

So how to represent 1.1 (base 10)?

Step 1, 1.1 (base 10) = +1.0001100110011001100110011001100110011001100110011001(1001)...×2^0.

 

Step 2, the exponent is zero, which becomes 01111111 (base 2) in excess-127 notation and 01111111111 (base 2) in excess-1023 notation.

 

Step 3, the fractional part is 0001100110011001100110011001100110011001100110011001(1001)... The first 23 bits are 00011001100110011001100 (followed by 1100...) while the first 52 bits are 0001100110011001100110011001100110011001100110011001 (followed by 1001...). Note that in both cases the first discarded bit is 1. Thus it would be better to represent the 23 bit fractional part as 00011001100110011001101 and the 52 bit fractional part as 0001100110011001100110011001100110011001100110011010.
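If you want to reproduce that rounding, here is a sketch using the same doubling trick as above, generating 24 bits and rounding to 23 (my own simplification: it ignores the tie-breaking and carry-out cases a real implementation must handle):

```c
#include <stdio.h>

int main(void) {
    /* Fractional part of 1.1 is 1/10; generate its first 24 bits by doubling. */
    int num = 1, den = 10;
    unsigned long bits = 0;
    for (int i = 0; i < 24; ++i) {
        num *= 2;
        bits <<= 1;
        if (num >= den) { bits |= 1; num -= den; }
    }
    /* Keep 23 bits; add the 24th (first discarded) bit to round to nearest. */
    unsigned long frac23 = (bits >> 1) + (bits & 1);
    for (int i = 22; i >= 0; --i) putchar(((frac23 >> i) & 1) ? '1' : '0');
    putchar('\n');  /* prints 00011001100110011001101 */
    return 0;
}
```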

 

Step 4, start to put it all together. Using 0 to represent positive numbers and 1 negative, the single precision representation of 1.1 is 0 01111111 00011001100110011001101, while the double precision representation is 0 01111111111 0001100110011001100110011001100110011001100110011010.
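Both bit strings are easy to check on an IEEE machine; a short sketch (assumes float is the 32-bit and double the 64-bit IEEE format):

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void) {
    float  f = 1.1f;
    double d = 1.1;
    uint32_t fb;
    uint64_t db;
    memcpy(&fb, &f, sizeof fb);
    memcpy(&db, &d, sizeof db);

    /* Expected 0x3F8CCCCD = 0 01111111 00011001100110011001101 */
    printf("single: 0x%08X\n", (unsigned)fb);
    /* Expected 0x3FF199999999999A
       = 0 01111111111 0001100110011001100110011001100110011001100110011010 */
    printf("double: 0x%016llX\n", (unsigned long long)db);
    return 0;
}
```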

 

One final step, omitted here, completes step 4: actually storing those bits in memory. The byte order varies with the endianness of the computer.
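To see that endianness dependence directly, dump the stored bytes (illustrative; the first output is what a little-endian machine such as x86 prints):

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    float f = 1.1f;  /* bit pattern 0x3F8CCCCD */
    unsigned char bytes[sizeof f];
    memcpy(bytes, &f, sizeof f);
    /* Little-endian: CD CC 8C 3F.  Big-endian: 3F 8C CC CD. */
    for (size_t i = 0; i < sizeof f; ++i) printf("%02X ", (unsigned)bytes[i]);
    putchar('\n');
    return 0;
}
```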


I did say "in general". To be more precise, I should have said "the most widely used system in modern computing". There is always more than one way to skin a rabbit! I thought I would leave the conversion of 1/2, 1/4, 1/8, ... into 1/10, 1/100, 1/1000, ... and the question of scientific notation until after the first part was sorted. I am pleased to see that aspect has been standardised since my time by IEEE 754. See "Two's complement" in the link.

http://en.wikipedia.org/wiki/Computer_numbering_formats#Two.27s_complement

 

As a lecturer at a College of Further Education, I looked at only the two's complement method of representing negative numbers in detail, because it was the most widely used.

I have been looking up some details from when I spent some time working as a computer hardware engineer (1978). The systems I worked on did in fact use one's complement for negative numbers, to the surprise of some software engineers. For floating point they used 36 bits for single precision and 72 bits for double precision, and did use a 1-bit sign digit.

 

KasunL - Best if you ignore my post - and have a good look at D H's, if you can digest it in one go!

