Data Representation Notes

Written by Evan Weaver and David Ward.

1. Binary Computers
2. Numbering Systems
3. Converting Between Bases
4. Non-Decimal Arithmetic
5. Organizing the Bits
6. Character Format
7. Collating Sequence
8. Using DOS Debug
9. Unsigned Binary
10. Signed Binary
11. Decimal Complement
12. Two's Complement
13. Zoned Decimal
14. Packed Decimal
15. Floating Point

ASCII Table
EBCDIC Table


1. Binary Computers in a Decimal World - [top]

As we examine the way data is represented on a computer,  we will see how different the world of computers is from the world of humans.  Native mode for humans is a base 10 positional numbering system.  For computers, the native mode is binary.  Everything stored in the computer is represented by combinations of the two binary digits, 1 and 0.  Whether it is text, numbers, music, or even graphical images, it is stored in binary on the computer.

A computer uses memory made up of matrices of high speed ON/OFF switches operating in binary.  This allows calculations in binary which are fast, efficient, and accurate.  A typical computer purchased today will have a memory capacity of a billion or so binary digits.  Longer term binary data is usually kept on slower (relatively speaking) magnetic media with a capacity in the range of 10 or more gigabytes, or roughly 100 billion bits of information.

Fortunately for us, computer information is usually presented to us in our familiar formats of letters and decimal numbers, music or images.  To provide a good human interface, much effort goes on in the background converting vast amounts of data between the various formats.  When we sit down to edit a document, we see the letters appear on the screen as we type.  We can select and move blocks of text, change fonts or point size, create a hyperlink, spell check, or send the document to be printed.

In this document, we will examine how data is stored on the computer.  We will begin by examining the binary and decimal number systems as well as octal and hexadecimal which put the raw binary data into a format which is more easily digested by humans.  We will look at the two major text formats, ASCII and EBCDIC codes, examine their roots and learn how to use them effectively.  Finally, we will look at the various formats for representing numeric data.

 



2. Numbering Systems [top]

Today, most operating systems, programming languages and hardware devices make the underlying assumption that sophisticated users are familiar with how information is stored in a computer. Without this knowledge, anything other than very basic use of a computer is impossible, as is understanding of many of the fundamental design concepts of digital computers.

We use the decimal numbering system, also known as base 10. The decimal system appears to have evolved naturally because we have ten digits on our hands.   In decimal, a numeric quantity is represented by a series of digits from 0 to 9. Decimal is also a positional numbering system, whereby the rightmost digit represents a number of units. The digit to the left of that represents the number of tens of units, the next number to the left is the number of tens of tens of units (=hundreds of units), the next is the number of tens of tens of tens of units (=thousands of units), and so on. The actual quantity expressed by the number is the sum of all these quantities.

Example: Decimal number 3450

              3  4  5  0  
             /   |  |   \ 
            /    |  |    Zero single items 
           /     |  Five groups of ten items 
          /      Four groups of ten groups of ten items 
         Three groups of ten groups of ten groups of ten items
Any whole number greater than one can be used as the base in which numbers are represented (i.e. there is nothing special about ten, beyond humans usually having ten fingers).

For example, if we had been born with one finger on each hand (two fingers in total), we might count in binary (or base 2). Here each digit would range from 0 to 1, and each successive digit from right to left would represent the number of pairs of what the previous digit represented.
 

Example: binary 1101 is the same as decimal 13

                    Binary number
                    1  1  0  1 
                   /   |  |   \ 
                  /    |  |    One single item (= one item) 
                 /     |  Zero groups of two items (= none) 
                /      One group of two groups of two items (= four) 
                One group of two groups of two groups of two items (= eight)
                Eight plus four plus one is, in decimal, 13. 
 

3. Converting Between Bases - [top]

A simple way to convert from binary to decimal:
Write successive powers of two over each digit from right to left, and add up those numbers under which a 1 appears.
 

Example: Convert binary 110100 to decimal

      32   16   8   4   2   1 
       1    1   0   1   0   0 
      32 + 16     + 4          = 52 in decimal
A simple way to convert from decimal to binary:
Divide the number by two. The remainder (which will be either 0 or 1) is the rightmost binary digit. Divide the quotient by two. This remainder will be the next binary digit to the left. Continue dividing the successive quotients by two and using the remainder as the next binary digit to the left, and stop when the quotient is finally zero.
 

Example: Convert decimal 52 to binary

52 / 2 = 26 remainder 0 ------------------|
26 / 2 = 13 remainder 0 ----------------| |
13 / 2 = 6  remainder 1 --------------| | |
 6 / 2 = 3  remainder 0 ------------| | | |
 3 / 2 = 1  remainder 1 ----------| | | | |
 1 / 2 = 0  remainder 1 --------| | | | | |
                                | | | | | |
    The binary equivalent is:   1 1 0 1 0 0
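
If you want to experiment with these two procedures, they can be written out directly in a few lines of Python.  This is only an illustrative sketch of the positional-value and repeated-division methods described above; the function names are our own.

    # Sketch of the two conversion methods described above.

    def binary_to_decimal(bits):
        """Write successive powers of two under each digit (right to left)
        and add up those under which a 1 appears."""
        total = 0
        power = 1                        # 1, 2, 4, 8, ... from the right
        for digit in reversed(bits):
            if digit == "1":
                total += power
            power *= 2
        return total

    def decimal_to_binary(n):
        """Divide by two repeatedly; the remainders are the bits, right to left."""
        if n == 0:
            return "0"
        bits = ""
        while n > 0:
            bits = str(n % 2) + bits     # remainder becomes the next digit to the left
            n = n // 2                   # continue with the quotient
        return bits

    print(binary_to_decimal("110100"))   # prints 52
    print(decimal_to_binary(52))         # prints 110100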
 

Exercises:

Note how the number base is written as a subscript to the number.

     100101₂    = ?₁₀

      00011₂    = ?₁₀

        287₁₀   = ?₂

        101₁₀   = ?₂

Other bases are just as easy. For example, hexadecimal (base 16) uses digits from 0 to 15. Since we can't represent the quantities ten, eleven, ... , fifteen as a single digit with our normal digits, we have to make up some symbols to represent them. Typically, computer people use A for 10, B for 11, ... , and F for 15.
 

When using non-decimal numbering systems, the numeric base should always be mentioned if it is not entirely obvious from the context. Thus

    101₂  =  5₁₀ ,    101₁₀  =  65₁₆ ,   101₁₆  =  257₁₀

will not get confused with each other. In the case of hexadecimal (often called simply hex for short), many other styles of identifying this base are used. 101x, 101h, x101 and $101 are all common notations used to reference the hexadecimal number 101. Any number base can be used. Because it is easier and more reliable to design electronic circuits that recognize only two states, "on" and "off", than to design circuits that recognize three or more different states, computers today are based on the binary system. Since any numeric quantity can be represented as a sequence of 0s and 1s, any numeric quantity can be represented as a sequence of "offs" and "ons".

Even though it is easy (relatively speaking!) to design binary computers, it is difficult for humans to deal with large volumes of binary numbers. Imagine a page full of 0s and 1s - picking out patterns, even if you knew what different sequences of 0s and 1s were supposed to mean, would be extremely laborious. For this reason, computer people usually convert binary data to some other number base before working with it.

It turns out that it is very simple to convert binary numbers to hexadecimal, and vice versa. Mathematically, because 16 (the number base for hex) is 2 (the number base for binary) raised to the 4th power, there is a direct correspondence between four binary digits and one hex digit.

Notice, for example, that the largest 4 digit binary number is 1111, while the largest single hexadecimal digit is F. Both of these, of course, represent the same thing: the decimal number 15.

To convert a hex number to binary, you could convert the hex number to decimal, then convert the decimal number to binary. The easy way, however, is to take each hex digit, and replace it with the equivalent 4 digit binary number. To go from binary straight to hexadecimal, group the binary number into groups of 4 digits starting from the RIGHTmost digit. (A few leading zeros may have to be added to the binary number to make this work out properly). Each group of 4 can then be replaced with the corresponding hex digit. The following table summarizes the equivalencies between a hex digit and 4 binary digits.
 

Equivalence between binary and hexadecimal

Decimal Hexadecimal Binary 

   0        0        0000 
   1        1        0001 
   2        2        0010 
   3        3        0011 
   4        4        0100 
   5        5        0101 
   6        6        0110 
   7        7        0111 
   8        8        1000 
   9        9        1001 
  10        A        1010 
  11        B        1011 
  12        C        1100 
  13        D        1101 
  14        E        1110 
  15        F        1111 
  16       10       10000
  17       11       10001
  18       12       10010

Example: Convert hexadecimal A40D to binary

      A     4     0     D
    1010  0100  0000  1101

    The binary equivalent is 1010010000001101
 

Example: Convert binary 11100000100001 to hex

    0011 1000 0010 0001
      3    8    2    1

    The hex equivalent is 3821

Note in the last example how the binary digits were grouped from the right to the left, making it necessary to pad with two leading zeros. Hexadecimal, used as a shorthand for binary, is particularly useful because on most machines a byte is eight bits (each "binary digit" is called a bit) long. Thus, one byte of data can be represented by exactly 2 hex digits, which is much more compact than 8 binary digits. (Four bits, which can be represented by exactly one hex digit, is often called a nibble). Some older devices (for example, calculators from a few years ago) do not have the letters A, B, ... F, but only have the digits 0 through 9. For this reason, octal (base eight) has occasionally been used as a shorthand for binary.

Eight is two to the third power, and so three binary digits are equivalent to one octal digit in the same way that four bits are equivalent to one hex digit. The advantage of octal is that it only uses the digits from 0 to 7, and so can be implemented on devices that don't support the alphabet. The main disadvantage to octal is that a byte (on most machines) works out to be exactly two and two thirds octal digits long (3 bits plus 3 bits plus 2 bits), which can be awkward at times. On the old DEC-20 (one of DEC's pre VAX minicomputers, which had 9 bit bytes), however, octal was more convenient than hex, since a byte was exactly 3 octal digits long (and an awkward two and a quarter hex digits).

Example: Convert octal 570 to binary

          5   7   0 
         101 111 000 

    The binary equivalent is 101111000

Example: Convert hex A0E to octal

     Simplest method: Convert to binary, then to octal

          A    0    E 
        1010 0000 1110 

(regrouping from the right...) 

        101 000 001 110 
         5   0   1   6 

     The octal equivalent is 5016

Example: Convert octal 177 to hex

         1   7   7 
        001 111 111 

      (regrouping...) 

        0000 0111 1111 
          0    7    F 

    The hex equivalent is 7F
Note that conversions between octal and hex COULD be done by converting to base 10 as the middle step. Since octal to binary and hex to binary conversions are so simple, however, it is much easier to use binary as the middle step.
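
The grouping method is mechanical enough to automate.  The sketch below (Python, with names of our own choosing) converts hexadecimal to octal by going through binary as the middle step, exactly as in the A0E example above.

    # Hex -> binary -> octal, using binary as the middle step (illustrative sketch).

    HEX_DIGITS = "0123456789ABCDEF"

    def hex_to_binary(hex_str):
        """Replace each hex digit with its 4 bit binary equivalent."""
        return "".join(format(HEX_DIGITS.index(d), "04b") for d in hex_str.upper())

    def binary_to_octal(bits):
        """Group the bits in threes from the RIGHT, padding with leading zeros."""
        pad = (3 - len(bits) % 3) % 3
        bits = "0" * pad + bits
        groups = [bits[i:i+3] for i in range(0, len(bits), 3)]
        return "".join(str(int(g, 2)) for g in groups)

    print(hex_to_binary("A0E"))                   # 101000001110
    print(binary_to_octal(hex_to_binary("A0E")))  # 5016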
 

Exercise: Complete the following table

              Number Base 

        2         8         10          16 
     -------   -------   -------      ------- 

      1101 
                  6 
                 71 
                           99 
                                        ABC


 
4. Non-Decimal Arithmetic - [top]

Often a programmer has binary or hexadecimal data and needs to do simple arithmetic with it, such as addition or subtraction. One technique is to convert the data to decimal, do the arithmetic, then convert back to the base in question. This will work. But addition, subtraction and multiplication are quite easy to do even in non-decimal bases, so it is usually simpler to do the arithmetic directly. Essentially all you have to remember is that when you carry (or borrow) a 1, you are carrying (or borrowing) 2, 8 or 16 (depending on what base you're working with) and not 10.

Example: Hexadecimal addition; A0F + F5

                              1                    11 
       A0F                   A0F                   A0F 
        F5      --->          F5       --->         F5 
       ---                   ---                   --- 
         4 carry 1            04 carry 1           B04 

Note that F+5 is 15+5 = 20 (decimal) which is 14 hex. 

Similarly, F+1 is 15+1 = 16 (decimal) which is 10 hex. 

Finally A+1 is 10+1 = 11 (decimal) which is B in hex.
Actually, the arithmetic is really being done in decimal, since the decimal addition tables are the only ones that people normally have memorized. It's just that the conversion to decimal and back again is being done one digit at a time, so it is easier to perform than converting larger numbers.

Example: Hexadecimal subtraction; 207 - B6

                      1                       1   
  2 0 7 borrow 1     1 0 7                   1 0 7 
  - B 6       --->   - B 6          --->     - B 6 
  -----              -----                   ----- 
      1                5 1                   1 5 1
Here 7-6 = 1 (same in decimal and hex). 0-B is not possible, so 1 is borrowed from the next column, making 10-B which is 16-11 = 5 (decimal). 1 was borrowed, leaving 1 in the leftmost column.

Example: Binary subtraction; 100011 - 110

                                  1 
      1 0 0 0 1 1 borrow 1       0 0 0 0 1 1 borrow 1 
          - 1 1 0        --->        - 1 1 0        ---> 
      -----------                ----------- 
              0 1                        0 1 

                                  1 1 
      0 1 0 0 1 1 borrow 1       0 1 1 0 1 1 
          - 1 1 0        --->        - 1 1 0 
      -----------                ----------- 
              0 1                  1 1 1 0 1
If all this borrowing confuses you, try to do a decimal subtraction such as 20005 - 99, just to remind yourself how borrowing actually works.
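
If you want to check your hand arithmetic, most programming languages will parse numbers written in other bases.  The short Python sketch below verifies the hexadecimal and binary examples above; the variable names are ours.

    # Checking the worked examples with Python's built-in base conversions.

    a = int("A0F", 16) + int("F5", 16)     # hexadecimal addition
    print(format(a, "X"))                  # B04

    b = int("207", 16) - int("B6", 16)     # hexadecimal subtraction
    print(format(b, "X"))                  # 151

    c = int("100011", 2) - int("110", 2)   # binary subtraction
    print(format(c, "b"))                  # 11101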

Exercise: Non decimal arithmetic

     Hexadecimal 

        A001              2B03           
       + FFF             -1C00           
        ----              ----            

     Octal
        724               3301          
       +555               -277          
       ----               ----           

     Binary
    1100010            1010001          
     + 1111           - 111000         
    -------            -------        

Exercise:

You have been told that a program is located starting at byte number A3F00h in the computer's memory. You are also told that there is a programming error in the program at byte number A4010h in memory. You must calculate how many bytes the error is from the start of the program.


5. Organizing the Bits - [top]

Everything stored on the computer is stored as a series of 1s and 0s.  They are known as binary digits or bits and are the smallest storage element on the computer.  By grouping the bits together, they can represent numbers, letters, images, moving pictures, sound, CPU instructions or even a virtual world.  In this section we will look at how the bits are combined and organized to form larger entities.

A single binary digit can have the value 1 or 0.  These two values are often interpreted as true or false, yes or no, or on or off.  Bits may be combined together as bytes, which are eight bits in length.  Bytes may be combined together into a word, which is the normal number of bits that a computer works with at one time.  The word size is dependent upon the architecture of the machine.  Early personal computers, before the IBM PC, usually had a word size of 1 byte or 8 bits.  The CPU (central processing unit) on these processed data eight bits at a time.  The processor used in the IBM PC was an Intel 8088, and it handled data in the processor 16 bits at a time.  This effectively doubled the processing power without increasing the speed of the machine.  The word size increased to 32 bits with the 80386 processor.  Currently, there are several chips in development for microcomputers with a word size of 64 bits.  The terms word, half-word, and double word are usually relative to the word size of the computer being described.

From a logical perspective, data can be organized into fields, records, files, or databases in a variety of formats.  In this course, we will focus on ASCII and EBCDIC character codes, some extended characters, several common formats for numeric representation, and address formats, including a discussion of address size and the resulting limits on disk and memory capacities.


 

6. Data Format: Character - [top]

Since computers are only designed to interpret binary data, textual information must somehow be converted to binary. The method used is simplistic: a number is associated with each possible character. Two codes are in wide use: EBCDIC (pronounced EB-sih-dik) and ASCII (pronounced ASS-key). EBCDIC (Extended Binary Coded Decimal Interchange Code) is used on IBM mini and mainframe computers, as well as mainframes from some other manufacturers. It associates an 8 bit (one byte) number with each character, and is descended from the hole patterns used to represent characters on early punched card systems (IBM has been around a long time!).

ASCII (American Standard Code for Information Interchange) is used on most other computers, and was developed by a committee with representatives from a variety of computer manufacturers. By definition, ASCII associates a 7 bit number with each character (allowing only 128 different characters instead of EBCDIC's 256), although most computer manufacturers that use ASCII provide an 8 bit version of the code which uses the additional 128 characters to provide a richer set of characters than 7 bit ASCII allows. For example, on the IBM PC - an ASCII computer - characters 128 through 255 are used to represent mostly "graphics" characters. These characters, supported by the video hardware of the PC, allow the drawing of boxes and other simple shapes on the screen in text mode.

Both of these codes include "printable" characters (such as the alphabet, digits and punctuation marks typically found on a typewriter) and "non printable" characters (such as the FF character which causes a printer to go to the top of a new page and the BEL character which causes a terminal to beep, as well as other characters used for control of devices other than display devices).

A table showing both the EBCDIC and ASCII codes is included at the end of this document.  The standard character codes in use today are heavily biased in favor of English language use, not easily supporting accents or non-Roman alphabets.
The Unicode system has been developed to be a universal character code, including symbols so that all standard human languages (including Arabic and Asian languages) can be represented using a single coding system. Most of these new codes use a 16 bit, rather than an 8 bit, unit for each character, thus allowing a "vocabulary" of 65536 different characters. 
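
As a quick illustration of the two single byte codes, Python can show the byte values that the same text takes under ASCII and under one common EBCDIC code page (cp037, which ships with the standard library).  This is only a sketch, not part of the course material.

    # Comparing ASCII and EBCDIC (code page 037) byte values for the same text.

    text = "Hi 5"

    ascii_bytes  = text.encode("ascii")    # one 7-bit ASCII code per character
    ebcdic_bytes = text.encode("cp037")    # cp037 is a common EBCDIC code page

    print(ascii_bytes.hex().upper())       # 48692035
    print(ebcdic_bytes.hex().upper())      # C88940F5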



7. Collating Sequences - [top] 

The collating sequence is the order in which data is sorted on a computer.  The sorting of characters can be done most efficiently in binary, but because of the layout of the ASCII and EBCDIC codes, a binary sort would yield different results depending on which of these codes you were using. 

In ASCII, the numbers come first, then the uppercase letters, and finally the lowercase letters.  In EBCDIC, the lowercase letters come first, then the uppercase letters, and last come the numbers. 

In order to accommodate conversion from one hardware platform to another, computer languages like COBOL allow programmers to select which collating sequence their programs will follow. 
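
The difference is easy to demonstrate by sorting the same characters once by their ASCII codes and once by their EBCDIC codes.  A rough Python sketch, assuming the standard cp037 codec (one common EBCDIC code page) is available:

    # Sorting the same characters under the two collating sequences.

    chars = ["b", "B", "3", "a", "A", "1"]

    ascii_order  = sorted(chars)                                   # by ASCII code
    ebcdic_order = sorted(chars, key=lambda c: c.encode("cp037"))  # by EBCDIC code

    print(ascii_order)   # ['1', '3', 'A', 'B', 'a', 'b']  digits, upper, lower
    print(ebcdic_order)  # ['a', 'b', 'A', 'B', '1', '3']  lower, upper, digits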




8. Using DOS Debug  - [top]

Debug is a dump program, so named because it dumps the raw data from a file or from memory.  It is available in Windows 95, Windows 98 and Windows NT.  The debug program is an excellent tool for identifying what is actually stored in a file.

Debug is run in an MS-DOS window.  Type "debug filename" and debug will load the file and give the debug prompt.  At the "-" (hyphen) prompt, enter "d" to display the file.  This starts the display at the beginning of the text. 

The display is made up of 3 sections.  On the left side of the display we have the address of the data relative to the beginning of the file.  The centre portion of the display has the hexadecimal representation of the characters.  Each hex character represents 4 bits or half a byte.  There are 16 bytes shown per line. 
The right side of the display shows the characters in printed form where possible.  Characters which are not printable are represented as dots (.). 

To exit debug, type q and the enter key at the "-" prompt. 



9. Unsigned Binary - [top]

Numeric information which will always be a non negative integer (i.e. a whole number greater than or equal to zero) is often stored in a computer in binary. Usually, a fixed size (such as one byte for numbers up to 255, two bytes for numbers up to 65535 or four bytes for numbers up to 4294967295) is used to hold such a number. (The size is usually related to the word size of the computer). This format for data is called unsigned binary because the data is stored as a binary number, with no way to express a negative quantity. The size, unless obvious from the context, should be specified as well, as in the phrase "a 32 bit unsigned binary number".
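
As a quick check of those limits, the largest value that fits in an unsigned binary field is 2 raised to the number of bits, minus 1.  A short Python verification:

    # Largest unsigned binary value for the common field sizes mentioned above.
    for size_in_bytes in (1, 2, 4):
        bits = size_in_bytes * 8
        print(size_in_bytes, "byte(s):", 2**bits - 1)
    # 1 byte(s): 255
    # 2 byte(s): 65535
    # 4 byte(s): 4294967295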

 

10. Data Format: Signed Binary - [top]

Numeric information which will always be an integer, but may be positive or negative, is often stored as a binary number with a leading 0 bit for a positive number, and as the two's complement of the corresponding positive number for a negative number.

This format is called signed binary. In this case, as with unsigned binary, a fixed size is usually used. A one byte signed binary number would therefore range from -128 to +127, while a two byte signed binary ranges from -32768 to +32767, and a four byte signed binary, from -2147483648 to +2147483647. Remember that the first bit of a negative number will always be a 1 while that of a positive will always be a 0.
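
The following Python sketch (the helper name is ours) shows how the same 16 bit pattern reads as unsigned versus signed binary: when the leading bit is 1, the signed value is the unsigned value minus 2 to the 16th.  The pattern used here reappears in the two's complement section below.

    # Reading a 16 bit pattern as unsigned vs. signed binary (illustrative sketch).

    def as_signed_16(bits):
        """Interpret a 16 character bit string as a signed (two's complement) value."""
        value = int(bits, 2)
        if bits[0] == "1":             # a leading 1 means the number is negative
            value -= 2 ** 16
        return value

    pattern = "1111111111101010"
    print(int(pattern, 2))             # 65514 as unsigned binary
    print(as_signed_16(pattern))       # -22   as signed binary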



11. Decimal Complement Arithmetic - [top]

Complement arithmetic offers a simple and elegant method for performing addition and subtraction on a computer. Signed integer values are usually stored on the computer in complement form and can be used in this form directly in arithmetic expressions.  This introduction begins by examining how we usually perform simple addition and subtraction by hand. You will see why our manual methods are not easily implemented on a computer and that complement arithmetic is the best choice for  use on computers. 

When performing additions and subtractions by hand, our usual method works well for us but is very awkward to implement on a computer.  To begin, we examine the signs of the numbers: if they are the same, we add and the sign of the result will be the same as the numbers; if they are different, we subtract the smaller number (absolute value) from the larger number and apply the sign of the larger number to the result. 

And now, complements to the rescue. The algorithm to implement adding and subtracting is greatly simplified with complements. If a number is negative or the operation is a subtraction, calculate the complement of the number and then add. On the hardware side, all the CPU needs is an adder and a complementer to carry out additions and subtractions. 

Signed integer values are usually stored on the computer in 2s complement form. Before we begin examining 2s complement, we will first demonstrate using complement arithmetic with the more familiar decimal numbers.

Here is the process for adding and subtracting using complement arithmetic. These instructions are given in general terms so they will apply to complement arithmetic in any base. We will only be covering Radix complement arithmetic which includes 10s complement in decimal and 2s complement in binary.

Example 1: Adding and Subtracting in Decimal

  • Pad out the high order digits with zeros (usually at least 2 digits more than the largest number).  E.g., for the operation 152 - 76 we would write the numbers as

       00152
      -00076

    The sign is stored in the high order digit: 0 for positive numbers, Radix-1 for negative numbers.  In the case of decimal numbers, a negative sign is a 9 in the high order digit.  Any other value in the sign digit indicates an overflow.

  • If a number is negative (or is to be subtracted), convert it to complement format:

      • Subtract each digit from Radix-1 (this is a 9 for decimal numbers).

           -00076 --> 99923

      • Add 1 to the low order digit.

           99923 + 1 = 99924

  • Now add the numbers:

        00152
       +99924
       ------
      (1)00076

    If there is a carry out of the sign digit, ignore it.  Here the carry of 1 is discarded, leaving 00076, which is +76.

    If the result is a negative number (9 in the sign digit), then you must complement this number to get the result.

Example 2: Adding a Column of Negative and Positive Numbers

    Original      Padded        Complement form
        357        000357          000357
       -589       -000589          999411
       2345        002345          002345
        -33       -000033          999967
         14        000014          000014
      -3210       -003210          996790
                                   ------
                               (2) 998884  -->  complement  -->  -001116

  • Pad out the fields to accommodate a larger result and the sign digit.
  • Write negative numbers in complement form.
  • Add, and ignore the carry out of the sign digit.
  • The result has a 9 in the sign digit, so complement it to find the magnitude of the result: the answer is -001116.
  • An overflow can be detected when the amount carried into the sign digit is not the same as the amount carried out of the sign digit.
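
The whole procedure can be written out mechanically.  The following Python sketch (the function names are our own) performs ten's complement arithmetic on six digit fields, reproducing Example 1 and Example 2.

    # Ten's complement addition on fixed-width decimal fields (illustrative sketch).

    WIDTH = 6                                # 1 sign digit + 5 magnitude digits

    def tens_complement(digits):
        """Subtract each digit from 9, then add 1 to the low order digit."""
        nines = "".join(str(9 - int(d)) for d in digits)
        return str(int(nines) + 1).zfill(WIDTH)

    def encode(n):
        """Store n as a WIDTH-digit field; negatives are held in complement form."""
        field = str(abs(n)).zfill(WIDTH)
        return tens_complement(field) if n < 0 else field

    def decode(field):
        """A 9 in the sign digit means negative: complement to get the magnitude."""
        if field[0] == "9":
            return -int(tens_complement(field))
        return int(field)

    def add(fields):
        total = sum(int(f) for f in fields)
        total %= 10 ** WIDTH                 # ignore any carry out of the sign digit
        return str(total).zfill(WIDTH)

    print(add([encode(152), encode(-76)]))                      # 000076, i.e. +76
    print(decode(add([encode(n) for n in
                      (357, -589, 2345, -33, 14, -3210)])))     # -1116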

12. Two's Complement - [top]

It should be clear by this point that binary addition and multiplication are somewhat simpler processes than binary subtraction. (If you think about it, you might even realize that binary multiplication is really just a series of additions). It happens that, through a mathematical construct called the two's complement, subtraction can be made into one simple operation (the two's complement) followed by an addition. Such simplifications are sought after by computer circuit designers who prefer to design as simple and as few different circuits as possible to increase the reliability and decrease the size of their designs. First, let us define what a two's complement is.

Let us suppose that a fixed size for a number has been determined in the design of the computer. This size generally determines how many wires run throughout the computer. For example, an addition circuit might be built to handle 16 bit numbers, so it would have two sets of 16 wires going in, and one set of 16 wires coming out. For our current purposes, let us assume that this size (often called the word size of the computer) is 16 bits, though many computers have larger word sizes, such as 32 bits or 64 bits.

To form the two's complement of a number, first write it in binary with leading zeros to pad it out to the full size of a word. Then, "flip all the bits" (i.e. make each 0 into a 1, and each 1 into a 0). Finally, add one to the "flipped" result, and you have the two's complement of the original number.

Example: Two's Complement of decimal 48

    Step 1: Write all 16 bits: 0000000000110000 

    Step 2: Flip all the bits: 1111111111001111 

    Step 3: Add 1:                          + 1 

                               ---------------- 

    Two's Complement of 48 is: 1111111111010000
One neat property of the two's complement is that if you take the two's complement of a two's complement, you get the original number back again.

Example: Two's Complement of 1111111111010000

     1111111111010000   --(flip bits)-->     0000000000101111
                                                          + 1
                                             ----------------
                                             0000000000110000
                                      (which is 48 in decimal)

So what is the use of this? Well, it is possible to prove mathematically that adding the two's complement is the same as subtracting the original number.
 

Example: 26 - 48 = -22


    26 in binary: 0000000000011010
    2's complement of 48: +1111111111010000
                           ----------------
                           1111111111101010

    2's complement of 1111111111101010 is:

             0000000000010101
                          + 1
             ----------------
             0000000000010110   which is binary for 22!

The net result is that if the two's complement form is used to represent a negative number, then a subtraction circuit in the computer is unnecessary, since addition of the two's complement can be used in its place.

If this scheme is used (and it is on almost all modern computers), there are a couple of restrictions. First, the largest positive number that can be used is a 0 followed by all 1s. (In our 16 bit example, the largest positive number is 0111111111111111 or 32767 in decimal). This is so that we can distinguish a two's complement from a positive (a two's complement will always begin with a one). (Note also that the "largest" negative number is 1000000000000000 or -32768 in decimal). The second restriction is that some additions will cause an overflow condition, creating a number that exceeds the limits. For example, adding 0100000000000000 (decimal 16384) to itself should yield a large positive number (32768), but in fact yields 1000000000000000 (decimal -32768). (It is not difficult to design the addition circuit to trap such overflow errors).
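
For experimentation, here is a small Python sketch (the helper names are ours) of 16 bit two's complement arithmetic: flip the bits, add one, and keep only 16 bits.  The last line shows the overflow case mentioned above.

    # 16 bit two's complement, as described above (illustrative sketch).

    MASK = 0xFFFF                                  # keep results to 16 bits

    def twos_complement(value):
        """Flip all 16 bits, then add 1."""
        return ((value ^ MASK) + 1) & MASK

    def subtract(a, b):
        """a - b done as a plus the two's complement of b."""
        return (a + twos_complement(b)) & MASK

    print(format(twos_complement(48), "016b"))     # 1111111111010000
    print(format(subtract(26, 48), "016b"))        # 1111111111101010  (i.e. -22)

    # Overflow: 16384 + 16384 should be +32768, but that does not fit in 16 bits.
    print(format((16384 + 16384) & MASK, "016b"))  # 1000000000000000  (i.e. -32768)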

Exercises: Two's Complement

    In the following, assume a 16 bit word.

a. Find the two's complement of:
       11001 

       111111111111111 

       1111111111111111
b. The following hex value represents the two's complement of a number. Find the decimal value of the number.
       FE02

       (Hint: first convert the hex to binary) 

13. Data Format: Zoned Decimal - [top]

Zoned decimal is a data format often used in business applications. A number is stored in a decimal (base 10) format rather than a binary format. Essentially, the decimal representation of the number is stored as character data. In a system using the EBCDIC code, for example, the number 385 stored in a 6 byte zoned decimal field would be stored as the character string "000385". The actual data as stored, shown in hex, would be F0F0F0F3F8F5 (the EBCDIC for the digit "0" is F0, and so on). Note that the same number stored as a 4 byte binary would have the hex representation 00000181.

With signed binary numbers, the first bit is used to indicate the sign, and two's complement form is used for negative numbers. Zoned decimal format uses an even stranger method to specify a signed quantity. The reason for the strangeness is due to the way the Hollerith code (the punched card code from which EBCDIC is descended) works. In the Hollerith code, if a number punched on the card is signed, the last digit has an extra hole punched on the card to indicate either a + or a - sign. When the Hollerith code was made into the EBCDIC code, it turned out that the area in which the sign hole was punched was made to be the first half of the byte, called the zone portion. The area of the card in which the number itself was punched was made into the second half of the byte, called the digit portion. For example, the digit "5" in EBCDIC is represented by the hex number F5. The F (the first half of the byte) is the zone portion of that byte, and the 5 is the digit portion. While no punch in the zone portion of the card is represented in EBCDIC with the hex number F, the punch representing a + sign is represented with a hex C and the punch for the - sign is represented with a hex D. Thus, if a zoned decimal number is positive, the first half of the last byte is a hex C, and if it is negative, the first half is a D.

Example: Zoned Decimal Numbers (EBCDIC)
(Assuming a 4 byte zoned decimal number)

--------viewed as--------
    Quantity         Hex            Character 
    --------      ----------       ----------- 

      385          F0F3F8F5           0385 
     +385          F0F3F8C5           038E 
     -385          F0F3F8D5           038N
The character representation is shown in the table above because most programmers that enter data directly into zoned decimal format (usually to create sample test data) would use a text editor (like the VAX EDIT program) to create the test data rather than a hexadecimal editor. The data would be entered as if it were character data, even though it isn't quite, because of the strange way of specifying the sign. The following table shows the characters representing all possible last bytes of a zoned decimal number.

Example: Last Digit of Zoned Decimal Numbers

(Assuming the EBCDIC code is used)
   Last     -----Character (Hex Code)----- 
   Digit     Unsigned     Positive    Negative 
   -----     --------     --------    -------- 
     0       "0" (F0)     "{" (C0)    "}" (D0) 
     1       "1" (F1)     "A" (C1)    "J" (D1) 
     2       "2" (F2)     "B" (C2)    "K" (D2) 
     3       "3" (F3)     "C" (C3)    "L" (D3) 
     4       "4" (F4)     "D" (C4)    "M" (D4) 
     5       "5" (F5)     "E" (C5)    "N" (D5) 
     6       "6" (F6)     "F" (C6)    "O" (D6) 
     7       "7" (F7)     "G" (C7)    "P" (D7) 
     8       "8" (F8)     "H" (C8)    "Q" (D8) 
     9       "9" (F9)     "I" (C9)    "R" (D9)
Unfortunately, the zoned decimal format does not translate very well to the ASCII code, since ASCII uses different codes to represent the digits and letters. What happens is that on most ASCII machines, the same characters are used, even though the hex codes for them are different. The following show the differences from an EBCDIC machine.
Example: Zoned Decimal Numbers (ASCII)
(Assuming a 4 byte zoned decimal number)
--------viewed as-------- 

    Quantity        Hex           Character 
    --------    ----------       ----------- 
      385        30333835           0385 
     +385        30333845           038E 
     -385        3033384E           038N
Example: Last Digit of Zoned Decimal Numbers
(Assuming the ASCII code is used)
Last -----Character (Hex Code)----- 

Digit     Unsigned     Positive     Negative 
-----     --------     --------     --------

  0       "0" (30)     "{" (7B)     "}" (7D) 
  1       "1" (31)     "A" (41)     "J" (4A) 
  2       "2" (32)     "B" (42)     "K" (4B) 
  3       "3" (33)     "C" (43)     "L" (4C) 
  4       "4" (34)     "D" (44)     "M" (4D) 
  5       "5" (35)     "E" (45)     "N" (4E) 
  6       "6" (36)     "F" (46)     "O" (4F) 
  7       "7" (37)     "G" (47)     "P" (50) 
  8       "8" (38)     "H" (48)     "Q" (51) 
  9       "9" (39)     "I" (49)     "R" (52)
Note that the hex code for the last digit has lost all its significance on an ASCII computer; it is that way so that zoned decimal data can be passed from ASCII machines to EBCDIC machines in the same way that text data is passed: by converting every ASCII code to the corresponding EBCDIC code for the same character.

(IBM's RS/6000, which is IBM's first large scale ASCII machine, uses a different approach from other ASCII based machines. IBM uses the normal ASCII codes for the last digits of both unsigned and positive values ("0" [hex 30] to "9" [hex 39]) and uses the characters "p" [hex 70] through "y" [hex 79] for the last digit of negative values).

Realize that zoned decimal format is less compact than binary format. A four byte signed binary number can store values from -2147483648 to +2147483647, whereas a four byte zoned decimal number can only hold values from -9999 to +9999. Zoned decimal format tends to be used in languages such as COBOL, which have a long history dating back to the days when programmers had to enter the data in machine readable form (e.g. using punched cards).
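
If you need to generate test data programmatically rather than with a text editor, the rules above are easy to automate.  The following Python sketch (our own helper, not a standard routine) builds the hex form of a signed EBCDIC zoned decimal field:

    # Building an EBCDIC zoned decimal field (illustrative sketch).

    def to_zoned_ebcdic(value, length):
        """Return the hex string of a signed number in EBCDIC zoned decimal."""
        digits = str(abs(value)).zfill(length)
        sign_zone = "D" if value < 0 else "C"      # C = positive, D = negative
        body = "".join("F" + d for d in digits[:-1])
        return body + sign_zone + digits[-1]

    print(to_zoned_ebcdic(+385, 4))   # F0F3F8C5
    print(to_zoned_ebcdic(-385, 4))   # F0F3F8D5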


14. Data Format: Packed Decimal - [top]

Packed decimal format is a compromise between the compactness of binary format and the decimal orientation of zoned decimal. In packed decimal, rather than using an entire byte to store each decimal digit, only half a byte is used. The very last half byte of the number is used to store the sign of the number, using the same hex codes that EBCDIC zoned decimal does: F for an unsigned number, C for a positive number and D for a negative.

Example: Packed Decimal Numbers

(Assuming a 4 byte packed decimal number)
    Quantity      Stored As (Hex) 
    --------      --------------- 

      385            0000385F 
     +385            0000385C 

     -385            0000385D
Packed decimal format is used because many machines (such as the IBM/370) have special hardware instructions for doing arithmetic directly on packed decimal numbers, rather than internally converting the decimal quantities to binary, performing calculations, and converting the numbers back to decimal. These special instructions, while less efficient than normal binary arithmetic instructions, can usually be used to calculate with larger numbers than the word size of the computer might allow for binary calculations. For example, the largest binary number on the IBM/370 is 8 bytes big, which allows for numbers approximately 18 to 19 decimal digits long. The largest packed decimal number on this machine is 16 bytes big, allowing numbers 31 decimal digits long.
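
The packing rule is just as simple to automate.  A comparable Python sketch (again our own helper, for illustration only):

    # Building a packed decimal field (illustrative sketch).

    def to_packed(value, length_in_bytes):
        """Return the hex string of a signed number in packed decimal."""
        sign = "D" if value < 0 else "C"                 # C = positive, D = negative
        digits = str(abs(value)).zfill(length_in_bytes * 2 - 1)
        return digits + sign                             # sign goes in the last half byte

    print(to_packed(+385, 4))   # 0000385C
    print(to_packed(-385, 4))   # 0000385D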

Exercise:

Explain why a 16 byte packed decimal number may be up to 31 decimal digits long.

15. Data Format: Floating Point - [top]

All of the numeric data types so far assume that only integers (whole numbers) are to be stored. If fractional amounts are required, one scheme (called fixed point) is to decide how many decimal places (or "binary places") of accuracy is desired. All numbers are then calculated to that degree of accuracy, and stored, without the decimal point (or binary point), as a whole number. Only when data is output on reports or screens is the decimal point inserted.

As an example, suppose monetary values are being stored. Then two decimal places of accuracy are what is required (for dollars and cents). A six byte packed decimal data format might be used, allowing values from -999,999,999.99 to +999,999,999.99. The decimal points would only appear on output, and not be stored as part of the data.
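
For instance, a program using this fixed point convention would hold $149.95 as the whole number 14995 and insert the decimal point only when printing; a minimal Python illustration:

    # Fixed point: store cents as a whole number, insert the point only on output.
    price_in_cents = 14995
    print("$%d.%02d" % (price_in_cents // 100, price_in_cents % 100))   # $149.95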

In some areas, such as engineering, statistics or science, an approach like fixed point is not acceptable, because often values are so large (100 digits, say) or so small (0.00000000000001 for example) that it is impractical to attempt to store numbers to the accuracy required. In these areas, scientific notation is generally used to represent numbers in everyday use. In scientific notation, 16000000 is written as 1.6 x 10^7 (here ^ is used to represent exponentiation, or "to the power"), and .0000000034 is written as 3.4 x 10^-9. In this way, both very large and very small numbers can be written in a fairly compact manner.

In scientific notation, a number like 1.6 x 10^7 has three parts. The 1.6 is called the mantissa, the 10 is the base and the 7 is called the exponent. Note that both the mantissa and the exponent can be signed. A negative mantissa means that the numeric quantity is negative, whereas a negative exponent means the number is less than 1 in magnitude.

On computers, a variation on this scientific notation is called floating point format. In floating point, a numeric quantity is broken down into a mantissa, a base (usually either 2 or 16, not 10) and an exponent. Since all floating point numbers on the same computer would use the same base, only the sign, the mantissa and the exponent need to be stored.

There are almost as many different floating point formats as there are types of computer, each being a minor variation on the same theme. Commonly, either 32, 64 or 80 bits are used to store a number. One of these bits is for the sign (0 means positive, 1 means negative). Some of these bits are devoted to the mantissa and the rest to the exponent. A typical 32 bit floating point format might have 23 bits devoted to the mantissa and 8 for the exponent, along with the sign bit. If the base used were 2, this would allow numbers from roughly 2 to the -128th power to 2 to the 127th power (approximately 10 to the -38th to 10 to the 38th).

Since floating point formats vary considerably in exactly which bits are used for what part of the number, it is important only to remember the general structure of floating point. The bit-by-bit details (whether the exponent or the mantissa comes first, for example) can be looked up in the appropriate manual, should you need to investigate floating point format further.
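
As one concrete illustration, the sketch below uses Python's struct module to display the 32 bits of IEEE 754 single precision, the most common modern 32 bit layout (sign bit first, then an 8 bit exponent, then a 23 bit mantissa).  It is only a demonstration; the bit fields on any particular machine should be checked against its manual, as noted above.

    # One concrete floating point layout: IEEE 754 single precision (illustrative).
    import struct

    value = -16000000.0                             # i.e. -1.6 x 10^7
    raw = struct.unpack(">I", struct.pack(">f", value))[0]
    bits = format(raw, "032b")

    print(bits[0], bits[1:9], bits[9:])             # sign, exponent, mantissa fields
    # prints: 1 10010110 11101000010010000000000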


ASCII Table - [top]

      Second Hex Digit 
        0|  1|  2|  3|  4|  5|  6|  7|  8|  9|  A|  B|  C|  D|  E|  F| 
     |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
 F 0 |NUL|SOH|STX|ETX|EOT|ENQ|ACK|BEL| BS| HT| LF| VT| FF| CR| SO| SI| 
 i   |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
 r 1 |DLE|DC1|DC2|DC3|DC4|NAK|SYN|ETB|CAN| EM|SUB|ESC| FS| GS| RS| US| 
 s   |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
 t 2 | SP|  !|  "|  #|  $|  %|  &|  '|  (|  )|  *|  +|  ,|  -|  .|  /| 
     |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
 H 3 |  0|  1|  2|  3|  4|  5|  6|  7|  8|  9|  :|  ;|  <|  =|  >|  ?| 
 e   |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
 x 4 |  @|  A|  B|  C|  D|  E|  F|  G|  H|  I|  J|  K|  L|  M|  N|  O| 
     |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
 D 5 |  P|  Q|  R|  S|  T|  U|  V|  W|  X|  Y|  Z|  [|  \|  ]|  ^|  _| 
 i   |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
 g 6 |  `|  a|  b|  c|  d|  e|  f|  g|  h|  i|  j|  k|  l|  m|  n|  o| 
 I   |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
 t 7 |  p|  q|  r|  s|  t|  u|  v|  w|  x|  y|  z|  {|  ||  }|  ~|DEL| 
     |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

EBCDIC TABLE - [top]

         0|  1|  2|  3|  4|  5|  6|  7|  8|  9|  A|  B|  C|  D|  E|  F| 
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
   0  |NUL|SOH|STX|ETX|SEL| HT|RNL|DEL| GE|SPS|RPT| VT| FF| CR| SO| SI| 
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
   1  |DLE|DC1|DC2|DC3|RES| NL| BS|POC|CAN| EM|USB|CU1|IFS|IGS|IRS|ITB| 
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
   2  | DS|SOS| FS|WUS|BYP| LF|ETB|ESC| SA|SFE| SM|CSP|MFA|ENQ|ACK|BEL| 
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
   3  |   |   |SYN| IR| PP|TRN|NBS|EOT|SBS| IT|RFF|CU3|DC4|NAK|   |SUB| 
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
   4  | SP|   |   |   |   |   |   |   |   |   |  c|  .|  <|  (|  +|  || 
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
   5  |  &|   |   |   |   |   |   |   |   |   |  !|  $|  *|  )|  ;|   | 
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
    6 |  -|  /|   |   |   |   |   |   |   |   |  ||  ,|  %|  _|  >|  ?| 
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
    7 |   |   |   |   |   |   |   |   |   |  `|  :|  #|  @|  '|  =|  "| 
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
    8 |   |  a|  b|  c|  d|  e|  f|  g|  h|  i|   |   |   |   |   |   | 
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
    9 |   |  j|  k|  l|  m|  n|  o|  p|  q|  r|   |   |   |   |   |   | 
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
    A |   |   |  s|  t|  u|  v|  w|  x|  y|  z|   |   |   |   |   |   | 
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
    B |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   | 
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
    C |  {|  A|  B|  C|  D|  E|  F|  G|  H|  I|SHY|   |   |   |   |   | 
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
    D |  }|  J|  K|  L|  M|  N|  O|  P|  Q|  R|   |   |   |   |   |   | 
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
    E |  \|NSP|  S|  T|  U|  V|  W|  X|  Y|  Z|   |   |   |   |   |   | 
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| 
    F |  0|  1|  2|  3|  4|  5|  6|  7|  8|  9|  ||   |   |   |   | EO| 
      |---------------------------------------------------------------|
 
Copyright © 2000, David Ward, Seneca College
david.ward@senecac.on.ca
Last revision, June 6, 2000