Module C - Types

Primitive Types

"A type system defines how a programming language classifies values and expressions into types, how it can manipulate those types and how they interact" Wikipedia (2007).




C and C++ use type systems to classify information that is stored in computer memory.  The memory consists of bits, which are organized into bytes.  The smallest region of accessible memory is a byte.  A set of bytes by itself does not distinguish values, addresses, and program code from one another.  The same set of bytes can even represent different values depending upon how we interpret the bytes: as an integer value, a floating-point value, or a string of characters.  The type system lets us organize memory into specific values, addresses, and program code. 

Each type in a type system defines:

  • how to store and to interpret information of that type and
  • what operations are valid on information of that type.

Once we assign a type to information in some region of memory, the compiler can flag any operation that is not permissible on that information.  For example, the compiler can identify the multiplication of two C-style strings as an error just because the definition of a C-style string type does not include the multiplication operation. 

Both C and C++ admit many types.  These include:

  • primitive types
    • scalar types - integral, floating-point, and pointer types
    • void types - valueless and operationless types
  • compound types - array, structure and union types
  • logic types - functions that contain program instructions

The sizeof operator gives the size, in bytes, of a type or of any variable, object or expression.  For example,


 /* Type Sizes
  * sizeof.c
  * May 14 2007
  */

 #include <stdio.h>

 int main(void) {
     double x;
     /* sizeof yields a value of type size_t, which %zu prints (C99) */
     printf("On this machine, \n"
      "the size of an int is %zu bytes,\n"
      "the size of x is %zu bytes.\n",
      sizeof(int), sizeof x );
     return 0;
 }











 On this machine,
 the size of an int is 4 bytes, 
 the size of x is 8 bytes.


 

Note that the parentheses are required when the operand of sizeof is a type, but are optional when the operand is a variable, object or expression. 


Integral Types

Standard C and C++ define four integral types:

  • char
  • int
  • _Bool (C) or bool (C++)
  • enum

These types store integer values in equivalent binary form without any approximation. 

A value of char type occupies one byte of memory by definition: 

char
1 Byte
               

A value of int type typically occupies one word of memory, although many 64-bit platforms keep the int type at 4 bytes.  On a 32-bit platform, one word spans 4 bytes:

int (32-bit platforms)
1 Byte 1 Byte 1 Byte 1 Byte
                                                               

On a 16-bit platform or an emulation, one word spans 2 bytes:

int (16-bit platforms)
1 Byte 1 Byte
                               

One word is the size of a CPU register, so the int type is usually the most efficient integral type on the platform. 

A value of _Bool or bool type typically occupies one byte: 

_Bool, bool
1 Byte
               

A variable of _Bool or bool type can hold one of two values: 0 for false and 1 for true. 

A value of enum type is written as a name rather than as an integer.  For example, a variable of enum type may hold red, green, or blue as values instead of 1, 2, or 3.  We call each name an enumeration constant.  The declaration of an enumerated type identifies these names and takes the form


 enum Tag { name1, name2, name3, ... };

where Tag is the identifier of the enumerated type.  The type of each name is an int.  The compiler associates a unique integer value with each name.  By default, the compiler associates the value 0 with the first name in the declaration and assigns to each subsequent name a value 1 greater than the value associated with the preceding name. 

The definition of a variable of enumerated type takes the form


 enum Tag identifier;

where identifier is the name of the variable.  For example,


 /* Enumerations
  * enum.c
  * Sep 21 2005
  */

 #include <stdio.h>
 /* declare the type Colour */
 enum Colour {white, red, green, blue}; 

 int main(void) {
     /* define two Colour variables */
     enum Colour wall, ceiling;

     wall = red;
     ceiling = white;

     printf("%d %d\n", wall, ceiling);

     return 0;
 }


















 1 0


 

We can assign our own values to the names by initializing any or all of them.  For example,


 /* Enumerations
  * enum2.c
  * May 21 2006
  */

 #include <stdio.h>
 /* declare the type Colour */
 enum Colour {white=17, red, green, blue}; 

 int main(void) {
     /* define two Colour variables */
     enum Colour wall, ceiling;

     wall = red;
     ceiling = white;

     printf("%d %d\n", wall, ceiling);

     return 0;
 }


















 18 17 


 

Note how the value of red is 18, which is 1 greater than the value of white (17).

The use of names improves both readability and modifiability.  If we insert a new enumeration constant into the type declaration, the compiler renumbers the values associated with the subsequent names listed. 


 /* Enumerations
  * enum3.c
  * Sep 21 2005
  */

 #include <stdio.h>
 /* declare the type Colour */
 enum Colour {white=17, yellow, red, green, blue}; 

 int main(void) {
     /* define two Colour variables */
     enum Colour wall, ceiling;

     wall = red;
     ceiling = white;

     printf("%d %d\n", wall, ceiling);

     return 0;
 }


















 19 17 


 


Size Specifiers

Size specifiers define the minimum number of bits associated with a type. 

The three size specifiers for the int type are:

  • short
  • long
  • long long

A value of short int type or more concisely of short type contains at least 16 bits:

short
1 Byte 1 Byte
                               

A value of long int type or more concisely of long type contains at least 32 bits:

long
1 Byte 1 Byte 1 Byte 1 Byte
                                                               

A value of long long int type or more concisely of long long type contains at least 64 bits:

long long
1 Byte 1 Byte 1 Byte 1 Byte 1 Byte 1 Byte 1 Byte 1 Byte
                                                                                                                               

Note that in size terms a short fits between a char and an int, while an int fits between a short and a long.


Range Specifiers

Range specifiers define the range of values associated with a type.  The two range specifiers for integral types are

  • signed
  • unsigned

Signed

By default, the ranges for integral types, except char, are centered on zero.  The encoding schemes available for storing negative values include:

  • two's complement notation,
  • one's complement notation, and
  • sign magnitude notation.

These three schemes represent positive values identically.  Two's complement, which is the most popular, renders separate subtraction circuits in the ALU unnecessary and yields only one representation of 0. 

Standard C and C++ do not define a default range for the char type.  The range for the char type is platform dependent.  For example, phobos treats values of char type as extending from zero, while .net treats values of char type as centered on zero. 

Since the ASCII collating sequence extends from 0 to 127 inclusive, the char type stores ASCII characters identically regardless of the platform.  However, if we use a variable of char type to hold EOF (typically, -1), we need to identify the variable as centered on zero.  The keyword signed before the type name specifies this range.  For example,


 signed char c;

The range of a signed integral type depends upon the word size of the host platform and the encoding scheme for negative values.  For a two's complement scheme, the ranges are:

32-bit platforms

 Type          Size         Min                              Max
 signed char   1 byte       -128                             127
 char          1 byte       <= 0                             >= 127
 short         >= 16 bits   <= -32,768                       >= 32,767
 int           1 word       -2,147,483,648                   2,147,483,647
 long          >= 32 bits   <= -2,147,483,648                >= 2,147,483,647
 long long     >= 64 bits   <= -9,223,372,036,854,775,808    >= 9,223,372,036,854,775,807

16-bit platforms

 Type          Size         Min                              Max
 signed char   1 byte       -128                             127
 char          1 byte       <= 0                             >= 127
 short         >= 16 bits   <= -32,768                       >= 32,767
 int           1 word       -32,768                          32,767
 long          >= 32 bits   <= -2,147,483,648                >= 2,147,483,647
 long long     >= 64 bits   <= -9,223,372,036,854,775,808    >= 9,223,372,036,854,775,807

Note that the minimum ranges for char, short, long, and long long types are independent of the word size of the host platform.  Only the range for the int type is platform-dependent. 

Unsigned

The keyword unsigned specifies a range extending from zero into the positive numbers. 

If a variable always holds only non-negative integer values, we may add the keyword unsigned to the definition of the variable.  For example,

  • unsigned char letter;
  • unsigned short languages;
  • unsigned int persons; or more simply unsigned persons;
  • unsigned long students;
  • unsigned long long citizens;

In an unsigned type, all of the bits are available for storing the value. 

 Type                 Size         Min   Max (32-bit)                     Max (16-bit)
 unsigned char        1 byte       0     255                              255
 unsigned short       >= 16 bits   0     >= 65,535                        >= 65,535
 unsigned int         1 word       0     4,294,967,295                    65,535
 unsigned long        >= 32 bits   0     >= 4,294,967,295                 >= 4,294,967,295
 unsigned long long   >= 64 bits   0     >= 18,446,744,073,709,551,615    >= 18,446,744,073,709,551,615

The range of an unsigned int type depends upon the word size of the host platform.


Floating-Point Types

Standard C and C++ define two floating-point types:

  • float - a single-precision, floating-point number
  • double - a double-precision, floating-point number

Neither defines the size of a value of float or double type absolutely. 

Typically, a value of float type occupies 4 bytes: 

float
1 Byte 1 Byte 1 Byte 1 Byte
                                                               

Typically, a value of double type occupies 8 bytes: 

double
1 Byte 1 Byte 1 Byte 1 Byte 1 Byte 1 Byte 1 Byte 1 Byte
                                                                                                                               

Size Specifier

The double type can take a long size specifier, which maximizes the number of significant digits used to represent a double-precision, floating-point number.  Typically, a value of long double type occupies at least 64 bits:

long double
1 Byte 1 Byte 1 Byte 1 Byte 1 Byte 1 Byte 1 Byte 1 Byte
                                                                                                                               

Standard C requires a long double type to occupy no fewer bits than a double type.  Note that the standard does not specify a minimum number of bits for this type. 

Data Representation

Values of floating-point types are stored approximately.  The most popular model is the IEEE (Eye-triple-E, or Institute of Electrical and Electronics Engineers) Standard 754 for Binary Floating-Point Arithmetic. 

Under IEEE 754, a value of float type occupies 32 bits, has one sign bit, a 23-bit mantissa and an 8-bit exponent: 

float
1 Byte 1 Byte 1 Byte 1 Byte
s exponent mantissa
                                                               
or

float
1 Byte 1 Byte 1 Byte 1 Byte
s mantissa exponent
                                                               

The value under this model is given by the formula


 x = sign * 2^exponent * { 1 + f_1 * 2^-1 + f_2 * 2^-2 + ... + f_23 * 2^-23 }

where f_i is the value of bit i (i = 1,2,...,23) of the mantissa and


 -126 <= exponent <= 127

Under IEEE 754, a value of double type occupies 64 bits, has one sign bit, a 52-bit mantissa and an 11-bit exponent: 

double
1 Byte 1 Byte 1 Byte 1 Byte 1 Byte 1 Byte 1 Byte 1 Byte
s exponent mantissa
                                                                                                                               
or

double
1 Byte 1 Byte 1 Byte 1 Byte 1 Byte 1 Byte 1 Byte 1 Byte
s mantissa exponent
                                                                                                                               

The value under this model is given by the formula


 x = sign * 2^exponent * { 1 + f_1 * 2^-1 + f_2 * 2^-2 + ... + f_52 * 2^-52 }

where f_i is the value of bit i (i = 1,2,...,52) of the mantissa and

 -1022 <= exponent <= 1023

Limits

The limits on the number of significant digits and the ranges of the exponent for IEEE 754 variables of float and double types are:

 Type     Size      Significant Digits   Min Exponent   Max Exponent
 float    4 bytes   6                    -37            38
 double   8 bytes   15                   -307           308

The exponent range values in this table are decimal (base 10). 


Storage Class Specifiers

Storage class specifiers identify storage duration.  This duration is also called extent or lifetime and may be local or static.  Storage class specifiers are mutually exclusive. 

Local Extent

Variables that are defined within a block or as function parameters have local extent unless otherwise specified.  Their lifetime lasts from their definition to the closing brace of the block that contains their definition. 

There are two usages:

  • normal usage
  • very frequent usage

To specify local extent under normal usage, we add the keyword auto (for automatic) to the definition 


 auto int local = 2;

Since this is the default for any variable defined within a block or for any function parameter, we seldom see this keyword used. 

To specify local extent under very frequent usage, we add the keyword register to the definition 


 register int local = 2;

This keyword suggests to the compiler that the local variable should remain in a CPU register as long as possible.  Since the number of registers is extremely limited, the compiler might not implement this suggestion. 

Static Extent

The alternative to local extent is a lifetime that ends only with the end of the life of the program itself.  We call this static extent.  A variable with static extent survives the block within which the variable has been defined. 

The variable of static extent may be:

  • internally linked - local to the module and unknown to the linker, or
  • externally linked - known to the linker.

Internal Linkage

A variable of static extent with internal linkage is invisible outside its own module.  To identify internal linkage, we add the keyword static to the definition


 static int local = 2;

For example,

 /* Internal Linkage
  * static.c
  * May 12 2007
  */

 #include <stdio.h>

 void display() {
     static int local = 0;

     printf("local is %d\n", local++);
 }

 int main(void) {

     display();
     display();

     return 0;
 }

















 local is 0
 local is 1


 

Note how the second call to display does not re-initialize local.  A static variable is convenient for counting. 

No conflict arises if another module has an identically named variable, which also has internal linkage.  The two variables are independent of one another and the compiler allocates separate memory locations for each variable. 

External Linkage

A variable of static extent with external linkage is shared by several modules.  The compiler allocates memory for the variable in one module, but the variable is accessible in all modules.  Each reference to the variable accesses the same memory location, regardless of the module from which the reference is made.  To identify external linkage, we add the keyword extern to the declaration


 extern int shared;

We omit this keyword in the definition


 int shared = 0;

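In C, a declaration within a block cannot include both extern and an initializer.  At file scope, both C and C++ treat a declaration that includes an initializer as a definition, whether or not extern is present; many C compilers issue a warning in this case. 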

For example,


 /* External Linkage - Module A
  * display.c
  * May 12 2007
  */

 #include <stdio.h>
 extern int shared; /* declaration */

 void display(void) {

     /* %p expects a void pointer */
     printf("shared  is at  %p\n", (void *)&shared);
     printf("shared is %d\n", shared++); 
 }

 /* External Linkage - Module B
  * extern.c
  * May 12 2007
  */

 #include <stdio.h>
 void display(void); /* prototype for the function defined in Module A */
 int shared = 0; /* definition */

 int main(void) {

     printf("shared  is at  %p\n", (void *)&shared);
     display();
     display();

     return 0;
 }









 AIX Platform
  
 shared  is at  20000580
 shared  is at  20000580
 shared is 0
 shared  is at  20000580 
 shared is 1






  
 Borland Platform
  
 shared  is at  0040A194
 shared  is at  0040A194
 shared is 0
 shared  is at  0040A194
 shared is 1

 

Note the absence of an initializer in the extern declaration and the absence of extern in the definition of shared.

Synonym Types

Synonym types provide aliases for other types and improve readability.  To declare a synonym type, we use the keyword typedef.  A synonym type declaration takes the form


 typedef type Synonym;

where type is the original type along with any specifiers.  Synonym is the alias for the declared type. 

We can then write


 Synonym identifier;

For example


 typedef unsigned long long int VeryLong;

identifies the type VeryLong as an unsigned long long int.

The compiler attaches the original declaration to every identifier in every definition.  For example, the compiler interprets the statement


 VeryLong x, y;

as the definition of two unsigned long long int variables: 


 unsigned long long int x;
 unsigned long long int y;

A synonym does not accept additional type specifiers. 


 unsigned VeryLong x, y; /* ERROR */ 


In-Class Practice

Try the practice problem in the Handout on Types and Specifiers.


Pointer Types

Standard C and C++ accept a pointer type for each and every type - specified and unspecified.  Pointer types include:

  • char *
  • int *
  • float *
  • double *

as well as

  • short *
  • long *
  • long long *
  • long double *

There is also a pointer type for each synonym type. 

Different pointer types are not assignment compatible in standard C:


 char *c;
 int *i;
 i = c;            /* ERROR - Different Types */ 
 i = (int *) c;    /* OK */

Size of a Pointer

Standard C and C++ do not define the size of a pointer type.  The size is platform dependent and may vary from type to type.  One popular but non-portable assumption is that a variable of long type occupies at least as much space as any pointer. 

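C99 defines intptr_t and uintptr_t in the header <stdint.h> as synonyms for signed and unsigned integer types respectively that are wide enough to hold a pointer.  A pointer to a variable of any type, once converted to void *, can be stored in a variable of one of these types and converted back without change.  Both synonyms are optional, so a platform need not provide them. 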

Synonym Pointer Types

Synonym pointer types simplify definitions of pointers. 

For example, let us define a synonym for a pointer to an int


 typedef int * Pint;

We can then define several pointer variables without having to include the * before each identifier


 Pint px, py;


Void Type

Standard C and C++ include a generic pointer type that is not associated with any particular type: 

  • void *

We may convert a pointer of some specific type into a generic pointer (void *) and back without incurring any change or loss of information. 


 void *v;
 int *i;
 v = i;  /* OK */
 i = v;  /* OK in C; C++ requires the cast (int *) v */

Consider a function that dumps the contents of an address regardless of the type of value stored in the address:


 /* Hexadecimal Representation of Different Data Types 
  * hexa.c
  * Sep 19 2005
  */

 #include <stdio.h>
 void dumpHexa(void *, int);

 int main(void) {
     int i;
     double x;

     printf("Enter an integer : ");
     scanf("%d", &i);
     printf("is Stored As     : ");
     dumpHexa(&i, sizeof i);
     putchar('\n');

     printf("Enter a floating-point value : ");
     scanf("%lf", &x);
     printf("is Stored As                 : ");
     dumpHexa(&x, sizeof x);
     putchar('\n');

     return 0;
 }

 /* Dump the first n bytes at the address a */
 void dumpHexa(void *a, int n) {
     int i;
     unsigned char *c = (unsigned char *)a;

     for (i = 0; i < n; i++)
         printf("%02x ", c[i]);
 }

In dumpHexa, we print each byte of the data stored at address a.  We first cast the generic pointer a to a pointer to an unsigned char.  We then display the contents of c[i] in hexadecimal notation. 

Note that we cannot dereference a generic pointer.  We must first cast the pointer to a specific data type. 


Type Qualifiers

Type qualifiers describe the properties of variables when used as lvalues.  An lvalue is an expression that refers to an accessible region of memory, such as the name of a variable. 

There are three qualifiers:

  • const - the lvalue is unmodifiable
  • volatile - the lvalue needs continual updating in memory
  • restrict - access to the lvalue is restricted

const

A const lvalue cannot have its value in memory updated.  Including this keyword instructs the compiler to reject any code that attempts to modify the lvalue. 

volatile

A volatile lvalue needs to have its memory location updated every time that the value changes.  Including this keyword instructs the compiler, in its attempts to optimize code, to retain all intermediate steps that store the value in memory.  This is sometimes important in hardware related code. 

restrict

The restrict qualifier applies only to pointers and specifies that, within the scope of the pointer, the pointer is the only way to access the value pointed to.  No other pointer is used to access the same value.  This qualification allows the compiler to optimize code associated with the value.  The restrict keyword belongs to C99; standard C++ does not include it. 


In-Class Practice

Try the practice problem in the Handout on Generic Pointers.


Exercise

  • Read pages 26-34, 37-41 from Evan Weaver's subject notes
  • Read Wikipedia on type systems






  Designed by Chris Szalwinski