Part E - Secondary Storage

Text Files

Write programs that read text data from files and write data from memory to text files
Write programs that receive formatted input and generate formatted output

"The next step is to write a program that accesses a file that is not already connected to the program" (Kernighan and Ritchie, 1988)



Secondary storage provides a mechanism for storing data beyond the execution of a program.  The contents of secondary memory are accessible after powering off and powering back on.  The storage units in secondary memory take the form of files. 

This chapter describes how to connect a C program to a file in secondary memory, how to store information in that file and how to access that information. 


Files

A file is a named area of secondary storage.  The file itself may be fragmented: it does not necessarily occupy contiguous space on the storage device.  The operating system controls this fragmentation.

The fundamental storage unit in a file is the byte.  The distinguishing feature of a file is its end-of-file mark.  We refer to this mark as EOF.  EOF typically has the value -1.

Text Format

A file holds information in one of two formats:

  • text - readable and editable data
  • binary - executable program code (beyond the scope of these notes)

Storing data in text format makes it suitable for displaying and modifying through a text editor.  Text files are portable across platforms that share a standard character set.  A common standard is the ISO/IEC 646:1983 Invariant Code Set, which consists of

  • 52 upper and lower case alphabetic characters: A, B, ..., Z, a, b, ..., z
  • 10 digits: 0, 1, ..., 9
  • space
  • null, line feed, carriage return, horizontal tab, vertical tab and form feed: \0, \n, \r, \t, \v, \f
  • 29 graphic characters: ! # % ^ & * ( _ ) - + = ~ [ ] ' | \ ; : " { } , . < > / ?

This set excludes $ and `.  The encoding for characters like $ and ` does not produce the same characters everywhere (for more details see National Variants). 

Sequential Access

A common way to access the information in a text file is sequentially, byte by byte.  We process the file as a stream of bytes without skipping any byte until we reach its end-of-file mark. 


Connection

A C program connects to a file through an object of FILE type.  The object holds information about the file itself and keeps track of the next position to be accessed. 

File Data Structure

Allocating a pointer to a FILE object takes the form

 FILE *identifier;

where identifier is a pointer to a FILE object.  We call this pointer a handle to the object.  The structure type FILE is declared in the <stdio.h> header file. 

To allocate memory for a FILE pointer, we write

 #include <stdio.h>

 FILE *fp = NULL;

We initialize the pointer fp to NULL as a precaution against premature dereferencing.  If our program accesses data at fp before the file is open, the behavior is undefined; on most platforms the program terminates with a segmentation fault.  (NULL is defined in the <stdio.h> header file.)

Opening a File

fopen() opens the named file and returns the address of the FILE object that connects to the file.  The prototype for fopen() is

 FILE *fopen(const char file_name[], const char mode[]);

The first parameter holds the address of a null-terminated string containing the file's name.  The second parameter holds the address of a null-terminated string specifying the connection mode. 

The most common connection modes are

  • "r" - read from the file
  • "w" - write to the file: if the file exists, truncate its contents and then write; if the file does not exist, create a new file and then write to that file
  • "a" - write to the end of the file: if the file exists, append to the end of the file; if the file does not exist, create it and then write to it

The less common connection modes for text files are

  • "r+" - opens the file for reading and possibly writing
  • "w+" - opens the file for writing and possibly reading; if the file exists, truncates its contents and then writes to the file; if the file does not exist, creates a new file and then writes to that file
  • "a+" - opens the file for writing to the end of the file and possibly reading; if the file exists, appends to the end of the file; if the file does not exist, creates it and then writes to the file

The mode parameter is a null-terminated string, not a character: we write "w", not 'w'. 

To open a file named alpha.txt for writing, we write

 // Open a file
 // openFile.c

 #include <stdio.h>

 int main(void)
 {
         FILE *fp = NULL;

         fp = fopen("alpha.txt","w");
         if (fp != NULL) {

                 // statements to be added later

         } else {
                 printf("Failed to open file\n");
         }
         return 0;
 }

fopen() returns NULL if it fails to connect to the file.  fopen() can fail because the file does not exist (when opening for reading), because of a lack of permission, premature removal of the secondary storage medium or a full device. 

Closing

fclose() disconnects the file from the calling program.  This library function takes as its only parameter the file pointer.  The prototype for fclose() is

 int fclose(FILE *);

If the file is open for writing or appending, fclose() writes any data remaining in the file's buffer to the file and appends the end of file mark after the last character written.  If the file is open for reading, fclose() ignores any data left in the file's buffer and closes the connection. 

To close a file named alpha.txt that is open for writing, we write

 // Close an Opened file
 // closeFile.c

 #include <stdio.h>

 int main(void)
 {
         FILE *fp = NULL;

         fp = fopen("alpha.txt","w");
         if (fp != NULL) {
                 // statements to be added later
                 fclose(fp);
         } else {
                 printf("Failed to open file\n");
         }
         return 0;
 }

fclose() returns 0 if successful, EOF if unsuccessful.  fclose() fails if the storage device is full, an I/O error occurs or the storage medium is prematurely removed. 


Communication

A C program communicates with an open file through the following library functions:

  • fprintf() - formatted write to file
  • fputc() - write single character to file
  • fputs() - write string to file
  • fscanf() - formatted read from file
  • fgetc() - read single character from file
  • fgets() - read string from file

Writing

Formatted Writing

fprintf() writes data to a connected file under format control.  The prototype for this library function is

 int fprintf(FILE *fp, const char [], ...);

The first parameter holds the address of the FILE object.  The second parameter holds the format string.  This string may contain text to be written directly to the file as well as conversion specifiers, if any, to be applied to the data values supplied as arguments. 

For example:

 // Writing to a File
 // writeToFile.c

 #include <stdio.h>

 int main(void)
 {
         FILE *fp = NULL;
         char phrase[] = "My name is Arnold";

         fp = fopen("alpha.txt","w");
         if (fp != NULL) {
                 fprintf(fp, "%s\n", phrase);
                 fclose(fp);
         } else {
                 printf("Failed to open file\n");
         }
         return 0;
 }

Unformatted Writing

fputc() writes a single character to a file.  The prototype for this library function is:

 int fputc(int ch, FILE *fp);

ch receives the character to be written and fp receives the address of the FILE object.  fputc() returns the character written, or EOF in the event of an error.

fputs() writes a null-terminated string to a file.  The prototype for this library function is:

 int fputs(const char str[], FILE *fp);

str receives the address of the string to be written and fp receives the address of the FILE object.  fputs() returns a non-negative value if successful; EOF in the event of an error.

Reading

Formatted Reading

fscanf() reads a sequence of bytes from a file under format control.  The prototype for this library function is

 int fscanf(FILE *, const char [], ...);

The first parameter receives the address of the FILE object.  The second parameter receives the address of the format string.  This string contains the conversion specifiers to be used in converting the file data.

For example:

 // Reading from a File
 // readFromFile.c

 #include <stdio.h>

 int main(void)
 {
         FILE *fp = NULL;
         char phrase[61];

         fp = fopen("alpha.txt","r");
         if (fp != NULL) {
                 // fscanf returns the number of values stored
                 if (fscanf(fp, "%60[^\n]\n", phrase) == 1)
                         printf("You read : %s\n", phrase);
                 else
                         printf("Nothing to read\n");
                 fclose(fp);
         } else {
                 printf("Failed to open file\n");
         }
         return 0;
 }

Unformatted Reading

fgetc() reads a single character from a file.  The prototype for this library function is

 int fgetc(FILE *fp);

fp holds the address of the FILE object.  fgetc() returns the character read; EOF on reaching the end of the file or in the event of an error.

fgets() reads a stream of bytes from a file.  The prototype for this library function is

 char* fgets(char str[], int max, FILE *fp);

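str receives the address of the string to be filled.  max receives the maximum number of bytes in str including space for the null byte.  fp receives the address of the FILE object.  fgets() stops reading after it stores a newline character, after it stores max - 1 bytes or on reaching the end of the file, whichever comes first.  fgets() keeps the newline character, if read, and appends the null byte to the stored string.  fgets() returns the address of str if successful; NULL if it reaches the end of the file before storing any bytes or in the event of a read error.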


Management

Library functions for managing the state of a FILE object include:

  • rewind() - rewind the file
  • feof() - identify the end of the file

Rewind

rewind() resets the FILE object so that the next byte to be accessed will be the first byte in a file.  To jump to the beginning of a file, we simply rewind the file instead of disconnecting and re-connecting it.  The prototype for this library function is

 void rewind(FILE *fp);

Consider a text file named spring.txt that contains

 Light Jacket
 Long-Sleeved Shirts
 Large Skateboards

The following program reads and displays this data, rewinds the file, and then reads and displays the data again:

 // Reading from a file
 // readFromFile.c

 #include <stdio.h>

 int main(void)
 {
         FILE *fp = NULL;
         char phrase[61];

         fp = fopen("spring.txt","r");
         if (fp != NULL) {
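                 // fscanf returns 1 for each line stored; testing for 1
                 // also ends the loop if a line cannot be converted
                 while (fscanf(fp, "%60[^\n]%*c", phrase) == 1)
                         printf("%s\n", phrase);
                 rewind(fp);
                 while (fscanf(fp, "%60[^\n]%*c", phrase) == 1)
                         printf("%s\n", phrase);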

                 fclose(fp);
         } else {
                 printf("Failed to open file\n");
         }
         return 0;
 }

The program produces the following output:

 Light Jacket
 Long-Sleeved Shirts
 Large Skateboards
 Light Jacket
 Long-Sleeved Shirts
 Large Skateboards

End of File

feof() indicates whether or not the caller has attempted to read the end-of-file mark; that is, to read beyond the last character in the file.  The prototype for this library function is

 int feof(FILE *fp);

feof() returns false (0) if the caller has not attempted to read the end-of-file mark; true if the caller has attempted to read the end-of-file mark. 

If the next byte to be read is the end-of-file mark but the caller has not yet read the mark (that is, has only read the last character in the file), feof() returns false.  In other words, to receive true, the caller must have attempted to read the end-of-file mark at least once. 


Comparison

The functions for file communication share many common properties with the functions for user communication.  Both belong to the same library, follow the same rules for format control and share a common syntax.  Nevertheless, the file functions differ from the standard I/O functions in a few respects as shown in the table below.

 Return Type  Standard I/O  File I/O             Notes
 int          scanf(...)    fscanf(fp, ...)      check to see if the return value is EOF
 int          printf(...)   fprintf(fp, ...)     returns the number of characters written
 int          getchar()     fgetc(fp)            check to see if the return value is EOF before converting it to a char type
 int          putchar(ch)   fputc(ch, fp)        check to see if the return value is EOF
 char *                     fgets(str, max, fp)  adds the '\0' byte, does not discard the '\n' delimiter, returns NULL if it encounters an end-of-file mark
 int          puts(str)     fputs(str, fp)       puts appends a '\n' to the output stream; fputs does not append a '\n' to the file

Exercises