Tuesday, February 7, 2012

Standard C Library for Linux, Part 01, file functions

C is a very small language.  This is a good thing.  C programmers will use nearly the entire C language every time they write a fair sized program.  The standard C library extends the functionality of C in a way that is predictable on multiple systems.  The library gives us tools like scanf() and printf() that make reading and writing formatted output much easier than working with blocks of characters using read() and write().  Also when you move from one C programming environment to another, the functionality of printf() will be the same.  You don't have to relearn how to print formatted output every time you change machines. In this series of articles I will discuss the tools that are available for the programmers in the standard C library.  At the end is a bibliography of the books and articles that I used to get this information.  I refer to these books and magazines on a daily basis when I program.  If you want to work as a C programmer I strongly recommend that you buy and read these books and magazines.

Many times the standard functions are overlooked and reinvented by programmers (including myself!) to do things like seeing if a character is a letter by:

            ((c>='a' && c<='z') || (c>='A' && c<='Z'))

instead of using

            isalpha(c);

The second form is much easier to read and figure out.  There is another reason to use the second form.  The first example only works for character sets like ASCII where the letters are contiguous; the second will work on _any_ machine.  If you want to write portable code (code that can be compiled and run on any machine with very minor changes) then use the standard C library.

Several times in the past I have written code that took time to write and debug and interface to the rest of my program only to discover that there was already a function that did what I wanted to do.  A few months ago I was writing my own Multi User Dimension (MUD), based on a client/server article in Linux Journal, and I needed to process what the user had entered, one word at a time.  So I wrote my own string tokenizer.  Turns out I could have used the strtok() function to do almost the exact same thing.  And other people will know what the strtok() function does without having to decipher my code.

Make your life easier, use the standard C library.  It will also help all of us who try to update and maintain your code later.

The GNU compiler, gcc, comes with the GNU standard C library.  This compiler is one of the best in the world and the GNU standard C library conforms almost exactly to the standard.  In the places where the standard is imprecise, you can expect very reasonable behavior from both the compiler and the library.  I am going to discuss the GNU standard C library in these articles.

The <stdio.h> library handles the standard input and output functions for C programmers.  It is also by far the largest library.  Because the library is so large I am going to group the commands in these sections:  file operations, input and output.

Now before we talk about files we need to agree on the words that we are going to use.  In Linux a file or device is considered to be a stream of data.  This stream of data that is associated with a file or hardware device is accessed by your program opening the file or device.  Once the stream is opened then you can read and/or write to it.

Three streams are opened automatically when you execute a program: standard input (stdin), standard output (stdout), and standard error (stderr).  These can all be redirected by your shell when you run the program but normally stdin is your keyboard and stdout and stderr both go to your monitor.

After you are done with your streams you need to tell the operating system to clean up buffers and finish saving data to the devices.  You do this by closing the stream.  If you don't close your stream then it is possible to lose data.  stdin, stdout and stderr are all closed automatically the same way they are opened automatically.

One of the most important things to remember when dealing with devices and files is that you are dealing with the real world.  Don't assume that the function is going to work.  Even something like printf can fail.  Disks fill up or occasionally fail, users input the wrong data, processors get too busy, other programs have your files locked.  Murphy's Law is in full effect when it comes to computer systems.  Every function that deals with the real world returns an error condition if the function failed.  Always check every return value and take the appropriate action when there is an error condition.  "Exceptions are not errors unless they are handled badly.  Exceptions are opportunities for extra computation." (William Kahan, on exception handling.)

The first example shows how to open a file for reading.  It just dumps a file called test in the current directory to standard out.  All exceptions are reported to standard error and then the program is halted with an error return.  It should produce an error if a file called test doesn't exist.

--------------------------------------------------------


#include <stdio.h>   /* this is a compiler directive that tells the
                        compiler that you are going to be using
                        functions that are in the standard input /
                        output library
                     */
#include <stdlib.h>  /* needed for exit() */

int main (void) {

    /* declare variables */
    FILE *stream;         /* need a pointer to FILE for the stream */
    int buffer_character; /* need an int to hold a single character */

    /* open the file called test for reading in the current directory */
    stream = fopen("test", "r");

    /* if the file wasn't opened correctly then the stream will be
       equal to NULL. It is now customary to represent NULL by casting
       the value of 0 to the correct type yourself rather than having the
       compiler guess at the type of NULL to use.
    */
    if (stream == (FILE *)0) {
        fprintf(stderr, "Error opening file (printed to standard error)\n");
        exit (1);
    } /* end if */

    /* read and write the file one character at a time until you reach
       end-of-file on the input or hit an error on the output. If the
       EOF is on the input stream then drop out of the while loop. If
       the write fails then report the error to standard error and exit
       the program with an error condition.
    */
    while ((buffer_character=getc(stream))!=EOF) {

        /* write the character to standard out and check for errors */
        if((putc(buffer_character, stdout)) == EOF) {
            fprintf(stderr,"Error writing to standard out. (printed to standard error)\n");
            fclose(stream);
            exit(1);
        } /* end if */

    } /* end while */

    /* close the file after you are done with it, if file doesn't close then report and exit */
    if ((fclose(stream)) == EOF) {
        fprintf(stderr,"Error closing stream. (printed to standard error)\n");
        exit(1);
    } /* end if */

    /* report success back to environment */
    return 0;

} /* end main*/
-------------------------------------------------------------
 
The above simple program is an example of  opening a file, reading the file, and then closing the file while also using stdout, and stderr.  I cut and pasted the code to a vi session and then saved, compiled, and ran the program.

What follows is a quick summary of the file operations in the <stdio.h> library.  These are the operations that work directly with streams.

Opening Streams

Before a stream can be used you must associate the stream with some device or file.  This is called opening the stream.  Your program is asking for permission from the operating system to read or write to a device.  If you have the correct permissions, the file exists or you can create the file and no-one else has the file locked then the operating system allows you to open the file and gives you back an object that is the stream.  Using this object you can read and write to the stream and when you are done you can close the stream.

Let me describe the format of the descriptions that you will see here and in the man pages.  The first entry is the type that is returned by the function call.  The second part is the function name itself and the third part is the list of variable types that the function takes for arguments.

Looking at the first line below we see that the fopen function takes two pointers to strings: one is the path to a file and the other is the mode in which to open it.  The function returns a pointer to the FILE type, which is a complex object that is defined in the <stdio.h> library.  So in order to accept the return value you must have declared a variable of type pointer to FILE, like the stream variable declared at the top of main in the example above.  Further down in the example you can see where I call the function fopen with the fixed filename of "test" and a mode of "r" and then accept the return value into the stream variable.

A stream can be opened by any of these three functions:
    FILE   *fopen( const char *path, const char *mode)
    FILE  *fdopen( int fildes, const char *mode)
    FILE *freopen( const char *path, const char *mode, FILE *stream)
 
const char *path is a pointer to a string with the filename in it.

const char *mode is the mode of opening the file (table follows.)

int fildes is a file descriptor which has already been opened and whose mode matches.  You can get a file descriptor with the UNIX system function open.  Please note that a file descriptor is not a pointer to FILE.  You cannot close(stream), you must fclose(stream).  This is a very hard error to find if your compiler doesn't warn you about it.  If you are interested in Linux system calls type `man 2 intro` for an introduction to the functions and what they do.

FILE *stream is an already existing stream.

These functions return a pointer to FILE type that represents the data stream or a NULL of type (FILE *)0 on any error condition.

fopen is used to open the given filename with the respective mode.  This is the function that is used the most to open files.

fdopen is used to assign a stream to a currently opened file descriptor.  The file descriptor mode and the fdopen mode must match.

freopen is normally used to redirect stdin, stdout and stderr to a file.  The stream that is given will be closed and a new stream opened to the given path with the given mode.

This table shows the modes and their results:
       open stream for  truncate create starting
mode   read    write    file     file   position
----   ----    -----    ----     ----   --------
"r"     y        n       n        n     beginning
"r+"    y        y       n        n     beginning
"w"     n        y       y        y     beginning
"w+"    y        y       y        y     beginning
"a"     n        y       n        y     end-of-file
"a+"    y        y       n        y     end-of-file

To read the first row of the table: "r" will open a stream for read, the stream will not be opened for write, will not truncate the file to zero length, will not create the file if it doesn't already exist and will be positioned at the beginning of the stream.

Stream Flushing

Sometimes you want your program to ensure that what you have written to a file has actually been sent on to the file and is not waiting in the buffer.  Or you might want to throw out a lot of user input and get fresh input, for a game.  The following two functions are useful for emptying the stream's buffers, though one just throws the data away while the other writes it out to the file or device behind the stream.

    int fflush(FILE *stream)
    int fpurge(FILE *stream)
 
FILE *stream is an already existing stream.

These functions return a 0 on success.  On a failure they return an EOF.


fflush is used to write out the buffers of the stream to a device or file.

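fpurge is used to clear the buffers of unwritten or unread data that is in a buffer waiting.  I think of this as a destructive purge because it clears the read and write buffers by dumping the contents.  Note that fpurge comes from BSD and is not part of the C standard; the GNU library provides the same operation as __fpurge, declared in <stdio_ext.h>, which returns no value.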

Closing Streams

When you are done with a stream you must clean up after your program.  When you close a stream the function writes out the buffers and releases the stream, and its return value tells you whether that worked.  If your program exits normally without closing its files, the library flushes and closes them for you, so more than likely the last few bytes that you wrote will be there.  But you won't know unless you check, and if the program terminates abnormally the buffered data can be lost.  Also there is a limit to how many streams a single process can have open at one time.  So if you keep on opening streams without closing the old streams you will use up system resources.  Only one function is used to close any stream.

    int fclose(FILE *stream)
 
FILE *stream is an already existing stream.

Returns a 0 on success, or an EOF otherwise.

fclose flushes the given stream's buffers and then disassociates the stream from the pointer to FILE.  After fclose returns, the pointer must not be used again.

Renaming and Removing Files

These two functions work just like rm and mv, but without the options.  They are not recursive but your programs can be, so watch that you don't accidentally build your own version of rm -rf / (by the way, don't type this; it would delete everything on your hard drive!)

    int    remove(const char *path)
    int    rename(const char *oldpath, const char *newpath)
 
path and oldpath are pointers to strings that name existing files.  newpath is a pointer to a string with the new name.

Returns a 0 on success and a non-zero otherwise.

remove works just like rm to remove the file in the string pointed to by path.

rename works just like mv to rename a file from oldpath to newpath, changing directories if need be.  Unlike mv, it cannot move a file from one filesystem to another.

Temporary Files

You can create your own temp files by using the following functions:

    FILE *tmpfile(void)
 
This function returns a pointer to FILE for a stream that is a temp file, opened for reading and writing, which magically goes away when the stream is closed or your program is done running.  You never even know the file's name.  If the function fails it returns a NULL pointer of type (FILE *)0.

    char  *tmpnam(char *string)
 
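This function returns a filename in the tmp directory that is unique, or a NULL if there is an error.  If you pass NULL for string, the name is kept in an internal buffer that each additional call overwrites, so you must copy the name somewhere else if you need to know it after you open the file.  Note that tmpnam only gives you a name; it does not create or open the file.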

Stream Buffering

Normally a stream is block buffered, unless it is connected to a terminal, as stdin and stdout usually are.  In block buffered mode the stream reads ahead a set amount and then gives you the input that you ask for as you ask for it.  Sometimes you want this amount to be bigger or smaller to improve performance in some program.  The following four functions can be used to set the buffering type and the size of the buffers.  The defaults are normally pretty good so you shouldn't have to worry too much about these.

    void    setbuf( FILE *stream, char *buf);
    void setbuffer( FILE *stream, char *buf, size_t size);
    void setlinebuf( FILE *stream);
    int    setvbuf( FILE *stream, char *buf, int mode , size_t size);
 
Where mode is one of the following:
    _IONBF unbuffered, output sent as soon as received.
    _IOLBF line buffered, output sent as soon as a newline is received.
    _IOFBF fully buffered, output isn't sent until size characters are received.

setbuf is an alias for     setvbuf(stream,  buf,  buf  ?  _IOFBF   :   _IONBF, BUFSIZ);

setbuffer is an alias for  setvbuf(stream,  buf,  buf  ?  _IOFBF   :   _IONBF, size);

setlinebuf is an alias for setvbuf(stream, (char *)NULL, _IOLBF, 0);
 
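setvbuf sets the buffering mode for the given stream and, if buf is not NULL, makes the stream use buf, of size_t size bytes, as its buffer.  It returns 0 on success and a non-zero value on failure.  Call it after the stream is opened but before you do anything else with the stream.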

Stream Positioning

Once you open a stream you are located at a certain position depending on what mode you opened the stream in.  As you read or write, your position increases with each character.  You can see where you are in the stream and jump to any position in the stream.  If you are writing a database program you don't want to have to read and ignore a million characters to get to the record that you want; you want to be able to jump right to the record and start reading.

Note that terminals cannot have their stream repositioned; only files and devices that support seeking (like files on a hard drive) will allow this.

Also note that if you open a file for writing, use fseek to go out 10,000 bytes, write one character and then close the file, the file will report a length of 10,001 bytes but may take up much less space on the disk.  This is called a sparse file.  If you move a sparse file within a filesystem using the mv command it will not change size because such a mv is only a change to the directory structure, not the file.  If you cp or tar a sparse file it may expand out to its full size on disk, depending on the tool and its options.

    int   fseek( FILE *stream, long offset, int whence);
    long  ftell( FILE *stream);
    void rewind( FILE *stream);
    int fgetpos( FILE *stream, fpos_t *pos);
    int fsetpos( FILE *stream, const fpos_t *pos);
 
FILE *stream is a pointer to an already existing stream.

long offset is added to the position indicated by the whence.

int whence is SEEK_SET, SEEK_CUR, or SEEK_END depending on where you want the offset to be applied to:  the beginning, the current position or the end.

fpos_t *pos is a complex file position indicator.  On some systems you must use this to get and set stream positions.
 
If these functions are successful, fgetpos, fseek and fsetpos return 0, and ftell returns the current offset.  Otherwise, -1 is returned (the same value as EOF in the GNU library).  There is no return value from rewind.

fseek sets the file position in the stream to the value of offset plus the position indicated by whence: the beginning, the current position or the end of the file.  This is useful for reading along, adding something to the end of the stream and then going back to reading the stream where you left off.

ftell returns the current position of the stream.

rewind sets the current position to the beginning of the stream.  Notice that no error code is returned.  This function is assumed to always succeed.

fgetpos is used like ftell to return the position of the stream.  The position is returned in the pos variable which is of type fpos_t.

fsetpos is used like fseek in that it will set the current position of the stream to the value in pos.

On some systems you have to use fgetpos and fsetpos in order to reliably position your stream.

Error Codes

When any of the above functions return an error you can see what the error was and even get a text error message to display for the user.  There is a group of functions that deal with error values.  It is enough for now to be able to see that you have an error and stop.

However, if you write a nice GUI word processor you don't want the program to stop every time it can't open a file.  You want it to display the error message to the user and continue.  In a future article I will deal with the error code functions, or someone else can summarize them for us and send in an article and some commented source code to show us how it's done.

If anyone is interested the functions are: clearerr, feof, ferror, and fileno.

Conclusion

Well that's enough for this month.  I have learned a lot and I hope you have as well.  Most of this information is available through the man page system but the dates on these are 4 years old.  If anyone has updates on any of this information please send it to me and I will correct myself in further articles.

Next month I want to talk about input and output.  I will take the simple program above and add some functionality to it to add a column of numbers and output the results to standard out.  Hopefully this example program can grow into something useful.

Bibliography:

The ANSI C Programming Language, Second Edition, Brian W. Kernighan, Dennis M. Ritchie, Prentice Hall Software Series, 1988
The Standard C Library, P. J. Plauger, Prentice Hall P T R, 1992
The Standard C Library, Parts 1, 2, and 3, Chuck Allison, C/C++ Users Journal, January, February, March 1995
STDIO(3), BSD MANPAGE, Linux Programmer's Manual, 29 November 1993
Unidentified File Objects, Lisa Lees, Sys Admin, July/August 1995
A Conversation With William Kahan, Jack Woehr, Dr Dobb's Journal, November 1997
Java and Client-Server, Joe Novosel, Linux Journal, January 1997
