Monday, April 16, 2012

The Standard C Library for Linux, Part Two: character input/output

The last article was on file operations in the standard input/output library <stdio.h>.  This article is on reading and writing characters, strings and arrays to and from a stream.  I am assuming a knowledge of C programming on the part of the reader.  There is no guarantee of accuracy in any of this information nor suitability for any purpose.
As an example of character-based processing we will use a program that counts the characters, words and lines read from standard input and prints the results to standard output.  Any errors encountered will be printed to standard error.  This will be a weak version of wc.  (Type `man wc` for more information on this UNIX utility program.)

The code examples given for each function will typically not run unless the <angle bracketed> items are replaced with real code.  Normally these are things that have to be treated differently depending on what you are trying to do.  As always, if you see an error in my documentation please tell me and I will correct myself in a later document.

-----------------------------------------------------------------
 
#include <stdio.h>     /* include the proper headers */
#define  IN 1          /* looking inside a word */
#define OUT 0          /* looking at white space */
/* count the number of lines, words and characters in standard input */
int main(void) {
    int c,             /* holds the character returned by getchar */
        characters,    /* the number of characters */
        lines,         /* the number of lines */
        words,         /* the number of words */
        state;         /* are we currently in or out of a word */
    /* initialize the count and set the state to outside a word */
    state=OUT;
    characters = words = lines = 0;
    /* get one character at a time from standard in, until EOF */
    while ((c = getchar()) != EOF) {
        characters++;        /* increment the count of characters */
        switch(c) {
        case '\n' :
            lines++;         /* increment the count of lines */
            state = OUT;     /* new-line is white space, outside word */
            break;
        case ' ' :
            state = OUT;     /* space is white space, outside word */
            break;
        case '\t':
            state = OUT;     /* tab is white space, outside word */
            break;
        default :            /* otherwise we are in a word */
            if (state == OUT) {
                /* if state is still out and we are in a word */
                /* then we are at the first letter of the word */
                state = IN;      /* set the state to in */
                words++;         /* increment the count of words */
            }
            break;
        }
    }
    /*  print the results with a formatted print statement */
    printf("%d %d %d\n", characters, words, lines);
    return 0;
}
-----------------------------------------------------------------
 
To read a character from a stream,
    int   getchar (void);
    int   getc(FILE *stream);
    int  fgetc(FILE *stream);
    int ungetc(int c, FILE *stream);
void means the function takes no arguments.
FILE *stream is an already existing stream.
int c is a character to be pushed back onto the stream.
getchar, getc and fgetc return an int with the value of the next character from the stream.  If there are no more characters then the end-of-file indicator is set for the stream and the function returns EOF.  If there was a read error then the error indicator is set for the stream and the function returns EOF.  ungetc returns the character pushed back, or EOF if the push-back fails.
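
Since EOF is returned both at end-of-file and on a read error, the two cases have to be told apart afterwards with feof or ferror.  The following is only a minimal sketch of that idea; the helper name count_chars is hypothetical and not part of the standard library.

```c
#include <stdio.h>

/* Hypothetical helper: count the characters remaining on a stream.
 * Returns the count, or -1 if the loop ended because of a read error
 * rather than end-of-file. */
long count_chars(FILE *stream)
{
    long n = 0;
    int c;                      /* int, not char, so EOF stays distinct */

    while ((c = getc(stream)) != EOF) {
        n++;
    }
    if (ferror(stream)) {       /* EOF was caused by a read error */
        return -1;
    }
    return n;                   /* otherwise end-of-file was reached */
}
```

Calling count_chars(stdin) would give the character count that the wc example above keeps in its characters variable.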

getchar is used to read a single character from standard input.
    int c;
    while((c = getchar()) != EOF) {
         <while not the end of file read and process each character>
    }
 
getc is used to read a single character from a stream.
    int c;
    FILE *stream;
    if((stream = fopen ("filename", "r")) != (FILE *)0) {
       while((c = getc(stream)) != EOF) {
         <process each character>
       }
    } else {
        <do error handling>
    }
 
fgetc is used to read a single character from a stream.  getchar and getc behave like fgetc: getchar() is the same as fgetc(stdin), and getc(x) is the same as fgetc(x) except that getc may be implemented as a macro that evaluates its argument more than once.
    int c;
    FILE *stream;
    if((stream = fopen ("filename", "r")) != (FILE *)0) {
       while((c = fgetc(stream)) != EOF) {
         <process each character>
       }
    } else {
        <do error handling>
    }
 
ungetc is used to push a character back into the stream when you have read one character too many.  This is a common problem for compilers and pattern scanners.  It is possible to push back more than one character but this is not recommended as it is not portable.  An ungetc should follow a read and only push back a single character.
    int c;
    FILE *stream;
    if((stream = fopen ("filename", "r")) != (FILE *)0) {
       while((c = fgetc(stream)) != EOF) {
         <process each character>
         if (some_condition) {
            ungetc(c,stream);
            break;
         }
       }
    } else {
        <do error handling>
    }
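
To make the "one character too many" case concrete, here is a small sketch of a scanner that reads a run of decimal digits and pushes back the first character that is not a digit.  The helper name read_number is hypothetical, and the sketch assumes the number fits in an int (overflow is not checked).

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: read a run of decimal digits from the stream and
 * return its value.  The first non-digit character is pushed back so the
 * next read sees it.  Overflow is not checked. */
int read_number(FILE *stream)
{
    int c;
    int value = 0;

    while ((c = getc(stream)) != EOF && isdigit(c)) {
        value = value * 10 + (c - '0');
    }
    if (c != EOF) {
        ungetc(c, stream);      /* one character too many: put it back */
    }
    return value;
}
```

Only a single character is pushed back after each read, which is the portable usage described above.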
 
To write a character to a stream,
    int  putchar(int c);
    int  putc(int c, FILE *stream);
    int fputc(int c, FILE *stream);
FILE *stream is an already existing stream.
int c is the character to be written to the stream.
These functions return the character written upon success.  If a write error occurs the error indicator is set for the stream and the function returns EOF.
 
putchar writes a character to standard output.  putchar(x) is the same as fputc(x, stdout).
    putchar('x');
 
putc writes a character to the stream.  putc(x,y) is the same as fputc(x,y), except that putc may be implemented as a macro.
    int c;
    FILE *stream;
    c='x';
    if((stream = fopen ("filename", "w")) != (FILE *)0) {
        putc(c, stream);
    } else {
        <error handling>
    }
 
fputc writes a character to the stream.
    int c;
    FILE *stream;
    c='y';
    if((stream = fopen ("filename", "w")) != (FILE *)0) {
        fputc(c, stream);
    } else {
        <error handling>
    }
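
The character read and write functions are often paired.  As a sketch only, the following copies one stream to another a character at a time and checks the return value of putc; the helper name copy_stream is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: copy everything from in to out.
 * Returns 0 on success, -1 on a read or write error. */
int copy_stream(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF) {
            return -1;          /* write error */
        }
    }
    return ferror(in) ? -1 : 0; /* tell a read error from end-of-file */
}
```

copy_stream(stdin, stdout) would behave like a very simple cat.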
 
To read a string from a stream,
    char  *gets(char *s);
    char *fgets(char *s, int n, FILE *stream);
char *s is the string that will hold the result.
int n is the size of the string; at most n-1 characters are read.
FILE *stream is an already existing stream.
If the read is successful then s is returned.  If EOF is encountered and no characters have been read into the string then the string remains unchanged and a null pointer is returned.  If a read error occurs then the string contents are possibly changed in an undefined manner and a null pointer is returned.
gets reads from standard input into the string until a new-line character or end-of-file is reached.  Never use this function.  Use fgets instead.  There is no bounds checking to see if the returned string fits into the space allocated for it.  Many applications have had security holes in the past based on overwriting the end of a string.
fgets reads at most n-1 characters from the stream into the string, stopping after a new-line character (which is kept in the string) or at end-of-file, and then appends a terminating null character.
    char s[1024];
    FILE *stream;
    if((stream = fopen ("filename", "r")) != (FILE *)0) {
       while((fgets(s, sizeof(s), stream)) != (char *)0 ) {
         <process each line>
       }
    } else {
        <do fopen error handling>
    }
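
Because fgets keeps the new-line character when the whole line fits, line-processing code usually strips it first.  The following is a minimal sketch of one way to do that; the helper name read_line is hypothetical, and a line longer than the buffer is simply returned in pieces.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (of the given size) and
 * strip the trailing new-line, if any.
 * Returns 1 if a line was read, 0 at end-of-file or on a read error. */
int read_line(char *buf, int size, FILE *stream)
{
    if (fgets(buf, size, stream) == NULL) {
        return 0;
    }
    /* fgets keeps the '\n' when the whole line fit in the buffer */
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```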
 
To write a string to a stream,
    int  puts(const char *s);
    int fputs(const char *s, FILE *stream);
const char *s is the string to be written.
FILE *stream is an already existing stream.
Returns a non-negative value upon success.  Returns EOF on a write error.
puts writes the string pointed to by s to standard output (stdout) and appends a new-line to the end.  The terminating null character is not written to the stream.
    char s[1024];
    strcpy(s,"a typical string");    /* strcpy requires <string.h> */
    if(puts(s) == EOF ) {
        <handle error on write>
    }
 
fputs writes the string pointed to by s to the named stream.  The terminating null character is not written to the stream.
    char s[1024];
    FILE *stream;
    strcpy(s,"a typical string");
    if((stream = fopen ("filename", "w")) != (FILE *)0) {
       if(fputs(s, stream) == EOF ) {
         <handle error on write>
       }
    } else {
        <handle error on open>
    }
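
Unlike puts, fputs does not append a new-line, so code that writes one string per line has to add it.  A small sketch under that assumption follows; the helper name write_lines is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: write count strings to the stream, one per line.
 * fputs does not append a new-line, so one is added with putc.
 * Returns 0 on success, -1 on a write error. */
int write_lines(const char *lines[], int count, FILE *stream)
{
    int i;

    for (i = 0; i < count; i++) {
        if (fputs(lines[i], stream) == EOF || putc('\n', stream) == EOF) {
            return -1;
        }
    }
    return 0;
}
```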
 
To read/write between arrays and streams,
    size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
    size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
ptr is a pointer to the array (const for fwrite, which does not modify it).
size_t size is the size of each element of the array.
size_t nmemb is the number of elements to be processed.
FILE *stream is an already existing stream.
fread reads into the array pointed to by ptr no more than nmemb elements of size size from the stream.  The function returns the number of elements that were successfully read; this value can be less than what was requested if the function encounters a read failure or end-of-file.  A read failure leaves the element that failed in an undefined state.  If size or nmemb is zero then the function returns zero.
    int a[10];
    FILE *stream;
    if((stream = fopen ("filename", "r")) != (FILE *)0) {
        if (fread(a, sizeof(a[0]), 10, stream) < 10){
            <handle a read error>
        }
    } else {
        <handle a file open error>
    }
 
fwrite writes from the array pointed to by ptr no more than nmemb elements of size size to the stream.  The function returns the number of elements successfully written, which is less than nmemb only if a write error was encountered.
    int a[10];
    FILE *stream;
    if((stream = fopen ("filename", "w")) != (FILE *)0) {
        if (fwrite(a, sizeof(a[0]), 10, stream) < 10){
            <handle a write error>
        }
    } else {
        <handle a file open error>
    }
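
To show the element size and element count arguments working together, here is a rough sketch that writes an int array to a stream and reads it back.  The helper name round_trip is hypothetical, and the sketch assumes the stream is open for both reading and writing and is positioned at its start (for example, a fresh stream from tmpfile).

```c
#include <stdio.h>

/* Hypothetical helper: write n ints to the stream, rewind it, and read
 * them back into out.  Assumes the stream is open for update and was
 * positioned at its start.
 * Returns 0 on success, -1 on a short write or short read. */
int round_trip(const int *in, int *out, size_t n, FILE *stream)
{
    if (fwrite(in, sizeof in[0], n, stream) != n) {
        return -1;              /* fewer elements written than requested */
    }
    rewind(stream);             /* flush and return to the start */
    if (fread(out, sizeof out[0], n, stream) != n) {
        return -1;              /* short read: error or end-of-file */
    }
    return 0;
}
```

Note that size is the size of one element, not of the whole array, and the return values are compared against the element count.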
 


Bibliography:

The ANSI C Programming Language, Second Edition, Brian W. Kernighan, Dennis M. Ritchie, Prentice Hall Software Series, 1988
The Standard C Library, P. J. Plauger, Prentice Hall P T R, 1992
The Standard C Library, Parts 1, 2, and 3, Chuck Allison, C/C++ Users Journal, January, February, March 1995
STDIO(3), BSD MANPAGE, Linux Programmer's Manual, 29 November 1993
The Standard C Library for Linux, Part One, James M. Rogers, January 1998




Copyright © 1998, James Rogers
