Fall 2012

CSCI 551

Programming FAQ

(These are suggestions and not requirements.)

(Note: this page can change without notice!)

Quick index:

printf()
- how should I format binary data as hexstring?
Reading and writing data
Setting & testing bits
Parse string data
C++
stdin
gdb
- using the cat command doesn't work in gdb, what should I do?
- how do I set a conditional breakpoint?
UNIX (doesn't really belong here, but it's a convenient place to put it)

printf()

Q: I have an array of characters and I'm trying to print it out one character at a time as hexstring. I'm using printf("%x",my_char). If my buffer contains 0123456789abcdef in hex, I get 1234567ffffff89ffffffabffffffcdffffffef. What am I doing wrong?

The man page for printf() says that for "%x" the argument is an unsigned int. On most platforms a plain char is a signed character. When you pass a char to printf(), it is first promoted to an int, so if it's a negative number it gets sign-extended, and the result is then interpreted as an unsigned int. So, for 0x88, since it has its bit-7 on, it's a negative character and gets sign-extended to 0xffffff88.

I think if you just typecast it to (unsigned char), then it should be okay. So, do:

printf("%x", (unsigned char)your_ch);

But then 0x01 shows up as "1". So, you need to force it to print in two digits with a leading zero if it's single digit. For that, you should do:

printf("%02x", (unsigned char)your_ch);
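To format a whole buffer, the same cast and "%02x" format can be applied in a loop. Here is a minimal sketch; bytes_to_hex() is a made-up helper name (not a library function), and it assumes the caller supplies an output buffer with room for 2*len+1 characters.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Write len bytes from buf into out as lowercase hex, two digits per byte.
 * out must have room for 2*len + 1 characters (including the '\0').
 * bytes_to_hex is a hypothetical helper name, not a library function.
 */
static void bytes_to_hex(const char *buf, size_t len, char *out)
{
    size_t i;

    for (i = 0; i < len; i++) {
        /* the cast keeps bytes >= 0x80 from being sign-extended */
        sprintf(out + 2 * i, "%02x", (unsigned char)buf[i]);
    }
    out[2 * len] = '\0';   /* also covers the len == 0 case */
}
```

If you only need to print the bytes, you can call printf() with the same format inside the loop instead of building a string.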

Reading & Writing Data

Q: How do I read a line from a text file?

You can use fgets().

But what if the input is too long? When you call fgets(), you must give a buffer and its size. If the input line you are reading is actually longer than the buffer size, then it doesn't really read a complete line.

One way is to write a wrapper function to read a possibly very very long line. In this function, you call fgets() using a local buffer.

1. Set the next-to-last character in the buffer (i.e., buf[size-2]) to '\0' before you call fgets(). (fgets() always stores the terminating '\0' at or before buf[size-1], so buf[size-2] is the last position that can hold input data.)
2. Call fgets().
3. Check buf[size-2] and see if it's still '\0'.
   - If it's not '\0' and it's not '\n', then you have not finished reading a line yet. So, you allocate a new buffer and copy the data into this new buffer and go back to step (1) above. If there is already an allocated buffer, call realloc() to make it bigger.
   - Otherwise, you have the rest of the line. Append it to the allocated buffer and you are done. Return the data in the allocated buffer.
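Here is one way such a wrapper could look. This is only a sketch: read_long_line() is a made-up name, the 128-byte chunk size is arbitrary, and the sentinel is kept at index sizeof(chunk)-2 because fgets() always stores its terminating '\0' at or before the last index.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Read one complete line of any length from fp.
 * Returns a malloc()ed, null-terminated string (with the '\n', if any)
 * that the caller must free(), or NULL at end-of-file or on error.
 * read_long_line is a hypothetical name.
 */
static char *read_long_line(FILE *fp)
{
    char chunk[128];      /* local buffer handed to fgets() */
    char *line = NULL;    /* the line accumulated so far */
    size_t len = 0;

    for (;;) {
        size_t n;
        char *tmp;
        char last;

        chunk[sizeof(chunk) - 2] = '\0';       /* sentinel */
        if (fgets(chunk, sizeof(chunk), fp) == NULL) {
            break;        /* EOF or error: return what we have (maybe NULL) */
        }
        last = chunk[sizeof(chunk) - 2];
        n = strlen(chunk);
        tmp = realloc(line, len + n + 1);      /* realloc(NULL, ...) acts like malloc() */
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        memcpy(line + len, chunk, n + 1);      /* copy the '\0' as well */
        len += n;
        if (last == '\0' || last == '\n') {
            break;        /* chunk was not filled, or it ended exactly at '\n' */
        }
    }
    return line;
}
```

The caller would loop on read_long_line(fp) until it returns NULL and free() each line when done with it.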

Q: How do I read a byte from a binary file?

You can use fread().

If you don't know if you are reading a text file or a binary file, you should always assume that you are reading a binary file.

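Also, if you want to know how many bytes you have read, check the return value of fread(). (It is the number of items read, which is the same as the number of bytes if you ask for items of size 1.) Don't use strlen() because strlen() only works for null-terminated string data.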

Q:		How do I write a byte into a binary file?

A:		You can use `fwrite()`. You should always check the return value of `fwrite()` to see if it's the same as the number of items that you were writing (which is the number of bytes if the item size is 1). If not, maybe the filesystem is full.

Q:		How do I write a 32-bit word into a binary file?

A:		You can use `fwrite()`. For example, to write a word into a binary file `(FILE*)fp`, you can do:

    uint32_t word;
    fwrite(&word, sizeof(uint32_t), 1, fp);

You should check the return value and make sure that it is 1, i.e., that one 4-byte item was written.
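A small wrapper makes it hard to forget the check. The sketch below uses a made-up name, write_word(). Please note that the 4 bytes are written in whatever byte order your machine uses.

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Write one 32-bit word to fp in the host's byte order.
 * Returns 0 on success, -1 if fwrite() did not write the whole item.
 * write_word is a hypothetical name.
 */
static int write_word(FILE *fp, uint32_t word)
{
    /* one item of 4 bytes was requested, so success means a return value of 1 */
    if (fwrite(&word, sizeof(uint32_t), 1, fp) != 1) {
        return -1;
    }
    return 0;
}
```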

Q: How do I detect end-of-file when I read a file?

First of all, it's important to note that end-of-file (or EOF) is not a character, it's a condition on a file handle. The EOF condition is set when you attempt to read past the end of the corresponding file.

For example, you are reading one byte of data at a time from a file. When you just finished reading the last byte of a file, the EOF condition will not be set, yet. The next time you read from the file, read will return a failure code and the EOF condition will be set.

If the file handle you use is of the type (FILE*), you can use feof() to check if you have reached the end-of-file condition. But you don't have to call feof() to detect the end-of-file condition. If you use fread() to read data from the file handle, you should check the return value. fread() returns the number of items it actually read (it never returns a negative number). If that is less than the number of items you asked for, then you have either reached the end-of-file or run into a read error. (You can call feof() and ferror() to tell which. But in either case, you should not read any more.)

If the file handle you use is of the type int (i.e., it's a file descriptor), there is no function like feof() that you can use to check the end-of-file condition. If you use read() to read data from the file descriptor, you should check the return value. If it returns zero, then you have reached the end-of-file. If it returns -1, then there was a read error (check errno). In either case, you should not read any more.
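For a (FILE*), a read loop that stops on a short count from fread() could look like the following sketch. count_bytes() is a made-up name and the 4096-byte buffer size is arbitrary.

```c
#include <stdio.h>

/*
 * Count the bytes in fp by reading until fread() returns 0.
 * Returns the byte count, or -1 if the loop stopped because of a read error.
 * count_bytes is a hypothetical name.
 */
static long count_bytes(FILE *fp)
{
    unsigned char buf[4096];
    long total = 0;
    size_t n;

    /* with an item size of 1, fread() returns the number of bytes read */
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0) {
        total += (long)n;   /* use n, never strlen(), for binary data */
    }
    if (ferror(fp)) {
        return -1;          /* stopped because of an error, not EOF */
    }
    return total;           /* the EOF condition is set at this point */
}
```

Note that feof() is never called inside the loop; the return value of fread() is what ends it.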

Setting & Testing Bits

Q:		How can I set a bit in a byte?

A:		Bit 0 of a byte is the least significant bit in a byte. To set a bit, you need to prepare a mask and perform a bitwise-OR of the mask with the input byte. To create a mask, you start with `0x01` and you left shift the right number of bits.

Q:		How can I clear a bit in a byte?

A:		Bit 0 of a byte is the least significant bit in a byte. To clear a bit, you need to prepare a mask and perform a bitwise-AND of the bit complement of the mask with the input byte. To create a mask, you start with `0x01` and you left shift the right number of bits.

Q:		How do I check if a bit within a byte is set or not?

A:		Again, bit 0 of a byte is the least significant bit in a byte. You first create a mask by left-shifting `0x01` by the desired number of positions. Then you perform a bitwise-AND of the mask with the input byte. If the result is the same as the mask, then the corresponding bit is set. If the result is zero, then the corresponding bit is not set.
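The three operations above can be sketched as follows. The function names are made up for illustration, and n is assumed to be between 0 and 7.

```c
/* All three helpers assume 0 <= n <= 7; the names are hypothetical. */

/* Return byte with bit n set (bitwise-OR with the mask). */
static unsigned char set_bit(unsigned char byte, int n)
{
    return (unsigned char)(byte | (0x01 << n));
}

/* Return byte with bit n cleared (bitwise-AND with the complement of the mask). */
static unsigned char clear_bit(unsigned char byte, int n)
{
    return (unsigned char)(byte & ~(0x01 << n));
}

/* Return 1 if bit n of byte is set, 0 otherwise. */
static int test_bit(unsigned char byte, int n)
{
    return (byte & (0x01 << n)) != 0;
}
```

For example, set_bit(0x00, 3) gives 0x08 and clear_bit(0xff, 0) gives 0xfe.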

Parsing Data

Q:		How can I parse a `key=value` string?

A:		You can use `strchr()` to locate the `=` character. Let's say that you start with a pointer that points to the beginning of the buffer. Let's call this `key_ptr`. Then you call `strchr()` to locate the `=` character and use `value_ptr` to point to the `=` character. If `value_ptr` is not NULL, you can replace the `=` character with a null ('\0') character and advance `value_ptr`. Now `key_ptr` points to the key and `value_ptr` points to the value. It's probably a good idea to call `trim()` to remove leading and trailing whitespace characters in `key_ptr` and `value_ptr`.
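Here is a minimal sketch of the splitting step only (trimming is left out). split_key_value() is a made-up name. The buffer is modified in place, so it must be writable (i.e., not a string literal).

```c
#include <string.h>

/*
 * Split "key=value" in place at the first '=' character.
 * On success, *key_ptr points to the key, *value_ptr points to the value,
 * and 0 is returned.  Returns -1 if there is no '=' in buf.
 * split_key_value is a hypothetical name.
 */
static int split_key_value(char *buf, char **key_ptr, char **value_ptr)
{
    char *eq = strchr(buf, '=');

    if (eq == NULL) {
        return -1;
    }
    *eq = '\0';            /* terminate the key */
    *key_ptr = buf;
    *value_ptr = eq + 1;   /* the value starts right after the '=' */
    return 0;
}
```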

Q: How can I remove leading and trailing whitespace characters from a string?

The standard C library does not have a trim() function. If your system has one, you can just call it. Otherwise, you can implement your own to remove leading and trailing whitespace characters from a string.

You first start from the last character of the buffer. If it's a whitespace character, you replace it with a null ('\0') character and keep moving toward the front. When you see the first non-whitespace character, you stop. At each step, you should check if you are at the beginning of the buffer just in case there are no non-whitespace characters in the string at all.

After you have removed all the trailing whitespace characters, you start from the beginning of the buffer and look for the first non-whitespace character. If this is the first character, then you are done. Otherwise, you copy the characters, one by one, from the current position to the beginning of the buffer until you have copied a null character.
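The two passes described above can be sketched like this. trim_in_place() is a made-up name, and memmove() is used in place of the character-by-character copy (it is safe for overlapping regions).

```c
#include <ctype.h>
#include <string.h>

/*
 * Remove leading and trailing whitespace from s in place and return s.
 * trim_in_place is a hypothetical name.
 */
static char *trim_in_place(char *s)
{
    size_t len = strlen(s);
    size_t start = 0;

    /* pass 1: chop trailing whitespace, stopping at the front of the buffer */
    while (len > 0 && isspace((unsigned char)s[len - 1])) {
        s[--len] = '\0';
    }
    /* pass 2: find the first non-whitespace character ('\0' ends the loop) */
    while (isspace((unsigned char)s[start])) {
        start++;
    }
    /* shift the remaining characters, including the '\0', to the front */
    if (start > 0) {
        memmove(s, s + start, len - start + 1);
    }
    return s;
}
```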

Q: How do I parse commandline arguments?

Usually, UNIX commandline syntax looks like:

    program_name Arg1 Arg2 ... [optional stuff] ... Arg_N_minus_2 Arg_N_minus_1

All the information about the Args and the optional stuff is passed into your program in argv and argc. So, main() should be declared as:

    int main(int argc, char *argv[])
    {
        ...
        return 0;
    }

argc is the N mentioned in the commandline example above. So, your program should expect the following:

argv[1] maps to Arg1
argv[2] maps to Arg2
...
argv[argc-2] maps to Arg_N_minus_2
argv[argc-1] maps to Arg_N_minus_1

The ... part may contain additional required arguments and optional commandline arguments. If you run into an optional commandline argument (i.e., argv[i][0] == '-'), you need to first determine which optional commandline argument it is (since they can come in any order) and then you need to get the value for that option, if applicable. Getting the value is easy; just access argv[++i] (but first you have to check that i is not already argc-1).

Let's take the following as an example:

    program {fee|fie|foe} [-o offset] [-m] hostname:port string

You can write a commandline parser from scratch with something like the following:

    void Usage()
    {
        /* print out usage information, i.e. commandline syntax */
        exit(1);
    }

    int ParseCommandLine(int argc, char *argv[])
    {
        argc--, argv++; /* skip the original argv[0] */
        if (argc <= 0) {
            /* ... print specific error */
            Usage();
        }
        if (strcmp(*argv, "fee") == 0) {
            msg_type = ADR_RQST;
        } else if (strcmp(*argv, "fie") == 0) {
            msg_type = FSZ_RQST;
        } else if (strcmp(*argv, "foe") == 0) {
            msg_type = GET_RQST;
        } else {
            /* ... print specific error */
            Usage();
        }
        for (argc--, argv++; argc > 0; argc--, argv++) {
            if (*argv[0] == '-') {
                if (strcmp(*argv, "-m") == 0) {
                   /* set a global flag */
                } else if (strcmp(*argv, "-o") == 0) {
                    argc--, argv++; /* move past "-o" */
                    if (argc <= 0) {
                        /* ... print specific error */
                        Usage();
                    }
                    /* read offset from *argv */
                }
            } else {
                /*
                 * must be "hostname:port" followed by "string"
                 *
                 * the idea here is that you know that you
                 *     are looking for two things
                 * so, the first time you get here, you
                 *     should copy *argv into what holds
                 *     "hostname:port"
                 * the second time you get here, you
                 *     should copy *argv into what holds
                 *     "string"
                 */
            }
        }
        return 0;
    }

If you want to use the above structure, make sure you read it carefully and understand what's going on so you can adapt it to what you need. If it's not doing what you want, go into the debugger and find out why.

C++

Q:		How come there is nothing on C++ here?

A:		C is very nearly a subset of C++. Therefore, from C++, you can call any C library functions. C++ stream I/O functions can be confusing to use. If you are not 100% sure, you might as well just stick to the C subset. Once you are familiar with the C I/O functions, you can make them do exactly what you want and you can use them in either C or C++! Regarding bit/byte data manipulation, there is really not much difference between C and C++.

Q: I'm having trouble using the same function to read from either an ifstream or cin. I keep getting all kinds of compiler errors. Is there a way to do it?

Here is a sample C++ program (must be compiled with g++ version 4.2.1 or above) with the following synopsis:

    ./a.out [file]

A function called Process() is used to read lines (ASCII text assumed) from the input (either file or stdin) and simply print them out. If file is specified, an ifstream will be used to open the file and pass the ifstream to Process(). Otherwise, cin will be passed to Process(). The trick is to use istream as the formal parameter to Process(). This works because ifstream is derived from istream and cin is an istream object. You cannot do pass-by-value because stream objects cannot be copied. So, you must either do pass-by-pointer or pass-by-reference. Pass-by-reference is used here because that's how we are supposed to use C++. By the way, this is just some sample code and not meant to be complete. You should do more error checking.

    #include <iostream>
    #include <fstream>
    #include <string>
 
    using namespace std;
 
    static
    void Process(istream& in)
    {
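        string buf;
        /*
         * getline() returns the stream, which tests false once a read fails;
         * testing eof() instead would drop a last line that has no '\n'
         */
        while (getline(in, buf)) {
            cout << buf << endl;
        }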
    }
 
    int main(int argc, char *argv[])
    {
        bool reading_from_file=false;
 
        if (argc > 1) {
            reading_from_file = true;
        }
        ifstream in;
 
        if (reading_from_file) {
            in.open(argv[1]);
            if (in.fail()) {
                cerr << "Cannot open " << argv[1] << " for reading." << endl;
                return 1; /* non-zero exit status to indicate failure */
            }
            Process(in);
            in.close();
        } else {
            Process(cin);
        }
        return 0;
    }

Q: You gave the code for C++, how about some C code for doing the same thing?

Alright then. (Again, this is just some sample code and not meant to be complete. You should do more error checking.)

    #include <stdio.h>
 
    static
    void Process(FILE *fp)
    {
        char buf[1024];

        while(fgets(buf, sizeof(buf), fp) != NULL) {
            /* buf may contain '\n' */
            printf("%s", buf);
        }
    }
 
    int main(int argc, char *argv[])
    {
        int reading_from_file=0;
 
        if (argc > 1) {
            reading_from_file = 1;
        }
        FILE *fp=NULL;
 
        if (reading_from_file) {
            fp = fopen(argv[1], "r");
            if (fp == NULL) {
                fprintf(stderr, "Cannot open %s for reading.\n", argv[1]);
                return 1; /* non-zero exit status to indicate failure */
            }
            Process(fp);
            fclose(fp);
        } else {
            Process(stdin);
        }
        return 0;
    }

stdin

Q: Does stdin mean reading from the keyboard?

Technically, stdin simply means "file descriptor 0". Usually, you launch a program from your Unix login shell. Then it's the job of the login shell to determine what stdin is mapped to. By default, your login shell maps stdin to keyboard input. But this behavior can be changed, and this is called I/O redirection. Let's say that your program is prog and you want the content of /tmp/xyz.txt to appear as your stdin, you can do:

    prog < /tmp/xyz.txt

The redirection is a trick performed by your login shell (prog itself just reads from stdin). Another way to do the same thing is to use the cat program and the Unix pipe:

    cat /tmp/xyz.txt | prog

The vertical bar character pipes stdout (i.e., "file descriptor 1") of the program to the left of it (in this case, it would be the content of /tmp/xyz.txt) into stdin of the program to the right of it.

Please note that prog will read exactly the same data in both commands if prog is programmed correctly! Basically, if you are using a Unix pipe, functions like rewind(), fseek(), and lseek() are not guaranteed to work, so a correct program must not depend on them when it reads from stdin.

Q: What is the difference between using file descriptors and (FILE*)?

A file descriptor is an integer. It's an index into a file descriptor table that the kernel maintains for the corresponding user process. To use a file descriptor, you need to make system calls (such as open(), read(), write(), and close()) and these system calls are not too "application programmer friendly".

A file pointer, which is of the type (FILE*), is a "wrapper" around a file descriptor. Lots of functions are built around file pointers to make it easier to deal with files.

When a user process is started, file descriptors 0, 1, and 2 are opened by default. Unless they are redirected, file descriptor 0 is associated with the keyboard, and file descriptors 1 and 2 are associated with the display. The corresponding file pointers are also set up by default. The file pointers that are associated with file descriptors 0, 1, and 2 are known as stdin, stdout, and stderr, respectively.

Similarly, in C++, cin, cout, and cerr are wrappers around file descriptors 0, 1, and 2, respectively.

Q:		How do I terminate the input stream if I'm reading from `stdin`?

A:		At the beginning of a line, type <Ctrl+D> on your keyboard (i.e., hold down the <Control> key and press the D key on your keyboard). It is important to understand that <Ctrl+D> signifies the end-of-input. It does not get turned into a character in the input stream for your program to detect and process. Therefore, your program needs to "detect" the end-of-file condition and not "look for the <Ctrl+D> character".

gdb

Q: The script in the grading guidelines pipes the output of the cat command to my program. Under gdb, it doesn't work. How can I debug this?

As mentioned above,

    cat ~csci570b/xyz/file | prog

is the same as:

    prog < ~csci570b/xyz/file

Therefore, in gdb, you can just do:

    run < ~csci570b/xyz/file

But, as mentioned above, they are only equivalent if your program reads the input correctly.

As a last resort, please check out this solution.

Q: How do I set a conditional breakpoint?

Just append an if expression to a regular breakpoint command. For example, if you would normally do:

    break foo.c:123

You can have gdb only break at line 123 of foo.c if x (any variable visible at line 123 of foo.c) is equal to 98765 by doing the following:

    break foo.c:123 if (x == 98765)

Pretty much any if expression would also work. For example, if there is a C-string called s that's visible at line 123 of foo.c, you can do something like:

    break foo.c:123 if (strcmp(s,"abc")==0)

But please note that you need to make sure that s can never be NULL at the time you specify the above conditional breakpoint. Otherwise, when gdb tries to evaluate strcmp(s,"abc"), your program will crash because it is dereferencing a NULL pointer.

UNIX

Q:		What's a good Unix tutorial?

A:		If you are not familiar with Unix, please read Unix for the Beginning Mage, a tutorial written by Joe Topjian.

Q: How do I read a csh/tcsh script (in the grading guidelines)?

For example, if you see the following in the grading guidelines:

        set srcdir=~csci531/public/cs531/hw1
        /bin/rm -f f?.hex f??.hex
        foreach f (0 1 2 3 4 5 6 7 8)
            echo "===> $srcdir/f$f"
            ./hw1 hexdump $srcdir/f$f > f$f.hex
            diff $srcdir/f$f.hex f$f.hex
        end

These are commands you can type into a csh/tcsh login shell terminal. The login shell reads the data you type into it line by line. The first thing in a line is a command (such as "set", "foreach", "echo", and "end" above) for the login shell. If it's not a command that the login shell understands, then it is treated as a program that you want the login shell to run (such as "/bin/rm", "./hw1", and "diff" above). Please do "man rm" and "man diff" to see what these programs do.

In each iteration of the "foreach" loop, $f gets set to one of the items in the specified list. In this case, in the first iteration through the "foreach" loop, $f will evaluate to "0". In the 2nd iteration, $f will evaluate to "1", and so on.

The "echo" command just prints the string given to the "echo" command followed by a '\n' (i.e., linefeed) character.

Q:		I write my code on my PC laptop. How do I get all my files to `nunki.usc.edu`?

A:		(Thanks to Alhad Rajurwar for providing some of the information here.) There is a bunch of software you can download for free at ITS. Some of it can be used for this. Here are the steps:

1. Go to http://software.usc.edu/, select your OS, and download and install FileZilla and PuTTY.

2. Transfer files from your PC to `nunki.usc.edu`.
   - Read the online tutorial for FileZilla.
   - Start FileZilla: enter `nunki.usc.edu` as host, your USC user name and password, and 22 as port, and click on Quickconnect. If this is the first time you connect to `nunki` using FileZilla, you will get a popup box asking if you want to trust this host. You should click Yes and check the "always trust this host, add this key to cache" checkbox. For additional features of FileZilla, please read the FileZilla Quick Guide.
   - Transfer files by dragging and dropping. Please note that FileZilla has its quirks; you need to know what they are to have the files between your PC and `nunki` stay in sync.

3. Ssh to `nunki.usc.edu`.
   - Start PuTTY: enter `nunki.usc.edu` as hostname, 22 as port, and click on Open. If this is the first time you use PuTTY to connect to `nunki`, you will get a popup box asking if you want to trust this host. You should click on Yes (and the host key will be cached).
   - Enter your user ID and press the <Enter> key on your keyboard.
   - Enter your password and press the <Enter> key on your keyboard.
   - Read the output on the screen. You are logged into `nunki` now.

The program you are interacting with on `nunki` is called a UNIX Shell (or a Login Shell). Your default login shell is usually tcsh. The Login Shell has a commandline interface. It shows you a command prompt to indicate that it's ready to accept a command from you.

Your "current working directory" is your "home directory" by default. To see the content of your current working directory, you can use the "ls" command. To change your current working directory, you can use the "cd" command. If the leading characters of a commandline argument are "~/", it is interpreted as a reference to your home directory.

If your programming assignment is in ~/cs###/hw1, you can use the following commands to change your current working directory to it, run "make", and run your "hw1" executable:

    cd ~/cs###/hw1
    make hw1
    ./hw1

If you created your Makefile on your PC, "make" will probably fail because of the extra <CR> character added at the end of every line. To remove them, you can do:

    mv Makefile Makefile.dos
    dos2unix Makefile.dos > Makefile

If the program you are running is an X11-based program, you need to run an X11 server on your PC. You need to download and install X-Win32 from ITS Software.

Q: How do I reset my login scripts on nunki.usc.edu?

When you login to nunki.usc.edu, if things don't look right or if basic commands such as gcc won't work, it's probably because your login script is all messed up. Fortunately, it's quite easy to reset them. The following procedure is for the case where your login shell is csh/tcsh. If your login shell is sh/bash, then you are a Unix expert and you should know how to convert the steps below for your environment.

First, make a backup of your current login scripts. I would use a date to remember when I performed these operations. Assume that today's date is "012914". You should use the actual date when you run these commands.

    cd ~
    cp .login .login.012914
    cp .cshrc .cshrc.012914

Then copy the "skeleton" login scripts into your home directory:

    cd ~
    rm .login
    rm .cshrc
    cp /usr/usc/skel/.login .login
    cp /usr/usc/skel/.cshrc .cshrc

When you logout and log back in, you will be running the "skeleton" login scripts. Any changes you have made to the original login scripts will not be applied. But you can look for them in the backup copies. To see the differences between your new login file and your previous login file, do:

    cd ~
    diff -C 2 .login .login.012914
    diff -C 2 .cshrc .cshrc.012914

Q: What should I do if even after I reset my login scripts on nunki.usc.edu, I still cannot run gcc?

This really shouldn't happen! So, please try resetting your login scripts again, log out and log back in, and type gcc to make sure that it's still not working.

If all else fails, run the following command (please copy and paste from this web page and DON'T enter the command manually):

    echo 'source /usr/usc/gnu/gcc/default/setup.csh' >> ~/.login

Then logout and log back in. And if this still won't work, it means that something is really messed up in your nunki account! Then do the following:

    echo 'set path=(/usr/usc/bin $path)' >> ~/.login

Then logout and log back in.