USC CSD Home


CS 530 - Homework #4

OpenSSL and Cryptographic Hash Functions

 
Assignment
This assignment is to get familiar with the OpenSSL package, some popular cryptographic hash functions, and a modified RC4 stream cipher. Your code should link to the openssl library to perform hashing functions.

You will also need to implement extensions to these hash functions through specific External Message Expansion algorithms described in [Cheng08b].

Electronic submissions only.

 
Compiling
Please use a Makefile so that when the grader simply enters:
    make hw4
an executable named hw4 is created. (Minor variation on the make command is allowed.)

To use openssl on nunki.usc.edu, please see additional notes on openssl.

 
Commandline Syntax & Program Output
The commandline syntax for hw4 is as follows:
    hw4 arc4 {-states|-l len} [-s offset] [file]
    hw4 sa [-s offset] [file]
    hw4 xsa [-s offset] [file]
    hw4 md5 [{-sa|-xsa}] [-s offset] [file]
    hw4 sha1 [{-sa|-xsa}] [-s offset] [file]
Square bracketed items are optional. You must follow the UNIX convention that optional commandline options can come in any order. If file is not specified, your program should read from stdin. Unless otherwise specified, output of your program must go to stdout and error messages must go to stderr.

The meaning of the commands are:

arc4   :   Simulate the modified ARC4 (RC4) stream cipher.

If the -states commandline option is specified, please output the 256 internal states sequentially at the end of the key scheduling algorithm. If the -l commandline option is specified, len is the number of bytes to output after key scheduling algorithm has run to its completion. The file commandline argument (or data from stdin) contains the key for the key scheduling algorithm. If the length of the input data (file or stdin) is greater than 256 bytes, only the first 256 bytes should be used as the key.

 
sa   :   Expand the input data (file or stdin) using SA-strengthening.
 
xsa   :   Expand the input data (file or stdin) using XSA-strengthening.
 
md5   :   Produce a MD5 checksum of file (or stdin).

If the -sa or -xsa commandline option is specified, please apply SA-strengthening or XSA-strengthening, respectively, to expand the input data before hashing with MD5.

 
sha1   :   Produce a SHA-1 hash of file (or stdin).

If the -sa or -xsa commandline option is specified, please apply SA-strengthening or XSA-strengthening, respectively, to expand the input data before hashing with SHA-1.

For all commands, if the -s commandline is specified you should skip offset number of bytes from the beginning of the input data (file or stdin).

The output for various commands are as follows.

arc4   :   If the -states commandline option is specified, the output is a binary stream of bytes of length 256. If the -l commandline option is specified, the output is a binary stream of bytes of length len.
 
sa   :   The output is a binary stream of bytes which is expanded from the input data (file or stdin) using SA-strengthening.
 
xsa   :   The output is a binary stream of bytes which is expanded from the input data (file or stdin) using XSA-strengthening.
 
md5   :   The output should be a hexstring of length 32. The left most 2 character should correspond to byte 0 of the output buffer. Although formatting is different, you can check your result against the result of running the following at a command prompt:
    openssl md5 file
 
sha1   :   The output should be a hexstring of length 40. The left most 2 character should correspond to byte 0 of the output buffer. Although formatting is different, you can check your result against the result of running the following at a command prompt:
    openssl sha1 file
Pleaes output reasonable and useful error messages if the command is malformed or file does not exist, inaccessible, or corrupted. (You should do better than openssl.)
 
External Message Expansion
External Message Expansion is a deterministic way of modifying an input data stream by inserting data blocks into the data stream using a specified algorithm. When used with a cryptographic hash function, an input message is first expanded before feeding it into the hash function. The idea is that the actual message that got fed into the hash function has more structure, and therefore, it may be more difficult to launch a collision attack on the hash function.

Below we describe two specific algorithms described in [Cheng08b]. Please note that the description below is not meant to be complete. You should read [Cheng08b] for details.

SA-strengthening

Section II-A of [Cheng08b] described a way to derive a so-called ARC4 hash out of the well-known ARC4 (RC4) stream cipher. Using the notation in [Cheng08b], we summarize the algorithm here.
If length of the input message is strickly less than 256 bytes:
    ( 1)  for i from 0 to 255
    ( 2a)     S[i] := sbox[i]
    ( 3)  j := 0
    ( 4)  for i from 0 to 255 do
    ( 5a)     j := (j + S[i] + input[i mod n]) mod 256
    ( 6)      swap(S[i],S[j])
    ( 7)  end for
If length of the input message is greater or equal to 256 bytes:
    ( 1)  for i from 0 to 255
    ( 2a)     S[i] := sbox[i]
    ( 3)  j := 0
    ( 4a) for x from 0 to n-1 do
    ( 4b)     i := x mod 256
    ( 5b)     j := (j + S[i] + input[x]) mod 256
    ( 6)      swap(S[i],S[j])
    ( 7)  end for
Section II-B of [Cheng08b] described a way to shrink an array of data bytes which can be applied to the ARC4 hash.

Section II-C of [Cheng08b] defined SA-strengthening for a cryptographic hash function h() for an input message m as follows:

    h(msr(m)||sah(m))
where msr(m) is a message derived from m that satisfies the message self-repeat requirement and sah(m) is the Shrunken ARC4 Hash of m. For the sa command, you basically need to output msr(m)||sah(m). For the md5 and sha1 command, you basically need to output h(msr(m)||sah(m)) where h() is either the MD5 or SHA1 hash algorithm.

XSA-strengthening

Section III of [Cheng08b] described further enhancement to SA-strengthening. For every 256 bytes of the original input message, 8 or fewer noise bytes (4 bytes on the average) are inserted into the data stream, based on the state of the modified ARC4 cipher. The noise bytes are obtained by using a modified ARC4 output algorithm to "clock out" 16 bytes as follows (with r being 16):
    (10a) for i from 0 to (r-1) do
    (11)      u := (u + 1) mod 256
    (12)      v := (v + S[u]) mod 256
    (13)      swap(S[u],S[v]) 
    (14)      output S[(S[u] + S[v]) mod 256]
    (15a) end for
For the first 256 bytes of the input data, additional noise bytes are generated and shrinked and inserted into the input stream. According to section III-A of [Cheng08b], these additional noise bytes are generated as follows.

For k=1 through 8, after 2k bytes of the input data has been fed to the ARC4 key scheduling algorithm, you should run the ARC4 output algorithm 2*k times to generate 2*k noise bytes. Then you apply the shrink() operator to shrink the noise bytes and you insert the shrunken noise bytes into the expanded input.

After a total of 256 bytes of the input data has been fed to the ARC4 key scheduling algorithm, for every 256 bytes of input, you clock out 16 noise bytes and add the shrunken version of these bytes to the expanded input. When all data is processed, you append the SA hash.

 
Sample Output
Please note that the hexstring output are formatted for readability. Extra space characters and linebreaks have been inserted.

hw4 arc4 -states usctommy.gif

The first 256 bytes of usctommy.gif in hexstring representation is:
    4749463839614b006e00840000bf6060
    f9efef9f1010b95050990000dba5a5f2
    dfdfb34040d28f8feccfcfcc7f7fffba
    10c67070ac3030a62020ffffffe5bfbf
    00000000000000000000000000000000
    00000000000000000000000000000000
    0000000000000000000000000021f904
    0100000f002c000000004b006e000005
    fee0238ee3629e68aaae6ceba2644cbe
    746ddfb278127cefffc0a07048f4ad64
    c5a472c9848d98d0a8b4777a4eaf5861
    f591ed7a7926d1777cad0673e8b47acd
    0eee800bb67c4e27b917f0787dcfcfe5
    f37d81817f3f6182877484467a883109
    0507260e2907000501738a548c8d0105
    0337070c106b9a609c820101028f073c
The above is used as the key array in the key scheduling algorithm.

The output of the above command is posted here. Its hexstring representation is:

    803ce3b89df66f1204bf762a3fb51802
    bdb27dae669775a63bdde58cc517ad45
    5435851ba0e84f552f247e0fd2de4007
    ef5efb90e0d79ba584e2cf92380ecb29
    a7c7674e4781fe3df4824c6c469956d4
    1eedd1507f58a320897af28dbb2839d8
    611a6aa96ee6ba44cae4035165a8ee15
    8facdad3f82c1f70c80a19ceb774103e
    fdebd0ea8326064bbe4336b3df49a2db
    6933c3e9e1f0d9f959629e8e37413071
    9172f348c2776bafd61d2d7896b1520d
    93ab95cc3af19409319ac1e74d5b4208
    98a1d5c46d0bfc23c97c57888a872e14
    53bc340186aa7b32c673211cb911b0b6
    a40c684af7602b79052264255a2763ec
    cd9fff139c5ddcfa165c00c08bb45ff5
For your debugging pleasure, the internal state of RC4 at the end of each iteration of the key scheduling algorithm is provided.
hw4 arc4 -l 16 usctommy.gif
The output of the above command is posted here. Its hexstring representation is:
    f82f64f05c68a5cdb275fdbec5a05b9e
hw4 sa null.bin
hw4 xsa null.bin
For an empty input file, its shrunken ARC4 hash is posted here. It is 60 bytes long and its hexstring representation is:
    6bc501d7d4ccf1d8c7121a6ea0b384d1ed4ceffbf9a39db6ffd2ec97175ddc2a
    88db065c9579c8ea7a78a6c6dd1fbd8a66030eb911d99455df0d5416
hw4 sa testvec.bin
testvec.bin is a 16-byte long binary file with the following hexstring representation:
    000102030405060708090a0b0c0d0e0f
The output of the above command is posted here. Its hexstring representation is:
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    3f89e5827ccf1fcaf65acbad53914521
    5d05f078b133bd206039ea7554c51a3e
    d30b70a7307b809641460f37a6edac35
    6ab62b6336717215119b657d
hw4 xsa testvec.bin
The output of the above command is posted here. Its hexstring representation is (with lots of space and linebreak added):
    0001
    0203  e0d5
    04050607  03
    08090a0b0c0d0e0f  fa643697
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f  0c
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f  d83eef6a
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f
    000102030405060708090a0b0c0d0e0f  e254
    6608deeb510ca3af7c7912a4435b5e95564154307ed2b88a
    24bcfbcc5990c08211672e196c3e01ff776a87982abbd337
    5f4c4ecdbd768063736de948b538ba
Below are the noise bytes and their corresponding shrunken noise bytes (which are already listed above) for various values of k (please see section III-A of the paper for details about what k is):
    k    noise bytes                         shrunken noise bytes
    -------------------------------------------------------------
    1    35e1                                (none)
    2    08e0cad5                            e0d5
    3    c46793162903                        03
    4    bdfa2b641436a697                    fa643697
    5    6d58cf40933db4f54402                (none)
    6    7ef3a62a00e3693eea0c2913            0c
    7    d84e98d8333e4a09ebb394efde6a        d83eef6a
    8    bc444d8a39d98ee735b301e220727a54    e254
hw4 md5 -sa testvec.bin
The hexstring representation of output of the above command is:
    470debadfd0a26212dc806939b79b558
hw4 md5 -xsa testvec.bin
The hexstring representation of output of the above command is:
    a08d2d01a8a9e6e46fc8709283846b83
hw4 sha1 -sa testvec.bin
The hexstring representation of output of the above command is:
    bb3e476e73abbbb4834b15d0ecfd814b9ba67c2a
hw4 sha1 -xsa testvec.bin
The hexstring representation of output of the above command is:
    c120c4b6bb057558f474c07c9fbd1566e1908f46
 
Grading Guidelines
The grading guidelines has been made available. Please run the scripts in the guidelines on nunki.usc.edu. It is possible that there are bugs in the guidelines. If you find bugs, please let the instructor know as soon as possible.

Please note that although the grader will follow the grading guidelines to grade, the grader may use a different set of data files.

 
Miscellaneous Requirements and Hints
  • You must call MD5 and SHA-1 related functions in the OpenSSL library directly. You will lose a lot of points if you invoke the openssl program using popen(), fork your process and call one of the exec system calls, or call system(). Also, you will not receive credit if you use public source code that implements these functions.

  • Good hash function implementation should be one-pass where the input data is basically only looked at once and a large buffer is not used. A file is a form of a large external buffer. Therefore, you should not use an external file to store input data and read the data back from the file.

  • If the size of the input file is large, you must not read the whole file into a large memory buffer and then process the file data. The maximum size of a memory buffer is limited to 4,096 bytes.

  • It's important that every byte of your data is read and written correctly. You will lose a lot of points if one byte of data is generated incorrectly! The grading of this assignment will be harsh.

  • Please follow the UNIX convention that, when your output is an ASCII file (such as the output of the md5 and sha1 commands), append '\n' in the last line of the output if it's not a blank line. (This way, you don't get the commandline prompt appearing at the wrong place on the screen.)

  • String I/O functions such as fgets(), scanf(), and printf() are really meant for inputing/outputing strings. Do not use them to input/output binary data!

  • The Solaris workstations in the ISD lab in SAL have the same setup as nunki.usc.edu. So, if you are logged on to one of these workstations, please do your development locally and not to overload nunki unnecessarily.

  • Start working on this early! Please don't complain to the instructor that this assignment is too tedious or it takes too much work just to parse the commandline. Get it done early and get it done right!

  • There is quite a lot of information on the OpenSSL documentation site. Please do not ask the TA or the instructor what these functions do and where to put them. Read the documentation. Search the web. Try things out! (If you are really stuck after you have tried quite a few different things, then you may seek help from the TA or the instructor.)

    If you have trouble finding the documentation on some openssl functions, you can look for their man pages on nunki.usc.edu after you have setup the MANPATH environment variable to include openssl man pages. For example, you can do:

        man MD5
        man MD5_Init
        man SHA1
        man SHA1_Update
 

   [Please see copyright regarding copying.]