|
print $semester ?>
|
print $courseid ?>
|
CS 530 - Homework #2
(100 points total)
OpenSSL and Cryptographic Hash Functions
|
|
Assignment
|
This assignment is to get familiar with the OpenSSL package,
some popular cryptographic hash functions, and a modified RC4
stream cipher.
Your code should link to the openssl library to perform
hashing functions.
You will also need to implement extensions to these hash functions
through specific External Message Expansion algorithms described in
[Cheng08b].
Electronic submissions only.
|
|
Compiling
|
Please use a Makefile so that when the grader simply enters:
make hw2
an executable named hw2 is created.
(Minor variation on the make command is allowed.)
To use openssl on nunki.usc.edu, please
see additional notes on openssl.
|
|
Commandline Syntax & Program
Output
|
The commandline syntax for hw2 is as follows:
hw2 arc4 {-states|-l len} [-s offset] [file]
hw2 sa [-s offset] [file]
hw2 xsa [-s offset] [file]
hw2 md5 [{-sa|-xsa}] [-s offset] [file]
hw2 sha1 [{-sa|-xsa}] [-s offset] [file]
Square bracketed items are optional.
You must follow the UNIX convention that optional commandline options
can come in any order.
If file is not specified, your program should
read from stdin.
Unless otherwise specified, output of your program must go to stdout
and error messages must go to stderr.
The meaning of the commands are:
arc4 |
: |
Simulate the
modified ARC4 (RC4)
stream cipher.
If the -states commandline option is specified,
please output the 256 internal states sequentially at the end of the
key scheduling algorithm.
If the -l commandline option is specified,
len is the number of bytes to output after
key scheduling algorithm has run to its completion.
The file commandline argument
(or data from stdin) contains the key for the
key scheduling algorithm.
If the length of the input data
(file or stdin) is greater than 256 bytes,
only the first 256 bytes should be used as the key.
|
|
sa |
: |
Expand the input data (file or stdin) using
SA-strengthening.
|
|
xsa |
: |
Expand the input data (file or stdin) using
XSA-strengthening.
|
|
md5 |
: |
Produce a
MD5
checksum of file
(or stdin).
If the -sa or -xsa commandline option is
specified, please apply SA-strengthening
or XSA-strengthening, respectively, to
expand the input data before hashing with MD5.
|
|
sha1 |
: |
Produce a
SHA-1
hash of file
(or stdin).
If the -sa or -xsa commandline option is
specified, please apply SA-strengthening
or XSA-strengthening, respectively, to
expand the input data before hashing with SHA-1.
|
For all commands, if the -s commandline is specified
you should skip offset number of bytes from the
beginning of the input data (file or stdin).
The output for various commands are as follows.
arc4 |
: |
If the -states commandline option is specified,
the output is a binary stream of bytes of length 256.
If the -l commandline option is specified,
the output is a binary stream of bytes of length
len.
|
|
sa |
: |
The output is a binary stream of bytes which
is expanded from the input data (file or
stdin) using SA-strengthening.
|
|
xsa |
: |
The output is a binary stream of bytes which
is expanded from the input data (file or
stdin) using XSA-strengthening.
|
|
md5 |
: |
The output should be a hexstring of length 32.
The left most 2 character should correspond to byte 0
of the output buffer. Although formatting is different,
you can check your result against the result of running
the following at a command prompt:
openssl md5 file
|
|
sha1 |
: |
The output should be a hexstring of length 40.
The left most 2 character should correspond to byte 0
of the output buffer. Although formatting is different,
you can check your result against the result of running
the following at a command prompt:
openssl sha1 file
|
Pleaes output reasonable and useful error messages if the command
is malformed or file does not exist, inaccessible, or corrupted.
(You should do better than openssl.)
|
|
External Message Expansion
|
External Message Expansion is a deterministic way of modifying
an input data stream
by inserting data blocks into the data stream using a specified algorithm.
When used with a cryptographic hash function, an input message is
first expanded before feeding it into the hash function. The idea is
that the actual message that got fed into the hash function has more
structure, and therefore, it may be more difficult to launch a collision
attack on the hash function.
Below we describe two specific algorithms described in
[Cheng08b].
Please note that the description below is not meant to be complete.
You should read [Cheng08b] for
details.
SA-strengthening
Section II-A of [Cheng08b]
described a way to derive a so-called ARC4 hash out of the well-known
ARC4 (RC4) stream cipher.
Using the notation in [Cheng08b],
we summarize the algorithm here.
If length of the input message is strickly less than 256 bytes:
( 1) for i from 0 to 255
( 2a) S[i] := sbox[i]
( 3) j := 0
( 4) for i from 0 to 255 do
( 5a) j := (j + S[i] + input[i mod n]) mod 256
( 6) swap(S[i],S[j])
( 7) end for
If length of the input message is greater or equal to 256 bytes:
( 1) for i from 0 to 255
( 2a) S[i] := sbox[i]
( 3) j := 0
( 4a) for x from 0 to n-1 do
( 4b) i := x mod 256
( 5b) j := (j + S[i] + input[x]) mod 256
( 6) swap(S[i],S[j])
( 7) end for
Section II-B of [Cheng08b]
described a way to shrink an array of data bytes which can be
applied to the ARC4 hash.
Section II-C of [Cheng08b]
defined SA-strengthening for a cryptographic hash function h()
for an input message m as follows:
h(msr(m)||sah(m))
where msr(m) is a message derived from m that satisfies the
message self-repeat requirement and sah(m) is the
Shrunken ARC4 Hash of m.
For the sa command, you basically need to output
msr(m)||sah(m).
For the md5 and sha1 command, you basically need to output
h(msr(m)||sah(m)) where h() is either the MD5 or SHA1
hash algorithm.
XSA-strengthening
Section III of [Cheng08b]
described further enhancement to SA-strengthening.
For every 256 bytes of the original input message, 8 or fewer
noise bytes (4 bytes on the average) are inserted into
the data stream, based on the state of the modified ARC4 cipher.
The noise bytes are obtained by using a modified ARC4 output algorithm
to "clock out" 16 bytes as follows (with r being 16):
(10a) for i from 0 to (r-1) do
(11) u := (u + 1) mod 256
(12) v := (v + S[u]) mod 256
(13) swap(S[u],S[v])
(14) output S[(S[u] + S[v]) mod 256]
(15a) end for
For the first 256 bytes of the input data, additional noise bytes
are generated and shrinked and inserted into the input stream.
According to section III-A of
[Cheng08b], these additional
noise bytes are generated as follows.
For k=1 through 8, after 2k bytes of
the input data has been fed to the ARC4 key scheduling algorithm,
you should run the ARC4 output algorithm 2*k times to generate
2*k noise bytes. Then you apply the shrink() operator
to shrink the noise bytes and you insert the shrunken noise bytes
into the expanded input.
After a total of 256 bytes of the input data has been fed to the ARC4
key scheduling algorithm, for every
256 bytes of input, you clock out 16 noise bytes and add the
shrunken version of these bytes to the expanded input.
When all data is processed, you append the SA hash.
|
|
Sample Output
|
Please note that the hexstring output are formatted for
readability. Extra space characters and linebreaks have
been inserted.
hw2 arc4 -states usctommy.gif
The first 256 bytes of usctommy.gif
in hexstring representation is:
4749463839614b006e00840000bf6060
f9efef9f1010b95050990000dba5a5f2
dfdfb34040d28f8feccfcfcc7f7fffba
10c67070ac3030a62020ffffffe5bfbf
00000000000000000000000000000000
00000000000000000000000000000000
0000000000000000000000000021f904
0100000f002c000000004b006e000005
fee0238ee3629e68aaae6ceba2644cbe
746ddfb278127cefffc0a07048f4ad64
c5a472c9848d98d0a8b4777a4eaf5861
f591ed7a7926d1777cad0673e8b47acd
0eee800bb67c4e27b917f0787dcfcfe5
f37d81817f3f6182877484467a883109
0507260e2907000501738a548c8d0105
0337070c106b9a609c820101028f073c
The above is used as the key array in the
key scheduling algorithm.
The output of the above command is posted here.
Its hexstring representation is:
803ce3b89df66f1204bf762a3fb51802
bdb27dae669775a63bdde58cc517ad45
5435851ba0e84f552f247e0fd2de4007
ef5efb90e0d79ba584e2cf92380ecb29
a7c7674e4781fe3df4824c6c469956d4
1eedd1507f58a320897af28dbb2839d8
611a6aa96ee6ba44cae4035165a8ee15
8facdad3f82c1f70c80a19ceb774103e
fdebd0ea8326064bbe4336b3df49a2db
6933c3e9e1f0d9f959629e8e37413071
9172f348c2776bafd61d2d7896b1520d
93ab95cc3af19409319ac1e74d5b4208
98a1d5c46d0bfc23c97c57888a872e14
53bc340186aa7b32c673211cb911b0b6
a40c684af7602b79052264255a2763ec
cd9fff139c5ddcfa165c00c08bb45ff5
For your debugging pleasure, the
internal state of RC4 at the end of each iteration
of the key scheduling algorithm
is provided.
hw2 arc4 -l 16 usctommy.gif
The output of the above command is posted here.
Its hexstring representation is:
f82f64f05c68a5cdb275fdbec5a05b9e
hw2 sa null.bin
hw2 xsa null.bin
For an empty input file, its shrunken ARC4 hash
is posted here.
It is 60 bytes long and its hexstring representation is:
6bc501d7d4ccf1d8c7121a6ea0b384d1ed4ceffbf9a39db6ffd2ec97175ddc2a
88db065c9579c8ea7a78a6c6dd1fbd8a66030eb911d99455df0d5416
hw2 sa testvec.bin
testvec.bin is a 16-byte long binary file
with the following hexstring representation:
000102030405060708090a0b0c0d0e0f
The output of the above command
is posted here.
Its hexstring representation is:
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
3f89e5827ccf1fcaf65acbad53914521
5d05f078b133bd206039ea7554c51a3e
d30b70a7307b809641460f37a6edac35
6ab62b6336717215119b657d
hw2 xsa testvec.bin
The output of the above command
is posted here.
Its hexstring representation is (with lots of
space and linebreak added):
0001
0203 e0d5
04050607 03
08090a0b0c0d0e0f fa643697
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f 0c
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f d83eef6a
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f
000102030405060708090a0b0c0d0e0f e254
6608deeb510ca3af7c7912a4435b5e95564154307ed2b88a
24bcfbcc5990c08211672e196c3e01ff776a87982abbd337
5f4c4ecdbd768063736de948b538ba
Below are the noise bytes and their corresponding shrunken noise
bytes (which are already listed above) for various values of
k (please see section III-A of the paper for details about
what k is):
k noise bytes shrunken noise bytes
-------------------------------------------------------------
1 35e1 (none)
2 08e0cad5 e0d5
3 c46793162903 03
4 bdfa2b641436a697 fa643697
5 6d58cf40933db4f54402 (none)
6 7ef3a62a00e3693eea0c2913 0c
7 d84e98d8333e4a09ebb394efde6a d83eef6a
8 bc444d8a39d98ee735b301e220727a54 e254
hw2 md5 -sa testvec.bin
The hexstring representation of output of the above command is:
470debadfd0a26212dc806939b79b558
hw2 md5 -xsa testvec.bin
The hexstring representation of output of the above command is:
a08d2d01a8a9e6e46fc8709283846b83
hw2 sha1 -sa testvec.bin
The hexstring representation of output of the above command is:
bb3e476e73abbbb4834b15d0ecfd814b9ba67c2a
hw2 sha1 -xsa testvec.bin
The hexstring representation of output of the above command is:
c120c4b6bb057558f474c07c9fbd1566e1908f46
|
|
Grading Guidelines
|
The grading guidelines has been made available.
Please run the scripts in the guidelines on nunki.usc.edu.
It is possible that there are bugs in the guidelines. If you find
bugs, please let the instructor know as soon as possible.
Please note that although the grader will follow the grading guidelines
to grade, the grader may use a different set of data files.
|
|
Miscellaneous Requirements and Hints
|
- You must call MD5 and SHA-1
related functions in the OpenSSL library
directly. You will lose a lot of points if you invoke the
openssl program using popen(), fork your process and
call one of the exec system calls, or call system().
Also, you will not receive credit if you use public source code
that implements these functions.
- Good hash function implementation should be
one-pass where the input data is basically only looked at once
and a large buffer is not used. A file
is a form of a large external buffer.
Therefore, you should not use an external file to store
input data and read the data back from the file.
- If the size of the input file is large, you must not read the
whole file into a large memory buffer and then process the file data.
The maximum size of a memory buffer is limited to
4,096 bytes.
- It's important that every byte of your data is read and written
correctly. You will lose a lot of points if one byte of data is
generated incorrectly! The grading of this assignment will be harsh.
- Please follow the UNIX convention that, when your output is an ASCII file
(such as the output of the md5 and
sha1 commands), append '\n' in the last line of the output
if it's not a blank line. (This way, you don't get the commandline
prompt appearing at the wrong place on the screen.)
- String I/O functions such as fgets(), scanf(), and
printf() are really meant for inputing/outputing strings.
Do not use them to input/output binary data!
- The Solaris workstations in the ISD lab in SAL have the same
setup as nunki.usc.edu. So, if you are logged on to one of
these workstations, please do your development locally and
not to overload nunki unnecessarily.
- Start working on this early! Please don't complain
to the instructor that this assignment is too tedious or
it takes too much work just to parse the commandline.
Get it done early and get it done right!
- There is quite a lot of information on the
OpenSSL documentation site.
Please do not ask the TA or the instructor what these functions
do and where to put them. Read the documentation. Search the web.
Try things out! (If you are really stuck after you have
tried quite a few different things, then you may seek help from the
TA or the instructor.)
If you have trouble finding the documentation on some openssl
functions, you can look for their man pages on nunki.usc.edu
after you have setup the
MANPATH
environment variable to include openssl man pages.
For example, you can do:
man MD5
man MD5_Init
man SHA1
man SHA1_Update
|
|
|