In assignments A and B we developed two approaches to storing fish when
the population is dynamic and the fish world is unbounded, a binary search
tree (BST) and a hash table (HT). Here we outline the analysis needed
to compare the efficiency of these approaches and then suggest instrumenting
the programs to compare there actual performance empirically.
We need to analyze the following functions for the Environment class:
Now consider one Step for the Simulation. How many times (on average) are each of these functions called for one Step. Use this information to give a final big-O expression for the time needed for one step in the simulation for both BST and HT implementations. How do they compare? Which do you think will be faster in practice?
We can do some empirical timing by adding the timer facility given by the SysTimer class found in the files systimer.h/cpp.
For each version of the program, BST and HT, code a version of the program that uses the simulation run program to run through 1000 steps of the simulation. Use the functions from the SysTimer class to start and stop timing just before and after the call to the Simulation run function. (You may want to display before and after, but keep that outside the bounds of the timing.) Have the program print out the elapsed time for the run.
Do this experiment for each implementation, using fish files with differing numbers of fish. Be sure the fish are in the file in a random order, so that the BST is not skewed. Use your timings to compare the absolute performance of the different implementations. By setting the fish data so that fish can't die, and fish can't breed, you can work with fixed fish populations and use a spreadsheet to plot time versus data size to see if it fits the predicted growth rate from the asymptotic analysis.
Do the experiment with a version of the model with a dynamic population, using the two different data structures. Since the population varies over time for a run of the simulation, you cannot do a good analysis of time as a function of data size, but you can still compare the performance of the different models. If you calculate the average population size, you could use that for the time-data size analysis. To keep the comparisons fair you should do two things. First, use an explicit seed for RandGen and use the same seed for different data structures, so they are dealing with the same dynamic fish populations. Second, do this for several different seeds for a given starting population size and average the results.
What are your conclusions? Which implementation do you think is
better?