Re: Benchmarking Strategy?

Before even thinking about starting any large project I think long and hard about how it can be broken down into smaller testable components. This involves three kinds of analysis:

data analysis: Identify each bit of data you will need, then figure out how one bit of data relates to another. What "things" are you keeping data about? How are they identified? What attributes do you need to track about these things? Answering these questions will help you break down your project into smaller pieces that can be tested individually. You may find it helpful to dust off whatever you once learned about data normalization. Normalization gives you a particularly robust way of answering these kinds of questions.
use case analysis: Now that you know what information you are working with, how do you need to use it? Is there a work flow involved? If so, what are the steps? If some of your data is derived, how will you calculate it?
functional analysis: Now that you know both the data and how you want to use it, what are the programming steps for each derivation or work flow component? Do some of the derivations or work flow components have common logic? Can this logic be factored out into smaller reusable subroutines?

These three kinds of analysis are often done in sequence, and there are huge arguments in the design field about whether use cases or data come first. However, my feeling is that each kind of analysis simply represents a different way of focusing on the problem and actually answers questions about use, data, and functions all at once. So take the implied sequence with a big grain of salt. When you analyze use cases, you often find that some data that seemed important wasn't, and other data that is really important was completely missed. When you analyze data, you often find that there is a lot of data that is very important to getting a job done, but its presence is taken as a given, so no one notices it is there until they do data analysis. The same thing can happen when you analyze functions: you may discover that an algorithm needs more data than you expected, or that a use case that once seemed cut and dried is really a lot more ill-defined than anyone realized.

Instead of testing your entire system at once, you can begin by defining objects (and maybe even database tables or flat files) that correspond to each of the "things" you identified during data analysis. Then you can define small tests to make sure that each object and/or database table is correctly defined, that is, that the data you use to construct it is in fact the data that is stored. Then, if your actual application creates or modifies multiple objects at once, you can move on to testing entire transactions for creating or modifying objects and database tables (or whatever other form of persistence you choose to use). If one of your anticipated bottlenecks is the creation and storage of data, you can also test this.
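A minimal sketch of the kind of small "what goes in is what comes out" test described above. Everything here is an assumption for illustration: the Customer entity, its attributes, the table layout, and the use of an in-memory SQLite database stand in for whatever "things" and persistence your own data analysis turns up.

```python
import sqlite3
from dataclasses import dataclass


# Hypothetical "thing" from data analysis; the name and attributes are
# placeholders, not taken from the original project.
@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    email: str


def create_schema(conn):
    # One table per "thing", keyed by whatever identifies it.
    conn.execute(
        "CREATE TABLE customer ("
        "customer_id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL, "
        "email TEXT NOT NULL)"
    )


def save_customer(conn, customer):
    conn.execute(
        "INSERT INTO customer (customer_id, name, email) VALUES (?, ?, ?)",
        (customer.customer_id, customer.name, customer.email),
    )


def load_customer(conn, customer_id):
    row = conn.execute(
        "SELECT customer_id, name, email FROM customer WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return Customer(*row) if row else None


def round_trip(customer):
    # Store the object in a throwaway in-memory database and read it back.
    conn = sqlite3.connect(":memory:")
    try:
        create_schema(conn)
        save_customer(conn, customer)
        return load_customer(conn, customer.customer_id)
    finally:
        conn.close()


if __name__ == "__main__":
    original = Customer(1, "Ann", "ann@example.com")
    # The data used to construct the object should be the data stored.
    assert round_trip(original) == original
    print("round trip ok")
```

Once each object has a test like this, testing a whole transaction is mostly a matter of saving several objects on the same connection and checking them together.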

In the same way, you can test your use cases bit by bit. Start by writing and testing each of the building block routines you identified during functional analysis. If you expect a particular algorithm to be a bottleneck, you can write each alternative algorithm and compare their performance. But I'd be a bit careful of your assumptions: sometimes the bottlenecks we expect turn out not to be problems at all. There is a significant risk of premature optimization at this point. My own rule of thumb is not to worry about what is best unless the choice of implementation is going to force some hard-to-back-away-from choices when you are ready to combine the building blocks into more complex functionality.
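As a rough sketch of comparing alternative algorithms for one building block, the following checks that two implementations agree and then times them with Python's standard timeit module. The two dedupe functions and the sample data are made up for illustration, and the timings you see will depend on your machine and your data.

```python
import timeit


def dedupe_with_list(items):
    # Candidate 1: membership test against a list (linear scan each time).
    out = []
    for item in items:
        if item not in out:
            out.append(item)
    return out


def dedupe_with_set(items):
    # Candidate 2: membership test against a set (hash lookup each time).
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def compare(candidates, data, repeat=3, number=5):
    # Return the best observed seconds per call for each candidate.
    results = {}
    for func in candidates:
        timings = timeit.repeat(lambda: func(data), repeat=repeat, number=number)
        results[func.__name__] = min(timings) / number
    return results


if __name__ == "__main__":
    data = [i % 500 for i in range(5000)]
    # Only compare speed once the alternatives are known to agree.
    assert dedupe_with_list(data) == dedupe_with_set(data)
    results = compare([dedupe_with_list, dedupe_with_set], data)
    for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds:.6f} s per call")
```

Checking correctness before speed matters here: a fast routine that gives a different answer is not an alternative, it is a bug.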

Then, when you are comfortable with the behavior of each building block, you can move on to writing and testing larger, more complex functionality. Finally, when you have all of the stages of a calculation or work flow tested, you can move on to testing each use case. This is a good time to apply the strategy suggested by Limbic~Region: profile for bottlenecks, then benchmark. Again, sometimes the bottlenecks you expect aren't the ones that cause the real trouble.
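One way the "profile first, then benchmark" step might look for a whole use case, sketched with Python's standard cProfile and pstats modules. The use case itself (make_lines, parse_records, summarize) is invented for illustration; the point is only that the profile report tells you which building block deserves a closer benchmark.

```python
import cProfile
import io
import pstats


def make_lines(n):
    # Hypothetical input: "key,value" text records.
    return [f"{i % 10},{i}" for i in range(n)]


def parse_records(lines):
    records = []
    for line in lines:
        key, value = line.split(",")
        records.append((key, int(value)))
    return records


def summarize(records):
    totals = {}
    for key, value in records:
        totals[key] = totals.get(key, 0) + value
    return totals


def run_use_case(n):
    # The full use case, built from already-tested building blocks.
    return summarize(parse_records(make_lines(n)))


def profile_call(func, *args, top=10):
    # Run func under the profiler and return (result, text report).
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(run_use_case, 20000)
    # Read the report to see where the time actually goes, then
    # benchmark only the routines that dominate it.
    print(report)
```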

Best of luck with your project, beth


Re^2: Benchmarking Strategy?
by Limbic~Region (Chancellor) on Jul 23, 2009 at 20:31 UTC

beth

Finally, when you have all of the stages of a calculation or work flow tested, you can move onto testing each use case. This is a good time to apply the strategy suggested by...

Very astute. I incorrectly assumed the planning and design analysis was a known entity since there was a prototype written several years ago. I have updated my initial response with a link to SDLC and planned to add a more in-depth response later if someone hadn't already. Then I saw your response. In reading it, I think there is still additional information that may be valuable to pileofrogs. For large projects that need hardware, COTS licenses, and architecture decisions before software development can begin, testing as you have described may not be an option. Acquisitions is the part you need to get right the first time because of the cost (time and money) of being wrong. How then can you determine what to buy if it costs money to test which is best?

I know that this is probably very much not the case here, as pileofrogs will likely find that disk I/O is the largest bottleneck and that the best way to resolve the issue is to be smart with his database design, caching, and SQL. Assuming a free database and no solid state disks in the budget to eliminate I/O, this all falls back to software, and your points are spot on. I just think that being, by their own admission, unaware of the standard process for design decisions should mean doing more research: for instance, learning how to find and interpret independent market research instead of believing what a sales rep tells you about their product being perfect. I don't currently have the time to do this myself, but since you seem to have far more experience than I do, if you have links handy that would be great :-)

Cheers - L~R
