Cooking and software efficiency
Do you cook? A nasty saying about developers is that they live mainly on fast food like take-out pizza, take-out chop suey, or hot dogs. Some people may argue that this is why software applications often lack efficiency. In this weblog I want to encourage developers to apply the rules of cooking to software, because cooking can tell us a lot about efficiency. So let’s poke our noses into the cook’s pot:
Starter: the tidy kitchen
One ‘must have’ for running an efficient kitchen is order. Have you ever tried to cook in someone else’s kitchen? At first, you will waste a lot of time opening the wrong drawers and searching for the things you need: the knives, the strainer, the can opener (for cheaters only), and so on. After a while, you will be better oriented and learn where to find what you need. Unconsciously you have built up a ‘kitchen search index’ that helps you to find things faster. The clever cook will also keep his spices in a well-defined order on the shelf, for example in alphabetical order or according to their taste.
This is very similar to how you should access data in a program: every frequently executed database access, for example, should be supported by an appropriate database index. Otherwise, the database has to open too many ‘drawers’ when searching for the needed data. The same holds true for access to internal tables in ABAP. Do not read large internal tables sequentially, for example with a READ, without using an index or a table key. If you do, you act like a cook who scans dozens of identical-looking spice jars until he finds the one with the nutmeg. Instead, use sorted standard tables (read with a binary search), or internal tables with a sorted or hashed key, which allow fast access.
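To make the spice shelf concrete, here is a minimal sketch in Python rather than ABAP. The data and the helper names (`find_sequential`, `find_sorted`) are invented for illustration. It compares a sequential scan with a binary search on a sorted list and with a hashed lookup, roughly the three access paths described above.

```python
from bisect import bisect_left

def find_sequential(rows, key):
    """Scan row by row until the key matches; also report rows inspected."""
    inspected = 0
    for row in rows:
        inspected += 1
        if row[0] == key:
            return row, inspected
    return None, inspected

def find_sorted(rows, keys, key):
    """Binary search; rows must be sorted by key, keys is the matching key list."""
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return rows[pos]
    return None

# Hypothetical shelf: 1000 (name, slot) pairs, already sorted by name.
rows = [(f"spice{i:04d}", i) for i in range(1000)]
keys = [name for name, _ in rows]   # key column for the binary search
by_name = dict(rows)                # hashed access by name

row, inspected = find_sequential(rows, "spice0999")
print(row, "found after inspecting", inspected, "jars")
print(find_sorted(rows, keys, "spice0999"))
print(by_name["spice0999"])
```

The sequential scan inspects every jar before it reaches the last one, while the binary search needs only about ten comparisons for 1000 entries and the hashed lookup goes more or less straight to the right slot.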
Portions, packages, and bundles
A second thing that we can learn from cooking is portioning. If you want to cook a meal for yourself and your best friend, would you go to the cellar and schlep a 50 kg sack of potatoes up the stairs? No, the efficient cook fetches just as much food as is needed. And if a fruit salad is on the menu, don’t get the whole fruit basket from the living room or the pantry. On the other hand, neither should you go once to get an apple, then again to fetch a pear, and once again for the banana. You are the cook, you know the recipe for the fruit salad, so just go once and get all the fruit you need, no more, no less.
Again, this gives us a hint of how to read data from the database: no more than we (that is, our application) need to do the current job. But if we know for sure that several data records from the same database table will be needed, an array fetch is better than many single fetches. Portioning or bundling is also important for the communication between remote systems or between distinct system components, for example between a service provider and the consumer. In the same way as the cook does not want to overload himself with heavy sacks, or make superfluous walks to fetch single fruits, it is reasonable to bundle the data that is exchanged between remote systems into packages of adequate size. Packages that are too small lead to many round trips, thus increasing the overall response time. And very large packages can overload the system components and result in timeouts.
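The trade-off between package size and round trips can be sketched in a few lines of Python. The names `packages` and `send_all` and the `send` callback are hypothetical stand-ins for whatever remote call an application actually uses. The example only counts round trips; a suitable package size has to be found by measurement.

```python
def packages(records, size):
    """Yield consecutive slices of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]

def send_all(records, size, send):
    """Send the records package by package; return the number of round trips."""
    trips = 0
    for package in packages(records, size):
        send(package)   # one round trip per package
        trips += 1
    return trips

records = list(range(1000))
received = []   # stands in for the remote system

for size in (1, 250, 1000):
    received.clear()
    trips = send_all(records, size, received.extend)
    print(f"package size {size}: {trips} round trips")
```

Single-record packages cost one round trip per record, while one giant package is the 50 kg sack of potatoes. Something in between is usually the better portion.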
Store, cache, and buffer
Speaking of superfluous round trips, we return to the pantry, which saves the cook from going to the grocery store every day for food that is frequently needed, like rice, sugar, or noodles (which kind of food is frequently needed in a kitchen of course depends heavily on the cook’s taste buds). So the pantry works as a kind of local store or buffer that replaces very slow round trips with faster ones. Some ingredients like salt or certain spices are used so frequently in cooking that no cook could afford even to fetch them from the pantry or from a drawer: they must be right beside the kitchen stove, always within reach.
In the world of software, buffering and caching also play an important role. Data that is frequently requested but only rarely modified (for example, business configuration data) should not be read over and over again from the database. Identical information should not be extracted repeatedly from service methods that need to descend numerous levels in the program’s call hierarchy to retrieve it (or, even worse, to deliver nothing). Instead, such data or information must be buffered on the application server (in the case of database tables, within the SAP table buffer), or locally in the program closest to the information consumer, that is, within reach. Not caching such data or bypassing the cache is like ignoring the salt shaker next to the kitchen stove and walking down to the cellar every other minute to get a pinch of salt.
Main course: concurrency and scalability
Two other important aspects of efficient cooking and programming are scalability and the concurrent use of resources. Cooks are masters at doing things in parallel. While the water is heating in the pot and the onions are sizzling in the pan, the cook can chop the carrots (and don’t forget the soufflé in the oven!). Good kitchens are designed for parallel work: the kitchen stove has two or more hot plates or burners, and you will find more than one pot and more than one sharp knife. If you expect several guests, you have two choices to feed them all. First, if you still work alone, you will have to invest more time, because you cannot do everything in parallel (ever tried to chop vegetables with your right and your left hand at the same time?). Or you can hire some helping hands to prepare the meal faster. If there are no bottlenecks in your kitchen, like a lack of knives, pots, or burners, a recipe can in principle be scaled up from two persons to several dozen diners while leaving the preparation time nearly constant. (Some people say that meals coming from a canteen never reach the quality of those prepared for only a few persons. But this weblog is about cooking efficiency, not about meal quality.)
What can a developer learn from cooking with regard to scalability and the concurrent use of resources? If your recipe (pardon, your program) scales linearly or better with the amount of data to be processed, it will serve hundreds of users as well as a single one just by adding burners (CPUs, memory, disk space on multiple application servers). This only holds true as long as there are no bottlenecks such as database locks or other single resources that get overloaded and lead to a serialization of users or processes. If you have only one sharp knife, you can add as many burners, pots, and helpers as you want: they will all be idle because the onions cannot be chopped fast enough.
Another threat to scalability, besides locks and resource bottlenecks, is non-linear runtime behavior. In software engineering, that means an algorithm whose runtime grows faster than linearly (or faster than linear times logarithmic) with the processed data volume. In business applications, an algorithm with quadratic or even cubic runtime behavior is rarely necessary, but non-linear code can often be found in deficient application coding. A prominent example is internal tables that grow with the amount of processed data and that are accessed sequentially and in a nested way (see, for example, Performance Problems Caused by Nonlinear Coding).
I never came across a quadratic effect in cooking. An example would be a big vegetable that has twice the weight of a small one of the same kind but needs to boil four times as long (and that cannot be cut into pieces for some reason). Have you ever encountered such a non-linear vegetable effect? Maybe the vegetables I tested were just too small. This is very likely the reason why application developers often ignore non-linear effects in their code: the amount of data they use for testing is too small. Or they perform only a single measurement, from which nothing can be concluded with respect to scalability.
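Why a single measurement with a small data set hides the problem can be shown with a hedged Python sketch. The data (`orders`, `customers`) and the function names are made up for illustration. The nested variant scans one list for every entry of the other, the way nested sequential reads of internal tables do, while the hashed variant builds a lookup structure once. Both count their steps so that the growth becomes visible when the data volume doubles.

```python
def match_nested(orders, customers):
    """For every order, scan the customer list sequentially (quadratic growth)."""
    steps = 0
    matches = []
    for order_id, customer_id in orders:
        for cust_id, name in customers:
            steps += 1
            if cust_id == customer_id:
                matches.append((order_id, name))
                break
    return matches, steps

def match_hashed(orders, customers):
    """Build a hashed lookup once, then do one lookup per order (linear growth)."""
    by_id = dict(customers)
    steps = len(customers)   # one pass to build the lookup
    matches = []
    for order_id, customer_id in orders:
        steps += 1
        name = by_id.get(customer_id)
        if name is not None:
            matches.append((order_id, name))
    return matches, steps

def make_data(n):
    """Invented test data: n customers and one order per customer."""
    customers = [(i, f"customer {i}") for i in range(n)]
    orders = [(i, n - 1 - i) for i in range(n)]
    return orders, customers

for n in (1000, 2000):
    orders, customers = make_data(n)
    _, nested_steps = match_nested(orders, customers)
    _, hashed_steps = match_hashed(orders, customers)
    print(f"n={n}: nested {nested_steps} steps, hashed {hashed_steps} steps")
```

Doubling the data roughly quadruples the steps of the nested variant but only doubles those of the hashed one. With a handful of test records both look equally fast, which is exactly the too-small vegetable.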
If your application scales, you can keep the response times for concurrent users acceptable just by adding resources. In the same way, you will be able to process a large set of independent objects in parallel tasks, thus reducing the overall processing time tremendously. But remember: the precondition for this is the absence of bottlenecks and a linear or better runtime dependence on the amount of processed data.
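Processing independent objects in parallel tasks might look like the following Python sketch. The `process` function and its short sleep are placeholders for real work that mostly waits, for example on a database or a remote system. Threads are used here because they suit such waiting workloads in Python; CPU-heavy work would typically need separate processes instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def process(obj):
    """Placeholder for independent work on one object (mostly waiting)."""
    time.sleep(0.05)
    return obj * 2

def run_serial(objects):
    """One cook, one object after the other."""
    return [process(obj) for obj in objects]

def run_parallel(objects, workers):
    """Several helpers work on independent objects at the same time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, objects))   # results keep input order

objects = list(range(8))

start = time.perf_counter()
serial_result = run_serial(objects)
serial_time = time.perf_counter() - start

start = time.perf_counter()
parallel_result = run_parallel(objects, workers=4)
parallel_time = time.perf_counter() - start

print(f"serial: {serial_time:.2f}s, parallel: {parallel_time:.2f}s")
```

Both runs deliver the same results, and the parallel one should finish in a fraction of the time. That only works because the objects are independent: if every task had to wait for the same lock, the helpers would queue up for the one sharp knife.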
I hope you found this excursion into cooking and software efficiency flavorful. The next time you implement a piece of program code, try to look at it with the eyes of a cook: Will your guests like its taste? And will it also be efficient?