Why Spreadsheets are Dangerous

The now familiar spreadsheet program was invented by Hewlett-Packard in the early 1980s for its first personal computers.

It was envisaged as a computerised version of the account book, and its computational facilities centered around adding up or adding together rows and columns of figures, or multiplying them by interest and VAT rates.

It is important to realise what a spreadsheet is and what it isn't. It is essentially a presentation tool with the ability to do calculations. Like an account book, it is something to be looked at to see the outcome of calculations. The accompanying text in a set of accounts describes, or often just implies, what the calculations were. Since these are simple, this is usually adequate, and it is also easy to check any individual calculation.

The spreadsheet is a very good tool for presenting information that has been manipulated in the ways in which its inventors envisaged. BUT...

IT WAS NEVER ENVISAGED THAT SPREADSHEETS WOULD BE USED FOR SERIOUSLY COMPLICATED CALCULATION.

Its emphasis is on presentation, not calculation. In fact, the calculations are deliberately hidden so as not to distract from an interpretation of their results.

Most spreadsheets in the world are used for financial calculations, some of them more complex than the original simple bookkeeping, but not comparable in sophistication to even quite simple engineering calculations. Despite this, studies have suggested that about 40% of all financial spreadsheets contain errors, and that 90% of large, i.e. more than 150 row, spreadsheets contain errors!

The most dangerous point is that these errors were only detected AFTER people had started using the spreadsheets. Some of them have cost companies a great deal of money.

Now imagine what could happen if someone tried to design a chemical plant handling explosive, toxic chemicals under extreme conditions with a spreadsheet...

Now start worrying that someone, somewhere, has probably done just that....

Why do most spreadsheets contain errors?

The problem with a spreadsheet is that it deliberately designed to hide what it is DOING in order to emphasise what it has DONE.

How then does one check what a spreadsheet does?

The usual procedure is to check the results of a calculation by some other method. Unfortunately, it has long been established, see Dijkstra [1], that this is not a practical method of verifying a general computation, unless it is exceptionallly simple.

Another way would be to examine and check the contents of each cell and see that it contains what it is supposed to contain. This requires that:

Spreadsheets are so difficult to check, that the only way to make sure it is correct is to use a design procedure that prevents errors being introduced. This is true of all computer programs, but conventional programming languages have evolved with this end in view. Spreadsheets have not.

How can I use a spreadsheet safely?

The best way is only to use a spreadsheet for the sort of task for which it is suitable. As I have implied above, these are quite limited. For example:

Avoid trying to do anything that involves:

The mechanics of typing into spreadsheeets and the use of obscure and similar names for cells makes it very easy to make mistakes in typing and very hard to spot them afterwards by examining individual cells.

Is it possible to create a spreadsheet which will reliably solve a complicated problem?

Yes, but the average user will not be able to do this unaided.

There are two ways of doing this.

  1. By very careful and systematic design of the spreadsheet.
  2. Get a computer to create the spreadsheet.

Systematic Design

Doing this properly is a job for a trained software engineer. There are companies which use teams of people and charge a lot of money do design spreadsheets for the financila services industry. See [3] (Unfortunately, this web link seems to have disappeared, try here.) where the involved, multi-layered process is outlined.

It is almost inconceivable that the average chemical engineer could produce a robust spreadsheet for e.g. a nonideal flash calculation.

If you MUST use a spreadsheet then follow the design steps listed below.

Computer Generation

Writing a spreadsheet is actually very similar to writing a computer program using the underlying instructions with which a computer works.

Almost from the earliest days of computers it was realised that this was not a job which most people would ever be able to do. Computer specialists therefore created special programs called compilers which allowed users to write out the instructions to solve their problems in a more easily understood form in high level programming languages. It later became clear that getting the correct form for these languages was very important, see [1].

The same approach can be adopted by writing a computer program to generate spreadsheets from a set of instructions which look very much like instructions in a high level language such as Fortran. Again, the design of the high level descriptive language is very important.

A language for describing process engineering models, called gPROMS, later renamed ABACUSS, was devised by Paul Barton [4] We develeoped a compiler for translating a simplified version of this into a spreadsheet.

Here is an example of a rather simple flash calculation written in this language.

model flash2
! Two component flash with fixed split

parameter
 f1 = 1 ; f2 = 1  ! equimolar feed
 alpha21 = 5      ! Component 1 is LVC, 2 is MVC
 vfrac = 0.5      ! now vapour fraction required   
 r1 = 0.5         ! initial guess for r1        
end parameter

variable
 v1, v2, l1, l2, s1, s2, r2 
 func  ! to become zero             
end 

equation
 v1 = r1*f1       ! 1. recovery of 1, r1 is tear variable
 l1 = f1 - v1     ! 6. mass balance 1
 s1 = v1/l1       ! 3. ratio 1
 s2 = s1*alpha21  ! 5. relative volatility
 r2 = s2/(1+s2)   ! 4. rearrangement of s2 = 1/r2 - 1
 v2 = r2*f2       ! 2. recover of 2
 l2 = f2 - v2     ! 7. mass balance 2
 func = vfrac - (v1+v2)/(f1+f2) ! 8. to become zero   
end equation

steadystate
end model
Even if you do not understand flash calculations, it is fairly obvious what is being calculated from what else on each line. For example the formula on the last line:
vfrac - (v1+v2)/(f1+f2)
is much easier to understand and check than its spreadsheet translation:
R2C6 - (R5C3+R5C4)/(R2C3+R2C4)

If I had made a mistake in writing one of the two component balances, e.g :

l2 = f2 - v1  instead of  l2 = f2 - v2
it is fairly easy to spot. But would you spot it if I introduce the same error into the cell formula?
R2C4 - R5C3

By contrast, try looking at the generated spreadsheet in e.g. Excel, and try to work out what is going on!

Note: I have now withdrawn this as an online facility, but the language still serves as a guide to model creation. See later links for its replacement.

As a further example, here is a spreadsheet that could not possibly have been written by hand. This spreadsheet solves a set of 18 simulataneous linear equations. It is relatively easy for a moderately competent programmer to to adapt the algorithm for solving such equations into one for writing out the instructions necessary fro a spreadsheet to do so. However the solution involves a large number cells (about 4000) containing formulas such as:

R44C15-R3C14*R22C14-R3C15*R22C15-R3C16*R22C16-R3C17*R22C17-R3C18*R22C18-R3C19*R22C19-R3C20*R22C20
It is beyond the skills of any normal human to have written this by hand without introducing errors.

See here for an outline of how you must write spreadsheets.

References and Links Further Information

[1] EW Dijkstra, `Notes on Structured programming', in `Structured Programming', APIC Studies in Data preocessing No 8, Academic Press, 1972

[2] R Panko, Spreadsheet Research website http://panko.cba.hawaii.edu/ssr/,p>

[3] Enumara, financial analysts, http://www.enumera.co.uk/spreaderr.htm

[4] PI Barton, Chemical Engineering, MIT, website http://web.mit.edu/cheme/people/faculty/barton.html

[5] ECOSSE spreadsheet generation programs: Linear equations and general models. (N.B. these are used at you own risk. There is no guarantee of correct operation or continuing availablility.)