In optimisation we are in a situation where a variety of acceptable answers are available to our problem. The aim is to determine which one is best in terms of some specified criterion.
Note that:
The experimental data comes in the form of a set of measurements , say of reactor temperature T which has changed as a result of adjustments or random changes in say feed rate F. These come in pairs (T1, F1), (T2, F2), ... (Ti, Fi), ... (Tn, Fn),
They are to be fitted to an equation which will give T in terms of F and one or more parameters e.g. a and b:
T = f(F, a, b)
The problem is to find a and b to give an optimal fit to the above equation for the measured data. One way of defining this optimum is in terms of the minimum square deviation between experimental and theoretical temperature points. The squared deviation for a single point will be:
[Ti - f(Fi, a, b)]²
We require to minimise the sum of these squared deviations over all n points.
The reason why this must be an optimisation problem rather than an equation solving one is that if we were to treat it as an equation solving problem there would be too many equations for the number of unknowns. There are two unknowns, a and b. However, for each pair of measurements we have effectively one equation, which we could write as:
T1 = f(F1, a, b)
for the first pair of measurements
T2 = f(F2, a, b)
for the second pair of measurements
... and so on.
Unless we had only two measurements we have more equations than unknowns and what is called an overdetermined system of equations. Data fitting is thus a particular case of `solving' such a set of equations, which involves finding the solution which most nearly satisfies all the equations.
Consider the fitting data to the
equation below
to estimate the parameter a.
We have a set of data paired points (yi, xi). These give an experimental value yi and with the fitted equation, enable a corresponding calculated value to be determined from a xi.
The difference between experimental and calculated values
or residual for each point is thus:
The aim of the the fitting procedure is to minimise this overall by
minimising the sum of the squares of these residuals over all data points.
This is an optimisation problem to find a so as to minimise the objective
function P(a):
Expanding:
Differentiating w.r.t. a:
P' must be zero at the minimum, so we get:
Consider the following data:
| x | y(measured) | x2 | xy |
| 0 | 0.5 | 0 | 0 |
| 1 | 1.5 | 1 | 1.5 |
| 2 | 4.5 | 4 | 9.0 |
| 3 | 5.5 | 9 | 16.5 |
| 4 | 8.5 | 16 | 34.0 |
| 5 | 9.5 | 25 | 47.5 |
| 55 | 108.5 | ||
Hence:
In fact the data were generated from y = 2 x with errors of
introduced at alternate readings.
It is also possible to fit linear parameters directly using a general procedure, such as is available in the Excel solver. A link to a spreadsheet which does this is here.
In some cases it is possible to transform
the problem so that it is linear in its parameters. For example:
y = exp( a x )
is not linear in the parameter a. However, if
we take logs of both sides then we have:
ln y = a x
This is linear in a which can be determined by fitting ln(y) as a function of x.
Statisticians will warn you that this procedure is not precisely equivalent to fitting the nonlinear parameters directly, which is what must be done if no suitable transformation is available.
Example: pipeline for constant flow rate.
Capital cost
pipe diameter, D - maybe should be D0.6 or D0.9.
Operating cost
D-4.75 - assumes friction factor
Re-0.25.
Typically, capital cost versus operating cost.
| f(x) | - the objective function |
| x | - the decision variable |
| constraints: |
|
|
|
Constraints define feasible region.
Minimise f(x) with respect to (w.r.t.) x, subject to (s.t.)
,
Example: (use D instead of x) minimise
f(D) = 2.0 D + 0.4 + 0.5 D-4.75 (s.t.)
,
.
If expression available for derivative..
Solve
Nature of each SP at xsp is given by sign of second derivative f''(x):
| + | implies local minimum |
| - | implies local maximum |
| 0 | don't know |
Example:
implies
m.
and
Hence local minimum.
Beware of discontinuities
Simple theory assumes that both f(x) and f'(x) are continuous. If not, global minimum may exist at or near a discontinuity in f or f'.
Profit, throughput, reactor conversion,...
`Maximise f(x)' is equivalent to `Minimise -f(x)'.
Without loss of generality, a computer code can treat all problems as minimisations.
One-dimensional search methods based only on comparison of function values
Use when derivative f'(x) is inconvenient or expensive to evaluate, or is not available.
Always pose the optimisation problem with constraints!
But if a local minimum is suspected within a much smaller region than defined by lower and upper bounds on x, then it may be worthwhile starting with an unrestricted search to establish a tighter bounded search region.
Acceleration provides a compromise between taking many short steps, and finding a very large bounded search region.
This version of the algorithm is slightly complicated because we assume there are bounds in the problem.
Minimise
f(D) = 2.0 D + 0.4 + 0.5 D-4.75
s.t.
,
.
DATA:
| Lower bound for diameter | 0.1 m |
| Upper bound for diameter | 2.5 m |
| Starting value for diameter | 0.2 m |
| Initial step in diameter | 0.1 m |
| Step acceleration factor | 2.0 |
| D = 0.1 | cost = 28117.666 |
| D = 0.2 | cost = 1045.707 |
| D = 0.3 | cost = 153.280 |
| D = 0.5 | cost = 14.854 |
| D = 0.9 | cost = 3.024 |
| D = 1.7 | cost = 3.840 |
RESULT:
Lower bound for bounded search 0.5 m
Upper bound for bounded search 1.7 m
When a region is known to contain a minimum for the problem, apply an iterative strategy for interval reduction.
Systematically exclude one or more sub-intervals in which the optimum does not lie. Thereby create a new interval for the next iteration.
Terminate when
Methods
Divide interval into 3 equal sub-intervals.
Eliminate the sub-interval which is not adjacent to the minimum f.
Cost: 2 function evaluations per iteration
Benefit: Reduce search range by 33.3 % at next iteration.
Drawback: the interior point already calculated cannot be re-used within the new interval.
Divide interval into 4 equal sub-intervals.
Cost: 2 or 3 function evaluations per iteration
Benefit: Reduce search range by 50 % at next iteration.
Advantage: after the first iteration which needs 3 function evaluations, the middle point in the surviving two sub-intervals can be re-used. Thus rapidly becomes more efficient than `third-interval' method.
Divide interval into 3 sub-intervals.
The sub-intervals are symmetric, but not equal.
Then ensure that sub-intervals for the new interval of length L' can be constructed with the same relative proportions as those in the previous iteration.
This
(a) permits re-use of one point from the previous iteration;
(b) does not prejudice the rest of the search by changing the relative sub-interval sizes.
To achieve this condition,
must satisfy
Cost: 1 or 2 function evaluations per iteration
Benefit: Reduce search range by 38.2 % at next iteration.
Advantage: after the first iteration which needs 2 function evaluations, the point within the surviving sub-interval can be re-used. The method was designed to achieve this.
Golden section is the most efficient method which makes no assumption (based on gradient information, for example) about where in the search interval the minimum is likely to be.
Reduce search range to 0.001 of initial length. Let this process take Ni iterations.
Third-interval
Half-interval
Golden section
Continue with the pipe diameter optimisation.
Start bounded search with 0.5 < D < 1.7 m.
Set diameter tolerance to 0.001m.
Use Golden Section.
See section 5.3.3.2
When no information is available except comparison of function values,
Given:
| xL | lower bound specified for optimisation |
| xU | upper bound specified for optimisation |
| x0 | starting point for search (should be feasible: xL < x0 < xU) |
| starting step size | |
| acceleration factor (greater than 1) |
and a procedure to evaluate f(x)
If either of the adjacent points crosses xL or xU reset the value to the constraint.
+ if the rightmost point was the minimum,Set the current step size = initial step size
- if the leftmost point was minimum.
Methods in section 5.3.2 involved function comparison. No use was made of function values. Compare solving 1 NLAE with interval halving.
Can we also use function values?
Compare quadratic approximation with golden section, the most efficient `dumb' method.
Three points are known: boundary points and one interior point of the search interval. Interior point should be minimum. Points are not equally spaced (in general).
The task:
fit a quadratic function
See section 5.3.3.1.
Use of quadratic approximation is analogous to a bounded secant search when solving 1 NLAE: the Regula-Falsi method.
In fact it is Regula-Falsi on
f'(x) = 0, where f'(x) is estimated from the function values using central differences:
When the minimum point xm of the quadratic is found, evaluate f(xm).
Note:
,
the value of the quadratic.
The quadratic approximation is used only to find the position (in x) for the next function evaluation.
As long as the search range has the interior point as minimum then xm lies within the search range (indeed between the mid points of the sub-intervals).
For next iteration, keep the two sub-intervals adjacent to the minimum.
However, if the minimum is constrained at a boundary of the feasible region this may not be so, and the minimum of the quadratic may lie outside the search interval. In this case, do not use quadratic approximation but revert to golden section and see whether the minimum is indeed precisely at the boundary.
The method can get stuck
Pipe diameter optimisation : D = 0.5, 1.1, 1.7m starting points. 25 iterations to converge! (Results)
Much slower than Golden section, like Regula-Falsi on a highly non-linear f(x) = 0.
There is a case for combining quadratic fitting with golden section, to keep the rate of interval reduction satisfactory.
Line search in multivariate optimisation
Line search (= univariate optimisation or improvement) is a subproblem in most practical multivariate optimisation methods.
Line search along direction
starting from point
:
But most methods do inexact line search not exact line search - `sufficient decrease' not rigorous minimisation.
Given
q(x) = A x2 + B x + C the stationary point exists at xm where
| A x12 + B x1 + C = f1 | (1) | ||
| A x22 + B x2 + C = f2 | (2) | ||
| A x32 + B x3 + C = f3 | (3) |
| A ( x22 - x12 ) + B (x2 - x1) = (f2 - f1) | (4) | ||
| A ( x32 - x22 ) + B (x3 - x2) = (f3 - f2) | (5) |
| A ( x22 - x12 ) (x3 - x2) + B (x2 - x1) (x3 - x2) = (f2 - f1) (x3 - x2) | (6) | ||
| A ( x32 - x22 ) (x2 - x1) + B (x3 - x2) (x2 - x1)= (f3 - f2) (x2 - x1) | (7) |
| A [ ( x32 - x22 ) (x2 - x1) - ( x22 - x12 ) (x3 - x2) ]= (f3 - f2) (x2 - x1) - (f2 - f1) (x3 - x2) | (8) |
| A (x3 - x2)(x2 - x1)[ ( x3 + x2 ) - ( x2 + x1 ) ]= (f3 - f2) (x2 - x1) - (f2 - f1) (x3 - x2) | (9) |
Given:
x_low lower bound on minimum
x_high upper bound on minimum
x_tol tolerance for convergence in x
and a procedure to evaluate f(x).
Evaluate f_low = f(x_low) and f_high = f(x_high)
Set iteration counter = 1;
set x(1) = x_low, x(4) = x_high
f(1) = f_low, f(4) = f_high
! Iteration loop
Do (until converged)
calculate length of interval: L = x(4) - x(1)
if iteration 1 or we kept the left hand sub-intervals
from previous iteration, then
set x(2) = x(1) + 0.381966 L;
calculate f(2) = f(x(2))
end if
if iteration 1 or we kept the right hand sub-intervals
from previous iteration, then
set x(3) = x(4) - 0.381966 L;
calculate f(3) = f(x(3))
end if
find point i with minimum f(i);
set i_min = i with min f(i) ! i_min is best point so far
if L is smaller than xtol
declare convergence
otherwise
if i_min is 1 or 2 then
keep the left two sub-intervals (set a flag)
end if
if i_min is 3 or 4 then
keep the right two sub-intervals (set a flag)
end if
if keeping the left sub-intervals
copy point 3 to point 4; copy point 2 to point 3
if keeping the right sub-intervals
copy point 2 to point 1; copy point 3 to point 2
! `copy' means copy both x and f values.
! It is important to copy the points
! in the stated order.
end if
if we have convergence
x_opt = x(i_min); f_opt = f(i_min) ! solves the problem
exit from the iteration loop
end if
end do
You will need a flag (integer or logical variable) to indicate whether the left
or right sub-intervals are being kept.
Remark: The three equal interval and (to some extent) the four-equal interval methods are easier to implement.
For 3 interval copy only the points adjacent to i_min to x(1), f(1) and x(4), f(4). This sets up the next iteration, but we must always do two function evaluations at the head of the loop at
x(2) = x(1) + 0.333333 L; x(3) = x(4) - 0.333333 L
4 interval has 5 points in the current range. Provided i_min is an interior point (not x(1) or x(5)), copy point i_min to point 3, and the adjacent points to new bound points 1 and 5. At each iteration do two function evaluations at the head of the loop at
x(2) = x(1) + 0.25 L; x(4) = x(5) - 0.25 L
At iteration 1 also evaluate f at x(3) = 0.5 ( x(1) + x(5) ).
Problems which appear to be multivariable but include equality constraints may sometimes be reduced to single variable form.
Find the dimension which minimises total area.
Let the overall dimensions of the gauze be x cm by y cm. The objective function to be minimised is then clearly:
P = x y
The mounting margins provide the active width of (x-8) and height of (y-12), which must give a total exposed area of 300cm¼ so that there will be an equality constraint:
(x-8) (y-12) = 300
By rearrangement of the equality constraint we can eliminate either of the variable in the o.f. E.g. rearranging to get x:
x = (204+8y) / (4-12y)
The unconstrained single variable o.f. then is:
P = x y = y (204+8y) / (4-12y)
The minimum can be found either by differentiating and setting equal to zero or by a direct search.
W/V = (P1/K) [(P2/P1)K - 2 + (P3/P2)K]
Here K=(k-1)/k where k is the ration of heat capacities, approximately 1.4.
For the case where P1 = 1 atm and P3 = 4 atm confirm the well known result that P2 should be 2 atm.
A requirement for the use of LP is that all cost functions are linear. This is often the case for operation, where the use of a resource is typically proportional to the time of use or throughput. In designing new plant the `7/10 power law' tends to apply to capital costs and so o.f.s are nonlinear and Lp cannot be used directly.
A company has two plants in which it can make a product. Plant A is an older plant on which production costs are £ 170/te and for the newer plant B they are £ 150/te.
Transport costs to the three main customers P, Q and R are significant and depend on their distances from the plants. These costs in £/te are as shown below.
Plant: A B Customer: P Q R P Q R Cost: 25 60 75 20 50 80
Plant A has a maximum production rate of 1.7 te/day and plant B of 0.75 te/day. Customers will take as much product as is offered (all pay the same price) but their minimum requirements, listed below, must always be satisfied.
Customer: P Q R Minimum: 0.9 0.7 0.4
Find the company's optimum production and distribution schedule.
Let the production for P, Q and R at A be x1 to x3 respectively, and the same at plant B be x4 to x6. The function to be minimised is then:
min: P = 170 (x1+x2+x3) + 150 (x4+x5+x6) + 25x1 + 60x2 + 75x3 + 20x4 + 50x5 + 80x6 ;Plant capacity imposes the following inequality constraints:
x1+x2+x3 <= 1.7 ; x4+x5+x6<= 0.75 ;The customers' minimum requirements give:
x1 + x3 >= 0.9 ; x2 + x4 >= 0.7 ; x3 + x6 >= 0.4 ;
Three products A, B and C might be manufactured using this capacity. the requirements in hours/batch of each are listed below.
Product: A B C Reactor 0.8 0.2 0.3 Crystalliser 0.4 0.3 - Centrifuge 0.2 - 0.1There is a sales limit equivalent to 20 batches/week for product C, but none for A or B.
The profit per batch is £20, £6 and £8 for A, B and C respectively. find the optimum weekly production schedule.
Letting the number of batches of A, B and C be x1, x2 and x2, the o.f. is:
max: 20 x1 + 6 x2 + 8 x3;Equipment constraints are then:
0.8 x1 + 0.2 x2 + 0.3 x3 <=20; 0.4 x1 + 0.3 x2 <= 10 ; 0.2 x1 + 0.1 x3 <= 5 ;And C production limit gives:
x3<=20 ;Strictly speaking the number of batches must be an integer so this really should be an MILP. However, when the numbers involved are quite large, it is usually possible to solve as an LP and round down.
Solving as an LP gives optimum as 13.75 batches of A, 15 of B and 20 of C for a profit of £525/week. Rounding A production up to 14 batches violates the first constraint, requiring 20.2 hours of reactor time per week. Rounding down to 13 batches gives a near-optimal o.f. of 510. A true MILP solution (section 5.4.3) gives a better o.f. of 524 with 14 batches of both A and B.
Note that this `solution' simply says that the equipment required will be available for the total number of hours required each week. it does not say what the operating schedule should be, nor indeed, since it will normally be necessary to have the equipment available at particular times and in a given sequence, whether a schedule is in fact feasible.
max: 20 x1 + 6 x2 + 8 x3; 0.8 x1 + 0.2 x2 + 0.3 x3 <=20; 0.4 x1 + 0.3 x2 <= 10 ; 0.2 x1 + 0.1 x3 <= 5 ; int x1, x2, x3 ;As noted earlier, this gives an optimum of 524 and 14, 14 and 20 batches of A, B and C respectively.
Suppose however we require to schedule on a daily rather than a weekly basis. The modified problem now has smaller values for the time constraints as shown in the model below. (The constraint of C production has also been removed.)
max: 20 x1 + 6 x2 + 8 x3; 0.8 x1 + 0.2 x2 + 0.3 x3 <=4; 0.4 x1 + 0.3 x2 <= 2 ; 0.2 x1 + 0.1 x3 <= 1 ;For the MILP formulation the additional line is also required:
int x1, x2, x3 ;The solution as an LP gives an optimum (for a day's production) of 111.111 with only B (6.667 batches) and C (8.889 batches) being made (i.e no A at all).
However, rounding neither up or down on either of these numbers gives the optimal integer solution which is no A, 5 batches of B and 10 of C, with an of of 110.
It can be seen that the assignment involves full utilisation of both reactor (total 4 hours) and centrifuge (total 1 hour). Suppose that the period of operation is a 4 hour slot, with the reactor continuously available, and the centrifuge available `on demand' at any time so long as its total utilisation does not exceed 1 hour.
The schedule will then be determined by the availability of the crystalliser. If this is for 4 half hour slots every hour starting after the first half hour, then it is clear that 5 batches of B cannot be produced. Some reactor time may also be lost from the need to synchronise the end of reaction with the start of crystallisation.
Fitting all these constraints together is another MILP problem where the o.f. may be either profit maximisation or maximum equipment utilisation. Setting up such problems is substantially harder than those so far described, and so will be regarded as beyond the scope of this course!
Empirically determined, the following schedule seems to be the only feasible
one, even assuming on demand availability of the centrifuge, which
greatly simplifies the problem:
C1 B1 C2 C3 C4 B2 C5 C6 C7 B3 C8 C9 B4 C10 (0.2 hours slack on both
reactor and crystalliser)