26
I R I S A I N S T IT U T D E R E C H E R C H E E N IN F O R M A TIQ U E E T S Y S T È M E S A LÉ A T OIR E S P U B L I C A T I O N I N T E R N E N o I R I S A CAMPUS UNIVERSITAIRE DE BEAULIEU - 35042 RENNES CEDEX - FRANCE ISSN 1166-8687 PI-1152 UNBOUNDED KNAPSACK PROBLEM: DYNAMIC PROGRAMMING REVISITED RUMEN ANDONOV AND VINCENT POIRRIEZ AND SANJAY RAJOPADHYE

static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

Embed Size (px)

Citation preview

Page 1: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

I R

I S

AIN

STIT

UT D

E R

EC

HER

CH

E E

N IN

FO

RM

ATIQUE E

T SYSTÈMES ALÉATOIRES

P U B L I C A T I O NI N T E R N ENo

I R I S ACAMPUS UNIVERSITAIRE DE BEAULIEU - 35042 RENNES CEDEX - FRANCEIS

SN

116

6-86

87

PI-1152

UNBOUNDED KNAPSACK PROBLEM: DYNAMIC

PROGRAMMING REVISITED

RUMEN ANDONOV AND VINCENT POIRRIEZ AND

SANJAY RAJOPADHYE

Page 2: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …
Page 3: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES ALEATOIRES

Campus de Beaulieu – 35042 Rennes Cedex – FranceTel. : (33) 02 99 84 71 00 – Fax : (33) 02 99 84 71 71

http://www.irisa.fr

Unbounded Knapsack Problem: Dynamic ProgrammingRevisitedRumen Andonov* and Vincent Poirriez** and Sanjay Rajopadhye***Th�eme 1 | R�eseaux et syst�emesProjet APIPublication interne n�PI-1152 | Octobre 1997 | 24 pagesAbstract: We present EDUK, an e�cient dynamic programming algorithm for the un-bounded knapsack problem (UKP), a classic NP-hard combinatorial optimization problem.It is based primarilty on a new and useful dominance relations between object types calledthreshold dominance. We demonstrate that threshold dominance is a strict generalizationof all the previously known dominance relations. We then show that combining it with asparse representation of the iteration domain and the periodicity property leads to a drasticreduction of the solution space. Numerous computational experiments with various data in-stances are presented to validate our ideas and demonstrate their e�ciency. We also compareEDUK with the available and widely used exact algorithm MTU2.Key-words: Integer Programming, Dominances, Dynamic Programming, periodiciy, com-inatorial optimization

(R�esum�e : tsvp)* [email protected]** [email protected]*** [email protected] National de la Recherche Scientifique Institut National de Recherche en Informatique

(UPRESSA 6074) Universite de Rennes 1 – Insa de Rennes et en Automatique – unite de recherche de Rennes

Page 4: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

Le probl�eme de sac �a dos non born�e : la programmationdynamique retour �a la programmation dynamiqueR�esum�e :On pr�esente EDUK, un algorithme e�cace de programmation dynamique pour le pro-bl�eme du sac �a dos non born�e. Il repose sur une nouvelle relation de dominances entre lestypes d'objets, appel�ee la dominance de seuil. On montre que cette dominance est une g�e-n�eralisation des di��erentes dominance connus dans la litt�erature. On montre ensuite que sacombinaison avec une repr�esentation creuse de l'espace d'it�eration et la propri�et�e de p�eriodi-cit�e donne une r�eduction consid�erable de l'espace de recherche de solution. De nombreusesexp�eriences avec divers jeux de donn�ees valident nos id�ees et montrent l'e�cacit�e d'EDUK.On compare aussi EDUK et MTU2 un algorithme connu dans la litt�erature.Mots cl�es : programmation en nombre entiers, dominance, programmation dynamique,p�eriodicit�e, optimisation combinatoire

Page 5: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

Unbounded Knapsack Problem: Dynamic Programming Revisited 31 IntroductionThe unbounded knapsack problem (UKP) is a classic NP-hard combinatorial optimizationproblem with a wide range of applications [6, 10, 12]. It may be formulated as follows: weare given a knapsack of capacity c, into which we may put m types of objects. Each objectof type i has a pro�t, pi, and a weight, wi, (wi, pi, m and c are all positive integers, andwe have an unbounded number of copies of each object type). Determine, for i = 1 : : :m,the number xi, of i-th type objects to be chosen so as to maximize the total pro�t withoutexceeding the capacity, i.e.,max ( mXi=1 pixi : mXi=1 wixi � c; xi � 0 integer; i = 1; 2; : : : ;m) (1)The two classic approaches for solving this problem exactly are branch and bound (B&B)[12] and dynamic programming (DP) [3, 6, 10], the subject of this paper. It takes O(mc)time, and works in two phases: in the forward phase, we calculate the optimal value of thepro�t function (by �lling up an m� c table), and then we use this in the backtracking phaseto determine an actual solution which has this optimal pro�t.In 1963, Gilmore and Gomory proposed the notion of dominance [7]. Intuitively, if anobject type has a larger (or no smaller) weight, and smaller (or no larger) pro�t than ano-ther, the former will never occur in an optimal solution (recall that we have an unboundednumber of copies). Martello and Toth introduce a more powerful dominance [12], basedon the fact that an object type can be dominated by multiple copies of another. In bothcases, a dominated object type can be discarded from consideration. Often, this is de-tected by preprocessing the inputs. Dudzinski observes that dominance is a partial order[5] and proposes e�cient preprocessing techniques. All previous work on dominance (wellsurveyed by Pisinger [13]) is in the context of branch and bound algorithms (or at leastas a preprocessing step for other algorithms). Exploiting dominance (also called coe�cientreduction) is very important. Empirical evidence shows that the number of non dominateditems is usually extremely small. For uniform distribution of the weights and pro�ts, thishas been analytically con�rmed [11], which actually raises questions about the suitability ofthe \uncorrelated" data sets [12].Another well known approach to reduce the search space is available for dynamic pro-gramming based algorithms. In 1966 Gilmore and Gomory gave a periodicity property [8]which allows signi�cant reduction of the capacity dimension of the search space.A third technique, again applicable for dynamic programming was recently presentedby Andonov and Rajopadhye [2]. It is based on the fact that the standard recurrence ismonotonic. This allows a \sparse" representation of the computation. A similar propertyalso holds for the 0/1 knapsack problem, and an algorithm using a sparse representation(LIFO lists) was proposed at least twenty years ago [9], but this had not previously beenused for the unbounded knapsack problem.PI n�PI-1152

Page 6: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

4 Rumen Andonov and Vincent Poirriez and Sanjay RajopadhyeIn this paper, we use the notion of collective dominance, where an object type isdominated by (a subset of) the other object types, i.e., there is a linear combination of theseobject types which collectively has a larger pro�t and lower weight. Then too, the formercan be discarded. This observation was also made independently by Pisinger [13]. However,detecting such a dominance necessitates the solution of a knapsack (sub) problem, and it isunacceptable as a preprocessing step.Our �rst key idea is that collective dominance could be naturally exploited in a dyna-mic programming algorithm because the solutions of all knapsack sub problems of smallercapacity (and the same set of object types) are available \for free".Our second key idea is based on our empirical observation that even among objecttypes not dominated collectively by the others, many contribute very few times in theoptimal solution. Note that every such object type contributes at least once, namely forthe (sub) problem with capacity equal to its weight. This leads to the notion of thresholddominance, which occurs when a linear combination of the other object types collectivelyhas a larger pro�t and lower weight than a �nite number, say �, instances of the object type.Hence no optimal solution will have more than � copies of this object type. As a result,we can discard this object type from consideration beyond a threshold capacity, namely �times its weight.Our goal is to exploit all these properties (collective and threshold dominance, sparserepresentation, periodicity, and e�cient backtracking). Our main contributions are as fol-lows.� We introduce and formalize the notions of collective dominance and thresholddominance. We present a complete classi�cation and show that threshold dominancestrictly includes collective dominance, which is a strict generalization of the previouslystudied versions of dominance. We show that collective dominance is maximal, interms of coe�cient reduction.� We also show that threshold dominance enables easy testing for periodicity, in thesense that periodicity is attained if and only if the cardinality of the set of objecttypes not dominated beyond a given threshold is 1.� We design a dynamic programming algorithm which incorporates these properties intoa slice wise evaluation of a well known (but not commonly used) recurrence for theforward phase. This eliminates preprocessing, which is incorporated into the mainalgorithm itself. Furthermore, we adapt the sparse representation and algorithm [2]to this recurrence.� We present a backtracking algorithm with the same performance Hu's well know ideaof computing an auxiliary recurrence [10], but without any auxiliary information. Inaddition, our technique enables us to determine all solutions.Irisa

Page 7: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

Unbounded Knapsack Problem: Dynamic Programming Revisited 5� We validate our algorithm by numerous computational experiments with a number ofdata sets:{ the standard random (uncorrelated, and weakly and strongly correlated) datasets used in the literature [12, 5];{ similar sets but with signi�cantly larger weight interval (in order to preclude the\false" advantages due to simple dominance);{ realistically random data sets, in which no object type is simply dominated by(i.e., has higher weight and smaller pro�t than) another, since such a situation isalmost inconceivable in real life;{ \hard instances" generated either by analytical formul� or randomly by �lteringthe runs of hundreds of examples;{ data drawn from real life cutting stock problems.We had excellent performance (even for very large instances of the UKP). We arestill searching for problems that are di�cult for our algorithm, whether arti�ciallygenerated (some of which are illustrated here) or from real data.Our implementation, �rst described in [1], preserves two useful properties of dynamicprogramming: (i) knowledge of the solution for any capacity lower than the given capacity;and (ii) ability of reuse the known solutions for solving larger capacity problems. Theremainder of this paper is organized as follows. In Section 2 we develop collective andthreshold dominance, give their properties and their relation with periodicity. Section 3describes how our algorithm by slices incorporates all the above features and also presentsthe backtracking algorithm. Section 4 presents our experimental results, and we concludein Section 5.Notation: Z+ denotes the set of non negative integers. We use the m-vectors, ~w and ~p,respectively, to denote the weights and pro�ts for a given problem instance. An object typecan be identi�ed either by its index, i, or by the ordered pair (wi; pi). M = f1; : : : ;mgdenotes the set of all object types. Any m-vector ~x, in Zm+ , may be viewed as a candidatesolution, whose i-th element, xi speci�es the number of chosen instances of the i-th objecttype. For any candidate solution ~x, W (~x) = Xi2M xiwi and P (~x) = Xi2M xipi are called itsweight and pro�t. We denote by UKPSj a knapsack (sub)-problem with capacity j andwhose object types are restricted to S �M ; the original problem (1) is thus UKPMc . F (j; S)is the optimal pro�t that can be achieved for knapsack UKPSj .De�nition 1 We associate to each candidate solution, ~x, a solution point, namely thepair, (s; f) = (W (~x); P (~x)). We say that ~x is a feasible solution for UKPSc if 8i 2 MnS,xi = 0 (the operator n represents set di�erence), and if its weight, W (~x) is no greater thanthe knapsack capacity, c.PI n�PI-1152

Page 8: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

6 Rumen Andonov and Vincent Poirriez and Sanjay Rajopadhye

i j

Profit

W W

i

j

P

P

I J

Capacity

Profit

(2 x i) (3x i) j CapacityWi

iP

W

jP

I

I

I

JSimple dominance j �s i multiple dominance j �m iFigure 1: Illustration of the two pairwise dominance relations: simple and multiple. Anobject type is shown as a triangle of width w and height p.De�nition 2 Any feasible solution ~x for UKPSc such that P (~x) = F (c; S) is said to bean optimal solution for UKPSc . Opt(UKPSc ) denotes the set of the optimal solutions ofUKPSc .De�nition 3 The pro�tability of an object type is the ratio of its pro�t to weight, piwi .Two object types with equal pro�tability are said to be equipro�table, i � j.In certain algorithms, we may sort the object types by order of pro�tability, i.e., thetotal order relation in M2 de�ned as j v k i� either pjwj < pkwk or ( pjwj = pkwk and wj � wk).This order is often used in the literature, typically in a preprocessing phase either on all theobject types [6, 12] or on pre-selected core problems [5, 12].Let b be the index of the best object type (the one with greatest pro�tability).2 Dominance RelationsWe now de�ne the di�erent dominance relations and describe their properties and inter-actions. We �rst introduce dominance between solution points (actually, any two pairs ofintegers), and then describe dominances between object types. Next, we give the strict in-clusion relations between them, and �nally describe how periodicity can be detected usingthreshold dominance.De�nition 4 A pair, (s0; f 0) dominates another, (s; f), denoted as (s; f)�(s0; f 0), i� s � s0and f � f 0.i is simply dominated by j, written as i�s j, i� (wi; pi)� (wj ; pj).i is multiply (simply) dominated by j, written as i�m j, i� jwiwj k � pipjThese two relations (called pairwise dominances) are illustrated in Fig 1. It is easy toverify that � is a partial order (as noted by Dudzinski for the�m relation [5]). Furthermore,Irisa

Page 9: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

Unbounded Knapsack Problem: Dynamic Programming Revisited 7

(2 x i) (i+l) kiWW

lW

LK

II

I

i

P

P

l

Pk

Profit

CapacityFigure 2: Collective (maximal) dominance: k � fi; lgif, for some ~x and ~x0, feasible solutions of respectively UKPTc and UKPT 0c0 , their respectivesolution points satisfy (s; f)� (s0; f 0), then either ~x cannot be an optimal solution, for anycapacity, and for any subproblem which includes at least T [ T 0, or f = f 0 and so ~x0 is alsooptimal.De�nition 5 Let J be a set of object types (i.e., J �M) and i 62 J . The i-th object typeis collectively dominated by J , written as i � J i� 9~x 2 Zm+ j (Pj2J xjwj � wi, andPj2J xjpj � pi). In other words, a positive linear combination of the objects in J yields a\better" (or at least, no worse) solution than i. = fi 2M j i 6�Mnfigg is the set of object types that are not collectively dominated.wmax is the weight of the heaviest object type in .Collective dominance (illustrated in Fig. 2) is maximal in the following sense: if i� J ,the i-th object type can be discarded from consideration without changing the optimalvalue for any capacity, c. Conversely, whenever an object type i is in , it contributes tothe optimal solution for some capacity (in particular, for c = wi).Finally, we observed empirically that, many object types contribute relatively few timesto the optimal solution (for any capacity). The explanation lies in the fact that for eachof these object types, there is a threshold, beyond which it can be substituted by a linearcombination of other objects. This led us to the following de�nition.De�nition 6 Let J �M , i 62 J; t > 0 and consider t0 = �wi = �b twi c+ 1�wi, the smallestmultiple of wi strictly greater than t. We say that the i-th object type is dominated abovethe threshold t by J , written as i�t J i� 9~x 2 Zm+ jXj2J xjwj � t0, and Xj2J xjpj � �pi.Note that if j � i there exists a t (namely l � 1 where l is the lcm of wi and wj), suchthat i �t fjg and j �t fig. To avoid ambiguity, we consider that only the object withthe larger weight is dominated above t by the other. Threshold dominance is illustrated inFig. 3.PI n�PI-1152

Page 10: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

8 Rumen Andonov and Vincent Poirriez and Sanjay RajopadhyekW hj

<<j

hjjW

jj

Capacity Capacityi WjW

W

W

W

ii

jj

k

kp

p

p

W

p

jj

ip

Wi

ProfitProfit

<<j {i}{i;k}

iW

hhFigure 3: Illustration of threshold dominance. (on the left:) object type j is dominated byfi; kg above 2wj , and (on the right:) a particular case, when two object types have the sameinterest (right); here, j �3wj fig.Observe that if i �t J , then for any positive integer, k, i �t+k J and for t0 = b twi cwi,i �t0 J . Hence, we may de�ne the threshold hi of the i-th object type as the smallestmultiple of its weight such that i �hi Mnfig if such an hi exists. Formally, let Hi = ft ji�t Mnfigg. Then, hi is the minimum of Hi (if Hi is empty, hi =1).Proposition 1 If i�t J , for any c > t, there exists ~x0 2 Opt(UKPMc ) such that x0iwi < t.Proof: Recall that t0 = �wi = �b twi c+ 1�wi is the smallest multiple of wi strictlygreater than t. By de�nition of threshold dominance, 9~z : Xj2J zjwj � �wi andXj2J zjpj � �pi. Let ~x be a solution vector in Opt(UKPMc ). If xi < � then ~x isthe desired solution. Otherwise let xi = k� + r for positive integers k; r such that0 � r < �. We construct the desired optimal solution by \replacing" the k� copiesof the object type, i by k copies (of the part corresponding to the object types in J)of ~z. Thus the vector ~x0 given byx0j = 8<: x0j = xj if j 2MnfignJx0j = xj + kzi if j 2 Jx0j = r if j = iis the desired optimal solution.Hence if i�t J the i-th object may be discarded beyond a capacity of t. The thresholdhi of an object type is thus the largest multiple, �, of its weight, for which the optimalsolution ~x of UKPMhi ) is unique and given by xk = 0; k 6= i and xi = �. This gives asu�cient condition to detect the threshold.Proposition 2 If y mod wi = 0 and 9~x 2 Opt(UKPMy ) such that xi < y=wi then hi < y.Proof: A feasible solution for UKPMy consists of just y=wi copies of i, and has thepro�t ypi=wi. Since ~x 2 Opt(UKPMy ), we have xipi + Xj2Mnfig xjpj � ypi=wi and

Irisa

Page 11: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

Unbounded Knapsack Problem: Dynamic Programming Revisited 9Xj2Mnfig xjwj � y � xiwi � y. Therefore the object type i is dominated above t =y � xiwi and hence hi < y.We now present some useful properties the dominances and as well as some relationshipsbetween them.1. i is collectively dominated by fjg i� i 6= j and i is multiply dominated by j.2. i is collectively dominated by the set J i� i is threshold dominated above 0 by J .3. The object type with the smallest weight is never collectively dominated.4. There is no threshold beyond which the best object type is threshold dominated.5. If i is threshold dominated above some t by a set J then there exists at least one objecttype j 2 J such that i v j.The above properties lead to the following result.Theorem 1 The four dominance relations are in strictly increasing order of generality.Note that the cost required to check for a dominance relation is increasing from �s to�t. In fact, � and �t require the solution of knapsack sub-problems.2.1 PeriodicityWe now describe the relation between threshold dominance and periodicity, a well knownproperty of the UKP [6, 8]. Periodicity states that beyond a capacity, y, only the best objecttype contributes again to the solution. Hence, knowing the solutions for each capacity belowy is su�cient to solve the problem for any capacity. Formally it can be formulated as follows:9y such that 8c � y; F (c;M) = d(c� y)=wbepb + F (c� d(c� y)=wbewb;M) (2)We de�ne the period level, y? as the smallest y satisfying (2). The following result relatesthe periodicity property with the threshold dominance and gives a simple test to check ifthe period level is reached. For any capacity y, we de�ne U(y) = fj 2M j j 6�y Mg as theset of object types that are not dominated above a threshold y.Theorem 2 The set U(y) is a singleton, i� the capacity y satis�es Eqn. 2.Proof:) Obvious.( Let k be an object type di�erent from the best object type. We will show thatk �y Mnfkg.Let t0 be the smallest multiple of wk such that t0 > y. According to Eqn. (2) thereexists ~x 2 Opt(UKPMt0 ) such that xb > 0 which implies xk < y=wk. Hence fromproposition 2 we obtain hk < y. Therefore only the best object type is not dominatedabove y.Corollary 1 y? = minfy such that j U(y) j= 1gPI n�PI-1152

Page 12: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

10 Rumen Andonov and Vincent Poirriez and Sanjay Rajopadhye3 An E�cient ImplementationWe now describe how the above properties are combined into an e�cient algorithm, EDUK(E�cient Dynamic programming for the Unbounded Knapsack problem). The basic recur-rence for the forward phase is given below. It is a standard, though not common recurrence,and can be easily derived from the principle of optimality. F (c;M) = g(c;m) where,g(j; k) =8><>: j < 0 and k = m ) �1j = 0 or k = 0 ) 0j > 0 ) max(g(j; k � 1); g(j � wk ;m) + pk) (3)This is viewed as an m� c table, having m columns, one for each object type. The columnshave height c and are \�lled from top to bottom" (origin at the top-left). If and when wedetect that an object type is (collectively) dominated by the other remaining object types,its column is deleted, thus achieving coe�cient reduction. If and when we detect that thecapacity has exceeded the threshold beyond which an object type does not contribute anymore, the remainder of its column is deleted, thus achieving capacity reduction. And �nally,we also exploit sparsity : in the (sub) columns where the recurrence is to be evaluated, wedo not evaluate it for all values of j. Since the function g(j; k) is monotonically increasingin j, we need to compute it only when it changes value (at the so called critical points).Our challenges are to detect dominance (collective as well as threshold) as (i) cheaplyand (ii) early as possible, and (iii) to integrate it with the sparse algorithm, i.e., the datarequired for this detection must be available only by inspecting the results at critical points.The recurrence is evaluated by horizontal slices. This can be done easily if the lastcolumn of the table is retained up to the start of the slice (say capacity c0), and the boundaryconditions of Eqn. 3 are modi�ed appropriately. The height of each slice could be constant (aparameter of the algorithm) or variable. For each slice, EDUK maintains g0 = F (c0;M), thevalue of Opt(UKPMc0 ) and the index of the current best object, b0. The main data structuresare three lists:� R, a list of residual object types. It is initialized to M , and (at the start of anyslice) contains the set of object types whose weights are larger than c0 (hence we don'tyet know whether they are collectively dominated or not), and which have not beendiscarded by some \easy" tests (described later).� I, a list of size p, a parameter, which contains the p object types introduced duringthis slice (i.e., instead of constant-height slices, we have slices of variable height, butwith exactly p new object types introduced in each slice).� U , the list of \currently non-dominated" object types (at the start of each slice).These are object types that (i) have weights smaller than c0, (ii) are not collectivelydominated by the other object types, and whose (iii) thresholds are not less than c0.Irisa

Page 13: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

Unbounded Knapsack Problem: Dynamic Programming Revisited 11Evaluation of a slice: The computation performed in evaluating a slice consists of thefollowing.1. Pre-processing: Make a single pass through R in order to determine the p lightestobject types to introduce in the slice. The list I is maintained sorted by increasingweight. For each element from R, EDUK perform two simple tests: test if it is collec-tively dominated by g0, and test whether it is multiply dominated by the best objecttype so far. It is a tradeo� whether this test, which involves a couple of arithmeticoperations should be done here for all elements of R, or in the post-processing stepfor only a few object types. If so it is immediately discarded. If it passes these tests,insert it in I, and while doing so, check that it is not simply dominated by the objecttypes already there. Then too it is discarded (these are the \easy tests" mentionedabove).2. Main: For each i in I, resolve the successive knapsack sub-problems of capacity wi,using only object types in U . This is done by evaluating the recurrence g(j; k), usingthe sparse algorithm (see Sec. 3.1). At the end of each such evaluation, check if theobject type i is collectively dominated by the others. If so, it is discarded, otherwiseit is appended at the end of U .3. Post-processing Update the current best object type (using those elements of Ithat were added to U in step 2 above. Also update the current optimal value,(wp; F (wp;M)). Finally, for all object types in U , test to see if their threshold haspassed, and if so, remove them from U . If U is sorted in decreasing order of pro�tabi-lity, it is possible to detect threshold dominance, at the earliest possible moment, andthis may lead to some savings (this implementation tradeo� is not discussed here).Standard Phase The slice-wise evaluation is performed repeatedly until R becomesempty. This concludes the reduction phase. Assuming that the capacity c and the periodlevel have not yet been reached (otherwise we are done), we move on to the standard phase.All that remains is to continue to evaluate the recurrence, and this done with constant-heightslices (but without the preprocessing above). As before, we detect threshold dominance atthe end of each slice, and thus U may continue to decrease. Each time this occurs we testto see if U is a singleton. When this happens, we know that we have attained periodicity,and the solution is computed with the closed form formula of Eqn. 2, and we are done!3.1 A sparse representationYet another method of search space reduction is the so called sparse representation [2]. Acomplete discussion is beyond the scope of this paper, and we simply present an overview ofthe method here. It is based on the key observation that the function g(j; k) is monotonicallyPI n�PI-1152

Page 14: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

12 Rumen Andonov and Vincent Poirriez and Sanjay Rajopadhyeincreasing in j. As a result, there is no need to compute the recurrence for all values of j(i.e., points in a column). It su�ces to calculate it at only those j indices where its valuechanges. If this is done, the value of g(j; k) over the entire k-th column can be representedas a sequence of pairs (s; f), similar to the representation used to store and manipulatesparse matrices (hence the name). The sparse representation, which is fully exploited inour algorithm requires lazy data structures known in functional languages as lazy lists,(or streams). It is for this reason that we have chosen to implement our algorithm in aprogramming language in the functional style (although an implementation in C, Fortranor a more conventional language could arguably have been more e�cient, we found that theprogram development time was considerably reduced).3.2 Backtracking phaseIn our entire discussion so far, we have not mentioned the backtracking phase of the dynamicprogramming algorithm, which consists of determining the actual solution, ~x, once therecurrence g(j; k) has been evaluated. A common criticism of DP is that like its worst caserunning time, its space complexity is claimed to be �(mc), i.e., the entire table has to besaved in order to compute the solution. Often overlooked is the fact that, for the unboundedknapsack problem, we can compute an auxiliary recurrence during the forward phase, andthen, backtracking can be performed by traversing only the last column of the table. Thiswas initially presented in Hu's text in 1969 [10]. Another algorithm was presented byGar�nkel and Nemhauser [6]. Both algorithms use an auxiliary function during the forwardphase and thus two lists of size c are kept in the memory. We now present an algorithmwhich achieves the same space and time complexity as these, but with no overhead in theforward phase. In addition, our technique enables us to determine all solutions, unlike theothers.We start with recursions and notations similar to these of Gar�nkel and Nemhauser [6],but we use them in a slightly di�erent manner and also take into account the notion ofdominance.Proposition 3 Let D(y) = fk : xk > 0 in some ~x 2 Opt(UKPMy )g. Then when F (y;M) 6=0 and for some k 2M such that y � wk � 0 we haveF (y;M)� pk = F (y � wk;M), k 2 D(y) (4)Recall that = fj 2 M j j 6� Mnfjgg and let us initialize S := . We then have thefollowing algorithm where the vector ~x is initialized to ~0.while S <> empty and y > 0 dolet k in Sif y-w_k < 0 then S:= S-{k}else if F(y-w_k,M) = F(y,M)-p_k then (x_k := x_k + 1; y := y - w_k)else S:= S-{k} Irisa

Page 15: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

Unbounded Knapsack Problem: Dynamic Programming Revisited 13Clearly, the algorithm is correct if were replaced by M , since it follows directly fromEqn. (4). Because collective dominance is maximal, no solution is lost when using . Mo-reover is known at the end of the forward phase and is often very small. Furthermore,the choice of k in the second line above allows us to systematically determine all solutions.A useful heuristic for rapidly obtaining one solution is to follow the order of the decreasingpro�tability of object types (i.e., the best object type to be considered �rst). Finally, thealgorithm above can be directly adapted to exploit sparsity, and this is implemented in theworking version of our algorithm.4 Experimental ResultsWe now evaluate the performance of the EDUK algorithm. All experiments were run on a64 bit DEC/Alpha 2100 5/250 machine. Our code was written in objective caml1. Forcomparison purposes, we also give the results of the running time (on the same machine)with the standard branch and bound algorithm, MTU2 due to Martello and Toth [12],written in fortran 77, available in the public domain. It is recognized as the de factostandard program for the UKP. Although the two are implemented di�erently, our resultsnevertheless serve to indicate the trends. Moreover, the EDUK was developed using arigorous program development approach, where the transformations on the program werebased on formal properties of the recurrences involved, and as a result the �nal ocamlprogram was very close to the recurrences. We expect that if the algorithm was recodedin fortran 77, the performance would improve by a small constant factor, since we expectthat the implementation of arrays is better optimized in fortran 77.Typically, three kinds of data sets have been used in the literature for comparing theperformance of knapsack algorithms [12]:� Random data sets, where weights and pro�ts belong to a uniform random distribution(URD) within a speci�ed range. The bulk of our results are based on such data sets.Our goal is to separately highlight the performance gains due to each of the propertieswe have presented. Our experiments highlight some limitations in current methods ofgenerating random data sets for the UKP, and we also present results with what wecall \realistic random" data sets.� Pathological examples which exhibit worst case behavior. Since the knapsack problemis NP-complete, it is fairly easy to contrive such problem instances, for almost any givenalgorithm, strategy or heuristic (for example, Chung et al. [4] give hard instances forthe branch and bound algorithm). We give formul� to generate a family of problems1ocaml is a language in the ML family. It was developed at INRIA, Rocquencourt, France. Documentationmay be found at: http://pauillac.inria.fr/camlPI n�PI-1152

Page 16: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

14 Rumen Andonov and Vincent Poirriez and Sanjay Rajopadhyefor which the EDUK algorithm has high running times, and try to identify which ofthe properties contribute to an improvement.� Real life examples drawn from actual problem instances. We have posted a requestfor data sets on the sci.op-research newsgroup, and have also contacted researchersin the domain. So far we have one class of such data, which we present here. Thecode will be shortly made publicly available at our ftp site, and we welcome additionalfeedback.4.1 Random ExamplesAs described by Martello and Toth [12], one can generate three kinds of uniformly randomdata sets. In the uncorrelated case, the weights and pro�ts are both chosen randomly,independent of each other. In the strongly correlated case, the weights are chosen randomly,but the pro�t is a linear (actually a�ne) function of the weight (i.e., pi = awi + b for somepredetermined constants, a and b). In weakly correlated data sets the wi's are random, andpi is chosen randomly in the interval awi� b. The interval for weights used by Martello andToth is [10; 103] and for weakly and strongly correlated data sets, the constants are de�nedas a = 1 and b = 100.Johnson and Khan [11] proved that for uncorrelated data sets, there are very few ob-ject types that are not multiply dominated, and that the expected value of the number of(multiply) non-dominated object types is as low as 1.6! An intuitive explanation is given inFig 4. As a result, algorithms that exploit (at least) multiple dominance such as MTU2 andEDUK are bound to have excellent performance on such data sets, which are not thereforetruly representative.Another interesting point concerns the strongly correlated data sets. Since the pro�t ofan object type is a function of its weight, the number of distinct object types is simply therange of possible weights. For the interval [10; 103], none of the corresponding 990 objecttypes is (simply) dominated by any other, whereas the 10 smallest object types are su�cientto (multiply) dominate all the other object types (it is easy to check that the object type ofweight 10 (multiply) dominates any object type with weight greater or equal than 20). Now,if m � 1000, all distinct object types will be chosen with high probability, and hence, thenumber of non-dominated object types (for both the � and �s relations) will be constant(resp. 10 and 990) independent of m. To avoid this e�ect, the range of possible weightsshould be reasonably large compared to m.In spite of the above limitations, such data sets are widely used in the literature, and arealmost de rigeur in comparing various algorithms for the knapsack and related problem(s).For this reason, we �rst present a series of experiments using these data sets, but with ane�ort to reduce the e�ects of duplicates. We then present experiments illustrating the beha-Irisa

Page 17: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

Unbounded Knapsack Problem: Dynamic Programming Revisited 15

wwmaxwminpminpmaxpA randomly selected object type (weight-pro�tpair) may be viewed as a point in the rectanglede�ned by [wmin; pmin], [wmin; pmax], [wmax; pmin]and [wmax; pmax] as shown. Now, if any other ob-ject type is chosen randomly, and happens to liein the shaded region, it either dominates or is do-minated by the �rst point.Thus, there is high probability that uncorrelateddata sets contain many object types that are (sim-ply) dominated by some other, especially if thenumber of object types is reasonably large.Figure 4: Why completely uncorrelated data sets are unrealistic for the UKPvior of both algorithms as the capacity is increased, and �nally we de�ne our realisticallyrandom data sets, and describe the performance on such examples.In designing our experiments, our goal was to show a progression of increasing runningtimes and compare the performance of the two algorithms, and also to reduce the impactof duplicates. Now, the running time is very sensitive to the number of (multiply) nondominated object types rather than m (the explanation is the one described above: a largenumber of object types are dominated in uncorrelated data sets, and thus tend to be prunedout early by any reasonably smart algorithm). Pisinger recently observed [13] that thenumber of non dominated object types is very sensitive to wmin: the larger the value ofwmin, the greater the number of non-dominated object types.Based on these observations, we chose to �x m at a reasonably large value and allowedwmin to vary. In particular, we used the following parameters: m = 105, wmax = 105,and wi 2 [wmin; wmax], with wmin varying between 1 and 104. The capacity was �xed to alarge value greater than 2� 108. We generated the three standard data sets (uncorrelated,weakly and strongly correlated). Fig. 5 summarizes our results, showing the running timesof EDUK and MTU2. The column UD gives the number of (collectively) non-dominatedobject types. Note that the number of multiply non-dominated object types determined byMTU2 is not available, but is no larger than this value. For the uncorrelated data sets, piwas chosen randomly in the interval [pmin; pmax], for the weakly correlated data sets, it waschosen randomly in the interval [wi � 100; wi +100], and for the strongly correlated case, itwas set to wi+100. The random number generator we used was the ocaml module random.As noted above, EDUK is implemented in ocaml and MTU2 in fortran 77.As can be seen, EDUK consistently outperforms MTU2, except possibly for the earlypart of the uncorrelated data sets. However, in this case (i) both algorithms are fast, (ii)EDUK starts having better performance as the number of non dominated object types startsto increase, and (iii) we have already argued that such data sets are not realistic. Beforepresenting our results with more realistic data sets, we �rst discuss how the running timedepends on the capacity.PI n�PI-1152

Page 18: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

16 Rumen Andonov and Vincent Poirriez and Sanjay Rajopadhyewmin UD eduk mtu21 1 0.150 0.0623 1 0.151 0.06210 2 0.153 0.06430 3 0.150 0.061100 4 0.156 0.060300 4 0.155 0.0611000 6 0.165 0.1163000 8 0.180 0.19310000 9 5.638 21.03015000 7 0.425 6.42620000 17 0.466 10.206 wmin UD eduk mtu21 1 0.161 0.0653 1 0.171 0.13510 3 0.165 0.07430 5 0.161 0.056100 10 0.151 0.059300 30 0.183 0.0561000 87 0.281 0.1183000 260 1.293 1.1915000 471 2.834 1 > 20007000 1084 5.123 2 > 20009000 1890 8.668 4 > 200010000 2776 10.960 5 > 2000wmin UD eduk mtu21 1 0.150 0.0533 2 0.158 0.05310 6 0.158 0.05930 18 0.161 0.058100 64 0.168 0.062300 187 0.313 10 > 20001000 645 2.638 7 > 20003000 1932 32.315 8 > 20005000 6026 108.516 10 > 20007000 10642 242.971 10 > 20009000 16639 444.568 10 > 200010000 23355 564.393 7 > 2000

0

5

10

15

20

25

5000 10000 15000 20000

seco

nds

wmin

EDUKMTU2

0

10

20

30

40

50

60

70

80

90

100

5000 10000

seco

nds

wmin

EDUKMTU2

0

200

400

600

800

1000

1200

1400

1600

1800

2000

5000 10000

sec

onds

wmin

EDUKMTU2

Running time and the number of (collectively) non dominated object types vs wmin, for m = 105 foruncorrelated (top), weakly correlated (middle) and strongly correlated (bottom) data sets. Each tableentry (and point on the graphs) is the average of 10 runs (truncated for the UD column). The entryn > 2000 indicates that n out of the 10 problem instances took more than 2000 seconds (timed out). Thishappens only for mtu2, and we generously assume that the running time is 2000 seconds in this case.Figure 5: Comparison of eduk and mtu2 for the standard random experiments.4.2 Sensitivity of Running time to the CapacityDynamic programming is often rejected out of hand for very large capacities. Hence it isimportant to study how the capacity a�ects the running time. We conducted the followingIrisa

Page 19: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

Unbounded Knapsack Problem: Dynamic Programming Revisited 17av eduk min eduk max eduk av mtu2 min mtu2 max mtu20.088 0.083 0.100 0.074 0.039 0.1220.083 0.066 0.116 0.049 0.038 0.1330.086 0.066 0.100 0.047 0.037 0.0560.091 0.083 0.100 0.051 0.040 0.1220.105 0.083 0.116 0.054 0.040 0.1400.094 0.066 0.116 0.048 0.039 0.0860.094 0.083 0.116 0.082 0.040 0.1780.082 0.066 0.100 0.068 0.040 0.1260.094 0.083 0.133 0.093 0.040 0.2260.095 0.083 0.116 0.066 0.040 0.1020.091 0.066 0.116 0.063 0.037 0.226av eduk min eduk max eduk av mtu2 min mtu2 max mtu21.203 1.133 1.283 0.211 0.048 10.9741.107 1.016 1.416 0.869 0.048 4.3550.905 0.883 0.933 0.668 0.049 2.3621.109 1.083 1.133 0.517 0.047 1.7951.124 1.083 1.166 0.092 0.049 2.1541.157 1.116 1.183 0.271 0.048 9.8361.083 1.050 1.116 0.105 0.048 3.5371.374 1.216 1.533 3.184 0.048 115.9921.141 1.100 1.166 0.337 0.049 20.9491.221 1.100 1.483 1.348 0.049 9.6701.143 0.883 1.533 0.760 0.047 115.992Average, maximum and minimum running times (in seconds) of EDUK and MTU2, for uncorrelated(above) and weakly correlated (below) data sets. Each line corresponds to 500 instances (each for arandom capacity in the interval [105; 2:6� 106]) for a given set of weights and pro�ts chosen randomlywith the following parameters: m = 5 � 104, wmax = 105 and wmin = 5 � 103. The last line in eachtable is the average of the columns.Figure 6: Capacity sensitivity of EDUK and MTU2experiment: we set wmin = 5000, wmax = 105 and m = 50000. For each set of values ofweights and pro�ts, we measured the running time for 500 values of capacity, randomlychosen in the interval [105; 2:6� 106]. Fig. 6 presents the average, minimum and maximumof these values for the two algorithms (since MTU2 very often requires more than 1000seconds for strongly correlated data sets with these parameters, we report here only resultsfor non-correlated and weakly correlated data sets).These �gures illustrate that the time required by EDUK is stable, the average time is atthe middle of the interval and the maximal time is two times the minimal one. On the sameexamples the maximal time required by MTU2 can be 2500 times its minimal time.In order to try to illustrate the behavior more precisely, we conducted 1000 similarruns for a particular (weakly correlated) data set from the above experiment, but chosethe capacity randomly in the range [105; 119400]. Fig 7 reports the running time of bothalgorithms for each value of the capacity. We observe that the running time of MTU2 isPI n�PI-1152

Page 20: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

18 Rumen Andonov and Vincent Poirriez and Sanjay Rajopadhye

0

10

20

30

40

50

60

70

80

90

100000 105000 110000 115000

sec

onds

cap

MTU2EDUK

Running times for 1000 runs of EDUK and MTU2, for a particular instance of uncorrelated (?) dataset, each for a random capacity in the interval [105; 119400]. The average (max/min) times for EDUKand MTU2 were respectively, 1.119 (1.08/1.25) seconds, and 6.731 (0.028/82.184) seconds.Figure 7: Capacity Sensitivity of EDUK and MTU2 (cont'd)nearly periodic with a period equal to the weight of the best object type (here, about 5000)while that of EDUK is constant.We conclude that EDUK is more stable than MTU2 to variations in the capacity. Thisis due to the way in which EDUK exploits threshold dominance dominance (and henceperiodicity), and also the sparse representation.4.2.1 Realistic Random Data SetsIn real life, it is almost inconceivable that a heavier object has smaller pro�t (i.e., therecannot be any simple dominance). Realistically random data sets are simply those thatpreclude simple dominance by �at. They may be generated as follows: �rst generate mdistinct values of weights, randomly in the interval [wmin; wmax], and m values of pro�tsrandomly in the interval [pmin; pmax]. Then (i) sort these two sequences, (ii) pair up theweights and pro�ts, and (iii) permute this list randomly (or return it to the original order inwhich (one of) the weights (or pro�ts) were generated). This yields a realistic uncorrelated22Actually, the term uncorrelated is a misnomer, since there is a correlation (though not the linear/a�neone of the strongly correlated data sets) between the weights and pro�ts: a larger weight implies a greaterpro�t.Irisa

Page 21: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

Unbounded Knapsack Problem: Dynamic Programming Revisited 19wmin UD eduk mtu21 6 0.098 0.0213 69 0.873 1 > 100010 460� 1 > 1000 3 > 100030 1780 111.075 3 > 1000100 7581� 4 > 1000 9 > 1000300 27724� 10 > 1000 9 > 10001000 �� 10 > 1000 10 > 10003000 �� 10 > 1000 10 > 1000Realistic data sets are those where there is no simple dominance: larger weight implies a larger pro�t.The parameters were: m = 5 � 104, wmax = 105, wi 2 [wmin; wmax], pi 2 [wi � 100; wi + 100] andc > 2 � 108. Both EDUK and MTU2 have fairly poor performance on such data sets. Each line is theaverage of 10 runs on di�erent instances, except for the lines containing an asterix, where we were unableto solve (some instances of) the problem within 3000 sec. In such cases, the number in the UD columnis the average of the available data.Figure 8: EDUK and MTU2 running times on realistic random data setsdata set, and although weakly correlated data sets could also be generated, they do not seemvery interesting. Of course, strongly correlated data sets already preclude simple dominance(provided duplicates are ruled out).These results demonstrate that for this family of problems, the UKP is a di�cult problemfor both EDUK and MTU2 although EDUK remains faster. This is due to the fact thathere, the number of undominated object types (for�) grows rapidly with wmin and then the\real" size to be considered, which depends of this number, is actually large. We conjecturethat interesting real life problems fall in this family.4.3 Hard ProblemsSince the UKP problem is known to be NP-complete, it is possible to contrive examples ofproblems that force any given algorithm into exponential running time. Here, we exhibittwo kinds of such hard problems. We found the �rst one by chance when generating randomexamples. Problems of the second kind are built with formul� that ensure that no objecttype is collectively dominated by the others.4.3.1 A random exampleAs seen above, the performance of both EDUK and MTU2 appears to be poor for the realisticrandom data sets, and one of our hard example arose in the context of such experiments.It was generated with m = 105, wmin = 30, wmax = 105, pmax = 106 c = 106. It turnedout that for this example, there were 22,077 (collectively) non dominated items, and thethreshold of the best object type was greater than the capacity (which was 106). EDUKtook 3,430 seconds to solve it, while MTU2 failed to solve it in 20,000 seconds.PI n�PI-1152

Page 22: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

20 Rumen Andonov and Vincent Poirriez and Sanjay RajopadhyeProb eduk mtu2F1 4.316 113.734F2 4.183 103.137F3 3.983 98.432F4 3.800 96.533F5 3.866 93.107F6 3.866 94.688F7 3.766 103.925F8 3.666 104.185F9 3.633 86.464F10 3.600 92.914F11 3.616 77.667F12 3.733 111.204F13 3.566 83.674F14 3.666 78.156F15 3.533 64.176F16 3.600 55.614F17 3.633 91.784F18 3.600 87.066F19 3.516 83.365F20 3.516 72.872F21 3.483 75.390F22 3.583 100.651

Prob eduk mtu2G1 1.716 3.355G2 4.950 > 20000G3 7.816 0.601G4 17.016 0.365G5 14.816 2.276G6 20.983 > 20000G7 21.033 > 20000G8 26.016 > 20000G9 22.350 0.081G10 37.716 76.941G11 30.100 2.331G12 18.833 > 20000G13 25.133 > 20000G14 14.233 > 20000G15 15.983 > 20000G16 12.683 1.358G17 13.366 > 20000G18 19.150 > 20000G19 11.966 > 20000G20 13.133 > 20000G21 20.683 76.413G22 8.366 77.602Prob eduk mtu2C 1643.483 > 20000Di�cult 3430.333 > 20000Figure 9: Hard problems constructed from a particular realistically random data set.We used this particular problem to construct two families of smaller but still hard pro-blems (each with 1001 object types). Let C be the set (called the core set) of the 22,077non dominated object types, sorted by increasing weight and let Ci denote the i-th elementof C. Thus, u = C22;077 is the heaviest non dominated object type. The �rst family, F ,is constructed by simply partitioning C \by blocks" into subsets of 1000 consecutive objecttypes, and throwing in u, i.e., Fi = fCj j(i � 1) < d j1000e � ig [ fug, yielding 22 problems.The second one G is constructed by partitioning \cyclically", i.e., taking every 20-th elementof C. Formally, Gi = fCj jj mod 20 = ig[fug. Fig. 9 reports the time needed by EDUK andMTU2 to resolve these problems (as before, > 20000 denotes that the time required exceeds20,000 seconds). The last table gives the times required to solve the core problem, C itself,and the original problem with 105 object types.Irisa

Page 23: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

Unbounded Knapsack Problem: Dynamic Programming Revisited 215000

10000

15000

20000

5000 10000 15000 20000

pro

fit

weight

formula 1x=y

5000

105000

205000

305000

405000

505000

605000

5000 10000

pro

fit

weight

formula 3x=y

Figure 10: Sample data sets for Eqns. 6 and 7. Observe how the pro�tability decreases(resp. increases) with weight.These experiments indicate that EDUK seems to consistently outperform MTU2 on thisclass of data.4.3.2 Arti�cially constructed hard problemsWe now construct pathological examples which are hard for the EDUK algorithm. They arebased upon the following observations: if the pro�t monotonically increases with the weight,then there is no simply dominated object type; if, in addition, wmax < 2wmin, no object typeis maximally dominated. Three such data sets (illustrated in Fig. 10) can be constructed asfollows. Choose the weights randomly and sort them in increasing order. The pro�ts are asgiven by Eqns. (5{7) below (for the latter two, p1 is chosen randomly).pi = wi (5)pi = max �pi�1;�wipi�1wi�1 �� (6)pi = �wipi�1wi�1 �+ i� 1 (7)It is easy to verify that in each of the three cases, no object type is collectively dominated,and are distinguished by the fact that the pro�tability of the object types is (i) constant,(ii) decreases or (iii) increases with the weight. The three data sets will have di�erentperformance because of threshold dominance. For example, it is well known that if severalobject types have maximum pro�tability (as is the case with Eqn. 5) then the periodicity isgreater [6]. The weight-pro�t pairs so generated were randomly permuted before we ran theexperiments3. Fig. 11 presents the running times which clearly highlight that the problemremains di�cult.3As an aside, we observed that MTU2 was very sensitive to this. For example, for a particular data set(with wi = pi with 1000 object types and with wmin = 5000 and wmax = 10000), if the object types aresorted by increasing weight, MTU2 is unable to resolve the problem in 10,000 seconds, while if the samedata set is not sorted, it takes 0.007 seconds!PI n�PI-1152

Page 24: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

22 Rumen Andonov and Vincent Poirriez and Sanjay RajopadhyeFormula 1m eduk mtu21000 40.48 > 10002000 116.87 > 10004000 304.83 > 10005000 396.55 > 100010000 > 1000 > 100020000 > 1000 > 1000Formula 2m eduk mtu21000 44.43 0.022000 63.38 0.014000 147.33 0.015000 197.87 0.0110000 551.73 0.0220000 > 1000 0.02

Formula 3m eduk mtu21000 2.78 126.022000 13.17 > 10004000 52.97 > 10005000 88.22 > 100010000 403.32 > 100020000 > 1000 > 1000Figure 11: Hard problems generated by formul�4.4 Other ExamplesChung et al. have given a formula to build a family of instances of the UKP [4].wi = i+ w0; pi = wi + zwhere w0 and z are parameters. These problems are shown to be hard for the branch andbound algorithm family, in the sense that the B&B algorithm takes provably exponentialtime. Observe that these problem instances are strongly correlated in the Martello andToth sense, except that the weights are generated in order. Chung et al. present a numberof instances of the problem, which for even relatively small values of m and c provokeexponential time for branch and bound algorithms (some of them exceed their machinecapacity). All the \hard" instances were resolved by EDUK in less than 0:02 seconds4!In practice, one often encounters problems with small capacity (the UKP often arises as afrequently called subproblem in other combinatorial optimization problems). For these pro-blems, it seems that for preprocessing may not be prohibitively expensive, and the overheadof computing by slices may be high. For this reason we wanted to compare the EDUK al-gorithm with a straightforward dynamic programming algorithm, with a preprocessing stepremoving object types according to simple dominance, �s. We have done that comparisonwith a set of 265 problems generated with a generator that Val�erio de Carvalho and Rode-riguez [14] have kindly allowed us to use. The values shown below are the average values of:the number of object types, the capacities, the time of EDUK, the time for SDP (standarddynamic programming plus sorting), the time for MTU2 and the number of non-dominatedobject types (using the two dominance relations).We give also the minimum and maximumtime for EDUK and MTU2. They lead us to conclude that the EDUK algorithm is bet-ter than the standard DP with preprocessing to remove (simply) dominated object types.Also note that for these example, maximal dominance does not give increased savings (as4We do not claim that this improved performance is due it is due to EDUK. It is well known that evennaive dynamic programming gives better results than branch and bound on these examples.Irisa

Page 25: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

Unbounded Knapsack Problem: Dynamic Programming Revisited 23compared to simple dominance) and we conjecture that the improvement comes merely byavoiding the sorting.m c EDUK MTU2 SDP � �s edukmin mtu2min edukmax mtu2max1288 1035 0.033 0.023 0.039 124.61 125.17 0. 0. 0.066 0.4385 ConclusionsWe have developed some fundamental properties of the unbounded knapsack problem,namely dominance relations, periodicity, sparse representation and e�cient backtracking.We incorporated all these properties in the framework of dynamic programming recur-rence, using a rigourous transformation program approach (not described here due to spaceconstraints). This allowed us to design an e�cient algorithm with very low sensitivity tovariations of the knapsack capacity. This stability is one of the most signi�cant di�erencesbetween dynamic programming and branch and bound algorithms. We illustrate this advan-tage by comparing our results with MTU2, which is the most widely used today branch andbound algorithm for solving UKP. Our results indicate the viability of dynamic programmingfor this NP-hard problem.We validate our ideas by a large number of computational experiments and comparisons.Our results con�rm again that even very large randomly generated instances of UKP havefew undominated items and can be reduced to small core problems. Therefore the numberof object types (m) is not a good measure for the di�culty of the problem. We observe thatthe value of the minimum weight (wmin) is more relevant to indicate the hardness of theinstance. However this parameter is not su�cient by itself since independently of its valuean UKP is easily solvable when the instance contains simply dominated object items.We further show that the problem remains time consuming in the case of the so calledrealistic random instances, i.e., where no object type is simply dominated and wmin is assmall as 10.We have identi�ed non-trivial data instances for UKP by using these criteria in conjunc-tion, no simply dominated object types and increasing wmin. We believe that it should beinteresting to compare other exact algorithms on these instances, and for this purpose weare developing a database of benchmarks5.Acknowledgments The authors are very thankful to Nicola Yanev for many helpful and encouragingdiscussions, and for his numerous suggestions which contributed signi�cantly for the current version of thepaper.5The current version is at http://www.univ-valenciennes.fr/limav/knapsacks/ukp/datasPI n�PI-1152

Page 26: static.aminer.org · INSTITUT DE RECHERCHE EN INFORMATIQUE ET SYSTEMES` ALEATOIRES´ Campus de Beaulieu – 35042 Rennes Cedex – France Tel. : (33) 02 99 84 71 00 – Fax : …

24 Rumen Andonov and Vincent Poirriez and Sanjay RajopadhyeReferences[1] R. Andonov, V. Poirriez, and S. Rajopadhye. E�cient Dynamic Programming for the UnboundedKnapsack Problem. Technical Report 96-07, LIMAV, Universit�e de Valenciennes, Le Mont Houy, B.P.311, 59304 Valenciennes Cedex, France, October 1996.[2] R. Andonov and S. V. Rajopadhye. A sparse knapsack algo-tech-cuit and its synthesis. In InternationalConference on Application-Speci�c Array Processors (ASAP-94), pages 302{313, San Francisco, August1994. IEEE.[3] R. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.[4] C-S. Chung, M. S. Hung, and W. O. Rom. A hard knapsack problem. Naval Research Logistics,35:85{98, 1988.[5] K. Dudzinski. A note on dominance relation in unbounded knapsack problem. Operations ResearchLetters, 10:417{419, 1991.[6] R. Gar�nkel and G. Nemhauser. Integer Programming. John Wiley and Sons, 1972.[7] P. C. Gilmore and R. E. Gomory. A linear programming approach to the cutting stock problem { PartII. Operations Research, 11:863{888, 1963.[8] P. C. Gilmore and R. E. Gomory. The theory and computation of knapsack functions. OperationsResearch, 14:1045{1074, 1966.[9] E. Horowitz and S. Sahni. Fundamentals of Computer Algorithms. Computer Science Press, Maryland,USA, 1978.[10] T. C. Hu. Integer Programming and Network Flows. Addison-Wesley, 1969.[11] R. Johnson and L. Khan. A note on dominance in unbounded knapsack problems. Asia-Pasi�c Journalof Operationnal Research, (12):145{160, 1995.[12] S. Martello and P. Toth. Knapsack Problems: Algorithms and Computer Implementation. John Wileyand Sons, 1990.[13] D. Pisinger. Dominance Relations in Unbounded Knapsack Problems. Technical Report 94/33, DIKU,University of Copenhagen,DK-2100 Copenhagen,Denmark, December 1994.[14] J.M. Val�erio de Carvalho and A.J. Rodrigues. An LP-based Approach to a Two-Stage Cutting StockProblem. European Journal of Operational Research, 84:580{589, 1995.

Irisa