Tutorial: A Simple Framework For Optimization Programming In Python Using PuLP, Gurobi, CPLEX, and XPRESS

A simple structure for optimization modeling in Python using commercial and open-source solvers

Photo by Chris Ried on Unsplash

For those who are in a hurry!

You may also check out my colleague’s blog where he covered high-level optimization modeling in Python with Gurobi, CPLEX, and PuLP.

For those who can stick around!

So go to this GitHub repository, download the files, fire up Python, and let’s start!

Update: The blog was originally intended to cover PuLP and Gurobi. Since its publication, modeling with CPLEX and XPRESS was also added to the GitHub repository.

Problem Description

Suppose that we are responsible for scheduling the monthly production plan of a product for a year. Here are some assumptions:

  • The demand of the product, unit production cost, and production capacity in each month are given (“/data/csv/input_data.csv” ).
  • Inventory holding costs are assessed at the end of each month.
  • Holding cost per unit is invariant from month to month (“/data/csv/parameters.csv” ).
  • There are 500 units of inventory available at the beginning of the first month (“/data/csv/parameters.csv” ).
  • No shortage is allowed.

Mathematical Formulation

Sets:
T: the set of time periods (months)

Parameters:
h: unit holding cost
p: production capacity per month
I₀: the initial inventory
cₜ: unit production cost in month t
dₜ: demand of month t

Variables:
Xₜ: amount produced in month t
Iₜ: inventory at the end of period t

Model:

Code Structure

  1. Use a single script to load the data, write and solve the optimization model, and write the results into csv files. That’s why we have execute_pulp.py and execute_grb.py modules ( I cheated here a little bit by writing functions for loading the data and writing the outputs to csv files.)
  2. Use more modular thinking and object-oriented (OO) programming, making use of multiple scripts to accomplish these tasks. This is why we have execute_oo.py, optimization_model_pulp.py, and optimization_model_grb.py modules.

Here is a rundown of files and modules in the repository:

  • A “data folder that contains two sub-folders (“csv” and “excel”). If our input data is in csv format, we place them in “csv” folder, and vice versa for Excel.
  • An “output” folder (optional). If you run the code successfully, this folder will be created and your output files will be saved there.
  • parameters.py: the module that stores the user-defined parameters which are mainly used in the optimization model. This is different from the problem parameters (e.g. initial inventory), which are stored in a csv file.
  • helper.py: helper module with several general functions.
  • process_data.py: a module with several functions for loading inputs and writing outputs. The functions here are more specific to the problem at hand.
  • execute_pulp.py: the main module that calls functions to load data, creates the pulp optimization model (variables, constraints, objective function), solves said model, and calls functions to write the outputs to csv files.
  • execute_grb.py: similar to execute_pulp.py, but the model creation and solution are done with gurobipy.
  • execute_oo.py: the main module for the OO approach that calls all the other functions.
  • optimization_model_pulp.py: similar to execute_pulp.py, except (a) the model creation is done in an OptimizationModel class, and (b) model solution is done in an optimize method.
  • optimization_model_gurobi.py: similar to optimization_model_pulp.py, but in gurobipy.

Code Explanation

execute_pulp.py

#1:

production_variables = {
index: pulp.LpVariable(name='X_' + str(row['period']),
lowBound=0,
cat=pulp.LpContinuous)
for index, row in input_df_dict['input_data'].iterrows()}

#2:

production_variables = pulp.LpVariable.dicts(
name='X',
indexs=input_df_dict['input_data'].index,
lowBound=0,
cat=pulp.LpContinuous)

Although here the second method is faster, it’s good to know different ways of defining your variables. In some larger cases with multi-index variables, you may not be able to use the second way.

Also, I encourage you to get into the habit of timing your objects. You will be surprised by the performance of one method against another!

Now, let’s take a look at two different ways the “inventory balance” constraint could be defined:

#1:

for period, value in input_df_dict['input_data'].iloc[1:].iterrows():
model.addConstraint(pulp.LpConstraint(
e=inventory_variables[period - 1]
+ production_variables[period]
- inventory_variables[period],
sense=pulp.LpConstraintEQ,
name='inv_balance' + str(period),
rhs=value.demand))

#2:

inv_balance_constraints = {
period: model.addConstraint(pulp.LpConstraint(
e=inventory_variables[period - 1]
+ production_variables[period]
- inventory_variables[period],
sense=pulp.LpConstraintEQ,
name='inv_balance' + str(period),
rhs=value.demand))
for period, value in
input_df_dict['input_data'].iloc[1:].iterrows()}

You may notice that #1 just creates the constraints in a loop, while #2 actually saves them in a dictionary. One advantage of #2 is that it’s easier to access a constraint later on (in case, for example, you want its slack variable).

However, something’s not quite right about #2. In pulp, the addConstraint method doesn’t return anything , so the values of all our keys are None! Therefore, to get the constraint object that we are interested in, we can define an additional function:

def add_constr(model, constraint):
model.addConstraint(constraint)
return constraint

Now, we modify #2 to get the result we want:

#2-modified:

inv_balance_constraints = {
period: add_constr(model, pulp.LpConstraint(
e=inventory_variables[period - 1]
+ production_variables[period]
- inventory_variables[period],
sense=pulp.LpConstraintEQ,
name='inv_balance' + str(period),
rhs=value.demand))
for period, value
in input_df_dict['input_data'].iloc[1:].iterrows()}

Note that in PuLP, the right-hand-side (rhs) of a constraint cannot have any variables. All the variables should be on the left-hand (or expression) side, e.

You may also notice that in both #1 and #2, rather than adding the constraint to the model object using + (as is done in many of the online tutorials, including the beginner guide on PuLP website), I’ve directly used the addConstraint method. Since each part of the constraint (lhs, sense, name, rhs) is explicitly named, it provides more readability, flexibility, and in larger problems, speed gain (e.g. in #2, you could take advantage of dictionary comprehension rather than a for loop).

execute_grb.py

It is worth noting that when defining a continuous decision variable, the default lower-bound in Gurobi is 0, while it is negative infinity in PuLP.

optimization_model_pulp.py

The optimize method can solve the optimization model using four different solvers: CBC (default), Gurobi, CPLEX, and GLPK. We can also pass parameters to each solver. Those parameters are all defined in the parameters.py module. For example, we can pass the mip_gap, time_limit, or even specify whether the constructed model should be saved in a .lp format file.

There is a lot going on in this method, so I encourage you to investigate it carefully. For starters, you see that we no longer have a model.solve() command, because now we want to control the solver and the parameters passed to the solve method. You may notice that, depending on the solver, we call different classes, each class having its own signature.

optimization_model_grb.py

execute_oo.py

Why OO?

First off, this is a tutorial blog, and learning is important. Secondly, I’m sure you agree that the OO version is more general, flexible, and easier to debug. You’ll thank yourself when you need to design the code structure for much larger problems!

IPO Structure

  1. process_data.py is part of an “input” process, but within that module, I made a call to load raw data (input), modify it (process), and send it to another part of the pipeline (output).
  2. execute_pulp.py is a bigger script, but at least I tried to separate different parts of the model from one another, even if only visually and with the help of some comments (all these little pieces count!)
  3. optimization_model_pulp.py seems like a pure “process” function, but looking closer, it has methods to instantiate the optimization model (input), create different parts of the model (process), and solve it (output).
  4. Even creating the model followed the same pattern: Define variables (input), specify their relationships in the constraints (process), and finally, define the measurement metrics in the objective function (output).

Obviously, input-process-output are not always separable, but I think you already got that point.

Although this last example about model creation may feel like a stretch (we naturally define a model with variables first, then constraints, and then the objective), I wanted to emphasize the separability of model components. We could put all of them in one big method, but by separating them, the flow becomes more natural, and there’s more flexibility for future expansion.

As an exercise, assume this problem was “part a” of a homework. For “part b”, we need a simple modification. No inventory should be left at the end of the last period. What would you do?

Naturally, you’ll have to add a constraint. But how would you tell me to alter the model for “part b”? I hope you don’t write a constraint and ask me to comment it for “part a” and uncomment for “part b”! (Hint: I’m the user. All I like to do is to get the desired result by changing the value of one parameter.)

Well, that was it! You made it to the end of the tutorial. If you liked this blog, let’s stay in touch on Medium, Twitter, and LinkedIn.

Operations Research Scientist. I write about optimization, logistics, and occasionally Python!