Tutorial: A Simple Framework For Optimization Programming In Python Using PuLP, Gurobi, CPLEX, and XPRESS

A simple structure for optimization modeling in Python using commercial and open-source solvers

Ehsan Khodabandeh
9 min readNov 13, 2018
Photo by Chris Ried on Unsplash

For those who are in a hurry!

Go to this GitHub repository, take a look at the short description in the README file, read the comments in the scripts, and play around with them. I hope you find it useful. And of course, your comments are really appreciated!

You may also check out my colleague’s blog where he covered high-level optimization modeling in Python with Gurobi, CPLEX, and PuLP.

For those who can stick around!

In a previous blog, I introduced the simple “input →process →output” (IPO) framework and discussed how it can improve the structure of our code. So if you need some backstory, check out that blog first. In this post, I want to show how different parts of IPO are used in action, and what better way than to model and solve an optimization problem! Hopefully, you find the code and concepts of this post useful. Maybe they even become a starting point for your projects!

So go to this GitHub repository, download the files, fire up Python, and let’s start!

Update: The blog was originally intended to cover PuLP and Gurobi. Since its publication, modeling with CPLEX and XPRESS was also added to the GitHub repository.

Problem Description

This is a very simple production planning problem. You can find all the data in the “data” folder. For educational purposes, the input data is saved in both “.xlsx” and “.csv” formats (you can find them in the “excel” and “csv” sub-folders, respectively).

Suppose that we are responsible for scheduling the monthly production plan of a product for a year. Here are some assumptions:

  • The demand of the product, unit production cost, and production capacity in each month are given (“/data/csv/input_data.csv” ).
  • Inventory holding costs are assessed at the end of each month.
  • Holding cost per unit is invariant from month to month (“/data/csv/parameters.csv” ).
  • There are 500 units of inventory available at the beginning of the first month (“/data/csv/parameters.csv” ).
  • No shortage is allowed.

Mathematical Formulation

It’s easy to derive the formulation from the problem description.

Sets:
T: the set of time periods (months)

Parameters:
h: unit holding cost
p: production capacity per month
I₀: the initial inventory
cₜ: unit production cost in month t
dₜ: demand of month t

Variables:
Xₜ: amount produced in month t
Iₜ: inventory at the end of period t

Model:

Code Structure

I’ve modeled the above optimization problem in PuLP. To show how simple it is to change the code and use another package, I also wrote the model with the Gurobi Python API package, gurobipy. For educational purposes, I’ve organized the code in two different ways:

  1. Use a single script to load the data, write and solve the optimization model, and write the results into csv files. That’s why we have execute_pulp.py and execute_grb.py modules ( I cheated here a little bit by writing functions for loading the data and writing the outputs to csv files.)
  2. Use more modular thinking and object-oriented (OO) programming, making use of multiple scripts to accomplish these tasks. This is why we have execute_oo.py, optimization_model_pulp.py, and optimization_model_grb.py modules.

Here is a rundown of files and modules in the repository:

  • A “data folder that contains two sub-folders (“csv” and “excel”). If our input data is in csv format, we place them in “csv” folder, and vice versa for Excel.
  • An “output” folder (optional). If you run the code successfully, this folder will be created and your output files will be saved there.
  • parameters.py: the module that stores the user-defined parameters which are mainly used in the optimization model. This is different from the problem parameters (e.g. initial inventory), which are stored in a csv file.
  • helper.py: helper module with several general functions.
  • process_data.py: a module with several functions for loading inputs and writing outputs. The functions here are more specific to the problem at hand.
  • execute_pulp.py: the main module that calls functions to load data, creates the pulp optimization model (variables, constraints, objective function), solves said model, and calls functions to write the outputs to csv files.
  • execute_grb.py: similar to execute_pulp.py, but the model creation and solution are done with gurobipy.
  • execute_oo.py: the main module for the OO approach that calls all the other functions.
  • optimization_model_pulp.py: similar to execute_pulp.py, except (a) the model creation is done in an OptimizationModel class, and (b) model solution is done in an optimize method.
  • optimization_model_gurobi.py: similar to optimization_model_pulp.py, but in gurobipy.

Code Explanation

(I’ll skip parameters.py, helper.py, and process_data.py; They should be self-explanatory.)

execute_pulp.py

Pay close attention to how decision variables and constraints are defined. Let’s go over one of them. We can define production_variables in several ways — here are two approaches:

#1:

production_variables = {
index: pulp.LpVariable(name='X_' + str(row['period']),
lowBound=0,
cat=pulp.LpContinuous)
for index, row in input_df_dict['input_data'].iterrows()}

#2:

production_variables = pulp.LpVariable.dicts(
name='X',
indexs=input_df_dict['input_data'].index,
lowBound=0,
cat=pulp.LpContinuous)

Although here the second method is faster, it’s good to know different ways of defining your variables. In some larger cases with multi-index variables, you may not be able to use the second way.

Also, I encourage you to get into the habit of timing your objects. You will be surprised by the performance of one method against another!

Now, let’s take a look at two different ways the “inventory balance” constraint could be defined:

#1:

for period, value in input_df_dict['input_data'].iloc[1:].iterrows():
model.addConstraint(pulp.LpConstraint(
e=inventory_variables[period - 1]
+ production_variables[period]
- inventory_variables[period],
sense=pulp.LpConstraintEQ,
name='inv_balance' + str(period),
rhs=value.demand))

#2:

inv_balance_constraints = {
period: model.addConstraint(pulp.LpConstraint(
e=inventory_variables[period - 1]
+ production_variables[period]
- inventory_variables[period],
sense=pulp.LpConstraintEQ,
name='inv_balance' + str(period),
rhs=value.demand))
for period, value in
input_df_dict['input_data'].iloc[1:].iterrows()}

You may notice that #1 just creates the constraints in a loop, while #2 actually saves them in a dictionary. One advantage of #2 is that it’s easier to access a constraint later on (in case, for example, you want its slack variable).

However, something’s not quite right about #2. In pulp, the addConstraint method doesn’t return anything , so the values of all our keys are None! Therefore, to get the constraint object that we are interested in, we can define an additional function:

def add_constr(model, constraint):
model.addConstraint(constraint)
return constraint

Now, we modify #2 to get the result we want:

#2-modified:

inv_balance_constraints = {
period: add_constr(model, pulp.LpConstraint(
e=inventory_variables[period - 1]
+ production_variables[period]
- inventory_variables[period],
sense=pulp.LpConstraintEQ,
name='inv_balance' + str(period),
rhs=value.demand))
for period, value
in input_df_dict['input_data'].iloc[1:].iterrows()}

Note that in PuLP, the right-hand-side (rhs) of a constraint cannot have any variables. All the variables should be on the left-hand (or expression) side, e.

You may also notice that in both #1 and #2, rather than adding the constraint to the model object using + (as is done in many of the online tutorials, including the beginner guide on PuLP website), I’ve directly used the addConstraint method. Since each part of the constraint (lhs, sense, name, rhs) is explicitly named, it provides more readability, flexibility, and in larger problems, speed gain (e.g. in #2, you could take advantage of dictionary comprehension rather than a for loop).

execute_grb.py

This module is very similar to execute_pulp.py, so some of the details are excluded to avoid repetition. For example, two ways of defining variables are introduced, but only one for constraints. Check for yourself to see the differences and similarities between pulp and gurobipy approaches.

It is worth noting that when defining a continuous decision variable, the default lower-bound in Gurobi is 0, while it is negative infinity in PuLP.

optimization_model_pulp.py

This module does what execute_pulp.py does, but in a more structured way. There is an OptimizationModel class that accepts the input data and creates the pulp.LpProblem object. This class has five methods: create the decision variables, create the constraints, set the objective function, solve the model, and create the outputs. Every method should be self-explanatory. But since the optimize method is fairly complex, let’s dig a little deeper.

The optimize method can solve the optimization model using four different solvers: CBC (default), Gurobi, CPLEX, and GLPK. We can also pass parameters to each solver. Those parameters are all defined in the parameters.py module. For example, we can pass the mip_gap, time_limit, or even specify whether the constructed model should be saved in a .lp format file.

There is a lot going on in this method, so I encourage you to investigate it carefully. For starters, you see that we no longer have a model.solve() command, because now we want to control the solver and the parameters passed to the solve method. You may notice that, depending on the solver, we call different classes, each class having its own signature.

optimization_model_grb.py

This module is similar to optimization_model_pulp.py, but the model is created using gurobipy. The optimize method in this module can also call the Gurobi solver with all the user-defined parameters. So make sure to compare it with the one in optimization_model_pulp.py.

execute_oo.py

Last but not least, we have execute_oo.py, which is the main module and the starting point for solving the problem using the OO approach. First, the load_data function is called. Then, an instance of the OptimizationModel class is created. But the call to this class depends on the user’s choice of module, which can be either Gurobi or PuLP. You can find more details in the comments of parameters.py. Finally, theoptimize and create_output methods are called.

Why OO?

The problem we discussed here was fairly simple, and you may wonder why I bothered with the OO code anyway. I’d offer two responses.

First off, this is a tutorial blog, and learning is important. Secondly, I’m sure you agree that the OO version is more general, flexible, and easier to debug. You’ll thank yourself when you need to design the code structure for much larger problems!

IPO Structure

You should have spotted instances of IPO throughout the code.

  1. process_data.py is part of an “input” process, but within that module, I made a call to load raw data (input), modify it (process), and send it to another part of the pipeline (output).
  2. execute_pulp.py is a bigger script, but at least I tried to separate different parts of the model from one another, even if only visually and with the help of some comments (all these little pieces count!)
  3. optimization_model_pulp.py seems like a pure “process” function, but looking closer, it has methods to instantiate the optimization model (input), create different parts of the model (process), and solve it (output).
  4. Even creating the model followed the same pattern: Define variables (input), specify their relationships in the constraints (process), and finally, define the measurement metrics in the objective function (output).

Obviously, input-process-output are not always separable, but I think you already got that point.

Although this last example about model creation may feel like a stretch (we naturally define a model with variables first, then constraints, and then the objective), I wanted to emphasize the separability of model components. We could put all of them in one big method, but by separating them, the flow becomes more natural, and there’s more flexibility for future expansion.

As an exercise, assume this problem was “part a” of a homework. For “part b”, we need a simple modification. No inventory should be left at the end of the last period. What would you do?

Naturally, you’ll have to add a constraint. But how would you tell me to alter the model for “part b”? I hope you don’t write a constraint and ask me to comment it for “part a” and uncomment for “part b”! (Hint: I’m the user. All I like to do is to get the desired result by changing the value of one parameter.)

Well, that was it! You made it to the end of the tutorial. If you liked this blog, let’s stay in touch on Medium, Twitter, and LinkedIn.

--

--

Ehsan Khodabandeh

Operations Research Scientist. I write about optimization, logistics, and occasionally Python!