Tutorial: A Simple Framework For Optimization Programming In Python Using PuLP, Gurobi, CPLEX, and XPRESS
A simple structure for optimization modeling in Python using commercial and open-source solvers
For those who are in a hurry!
Go to this GitHub repository, take a look at the short description in the README file, read the comments in the scripts, and play around with them. I hope you find it useful. And of course, your comments are really appreciated!
You may also check out my colleague’s blog where he covered high-level optimization modeling in Python with Gurobi, CPLEX, and PuLP.
For those who can stick around!
In a previous blog, I introduced the simple “input →process →output” (IPO) framework and discussed how it can improve the structure of our code. So if you need some backstory, check out that blog first. In this post, I want to show how different parts of IPO are used in action, and what better way than to model and solve an optimization problem! Hopefully, you find the code and concepts of this post useful. Maybe they even become a starting point for your projects!
So go to this GitHub repository, download the files, fire up Python, and let’s start!
Update: The blog was originally intended to cover PuLP and Gurobi. Since its publication, modeling with CPLEX and XPRESS was also added to the GitHub repository.
Problem Description
This is a very simple production planning problem. You can find all the data in the “data” folder. For educational purposes, the input data is saved in both “.xlsx” and “.csv” formats (you can find them in the “excel” and “csv” sub-folders, respectively).
Suppose that we are responsible for scheduling the monthly production plan of a product for a year. Here are some assumptions:
- The demand of the product, unit production cost, and production capacity in each month are given (“/data/csv/input_data.csv” ).
- Inventory holding costs are assessed at the end of each month.
- Holding cost per unit is invariant from month to month (“/data/csv/parameters.csv” ).
- There are 500 units of inventory available at the beginning of the first month (“/data/csv/parameters.csv” ).
- No shortage is allowed.
Mathematical Formulation
It’s easy to derive the formulation from the problem description.
Sets:
T: the set of time periods (months)
Parameters:
h: unit holding cost
p: production capacity per month
I₀: the initial inventory
cₜ: unit production cost in month t
dₜ: demand of month t
Variables:
Xₜ: amount produced in month t
Iₜ: inventory at the end of period t
Model:

Code Structure
I’ve modeled the above optimization problem in PuLP. To show how simple it is to change the code and use another package, I also wrote the model with the Gurobi Python API package, gurobipy
. For educational purposes, I’ve organized the code in two different ways:
- Use a single script to load the data, write and solve the optimization model, and write the results into csv files. That’s why we have
execute_pulp.py
andexecute_grb.py
modules ( I cheated here a little bit by writing functions for loading the data and writing the outputs to csv files.) - Use more modular thinking and object-oriented (OO) programming, making use of multiple scripts to accomplish these tasks. This is why we have
execute_oo.py
,optimization_model_pulp.py
, andoptimization_model_grb.py
modules.
Here is a rundown of files and modules in the repository:
- A “data” folder that contains two sub-folders (“csv” and “excel”). If our input data is in csv format, we place them in “csv” folder, and vice versa for Excel.
- An “output” folder (optional). If you run the code successfully, this folder will be created and your output files will be saved there.
parameters.py
: the module that stores the user-defined parameters which are mainly used in the optimization model. This is different from the problem parameters (e.g. initial inventory), which are stored in a csv file.helper.py
: helper module with several general functions.process_data.py
: a module with several functions for loading inputs and writing outputs. The functions here are more specific to the problem at hand.execute_pulp.py
: the main module that calls functions to load data, creates thepulp
optimization model (variables, constraints, objective function), solves said model, and calls functions to write the outputs to csv files.execute_grb.py
: similar toexecute_pulp.py
, but the model creation and solution are done withgurobipy
.execute_oo.py
: the main module for the OO approach that calls all the other functions.optimization_model_pulp.py
: similar toexecute_pulp.py
, except (a) the model creation is done in anOptimizationModel
class, and (b) model solution is done in anoptimize
method.optimization_model_gurobi.py
: similar tooptimization_model_pulp.py
, but ingurobipy
.
Code Explanation
(I’ll skip parameters.py
, helper.py
, and process_data.py
; They should be self-explanatory.)
execute_pulp.py
Pay close attention to how decision variables and constraints are defined. Let’s go over one of them. We can define production_variables
in several ways — here are two approaches:
#1:
production_variables = {
index: pulp.LpVariable(name='X_' + str(row['period']),
lowBound=0,
cat=pulp.LpContinuous)
for index, row in input_df_dict['input_data'].iterrows()}
#2:
production_variables = pulp.LpVariable.dicts(
name='X',
indexs=input_df_dict['input_data'].index,
lowBound=0,
cat=pulp.LpContinuous)
Although here the second method is faster, it’s good to know different ways of defining your variables. In some larger cases with multi-index variables, you may not be able to use the second way.
Also, I encourage you to get into the habit of timing your objects. You will be surprised by the performance of one method against another!
Now, let’s take a look at two different ways the “inventory balance” constraint could be defined:
#1:
for period, value in input_df_dict['input_data'].iloc[1:].iterrows():
model.addConstraint(pulp.LpConstraint(
e=inventory_variables[period - 1]
+ production_variables[period]
- inventory_variables[period],
sense=pulp.LpConstraintEQ,
name='inv_balance' + str(period),
rhs=value.demand))
#2:
inv_balance_constraints = {
period: model.addConstraint(pulp.LpConstraint(
e=inventory_variables[period - 1]
+ production_variables[period]
- inventory_variables[period],
sense=pulp.LpConstraintEQ,
name='inv_balance' + str(period),
rhs=value.demand))
for period, value in
input_df_dict['input_data'].iloc[1:].iterrows()}
You may notice that #1 just creates the constraints in a loop, while #2 actually saves them in a dictionary. One advantage of #2 is that it’s easier to access a constraint later on (in case, for example, you want its slack variable).
However, something’s not quite right about #2. In pulp
, the addConstraint
method doesn’t return anything , so the values of all our keys are None
! Therefore, to get the constraint object that we are interested in, we can define an additional function:
def add_constr(model, constraint):
model.addConstraint(constraint)
return constraint
Now, we modify #2 to get the result we want:
#2-modified:
inv_balance_constraints = {
period: add_constr(model, pulp.LpConstraint(
e=inventory_variables[period - 1]
+ production_variables[period]
- inventory_variables[period],
sense=pulp.LpConstraintEQ,
name='inv_balance' + str(period),
rhs=value.demand))
for period, value
in input_df_dict['input_data'].iloc[1:].iterrows()}
Note that in PuLP, the right-hand-side (rhs
) of a constraint cannot have any variables. All the variables should be on the left-hand (or expression) side, e
.
You may also notice that in both #1 and #2, rather than adding the constraint to the model
object using +
(as is done in many of the online tutorials, including the beginner guide on PuLP website), I’ve directly used the addConstraint
method. Since each part of the constraint (lhs, sense, name, rhs) is explicitly named, it provides more readability, flexibility, and in larger problems, speed gain (e.g. in #2, you could take advantage of dictionary comprehension rather than a for
loop).
execute_grb.py
This module is very similar to execute_pulp.py
, so some of the details are excluded to avoid repetition. For example, two ways of defining variables are introduced, but only one for constraints. Check for yourself to see the differences and similarities between pulp
and gurobipy
approaches.
It is worth noting that when defining a continuous decision variable, the default lower-bound in Gurobi is 0, while it is negative infinity in PuLP.
optimization_model_pulp.py
This module does what execute_pulp.py
does, but in a more structured way. There is an OptimizationModel
class that accepts the input data and creates the pulp.LpProblem
object. This class has five methods: create the decision variables, create the constraints, set the objective function, solve the model, and create the outputs. Every method should be self-explanatory. But since the optimize
method is fairly complex, let’s dig a little deeper.
The optimize
method can solve the optimization model using four different solvers: CBC (default), Gurobi, CPLEX, and GLPK. We can also pass parameters to each solver. Those parameters are all defined in the parameters.py
module. For example, we can pass the mip_gap
, time_limit
, or even specify whether the constructed model should be saved in a .lp
format file.
There is a lot going on in this method, so I encourage you to investigate it carefully. For starters, you see that we no longer have a model.solve()
command, because now we want to control the solver and the parameters passed to the solve
method. You may notice that, depending on the solver, we call different classes, each class having its own signature.
optimization_model_grb.py
This module is similar to optimization_model_pulp.py
, but the model is created using gurobipy
. The optimize
method in this module can also call the Gurobi solver with all the user-defined parameters. So make sure to compare it with the one in optimization_model_pulp.py
.
execute_oo.py
Last but not least, we have execute_oo.py
, which is the main module and the starting point for solving the problem using the OO approach. First, the load_data
function is called. Then, an instance of the OptimizationModel
class is created. But the call to this class depends on the user’s choice of module, which can be either Gurobi or PuLP. You can find more details in the comments of parameters.py
. Finally, theoptimize
and create_output
methods are called.
Why OO?
The problem we discussed here was fairly simple, and you may wonder why I bothered with the OO code anyway. I’d offer two responses.
First off, this is a tutorial blog, and learning is important. Secondly, I’m sure you agree that the OO version is more general, flexible, and easier to debug. You’ll thank yourself when you need to design the code structure for much larger problems!
IPO Structure
You should have spotted instances of IPO throughout the code.
process_data.py
is part of an “input” process, but within that module, I made a call to load raw data (input), modify it (process), and send it to another part of the pipeline (output).execute_pulp.py
is a bigger script, but at least I tried to separate different parts of the model from one another, even if only visually and with the help of some comments (all these little pieces count!)optimization_model_pulp.py
seems like a pure “process” function, but looking closer, it has methods to instantiate the optimization model (input), create different parts of the model (process), and solve it (output).- Even creating the model followed the same pattern: Define variables (input), specify their relationships in the constraints (process), and finally, define the measurement metrics in the objective function (output).
Obviously, input-process-output are not always separable, but I think you already got that point.

Although this last example about model creation may feel like a stretch (we naturally define a model with variables first, then constraints, and then the objective), I wanted to emphasize the separability of model components. We could put all of them in one big method, but by separating them, the flow becomes more natural, and there’s more flexibility for future expansion.
As an exercise, assume this problem was “part a” of a homework. For “part b”, we need a simple modification. No inventory should be left at the end of the last period. What would you do?
Naturally, you’ll have to add a constraint. But how would you tell me to alter the model for “part b”? I hope you don’t write a constraint and ask me to comment it for “part a” and uncomment for “part b”! (Hint: I’m the user. All I like to do is to get the desired result by changing the value of one parameter.)