Mathematical Foundations for Nonlinear Programming Optimization

Review of Multivariable Calculus

Multivariable Calculus Essentials

Key Objects in Optimization

For a function \(f: \mathbb{R}^n \to \mathbb{R}\):

  • Partial derivative: \(\frac{\partial f}{\partial x_i}\) - rate of change w.r.t. \(x_i\)

  • Gradient: \(\nabla f(x) = \left[\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right]^T\)

  • Hessian: \(\nabla^2 f(x)_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}\)

Example

For \(f(x,y) = x^2 + 2xy + y^3\):

\[\nabla f = \begin{bmatrix} 2x + 2y \\ 2x + 3y^2 \end{bmatrix}, \quad \nabla^2 f = \begin{bmatrix} 2 & 2 \\ 2 & 6y \end{bmatrix}\]
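As a sanity check, the analytic gradient above can be verified against central finite differences. A minimal Python sketch (the evaluation point \((1.3, -0.7)\) is an arbitrary choice):

```python
def f(x, y):
    return x**2 + 2*x*y + y**3

def grad(x, y):
    # Analytic gradient from the text: [2x + 2y, 2x + 3y^2]
    return (2*x + 2*y, 2*x + 3*y**2)

def hess(x, y):
    # Analytic Hessian from the text: [[2, 2], [2, 6y]]
    return ((2.0, 2.0), (2.0, 6.0*y))

def num_grad(x, y, h=1e-6):
    # Central differences: each partial varies one variable, holding the other fixed
    gx = (f(x + h, y) - f(x - h, y)) / (2*h)
    gy = (f(x, y + h) - f(x, y - h)) / (2*h)
    return (gx, gy)

gx, gy = grad(1.3, -0.7)
ngx, ngy = num_grad(1.3, -0.7)
assert abs(gx - ngx) < 1e-5 and abs(gy - ngy) < 1e-5
```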

Partial Derivatives - Geometric Interpretation

  • \(\frac{\partial f}{\partial x_i}\) measures rate of change along \(x_i\)-axis

  • Keep all other variables constant

  • Slope of the tangent line to the curve traced out as \(x_i\) alone varies

[Figure: partial derivatives]

Chain Rule (Multivariate)

If \(f(x(t), y(t))\), then:

\[\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}\]
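The chain rule formula can be checked numerically by differentiating the composed function directly. A sketch with the hypothetical choices \(f(x,y) = x^2 y\), \(x(t) = \cos t\), \(y(t) = \sin t\):

```python
import math

def f(x, y):
    return x**2 * y

def df_dt(t):
    # Chain rule: df/dt = (df/dx)(dx/dt) + (df/dy)(dy/dt)
    x, y = math.cos(t), math.sin(t)
    fx, fy = 2*x*y, x**2                     # partial derivatives of f
    dxdt, dydt = -math.sin(t), math.cos(t)   # derivatives of the path
    return fx*dxdt + fy*dydt

def df_dt_numeric(t, h=1e-6):
    # Central difference applied to the composed function
    g = lambda s: f(math.cos(s), math.sin(s))
    return (g(t + h) - g(t - h)) / (2*h)

assert abs(df_dt(0.8) - df_dt_numeric(0.8)) < 1e-6
```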

Gradients, Hessians & Jacobians

Gradient: Directional Steepest Ascent

  • Points in direction of maximum increase of \(f\)

  • Orthogonal to level sets (\(\nabla f \perp \{x \mid f(x)=c\}\))

  • Critical point: \(\nabla f(x^*) = 0\)

  • Magnitude: \(\|\nabla f(x)\|\) = maximum directional derivative

[Figure: gradient as the direction of steepest ascent]

Directional Derivative

Along unit vector \(d\):

\[D_d f(x) = \nabla f(x)^T d = \|\nabla f(x)\| \cos \theta\]
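The formula \(D_d f(x) = \nabla f(x)^T d\) can likewise be confirmed by a one-dimensional finite difference along \(d\). A sketch with a hypothetical test function (not from the text):

```python
def f(x, y):
    # Hypothetical test function
    return x**2 + 3*x*y

def grad(x, y):
    return (2*x + 3*y, 3*x)

def dir_deriv(x, y, d):
    # D_d f(x) = grad f(x)^T d, for a unit vector d
    gx, gy = grad(x, y)
    return gx*d[0] + gy*d[1]

def dir_deriv_numeric(x, y, d, h=1e-6):
    # Central difference along the direction d
    return (f(x + h*d[0], y + h*d[1]) - f(x - h*d[0], y - h*d[1])) / (2*h)

d = (0.6, 0.8)  # unit vector: 0.6^2 + 0.8^2 = 1
assert abs(dir_deriv(1.0, 2.0, d) - dir_deriv_numeric(1.0, 2.0, d)) < 1e-6
```

Note that taking \(d\) to be a coordinate axis recovers the corresponding partial derivative.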

Hessian Matrix: Second-Order Information

Definition

\[\nabla^2 f(x) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}\]

  • Symmetric (if \(f \in C^2\)): \(\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}\)

  • Determines curvature of \(f\)

  • Used in optimality conditions and Newton’s method

Jacobian: Vector-Valued Functions

For \(F: \mathbb{R}^n \to \mathbb{R}^m\) with \(F(x) = [F_1(x), F_2(x), \ldots, F_m(x)]^T\):

\[J_F(x) = \begin{bmatrix} \frac{\partial F_1}{\partial x_1} & \cdots & \frac{\partial F_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial F_m}{\partial x_1} & \cdots & \frac{\partial F_m}{\partial x_n} \end{bmatrix}\]

Example

\[\begin{aligned} \text{For}~F(x,y) &= \begin{bmatrix} x^2y \\ \sin(x+y) \\ e^{xy} \end{bmatrix}: \\ J_F &= \begin{bmatrix} 2xy & x^2 \\ \cos(x+y) & \cos(x+y) \\ ye^{xy} & xe^{xy} \end{bmatrix} \end{aligned}\]

Note: When \(m=1\), \(J_F = \nabla F^T\)
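The example Jacobian above can be verified column by column: column \(j\) is a central difference of \(F\) in variable \(j\). A minimal sketch:

```python
import math

def F(x, y):
    # The text's example F: R^2 -> R^3
    return [x**2 * y, math.sin(x + y), math.exp(x*y)]

def J(x, y):
    # Analytic 3x2 Jacobian from the text
    return [
        [2*x*y,           x**2],
        [math.cos(x + y), math.cos(x + y)],
        [y*math.exp(x*y), x*math.exp(x*y)],
    ]

def J_numeric(x, y, h=1e-6):
    # Column j of the Jacobian = central difference in variable j
    col_x = [(a - b) / (2*h) for a, b in zip(F(x + h, y), F(x - h, y))]
    col_y = [(a - b) / (2*h) for a, b in zip(F(x, y + h), F(x, y - h))]
    return [[cx, cy] for cx, cy in zip(col_x, col_y)]

Ja, Jn = J(0.5, 1.2), J_numeric(0.5, 1.2)
assert all(abs(Ja[i][j] - Jn[i][j]) < 1e-5 for i in range(3) for j in range(2))
```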

Taylor Series Approximations

Taylor’s Theorem in \(\mathbb{R}^n\)

Theorem: First-Order Approximation

For \(f \in C^1\):

\[f(x + \Delta x) \approx f(x) + \nabla f(x)^T \Delta x\]

Theorem: Second-Order Approximation

For \(f \in C^2\):

\[f(x + \Delta x) \approx f(x) + \nabla f(x)^T \Delta x + \frac{1}{2} \Delta x^T \nabla^2 f(x) \Delta x\]

Error Analysis

  • First-order error: \(O(\|\Delta x\|^2)\)

  • Second-order error: \(O(\|\Delta x\|^3)\)
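These error orders are observable numerically: halving \(\|\Delta x\|\) should shrink the first-order error by roughly \(4\times\) and the second-order error by roughly \(8\times\). A sketch using the hypothetical choice \(f(x) = e^x\) expanded at \(x = 0\):

```python
import math

def err1(h):
    # Error of the first-order Taylor model of e^x at 0: |e^h - (1 + h)|
    return abs(math.exp(h) - (1 + h))

def err2(h):
    # Error of the second-order model: |e^h - (1 + h + h^2/2)|
    return abs(math.exp(h) - (1 + h + h**2/2))

# Halving the step shrinks the errors by ~4x (O(h^2)) and ~8x (O(h^3))
h = 0.1
r1 = err1(h) / err1(h/2)
r2 = err2(h) / err2(h/2)
assert 3.5 < r1 < 4.5
assert 7.0 < r2 < 9.0
```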

Why Taylor Series Matter in Optimization

Algorithm Design Foundation

  • Gradient descent: Uses 1st-order approximation

  • Newton’s method: Uses 2nd-order approximation

  • Quasi-Newton methods: Approximate 2nd-order info

Example: Quadratic Model

At each iteration, approximate \(f\) by quadratic:

\[q(s) = f(x_k) + \nabla f(x_k)^T s + \frac{1}{2} s^T B_k s\]
where \(B_k \approx \nabla^2 f(x_k)\)
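Minimizing the quadratic model in closed form gives the (quasi-)Newton step \(s = -B_k^{-1} \nabla f(x_k)\). A 2x2 sketch using the earlier example \(f(x,y) = x^2 + 2xy + y^3\) at the arbitrarily chosen point \((1,1)\), where the exact Hessian serves as \(B_k\):

```python
def newton_step(g, B):
    # Solve B s = -g for a 2x2 matrix B via Cramer's rule
    det = B[0][0]*B[1][1] - B[0][1]*B[1][0]
    s0 = (-g[0]*B[1][1] + g[1]*B[0][1]) / det
    s1 = (-g[1]*B[0][0] + g[0]*B[1][0]) / det
    return (s0, s1)

g = (4.0, 5.0)                 # gradient of f at (1, 1): (2x+2y, 2x+3y^2)
B = ((2.0, 2.0), (2.0, 6.0))   # Hessian of f at (1, 1): [[2, 2], [2, 6y]]
s = newton_step(g, B)

# The step makes the model's gradient vanish: g + B s = 0
assert abs(g[0] + B[0][0]*s[0] + B[0][1]*s[1]) < 1e-12
assert abs(g[1] + B[1][0]*s[0] + B[1][1]*s[1]) < 1e-12
```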

Local vs. Global Behavior

Taylor approximations are local: their accuracy degrades with distance from the expansion point

Convex Analysis

Convex Sets and Their Properties

Definition: Convex Set

A set \(S \subseteq \mathbb{R}^n\) is convex if:

\[\forall x, y \in S, \forall \lambda \in [0,1]: \lambda x + (1-\lambda) y \in S\]

[Figure: convex and non-convex sets]

Examples of Convex Sets

  • Hyperplanes: \(\{x \mid a^T x = b\}\)

  • Half-spaces: \(\{x \mid a^T x \leq b\}\)

  • Balls: \(\{x \mid \|x - c\| \leq r\}\)

  • Intersection of convex sets is convex

Convex Functions

A function \(f: S \to \mathbb{R}\) on convex set \(S\) is convex if:

\[\forall x, y \in S, \forall \lambda \in [0,1]: f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y)\]
[Figure: convex and non-convex functions]

Strictly Convex

\(f\) is strictly convex if the inequality is strict for all \(x \neq y\) and \(\lambda \in (0,1)\)
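The defining inequality can be tested empirically on random sample pairs; this cannot prove convexity, but it reliably detects violations. A minimal sketch for one-dimensional functions:

```python
import math
import random

def is_convex_on_samples(f, lo, hi, trials=2000, seed=0):
    # Empirically test f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y)
    # on random points; returns False on any detected violation.
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        t = rng.random()
        if f(t*x + (1-t)*y) > t*f(x) + (1-t)*f(y) + 1e-12:
            return False
    return True

assert is_convex_on_samples(math.exp, -2, 2)            # e^x is convex
assert not is_convex_on_samples(lambda x: x**3, -2, 2)  # x^3 is not convex on [-2, 2]
```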

Characterizations of Convex Functions

First-Order Condition

For \(f \in C^1\) on convex set \(S\), \(f\) is convex iff:

\[f(y) \geq f(x) + \nabla f(x)^T (y-x) \quad \forall x,y \in S\]

Theorem (Second-Order Condition)

For \(f \in C^2\) on convex set \(S\), \(f\) is convex iff:

\[\nabla^2 f(x) \succeq 0 \quad \forall x \in S\]

(Hessian is positive semidefinite)
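For a 2x2 symmetric Hessian, positive semidefiniteness reduces to checking that both diagonal entries and the determinant are nonnegative (this 2x2 shortcut is an assumption-free special case; in higher dimensions one would check eigenvalues instead). A sketch:

```python
def psd_2x2(H):
    # A symmetric 2x2 matrix is PSD iff both diagonal entries and the
    # determinant are nonnegative: det >= 0 forces the eigenvalues to share
    # a sign, and nonnegative diagonals force that sign to be nonnegative.
    a, b, c = H[0][0], H[0][1], H[1][1]
    return a >= 0 and c >= 0 and a*c - b*b >= 0

# f(x) = (1/2) x^T Q x has constant Hessian Q, so f is convex iff Q is PSD
assert psd_2x2(((2, 1), (1, 2)))       # convex quadratic
assert not psd_2x2(((2, 0), (0, -2)))  # Hessian of x^2 - y^2: indefinite
```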

Key Insight

Convex functions have no local minima that are not global

Properties of Convex Functions

Preservation Under Operations

  • Positive combination: \(\alpha f + \beta g\) convex if \(\alpha, \beta \geq 0\)

  • Pointwise maximum: \(\max\{f_1(x), f_2(x), \ldots\}\) convex

  • Composition: \(g(f(x))\) convex if \(g\) is convex and nondecreasing and \(f\) is convex

Example (Common Convex Functions)

  • Linear: \(a^T x + b\)

  • Quadratic: \(\frac{1}{2}x^T Q x + q^T x + r\) (if \(Q \succeq 0\))

  • Exponential: \(e^{ax}\)

  • Norms: \(\|x\|_p\) for \(p \geq 1\)

  • Logarithmic barrier: \(-\sum_{i=1}^n \log(x_i)\) on \(\{x \mid x > 0\}\)

Optimality Conditions

Unconstrained Optimality Conditions

Problem

\[\min_{x \in \mathbb{R}^n} f(x)\]

Theorem (First-Order Necessary Condition (FONC))

If \(x^*\) is a local minimum and \(f \in C^1\), then:

\[\nabla f(x^*) = 0\]

Theorem (Second-Order Necessary Condition (SONC))

If \(x^*\) is a local minimum and \(f \in C^2\), then:

\[\nabla f(x^*) = 0 \quad \text{and} \quad \nabla^2 f(x^*) \succeq 0\]

Critical Points

Points where \(\nabla f(x) = 0\) - could be minima, maxima, or saddle points

Sufficient Conditions

Theorem (Second-Order Sufficient Condition (SOSC))

If \(\nabla f(x^*) = 0\) and \(\nabla^2 f(x^*) \succ 0\), then \(x^*\) is a strict local minimum.

Classification of Critical Points

  • \(\nabla^2 f \succ 0\): Local minimum

  • \(\nabla^2 f \prec 0\): Local maximum

  • \(\nabla^2 f\) indefinite: Saddle point

  • \(\nabla^2 f\) singular and semidefinite: Inconclusive

Example

For \(f(x,y) = x^2 - y^2\): \(\nabla f = [2x, -2y]^T\), \(\nabla^2 f = \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}\)

At \((0,0)\): \(\nabla f = 0\), but \(\nabla^2 f\) is indefinite \(\Rightarrow\) saddle point
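The classification above can be automated by inspecting the eigenvalues of the Hessian at a critical point. A sketch for the 2x2 case, using the closed-form eigenvalues of a symmetric 2x2 matrix:

```python
import math

def eig_2x2_sym(H):
    # Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]]:
    # (a+c)/2 +/- sqrt(((a-c)/2)^2 + b^2)
    a, b, c = H[0][0], H[0][1], H[1][1]
    m = (a + c) / 2
    r = math.sqrt(((a - c) / 2)**2 + b**2)
    return (m - r, m + r)

def classify(H):
    # Classify a critical point from its Hessian's eigenvalue signs
    lo, hi = eig_2x2_sym(H)
    if lo > 0: return "local minimum"
    if hi < 0: return "local maximum"
    if lo < 0 < hi: return "saddle point"
    return "inconclusive"

# Hessian of f(x,y) = x^2 - y^2 at its critical point (0, 0)
assert classify(((2, 0), (0, -2))) == "saddle point"
```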

Necessary vs. Sufficient Conditions

Necessary Conditions

  • Must hold at optimum

  • Help find candidates

  • Not sufficient for optimality

Example

\(f(x) = x^4\) at \(x=0\):

  • \(f'(0) = 0\) \(\checkmark\)

  • \(f''(0) = 0\) (inconclusive)

  • But \(x=0\) is global minimum!

Sufficient Conditions

  • Guarantee optimality

  • Help verify candidates

  • Often stronger than necessary

Gap Between Necessary and Sufficient

There exist points satisfying necessary conditions but not sufficient conditions that are still optimal.

Introduction to Lagrange Multipliers

Constrained Optimization Setup

Problem (Equality Constrained Problem)

\[\begin{aligned} \min_{x \in \mathbb{R}^n} &\quad f(x) \\ \text{subject to} &\quad g_i(x) = 0, \quad i = 1, 2, \ldots, m \end{aligned}\]

Key Insight

At the optimum, the gradient of the objective has no component along feasible directions (it is orthogonal to the tangent space of the constraint set)

At optimum \(x^*\):

  • \(\nabla f(x^*)\) parallel to \(\nabla g(x^*)\)

  • Both orthogonal to constraint

  • Feasible directions have zero projection on \(\nabla f\)

Lagrange Multiplier Theorem

Theorem: First-Order Necessary Conditions

Let \(x^*\) be a local minimum of \(f(x)\) subject to \(g_i(x) = 0\), \(i = 1, \ldots, m\). If \(\nabla g_1(x^*), \ldots, \nabla g_m(x^*)\) are linearly independent, then there exist multipliers \(\lambda_1^*, \ldots, \lambda_m^*\) such that:

\[\nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla g_i(x^*) = 0\]

Lagrangian Function

\[L(x, \lambda) = f(x) + \sum_{i=1}^m \lambda_i g_i(x)\]

KKT System

\[\begin{aligned} \nabla_x L(x^*, \lambda^*) &= 0 \quad \text{(stationarity)} \\ g_i(x^*) &= 0 \quad \text{(feasibility)} \end{aligned}\]

Geometric Interpretation

  • \(\lambda_i^*\) measures sensitivity of optimal value to constraint \(i\)

  • \(\lambda_i^* = 0\): constraint \(i\) has no local influence on the optimum

  • The sign of \(\lambda_i^*\) indicates in which direction perturbing constraint \(i\) changes the optimal value (it depends on the sign convention in the Lagrangian)

Economic Interpretation

\(\lambda_i^*\) = shadow price of constraint \(i\)

\[\lambda_i^* \approx -\frac{\partial f^*}{\partial b_i}\]

where \(g_i(x) = b_i\) is the perturbed constraint; the minus sign follows from the convention \(L = f + \sum_i \lambda_i g_i\). (In the example below, \(\lambda^* = -1\) while \(\partial f^*/\partial b = 1\).)

Simple Example: Lagrange Multipliers

Example

Minimize \(f(x,y) = x^2 + y^2\) subject to \(g(x,y) = x + y - 1 = 0\)

Solution:

  1. Form Lagrangian: \(L(x,y,\lambda) = x^2 + y^2 + \lambda(x + y - 1)\)

  2. KKT conditions:

    \[\begin{aligned} \frac{\partial L}{\partial x} &= 2x + \lambda = 0 \\ \frac{\partial L}{\partial y} &= 2y + \lambda = 0 \\ \frac{\partial L}{\partial \lambda} &= x + y - 1 = 0 \end{aligned}\]
  3. Solve: the first two (stationarity) equations give \(x = y = -\frac{\lambda}{2}\)

  4. Substitute into the constraint: \(-\frac{\lambda}{2} - \frac{\lambda}{2} - 1 = 0 \Rightarrow \lambda = -1\)

  5. Therefore: \(x^* = y^* = \frac{1}{2}\), \(f^* = \frac{1}{2}\)
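The elimination steps above translate directly into code; this sketch reproduces the worked solution and checks the KKT conditions:

```python
# KKT system of the worked example:
#   2x + lam = 0,  2y + lam = 0,  x + y - 1 = 0

# Stationarity gives x = y = -lam/2; substituting into the
# constraint gives -lam - 1 = 0, so lam = -1.
lam = -1.0
x = y = -lam / 2

assert (x, y, lam) == (0.5, 0.5, -1.0)
assert abs(x + y - 1) < 1e-12        # feasibility
assert abs(2*x + lam) < 1e-12        # stationarity in x
assert abs(2*y + lam) < 1e-12        # stationarity in y

f_star = x**2 + y**2
assert f_star == 0.5
```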

Preview: Inequality Constraints

Problem (General Nonlinear Program)

\[\begin{aligned} \min_{x \in \mathbb{R}^n} &\quad f(x) \\ \text{subject to} &\quad g_i(x) = 0, \quad i \in \mathcal{E} \\ &\quad g_j(x) \leq 0, \quad j \in \mathcal{I} \end{aligned}\]

Karush-Kuhn-Tucker (KKT) Conditions

At optimal point \((x^*, \lambda^*, \mu^*)\):

\[\begin{aligned} \nabla f(x^*) + \sum_{i \in \mathcal{E}} \lambda_i^* \nabla g_i(x^*) + \sum_{j \in \mathcal{I}} \mu_j^* \nabla g_j(x^*) &= 0 \\ g_i(x^*) &= 0, \quad i \in \mathcal{E} \\ g_j(x^*) &\leq 0, \quad j \in \mathcal{I} \\ \mu_j^* &\geq 0, \quad j \in \mathcal{I} \\ \mu_j^* g_j(x^*) &= 0, \quad j \in \mathcal{I} \quad \text{(complementarity)} \end{aligned}\]

Next lecture: Detailed KKT analysis and constraint qualifications

Key Takeaways

  1. Calculus Tools: Gradients, Hessians, Jacobians are fundamental

  2. Taylor Approximations: Foundation for iterative algorithms

  3. Convexity: Guarantees global optimality of local solutions

  4. Optimality Conditions:

    • Necessary: Must hold at optimum (help find candidates)

    • Sufficient: Guarantee optimality (help verify candidates)

  5. Lagrange Multipliers: Handle equality constraints elegantly

Mathematical Foundation Complete

We now have the mathematical tools to tackle general nonlinear optimization problems!

Next lecture: KKT conditions and constraint qualifications

Practice Problems

  1. Compute gradient and Hessian of \(f(x,y,z) = x^2y + e^{yz} + \sin(xz)\)

  2. Show that \(f(x) = \|x\|_2^2\) is convex using second-order conditions

  3. Find all critical points of \(f(x,y) = x^3 - 3xy^2 + y^3\) and classify them

  4. Use Lagrange multipliers to minimize \(f(x,y) = xy\) subject to \(x^2 + y^2 = 1\)

  5. Prove that the intersection of two convex sets is convex