\[\displaystyle{\min_{x \in \mathbb{R}^n} f(x)} \quad \text{or} \quad \displaystyle{\max_{x \in \mathbb{R}^n} f(x)}\]
(For a constrained problem: let the constraint set be $F \subset \mathbb{R}^n$; then $x \in F$. $F$ is called a feasible region.)
\[\displaystyle{\min_{x \in F} f(x)} \quad \text{or} \quad \displaystyle{\max_{x \in F} f(x)} \quad (\text{s.t.} \ x \in F)\]Now a feasible region (set) $F$ can be defined as
\[F = \{ x \in \mathbb{R}^n \ | \ h(x)=0, \ g(x)\leq 0 \}\]Let $x^{\ast} \in \mathbb{R}^n$ be an optimal solution.
 | Local Optimal Solution | Global Optimal Solution |
---|---|---|
Condition | $f(x^{\ast}) \leq f(x)$ | $f(x^{\ast}) \leq f(x)$ |
Region | for some $\epsilon > 0$, $x \in F \cap B(x^{\ast}, \epsilon)$ where $B(x^{\ast}, \epsilon) = \{ x \in \mathbb{R}^n : \| x - x^{\ast} \| < \epsilon \}$ | $x, x^{\ast} \in F$ |
Local optimization finds an optimal solution within the region where the feasible region $F$ intersects a bounded neighborhood $B(x^{\ast}, \epsilon)$. Global optimization finds an optimal solution over the entire feasible region $F$.
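The distinction can be seen numerically. The sketch below (the function $f(x) = x^4 - 3x^2 + x$ and the step size are illustrative choices, not from the text) runs plain gradient descent from two starting points; each run stops at the nearest local minimum, and only one of them is the global minimum.

```python
# Sketch: local vs. global minima of a 1-D function with two minima.
# Which minimum a local method finds depends on the starting point.

def f(x):
    return x**4 - 3*x**2 + x

def grad_descent(x, step=0.01, iters=2000):
    for _ in range(iters):
        g = 4*x**3 - 6*x + 1      # f'(x)
        x -= step * g
    return x

left = grad_descent(-2.0)    # converges to the minimum near x = -1.30 (global)
right = grad_descent(2.0)    # converges to the minimum near x = 1.13 (local only)
print(left, right, f(left) < f(right))
```

Both points satisfy the local condition $f(x^{\ast}) \leq f(x)$ in a neighborhood, but only the left one satisfies it over all of $F = \mathbb{R}$.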
Let $L(x, \lambda)$ be a Lagrange function.
\[L(x, \lambda) = f(x) + \lambda h(x)\]
Without Constraints
Condition | 1D | Vector |
---|---|---|
1st order Stationary | $\frac{\partial f}{\partial x}(x^{\ast})=0$ | $\nabla_x f(x^{\ast})=0$ |
2nd order Necessary | $\frac{\partial f}{\partial x}(x^{\ast})=0$, $\frac{\partial^2 f}{\partial x^2}(x^{\ast})\geq 0$ | $\nabla_x f(x^{\ast})=0$, $t^T \nabla_x^2 f(x^{\ast})t\geq 0$ for all $t \in \mathbb{R}^n$ |
2nd order Sufficient | $\frac{\partial f}{\partial x}(x^{\ast})=0$, $\frac{\partial^2 f}{\partial x^2}(x^{\ast}) > 0$ | $\nabla_x f(x^{\ast})=0$, $t^T \nabla_x^2 f(x^{\ast})t> 0$ for all $t \in \mathbb{R}^n$, $t \neq 0$ |
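These conditions can be checked numerically. A minimal sketch for the vector case, using the quadratic $f(x) = \frac{1}{2}x^T A x - b^T x$ (the matrix $A$ and vector $b$ are arbitrary illustrative data): the stationary point solves $Ax = b$, and the 2nd-order sufficient condition $t^T \nabla_x^2 f\, t > 0$ is equivalent to all eigenvalues of the Hessian $A$ being positive.

```python
# Sketch: verifying 1st- and 2nd-order conditions for a quadratic objective.
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric Hessian of f
b = np.array([1.0, 1.0])

x_star = np.linalg.solve(A, b)           # 1st-order: grad f = A x - b = 0
grad = A @ x_star - b
hess_eigs = np.linalg.eigvalsh(A)        # eigenvalues of the Hessian

print(np.allclose(grad, 0))              # 1st-order stationary condition
print(bool(np.all(hess_eigs > 0)))       # 2nd-order sufficient condition
```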
With Constraints
Condition | Equality Constraint | Inequality Constraint |
---|---|---|
1st order Stationary | $\nabla_x L(x^{\ast}, \lambda^{\ast})=0$ $\nabla_{\lambda} L(x^{\ast}, \lambda^{\ast})=h(x^{\ast})=0$ where $\lambda^{\ast} \in \mathbb{R}^r$ | $\nabla_x L(x^{\ast}, \mu^{\ast})=0$ $\mu_j^{\ast}g_j(x^{\ast})=0$ $\mu_j^{\ast} \geq 0, \ g_j(x^{\ast})\leq 0$ where $(j=1, \dots, m)$ |
2nd order Necessary | $t^T \nabla_x^2 L(x^{\ast}, \lambda^{\ast})t\geq 0$ where $\forall t \in M(x^{\ast})$ | $t^T \nabla_x^2 L(x^{\ast}, \mu^{\ast})t\geq 0$ where $\forall t \in M(x^{\ast})$ |
2nd order Sufficient | $t^T \nabla_x^2 L(x^{\ast}, \lambda^{\ast})t> 0$ where $\forall t \in M(x^{\ast}) \ \& \ t \neq 0$ | $t^T \nabla_x^2 L(x^{\ast}, \mu^{\ast})t> 0$ where $\forall t \in M(x^{\ast}) \ \& \ t \neq 0$ |
where $M(x^{\ast})=\{ t \in \mathbb{R}^n \mid \nabla_x h_i(x^{\ast})^T t = 0, \ i=1, \dots, r \}$ for equality constraints and $M(x^{\ast})=\{ t \in \mathbb{R}^n \mid \nabla_x g_j(x^{\ast})^T t = 0 \ \text{for active } j, \text{ i.e. } g_j(x^{\ast})=0 \}$ for inequality constraints. The direction $t$ is restricted to the set $M$ because optimization only needs to be considered along directions that keep the constraints satisfied.
$\mu_j^{\ast} g_j(x^{\ast})=0$ is called the *Complementarity Condition*. It holds at an optimal solution $x^{\ast}$; in other words, you can also try to minimize the complementarity gap. The interior point method exploits this characteristic to solve a linear program in polynomial time.
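For equality constraints, the 1st-order stationarity conditions can be solved directly as a linear system. A minimal sketch on an illustrative problem (not from the text): minimize $x_1^2 + x_2^2$ subject to $h(x) = x_1 + x_2 - 1 = 0$. Then $\nabla_x L = 2x + \lambda \nabla h = 0$ together with $h(x) = 0$ gives a $3 \times 3$ system.

```python
# Sketch: solving the stationarity conditions of an equality-constrained
# problem as one linear system in (x1, x2, lambda).
import numpy as np

# Rows: 2*x1 + lambda = 0, 2*x2 + lambda = 0, x1 + x2 = 1
K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, 1.0])

x1, x2, lam = np.linalg.solve(K, rhs)
print(x1, x2, lam)   # x* = (0.5, 0.5), lambda* = -1
```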
Linear independence constraint qualification:
The $\nabla h_i(x)$ are linearly independent, so the Jacobian $J_h(x)$ has full rank.
Similarly, the $\nabla g_j(x)$ of the active constraints are linearly independent of each other.
TODO: Why Linear Independence?
TODO: Proof why $x^{\ast}$ is the local minimum
Why does it work?
Here, the barrier parameter quantifies how much weight is put on the barrier term relative to the original cost (objective). As it gets small, the barrier problem gets very close to the original problem. However, if it gets large, the solution only cares about staying inside the feasible region, and it gets close to the "analytic center" of the polyhedron.
Vanderbei, Linear Programming, p.272 [1]
We need to say how to pick μ. If μ is chosen to be too large, then the sequence could converge to the analytic center of the feasible set, which is not our intention. If, on the other hand, μ is chosen to be too small, then the sequence could stray too far from the central path and the algorithm could jam into the boundary of the feasible set at a place that is suboptimal.
[2]
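This trade-off can be seen in a minimal log-barrier sketch. The problem below is illustrative, not from the text: minimize $x$ on $[0, 1]$ via the barrier function $\phi_\mu(x) = x - \mu(\log x + \log(1-x))$. Setting $\phi_\mu'(x) = 0$ gives the quadratic $x^2 - (1+2\mu)x + \mu = 0$, so the barrier minimizer has a closed form.

```python
# Sketch: how the barrier parameter mu moves the minimizer between the
# analytic center and the true constrained optimum.
import math

def barrier_minimizer(mu):
    # smaller root of x^2 - (1 + 2*mu)*x + mu = 0, which lies in (0, 1)
    b = 1 + 2 * mu
    return (b - math.sqrt(b * b - 4 * mu)) / 2

for mu in [10.0, 1.0, 0.1, 0.001]:
    print(mu, barrier_minimizer(mu))
# Large mu -> x near 0.5, the analytic center of [0, 1];
# small mu -> x near 0, the true constrained minimum.
```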
 | Equation |
---|---|
Lagrange Function | $L(x, \lambda, \mu) = f(x) + \lambda h(x) + \mu g(x)$ |
1st Order Stationary Condition | $\nabla_x L(x^{\ast}, \lambda^{\ast}, \mu^{\ast}) = \nabla_x f(x^{\ast}) + \sum_i{\lambda_i^{\ast} \nabla_x h_i(x^{\ast})} + \sum_j{\mu_j^{\ast} \nabla_x g_j(x^{\ast})} = 0$ $\nabla_{\lambda} L(x^{\ast}, \lambda^{\ast}, \mu^{\ast}) = h_i(x^{\ast}) = 0 \quad (i = 1, \dots, r)$ $\mu_j^{\ast} g_j(x^{\ast}) = 0 \quad (j = 1, \dots, m)$ $\mu_j^{\ast} \geq 0 \quad (j = 1, \dots, m)$ $g_j(x^{\ast}) \leq 0 \quad (j = 1, \dots, m)$ |
2nd Order Necessary Condition | $t^T \nabla_x^2 L(x^{\ast}, \lambda^{\ast}, \mu^{\ast}) t \geq 0$ for all $t \in M(x^{\ast})$, where $M(x^{\ast}) = \{ t \in \mathbb{R}^n \mid \nabla_x g_j(x^{\ast})^T t \leq 0 \ (j=1,\dots,m), \ \nabla_x h_i(x^{\ast})^T t = 0 \ (i=1,\dots, r) \}$ |
2nd Order Sufficient Condition | $t^T \nabla_x^2 L(x^{\ast}, \lambda^{\ast}, \mu^{\ast}) t > 0$ for all $t \in M(x^{\ast})$, $t \neq 0$ |
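The full set of 1st-order (KKT) conditions can be verified at a candidate point. A minimal sketch on an illustrative problem (not from the text): minimize $(x_1-2)^2 + (x_2-2)^2$ subject to $g(x) = x_1 + x_2 - 2 \leq 0$. The unconstrained minimum $(2, 2)$ is infeasible, so the constraint is active at the optimum $x^{\ast} = (1, 1)$ with $\mu^{\ast} = 2$.

```python
# Sketch: checking stationarity, complementarity, and feasibility at a
# candidate KKT point.
import numpy as np

x = np.array([1.0, 1.0])   # candidate optimum
mu = 2.0                   # candidate multiplier

grad_f = 2 * (x - 2)                 # gradient of the objective
grad_g = np.array([1.0, 1.0])        # gradient of the constraint
g = x.sum() - 2                      # constraint value (active: g = 0)

print(np.allclose(grad_f + mu * grad_g, 0))  # stationarity
print(abs(mu * g) < 1e-12)                   # complementarity
print(mu >= 0 and g <= 0)                    # dual / primal feasibility
```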
A set $S \subseteq \mathbb{R}^n$ is a convex set when, for any $x, y \in S$ and any $\alpha \in [0, 1]$, the relation below always holds.
\[\alpha x + (1-\alpha) y \in S\]A function $f(x)$ is a convex function over a convex set $C \subseteq \mathbb{R}^n$ when, for any $x, y \in C$, the inequality below always holds.
\[f(\alpha x + (1-\alpha)y) \leq \alpha f(x) + (1-\alpha) f(y) \quad \forall{\alpha} \in [0, 1]\]A strictly convex function satisfies:
\[f(\alpha x + (1-\alpha)y) < \alpha f(x) + (1-\alpha) f(y) \quad \forall{\alpha} \in (0, 1), \ x \neq y\]
 | Primal Problem | Dual Problem |
---|---|---|
 | $\max c^T x$ $\text{s.t.} \ Ax \leq b$, $x \geq 0$ | $\min b^T y$ $\text{s.t.} \ A^T y \geq c$, $y \geq 0$ |
slack | $Ax + w = b$ $x, w \geq 0$ | $A^T y - v = c$ $y, v \geq 0$ |
Weak Duality: $c^T x \leq b^T y$ for any primal-feasible $x$ and dual-feasible $y$.
Strong Duality: if the primal has an optimal solution $x^{\ast}$, the dual has an optimal solution $y^{\ast}$ and the optimal values coincide.
Thus, $c^T x^{\ast} = b^T y^{\ast}$.
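Strong duality can be checked numerically on a small LP. The sketch below assumes SciPy is available; `scipy.optimize.linprog` minimizes, so the primal max is solved as the min of $-c^T x$, and the dual's $\geq$ constraints are negated into $\leq$ form. The data $c$, $A$, $b$ are illustrative.

```python
# Sketch: primal and dual of a small LP reach the same optimal value.
import numpy as np
from scipy.optimize import linprog

c = np.array([3.0, 2.0])
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([4.0, 6.0])

# Primal: max c^T x s.t. Ax <= b, x >= 0  (solved as min -c^T x)
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)
# Dual: min b^T y s.t. A^T y >= c, y >= 0  (>= negated into <=)
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2)

print(-primal.fun, dual.fun)   # both equal the common optimal value
```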
Proof: a vertex is the intersection of $n$ faces of the polyhedron, so $n$ of the inequalities must hold with equality, and the point must also lie in the feasible region.
By setting the $n$ nonbasic variables to 0 (say, the $\omega$'s), you make $n$ of the inequalities tight: $Ax - b = 0$ for those rows. Every feasible dictionary corresponds to a vertex of the polyhedron, since a dictionary does not exist when it violates the inequality constraints; setting all the nonbasic variables to 0 yields such a vertex while satisfying the inequality constraints.
Proof: a pivot moves to an adjacent vertex.
One entering variable becomes a basic variable, which means only one face changes while $n-1$ faces remain in common.
Apply a Taylor expansion to the function $f(x): \mathbb{R}^n \to \mathbb{R}$ where $x \in \mathbb{R}^n$.
\[f(x_0 + \Delta x) = f(x_0) + \nabla_x f(x_0)\Delta x + o(\|\Delta x\|)\]Since $\nabla_x f(x_0)\Delta x$ is generally nonzero and we want it to be negative, setting $\Delta x = -\alpha \nabla_x f(x_0)^T$ with a step size $\alpha > 0$ is called the gradient method.
However, this does not ensure that the step heads straight to the minimum point. The direction of the gradient is orthogonal to the tangent of the contour of $f(x)$. If the contours are spheres, the gradient points directly at the minimum point; but the contours can be ellipsoids, which is why the conjugate gradient method was proposed.
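The ellipsoidal-contour behavior can be seen in a minimal sketch. The quadratic $f(x) = \frac{1}{2}x^T A x$ with a diagonal, ill-conditioned $A$ and the step size $\alpha$ are illustrative choices; the negative gradient does not point at the origin, yet the iteration still converges for a small enough step.

```python
# Sketch: the gradient method Delta x = -alpha * grad f(x) on a quadratic
# whose contours are ellipses.
import numpy as np

A = np.diag([1.0, 10.0])       # ill-conditioned Hessian: elliptical contours

def grad(x):
    return A @ x               # gradient of f(x) = 1/2 x^T A x

x = np.array([10.0, 1.0])
alpha = 0.09                   # stable: alpha < 2 / lambda_max = 0.2
for _ in range(500):
    x = x - alpha * grad(x)

print(x)   # approaches the minimizer at the origin
```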
Let $F(x) = \frac{\partial f}{\partial x}(x)$ and apply a Taylor expansion to this gradient vector around $x_0$, setting it to zero.
\(F(x_0 + \Delta x) = F(x_0) + \frac{\partial F}{\partial x}(x_0) \Delta x +o(\| \Delta x\|) = 0\) Then
\[\Delta x = - \Big( \frac{\partial F}{\partial x}(x_0) \Big)^{-1} F(x_0) \\ \Delta x = - \Big( \frac{\partial^2 f}{\partial x^2}(x_0) \Big)^{-1} \frac{\partial f}{\partial x}(x_0)^T\]
Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method. TODO.
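In one variable the update above reduces to $x \leftarrow x - f'(x)/f''(x)$, i.e. Newton's method. A minimal sketch, using the illustrative objective $f(x) = e^x - 2x$ (not from the text), whose minimum is at $x = \ln 2$:

```python
# Sketch: Newton's method in one variable, the scalar case of
# Delta x = -(d^2 f / dx^2)^{-1} * df/dx.
import math

x = 0.0
for _ in range(20):
    fp = math.exp(x) - 2        # f'(x)
    fpp = math.exp(x)           # f''(x), positive everywhere
    x -= fp / fpp

print(x, math.log(2))           # the iterate converges to ln 2
```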
Non-negative integer set: $\mathbb{Z}_{\geq 0}$
$a \leq x \leq b$: $[a,b]$
$a < x < b$: $(a, b)$
n dim real number vector: $\mathbb{R}^n$
n x m dim real number matrix: $\mathbb{R}^{n \times m}$
$x = [x_1, …, x_n]^T \in \mathbb{R}^n$
positive semidefinite: $x^TAx \geq 0$
positive definite: $x^TAx > 0$ if $x \neq 0$
gradient vector: $\nabla_x f(x) = \frac{\partial f}{\partial x} = [\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}]$
Hessian: $\nabla_x^2 f(x) = \Big[ \frac{\partial^2 f}{\partial x_i \partial x_j} \Big]$
Jacobian: if $f(x) = [f_1(x), \dots, f_m(x)]^T$ and $x = [x_1, \dots, x_n]^T$, then $J_f(x) = \Big[ \frac{\partial f_i}{\partial x_j} \Big] \in \mathbb{R}^{m \times n}$
Before going to the main topic / equation, it is important to define a region, a variable and a function.