**SVMs** adapt the __Maximal Margin Classifier__ to linearly non-separable data using slack variables and kernel functions: __soft margin classification__ handles noisy linear data, and the __kernel trick__ handles non-linear cases.

**Objective Function**: Minimize the following objective to find the optimal hyperplane

\(\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i\)

Subject to \(y_i(w^{T}x_i + b) \geq 1 - \xi_i\) and \(\xi_i \geq 0\) for all \(i\), where:

\(w\) is the weight vector (normal to the hyperplane)

\(b\) is the bias

\(\xi_i\) are slack variables representing the degree of misclassification of \(x_i\)

\(C\) is the regularization parameter controlling the trade-off between margin maximization and classification error.
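Once \(w\) and \(b\) are fixed, the tightest feasible slacks follow directly from the constraints: \(\xi_i = \max(0,\, 1 - y_i(w^{T}x_i + b))\), i.e., the hinge loss. A minimal NumPy sketch (the toy data and candidate hyperplane are illustrative, not an optimal solution) evaluates the primal objective:

```python
import numpy as np

# Toy 2-D data, two points per class (illustrative values).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# A candidate hyperplane; not necessarily the optimum.
w = np.array([0.5, 0.5])
b = 0.0
C = 1.0

# Given (w, b), the tightest feasible slacks are the hinge losses:
# xi_i = max(0, 1 - y_i (w^T x_i + b)).
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))

# Primal objective: (1/2)||w||^2 + C * sum_i xi_i
objective = 0.5 * w @ w + C * xi.sum()
print(objective)  # → 0.25 (every margin is >= 1 here, so only the norm term remains)
```

Because every point satisfies its margin constraint, all slacks are zero and the objective reduces to \(\frac{1}{2}\|w\|^2\).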

**Dual Problem Solution**: The SVM optimization problem in its dual form allows the incorporation of kernel functions:

\(\max_{\alpha} \; \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i \alpha_j y_i y_j K(x_i, x_j)\)

Subject to \(0 \leq \alpha_i \leq C\) for all \(i\) and \(\sum_{i=1}^{n}\alpha_iy_i = 0\), where:

\(\alpha_i\) are the Lagrange multipliers, one per training point

\(K(x_i, x_j)\) is the kernel function evaluating the dot product of \(x_i\) and \(x_j\) in the transformed feature space.
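For intuition, a two-point dual can be solved by brute force. With \(x_1 = (1, 0),\, y_1 = +1\) and \(x_2 = (-1, 0),\, y_2 = -1\) and a linear kernel, the equality constraint forces \(\alpha_1 = \alpha_2 = \alpha\), the objective reduces to \(2\alpha - 2\alpha^2\), and the maximum is at \(\alpha = 1/2\). A grid-search sketch (the data is illustrative) confirms this and recovers \(w = \sum_i \alpha_i y_i x_i\):

```python
import numpy as np

X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
C = 1.0

K = X @ X.T  # linear-kernel Gram matrix

def dual_objective(alpha):
    # sum_i alpha_i - (1/2) sum_ij alpha_i alpha_j y_i y_j K_ij
    return alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y)

# The constraint sum_i alpha_i y_i = 0 forces alpha_1 = alpha_2 = a here,
# so search over the single free parameter a in [0, C].
grid = np.linspace(0.0, C, 1001)
values = [dual_objective(np.array([a, a])) for a in grid]
best_a = grid[int(np.argmax(values))]

# Recover the primal weight vector: w = sum_i alpha_i y_i x_i
alpha = np.array([best_a, best_a])
w = (alpha * y) @ X
print(best_a, w)  # → 0.5 [1. 0.]
```

The recovered hyperplane \(w = (1, 0),\, b = 0\) puts both points exactly on the margin boundaries \(w^{T}x + b = \pm 1\), as expected.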

**Slack Variables** \(\xi_i\): Allow for flexibility in classification by permitting data points to be within the margin or incorrectly classified, i.e., the soft margin approach.

**Regularization Parameter** \(C\): Balances the trade-off between achieving a wide margin and minimizing the classification error; higher \(C\) values lead to less tolerance for misclassification.
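The trade-off can be seen numerically by comparing two candidate 1-D hyperplanes on data with one outlier: a wide-margin solution that pays slack for the outlier, and a narrower-margin one that satisfies every constraint. Which candidate has the lower primal objective flips with \(C\). A sketch with hand-picked values:

```python
import numpy as np

# 1-D toy data: y = +1 at x = 2 and at the outlier x = -0.5; y = -1 at x = -2.
X = np.array([2.0, -0.5, -2.0])
y = np.array([1.0, 1.0, -1.0])

def primal_objective(w, b, C):
    # (1/2)w^2 + C * sum_i max(0, 1 - y_i (w x_i + b))
    xi = np.maximum(0.0, 1.0 - y * (w * X + b))
    return 0.5 * w**2 + C * xi.sum()

# Candidate A: wide margin (small |w|); the outlier incurs slack 1.5.
# Candidate B: narrower margin (larger |w|) that satisfies all constraints.
wide = (1.0, 0.0)
narrow = (4.0 / 3.0, 5.0 / 3.0)

low_C_prefers_wide = primal_objective(*wide, C=0.1) < primal_objective(*narrow, C=0.1)
high_C_prefers_narrow = primal_objective(*narrow, C=10.0) < primal_objective(*wide, C=10.0)
print(low_C_prefers_wide, high_C_prefers_narrow)  # → True True
```

With \(C = 0.1\) the slack is cheap and the wide margin wins; with \(C = 10\) the slack penalty dominates and the narrow, violation-free margin wins.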

**Kernel Functions**: Transform the original feature space into a higher-dimensional space, enabling SVMs to find a separating hyperplane in cases where data is not linearly separable. Common kernels include linear, polynomial, RBF, and sigmoid.
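Each of the common kernels is a simple closed-form expression; a NumPy sketch (the hyperparameter values `degree`, `coef0`, and `gamma` are illustrative defaults):

```python
import numpy as np

def linear_kernel(x, z):
    return float(np.dot(x, z))

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    return float((np.dot(x, z) + coef0) ** degree)

def rbf_kernel(x, z, gamma=0.5):
    return float(np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(z)) ** 2)))

def sigmoid_kernel(x, z, gamma=0.1, coef0=0.0):
    return float(np.tanh(gamma * np.dot(x, z) + coef0))

x, z = np.array([1.0, 2.0]), np.array([3.0, 4.0])
print(linear_kernel(x, z))      # → 11.0
print(polynomial_kernel(x, z))  # → 144.0  ((11 + 1)^2)
print(rbf_kernel(x, x))         # → 1.0   (RBF of any point with itself)
```

Each function returns \(K(x, z) = \phi(x)^{T}\phi(z)\) for some implicit feature map \(\phi\), which is all the dual formulation needs.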

**Dual Formulation**: Simplifies the problem by focusing on Lagrange multipliers, allowing the use of kernel functions and making the problem solvable even when the feature space is high-dimensional or infinite.

**Support Vectors**: Data points corresponding to non-zero \(\alpha_i\) values; these are the critical elements that define the hyperplane and margin.

**Decision Function**: For a new data point \(x\), the decision function becomes \(\text{sign}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right)\), determining the class membership based on the sign of the output.
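This is the quantity scikit-learn's `SVC` exposes: `dual_coef_` holds \(\alpha_i y_i\) for the support vectors, so the decision value can be reconstructed by hand from `support_vectors_` and `intercept_`. A sketch on toy data, with `gamma` fixed explicitly so the manual RBF computation matches the fitted kernel:

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated Gaussian blobs (illustrative data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (20, 2)), rng.normal(2.0, 1.0, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

# Manual decision function: sum_i alpha_i y_i K(x_i, x) + b over support vectors.
sq_dists = ((X[:, None, :] - clf.support_vectors_[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-0.5 * sq_dists)               # RBF kernel against each support vector
manual = K @ clf.dual_coef_.ravel() + clf.intercept_[0]

assert np.allclose(manual, clf.decision_function(X))
pred = np.sign(manual)  # class membership from the sign of the decision value
```

Note that the sum only ever runs over the support vectors: every non-support point has \(\alpha_i = 0\) and contributes nothing, which is why the fitted model stores only `support_vectors_`.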