Summary table of models + methods

Introduction

Throughout the course we will go over a number of supervised and unsupervised machine learning models and methods. This page summarizes each one's strengths, limitations, and example use cases, with a short illustrative code sketch after each table.

Imputation methods

Mean/median imputation (simple fill)
  • Strengths: simple and fast; works well with small datasets
  • Limitations: may not handle complex data relationships; sensitive to outliers
  • Example use cases: basic data analysis; quick data cleaning

K-nearest neighbors (KNN) imputation
  • Strengths: can capture the relationships between features; works well with moderately missing data
  • Limitations: computationally intensive for large datasets; sensitive to the choice of k
  • Example use cases: medical data analysis; market research

SoftImpute
  • Strengths: effective for matrix completion in large datasets; works well with low-rank data
  • Limitations: assumes a low-rank data structure; can be sensitive to hyperparameters
  • Example use cases: recommender systems; large-scale data projects

Iterative imputation (MICE-style)
  • Strengths: can model complex relationships; suitable for multiple imputation
  • Limitations: computationally expensive; depends on the choice of model
  • Example use cases: complex datasets with multiple types of missing data

Iterative SVD
  • Strengths: good for matrix completion under a low-rank assumption; handles larger datasets
  • Limitations: sensitive to rank selection; computationally demanding
  • Example use cases: image and video data processing; large datasets with structure

Matrix factorization
  • Strengths: useful for recommendation systems; can handle large-scale problems
  • Limitations: requires careful tuning; not suitable for all types of data
  • Example use cases: recommendation engines; user preference analysis

Nuclear norm minimization
  • Strengths: theoretically strong for matrix completion; finds the lowest-rank solution
  • Limitations: very computationally intensive; impractical for very large datasets
  • Example use cases: research in theoretical data completion; small to medium datasets

BiScaler (row/column normalization)
  • Strengths: normalizes data effectively; often used as a preprocessing step
  • Limitations: not an imputation method itself; doesn't always converge
  • Example use cases: preprocessing for other imputation methods; data normalization
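As a rough implementation sketch (not reference code from the course), the snippet below uses scikit-learn's imputers to demonstrate the first three approaches in the table: simple mean fill, KNN imputation, and MICE-style iterative imputation. The toy matrix is made up; the matrix-completion entries (SoftImpute, iterative SVD, nuclear norm minimization, BiScaler) have open-source implementations in packages such as fancyimpute.

```python
# A minimal sketch comparing three common imputation strategies on a toy
# matrix with missing entries; the data here is made up for illustration.
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, 6.0],
              [7.0, 8.0, 9.0],
              [np.nan, 5.0, 4.0]])

# Mean fill: fast baseline, but ignores relationships between features.
print(SimpleImputer(strategy="mean").fit_transform(X))

# KNN fill: uses the k most similar rows; sensitive to the choice of k.
print(KNNImputer(n_neighbors=2).fit_transform(X))

# Iterative (MICE-style) fill: models each feature from the others.
print(IterativeImputer(random_state=0).fit_transform(X))
```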
Classification models

Logistic regression
  • Strengths: simple and interpretable; fast to train
  • Limitations: assumes linear decision boundaries; not suitable for complex relationships
  • Example use cases: credit approval; medical diagnosis

Decision tree
  • Strengths: intuitive; can model non-linear relationships
  • Limitations: prone to overfitting; sensitive to small changes in the data
  • Example use cases: customer segmentation; loan default prediction

Random forest
  • Strengths: handles overfitting better than a single tree; can model complex relationships
  • Limitations: slower to train and predict; black-box model
  • Example use cases: fraud detection; stock price movement prediction

Support vector machine (SVM)
  • Strengths: effective in high-dimensional spaces; works well when classes have a clear margin of separation
  • Limitations: sensitive to the choice of kernel; slow on large datasets
  • Example use cases: image classification; handwriting recognition

K-nearest neighbors (KNN)
  • Strengths: simple and intuitive; no training phase
  • Limitations: slow at query time; sensitive to irrelevant features and feature scale
  • Example use cases: product recommendation; document classification

Neural network (multilayer perceptron)
  • Strengths: capable of approximating complex functions; flexible architecture; trainable with backpropagation
  • Limitations: can require a large number of parameters; prone to overfitting on small data; training can be slow
  • Example use cases: pattern recognition; basic image classification; function approximation

Deep learning
  • Strengths: can model highly complex relationships; excels with vast amounts of data; state-of-the-art results in many domains
  • Limitations: requires a lot of data; computationally intensive; hard to interpret
  • Example use cases: advanced image and speech recognition; machine translation; game playing (e.g., AlphaGo)

Naive Bayes
  • Strengths: fast; works well with large feature sets
  • Limitations: assumes feature independence; basic variants are not suited to numerical input features
  • Example use cases: spam detection; sentiment analysis

Gradient boosting machines
  • Strengths: high performance; handles non-linear relationships
  • Limitations: prone to overfitting if not tuned; slow to train
  • Example use cases: web search ranking; ecology predictions

Rule-based classification
  • Strengths: transparent and explainable; easily updated and modified
  • Limitations: manual rule creation can be tedious; may not capture complex relationships
  • Example use cases: expert systems; business rule enforcement

Bagging
  • Strengths: reduces variance; parallelizable
  • Limitations: does little to reduce bias
  • Examples: random forest is a popular bagging method

Boosting
  • Strengths: reduces bias; combines weak learners into a strong one
  • Limitations: sensitive to noisy data and outliers
  • Examples: AdaBoost; gradient boosting

XGBoost
  • Strengths: scalable and efficient; built-in regularization
  • Limitations: requires careful tuning; can overfit if not used correctly
  • Example use cases: Kaggle competitions; retail prediction

Linear discriminant analysis (LDA)
  • Strengths: doubles as dimensionality reduction; simple and interpretable
  • Limitations: assumes Gaussian-distributed data and equal class covariances
  • Example use cases: face recognition; marketing segmentation

Regularized models (ridge/lasso)
  • Strengths: prevent overfitting; handle collinearity
  • Limitations: require parameter tuning; may reduce interpretability
  • Examples: ridge and lasso regression

Stacking
  • Strengths: combines multiple models; can improve accuracy
  • Limitations: increases model complexity; risk of overfitting if base models are correlated
  • Example use cases: meta-modeling; Kaggle competitions
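To make these trade-offs concrete, here is a minimal scikit-learn sketch (synthetic data, default hyperparameters, so an illustration rather than a benchmark) that trains several of the classifiers above on the same split:

```python
# Fit a few of the classifiers above on one synthetic dataset and
# compare held-out accuracy; defaults only, no tuning.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "SVM (RBF kernel)": SVC(),
    "naive Bayes": GaussianNB(),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    score = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: test accuracy = {score:.3f}")
```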
Regression models

Linear regression
  • Strengths: simple and interpretable
  • Limitations: assumes a linear relationship; sensitive to outliers
  • Example use cases: sales forecasting; risk assessment

Polynomial regression
  • Strengths: can model non-linear relationships
  • Limitations: can overfit with high polynomial degrees
  • Example use cases: growth prediction; non-linear trend modeling

Ridge regression
  • Strengths: prevents overfitting; regularizes the model
  • Limitations: does not perform feature selection
  • Example use cases: high-dimensional data; preventing overfitting

Lasso regression
  • Strengths: performs feature selection; regularizes the model
  • Limitations: may exclude useful variables
  • Example use cases: feature selection; high-dimensional datasets

Elastic net
  • Strengths: balances the ridge and lasso penalties
  • Limitations: requires tuning of the mixing parameter
  • Example use cases: high-dimensional datasets with correlated features

Quantile regression
  • Strengths: models the median or other quantiles rather than the mean
  • Limitations: less interpretable than ordinary regression
  • Example use cases: median house price prediction; financial quantile modeling

Support vector regression (SVR)
  • Strengths: flexible; can handle non-linear relationships
  • Limitations: sensitive to the choice of kernel and hyperparameters
  • Example use cases: stock price prediction; non-linear trend modeling

Decision tree regression
  • Strengths: handles non-linear data; interpretable
  • Limitations: can overfit on noisy data
  • Example use cases: price prediction; quality assessment

Random forest regression
  • Strengths: handles large datasets; reduces overfitting
  • Limitations: requires more computational resources
  • Example use cases: large datasets; environmental modeling

Gradient boosting regression
  • Strengths: high performance; can handle non-linear relationships
  • Limitations: prone to overfitting if not tuned
  • Example use cases: web search ranking; price prediction
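A minimal sketch of the linear family above, assuming scikit-learn and synthetic data: ordinary least squares against the three regularized variants, scored by cross-validated R².

```python
# Compare OLS with ridge (L2), lasso (L1), and elastic net (both)
# on one synthetic regression problem; alphas are arbitrary.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

for model in (LinearRegression(),
              Ridge(alpha=1.0),        # L2 penalty: shrinks coefficients
              Lasso(alpha=1.0),        # L1 penalty: can zero coefficients out
              ElasticNet(alpha=1.0, l1_ratio=0.5)):  # mix of both penalties
    r2 = cross_val_score(model, X, y, cv=5).mean()
    print(f"{type(model).__name__}: mean CV R^2 = {r2:.3f}")
```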
Clustering models

K-means
  • Strengths: simple and widely used; fast on large datasets
  • Limitations: sensitive to initial conditions; requires specifying the number of clusters
  • Example use cases: market segmentation; image compression

Hierarchical clustering
  • Strengths: doesn't require specifying the number of clusters; produces a dendrogram
  • Limitations: may be computationally expensive for large datasets
  • Example use cases: taxonomies; determining evolutionary relationships

DBSCAN
  • Strengths: can find arbitrarily shaped clusters; doesn't require specifying the number of clusters
  • Limitations: sensitive to feature scale; requires density parameters to be set
  • Example use cases: noise and anomaly detection

Agglomerative clustering
  • Strengths: offers a variety of linkage criteria; produces a hierarchy of clusters
  • Limitations: not scalable to very large datasets
  • Example use cases: sociological hierarchies; taxonomies

Mean shift
  • Strengths: no need to specify the number of clusters; can find arbitrarily shaped clusters
  • Limitations: computationally expensive; bandwidth parameter selection is crucial
  • Example use cases: image analysis; computer vision tasks

Affinity propagation
  • Strengths: automatically determines the number of clusters; good for data with many exemplars
  • Limitations: high computational complexity; the preference parameter can be difficult to choose
  • Example use cases: image recognition; data with many similar exemplars

Spectral clustering
  • Strengths: can capture complex cluster structures; can be used with various affinity matrices
  • Limitations: the choice of affinity matrix is crucial; can be computationally expensive
  • Example use cases: image and speech processing; graph-based clustering
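As an illustrative sketch (scikit-learn, toy two-moons data), the snippet below contrasts a centroid-based method with a density-based one; the eps and min_samples values are arbitrary choices for this toy data, not recommendations.

```python
# Contrast k-means (centroid-based, needs k, assumes convex clusters)
# with DBSCAN (density-based, infers k, finds arbitrary shapes).
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# k-means needs the number of clusters up front.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN infers the number of clusters but needs density parameters.
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

print("k-means clusters:", set(km_labels))
print("DBSCAN clusters (-1 = noise):", set(db_labels))
```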
Dimensionality reduction and related unsupervised methods

Principal component analysis (PCA)
  • Strengths: reduces dimensionality; preserves variance
  • Limitations: linear method; not suited to categorical data
  • Example use cases: feature extraction; data compression

t-SNE
  • Strengths: captures non-linear structure; good for visualization
  • Limitations: computationally expensive; struggles with very high-dimensional input
  • Example use cases: data visualization; exploratory analysis

Autoencoders
  • Strengths: reduce dimensionality; capture non-linear relationships
  • Limitations: require neural network expertise; computationally intensive
  • Example use cases: feature learning; noise reduction

Isolation forest
  • Strengths: effective for high-dimensional data; fast and scalable
  • Limitations: randomized, so results can vary between runs; may miss some anomalies
  • Example use cases: fraud detection; network security

Singular value decomposition (SVD)
  • Strengths: principled matrix factorization; efficient for large datasets
  • Limitations: assumes linear relationships; sensitive to scaling
  • Example use cases: recommender systems; latent semantic analysis

Independent component analysis (ICA)
  • Strengths: identifies statistically independent components; well suited to signal separation
  • Limitations: requires components to be non-Gaussian; sensitive to noise
  • Example use cases: blind signal separation; feature extraction
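A minimal sketch of three of the methods above on scikit-learn's digits data: PCA as the fast linear baseline, t-SNE for a 2-D visualization embedding, and isolation forest for anomaly flagging.

```python
# Linear reduction (PCA) versus non-linear embedding (t-SNE), plus
# Isolation Forest anomaly scoring, all on the 64-dimensional digits data.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.ensemble import IsolationForest

X, _ = load_digits(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)    # fast, linear, variance-preserving
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)  # slow, for plots

# Isolation Forest labels easy-to-isolate points as anomalies (-1).
labels = IsolationForest(random_state=0).fit_predict(X)
print(X_pca.shape, X_tsne.shape, (labels == -1).sum(), "points flagged as anomalies")
```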
Frequent pattern and association rule mining

Apriori
  • Strengths: well known and widely used; easy to understand and implement
  • Limitations: can be slow on large datasets; generates a large number of candidate itemsets
  • Example use cases: market basket analysis; cross-marketing strategies

FP-Growth
  • Strengths: faster than Apriori; efficient for large datasets
  • Limitations: memory intensive; can be complex to implement
  • Example use cases: frequent itemset mining in large databases; customer purchase patterns
ECLAT
  • Strengths: faster than Apriori; scalable and easy to parallelize
  • Limitations: limited to binary attributes; generates many candidate itemsets
  • Example use cases: market basket analysis; binary classification tasks

Generalized sequential patterns (GSP)
  • Strengths: identifies sequential patterns; flexible for various datasets
  • Limitations: can be computationally expensive; not as efficient for very large databases
  • Example use cases: customer purchase sequence analysis; event sequence analysis

SPADE
  • Strengths: efficient for mining sequential rules; works well with sparse datasets
  • Limitations: requires careful parameter setting; less well known and used than Apriori or FP-Growth
  • Example use cases: analyzing customer shopping sequences; detecting patterns in web browsing data
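As a sketch of the two best-known algorithms above, here is a tiny market-basket example assuming the third-party mlxtend package; the transaction list is made up.

```python
# Mine frequent itemsets and association rules from a toy basket dataset.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpgrowth, association_rules

transactions = [["bread", "milk"],
                ["bread", "diapers", "beer"],
                ["milk", "diapers", "beer"],
                ["bread", "milk", "diapers"],
                ["bread", "milk", "beer"]]

# Both algorithms expect a one-hot encoded DataFrame of transactions.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

itemsets = apriori(onehot, min_support=0.4, use_colnames=True)
# fpgrowth(onehot, min_support=0.4, use_colnames=True) finds the same
# itemsets and is usually faster on large data.

rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```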
Evaluation techniques

Accuracy
  • Strengths: simple and intuitive; effective for balanced datasets
  • Limitations: misleading on imbalanced datasets; doesn't show the breakdown of true positives and negatives
  • Example use cases: general classification problems; comparing baseline models

AUC-ROC
  • Strengths: effective for binary classification; good for imbalanced datasets
  • Limitations: can be overly optimistic on imbalanced data; not threshold-specific
  • Example use cases: medical diagnosis classification; fraud detection models

Precision
  • Strengths: focuses on the positive class; reduces false positives
  • Limitations: ignores false negatives; not useful alone on imbalanced datasets
  • Example use cases: spam detection; content moderation systems

Recall
  • Strengths: identifies actual positives well; minimizes false negatives
  • Limitations: ignores false positives; can be misleading when positives are rare
  • Example use cases: disease outbreak detection; recall-focused tasks

F1 score
  • Strengths: balances precision and recall; useful for imbalanced datasets
  • Limitations: may not reflect overall model performance; depends on the balance of precision and recall
  • Example use cases: customer churn prediction; sentiment analysis

Cross-validation
  • Strengths: reduces overfitting; provides a robust model evaluation
  • Limitations: computationally expensive; may not be ideal for very large datasets
  • Example use cases: general model evaluation; comparing multiple models

Train-test split (holdout)
  • Strengths: simple and easy to implement; good for an initial model assessment
  • Limitations: can lead to overfitting to one split; results depend on how the data is divided
  • Example use cases: quick model prototyping; small datasets

Leave-one-out cross-validation (LOOCV)
  • Strengths: very thorough; each observation is used for validation exactly once
  • Limitations: computationally intensive; not suitable for large datasets
  • Example use cases: small but rich datasets; highly sensitive models

k-fold cross-validation
  • Strengths: balances computational cost and validation accuracy; suitable for various data sizes
  • Limitations: results vary with how the data is divided; the choice of k can impact results
  • Example use cases: medium-sized datasets; model selection

Bootstrapping
  • Strengths: good for estimating model accuracy; effective for small datasets
  • Limitations: results can be sensitive to outliers; may overestimate accuracy on small datasets
  • Example use cases: small or medium-sized datasets; uncertainty estimation
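Putting several of these techniques together, here is a minimal scikit-learn sketch: a stratified holdout split scored with the five metrics above, followed by 5-fold cross-validation. The synthetic dataset is deliberately imbalanced so that accuracy and the other metrics diverge.

```python
# Score one classifier with several metrics on a holdout split, then
# with 5-fold cross-validation, on an imbalanced synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score, KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

X, y = make_classification(n_samples=400, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]

print("accuracy :", accuracy_score(y_te, pred))   # misleading when classes are imbalanced
print("precision:", precision_score(y_te, pred))  # how many flagged positives are real
print("recall   :", recall_score(y_te, pred))     # how many real positives were found
print("F1       :", f1_score(y_te, pred))         # harmonic mean of precision and recall
print("AUC-ROC  :", roc_auc_score(y_te, proba))   # threshold-free ranking quality

# k-fold CV: every observation is used for validation exactly once.
scores = cross_val_score(clf, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
print("5-fold CV accuracy:", scores.mean())
```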