Week 6 CST 383
This week in CST 383, we continued building on machine learning ideas with a focus on hyperparameter tuning, kNN regression, linear regression, and evaluating regressors. I thought this week was very interesting because we built upon the idea of kNN and how, as an instance-based method, it does not really learn parameters during training. From my understanding, instance-based means that the model stores the training data and makes predictions by comparing new data points to the existing examples, rather than fitting coefficients. Because nothing is learned automatically, settings such as the number of neighbors have to be chosen by us, and a poor choice can make the model perform badly. These settings are the hyperparameters, and to get the best performance we need to tune them. The lecture discussed that we should test different hyperparameters to find the best settings for a kNN model, which includes trying different combinations of the number of neighbors k, the distance function (such as Euclidean or Manhattan), and the weighting, which controls how much each neighbor counts toward the prediction. Each combination is evaluated with cross-validation, so the choice is based on how the model performs across several folds rather than on a single split.
Before this week, I thought just finding the best k would be enough, but now I am starting to grasp that kNN performance depends on several settings that have to be chosen together. Additionally, we looked at a way to find these settings with grid search, using GridSearchCV. The way to do this is to give the grid search object a grid of hyperparameter values and a number of cross-validation folds. It then tries every possible combination, evaluates each one with cross-validation, and reports the best performing one, so we are not just guessing. The problem with grid search is how many combinations it has to try, and with a large grid or a large dataset this can be a slow process. In that case we can use randomized search, which samples a limited number of combinations instead of going through all of them. With these different ways to search for the best hyperparameter combination, I am starting to understand how to choose a tuning method for different models and datasets. It leaves me thinking about the trade-off between the cost of the search and the quality of the settings it finds. Finally, it got me wondering whether we will be doing something similar with more complex models in the future.
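Here is a minimal sketch of the two search strategies side by side, mostly as a note to myself. It assumes scikit-learn is installed, uses synthetic data in place of a real dataset, and the grid values (k from 1 to 9, two distance functions, two weightings) are just illustrative choices, not the ones from lecture. The scaler sits inside a pipeline so that each cross-validation fold is scaled using only its own training portion.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# kNN relies on distances, so scale first; the pipeline refits the scaler per fold.
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])

# Illustrative grid: 5 k-values x 2 distance functions x 2 weightings = 20 combinations.
param_grid = {
    "knn__n_neighbors": [1, 3, 5, 7, 9],
    "knn__metric": ["euclidean", "manhattan"],
    "knn__weights": ["uniform", "distance"],
}

# Grid search: every combination is evaluated with 5-fold cross-validation.
grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)
n_grid_candidates = len(grid.cv_results_["params"])

# Randomized search: only n_iter sampled combinations are evaluated.
rand = RandomizedSearchCV(
    pipe, param_grid, n_iter=6, cv=5, scoring="accuracy", random_state=0
)
rand.fit(X, y)
n_rand_candidates = len(rand.cv_results_["params"])

print("grid search tried", n_grid_candidates, "combinations")
print("best grid settings:", grid.best_params_)
print("randomized search tried", n_rand_candidates, "combinations")
print("best randomized settings:", rand.best_params_)
```

The trade-off shows up directly in the candidate counts: the randomized search does less than a third of the work here, but it can only be as good as the combinations it happened to sample.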
Something else we went over this week was using kNN for regression. Rather than predicting a category, we can use kNN to predict a number by averaging the target values of the nearest neighbors. This made me realize that the kNN algorithm can be used in different scenarios. We also covered linear regression, and this concept felt quite different from kNN. Linear regression felt a bit easier for me to understand than kNN because, rather than searching for hyperparameters to tune, we pay attention to the actual parameters, or coefficients, that the model learns from the data. Additionally, I liked that the linear regression equation is basically the slope-intercept form from algebra, so this supported my understanding even more. Something that caught my attention with linear regression was the use of dummy variables, which from my understanding let us include categorical variables in a regression by converting each category into a numeric 0/1 column. I understand the basic idea of converting categories into numeric columns, but I think I want more practice interpreting the results, especially when one category becomes the reference group.
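To practice both ideas, here is a small sketch using only NumPy. The first part predicts a number with kNN by averaging the targets of the k nearest points. The second part builds dummy columns by hand and fits a regression, with "blue" chosen as the reference group. The function name knn_regress, the colors, and the prices are all made up for illustration; in practice the dummy columns would more likely come from something like pandas get_dummies with drop_first=True.

```python
import numpy as np

def knn_regress(X_train, y_train, x_new, k=3):
    """Predict a number as the mean target of the k nearest training points."""
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))  # Euclidean distance
    nearest = np.argsort(dists)[:k]                         # indices of the k closest
    return y_train[nearest].mean()

# Part 1: kNN regression on a tiny hypothetical dataset.
X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
y_train = np.array([10.0, 20.0, 30.0, 100.0])
knn_pred = knn_regress(X_train, y_train, np.array([2.1]), k=3)
print("kNN prediction:", knn_pred)

# Part 2: dummy variables with "blue" as the reference group (hypothetical data).
colors = np.array(["red", "red", "blue", "blue", "green", "green"])
price = np.array([10.0, 12.0, 20.0, 22.0, 30.0, 34.0])

# One 0/1 column per non-reference category; blue gets no column of its own.
d_green = (colors == "green").astype(float)
d_red = (colors == "red").astype(float)
A = np.column_stack([np.ones(len(price)), d_green, d_red])

# Least-squares fit of price = intercept + b_green * d_green + b_red * d_red.
coef, *_ = np.linalg.lstsq(A, price, rcond=None)
intercept, b_green, b_red = coef

print("intercept (mean price of blue):", intercept)
print("green coefficient (green mean minus blue mean):", b_green)
print("red coefficient (red mean minus blue mean):", b_red)
```

When the categorical variable is the only predictor, the intercept is the average of the reference group and each dummy coefficient is how far that category's average sits from the reference. With other predictors in the model the same reading holds, but "holding the other predictors fixed".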
One more thing I need to keep in mind is scaling. The lecture mentioned that linear regression does not require scaling because the model is based on solving an equation directly rather than relying on distances like kNN. For kNN, scaling is important because the model uses distance, so if one predictor has values on a much larger scale than another, the larger predictor may dominate the distance calculation. I am still a bit confused about why scaling is not needed for linear regression in scikit-learn. The lecture mentioned that scikit-learn in a sense uses a direct mathematical solution to find the optimal coefficients. I want to look more closely at how this works under the hood, because I am still curious about how the model actually compensates for features on very different scales. What is the formula used to find the optimal coefficients?
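As a partial answer to my own question: the textbook closed-form solution for ordinary least squares is the normal equation, beta = (X^T X)^(-1) X^T y, where X has a leading column of ones for the intercept. As far as I understand, scikit-learn solves the same least-squares problem but with a more numerically stable routine rather than this literal formula, so treat the sketch below as an illustration of the idea and not as the library's implementation. The feature names and numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Two hypothetical features on very different scales.
sqft = rng.uniform(500, 3000, size=n)             # hundreds to thousands
rooms = rng.integers(1, 6, size=n).astype(float)  # 1 to 5
y = 50.0 + 0.1 * sqft + 20.0 * rooms + rng.normal(0, 5, size=n)

def fit_ols(features, target):
    """Solve the normal equation (A^T A) beta = A^T y and return beta and predictions."""
    A = np.column_stack([np.ones(len(target)), features])  # column of ones = intercept
    beta = np.linalg.solve(A.T @ A, A.T @ target)
    return beta, A @ beta

# Fit once with sqft in raw units, once with sqft divided by 1000.
beta_raw, pred_raw = fit_ols(np.column_stack([sqft, rooms]), y)
beta_scaled, pred_scaled = fit_ols(np.column_stack([sqft / 1000.0, rooms]), y)

print("raw coefficients:     ", beta_raw)
print("rescaled coefficients:", beta_scaled)
print("same predictions:", np.allclose(pred_raw, pred_scaled))
```

Dividing a feature by 1000 makes its coefficient 1000 times larger and leaves the predictions unchanged. So the model "compensates" for scale through the coefficients themselves, which is something kNN cannot do because it has no coefficients to adjust.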
The last thing I will discuss is assessing regressors. I learned that accuracy does not work here because regression predictions are numbers, so instead we use error metrics like MSE, RMSE, and MAE. Rather than looking for the highest test accuracy, as in kNN classification, we look for the lowest value of these error metrics and compare it to a baseline. I’m still wrapping my head around the difference between those metrics. I understand that each one measures prediction error, I know how to calculate them, and I know that lower values are better, but I am still trying to grasp what each one emphasizes and when to use it. This is something I’ll have to review again next week to get a better understanding.
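A tiny worked example helps me see what each metric emphasizes. The numbers below are made up: three predictions are off by 1 and one is off by 8. The baseline here is the common choice of always predicting the mean of the targets, which is my assumption about what "baseline" means in this setting.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of squared errors (in squared units)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of MSE (back in the target's units)."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors (in the target's units)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical targets and predictions; the last prediction is a large miss.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [11.0, 19.0, 31.0, 32.0]

# Baseline: always predict the mean of the targets.
mean_target = sum(y_true) / len(y_true)
y_base = [mean_target] * len(y_true)

model_mae, model_mse, model_rmse = mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred)
base_mae, base_mse, base_rmse = mae(y_true, y_base), mse(y_true, y_base), rmse(y_true, y_base)

print("model    MAE, MSE, RMSE:", model_mae, model_mse, model_rmse)
print("baseline MAE, MSE, RMSE:", base_mae, base_mse, base_rmse)
```

MAE treats every unit of error the same, while MSE and RMSE square the errors first, so the single miss of 8 pulls RMSE well above MAE. MSE is in squared units, which is why RMSE is usually the easier one to read next to the original target values.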
Overall, this week has been very productive. It made me think more about how different models learn and what they require in order to learn. Something that I can take away from this week is that machine learning is not just about running algorithms, but about making good decisions: choosing how to tune hyperparameters, deciding when scaling is needed, and picking the proper evaluation metrics for regression. I definitely need to review some material from this week. As we move forward into our last two weeks, I’m curious to see how these ideas will apply to other models and what new decisions and trade-offs I’ll need to consider with the different methods in the future.

