How to use sklearn Pipeline with custom Features? [on hold]
I am doing text classification using Python and sklearn. I have some custom features which I use in addition to vectorizers. I would like to know whether it is possible to use them with sklearn...
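One common way to combine a vectorizer with hand-written features is a FeatureUnion inside a Pipeline. The following is only a minimal sketch: the TextLengthFeatures transformer, the toy documents, and the choice of LogisticRegression are illustrative assumptions, not taken from the question.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline


class TextLengthFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical custom feature: character and word counts per document."""

    def fit(self, X, y=None):
        # Nothing to learn; the features are computed directly from the text.
        return self

    def transform(self, X):
        return np.array([[len(doc), len(doc.split())] for doc in X], dtype=float)


# Toy data, assumed for illustration only.
docs = [
    "cheap pills buy now",
    "meeting moved to friday afternoon",
    "win money now click here",
    "please review the attached report",
    "free offer click now",
    "lunch with the team tomorrow",
]
labels = [1, 0, 1, 0, 1, 0]

model = Pipeline([
    # FeatureUnion concatenates the outputs of both transformers column-wise.
    ("features", FeatureUnion([
        ("tfidf", TfidfVectorizer()),
        ("length", TextLengthFeatures()),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])

model.fit(docs, labels)
predictions = model.predict(docs)
```

Because the custom transformer follows the fit/transform convention, the whole pipeline can be passed to cross-validation or grid search like any other estimator.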
Using sklearn.svm.SVC for binary classification and getting 0% accuracy!
I am using the default SVC with rbf kernel to do a leave one out procedure for training and predicting, i.e. I am leaving one sample out at a time for both X and y and using the rest of the samples to...
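The leave-one-out procedure described here can be sketched with scikit-learn's own cross-validation helpers. The synthetic data from make_classification is an assumption for illustration. The sketch shows the procedure only and does not diagnose the 0% accuracy mentioned in the title.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

# Synthetic binary data, assumed for illustration only.
X, y = make_classification(n_samples=40, n_features=5, random_state=0)

# Each split trains on all samples but one and scores on the held-out sample,
# so every entry of `scores` is either 0.0 or 1.0.
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=LeaveOneOut())
accuracy = scores.mean()
print(f"leave-one-out accuracy: {accuracy:.2f}")
```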
Is it necessary to use warm_start when tracking oob_score in scikit...
I’m planning on doing feature-selection with RandomForestClassifier by using the feature_importances and oob_score. My plan is to recursively drop the 20% least important features and measure the OOB...
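A rough sketch of the plan described above, under stated assumptions: the data is synthetic, the stopping point of five features is arbitrary, and a fresh forest is fitted in every round, so warm_start is not set here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, assumed for illustration only.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

kept = np.arange(X.shape[1])  # indices of the features still in play
history = []                  # (number of features, OOB score) per round

while len(kept) >= 5:  # arbitrary stopping point for this sketch
    forest = RandomForestClassifier(
        n_estimators=100, oob_score=True, random_state=0
    )
    forest.fit(X[:, kept], y)
    history.append((len(kept), forest.oob_score_))

    # Drop the 20% least important features (at least one per round).
    n_drop = max(1, len(kept) // 5)
    order = np.argsort(forest.feature_importances_)  # ascending importance
    kept = kept[np.sort(order[n_drop:])]

for n_features, oob in history:
    print(n_features, round(oob, 3))
```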
Modelling house energy production using month as a variable
I’m attempting to model the energy production of a set of houses for which data on temperature and daylight over 22 months is available. The data is arranged as follows: Label House Year Month...
Designing a training set for regression on probability values given time,...
Assume we have the following variables, out of which “Probability of sale” needs to be predicted, and this is to be done for a portable business vendor whose location changes with time: Business street...
Image Segmentation with a challenging background
[cross-posted from datascience, as no answers received] I’m working on an animal classification problem, with the data extracted from a video feed. The recording was made in a pen, so the problem is...
Difference in partial dependence calculated by R and Python
I noticed there’s a difference in partial dependence calculated by R package gbm and Python’s scikit-learn. Here’s gbm’s partial dependence of median value on median income of the California housing...
Find max value of random forest regressor output
I was wondering, for scikit-learn’s regressors (extra trees, random forest regressor, etc.), how can I find the combination of inputs that would give me the max value of the target variable? Other than...
Default threshold for clf.predict? [on hold]
I have some data that I have been learning on (with nested cross val etc). I am trying to compare two sets with slightly different hyperparameter values. They both have very similar values for the...
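On the threshold in the title: for a binary LogisticRegression, predict returns the class with the higher predicted probability, which amounts to a 0.5 cut-off on the positive class. The sketch below checks that and applies a different cut-off by hand. The classifier choice, the synthetic data, and the 0.7 value are assumptions, since the question does not say which estimator is in use, and other estimators may decide differently.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data, assumed for illustration only.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X, y)
proba_pos = clf.predict_proba(X)[:, 1]  # probability of class 1

default_pred = clf.predict(X)
manual_pred = (proba_pos > 0.5).astype(int)  # same decision, made by hand

# A stricter, hypothetical cut-off: fewer samples are labelled positive.
custom_threshold = 0.7
strict_pred = (proba_pos >= custom_threshold).astype(int)

print(np.array_equal(default_pred, manual_pred))
print(default_pred.sum(), strict_pred.sum())
```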
Linear Regression uses all the cores in the server [on hold]
I am using the following code using scikit-learn to create a linear regression model which is essentially used to fit a labelled training data and predict values for the test data. However, the...