I have this pipeline:
    diamonds = sns.load_dataset("diamonds")

    # Build feature/target arrays
    X, y = diamonds.drop("cut", axis=1), diamonds["cut"]

    # Set up the colnames
    to_scale = ["depth", "table", "x", "y", "z"]
    to_log = ["price", "carat"]
    categorical = X.select_dtypes(include="category").columns

    scale_pipe = make_pipeline(StandardScaler())
    log_pipe = make_pipeline(PowerTransformer())
    categorical_pipe = make_pipeline(OneHotEncoder(sparse=False))

    transformer = ColumnTransformer(
        transformers=[
            ("scale", scale_pipe, to_scale),
            ("log_transform", log_pipe, to_log),
            ("oh_encode", categorical_pipe, categorical),
        ]
    )

    knn_pipe = Pipeline([("prep", transformer), ("knn", KNeighborsClassifier())])

    # Fit/predict/score
    _ = knn_pipe.fit(X_train, y_train)
    preds = knn.predict(X_test)
When I run it, the pipeline fits the data without problems, but I can’t score or make predictions because I get this error:
    ValueError: could not convert string to float: 'G'

    The above exception was the direct cause of the following exception:

    ValueError: Unable to convert array of bytes/strings into decimal numbers with dtype='numeric'
It is a classification problem, so I thought the error occurred because I didn’t encode the target. But even after using LabelEncoder on the target, I am still getting the same error.
What might be the reason? I tried the pipeline with other models too. The error is the same. BTW, I am using the built-in Diamonds dataset of Seaborn.
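It looks like you did not predict the values for X_test with your knn_pipe. The variable knn that you use in your last line is actually undefined in the example you provide. I guess you have defined it somewhere in the original code, probably as a plain estimator outside the pipeline. In that case X_test reaches it without going through the preprocessing step, so the untransformed string categories (such as the 'G' in your traceback) are passed straight to the model, and you see this error message.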
Anyway, just change
    preds = knn.predict(X_test)
to
    preds = knn_pipe.predict(X_test)
and it will work.
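For reference, here is a minimal, self-contained sketch of the corrected flow. To keep it runnable without downloading anything, it uses a small synthetic DataFrame with the same column names as the Diamonds dataset; the values, the train_test_split call, and the bare_knn variable are assumptions made for illustration and are not taken from the question. It also omits the sparse argument of OneHotEncoder, since that parameter has been renamed in recent scikit-learn releases.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, PowerTransformer, StandardScaler

# Synthetic stand-in for the diamonds data (assumption: same column
# names as the real dataset, but made-up values).
rng = np.random.default_rng(0)
n = 200
diamonds = pd.DataFrame({
    "carat": rng.uniform(0.2, 3.0, n),
    "cut": pd.Categorical(rng.choice(["Fair", "Good", "Ideal"], n)),
    "color": pd.Categorical(rng.choice(["D", "E", "G"], n)),
    "clarity": pd.Categorical(rng.choice(["SI1", "VS1", "IF"], n)),
    "depth": rng.normal(61.0, 1.5, n),
    "table": rng.normal(57.0, 2.0, n),
    "price": rng.uniform(300.0, 18000.0, n),
    "x": rng.uniform(3.0, 9.0, n),
    "y": rng.uniform(3.0, 9.0, n),
    "z": rng.uniform(2.0, 6.0, n),
})

# Build feature/target arrays and a train/test split
X, y = diamonds.drop("cut", axis=1), diamonds["cut"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

to_scale = ["depth", "table", "x", "y", "z"]
to_log = ["price", "carat"]
categorical = X.select_dtypes(include="category").columns

transformer = ColumnTransformer(
    transformers=[
        ("scale", make_pipeline(StandardScaler()), to_scale),
        ("log_transform", make_pipeline(PowerTransformer()), to_log),
        ("oh_encode", make_pipeline(OneHotEncoder(handle_unknown="ignore")), categorical),
    ]
)
knn_pipe = Pipeline([("prep", transformer), ("knn", KNeighborsClassifier())])

# A bare estimator (hypothetical, standing in for the undefined `knn`)
# receives the raw string columns and cannot convert them to floats.
bare_knn = KNeighborsClassifier()
try:
    bare_knn.fit(X_train, y_train)
    raw_input_failed = False
except ValueError:
    raw_input_failed = True

# Fit, predict, and score through the pipeline, so the preprocessing
# step is applied to X_test as well.
knn_pipe.fit(X_train, y_train)
preds = knn_pipe.predict(X_test)
accuracy = knn_pipe.score(X_test, y_test)
```

The key point is that every call that touches raw features (fit, predict, score) goes through knn_pipe, so the ColumnTransformer always runs before the classifier sees the data.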