ValueError: Unable to convert array of bytes/strings into decimal numbers with dtype='numeric'

I have this pipeline:

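import seaborn as sns
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, PowerTransformer, StandardScaler

diamonds = sns.load_dataset("diamonds")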

# Build feature/target arrays
X, y = diamonds.drop("cut", axis=1), diamonds["cut"]

# Set up the colnames
to_scale = ["depth", "table", "x", "y", "z"]
to_log = ["price", "carat"]
categorical = X.select_dtypes(include="category").columns

scale_pipe = make_pipeline(StandardScaler())
log_pipe = make_pipeline(PowerTransformer())
categorical_pipe = make_pipeline(OneHotEncoder(sparse=False))

transformer = ColumnTransformer(
    [
        ("scale", scale_pipe, to_scale),
        ("log_transform", log_pipe, to_log),
        ("oh_encode", categorical_pipe, categorical),
    ]
)

knn_pipe = Pipeline([("prep", transformer), ("knn", KNeighborsClassifier())])

# Fit/predict/score
_ = knn_pipe.fit(X_train, y_train)
preds = knn.predict(X_test)

When I run it, the pipeline fits the data without any problem, but I can’t score or make predictions because I get this error:

ValueError: could not convert string to float: 'G'

The above exception was the direct cause of the following exception:

ValueError: Unable to convert array of bytes/strings into decimal numbers with dtype='numeric'

It is a classification problem, so I thought the error occurred because I didn’t encode the target. But even after using LabelEncoder on the target, I still get the same error.
What might be the reason? I tried the pipeline with other models too, and the error is the same. BTW, I am using the built-in Diamonds dataset of Seaborn.


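It looks like you are not predicting on X_test with your knn_pipe. The variable knn that you use in your last line is never defined in the example you posted, so I guess you have defined it somewhere in your original code, probably as a bare KNeighborsClassifier. A bare estimator skips the preprocessing step, so it receives the raw string categories (such as the color value 'G') and cannot convert them to floats, which is why you see this error message.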

Anyway, just change

preds = knn.predict(X_test)

to

preds = knn_pipe.predict(X_test)

and it will work.
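To see the difference for yourself, here is a minimal sketch. It does not use the Diamonds dataset: the small DataFrame, its values, and n_neighbors=3 are made up for illustration, and I am assuming that knn in your code behaves like the estimator step without the preprocessing in front of it. The estimator on its own fails on the raw strings, while the full pipeline predicts fine.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for two of the diamonds columns.
X_train = pd.DataFrame(
    {
        "depth": [61.5, 59.8, 56.9, 62.4, 63.3, 62.8],
        "color": ["E", "E", "G", "I", "J", "G"],
    }
)
y_train = ["Ideal", "Premium", "Good", "Premium", "Good", "Ideal"]
X_test = pd.DataFrame({"depth": [62.0, 58.1], "color": ["G", "E"]})

transformer = ColumnTransformer(
    [
        ("scale", StandardScaler(), ["depth"]),
        ("oh_encode", OneHotEncoder(handle_unknown="ignore"), ["color"]),
    ]
)
knn_pipe = Pipeline(
    [("prep", transformer), ("knn", KNeighborsClassifier(n_neighbors=3))]
)
knn_pipe.fit(X_train, y_train)

# The KNN step alone was fitted on the transformed, all-numeric matrix,
# so handing it the raw frame with string categories raises ValueError.
knn = knn_pipe.named_steps["knn"]
try:
    knn.predict(X_test)
    bare_error = None
except ValueError as exc:
    bare_error = exc
print("bare estimator raised:", type(bare_error).__name__)

# The full pipeline scales and encodes X_test before it reaches KNN.
preds = knn_pipe.predict(X_test)
print("pipeline predictions:", list(preds))
```

The same applies to scoring: call knn_pipe.score(X_test, y_test) rather than scoring the estimator directly, so the test data passes through the same preprocessing as the training data.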
