Iris.data.csv - ValueError: x and y must be the same size

March 31, 2022, at 02:30 AM

I'm trying to run this code on Google Colab, but I get ValueError: x and y must be the same size. I've tried multiple fixes, but none of them worked.

from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width','class']
df = pd.read_csv('Iris.data.csv', header=None, names=columns)
X = np.array(df.iloc[:, 0:4])   
y = np.array(df['class'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
plt.scatter(X,y)
plt.show() 
Answer 1

I think your problem might be that you haven't thought enough about what kind of scatter plot you are trying to make. The X array contains 150 samples of 4 features each (600 values in total), while the y array contains one class label per sample (150 values), so plt.scatter(X, y) fails because its two arguments differ in size. How were you expecting the scatter plot to look? Remember that a scatter plot can only show two dimensions at a time, not 4!
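To make the size mismatch concrete, here is a minimal sketch that only compares array sizes. The arrays are random stand-ins with the same shapes as the question's X and y, not the real Iris measurements, and the labels "a", "b", "c" are placeholders.

```python
import numpy as np

# Stand-in arrays with the same shapes as X and y in the question
# (random values and placeholder labels, not the real Iris data).
rng = np.random.default_rng(0)
X = rng.random((150, 4))             # 150 samples, 4 features
y = np.repeat(["a", "b", "c"], 50)   # 150 class labels

# plt.scatter compares these two sizes and raises if they differ.
print(X.size, y.size)

# A single feature column has the same size as y, so it can be plotted against it.
x_col = X[:, 0]
print(x_col.shape == y.shape)
```

Passing one column at a time, for example X[:, 0] against X[:, 1], is what avoids the error.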

I don't know what your 'Iris.data.csv' file contains, so I used the copy of the Iris data set that ships with scikit-learn, as shown in this example.

Usually, scatter plots of the Iris data set select two of the four dimensions and plot the points in these dimensions for each class using a different coloured point.

Something like this:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
# Load data
iris = datasets.load_iris()
X = iris.data[:, :4]  # take the first 4 features
assert(X.shape == (150, 4))
y = iris.target
assert(y.shape == (150,))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
classes = np.unique(y)
assert(classes.shape == (3, ))
# Select dimensions to plot
dim1 = X[:, 0]
dim2 = X[:, 1]
# Make a scatter plot, one call per class so each class gets its own colour
fig, ax = plt.subplots()
for c in classes:
    pts = (y == c)  # boolean mask selecting the samples of class c
    ax.scatter(dim1[pts], dim2[pts])
ax.grid()
plt.show()
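If you would rather keep the pandas DataFrame from the question, the same idea might look roughly like the sketch below. The small inline DataFrame is a hypothetical stand-in for 'Iris.data.csv' so that the example is self-contained; the rows and the class label strings are assumptions about what the file contains. In practice you would replace it with the pd.read_csv call from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the CSV from the question (illustrative rows only;
# the class label strings are an assumption about the file's contents).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "class": ["Iris-setosa", "Iris-setosa",
              "Iris-versicolor", "Iris-versicolor",
              "Iris-virginica", "Iris-virginica"],
})

fig, ax = plt.subplots()
# One scatter call per class, plotting two of the feature columns
for name, group in df.groupby("class"):
    ax.scatter(group["sepal_length"], group["sepal_width"], label=name)
ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
ax.legend()
ax.grid()
plt.show()
```

Grouping by the 'class' column means the string labels never have to be passed to scatter as coordinates; they are only used for the legend.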
