How do I modify this function to accept multiple Dataframes?

March 17, 2018, at 9:56 PM

I wrote this function and would like it to accept more than one DataFrame, so that the final plot shows one prediction line per DataFrame and coef_df is filled in with the coefficients for the rest of them.

The function extracts the needed feature and target from a much larger dataset, fits a linear regression model, plots the fitted line over the data, and returns a DataFrame with all the coefficients.

(This is just an exercise.)

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

def prep_model_and_predict(feature, target, dataset, degree):

    # part 1: make a df with the relevant format and features (degree >= 1)
    poly_df = pd.DataFrame()
    poly_df[str(target)] = dataset[str(target)]
    poly_df['power_1'] = dataset[str(feature)]
    # add the remaining powers; the range is empty when degree == 1
    for power in range(2, degree + 1):
        poly_df['power_' + str(power)] = poly_df['power_1'] ** power
    # part 2: make model and predictions
    features = list(poly_df.columns[1:])
    X = poly_df[features]
    y = poly_df[str(target)]
    model = LinearRegression().fit(X, y)
    predictions = model.predict(X)
    # part 3: put weights in a nice df
    # collect the rows in a list and build the DataFrame once
    # (DataFrame.append is not available in pandas 2.x)
    rows = [{'Name': 'Intercept', 'Value': model.intercept_},
            {'Name': 'Power_1', 'Value': model.coef_[0]}]
    # use a separate loop variable so that `degree` is not overwritten
    for power in range(2, degree + 1):
        rows.append({'Name': 'Power_' + str(power),
                     'Value': '{:.3e}'.format(model.coef_[power - 1])})
    coef_df = pd.DataFrame(rows, columns=['Name', 'Value'])
    # part 4: plot the data points and the fitted line
    fig, ax = plt.subplots()
    ax.plot(poly_df['power_1'], poly_df[str(target)], '.',
            poly_df['power_1'], predictions, '-')
    ax.set_xlabel('Square footage, living area')
    ax.set_ylabel('Price per Sqft')
    ax.ticklabel_format(axis='y', style='sci', scilimits=(-2,2))
    return coef_df, ax

and this is the result:

         Name        Value
0   Intercept       506738
1     Power_1  2.71336e-77
2     Power_2    7.335e-39
3     Power_3   -1.850e-44
4     Power_4    8.437e-50
5     Power_5    0.000e+00
6     Power_6    0.000e+00
7     Power_7    3.645e-55
8     Power_8    1.504e-51
9     Power_9    5.760e-48
10   Power_10    1.958e-44
11   Power_11    5.394e-41
12   Power_12    9.404e-38
13   Power_13   -3.635e-41
14   Power_14    4.655e-45
15   Power_15   -1.972e-49

much appreciated!

Answer 1

I am not sure exactly what you are asking. Next time, please try to post a question with a small example that other people here on SO can easily reproduce and run.

I have tried to answer your questions. Correct me if I have misunderstood them.

  • Pass an arbitrary number of DataFrames to your function and plot them:

I have created three random DataFrames to use:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=list('AB'))
df2 = pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=list('AB'))
df3 = pd.DataFrame(np.random.randint(0,10,size=(10, 2)), columns=list('AB'))

The function that plots them:

import matplotlib.pyplot as plt

def plot_me(*dfs):
    # *dfs collects any number of positional DataFrames into a tuple
    plt.figure(figsize=(13,9))
    for lab_ind, df in enumerate(dfs):
        plt.plot(df['A'], df['B'], label=lab_ind)
    plt.legend()
    plt.show()

Calling plot_me(df1, df2, df3) draws one line per DataFrame, labelled 0, 1 and 2 in the legend.
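If the legend should show names rather than running numbers, one option is to pass the DataFrames as keyword arguments. This is only a minimal sketch: plot_named and the keyword names first and second are made up for illustration, and it assumes every DataFrame has the two columns you name.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_named(x_col, y_col, **frames):
    """Plot y_col against x_col for each DataFrame, labelled by its keyword name.

    Hypothetical helper: **frames collects keyword arguments into a dict,
    so the keyword used at the call site becomes the legend label.
    """
    fig, ax = plt.subplots(figsize=(13, 9))
    for name, frame in frames.items():
        ax.plot(frame[x_col], frame[y_col], label=name)
    ax.legend()
    return ax
```

It could be called as plot_named('A', 'B', first=df1, second=df2). Returning the Axes instead of calling plt.show() inside the function leaves the caller free to add more lines or labels before showing the figure.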

  • Put the results of your model into a DataFrame

Regarding your second question, I am not going to concentrate too much on your exact details - for example the name of the columns of your dataframe, etc.

For this particular example I have generated two random arrays:

X = np.random.randint(0,50 ,size=(50, 2))
y = np.random.randint(0,2 ,size=(50, 1))

Then fit a LinearRegression model on this data.

from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(X, y)
predictions = model.predict(X)

And then put the predictions into a DataFrame:

res_df = pd.DataFrame(predictions,columns = ['Value'])

And if you print res_df, you get:

    Value
0   0.420395
1   0.459389
2   0.369648
3   0.416058
4   0.644088
5   0.362072
6   0.363157
7   0.468943
.      .
.      .
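Putting the two parts together, a function along the following lines might come closer to what the question asks for: it takes any number of DataFrames, fits one polynomial regression per DataFrame, draws every fitted line on the same Axes, and returns the coefficients with one column per DataFrame. Treat it as a sketch under assumptions rather than a drop-in replacement: fit_and_plot_many, polynomial_features and the dataset_0, dataset_1, ... column labels are names invented here, and every DataFrame is assumed to contain the feature and target columns.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def polynomial_features(values, degree):
    """Return a DataFrame with the columns power_1 ... power_<degree>."""
    # Cast to float so that high powers of integer data do not overflow.
    values = pd.Series(values, dtype=float).reset_index(drop=True)
    return pd.DataFrame({f"power_{p}": values ** p for p in range(1, degree + 1)})


def fit_and_plot_many(feature, target, degree, *datasets):
    """Fit one polynomial regression per dataset and draw all fits on one Axes."""
    fig, ax = plt.subplots()
    coef_columns = {}
    for idx, dataset in enumerate(datasets):
        label = f"dataset_{idx}"  # hypothetical label, one per DataFrame
        X = polynomial_features(dataset[feature], degree)
        y = dataset[target].to_numpy()
        model = LinearRegression().fit(X, y)
        # Sort by the feature so the fitted curve is drawn from left to right.
        order = np.argsort(X["power_1"].to_numpy())
        ax.plot(X["power_1"], y, ".", alpha=0.4)
        ax.plot(X["power_1"].iloc[order], model.predict(X)[order], "-", label=label)
        # One column of coefficients per dataset: intercept first, then the powers.
        coef_columns[label] = [model.intercept_, *model.coef_]
    names = ["Intercept"] + [f"Power_{p}" for p in range(1, degree + 1)]
    coef_df = pd.DataFrame(coef_columns, index=names)
    ax.set_xlabel(feature)
    ax.set_ylabel(target)
    ax.legend()
    return coef_df, ax
```

With two DataFrames set_1 and set_2 (placeholder names) that share a feature column and a target column, the call would be coef_df, ax = fit_and_plot_many(feature, target, 15, set_1, set_2). Keeping the coefficients as floats instead of formatted strings means the table can still be sorted or compared; the display format can be applied when printing.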