What is the difference between the terms accuracy and validation accuracy?

July 16, 2018, at 00:20 AM

I have used an LSTM from Keras to build a model that can detect whether two questions on Stack Overflow are duplicates or not. When I run the model I see something like this in the epochs.

Epoch 23/200
727722/727722 [==============================] - 67s - loss: 0.3167 - acc: 0.8557 - val_loss: 0.3473 - val_acc: 0.8418
Epoch 24/200
727722/727722 [==============================] - 67s - loss: 0.3152 - acc: 0.8573 - val_loss: 0.3497 - val_acc: 0.8404
Epoch 25/200
727722/727722 [==============================] - 67s - loss: 0.3136 - acc: 0.8581 - val_loss: 0.3518 - val_acc: 0.8391

I am trying to understand the meaning of each of these terms. Which of the above values is the accuracy of my model? I am comparatively new to machine learning, so any explanation would help.

Answer 1

When training a machine learning model, one of the main things you want to avoid is overfitting. This is when your model fits the training data well, but it isn't able to generalize and make accurate predictions for data it hasn't seen before.

To find out whether a model is overfitting, data scientists hold out part of their data: they split it into two parts, the training set and the validation set. (This is often loosely called cross-validation, although strictly that term refers to repeating the split several times, as in k-fold cross-validation.) The training set is used to train the model, while the validation set is only used to evaluate the model's performance.
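As a rough illustration of the split described above, here is a minimal sketch in plain Python. The function name `train_validation_split`, the 20% validation fraction, and the toy data are all assumptions made for the example, not something taken from the question. In Keras itself, `fit` accepts a `validation_split` or `validation_data` argument for the same purpose.

```python
import random

def train_validation_split(samples, validation_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and split it into (train, validation)."""
    shuffled = list(samples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    # The first n_val shuffled items are held out; the rest are used for training.
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical data: 10 (question_pair_id, is_duplicate) examples.
data = [(i, i % 2) for i in range(10)]
train, val = train_validation_split(data, validation_fraction=0.2)
print(len(train), len(val))
```

The important property is that no example appears in both sets, so the validation metrics are computed on data the model never trained on.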

Metrics on the training set let you see how your model is progressing in terms of its training, but it's the metrics on the validation set that give you a measure of the quality of your model: how well it's able to make predictions on data it hasn't seen before.

With this in mind, loss and acc are measures of loss and accuracy on the training set, while val_loss and val_acc are measures of loss and accuracy on the validation set.
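To make the four numbers more concrete, the sketch below computes an accuracy and a loss on a tiny training set and a tiny validation set. It assumes binary cross-entropy as the loss and a 0.5 decision threshold, which are common choices for a duplicate/not-duplicate classifier but are not stated in the question. The labels and predicted probabilities are made up for illustration.

```python
import math

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum((p >= threshold) == bool(t) for t, p in zip(y_true, y_prob))
    return correct / len(y_true)

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical labels and predicted probabilities (not from the question).
train_true, train_prob = [1, 0, 1, 0], [0.9, 0.2, 0.8, 0.1]
val_true, val_prob = [1, 0, 1, 0], [0.7, 0.4, 0.4, 0.2]

# The same two functions give loss/acc on one set and val_loss/val_acc on the other.
print("loss:", binary_cross_entropy(train_true, train_prob),
      "acc:", accuracy(train_true, train_prob))
print("val_loss:", binary_cross_entropy(val_true, val_prob),
      "val_acc:", accuracy(val_true, val_prob))
```

The metric definitions are identical for both lines; only the data they are evaluated on differs.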

At the moment (epoch 25) your model has an accuracy of ~86% on the training set and ~84% on the validation set. This means that you can expect your model to perform with roughly 84% accuracy on new data, provided the new data resembles the validation set.

I notice that as your epochs go from 23 to 25, your acc metric increases (0.8557 to 0.8581), while your val_acc metric decreases (0.8418 to 0.8391) and your val_loss rises (0.3473 to 0.3518). This means that your model is fitting the training set better but is losing its ability to predict on new data, which suggests that it is starting to fit noise and is beginning to overfit.
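One way to act on this trend is to watch the validation loss and stop training once it has failed to improve for a few epochs. The sketch below applies that idea to the three epochs shown in the question. The helper names `best_epoch` and `should_stop` and the patience of 2 are assumptions for illustration. Keras provides an `EarlyStopping` callback that monitors a quantity such as `val_loss` in a similar spirit.

```python
# Values copied from the epoch log in the question.
history = {
    "epoch":    [23, 24, 25],
    "loss":     [0.3167, 0.3152, 0.3136],
    "acc":      [0.8557, 0.8573, 0.8581],
    "val_loss": [0.3473, 0.3497, 0.3518],
    "val_acc":  [0.8418, 0.8404, 0.8391],
}

def best_epoch(history, monitor="val_loss"):
    """Return the epoch number with the lowest monitored value."""
    values = history[monitor]
    idx = min(range(len(values)), key=values.__getitem__)
    return history["epoch"][idx]

def should_stop(values, patience=2):
    """True if each of the last `patience` values is worse (higher)
    than the best value seen before them."""
    if len(values) <= patience:
        return False
    best_before = min(values[:-patience])
    return all(v > best_before for v in values[-patience:])

print("best epoch by val_loss:", best_epoch(history))
print("stop on val_loss?", should_stop(history["val_loss"]))
print("stop on training loss?", should_stop(history["loss"]))
```

Only three epochs are visible in the log, so "best" here means best among epochs 23 to 25; the full training history would be needed to pick the actual stopping point.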

So that is a quick explanation of validation metrics and how to interpret them.
