Introduction to Regression

 

 What is a Regression Problem?

A regression problem is a type of supervised machine learning task where the goal is to predict a continuous value based on input features.

Examples of Regression Problems:

  • Predicting house prices based on size, location, and number of rooms.

  • Estimating the temperature for tomorrow based on weather conditions.

  • Predicting student scores based on study hours.

In all these examples, the output is a number (not a category), so we use regression models.


Key Concepts


Term            Meaning
Features (X)    Input variables (e.g., hours studied)
Target (y)      The value we want to predict (e.g., score)
Model           A function that learns the relationship between X and y
Training        Feeding the model with known X and y values to learn the pattern
Prediction      Using the model to estimate unknown y for a given X
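
To make these terms concrete, here is a minimal sketch that fits a straight line by hand with NumPy. The numbers are a made-up toy dataset (not the CSV used later in this post), and the names hours, scores, and predict are only illustrative. It uses the standard least-squares formulas for a single feature rather than scikit-learn.

```python
import numpy as np

# Hypothetical toy data, chosen so that score = 10 * hours + 5 exactly.
hours = np.array([1.0, 2.0, 3.0, 4.0])        # features (X)
scores = np.array([15.0, 25.0, 35.0, 45.0])   # target (y)

# Training: compute the least-squares slope and intercept for one feature.
x_mean, y_mean = hours.mean(), scores.mean()
slope = ((hours - x_mean) * (scores - y_mean)).sum() / ((hours - x_mean) ** 2).sum()
intercept = y_mean - slope * x_mean


def predict(h):
    """Model: a function mapping hours studied to an estimated score."""
    return intercept + slope * h


# Prediction: estimate the unknown score for 5 hours of study.
print(f"slope={slope:.1f}, intercept={intercept:.1f}, predict(5)={predict(5.0):.1f}")
```

Because the toy data lies exactly on a line, the fitted slope and intercept recover that line. Real data is noisy, so the line will only approximate the points.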


Python Example: Simple Linear Regression

We'll use pandas, scikit-learn, and matplotlib (optional for plotting).

Step 1: Sample CSV File

Assume we have a CSV file named student_scores.csv with the following contents:

Hours,Score
2.5,21
5.1,47
3.2,27
8.5,75
3.5,30
1.5,20
9.2,88

This file has:

  • Input feature: Hours studied

  • Output/Target: Score
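
If you don't have the file yet, something like the following could create it in the current working directory. This is only a convenience sketch: the rows are copied from the sample above, and the variable names csv_text and csv_path are illustrative.

```python
from pathlib import Path

# Rows copied from the sample CSV contents shown above.
csv_text = """Hours,Score
2.5,21
5.1,47
3.2,27
8.5,75
3.5,30
1.5,20
9.2,88
"""

# Write the file next to the script so pd.read_csv('student_scores.csv') can find it.
csv_path = Path("student_scores.csv")
csv_path.write_text(csv_text)

# Quick check: one header line plus seven data rows.
lines = csv_path.read_text().splitlines()
print(f"Wrote {len(lines) - 1} data rows with header: {lines[0]}")
```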

Python Code

import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt  # optional for plot

# Step 1: Read the CSV file
data = pd.read_csv('student_scores.csv')

# Step 2: Separate the input (X) and output (y)
X = data[['Hours']]   # DataFrame (2D): double brackets keep the column dimension
y = data['Score']     # Series (1D)

# Step 3: Create the Linear Regression model
model = LinearRegression()

# Step 4: Train the model using the data
model.fit(X, y)

# Step 5: Make a prediction (e.g., for 6.5 hours of study)
predicted_score = model.predict([[6.5]])
print(f"Predicted Score for 6.5 hours of study: {predicted_score[0]:.2f}")

# Optional: Plot the data and regression line
plt.scatter(X, y, color='blue')  # actual data points
plt.plot(X, model.predict(X), color='red')  # regression line
plt.xlabel("Hours Studied")
plt.ylabel("Score")
plt.title("Study Hours vs Score")
plt.show()

Output
Predicted Score for 6.5 hours of study: 59.58

  • Blue dots = actual data

  • Red line = prediction line
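
The red line is fully described by two numbers, a slope and an intercept, which scikit-learn exposes as model.coef_ and model.intercept_ after fitting. The sketch below inlines the same values as student_scores.csv so it runs on its own, and checks that intercept + slope * hours gives the same result as model.predict. The names slope, intercept, manual, and from_model are illustrative.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same values as student_scores.csv, inlined so the sketch is self-contained.
data = pd.DataFrame({
    "Hours": [2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2],
    "Score": [21, 47, 27, 75, 30, 20, 88],
})

model = LinearRegression()
model.fit(data[["Hours"]], data["Score"])

slope = model.coef_[0]        # change in predicted score per extra hour studied
intercept = model.intercept_  # predicted score at 0 hours

# A prediction is just intercept + slope * hours.
manual = intercept + slope * 6.5
from_model = model.predict(pd.DataFrame({"Hours": [6.5]}))[0]

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"manual={manual:.2f}, model.predict={from_model:.2f}")
```

Both values should agree with the 59.58 shown in the output above.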

Notes:

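  • X = data[['Hours']] uses double square brackets, which return a DataFrame (2D) instead of a Series (1D). scikit-learn expects a 2D array for features.

  • model.fit(X, y) tells the model to learn the best-fit line, i.e. the slope and intercept that minimize the squared error between predicted and actual scores.

  • model.predict([[6.5]]) returns an array with one element, the predicted score for 6.5 hours. Because the model was fitted on a DataFrame, recent scikit-learn versions may warn that the input has no feature names; passing pd.DataFrame({'Hours': [6.5]}) instead avoids the warning.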
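
To see the difference between single and double brackets, you can compare the shapes directly. This is a small sketch with the Hours column inlined; the names features_2d and column_1d are illustrative.

```python
import pandas as pd

# Hours column from student_scores.csv, inlined so the sketch runs on its own.
data = pd.DataFrame({"Hours": [2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2]})

features_2d = data[["Hours"]]  # list of column names -> DataFrame
column_1d = data["Hours"]      # single column name -> Series

print(type(features_2d).__name__, features_2d.shape)  # DataFrame (7, 1)
print(type(column_1d).__name__, column_1d.shape)      # Series (7,)
```

The (7, 1) shape means 7 samples with 1 feature each, which is the layout model.fit expects for X.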

