ML1: Introduction to Machine Learning — House Pricing

The Data Science Bootcamp is complete. As a reminder, here is my path:

- Data Science Bootcamp (completed)
- Core track (12 projects on key DS and ML topics)
- Practice (8 projects)

This project marks my official entry into the field of Machine Learning. While previous modules focused on general-purpose tools and data manipulation, this is where the real journey begins: transitioning from observing static charts to building algorithms capable of predicting future outcomes.

Topics

- Foundational ML: Understanding the conceptual shift from Rule-Based to Learning-Based systems.
- Supervised Learning: Practical implementation of Regression and Classification tasks.
- Exploratory Data Analysis (EDA): Identifying patterns, correlations, and anomalies in real-world data.
- Model Benchmarking: Establishing Baselines using naive models to measure true algorithmic value.
- Performance Metrics: Using MAE and RMSE to quantify and interpret prediction accuracy.

Roadmap

1. Foundations & Hypotheses

The work began with a theoretical deep dive. I explored the fundamental classification of tasks, determining why predicting rental prices is a Regression problem and defining the boundaries between multiclass and multilabel classification.
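
To make the task taxonomy concrete, here is a minimal sketch of how the three target types differ in shape. All values, label names, and the helper `to_indicator_rows` are hypothetical illustrations and are not taken from the project notebook.

```python
# Hypothetical toy targets; none of these values come from the project data.
regression_target = [2450.0, 3100.5, 1999.99]   # continuous number per listing
multiclass_target = ["low", "high", "medium"]   # exactly one label per listing
multilabel_target = [                           # any number of tags per listing
    {"elevator", "doorman"},
    set(),
    {"laundry"},
]


def to_indicator_rows(label_sets):
    """Encode sets of tags as 0/1 rows, one column per distinct tag."""
    columns = sorted(set().union(*label_sets))
    rows = [[int(tag in labels) for tag in columns] for labels in label_sets]
    return columns, rows


columns, rows = to_indicator_rows(multilabel_target)
print(columns)  # ['doorman', 'elevator', 'laundry']
print(rows)     # [[1, 1, 0], [0, 0, 0], [0, 0, 1]]
```

A regression target is a single continuous value, a multiclass target picks one class out of several, and a multilabel target is a row of independent yes/no flags.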

2. Statistical Data Auditing

One does not simply "feed" raw data into a model. I conducted a rigorous audit of the Kaggle RentHop dataset:

- Target Analysis: Visualized the price distribution and identified extreme outliers that could distort model fitting and evaluation.
- Data Cleaning: Applied percentile filtering (keeping the 1st to 99th percentile) to strip away extreme values and focus on the representative data range.
- Correlation Study: Used heatmaps and scatterplots to examine how strongly features such as bedrooms and bathrooms correlate with price.
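
The percentile filtering step could look roughly like the sketch below. It assumes the filter is applied to a `price` column; the helper name `filter_by_percentiles` and the synthetic values are hypothetical, and the notebook's actual code may differ.

```python
import pandas as pd


def filter_by_percentiles(df, column, lower=0.01, upper=0.99):
    """Keep rows whose `column` value lies within the [lower, upper] quantile range."""
    low, high = df[column].quantile([lower, upper])
    return df[df[column].between(low, high)]


# Synthetic prices: 100 ordinary listings plus two extreme outliers (made-up values).
listings = pd.DataFrame({"price": list(range(1000, 1100)) + [1, 1_000_000]})
cleaned = filter_by_percentiles(listings, "price")

print(len(listings), "rows before,", len(cleaned), "rows after")
print("price range after filtering:", cleaned["price"].min(), cleaned["price"].max())
```

Filtering on quantiles rather than fixed thresholds keeps the rule independent of the currency or scale of the target.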

3. Feature Generation

I experimented with data complexity by creating Polynomial Features up to the 10th degree. This stage showed how feature transformations can capture non-linearities, and it made the trade-off between model flexibility and computational cost concrete: each additional degree adds more columns to train on.
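
As a rough illustration of that cost, the sketch below counts how many columns scikit-learn's `PolynomialFeatures` produces at different degrees. The two input columns and their values are assumptions made for the example; the source does not state how many features the notebook expanded.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two made-up input columns (for example bedrooms and bathrooms) for three listings.
X = np.array([[1, 1], [2, 1], [3, 2]], dtype=float)

feature_counts = {}
for degree in (1, 2, 10):
    # include_bias=False drops the constant column of ones.
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    feature_counts[degree] = poly.fit_transform(X).shape[1]

print(feature_counts)  # {1: 2, 2: 5, 10: 65}
```

With only two inputs the expansion grows from 2 to 65 columns at degree 10, and the growth is much steeper when more input features are involved.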

4. The Model Showdown

To establish a performance baseline, I trained and compared three distinct approaches:

- Linear Regression: The primary baseline for predicting continuous values.
- Decision Tree Regressor: An introduction to non-linear, tree-based data splitting.
- Naive Models (Mean/Median): A critical "sanity check" to ensure that the developed models provide genuine predictive power over simple averages.
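
The comparison above can be sketched as follows. This is a minimal example on synthetic data, not the notebook's code: the feature names, the price formula, the split ratio, and the random seeds are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the rental data (made-up relationship plus noise).
rng = np.random.default_rng(0)
n = 500
bedrooms = rng.integers(0, 5, size=n)
bathrooms = rng.integers(1, 4, size=n)
X = np.column_stack([bedrooms, bathrooms])
y = 1500 + 700 * bedrooms + 400 * bathrooms + rng.normal(0, 200, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "naive_mean": DummyRegressor(strategy="mean"),
    "naive_median": DummyRegressor(strategy="median"),
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    predictions = model.fit(X_train, y_train).predict(X_test)
    results[name] = {
        "mae": mean_absolute_error(y_test, predictions),
        # RMSE is computed as the square root of MSE.
        "rmse": float(np.sqrt(mean_squared_error(y_test, predictions))),
    }

for name, scores in results.items():
    print(f"{name:18s} MAE={scores['mae']:.1f} RMSE={scores['rmse']:.1f}")
```

A trained model is only worth keeping if it beats the naive rows in this table on the held-out split.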

Results

The project concluded with an evaluation using MAE and RMSE. Comparing both errors across the naive baselines, Linear Regression, and the Decision Tree Regressor identified the most effective model for this dataset. This module taught me that Machine Learning is not just about writing code; it's about asking the right questions and methodically searching for answers within the data.
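
For reference, MAE is the mean of the absolute errors and RMSE is the square root of the mean squared error. The sketch below computes both by hand on four made-up price pairs; the numbers are illustrative only and are not results from the project.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the average squared error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(y_true))


actual = [3000, 2500, 4000, 3500]      # hypothetical true prices
predicted = [2900, 2700, 4000, 3100]   # hypothetical model output

print(mae(actual, predicted))             # 175.0
print(round(rmse(actual, predicted), 2))  # 229.13
```

RMSE comes out higher than MAE here because squaring gives the single 400-unit miss more weight, which is why the two metrics are worth reading together.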

How to Run the Project

Clone the repository:

```
git clone https://github.com/knight99rus/ML1_Introduction.git
cd ML1_Introduction
```

Create and activate a virtual environment (recommended):

```
python -m venv venv
source venv/bin/activate  # For Windows: venv\Scripts\activate
```

Install dependencies:

```
pip install jupyter pandas numpy scikit-learn matplotlib seaborn scipy statsmodels lightgbm
```

Download the data:

- Read the task on the Kaggle competition page.
- Download the test.json file.

Launch Jupyter Notebook:

```
jupyter notebook
```
Open and execute the cells in the project01.ipynb file.
