XGBoost + SHAP: Building an Explainable House Price Predictor (R²=0.88)
Predicting house prices is easy. Predicting them accurately and explaining exactly why the model arrived at that number — that's the harder problem. That's what I built.
Why Explainability Matters in ML
A model that says "this house is worth ₹45 lakhs" is useful. A model that says "this house is worth ₹45 lakhs — primarily because it's 3BHK (+₹8L), in a high-demand neighborhood (+₹12L), built post-2015 (+₹5L), but loses value since it has no covered parking (-₹3L)" is actually actionable.
That's the difference SHAP makes. And that's exactly what users see when they use this application.
Feature Engineering: Where the R² Lives
Raw data is never model-ready. Here's what I engineered beyond the baseline features:
- price_per_sqft — normalized area relative price signal
- age_of_property — derived from year_built, accounts for depreciation curve
- location_tier — locality encoded into demand tiers (1-5) based on median historical prices
- amenity_score — weighted sum of amenities (gym, parking, swimming pool, etc.)
- distance_to_hub — Euclidean distance to nearest commercial/IT hub
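A minimal sketch of how these derived features can be computed with pandas. The column names, amenity weights, reference year, and hub coordinates below are illustrative assumptions, not the exact training pipeline:

```python
import numpy as np
import pandas as pd

def engineer_features(df: pd.DataFrame, tier_map: dict, hub=(12.98, 77.59)) -> pd.DataFrame:
    """Add the derived features on top of assumed raw columns:
    price, area_sqft, year_built, locality, latitude, longitude,
    plus binary amenity flags."""
    out = df.copy()
    out["price_per_sqft"] = out["price"] / out["area_sqft"]
    out["age_of_property"] = 2024 - out["year_built"]  # illustrative reference year
    # Unknown localities fall back to the middle tier
    out["location_tier"] = out["locality"].map(tier_map).fillna(3).astype(int)
    weights = {"has_gym": 1.0, "has_parking": 2.0, "has_pool": 1.5}  # assumed weights
    out["amenity_score"] = sum(out[a] * w for a, w in weights.items())
    # Straight-line (Euclidean) distance in coordinate space to the hub
    out["distance_to_hub"] = np.sqrt(
        (out["latitude"] - hub[0]) ** 2 + (out["longitude"] - hub[1]) ** 2
    )
    return out
```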
Feature engineering alone moved the baseline R² from 0.72 to 0.84. The remaining 0.04 came from hyperparameter tuning.
XGBoost: The Model
```python
import xgboost as xgb

model = xgb.XGBRegressor(
    n_estimators=800,
    learning_rate=0.05,
    max_depth=6,
    subsample=0.8,
    colsample_bytree=0.8,
    reg_alpha=0.1,             # L1 regularization
    reg_lambda=1.0,            # L2 regularization
    early_stopping_rounds=50,  # xgboost >= 2.0: set on the estimator, not fit()
    random_state=42,
)
model.fit(
    X_train, y_train,
    eval_set=[(X_val, y_val)],
    verbose=False,
)
```
SHAP: The Explainability Layer
SHAP (SHapley Additive exPlanations) computes the contribution of each feature to each individual prediction. It's mathematically grounded in game theory — each feature's contribution is its "Shapley value."
```python
import shap

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_input)

# For each prediction, you get a per-feature contribution score:
#   positive SHAP = the feature increased the price
#   negative SHAP = the feature decreased the price
```
The Flask API returns both the prediction and the top 5 SHAP contributors, formatted as human-readable strings: "Location quality adds ₹8.2L", "Age of building reduces by ₹2.1L".
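The formatting step might look like the sketch below. The display names and the lakh conversion are illustrative assumptions; SHAP values are taken to be in rupees:

```python
def top_contributors(shap_row: dict, k: int = 5) -> list:
    """Turn per-feature SHAP values (in rupees) into the top-k readable strings."""
    display = {  # assumed display names
        "location_tier": "Location quality",
        "age_of_property": "Age of building",
        "amenity_score": "Amenities",
        "price_per_sqft": "Area price level",
        "distance_to_hub": "Distance to hub",
    }
    # Rank by absolute contribution, keep the k largest
    ranked = sorted(shap_row.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]
    lines = []
    for feat, value in ranked:
        name = display.get(feat, feat)
        lakhs = abs(value) / 100_000  # 1 lakh = 100,000 rupees
        verb = "adds" if value > 0 else "reduces by"
        lines.append(f"{name} {verb} ₹{lakhs:.1f}L")
    return lines

print(top_contributors({"location_tier": 820_000, "age_of_property": -210_000}))
# → ['Location quality adds ₹8.2L', 'Age of building reduces by ₹2.1L']
```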
Serving via Flask REST API
The model and explainer are serialized with joblib and loaded once at app startup. Each prediction request is:
- Validated (required fields, correct ranges)
- Feature-engineered (same pipeline as training)
- Passed through XGBoost → price prediction
- Passed through SHAP → explanation values
- Formatted and returned as JSON
Average prediction latency: ~120ms including SHAP computation.
Need a machine learning model integrated into your product with real explanations for users? Let's talk.
Get In Touch
