Credit Card Spending Habits Analysis

Clustering & Linear Regression Integration

Project Outcomes:

The purpose of this project is to analyze credit card data in order to identify trends that may cause low or high balances. To achieve this, we use clustering to separate people into groups based on their credit usage frequency, and then apply linear regression to determine whether these groups tend to accumulate high balances, as well as to identify other spending behaviors and trends.

We use k-means as our clustering model, since it scales much better to large datasets than agglomerative (hierarchical) clustering, which has to compare pairs of points. To decide how many clusters to use, we first compute inertia and silhouette scores for a range of candidate values of "K".

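Inertia is the sum of squared distances from each point to the center of its assigned cluster, so the lower the inertia, the more tightly our points sit around their cluster centers. Inertia always decreases as "K" grows, so rather than simply minimizing it, we look for the value of "K" beyond which adding more clusters yields only small improvements (the "elbow").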
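To make the definition concrete, here is a minimal, dependency-free sketch of how inertia is computed. The function name `inertia` and the toy points are illustrative assumptions, not taken from the project; with scikit-learn, the same quantity is exposed as the `inertia_` attribute of a fitted `KMeans` model.

```python
def inertia(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        center = centroids[label]
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total

# Toy 2-D data (made-up values, not from the credit card dataset).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# K = 1: every point is assigned to the overall mean (5, 1).
inertia_k1 = inertia(points, [0, 0, 0, 0], [(5.0, 1.0)])

# K = 2: one centroid per visible group, at (0, 1) and (10, 1).
inertia_k2 = inertia(points, [0, 0, 1, 1], [(0.0, 1.0), (10.0, 1.0)])

# K = 4: every point is its own centroid, so inertia drops to zero.
inertia_k4 = inertia(points, [0, 1, 2, 3], points)

print(inertia_k1, inertia_k2, inertia_k4)
```

The large drop from K = 1 to K = 2 followed by a small drop to K = 4 is the kind of pattern an elbow plot is meant to reveal: inertia alone would always favor more clusters.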

Training, Assessing, & Choosing Linear Regression Models

Silhouette score measures how well separated our clusters are, on a scale from -1 to 1. The closer the score is to 1, the better our points match their assigned cluster compared to the nearest other cluster.
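As a rough illustration of what this score captures, the sketch below implements the standard silhouette coefficient, s = (b - a) / max(a, b), in plain Python. The helper name and the toy points are assumptions made for this example; in practice `sklearn.metrics.silhouette_score` would likely be used instead.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (requires at least two clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it does not affect the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data: two tight groups that are far apart (made-up values).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

good = silhouette(points, [0, 0, 1, 1])  # labels follow the visible groups
bad = silhouette(points, [0, 1, 0, 1])   # labels cut across the groups

print(round(good, 3), round(bad, 3))
```

A labeling that follows the natural groups scores close to 1, while one that cuts across them scores below zero.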

Taking both inertia and silhouette score into account, a value of "K" equal to 5 appears to be the best choice, so we will use 5 clusters.

We want to rename our clusters into groups based on spending habits. Let's see the average spending frequency of each cluster so we can properly assign our group names.
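A minimal sketch of this per-cluster averaging step is given here, using only the standard library. The column names (`oneoff_freq`, `installments_freq`, `cash_advance_freq`) and all values are hypothetical placeholders; with pandas, the equivalent would be roughly `df.groupby("cluster")[features].mean()`.

```python
from collections import defaultdict
from statistics import mean

def cluster_means(rows, features):
    """Average each feature within each cluster label."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["cluster"]].append(row)
    return {
        label: {name: mean(r[name] for r in members) for name in features}
        for label, members in sorted(grouped.items())
    }

# Hypothetical rows: frequencies are fractions between 0 and 1 (made-up values).
rows = [
    {"cluster": 0, "oneoff_freq": 0.0, "installments_freq": 0.0, "cash_advance_freq": 0.25},
    {"cluster": 0, "oneoff_freq": 0.25, "installments_freq": 0.0, "cash_advance_freq": 0.0},
    {"cluster": 1, "oneoff_freq": 0.0, "installments_freq": 0.75, "cash_advance_freq": 0.0},
    {"cluster": 1, "oneoff_freq": 0.25, "installments_freq": 1.0, "cash_advance_freq": 0.0},
]
features = ["oneoff_freq", "installments_freq", "cash_advance_freq"]

profile = cluster_means(rows, features)
for label, averages in profile.items():
    print(label, averages)
```

In this toy profile, cluster 1 stands out for its installment frequency, which is the kind of signal that would justify a name like "Structured Spenders".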

After examining the trends for each cluster, we can now name them:

- Cluster 0: Low Activity Users - Very little credit card utilization.

- Cluster 1: Structured Spenders - Tend to use credit for planned, large purchases made in installments (such as a TV). Less likely to impulse buy.

- Cluster 2: Impulse Shoppers - Very commonly make one-off purchases, with a moderate amount of installment spending as well.

- Cluster 3: Cash Advance Reliant - Tend to mainly use credit for short-term liquidity. This is very risky behavior!

- Cluster 4: Heavy Spenders - High credit usage across the board.

General Data & Insights by Cluster

Intermediate Conclusion

- Heavy Spenders, on average, have the highest credit limit, payments, minimum payments, and balance of the five clusters. Meanwhile, Low Activity Users and Structured Spenders tend to have the lowest values across these four metrics.

- Impulse shoppers tend to make payments well above their monthly minimums.

- For the three metrics of payments, minimum payments, and balance, the data is heavily right-skewed. This suggests that in each spending pattern cluster, a few individuals pull the averages up significantly. In other words, a small number of credit card users utilize their credit far more heavily than the typical user.

- Median values for credit limit are close to the average. This suggests that credit limits tend to be somewhat similar for most people in each cluster (although the data is still somewhat right-skewed).
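The skew observations above rest on comparing each cluster's mean with its median. A small sketch with made-up balances (not project data) shows how a single heavy user pulls the mean far above the median:

```python
from statistics import mean, median

# Made-up balances for one cluster: most are modest, one is very large.
balances = [400, 550, 600, 700, 750, 9000]

avg = mean(balances)    # pulled upward by the single large value
mid = median(balances)  # barely affected by the outlier

print(avg, mid)
```

A mean well above the median is the signature of right skew. When the two are close, as reported for credit limit, the distribution is closer to symmetric.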

We train 3 regression models:

Model 1 - MINIMUM_PAYMENTS → PAYMENTS: To see how the minimum payments of each cluster could affect the payments they actually make.

Model 2 - CREDIT_LIMIT → BALANCE: To see whether the credit limit of each cluster influences their balance.

Model 3 - MINIMUM_PAYMENTS + PAYMENTS → BALANCE: To see if the payments and required minimums of each cluster influence their balance.
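For the single-feature models, fitting one line per cluster can be sketched with the closed-form least-squares solution. This is a simplified illustration under stated assumptions: `fit_line` is a hypothetical helper, the data is made up and perfectly linear, and it is assumed that each model is fit separately per cluster. The two-feature model would need multiple regression, for example scikit-learn's `LinearRegression`.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Made-up (credit limit, balance) pairs per cluster, chosen to be exactly linear.
data = {
    "Heavy Spenders": ([1000, 2000, 3000], [500, 1000, 1500]),
    "Low Activity Users": ([1000, 2000, 3000], [100, 200, 300]),
}

# One independent fit per cluster, mirroring a CREDIT_LIMIT -> BALANCE model.
fits = {name: fit_line(x, y) for name, (x, y) in data.items()}
for name, (slope, intercept) in fits.items():
    print(name, slope, intercept)
```

In this toy setup, a steeper slope means each extra unit of credit limit is associated with a larger increase in balance for that cluster.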

Interestingly, the first model isn't very effective. The R² scores for each cluster are close to zero and, in most cases, below it. This indicates that a linear model does not capture any meaningful relationship between MINIMUM_PAYMENTS and PAYMENTS: within each cluster, the linear relationship between the two is almost non-existent.
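To see why an R² score can fall below zero, recall that R² = 1 - SS_res / SS_tot, where SS_tot is the error of always predicting the mean. The sketch below uses made-up numbers. Negative values typically arise when the score is computed on data the line was not fit to, such as a held-out test split; whether that applies to this project is an assumption.

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, where SS_tot uses the mean of y_true."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_bar) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot

y_true = [100.0, 200.0, 300.0]

good = r2_score(y_true, [110.0, 190.0, 310.0])      # predictions close to the truth
baseline = r2_score(y_true, [200.0, 200.0, 200.0])  # always predict the mean
bad = r2_score(y_true, [300.0, 200.0, 100.0])       # trend points the wrong way

print(good, baseline, bad)
```

A score of 0 matches the "always predict the mean" baseline, so any negative score means the fitted line does worse than that baseline.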

Models 2 and 3 have R² scores that, while modest, are high enough to explain some of the variance in the data, so we will model and analyze them while forgoing Model 1. (Note that we will not display the trend line for Cash Advance Reliant entries in Model 3, since the R² score is negative.)

Insights via Linear Regression

Linear regression model showing the positive relationship between credit limit and balance (total debt on credit card) for each cluster group.

Linear regression model showing how monthly payments relate to total balance for each cluster. We hold minimum payments at the median so that the x-axis reflects how much customers pay beyond their usual minimum amount, rather than just the raw payment value.
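One way to draw such a line from a two-feature model is to fix minimum payments at the median and vary only payments. The sketch below assumes hypothetical fitted coefficients (`coef_min`, `coef_pay`, `intercept`) and made-up data; it is not the project's actual plotting code.

```python
from statistics import median

def balance_line(payments_grid, min_payments, coef_min, coef_pay, intercept):
    """Predicted balance along a payments grid, with minimum payments fixed at their median."""
    fixed_min = median(min_payments)
    return [intercept + coef_min * fixed_min + coef_pay * p for p in payments_grid]

# Hypothetical coefficients for one cluster (illustrative only).
coef_min, coef_pay, intercept = 1.5, 0.25, 300.0

# Made-up minimum payments; the median (225) ignores the one large value.
min_payments = [150.0, 200.0, 250.0, 1200.0]

grid = [0.0, 1000.0, 2000.0]
line = balance_line(grid, min_payments, coef_min, coef_pay, intercept)

print(line)
```

Fixing one feature at its median reduces the fitted plane to a single line that can be drawn on a two-dimensional payments-versus-balance plot.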

As we can see, the Cash Advance Reliant and Heavy Spenders clusters had a stronger relationship between their credit limit and balance. This indicates that these groups are more likely to utilize close to all of their credit and rack up a large balance.

In comparison, the other 3 clusters have a weaker positive correlation, indicating that assigning a high credit limit to these groups will have a small-to-moderate impact on credit utilization.

Notably, payments beyond the minimum correlate strongly with balance for Structured Spenders. This is in line with what is expected, given the installment-based nature of this group's spending. For Heavy Spenders and Impulse Shoppers, by contrast, the positive trend is much weaker.

This suggests that some individuals in these two groups are financially responsible and try to pay off their balances quickly, while others let their debt accumulate and have to adjust to high payments later. In other words, the level of financial responsibility within these two groups is inconsistent.

Note that the “Cash Advance Reliant” cluster was excluded here since the R² score for that cluster on this model was negative, meaning the trend line predicted balance worse than simply using the cluster's mean balance would. This suggests extremely high inconsistency and risk when trying to assess the reliability and responsibility of those who depend on cash advances.

Conclusion

Our analysis grouped customers into five spending behavior clusters and revealed clear financial differences between them. Structured Spenders showed consistent and responsible repayment habits, while Heavy Spenders and Impulse Shoppers had similar spending intensity but far more variable repayment patterns. Cash Advance Reliant users demonstrated the highest financial risk, with balances and payments showing no stable relationship.

Overall, repayment discipline is strongly tied to spending style: some groups manage debt predictably, while others display inconsistent or high-risk credit behavior.