Big-Data ML Automation

From variable selection to model comparison — automated end-to-end.

An AutoGluon-based engine decides between classification and regression, picks the evaluation metric, and trains the models automatically. Feature importance, the confusion matrix, and the ROC curve are all shown in one place.

At a glance

Common use

Prediction prototypes · Feature importance · Baseline benchmarks

Outcome

One column: categorical (classification) or numeric (regression)

Engine

AutoGluon (lightweight ensemble of LightGBM and linear models)

Metrics

Classification: accuracy · precision · recall · F1 · ROC AUC / Regression: MSE · MAE · RMSE · MAPE · R²

Input data

CSV / XLSX, rows = samples, columns = variables

Plan

PREMIUM plan and above
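For reference, the metrics listed above follow their standard definitions. The sketch below computes them in plain Python, assuming binary classification with the positive class coded as 1 and scores given as P(class 1). It is an illustration, not the tool's code, and it reports MAPE as a fraction rather than a percentage.

```python
import math


def regression_metrics(y_true, y_pred):
    """MSE, MAE, RMSE, MAPE (as a fraction) and R² for paired lists."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when a true value is 0; this sketch assumes none are.
    mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - sum(e * e for e in errors) / ss_tot
    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse), "mape": mape, "r2": r2}


def classification_metrics(y_true, y_pred, scores):
    """Binary metrics with 1 as the positive class; `scores` are P(class 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # ROC AUC = probability that a random positive outscores a random negative (ties count 0.5)
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    roc_auc = wins / (len(pos) * len(neg))
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "roc_auc": roc_auc}
```

RMSE is simply the square root of MSE, and R² compares the model's squared error with that of always predicting the mean.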

Data preparation

  1. Tabular file (CSV / XLSX, ≤ 30 MB)
  2. Rows = samples, columns = variables
  3. Numeric and categorical variables both supported (categorical auto-encoded)
  4. One outcome column; its type and unique-value count decide classification vs regression
  5. More samples generally mean better learning

Classification vs regression is decided automatically from the outcome's type and unique-value count. If your outcome is a numeric 0/1 column, convert it to Categorical so it is recognised as classification.
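The exact decision rule is not documented here. The following is a minimal sketch of one rule consistent with the note above, assuming a numeric outcome always maps to regression and a categorical outcome maps to binary or multiclass classification by its unique-value count. `infer_problem_type` is a hypothetical helper, not part of the tool or of AutoGluon.

```python
from numbers import Number


def infer_problem_type(values):
    """Hypothetical rule: numeric outcome -> regression; otherwise
    binary or multiclass classification depending on unique-value count."""
    observed = [v for v in values if v is not None]  # ignore missing values
    # bool is a subclass of int in Python, so treat it as categorical explicitly
    numeric = all(isinstance(v, Number) and not isinstance(v, bool) for v in observed)
    if numeric:
        return "regression"
    n_unique = len(set(observed))
    if n_unique < 2:
        raise ValueError("outcome needs at least two distinct classes")
    return "binary" if n_unique == 2 else "multiclass"


churn = [0, 1, 1, 0, 1]
print(infer_problem_type(churn))                     # regression
print(infer_problem_type([str(v) for v in churn]))   # binary
```

Converting the 0/1 values to labels (here, strings) is what moves the outcome from the numeric branch to the classification branch under this assumed rule.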

Workflow

  1. Variable cleanup, missing-value imputation, and encoding
  2. EDA: numeric variable distributions and scatter plots
  3. Outlier detection (Z-score / IQR) with a user-tuned removal threshold
  4. Multicollinearity removal using correlation and VIF thresholds
  5. Scaler selection (Standard / MinMax / None)
  6. AutoGluon auto-training (problem type and eval metric decided automatically)
  7. Leaderboard, feature importance, and confusion matrix / ROC
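Steps 3 and 5 rest on standard formulas. As a sketch with hypothetical helper names (plain Python, population standard deviation assumed), Z-score removal drops values whose |z| exceeds the threshold, IQR removal drops values outside [Q1 - k·IQR, Q3 + k·IQR], and the two scalers map a column to zero mean and unit variance, or to the range [0, 1].

```python
import statistics


def zscore_keep(values, threshold=3.0):
    """True for values whose |z-score| is within the threshold."""
    mean, sd = statistics.fmean(values), statistics.pstdev(values)
    if sd == 0:
        return [True] * len(values)
    return [abs(v - mean) / sd <= threshold for v in values]


def iqr_keep(values, k=1.5):
    """True for values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    # "inclusive" interpolates between order statistics (like NumPy's default percentile)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [q1 - k * iqr <= v <= q3 + k * iqr for v in values]


def remove_outliers(values, method="iqr", threshold=1.5):
    keep = zscore_keep(values, threshold) if method == "zscore" else iqr_keep(values, threshold)
    return [v for v, ok in zip(values, keep) if ok]


def standard_scale(values):
    mean, sd = statistics.fmean(values), statistics.pstdev(values)
    return [0.0 for _ in values] if sd == 0 else [(v - mean) / sd for v in values]


def minmax_scale(values):
    lo, hi = min(values), max(values)
    return [0.0 for _ in values] if hi == lo else [(v - lo) / (hi - lo) for v in values]


data = [10, 12, 11, 13, 12, 11, 100]
print(remove_outliers(data, "iqr", 1.5))     # [10, 12, 11, 13, 12, 11]
print(remove_outliers(data, "zscore", 3.0))  # 100 is kept: |z| is about 2.45
```

The last line shows why a user-tunable threshold matters: with n points and the population standard deviation, no value can reach |z| above √(n − 1), so with 7 points a cut-off of 3 can never fire, while the IQR rule still flags the 100.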

Supported analyses

  • Variable EDA

    Distributions, scatter plots, and normality checks for numeric variables

  • Outlier detection + removal

    Z-score / IQR visualisation with a user-adjustable removal threshold

  • Multicollinearity removal

    Redundant variables removed automatically using correlation and VIF thresholds

  • AutoGluon auto-training

    LightGBM and linear models trained in parallel as a lightweight ensemble, compared on a leaderboard

  • Feature importance

    Each variable's contribution quantified and visualised

  • Performance diagnostics

    Confusion matrix and ROC curve (classification) or residual plot (regression), generated automatically
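The variance inflation factor of variable j is 1 / (1 - R²_j), where R²_j comes from regressing j on the other variables; equivalently, it is the j-th diagonal entry of the inverse correlation matrix. The sketch below uses that identity and repeatedly drops the highest-VIF variable until every VIF is under a threshold. The threshold of 10 is a common rule of thumb, not the tool's documented default, and all helper names are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlations between columns (dict: name -> list of floats).
    Assumes no column is constant."""
    unit = {}
    for name, xs in columns.items():
        m = sum(xs) / len(xs)
        norm = math.sqrt(sum((x - m) ** 2 for x in xs))
        unit[name] = [(x - m) / norm for x in xs]  # centred, unit-length vector
    names = list(columns)
    return [[sum(a * b for a, b in zip(unit[i], unit[j])) for j in names] for i in names]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        if abs(p) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    inv = invert(correlation_matrix(columns))
    return {name: inv[i][i] for i, name in enumerate(columns)}


def drop_high_vif(columns, threshold=10.0):
    """Drop the highest-VIF variable until every VIF is <= threshold."""
    kept = dict(columns)
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return kept
```

For example, if `x3` is `x1 + x2` plus a little noise, all three start with very high VIFs; removing one of them (here the one with the largest VIF) brings the rest back near 1, which is why variables are dropped one at a time and VIFs recomputed rather than all high-VIF columns being removed at once.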

Use cases

  • Customer churn prediction

    Predict churn from behaviour and payment patterns, and surface the 5 most influential variables.

  • House-price regression

    Regress price on region, area, and period; compare models by MAE / RMSE.

  • Fraud detection

    Learn fraud patterns across many transaction variables, with recall prioritised automatically.
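The page does not say how recall is prioritised. One common approach, assumed here, is to lower the decision threshold on the model's fraud probability until a target recall is reached, accepting lower precision in exchange. `threshold_for_recall` and `recall_precision_at` are hypothetical helpers for illustration.

```python
def recall_precision_at(threshold, scores, labels):
    """Recall and precision when scores >= threshold are flagged as fraud (label 1)."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    tp = sum(flagged)
    positives = sum(labels)
    recall = tp / positives if positives else 0.0
    precision = tp / len(flagged) if flagged else 0.0
    return recall, precision


def threshold_for_recall(scores, labels, target_recall=0.9):
    """Highest score threshold whose recall reaches the target."""
    # Recall can only grow as the threshold drops, so scan candidates from the top.
    for t in sorted(set(scores), reverse=True):
        recall, _ = recall_precision_at(t, scores, labels)
        if recall >= target_recall:
            return t
    return min(scores)  # only reached when there are no positive labels


scores = [0.95, 0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 1, 0, 0, 1]
t = threshold_for_recall(scores, labels, target_recall=0.75)
print(t, recall_precision_at(t, scores, labels))  # 0.4 (0.75, 0.75)
```

Picking the highest threshold that meets the target keeps as much precision as possible while still catching the required share of fraud cases.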

What you get

  • Per-model leaderboard (classification: accuracy · F1 · AUC / regression: MAE · RMSE · R²)
  • Feature importance chart + table
  • Confusion matrix · ROC curve · residual plot
  • Best-model download (.pkl)
  • Auto-generated paper (preprocessing → modeling → results)
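The page does not specify how feature importance is computed. AutoGluon's `TabularPredictor.feature_importance` is documented as permutation-based, so the sketch below shows that technique, assuming the tool uses something similar: shuffle one column, re-score the model, and record how much the error grows. The model is any callable on a row, the metric is an error where lower is better, and every name here is hypothetical.

```python
import random


def permutation_importance(predict, rows, targets, metric, n_repeats=5, seed=0):
    """Mean increase in `metric` (an error, lower is better) after shuffling each column."""
    rng = random.Random(seed)
    baseline = metric(targets, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(metric(targets, [predict(r) for r in shuffled]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


rows = [[x, (x * 7) % 5] for x in range(1, 9)]
targets = [2 * r[0] for r in rows]
model = lambda r: 2 * r[0]  # toy model that ignores the second feature
print(permutation_importance(model, rows, targets, mse))
```

Sorting the returned scores in descending order gives the ranking behind an importance chart, and taking the first five entries gives a "top 5 variables" list like the one in the churn use case. A feature the model never uses scores exactly 0.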