Big-Data ML Automation
From variable selection to model comparison — automated end-to-end.
The AutoGluon-based engine decides between classification and regression, picks a matching evaluation metric, and trains the models. Feature importance, the confusion matrix, and the ROC curve are all shown in one place.
At a glance
Common use
Prediction prototypes · Feature importance · Baseline benchmarks
Outcome
1 categorical (classification) or numeric (regression) column
Engine
AutoGluon (LightGBM · linear models — lightweight ensemble)
Metrics
Classification: accuracy · precision · recall · f1 · roc_auc / Regression: mse · mae · rmse · mape · r²
Input data
CSV / XLSX, rows = samples, columns = variables
Plan
PREMIUM plan and above
Data preparation
1. Tabular file (CSV / XLSX, ≤ 30 MB)
2. Rows = samples, columns = variables
3. Numeric and categorical variables are both supported (categorical variables are auto-encoded)
4. One outcome column: its type and unique-value count decide classification vs regression
5. More samples generally give better learning
Classification vs regression is decided automatically from the outcome column's type and unique-value count. If your outcome is numeric 0/1, convert it to Categorical so that it is recognised as classification.
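As a rough illustration of that rule, the sketch below guesses a problem type from a list of outcome values. The function name `infer_problem_type` and the exact rule (numeric means regression, anything else is classified by its unique-value count) are assumptions made for this example, not the product's documented logic. The returned labels mirror AutoGluon's problem-type names.

```python
from typing import Sequence


def infer_problem_type(values: Sequence[object]) -> str:
    """Guess a problem type from an outcome column (hypothetical rule)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("outcome column has no non-missing values")

    # bool is a subclass of int in Python, so exclude it explicitly.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if numeric:
        # Assumption: any numeric outcome, including 0/1, is read as regression.
        return "regression"

    n_unique = len(set(present))
    if n_unique < 2:
        raise ValueError("a categorical outcome needs at least two classes")
    return "binary" if n_unique == 2 else "multiclass"


flags = [0, 1, 1, 0]
print(infer_problem_type(flags))                    # regression
print(infer_problem_type([str(v) for v in flags]))  # binary
```

Under this assumed rule, the same 0/1 column switches from regression to classification once its values are stored as labels rather than numbers, which is the effect of converting the column to Categorical.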
Workflow
1. Variable cleanup + missing-value imputation + encoding
2. Numeric variable distribution + scatter plot (EDA)
3. Outlier detection + user-tuned removal threshold (Z-score / IQR)
4. Multicollinearity removal via correlation + VIF thresholds
5. Scaler selection (Standard / MinMax / None)
6. AutoGluon auto-training (problem type · eval metric auto-decided)
7. Leaderboard + feature importance + confusion matrix / ROC
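The outlier step in the workflow above can be sketched with the two rules it names. This is a minimal standard-library example, not the product's implementation. The function names and the default thresholds (3.0 for Z-score, 1.5 for the IQR multiplier) are conventional values assumed here.

```python
import statistics
from typing import List, Sequence


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Indices whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # constant column: nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[int]:
    """Indices outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [i for i, v in enumerate(values) if v < low or v > high]


sample = [10.0, 11.0, 12.0, 11.5, 10.5, 12.5, 11.0, 95.0]
print(zscore_outliers(sample))                 # []
print(zscore_outliers(sample, threshold=2.0))  # [7]
print(iqr_outliers(sample))                    # [7]
```

On this small sample the default Z-score threshold misses the obvious outlier, because the extreme value inflates the standard deviation. The IQR rule still flags it, which is one reason an adjustable threshold is useful.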
Supported analyses
Variable EDA
Numeric distribution / scatter + normality
Outlier detection + removal
Z-score / IQR visualisation + user-adjustable
Multicollinearity removal
Auto-remove redundant variables via correlation + VIF
AutoGluon auto-training
Lightweight ensemble (LightGBM + linear) trained in parallel, leaderboard comparison
Feature importance
Quantitative per-variable contribution visualised
Performance diagnostics
Confusion matrix · ROC curve · residual plot auto-generated
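To show how the confusion matrix relates to the classification metrics listed earlier, here is a small sketch for the binary case. The helper names `confusion_counts` and `classification_metrics` are made up for this example. It covers accuracy, precision, recall, and F1 only, since ROC AUC needs predicted scores rather than hard labels.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Derive accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(confusion_counts(y_true, y_pred))        # (3, 1, 1, 3)
print(classification_metrics(y_true, y_pred))  # every metric is 0.75 here
```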
Use cases
Customer churn prediction
Predict churn from behaviour and payment patterns, then surface the five most influential variables.
House-price regression
Regress price on region, area, and period, then compare models by MAE and RMSE.
Fraud detection
Learn fraud patterns across many transaction variables, with recall prioritised automatically.
What you get
- Per-model leaderboard (classification: accuracy · F1 · AUC / regression: MAE · RMSE · R²)
- Feature importance chart + table
- Confusion matrix · ROC curve · residual plot
- Best-model download (.pkl)
- Auto-generated paper (preprocessing → modeling → results)
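For readers who want to sanity-check the regression columns of the leaderboard, the sketch below computes the regression metrics named on this page from true and predicted values. The function name `regression_metrics` is made up for this example, and skipping zero-valued targets in MAPE is an assumption made here. The engine's own handling may differ.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    """Compute MSE, MAE, RMSE, MAPE, and R² for paired values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    # MAPE is undefined where the true value is zero; those rows are skipped.
    pct = [abs(e / t) for e, t in zip(errors, y_true) if t != 0]
    mape = sum(pct) / len(pct) if pct else float("nan")

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")

    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse), "mape": mape, "r2": r2}


prices = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
result = regression_metrics(prices, predicted)
print(result["mae"], result["rmse"])  # 10.0 10.0
print(round(result["r2"], 3))         # 0.992
```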