Deep Variance Autopilot.
The end-to-end AutoML pipeline. Automatically infer types, clean data, engineer features, and train the best-fit ML or deep learning model. Powered by LLM-driven code generation and intelligent multi-model comparison.
7
Automated pipeline stages
1 Call
From raw data to trained model
8+
Model architectures ranked
Core Optimization
Integrated directly into your workflow to automate the full pipeline, from profiling to training, without changing your existing code.
Intelligent Data Processing
Automatically infer column types, encode categoricals, and extract structural patterns to guide intelligent preprocessing decisions.
- LLM-driven type inference
- Categorical encoding
- Correlation & MI analysis
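The correlation and mutual-information screening described above can be sketched with plain pandas. This is an illustration of the concept only, not the SDK's internal implementation; the column names, toy data, and binning choice are made up:

```python
import pandas as pd
import numpy as np

# Toy frame standing in for a profiled dataset (illustrative only).
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 44],
    "income": [30, 52, 80, 95, 41, 77],
    "churn": [1, 0, 0, 0, 1, 0],
})

# Pearson correlation of each numeric feature with the target.
corr = df.corr()["churn"].drop("churn")

def mutual_info(x, y, bins=3):
    """Discrete mutual information between a binned feature and the target."""
    xb = pd.cut(x, bins=bins, labels=False)
    joint = pd.crosstab(xb, y, normalize=True)   # joint probabilities
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    mi = 0.0
    for i in joint.index:
        for j in joint.columns:
            p = joint.loc[i, j]
            if p > 0:
                mi += p * np.log(p / (px[i] * py[j]))
    return mi

mi_scores = {c: mutual_info(df[c], df["churn"]) for c in ["age", "income"]}
```

Features with near-zero correlation and near-zero mutual information are candidates to drop before training.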
LLM-Driven Preprocessing
Self-correcting LLM code generation handles missing data and feature engineering. Intelligent sampling selects representative subsets efficiently.
- Missing data handling
- Intelligent sampling
- Automatic retry on error
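Intelligent sampling of the kind described here can be approximated with a stratified sample that preserves class proportions in a smaller subset. A minimal sketch using pandas, not the SDK's actual sampler:

```python
import pandas as pd

# Toy imbalanced dataset: 80% negative class, 20% positive class.
df = pd.DataFrame({
    "feature": range(100),
    "label": [0] * 80 + [1] * 20,
})

# Stratified 20% sample: each class contributes in proportion,
# so the subset keeps the original 4:1 class ratio.
sample = df.groupby("label").sample(frac=0.2, random_state=0)
```

Preserving the class ratio matters because a naive random subset of a skewed dataset can lose the minority class entirely.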
Automated Model Selection
Evaluates and ranks classical ML and deep learning architectures per run, returning a full leaderboard and per-feature importance scores.
- Ranked model leaderboard
- Feature importance scores
- Classical ML & deep learning
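At its core, the leaderboard idea reduces to scoring every candidate model and sorting by the result. A toy sketch with simple stand-in decision rules rather than real classifiers, since the point here is the ranking logic:

```python
# Toy holdout set: one numeric feature and a binary label.
X = [0.1, 0.4, 0.35, 0.8, 0.9, 0.65]
y = [0, 0, 0, 1, 1, 1]

# Candidate "models" are trivial decision rules standing in for
# real ML / deep learning architectures (illustrative only).
candidates = {
    "threshold_0.5": lambda x: int(x > 0.5),
    "threshold_0.3": lambda x: int(x > 0.3),
    "always_zero": lambda x: 0,
}

def accuracy(rule):
    """Fraction of holdout points the rule classifies correctly."""
    return sum(rule(x) == t for x, t in zip(X, y)) / len(y)

# Score every candidate and sort descending into a leaderboard.
leaderboard = sorted(
    ((name, accuracy(rule)) for name, rule in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
```

A real run would substitute cross-validated classifiers for the rules, but the ranking step is the same.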
Pipeline Intelligence
Self-Correcting Generation
If LLM-generated code fails, the error is automatically fed back as context and retried up to 3 times, without any manual intervention.
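The retry loop described above might look like the following sketch, with stub generate/execute functions standing in for the real LLM call and sandbox:

```python
def run_with_retries(generate, execute, max_retries=3):
    """Generate code, execute it, and feed any error back into the
    next generation attempt (illustrative only)."""
    error = None
    for attempt in range(1, max_retries + 1):
        code = generate(error)          # error context on retries
        try:
            return execute(code)
        except Exception as exc:
            error = f"attempt {attempt} failed: {exc}"
    raise RuntimeError(f"all {max_retries} attempts failed: {error}")

# Stub generator that "fixes" its code after seeing the first error.
attempts = []
def fake_generate(error):
    attempts.append(error)
    return "broken" if error is None else "fixed"

def fake_execute(code):
    if code == "broken":
        raise ValueError("NameError in generated code")
    return "ok"

result = run_with_retries(fake_generate, fake_execute)
```

The key design choice is that the error message becomes part of the next prompt, so each retry is informed rather than blind.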
Full Run Observability
Every pipeline run returns per-stage wall-clock timing, peak memory usage, and CPU metrics for complete visibility into the automated process.
[1/7] AutoCastLayer ✓ 1.2s
[2/7] DataProfilingLayer ✓ 6.8s
[3/7] CorrelationLayer ✓ 14.2s
[4/7] SamplingLayer ✓ 3.1s
[5/7] PreprocessingLayer ↻ retry 1/3 → ✓ 22.7s
[6/7] ModelRecommendationLayer ✓ 8.3s
[7/7] ModelTrainingLayer ✓ 4m 18s
accuracy: 92.3% | f1_macro: 91.1%
peak_mem: 512 MB | total_time: 5m 14.3s
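Per-stage timing and peak-memory capture like the log above can be done with Python's standard library alone. A minimal sketch, not Autopilot's actual instrumentation:

```python
import time
import tracemalloc

def timed_stage(name, fn, metrics):
    """Run one pipeline stage, recording wall-clock time and peak
    Python-heap memory for that stage (illustrative only)."""
    tracemalloc.start()
    start = time.perf_counter()
    out = fn()
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    metrics[name] = {"seconds": elapsed, "peak_bytes": peak}
    return out

metrics = {}
timed_stage("AutoCastLayer", lambda: [x * 2 for x in range(10_000)], metrics)
```

Note that `tracemalloc` tracks Python allocations only; the 512 MB figure in the log above would also include native library memory, which needs OS-level measurement.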
Python-first integration
Integrate Deep Variance into your existing training scripts with just a few lines of code.
Install via pip: deepvariance-sdk
Point it at your dataset and target column
Get a trained model, full metrics, and ranked leaderboard
from deepvariance.pipelines.ml import MLPipeline
from deepvariance.typings import PipelineConfig
import pandas as pd

data = pd.read_csv("customers.csv")
config = PipelineConfig(dv_api_key="dv_...")

# Run the full automated pipeline
pipeline = MLPipeline(config=config)
result = pipeline.run(data, target="churn")

print(result["metrics"])
Works with your stack
Autopilot connects to any data source and runs on any infrastructure, cloud or on-premise.
Tabular & Image Data
Cloud & Data Providers
Extensible by design
Any data source, any format. Implement a simple interface and Autopilot handles the rest: batching, prefetching, and distributed loading included.
- Custom S3-compatible stores
- Proprietary database connectors
- Streaming and real-time sources
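A custom data source along these lines could be a small interface with one required method. The class and method names below are hypothetical, chosen for illustration, not the actual Autopilot API:

```python
from abc import ABC, abstractmethod
from typing import Iterator

class DataSource(ABC):
    """Hypothetical connector interface: implement one method and the
    pipeline handles batching orchestration (names are illustrative)."""

    @abstractmethod
    def batches(self, batch_size: int) -> Iterator[list]:
        """Yield records in fixed-size batches."""

class InMemorySource(DataSource):
    """Trivial implementation backed by a Python list."""

    def __init__(self, records):
        self.records = records

    def batches(self, batch_size):
        for i in range(0, len(self.records), batch_size):
            yield self.records[i:i + batch_size]

source = InMemorySource(list(range(10)))
chunks = list(source.batches(batch_size=4))
```

An S3-backed or streaming source would implement the same method against its own storage, which is the point of keeping the interface small.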
Deploy Anywhere, Own Everything
Runs entirely on your hardware. Your data, your models, your infrastructure. No cloud dependency, no data egress, no vendor lock-in.
On-Premise Execution
The entire pipeline, from data profiling to model training, runs on your own servers. No data ever leaves your environment.
Zero Data Egress
Training data is never transmitted to external services. LLM calls carry only schema metadata and error traces, never raw records.
Sandboxed Code Execution
LLM-generated preprocessing code runs in a restricted sandbox: only approved libraries are permitted, with no arbitrary system access.
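As a rough illustration of allowlist-based execution (a production sandbox needs OS-level isolation on top of this; the page describes the mechanism only at a high level), generated code can be run with no builtins and only approved modules in scope:

```python
import math

# Only modules on this allowlist are visible to generated code.
ALLOWED_MODULES = {"math": math}

def run_sandboxed(code: str) -> dict:
    """Execute a code string with empty builtins and only approved
    modules in its namespace (illustrative sketch only)."""
    env = {"__builtins__": {}, **ALLOWED_MODULES}
    exec(code, env)
    return env

# Approved library call succeeds.
env = run_sandboxed("result = math.sqrt(16)")

# Imports are blocked: with no builtins there is no __import__.
try:
    run_sandboxed("import os")
    blocked = False
except ImportError:
    blocked = True
```

Stripping `__builtins__` removes `__import__`, `open`, and `eval` from the generated code's reach, which is the minimal version of the restriction described above.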
Bring Your Own LLM Key
Supply your own OpenAI or Groq API key via PipelineConfig. You choose the provider, you own the usage.
Full Audit Trail
Every pipeline run emits per-stage timing, memory usage, and CPU metrics: a complete, inspectable record of every automated decision.
Industry Agnostic
Works on any tabular dataset regardless of domain: healthcare, finance, retail, or manufacturing. No industry-specific configuration needed.
Request a Demo
See Autopilot in action on your data. Our team will walk you through a live session.