Run TimeGPT in a Distributed Manner on Spark
Spark is an open-source distributed compute framework designed for large-scale data processing. With Spark, you can seamlessly scale your Python-based workflows for big data analytics and machine learning tasks. This tutorial demonstrates how to use TimeGPT with Spark to perform forecasting and cross-validation.
1. Installation
Fugue provides a convenient interface to distribute Python code across frameworks like Spark.
Install fugue with Spark support:

```shell
pip install "fugue[spark]"
```

To work with TimeGPT, make sure you also have the nixtla library installed.

2. Load Data
Load the dataset into a pandas DataFrame. In this example, we use hourly electricity price data from different markets.
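A sketch of this step is below. The dataset URL is an assumption based on Nixtla's public sample datasets (a short hourly electricity-price series); substitute your own data path if it differs.

```python
import pandas as pd

# Hourly electricity prices for several markets. Each market is a separate
# series identified by `unique_id`, with timestamps in `ds` and prices in `y`.
# NOTE: the URL below is assumed from Nixtla's public sample datasets.
df = pd.read_csv(
    "https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv"
)
df.head()
```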
3. Initialize Spark
Create a Spark session and convert your pandas DataFrame to a Spark DataFrame:
4. Use TimeGPT on Spark
Key Concept
Using TimeGPT with Spark is very similar to using it locally. The main difference is that you work with Spark DataFrames instead of pandas DataFrames.
TimeGPT can handle large-scale data when distributed via Spark, allowing you to scale your time series forecasting tasks efficiently.
Create a NixtlaClient Instance
If you need to use an Azure AI endpoint, set the `base_url` parameter.
Forecast
When using Azure AI endpoints, specify `model="azureai"`.
The public API supports two models: `timegpt-1` (default) and `timegpt-1-long-horizon`. For long-horizon forecasting, see this tutorial.

Cross-Validation
Perform cross-validation with Spark DataFrames:
For including exogenous variables with TimeGPT on Spark, use Spark DataFrames instead of pandas DataFrames, as demonstrated in the
Exogenous Variables tutorial.