Dae Young Kim
Creating and Retrieving a FlyteDirectory URL in Flyte
Flyte makes it easy to handle large datasets by treating directories as structured outputs using FlyteDirectory. This lets you manage files without handling storage paths manually.
In this post, we’ll demonstrate the simplest way to create a FlyteDirectory and retrieve its URL.
1. Process and Save Data in a FlyteDirectory
from pathlib import Path

import pandas as pd
from flytekit import task, workflow
from flytekit.types.directory import FlyteDirectory


@task
def generate_output_directory() -> FlyteDirectory:
    """Creates a directory, generates sample data, and returns a FlyteDirectory."""
    # Create the local directory that will hold the output files
    local_dir = Path("output_data")
    local_dir.mkdir(parents=True, exist_ok=True)

    # Create a sample CSV file
    output_file = local_dir / "sample_data.csv"
    df = pd.DataFrame({"column_a": ["Value1", "Value2"], "column_b": [123, 456]})
    df.to_csv(output_file, index=False)

    # Return as FlyteDirectory with a remote storage location
    return FlyteDirectory(str(local_dir), remote_directory="s3://your-bucket/output_data")
This function:
- Creates an output_data folder.
- Saves a sample CSV file.
- Returns a FlyteDirectory that maps to a remote storage location.
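The first two steps above do not depend on Flyte, so they can be checked locally before being wrapped in a task. The following is a minimal sketch using only the standard library. `build_output_dir` is a hypothetical helper name, and it writes the CSV with the `csv` module instead of pandas purely to stay dependency-free.

```python
import csv
import tempfile
from pathlib import Path


def build_output_dir(local_dir: Path) -> list[str]:
    """Create local_dir, write a sample CSV into it, and return the file names it holds."""
    local_dir.mkdir(parents=True, exist_ok=True)

    # Same sample rows as the task above, written without pandas (an assumption
    # made here only to keep the sketch free of third-party imports).
    output_file = local_dir / "sample_data.csv"
    with output_file.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["column_a", "column_b"])
        writer.writerows([["Value1", 123], ["Value2", 456]])

    # List the files that now sit in the directory.
    return sorted(p.name for p in local_dir.iterdir() if p.is_file())


# Use a temporary directory so the check leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    files = build_output_dir(Path(tmp) / "output_data")
    print(files)
```

Listing the directory contents before returning it is a cheap way to confirm that the folder holds exactly the files you expect to hand over to Flyte.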
2. Define and Run a Flyte Workflow
@workflow
def data_pipeline() -> FlyteDirectory:
    """Flyte workflow to generate and return a FlyteDirectory."""
    return generate_output_directory()


if __name__ == "__main__":
    # Run the workflow locally
    result = data_pipeline()
    print(f"Processed files saved at: {result.remote_source}")
This workflow:
- Calls the generate_output_directory task.
- Returns a FlyteDirectory containing the processed files.
- Prints the remote URL where files are stored.
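The URL printed by the workflow identifies the directory as a whole. To address a single file inside it, the file name can be appended to that URL. The sketch below is plain string handling, not a flytekit API: `file_url` and `split_bucket_and_key` are hypothetical helpers, and the `s3://your-bucket/output_data` value stands in for whatever `result.remote_source` returns in your setup.

```python
from urllib.parse import urlparse


def file_url(directory_url: str, file_name: str) -> str:
    """Join a directory URL and a file name with exactly one slash between them."""
    return directory_url.rstrip("/") + "/" + file_name.lstrip("/")


def split_bucket_and_key(url: str) -> tuple[str, str]:
    """Split an object-store URL such as s3://bucket/key into (bucket, key)."""
    parsed = urlparse(url)
    return parsed.netloc, parsed.path.lstrip("/")


# Placeholder value; in practice this would come from result.remote_source.
directory_url = "s3://your-bucket/output_data"

csv_url = file_url(directory_url, "sample_data.csv")
print(csv_url)  # s3://your-bucket/output_data/sample_data.csv
print(split_bucket_and_key(csv_url))  # ('your-bucket', 'output_data/sample_data.csv')
```

Splitting the URL into bucket and key is handy when the path has to be passed to a storage client that expects the two parts separately.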
Conclusion
This is the simplest way to create a FlyteDirectory and get its URL. You can expand this to handle real-world data and integrate it with cloud storage solutions like AWS S3 or Google Cloud Storage.
© 2025 Dae Young Kim