Apache Airflow is an open-source tool for orchestrating complex workflows and data pipelines. It lets you define, schedule, and monitor workflows programmatically, which makes it well suited to automating tasks such as data processing and machine learning pipelines. With dynamic pipeline generation, robust scheduling, and built-in monitoring, Airflow has become one of the most widely used tools in data engineering. This guide walks through installing Apache Airflow on Ubuntu 24.04 with PostgreSQL as its metadata database.
Prerequisites:
An Ubuntu 24.04 instance with at least 4 GB of RAM.
A domain (skynats.example.com) with an A record pointing to the instance’s IP address.
Step 1: Update Your System
sudo apt update && sudo apt upgrade -y
Step 2: Verify Python Installation
python3 --version
If it’s not installed, run:
sudo apt install python3
Step 3: Install Required Packages
sudo apt install build-essential libpq-dev python3-dev python3-venv
Step 4: Create a Virtual Environment
Create a new Python virtual environment called airflow_env in your home directory:
python3 -m venv ~/airflow_env
Step 5: Activate the Virtual Environment
source ~/airflow_env/bin/activate
Once activated, your prompt should change to (airflow_env).
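If the prompt does not change, the check below is one way to confirm the activation. It is a minimal sketch that relies on the VIRTUAL_ENV variable exported by the activate script; the function name venv_status is a hypothetical helper, not part of Python or Airflow.

```shell
# Report whether a Python virtual environment is active in this shell.
# The activate script exports VIRTUAL_ENV with the environment's path.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: ${VIRTUAL_ENV}"
    else
        echo "inactive"
    fi
}

venv_status
```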
Step 6: Install Apache Airflow with PostgreSQL Support
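The airflow db init and airflow webserver commands used later in this guide belong to the Airflow 2.x command line, so keep the installed version below 3:
pip install "apache-airflow[postgres]<3" psycopg2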
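The Airflow project recommends installing against a constraints file so that dependency versions stay reproducible. The sketch below shows one way to build that constraints URL. The version 2.10.5 is an assumption chosen for illustration, 3.12 is the Python version shipped with Ubuntu 24.04, and install_airflow is a hypothetical helper name.

```shell
# Assumption: pin a specific Airflow 2.x release; adjust to the version you want.
AIRFLOW_VERSION="2.10.5"
# Ubuntu 24.04 ships Python 3.12; change this if your interpreter differs.
PYTHON_VERSION="3.12"
# Constraints files are published per Airflow version and Python version.
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

# Install Airflow with the postgres extra, resolved against the constraints file.
install_airflow() {
    pip install "apache-airflow[postgres]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
}
```

Run install_airflow inside the activated virtual environment, then confirm the result with airflow version.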
Step 7: Install PostgreSQL
sudo apt install postgresql postgresql-contrib
sudo systemctl start postgresql
Step 8: Configure PostgreSQL for Airflow
Access the PostgreSQL console:
sudo -u postgres psql
Inside the PostgreSQL console, create a new user for Airflow and set the password:
CREATE USER airflow PASSWORD 'yourpassword';
Replace yourpassword with your desired password.
Grant the new user full privileges on all tables in the public schema:
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO airflow;
Create a new database for Airflow:
CREATE DATABASE airflowdb;
Grant the Airflow user ownership of the database:
ALTER DATABASE airflowdb OWNER TO airflow;
Grant the Airflow user all privileges on the public schema and exit PostgreSQL:
GRANT ALL ON SCHEMA public TO airflow;
exit;
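Before pointing Airflow at the new database, it can help to confirm that the airflow role can log in over TCP. The following is a minimal sketch, assuming the placeholder password yourpassword from above and PostgreSQL listening on localhost; the variable names and the check_airflow_db helper are hypothetical.

```shell
# Placeholders matching the values used in this step; substitute your own password.
PG_USER="airflow"
PG_PASSWORD="yourpassword"
PG_DB="airflowdb"
PG_URI="postgresql://${PG_USER}:${PG_PASSWORD}@localhost/${PG_DB}"

# Connect as the airflow role and print the role and database in use.
check_airflow_db() {
    psql "${PG_URI}" -c "SELECT current_user, current_database();"
}
```

Calling check_airflow_db should return a single row naming the airflow role and the airflowdb database. A password or connection error here points to a problem with the role or database setup, not with Airflow.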
Step 9: Modify the Airflow Configuration
If you don’t see your Airflow installation directory, initialize the database and start the scheduler to generate the necessary directories:
airflow db init; airflow scheduler
Stop the scheduler using CTRL+C, then open the airflow.cfg file:
nano ~/airflow/airflow.cfg
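Find the following lines and modify them (in Airflow 2.3 and later, executor is in the [core] section and sql_alchemy_conn is in the [database] section):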
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:YourStrongPassword@localhost/airflowdb
Replace YourStrongPassword with the password you set earlier for the PostgreSQL airflow user. Save and close the file.
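Airflow also reads settings from environment variables named AIRFLOW__SECTION__KEY, and these take precedence over airflow.cfg. The sketch below sets the same two options that way. It assumes Airflow 2.3 or later, where sql_alchemy_conn belongs to the database section, and it reuses the placeholder password from the line above.

```shell
# Same settings as the airflow.cfg edit above, expressed as environment variables.
# Assumption: YourStrongPassword is a placeholder for the real database password.
export AIRFLOW__CORE__EXECUTOR="LocalExecutor"
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow:YourStrongPassword@localhost/airflowdb"

# Print the effective values if the airflow CLI is available in this shell.
if command -v airflow >/dev/null 2>&1; then
    airflow config get-value core executor
    airflow config get-value database sql_alchemy_conn
fi
```

Environment variables only apply to the shell that exports them, so the airflow.cfg edit remains the more durable option for background processes.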
Step 10: Initialize Airflow’s Metadata Database
Apply the configuration changes by initializing the Airflow metadata database:
airflow db init
Step 11: Create an Admin User for Airflow
Create an administrative user for accessing the Apache Airflow UI. Replace the username skynats, the password, and the placeholder email address with your own values:
airflow users create \
--username skynats \
--password yourSuperSecretPassword \
--firstname Skynats \
--lastname User \
--role Admin \
--email admin@example.com
Step 12: Start the Airflow Web Server and Scheduler
Start the Airflow web server on port 8080 in the background and redirect logs to a file. Start the Airflow scheduler in the background and redirect logs to another file:
nohup airflow webserver -p 8080 > webserver.log 2>&1 &
nohup airflow scheduler > scheduler.log 2>&1 &
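Because both processes run in the background, a quick check confirms that they actually started. This is a minimal sketch, assuming the webserver listens on port 8080 as above and that curl is installed; airflow_health is a hypothetical helper name.

```shell
# Health endpoint served by the Airflow 2.x webserver.
HEALTH_URL="http://localhost:8080/health"

airflow_health() {
    # List the background webserver and scheduler processes, if any.
    pgrep -af "airflow (webserver|scheduler)" || echo "no airflow processes found"
    # Query the health endpoint; -f makes curl exit non-zero on HTTP errors.
    curl -fsS "${HEALTH_URL}"
}
```

Run airflow_health a few seconds after starting the services. The endpoint returns a JSON document reporting the status of the metadata database and the scheduler. If it fails, webserver.log and scheduler.log are the first places to look.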
Step 13: Configure Nginx as a Reverse Proxy
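The commands for this step are not included here, so the following is only a sketch of a typical setup, not the original configuration. It assumes Nginx is installed (sudo apt install nginx), that Airflow listens on 127.0.0.1:8080, and that the site should answer for skynats.example.com. The file name airflow.nginx.conf is hypothetical.

```shell
# Write a minimal reverse-proxy server block to a local file for review.
# The quoted 'EOF' keeps Nginx variables such as $host from being expanded by the shell.
cat > airflow.nginx.conf <<'EOF'
server {
    listen 80;
    server_name skynats.example.com;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
EOF

# To enable the site after reviewing the file:
#   sudo cp airflow.nginx.conf /etc/nginx/sites-available/airflow
#   sudo ln -s /etc/nginx/sites-available/airflow /etc/nginx/sites-enabled/airflow
#   sudo nginx -t && sudo systemctl reload nginx
```

Running nginx -t before the reload catches syntax errors without interrupting a running server.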
Step 14: Access the Airflow UI
Open your browser and navigate to your domain (skynats.example.com). You should be greeted by the Apache Airflow login page. Use the credentials you created earlier to log in.
Step 15: Secure Apache Airflow with SSL Certificates
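The commands for this step are not included here. One common approach, shown as a sketch and not as the original method, is a Let's Encrypt certificate issued through Certbot's Nginx plugin. It assumes Nginx already serves skynats.example.com on port 80 and that the domain's A record resolves to this instance; request_certificate is a hypothetical helper name.

```shell
# The domain from the prerequisites; replace it with your own.
DOMAIN="skynats.example.com"

# Install Certbot with its Nginx plugin, then request and install a certificate.
request_certificate() {
    sudo apt install -y certbot python3-certbot-nginx
    sudo certbot --nginx -d "${DOMAIN}"
}
```

Call request_certificate once DNS is in place. Certbot prompts for a contact email and edits the matching Nginx server block to serve HTTPS.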
Step 16: Create and Run DAGs
Navigate to the Airflow web interface, locate your DAG, enable it, and trigger it manually. You can monitor the DAG’s execution through the Graph View and Event Log.
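If no DAG exists yet, the sketch below writes a small example into the default DAGs folder so there is something to enable and trigger. It assumes Airflow 2.4 or later (for the schedule parameter) and the default AIRFLOW_HOME of ~/airflow. The DAG name hello_airflow and its single task are hypothetical examples.

```shell
# Default DAGs folder is $AIRFLOW_HOME/dags, with AIRFLOW_HOME defaulting to ~/airflow.
DAGS_DIR="${AIRFLOW_HOME:-$HOME/airflow}/dags"
mkdir -p "${DAGS_DIR}"

# Write a one-task DAG; the quoted 'EOF' prevents the shell from expanding anything inside.
cat > "${DAGS_DIR}/hello_airflow.py" <<'EOF'
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# schedule=None means the DAG only runs when triggered manually.
with DAG(
    dag_id="hello_airflow",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    BashOperator(task_id="say_hello", bash_command="echo 'Hello from Airflow'")
EOF
```

Once the scheduler has parsed the file, hello_airflow appears in the DAG list in the web interface; airflow dags list shows the same information on the command line.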