Install Apache Airflow using Python PIP

admin July 5, 2022

Apache Airflow is an open-source workflow management platform for data engineering pipelines.

It is maintained by the Apache Software Foundation, written in Python, and runs on Windows, macOS, and Linux.

Steps to install Apache Airflow in Python using pip

Step 1: Create a virtualenv

In this step, you create a virtualenv that keeps Airflow and its dependencies isolated from your system Python.

pip install virtualenv
virtualenv my-project
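
Before installing anything into it, activate the virtualenv. The command below assumes a Linux/macOS shell; on Windows the activate script lives under my-project\Scripts instead.

source my-project/bin/activate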

Step 2: Install Airflow using pip install apache-airflow

Now install apache-airflow inside the activated virtualenv:

pip install apache-airflow
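
If a plain pip install pulls in incompatible dependency versions, the Airflow project also publishes constraint files that you can pass to pip. The command below is only a sketch; the Airflow version (2.3.3) and Python version (3.8) in the URL are placeholders to replace with the versions you actually use.

pip install "apache-airflow==2.3.3" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.3.3/constraints-3.8.txt"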


After the install, check for the airflow folder in your home directory (~/airflow). Inside it there is a dags folder, which is where you will place all your DAGs.
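
If the airflow folder is not there yet, it is created the first time you run an airflow command. For example, initializing the metadata database (an Airflow 2 command) creates ~/airflow together with airflow.cfg, and the dags folder can then be created by hand:

airflow db init
mkdir -p ~/airflow/dags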

Step 3: Setup Airflow config according to your need

Look in the airflow folder for the airflow.cfg file. Here you can customize Airflow's settings; for example, if you don't want to load the bundled example DAGs, change the load_examples option to False.
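
A minimal sketch of the relevant part of airflow.cfg is shown below; load_examples lives in the [core] section, and the dags_folder path here is only an illustration of the default location under your home directory.

[core]
dags_folder = /home/<your-user>/airflow/dags
load_examples = False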


Step 4: Create/Paste your dags in the airflow dags folder

Next, create or paste your DAG files into the dags folder inside the airflow directory.
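
For example, assuming the DAG from the next step is saved as new_dag.py (a filename chosen here just for illustration), copying it into place is a one-liner:

cp new_dag.py ~/airflow/dags/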


Step 5: Create your DAG in a Python file

In this step, you create the DAG file itself: define your Python functions and register them as tasks with operators.

Sample DAG file

 
from airflow.models import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

default_args = {
    'start_date': datetime(year=2022, month=8, day=22)
}

# Python callable that the task will run
def helloworld():
    print("Hello World!")

with DAG(
    dag_id='new_dag',
    default_args=default_args,
    description='ETL pipeline'
) as dag:

    # task_id may only contain alphanumerics, dashes, dots and underscores
    hello_world_task = PythonOperator(
        task_id='hello_world',
        python_callable=helloworld,
    )

    hello_world_task


You can refer to the official Airflow documentation for more information.
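
Real pipelines usually have more than one task. As a minimal sketch (the dag_id, task ids, and callables below are made up for illustration), dependencies between tasks are declared with the >> operator:

from airflow.models import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def extract():
    print("extracting data")

def transform():
    print("transforming data")

with DAG(
    dag_id='chained_dag',
    start_date=datetime(2022, 8, 22),
) as dag:

    extract_task = PythonOperator(task_id='extract', python_callable=extract)
    transform_task = PythonOperator(task_id='transform', python_callable=transform)

    # extract runs first, then transform
    extract_task >> transform_task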

Step 6: Run the Airflow webserver

airflow webserver

Now run the webserver so you can see and manage your DAGs in the web UI.
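
In Airflow 2 the web UI asks for a login, so if you have not created a user yet you will need one before you can sign in. The values below are placeholders; you will be prompted for a password.

airflow users create --username admin --firstname Admin --lastname User --role Admin --email admin@example.com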

Step 7: Run the Airflow scheduler

airflow scheduler

Now run the scheduler so that your DAGs are parsed and their tasks actually get executed.
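
The webserver and the scheduler are separate long-running processes, so keep both running, for example in two terminals. Recent Airflow 2 releases also ship a convenience command that initializes the database, creates a user, and starts both components at once, which is handy for local testing:

airflow standalone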

Step 8: Visit localhost:8080 in your browser and you are good to go

Now trigger your DAG with the play button on the right, or let the scheduler run it on its schedule.
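
You can also exercise a single task from the command line without opening the UI; airflow tasks test runs one task instance for a given date without recording it in the database. Using the IDs from the sample DAG above:

airflow tasks test new_dag hello_world 2022-08-22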


Check a task's logs by clicking on the task and opening its log block in the UI.

