Build ETL pipelines with Embulk

ETL stands for Extract, Transform, and Load.

Extract is the process of pulling data from one or more sources. Transform is the process of changing that data into a different shape or state. Load is the process of ingesting the transformed data into a destination data store.

The extract step can read from many kinds of sources, such as files, RDBMS databases, and APIs. The load step can write to the same kinds of systems.

To implement ETL by hand, we could write a script or application that extracts data from the source, transforms it, and ingests it into the destination. That means writing a program and testing it thoroughly.

Embulk can be used to build ETL pipelines.

Embulk is an open-source tool for transferring data between various data sources. It supports:

  • Automatic guessing of input file formats
    • Embulk inspects the input files (for example, CSV) and guesses their format automatically.
  • Parallel & distributed execution to deal with big data sets
    • Multiple threads across multiple nodes can be used to speed up execution.
  • Transaction control to guarantee All-or-Nothing
    • If Embulk crashes or is stopped while loading data from one source into another, the partial changes are not visible in the destination.
  • Resuming
    • If a run crashes or is stopped partway through, it can be resumed.
  • Plugins
    • Multiple input and output plugins can be installed, and new plugins can be developed and used with Embulk.
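
To make the format-guessing feature concrete, here is a minimal sketch of a seed configuration. The path /tmp/data and the file name seed.yml are assumptions chosen for illustration; the seed deliberately leaves out the parser details so that Embulk can fill them in.

```yaml
# seed.yml: a minimal seed configuration (path is a hypothetical example)
in:
  type: file
  path_prefix: /tmp/data   # matches files whose path starts with this prefix
out:
  type: stdout             # print records to the console while experimenting
```

Running embulk guess seed.yml -o config.yml should write a fuller config.yml with a guessed parser section (format, delimiter, columns), which can be reviewed and edited before it is run.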


Embulk can read data from any of the sources shown in the image, such as CSV files, AWS S3, HDFS, MySQL, and Salesforce, and can load it into any of them as well.

Embulk is extended through plugins. Plugins that extract data are known as input plugins, and plugins that ingest data are known as output plugins. Data transformation is done with filter plugins.
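
The three plugin roles map onto three sections of an Embulk configuration: in, filters, and out. The sketch below is illustrative only: the path and the column names (ts, temp, temperature) are assumptions, and it uses the built-in rename filter and stdout output so that no extra plugins need to be installed.

```yaml
in:                        # input plugin: where the data is extracted from
  type: file
  path_prefix: /tmp/data   # hypothetical path
  parser:
    type: csv
    skip_header_lines: 1
    columns:               # hypothetical columns
      - {name: ts, type: timestamp, format: "%Y-%m-%d %H:%M:%S"}
      - {name: temp, type: double}
filters:                   # filter plugins: applied in order to transform records
  - type: rename
    columns:
      temp: temperature    # rename column "temp" to "temperature"
out:                       # output plugin: where the data is loaded
  type: stdout
```

Swapping the out section for a database output plugin turns the same pipeline into a real load without changing the in or filters sections.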


This is a simple Embulk configuration file that loads data from a CSV file into PostgreSQL:

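     in:
       type: file
       path_prefix: "/tmp/data.csv"
       # parser settings (type: csv, columns, ...) can be generated with "embulk guess"
     out:
       type: postgresql
       host: localhost
       user: root
       password: ""
       database: sensors   # assumed database name; not given in the original
       table: temperature
       mode: insert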

Save the configuration as load-sensor-data.yml; running embulk run load-sensor-data.yml then performs the ETL.


