Data Automation System

Published:

A big data processing production system deployed in the cloud

DAS (Data Automation System)

Data Automation System is a resilient, autoscaling, near real-time big data processing automation system based on Hadoop, Amazon Elastic Beanstalk, CloudBreak, Cascading and Java. DAS processes event and click data for thousands of campaigns at a scale of petabytes per day. We used Hadoop MapReduce, Avro and Cascading modules to process impression, click and event data.

DAS Architecture

  • DAS architecture is based on a server-as-a-function model
  • DAS has four modules, and each executes as a separate server
    • Data Pump: copies log data into DAS
    • Data Filter: structures the data (Avro)
    • Data Summary: processes the data
    • Data Reporting: prepares customized, business-relevant reports
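
The four modules above can be read as stages of one pipeline. The following is a minimal, single-process sketch of that flow in plain Java, not the production code: the class name DasPipelineSketch, the Event type, the "campaignId,type" log format and all method names are assumptions made for illustration. In DAS itself each stage runs as a separate server, and the structured records are Avro rather than plain objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DasPipelineSketch {

    /** Hypothetical structured event; the real system stores Avro records. */
    static final class Event {
        final String campaignId;
        final String type; // e.g. "impression", "click" or "event"

        Event(String campaignId, String type) {
            this.campaignId = campaignId;
            this.type = type;
        }
    }

    // Data Pump: copy raw log lines into the pipeline, skipping blank lines.
    static List<String> pump(List<String> rawLogs) {
        List<String> copied = new ArrayList<>();
        for (String line : rawLogs) {
            if (line != null && !line.trim().isEmpty()) {
                copied.add(line.trim());
            }
        }
        return copied;
    }

    // Data Filter: parse "campaignId,type" lines (assumed format) into
    // structured events and drop lines that do not match.
    static List<Event> filter(List<String> lines) {
        List<Event> events = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length == 2) {
                events.add(new Event(parts[0].trim(), parts[1].trim()));
            }
        }
        return events;
    }

    // Data Summary: count events per campaign and event type.
    static Map<String, Long> summarize(List<Event> events) {
        Map<String, Long> counts = new TreeMap<>(); // sorted keys for stable reports
        for (Event e : events) {
            counts.merge(e.campaignId + "/" + e.type, 1L, Long::sum);
        }
        return counts;
    }

    // Data Reporting: render one report line per summary entry.
    static List<String> report(Map<String, Long> summary) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> entry : summary.entrySet()) {
            lines.add(entry.getKey() + " = " + entry.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> rawLogs = Arrays.asList(
                "c1,impression", "c1,click", "", "bad-line",
                "c2,impression", "c1,impression");
        for (String line : report(summarize(filter(pump(rawLogs))))) {
            System.out.println(line);
        }
    }
}
```

Keeping each stage as a function from one collection to the next mirrors the module boundaries: any stage could be moved behind its own server without changing the others, as long as the data passed between them keeps the same shape.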

Technologies

  • Avro
  • MapReduce
  • Spark
  • Cascading
  • Java
  • Maven
  • NetBeans
  • Spring Boot