Twitter Analytics

  • Performed ETL (extract, transform, load) on 1 TB of Twitter data using PySpark on AWS.
  • Processed 200 million tweets into a MySQL schema designed for efficient queries, using indexing and sharding.
  • Developed a scalable microservice using Vert.x to retrieve data from the backend efficiently.
  • Configured an Elastic Load Balancer and Auto Scaling rules in AWS to achieve a target throughput of 100,000 requests per second.
  • Tech Stack: AWS, PySpark, MySQL, Vert.x, Java