TITLE: Big Data Consultant
LOCATION: REDMOND, WA
PROJECT DURATION: FULL TIME
WORK AUTHORIZATION: GC/USC/
* Build the Event Hubs (EH) integration with the Service Fabric microservices implementation, streaming the processed files from blobs into EH for downstream processing.
* Anonymized files (~1000 of them, ~GB in size) will be provided as input.
* Service Fabric code portion will be provided.
* Build the Spark processing that reads off Event Hubs; an implementation in either Python or Scala will suffice.
* Assess caching needs; leverage .cache so that results materialized by Spark actions are retained in executor memory for reuse.
* Our team will evaluate a set of data stores to serve as the landing spot after Spark processing, with Blobs being required. We will pick 1 or 2 more from the candidate list -- SQL DW, Azure SQL DB, Cassandra, and DocumentDB -- and will provide code snippets and/or guidance.
INTEGRATION & DEPLOYMENT
* Integrate the items above with the completed pieces (Azure Data Factory with ARM provisioning), picking up from the ADF pipeline that lands files onto blobs.
* Apply best practices for capacity planning and end-to-end (E2E) deployment.
* Integrate the deployment with the existing set of tools and processes.
* Build a unit test framework that can test each building block in isolation (ADF -> Blobs, Blobs -> Service Fabric, Service Fabric -> EH, EH -> Spark, Spark -> data store).
* Build an E2E test environment with telemetry on latency and throughput, with percentiles.
* Leverage APM tools as appropriate.
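One way to test a building block in isolation is to inject a fake client at the boundary and assert on what would have been sent, e.g. for the Service Fabric -> EH hop. A sketch using the standard unittest module; the function and class names (`publish_processed_file`, `FakeEventHubClient`) are hypothetical, not from the provided Service Fabric code:

```python
import json
import unittest


def publish_processed_file(client, file_name, records):
    """Hypothetical publish step: wrap each record with its source
    file name and hand it to an Event Hubs client."""
    for record in records:
        client.send(json.dumps({"file": file_name, "record": record}))


class FakeEventHubClient:
    """Stands in for the real Event Hubs producer in unit tests."""
    def __init__(self):
        self.sent = []

    def send(self, body):
        self.sent.append(body)


class PublishTests(unittest.TestCase):
    def test_every_record_is_sent_with_its_file_name(self):
        fake = FakeEventHubClient()
        publish_processed_file(fake, "file-001.csv", ["r1", "r2"])
        self.assertEqual(len(fake.sent), 2)
        self.assertEqual(json.loads(fake.sent[0])["file"], "file-001.csv")


if __name__ == "__main__":
    unittest.main()
```

The same pattern applies to the other hops: each real client (blob storage, Spark sink, data store) gets a fake at the seam so the block under test runs with no cloud dependencies.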
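For the latency telemetry with percentiles, a minimal pure-Python sketch using the nearest-rank method over collected samples (the function name and chosen percentiles are illustrative; a production pipeline would likely pull these from the APM tool instead):

```python
import math


def latency_percentiles(latencies_ms, percentiles=(50, 95, 99)):
    """Nearest-rank percentiles over a list of latency samples (ms)."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    result = {}
    for p in percentiles:
        # Nearest-rank: smallest sample such that at least p% of
        # samples are <= it.
        rank = math.ceil(p / 100 * len(ordered))
        result[f"p{p}"] = ordered[rank - 1]
    return result
```

For example, over samples 1..100 ms this reports p50=50, p95=95, p99=99; the same per-hop measurements also yield throughput when divided into the observation window.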