Hi,
I have worked with very large clusters in the Hadoop ecosystem (Spark, Storm, Kafka, Flume), and with a range of distributed databases, both relational (MySQL, Postgres) and NoSQL (MongoDB, HBase, Cassandra, Redis).
Here is more about my previous experience:
Data Management Platform (DMP) / AdServer
I developed the DMP infrastructure for a horizontally scalable advertising platform, covering both the DSP side and the DMP. It is built in Scala and uses HBase as the primary database and MongoDB for storing some metrics. Kafka streams real-time data from different trackers into the core of the system.
It also uses Spark for all aggregations, which run on a cluster hosted on AWS.
I also aggregated various metrics in the databases for dashboard reporting.
Big Data Text Analytics App
We developed a proprietary text analytics tool for a UAE-based university. Its features include concordance, clustering, feature extraction, LDA, topic classification, and many others. The tool is used internally on other analytics projects that mine textual data, and it can handle 5 TB+ of data.
It is built with Spark (in Scala), HBase, and Akka for message passing, and uses libraries such as MLlib, Stanford CoreNLP, and several others. Several of the algorithms were written specifically to take advantage of the distributed system.
I would be very interested in working on this project.
Thanks,
Neera