We are a team of Data Scientists specialising in machine learning and big data, based in Bangalore.
I'm a Cloudera-certified Hadoop engineer with hands-on experience in Hive, HBase, HDFS, MapReduce, Sqoop, Spark (Scala/Python), MySQL, MongoDB, JSON and NoSQL databases.
I understand the intricacies of the Big Data ecosystem and which tool to use when. I have experience handling terabytes of data: processing it and pushing the MapReduce output into relational databases for further analysis and visualization.
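The MapReduce pattern mentioned above can be sketched in miniature with plain Python. This is a toy word count, not a real job: in production the same mapper/reducer logic would run distributed over HDFS or S3 via Hadoop Streaming, Hive, or Spark.

```python
from collections import defaultdict

def map_phase(records):
    # Emit (word, 1) pairs, like a Hadoop mapper.
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Sum counts per key, like a Hadoop reducer after the shuffle.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

records = ["big data big results", "data pipelines"]
print(reduce_phase(map_phase(records)))
# → {'big': 2, 'data': 2, 'results': 1, 'pipelines': 1}
```

The split into a stateless map phase and a keyed reduce phase is exactly what lets the framework scale the same logic across a cluster.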
Skills & Capabilities:
1. Set up and configure a Hadoop cluster from scratch, sized to the capacity and processing needs of the project
2. Launch Amazon EMR clusters (on regular or spot EC2 instances) with bootstrap actions that pre-install necessary components (Sqoop, AWS CLI tools etc.), and use them to process large volumes of data (from S3, Kinesis/Kafka streams, the Twitter API etc.)
3. Build Spark applications using Scala or Python
4. MapReduce (batch-process) terabytes of data using Hive/Pig, store the processed records in Hive tables, and Sqoop the aggregated results to a relational database (such as MySQL or Postgres) or an S3 bucket
5. Linux skills (cron, shell scripting, SSH, package installation and configuration)
6. Set up and use an Elasticsearch cluster (non-HDFS) with appropriate sharding and replication. In Jan 2016 I set one up for indexing and querying 1000TB of spoken/written language text data
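A minimal sketch of what the EMR launch with bootstrap actions in item 2 might look like through boto3's EMR client. Every name, path, release label, and instance size here is hypothetical, not a real configuration; spot capacity is requested per instance group via the SPOT market setting.

```python
# Hypothetical parameters for boto3's run_job_flow call. All names,
# S3 paths, and sizes below are illustrative placeholders.
job_flow = {
    "Name": "batch-processing-cluster",      # hypothetical cluster name
    "ReleaseLabel": "emr-5.0.0",             # illustrative EMR release
    "Instances": {
        "InstanceGroups": [
            {"Name": "master", "InstanceRole": "MASTER",
             "InstanceType": "m4.large", "InstanceCount": 1},
            {"Name": "core", "InstanceRole": "CORE",
             "InstanceType": "m4.xlarge", "InstanceCount": 4,
             # Request spot instances for the core group with a bid price.
             "Market": "SPOT", "BidPrice": "0.10"},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,  # terminate when steps finish
    },
    "BootstrapActions": [
        # Script that pre-installs components (Sqoop, AWS CLI tools etc.).
        {"Name": "install-tools",
         "ScriptBootstrapAction": {
             "Path": "s3://my-bucket/bootstrap/install_tools.sh"}},  # hypothetical
    ],
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "ServiceRole": "EMR_DefaultRole",
}

# With boto3 installed and AWS credentials configured, the cluster
# would be launched with:
#   import boto3
#   boto3.client("emr").run_job_flow(**job_flow)
```

Bootstrap actions run on every node before Hadoop starts, which is why they are the right place to install cluster-wide tooling.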
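As a rough sketch of the sharding and replication choices in item 6: an Elasticsearch index's primary shard and replica counts are fixed in its settings at creation time. The numbers and index name below are illustrative only, not the actual 2016 configuration.

```python
import json

# Illustrative Elasticsearch index settings (hypothetical values).
# More primary shards spread a large index across data nodes;
# replicas add redundancy and read throughput.
index_settings = {
    "settings": {
        "number_of_shards": 50,    # primaries: fixed at index creation
        "number_of_replicas": 1,   # one extra copy of each shard
    }
}

# JSON body that would accompany the index-creation request, e.g.
#   PUT /text_corpus   (index name is hypothetical)
print(json.dumps(index_settings, indent=2))
```

The primary-shard count cannot be changed after creation, so for a very large corpus it has to be sized up front; the replica count can be adjusted later.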