Job Description
Data engineers are mainly tasked with transforming data into a format that can be easily analyzed. They do this by developing, maintaining, and testing infrastructures for data generation. Data engineers work closely with data analysts and are largely in charge of technical solutions that enable more efficient and timely data analysis.
Data engineers are often responsible for building approaches, pipelines, and algorithms that make raw data easier to access, but to do this, they need to understand the company's or client's objectives.
Data engineers also need to understand how to optimize data retrieval and how to develop dashboards, reports and other visualizations for stakeholders.
Reporting Line: Team lead
Responsibilities
- Develop, construct, test, and maintain data architectures.
- Align architecture with business requirements.
- Data collection: collecting, measuring, and analyzing data.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and ‘big data’ technologies (see the pipeline sketch after this list).
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
- Conduct research to answer industry and business questions.
- Deploy sophisticated analytics programs, machine learning, and statistical methods.
- Prepare data for predictive and prescriptive modeling.
- Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
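For context, here is a minimal sketch of the extract-transform-load pattern this role centers on, using only the Python standard library. The file name, column names, and target table are hypothetical placeholders, not part of this posting.

```python
# Minimal ETL sketch: CSV source -> cleaned rows -> SQLite target.
# raw_orders.csv, the columns, and the orders table are hypothetical.
import csv
import sqlite3

def extract(path: str) -> list[dict]:
    """Extract: read raw rows from a CSV source."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[tuple]:
    """Transform: cast types and drop rows with missing amounts."""
    cleaned = []
    for row in rows:
        if not row.get("amount"):
            continue  # skip unusable records
        cleaned.append((row["order_id"], row["customer"], float(row["amount"])))
    return cleaned

def load(rows: list[tuple], db_path: str = "warehouse.db") -> None:
    """Load: write the cleaned rows into a target table."""
    con = sqlite3.connect(db_path)
    with con:  # commits on success, rolls back on error
        con.execute(
            "CREATE TABLE IF NOT EXISTS orders "
            "(order_id TEXT PRIMARY KEY, customer TEXT, amount REAL)"
        )
        con.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    con.close()

if __name__ == "__main__":
    load(transform(extract("raw_orders.csv")))
```

In practice the same extract/transform/load shape scales up to the Spark or Airflow pipelines mentioned in the requirements below.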
Job Requirements
1. Know SQL, the ER model, normalization, and ACID transactions (see the transaction sketch after this list).
2. Experience with and a deep understanding of relational SQL and NoSQL databases: document, wide-column, graph, and key-value stores.
- Relational SQL: MySQL, PostgreSQL, MSSQL, Oracle, ...
- Non-Relational:
  - Document: MongoDB, Elasticsearch, ...
  - Wide Column: HBase, Cassandra, Google Bigtable, ...
  - Key-Value: Redis, Memcached, Amazon DynamoDB, ...
3. Data Warehouse: experience with data modeling concepts (e.g., the snowflake schema) and tools such as Oracle GoldenGate, Google BigQuery, Dremio, Presto, ...
4. Know modern data processing frameworks such as Arrow, Spark, Flink, Kafka, ..., as well as traditional data processing and workflow scheduling tools: NiFi, Airflow, ODI, SSIS, ...
5. Know monitoring tools: Prometheus, Grafana, Splunk, ...
6. Nice to have:
- Infrastructure as Code: containers, infrastructure provisioning
- CI/CD
- Data visualization.
- Machine learning: terminology, TensorFlow, PyTorch, ...
- Machine learning Ops (MLOps): TensorFlow Extended, Kubeflow, Seldon, MLflow, Google AI Platform.
- Python and/or Java programming
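As a point of reference for requirement 1, here is a minimal sketch of an ACID transaction using Python's built-in sqlite3 module. The accounts schema and the transfer amounts are hypothetical; a production system would use a server database such as PostgreSQL instead.

```python
# ACID transaction sketch: a hypothetical money transfer between accounts.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL)")
con.executemany("INSERT INTO accounts VALUES (?, ?)",
                [("alice", 100.0), ("bob", 0.0)])
con.commit()

try:
    # Atomicity: both updates commit together or not at all.
    with con:
        con.execute("UPDATE accounts SET balance = balance - 40 WHERE id = 'alice'")
        con.execute("UPDATE accounts SET balance = balance + 40 WHERE id = 'bob'")
except sqlite3.Error:
    # On failure the whole transaction rolls back, keeping the data consistent.
    pass

print(con.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [('alice', 60.0), ('bob', 40.0)]
```

The `with con:` block is what provides the atomic commit-or-rollback behavior here; durability and isolation are handled by the database engine itself.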