Engineer, Data Job Description
Engineer, Data Duties & Responsibilities
To write an effective engineer, data job description, begin by listing detailed duties, responsibilities and expectations. We have included engineer, data job description templates that you can modify and use.
Sample responsibilities for this position include:
Engineer, Data Qualifications
Qualifications for a job description may include education, certification, and experience.
Licensing or Certifications for Engineer, Data
List any licenses or certifications required by the position: AWS, CCNP, CCNA, ITIL, IAT Level II, GCP, Azure, PMP, SQL
Education for Engineer, Data
Typically a job would require a certain level of education.
Employers hiring for the engineer, data job most commonly would prefer their future employee to have a relevant degree, such as a Bachelor's or Master's Degree in Computer Science, Engineering, Technical, Statistics, Mathematics, Information Systems, Education, Business, Math, or Science
Skills for Engineer, Data
Desired skills for engineer, data include:
Desired experience for engineer, data includes:
Engineer, Data Examples
Engineer, Data Job Description
- Build ETL pipelines, including data cleansing, to acquire data from a variety of internal and external data sources (a minimal sketch follows this list)
- Interface with business customers to gather requirements
- Create self-service interfaces using business intelligence tools such as Looker to improve reporting and analysis processes and to increase automation of data requests
- Create the strategic direction for all data needs as this fast-growing, multi-billion-dollar business scales
- Implement big data solutions for distributed computing
- Provide and influence Big Data design and architecture
- Manage data loading into the Hadoop ecosystem
- Develop Big Data set processes for data modeling, mining, and production
- Perform Big Data modeling leveraging Hadoop database technologies
- Consult, collaborate, and recommend solutions for batch and streaming use case patterns
- Experience with Hadoop and Hive is a huge plus
- Strong technical understanding of data modeling, design, architecture principles
- Experience with the Hadoop ecosystem (HDFS, MapReduce, Pig, Hive)
- Passionate video gamer with an in-depth knowledge of Blizzard games, products, and services
- Must have worked in a fast-paced environment supporting batch and real-time data analytics
- At least 2 years of experience with Spark Streaming, Storm, Flink, or other Stream Processing technologies
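For illustration, here is a minimal sketch of the kind of ETL pipeline the first bullet describes, written in Python with an assumed CSV source and SQLite target; the file name, table, and column names are all hypothetical:

```python
# Minimal ETL sketch: extract a CSV, cleanse it, load it into SQLite.
# The source file, target table, and column names are hypothetical.
import csv
import sqlite3

def extract(path):
    # Extract: read raw rows from a CSV source
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: basic cleansing -- trim whitespace, drop rows missing an id
    cleaned = []
    for row in rows:
        row = {k: v.strip() for k, v in row.items()}
        if row.get("id"):
            cleaned.append(row)
    return cleaned

def load(rows, db_path="warehouse.db"):
    # Load: write cleansed rows into a target table
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS events (id TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO events (id, value) VALUES (?, ?)",
        [(r["id"], r.get("value", "")) for r in rows],
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load(transform(extract("source.csv")))
```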
Engineer, Data Job Description
- Troubleshoot and correct problems discovered in Big Data databases
- Follow change management procedures and help to create policies and best practices for all Big Data environments
- Create and publish design documents, usage patterns, and cookbooks for the technical community
- Design and develop world-class RESTful APIs to exchange data between multiple platforms, partners, and customers (a minimal sketch follows this list)
- Analyze business requirements and develop long-term data warehousing strategies leveraging state-of-the-art big data technology
- Follow best practices for the compliant acquisition, storage, and analysis of data containing personally identifiable information (PII) and personal health information (PHI)
- Drive cross-team design and development via technical leadership and mentoring
- Understand complex multi-tier, multi-platform systems
- Manage execution plan for key projects
- Manage technical debt and task prioritization
- Build and maintain data warehousing systems, including ingestion, large-scale transformation, OLAP design, and dependency management
- Past success as a forward thinker and an advocate for the customer
- Highly motivated, self-driven, capable of defining own design and test scenarios
- Background in Big Data and Machine Learning is a plus
- 2-5 years of experience in the data engineering / business intelligence space
- Curious, self-motivated, and a self-starter with a 'can-do' attitude
- Experience with big data platforms such as Hive/Hadoop/Spark
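As referenced in the API bullet above, here is a minimal sketch of a RESTful data-exchange endpoint; Flask is an assumed framework choice, and the /records route and payload shape are hypothetical:

```python
# Minimal RESTful API sketch using Flask (an assumed framework choice).
# The /records endpoint and its payload shape are hypothetical.
from flask import Flask, jsonify, request

app = Flask(__name__)
RECORDS = []  # in-memory stand-in for a real data store

@app.route("/records", methods=["GET"])
def list_records():
    # Return all stored records as JSON
    return jsonify(RECORDS)

@app.route("/records", methods=["POST"])
def create_record():
    # Accept a JSON record from a partner platform and store it
    record = request.get_json(force=True)
    RECORDS.append(record)
    return jsonify(record), 201

if __name__ == "__main__":
    app.run(port=5000)
```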
Engineer, Data Job Description
- Work with program managers, business partners and other engineers to develop and prioritize project plans
- Work with PMs to ensure data for IDL is collected by creating Groovy scripts, and ensure the operation of the crawler software/hardware in the distributed cloud environment
- Work with Engineering/QA teams to ensure new crawling requirements can be supported by the existing Data Collection software or through feature enhancements
- Develop reports to track crawling status
- Develop sustainable, data-driven solutions with modern data technologies to meet the needs of our organization and business clients
- Grasp new technologies rapidly as needed to progress initiatives, and break down and resolve data issues
- Build robust and scalable systems with an eye toward long-term maintenance and support
- Create data models, including data warehouse schemas
- Interface with clients and technical peers to understand data and reporting needs
- Drive cross-functional design and development
- Experience with multi-lingual (international) NLP processing and tagging
- Experience with NLP applications such as tokenization, parsing, lemmatization, POS tagging techniques, Named Entity Recognition (NER) or Stanford NER (SNER)
- Experience in topic mining using keywords, n-grams, and cosine-based distance for relevance scores (a minimal sketch follows this list)
- Experience with developing NLP applications such as sentiment analysis, topic modeling, text summary production
- Experience with developing NLP tools such as machine learning, bag-of-words, part-of-speech tagging, and latent Dirichlet allocation (LDA)
- Leverage foundational data infrastructure to integrate machine learning and statistical models into real-time services and power the visualization layer
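As referenced in the topic-mining bullet above, here is a minimal sketch of cosine-based relevance scoring over bag-of-words and n-gram vectors; scikit-learn is an assumed dependency, and the documents and query are illustrative:

```python
# Minimal sketch of cosine-based relevance scoring over bag-of-words
# (unigram and bigram) vectors; scikit-learn is an assumed dependency.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "stream processing with spark and kafka",
    "batch etl pipelines in hadoop and hive",
]
query = "real-time stream processing"

# Vectorize documents and query with unigrams and bigrams
vectorizer = CountVectorizer(ngram_range=(1, 2))
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([query])

# Cosine similarity serves as a relevance score for each document
scores = cosine_similarity(query_vector, doc_vectors)[0]
for doc, score in zip(documents, scores):
    print(f"{score:.2f}  {doc}")
```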
Engineer, Data Job Description
- Contribute to open source projects
- Present coding/database topics to development peer groups, both internal and external to work
- Use knowledge of statistics, advanced math, data mining, machine learning, natural language processing, and information retrieval to build robust data solutions
- Implement and maintain ETL processes and batch and streaming data pipelines
- Maintain the data warehouse and manage its metadata
- Ensure data quality standards are adhered to during project delivery, and provide guidance to the delivery team accordingly
- Work closely with teams and individuals on HDFS, Talend, Kafka, Spark, Greenplum, and PostgreSQL scripting, applying experience with real-time data replication, data quality, and profiling to ensure the quality of extracted data
- Understand and work with reporting tools like OBIEE and Tableau to drive an internal suite of products for the Finance community
- Identify specific success metrics for our internal services client programs
- Design and implement metrics, data storage, and reporting mechanisms
- Recent experience in SQL tuning, indexing, partitioning, data access patterns and scaling strategies
- Programming/Scripting experience in Java and/or Scala
- Scripting with Shell, Python, Ruby or similar
- Bachelor's degree or higher in computer technology or a related field
- Master's Degree in Computer Science, Computer/Electrical Engineering, or a related field
- 2 to 3 years of experience in machine learning
Engineer, Data Job Description
- Develop SQL queries against a Greenplum (PostgreSQL) data warehouse (a minimal sketch follows this list)
- Execute and manage the process to make third-party data available internally for use on direct-sold media campaigns
- Execute and manage the process to make NinthDecimal data available programmatically on our various channel partner platforms
- Provide segment size estimates to the programmatic sales team
- Proactively work with the programmatic sales team to ensure successful creation of the best possible data audiences for the client
- Maintain a cadence of refreshing all audience profiles (both internal and external)
- Provide excellent support to our partners and customers
- Become a subject matter expert on how to move data to and from different platforms
- Assist with any new data related projects
- Build and support real-time, high-availability ETL and data feed processes, primarily relating to email marketing campaigns
- 2+ years of experience with any scripting language (Ruby, Python)
- Experience in web development, JavaScript frameworks, and programming languages like Java, Scala, C++, or Python is a plus
- Knowledge of SQL Server Management Studio, GitHub, and MongoDB
- Hands-on implementation of projects is required
- Experience running modern, high-performance analytical databases like Redshift, BigQuery, Greenplum, and others
- Cluster management - experience handling Hadoop and/or Spark clusters
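As referenced in the Greenplum bullet above, here is a minimal sketch of querying a Greenplum warehouse; because Greenplum is PostgreSQL-compatible, psycopg2 can issue the query, and the connection details and audience_segments table are hypothetical:

```python
# Minimal sketch of querying a Greenplum (PostgreSQL-compatible) warehouse
# with psycopg2; the DSN and the audience_segments table are hypothetical.
import psycopg2

conn = psycopg2.connect(
    host="greenplum.example.com",  # hypothetical host
    dbname="warehouse",
    user="analyst",
    password="secret",  # placeholder credential
)
with conn.cursor() as cur:
    # Estimate audience segment sizes, as in the responsibilities above
    cur.execute(
        """
        SELECT segment_name, COUNT(*) AS audience_size
        FROM audience_segments
        GROUP BY segment_name
        ORDER BY audience_size DESC
        """
    )
    for segment, size in cur.fetchall():
        print(segment, size)
conn.close()
```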