Burning Glass Technologies

Sr. Data Engineer

Remote USA Only Published 2 weeks ago

The Company

Burning Glass Technologies is a leading labor market analytics provider that has cracked the genetic code of an ever-changing labor market. Powered by the world's largest and most sophisticated database of labor market data and talent, we deliver real-time data and breakthrough planning tools that inform careers, help define academic programs, and shape workforces. Burning Glass' applications drive practical solutions and are used across the job market: by employers and recruiters in managing their talent needs, by educators in aligning programs with the market, and by policy makers in shaping strategic workforce decisions.

Burning Glass' tools help close the skills gap and inform educational needs for the emerging workforce. The company's products allow organizations to better build and manage their workforces with the skills needed for transformative change. Based in Boston, Burning Glass is playing a growing role in informing the global conversation on both strategic workforce management and higher education, and in creating a labor market that works for everyone.

About the Position

As a Senior Data Engineer embedded within the data pipelines team, you will be responsible for developing the next-generation data pipelines that power Burning Glass' labor analytics platform. The Senior Data Engineer is charged with driving the continued evolution of our big data systems, scaling them to billions of records, and leveraging advanced techniques such as machine learning and computer vision.

Burning Glass' data platform is poised for enormous growth in usage and scope this year and has an ambitious roadmap in terms of new features as well as improved user experience. As a member of the multi-disciplinary data pipelines team, you'll have the opportunity to make key technical decisions to keep this platform moving forward.

Additionally, the data pipelines team functions as a gateway for Burning Glass' analysts, policy researchers, and data scientists to evaluate algorithms, data science models, and use cases. In this capacity, the Senior Data Engineer acts as an evangelist for the big data systems developed and managed by this team. In leading this team, the Senior Data Engineer helps define best practices around warehouse management, infrastructure management, and tooling to provide the best data experience for the team's stakeholders.

Primary Responsibilities

  • Provide technical leadership in data engineering, data lake and data warehouse design, QA, and testing
  • Design and implement scalable data pipelines in Apache Spark using PySpark and/or Scala
  • Optimize data pipelines for performance and scalability
  • Monitor and troubleshoot performance issues for production pipelines
  • Partner with the data science teams to implement machine learning models
  • Partner with the data collection teams to build interfaces between data ingestion and processing
  • Provide leadership to the team on best practices and architecture in big data systems and machine learning pipelines
  • Learn about new technologies and add to our big data tech stack
  • Mentor team members

Required Qualifications

  • 5+ years of experience in data engineering
  • Experience building systems using Apache Spark that have processed hundreds of terabytes of data in production
  • Experience with automated testing for distributed systems in Spark (unit testing, end-to-end testing, QA, CI/CD)
  • Experience designing large-scale, evolving data warehouses, including storage layout, data partitioning, and schema evolution
  • Experience designing and implementing end-to-end pipeline architectures
  • Experience debugging performance issues in Spark pipelines, analyzing the Spark DAG, and implementing recommended enhancements to pipelines
  • Experience leveraging AWS cloud to build data pipelines (AWS Kinesis, AWS S3, AWS EMR, Snowflake)
  • Proficiency in Scala, PySpark, and Python
  • Experience managing data warehouses in a production environment (Delta Lake/Hudi, Snowflake, Athena, Presto)
  • Experience with relational databases and SQL

Preferred Qualifications

  • Experience productionizing data science models and algorithms to run at scale over terabytes of data
  • Experience with distributed data streaming frameworks such as Spark Structured Streaming, Apache Flink, Apache Kafka, AWS Kinesis
  • Experience extending Apache Spark (DataSource API, Catalyst Optimizer)
  • Experience using workflow management systems such as Airflow or AWS Step Functions
  • Experience with NoSQL solutions such as MongoDB, DynamoDB, and Elasticsearch
  • Experience with BI tools such as Power BI, Tableau, and AWS QuickSight
  • Experience building systems with data governance
  • Experience as an open source contributor

Burning Glass Technologies is an Equal Opportunity Employer and prohibits discrimination and harassment of any kind. We're committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment. All employment decisions at Burning Glass are based on business needs, job requirements and individual qualifications, without regard to race, color, religion or belief, family or parental status, sexual orientation, gender identity or expression, or any other status protected by the laws or regulations in the locations where we operate. We encourage applicants of all ages.