Senior Data Engineer
PebblePost is the industry leader in next-generation addressable marketing, enabling brands to engage decision-ready consumers across the online and offline moments that matter via Programmatic Direct Mail.
Fueled by billions of shared 1st-party identity, intent, and transaction signals, PebblePost’s platform enables brands to quickly and easily engage addressable audiences with active purchase intent and measure performance across all points of sale with address-level accuracy. With these powerful audiences and analytics on their side, brands can build a sustainable marketing engine, creating impactful ways to engage consumers and fostering profitable growth with full-funnel solutions tuned to their data and goals.
About the role and the team
Our data engineering team architects data ingestion/pre-processing, data quality/governance, and impactful machine learning pipelines. You will work on scalable ETL workflows as well as on preprocessing data for machine learning models.
We run a cloud-based stack with assets in both AWS and GCP. We use Kubernetes, Jenkins, and Terraform for our CI/CD infrastructure. We leverage React on top of a Spring Boot stack with PostgreSQL as our primary relational database. Our data engineering pipelines are based in S3 with a Databricks/Delta Lake store, with Scala/Spark as the tool of choice for ETL pipelines running on Airflow/Jenkins and Kubernetes. We also have a proprietary data asset, the PebblePost Identity graph, which uses Neo4j and interfaces with Spark Streaming. In addition, we have real-time streaming pipelines built with Kafka (MSK) for processing behavioral intent signals from our brand customers.
Our reporting infrastructure leverages our Delta Lake via a combination of tools, including AWS Athena, Metabase, and Sisense. We are continuously evolving our architecture to match our growing scale and recently incorporated graph databases and streaming pipelines to advance our Identity solution.
The Senior Data Engineer reports to the Manager of Data Engineering.
What you'll do
- Become a core maintainer of our data lake by building and publishing a searchable set of datasets for use across the broader business for intelligence, research, and auditing/lineage use cases
- Assist with key areas of back-end development, including data pipelines/ETL, testing, and deployment tooling, in order to build an efficient, high-quality data engineering CI/CD pipeline
- Mentor junior engineers through a variety of techniques, including peer reviews, pair programming, and tech talks
- Partner with product managers, data scientists, business stakeholders, and fellow engineers to build high-quality data assets while managing cross-functional dependencies in an agile development framework
- Ideate, design, and implement key initiatives and end-user experiences used by hundreds of brands reaching tens of millions of customers
- Work on scaling our PebblePost Identity graph, enabling accurate addressability that connects brands to consumers at home and delivers impactful mail in a privacy-safe, relevant manner
- Build out reporting and analytics frameworks that allow large volumes of data to be aggregated and analyzed, delivering insights that help our brands track the performance of their campaigns
- Work with our data science teams to help scale machine learning models with an MLOps mindset. There will be opportunities to productionize projects using both traditional ML and LLM/generative AI approaches
What we're looking for
- 8+ years of professional experience working with a core language such as Java or Scala
- Experience building software assets in big data ecosystems such as Spark or TensorFlow
- Proficiency in all aspects of the SDLC, including CI/CD and automated testing
- Experience leading projects building ETL and ML pipelines, including orchestrating complex pipelines with DAG-based schedulers such as Airflow, Kubeflow, MLeap, or SageMaker
- A track record of writing maintainable code and automated tests, and performing code reviews
- Experience with the AWS tech stack, including Kafka, Lambda, Glue, Athena, and IAM, and with large-scale data management formats and frameworks such as Parquet, ORC, Delta Lake, Iceberg, or Hudi
- Experience breaking down product requirements into measurable engineering milestones and owning them to completion within outcome-driven, six-week roadmap planning increments
- Experience with data governance at scale, with expertise ranging from relational to NoSQL/graph databases, and working closely with analysts to deliver scalable insights to Business Intelligence teams
- Bachelor’s degree in Computer Science or related discipline
Benefits
- Remote-friendly team
- Unlimited PTO policy
- Comprehensive medical, dental and vision plans
- Cell phone reimbursement program
- Flexible spending (FSA), health savings (HSA), and pre-tax commuter accounts
- Employee-based 401(k) program
- Additional voluntary benefit programs available such as life, critical illness, disability, employee assistance and additional buy-up options
The salary range is a reasonable estimate based on aggregate data for all US locations. Any offered salary is determined by a wide range of factors, including but not limited to: candidate location, cost of labor, market data/ranges, internal equity, internal salary ranges, the applicant's skills, prior relevant experience, and certain degrees and certifications (e.g., JD/technology).
Salary range: Low: $175,000 - High: $200,000
PebblePost is an equal opportunity employer. All employment decisions are made without regard to race, color, age, gender, gender identity or expression, sexual orientation, marital status, pregnancy, religion, citizenship, national origin/ancestry, physical/mental disabilities, military status or any other basis prohibited by law.