From models to pipelines: the journey of a data engineer

John Hammerschlag
Data Engineer at Agilytic

As data becomes the most important pillar of modern businesses, many professionals find themselves rethinking their roles in the data world. John Hammerschlag from Agilytic is one of them. Drawn initially to data science for its thrill of discovery, John has since shifted his focus to data engineering—a path that, as he puts it, allows him to ‘build the road before the journey begins.’ 

Often, teams composed primarily of data scientists struggle to progress due to a lack of foundational data engineering, something essential for long-term data science success. In this candid interview, John shares how his curiosity evolved from analyzing data to structuring it, revealing the surprising challenges and rewards of a data engineering role. His journey reflects a growing trend in the field, where understanding data’s foundations is just as vital as interpreting its insights.

What initially drew you to data science, and how did your interests evolve toward data engineering?

I was initially fascinated by machine learning and loved working with models to derive insights from data. That curiosity is still there—I even work on some personal machine learning projects. But over time, I realized that insights are only as good as the data they’re built on. Many companies discover this the hard way—without strong data foundations, data science work is often stalled or unreliable.

I became interested in the whole data preparation side, the backbone work that ensures the data used is of high quality. This realization led me to data engineering, where I focus on creating a reliable data platform. I enjoy the idea of making data truly useful, almost like building the foundation before you start adding the layers of analysis. Plus, this shift aligns closely with what we see across various projects at Agilytic: data scientists often need to pivot towards data engineering to make meaningful progress.

How does the mindset or approach to work differ between data science and data engineering?

Data science is very project-based. You start with a dataset and work toward a specific objective, like predicting an outcome.

With data engineering, the approach is much broader. You’re not just looking at one project’s goal; you’re designing a pipeline that supports multiple use cases and departments. It’s about thinking ahead: anticipating future needs, understanding the entire data flow from source to consumption, and making sure the infrastructure can handle all of that smoothly. This difference in scope is where many organizations get stuck. They have data scientists focused on specific outcomes, but without the foundational engineering work, their efforts don’t scale.

So, you almost have to see the entire data ecosystem in one view.

In terms of daily tasks, what changed when you moved into data engineering?

My work shifted significantly. While working in data science, I spent a lot of time modeling and building algorithms. In data engineering, it’s more about building and maintaining pipelines. For instance, I now spend a lot of time in PySpark rather than the Python libraries typically used for modeling. My tasks involve setting up data sources, standardizing incoming data, aggregating it in ways that make sense for each department, and ensuring the pipeline runs smoothly. The end goal is to make sure everyone has clean, reliable data to work with.

What about the skills? Did the role shift require new skills?

Definitely. Even though I still use Python, now it’s more focused on PySpark for distributed data processing. SQL is also essential, especially when working with structured data across large datasets.

Data engineering also involves a lot of cloud work, setting up storage, computing resources, and managing connections between different data sources. It’s technical in a different way—less about model accuracy and more about optimizing data processing and storage.  

All these skills are critical for organizations that need to build scalable systems but lack this specific expertise.

Were there any misconceptions or surprises when transitioning from data science to data engineering?

The real surprise for many is that data engineering is a critical, long-term foundation, not a quick fix. A common misconception is that data engineering leads directly to model building; it really doesn’t. Data engineers lay the groundwork: we ensure the data is ready and accessible, but we’re not typically the ones building models with it. That part is usually up to data scientists.

Many of our clients have similar misconceptions. They often think data engineering will directly speed up their data science output, but it’s more about creating the infrastructure to make those projects sustainable and scalable. Clients quickly realize the importance of data engineering when they face delays and inefficiencies in model deployment without it.

How does collaboration work between your team's data scientists and data engineers?

Collaboration is the central pillar upon which we build all of our projects. We’re pretty integrated, so there’s always open communication if someone needs insight on a platform or a data engineering topic. And if I have a question about data modeling, there’s always someone to consult. We have dedicated channels for questions and discussion, so no one’s left struggling with a problem on their own. This back-and-forth helps us understand each other’s needs; data scientists can tell us what kind of data formats they need, for instance, so we don’t end up reworking things at the last minute.

For clients, this kind of close collaboration between data scientists and engineers can be transformative. Without proper collaboration, they often face repeated rework and inefficiencies. We’ve seen firsthand how bridging this gap can streamline data workflows and enhance project outcomes.

What unique value do data scientists bring to a data engineering team? 

They bring perspective on the final output—the end goal of what the data will be used for. This input is critical for us because it helps us structure the data in ways that make it easy to use for modeling. Without their insights, we’d risk delivering data that isn’t quite what they need, leading to inefficiencies and extra work. Their understanding of the data’s meaning and purpose adds a layer of business context that guides our setup.

Do you have a project you’re particularly proud of? 

I’d say my first data engineering project was one that left a strong mark on me. It involved creating a data platform in Azure and building a small web app in Python so the client could interact with and clean data directly. 

Eventually, we even implemented some forecasting models using Prophet and Darts. That project let me take on everything from data platform setup to model deployment. It was the perfect way to blend data engineering and data science skills.

What advice would you give someone looking to make the same transition? 

Always try to understand the broader impact of your work.

It’s easy to get caught up in the technical side, but knowing the end goals, the business needs and how clients will use the data, can make a huge difference. Many data scientists, and even whole companies, get lost in technical details and miss the big picture of how data engineering supports scalable insights. Often, the simplest solutions, those that ensure quality and scalability, are the most effective. And remember: building solid data foundations may seem slower at first, but it accelerates long-term data science success.
