Project Management for Successful Data Science
Avoiding pitfalls and following best practices for data science project management.
The number one goal of any data science project is to solve a problem or answer a question. However, without proper guidance, discipline, and control, these projects can drift off course until they eventually fizzle out and are deemed a failure. Applying project management methodologies to data science projects helps ensure they are successful and useful.
This article discusses common pitfalls in managing data science projects, some best practices to follow, and Agilytic's top 3 project management tips.
Data Science Project Management Frameworks
There is a plethora of project management frameworks to choose from. Data science teams have adapted some from other industries (such as Kanban) and adopted others that have grown enormously popular over the years (such as Scrum).
However, not all project management frameworks are well suited to data science, because data science work is iterative and constantly evolving, with a unique approach for each new project, industry, and problem.
This makes it very difficult to stick to a rigid system. As a result, data science teams often adopt hybrid approaches, depending on their preferences and the nature of a given project. Even these hybrid approaches are applied loosely, as no cookie-cutter template fits every project.
Even so, it helps to have a high-level lifecycle or workflow. While each project may be unique in its solution and in the approach it takes to solve the problem, a set of project checklists is essential so that critical steps in the process are not overlooked or forgotten.
One of the most useful project management frameworks for data science teams is TDSP (Team Data Science Process), which forms the basis for project management at Agilytic.
TDSP combines the Scrum and CRISP-DM methodologies into a framework that reaps the benefits of each while discarding some of their drawbacks. It is not perfect, but it provides an excellent project management guide.
Common Pitfalls when Managing Data Science Projects
Client Expectations
There is often a gap between what a client expects and what data scientists can realistically deliver. As the old saying goes: "you don't know what you don't know". Clients (or stakeholders) come to you with a problem and may have some idea of what they would like the solution to look like. However, as the data scientist, you are the expert: you know what data science can achieve given the information and data available for the project. This is where project management becomes vital to the project's success.
Before starting any work, everyone involved in the project must be clear on exactly what will be done and what the final deliverable will be. Define this jointly with the client so that they know exactly what to expect and the team knows exactly how it will deliver the final solution. As the project progresses, it is even more important to check in with the client regularly so that the project stays on course.
Managing Uncertainty
One of the most notorious problems in data science projects is that nothing is guaranteed. For example, it is impossible to guarantee 100% accuracy in machine learning models, and much of the work that data scientists can do depends entirely on the quality and quantity of the data.
If there is much less usable data than expected, or if more cleaning and preparation is needed before work can even begin, the final solution, timeline, and cost of the project can all be severely affected.
This is just one example of uncertainty in a data science project. A solution has many components, and each can introduce further uncertainty. Effective project management lets you identify these risks early, define mitigating actions, and communicate the risks transparently to the team and the client.
Scope Creep
After spending weeks or months developing a solution for a project, you may end up in a situation where the client wants something you did not agree on at the start of the project. This could be because new insights came to light as the data scientists explored the data, or the client changed their mind about the direction they wanted to take.
In these situations, it is important to have some flexibility. As we have already discussed, nothing is guaranteed in data science, and these projects are constantly evolving. The key, however, is knowing where to draw the line. For example, if the changes significantly affect the budget or the team's ability to deliver on the agreed timeline, the project manager must be upfront and transparent about it.
Communication
Communication is often overlooked or deprioritized in data science projects. Data scientists usually want to get stuck into the very thing they're passionate about, so it is no surprise when communication isn't at the top of their to-do list.
However, successful data science projects are highly dependent on good communication. It is almost impossible to develop a solution if you have no idea what the client wants or if the client doesn't know how your solution contributes anything useful or valuable to them.
Most of these problems can be solved with a few good communication practices, which form the core of any project management approach. It's also a good idea to have a dedicated project manager who follows and enforces these practices, so that the data science team can stay focused on its primary function.
Data Science Project Management Best Practices
Create a Checklist
While the details of every data science project differ, most share some common elements. For example, every project should start with a planning phase in which proposals and scope documents are drafted and sent to the client. After that, data scientists kick off their work by extracting and exploring the data: developing a good understanding of it, looking for abnormalities they need to clean and address, and so on.
It's helpful to create a checklist of the items common to most of the projects your team handles. This checklist serves as your guide so that you never miss a step along the way. You may not always follow it strictly, but it offers guidance and acts as a foundation, so you do not start each project entirely from scratch.
Decide on Deliverables First
Every project should start with a planning phase. This is when the data science team and the client understand the problem or question being addressed and the solution that will be developed.
The most critical element to pin down, however, is the project's exact deliverable. What does the client get at the end of the project, and how does this deliverable help the client move forward? For example, is it an API that gets integrated into their product, improving efficiency or sales? Or does it support effective decision-making so the client can take the next step toward another project?
Getting ultra-specific on this deliverable and the expected result right at the start of the project will help the team and the client to understand precisely where the goalposts are and what it will take to get there.
Set a Communication Cadence
As we have already discussed, communication is the backbone of every successful data science project, so it is no surprise that almost every project management framework has its own rules or processes for it. What these frameworks share is a regular, consistent communication process.
Set a communication cadence that both your team and the client can stick to. At the very least, you should meet with the client once a week. A written progress update sent to all stakeholders complements these meetings.
Beyond regular client updates, it is also beneficial to hold recurring internal team meetings. In Scrum, these are called 'stand-ups' and traditionally follow the format: what you did yesterday, what you're going to do today, and what risks or issues you have encountered.
These stand-ups allow everyone to stay up-to-date with one another's progress. This is also an excellent time to bounce fresh ideas off each other and discover something significant or insightful to move the project forward. It should also be a safe space for team members to raise issues that could significantly impact the project.
Brainstorming doesn't have to be limited to the stand-up. At any time of the day, it can help to take a step back and bounce some ideas off the team. Never underestimate the value of shared knowledge.
Be Transparent
Whenever you experience an issue or identify a potential risk during the project, the best practice is to raise it with the client as soon as possible.
Hiding potential risks or issues from the client negatively affects the team's ability to deliver the final solution and leaves a bad taste in everyone's mouth. Therefore, transparency should be a core ethos in project management.
This practice also applies to any changes a client wants to make that deviate from the project's initial scope. If a client requests any changes, it is good practice to communicate clearly what you will (and will not) achieve due to the change.
Agilytic's Top 3 Project Management Tips
Projects always take more time and resources than you think. Always assume uncertainties and difficulties will arise and plan accordingly.
Allow buffers in your calendar. Leaving small pockets of time between meetings gives you room to decompress or to take any quick follow-up action immediately after a meeting.
When in doubt, remember why you are here. Regularly step back from what you are doing to brainstorm with your team or to realign yourself with the goals and direction of the project. This is especially beneficial for highly complex or intensive tasks. It's easy to get stuck in a rabbit hole only to emerge and discover that you were going in the wrong direction all along.
Written by Joleen Bothma