Sharing Insights on How to Build a Great Data Team
Regardless of the industry you’re in, building a great data team is a key part of your data foundation and the first step toward executing on your vision. Whether that vision includes world-class analytics, sticky personalization, or a data environment that scales to petabytes of data, you need a great team of data professionals to make it happen. Important considerations when building a data team include both the individual skillsets needed and the overall structure of the team. In this article, I’ll lay out a few of the primary skills and organizational structures you’ll want to think through in order to create a complete data team.
Skillsets within a data team
Before discussing roles or the overall organizational structure, it’s important to understand the different skillsets necessary to build a high-functioning data team. A single role could cover multiple skillsets, and some skillsets will be unnecessary for certain organizations or teams. Many organizations hire data scientists as a catch-all category for these skills, so I’ll refer to these skillsets generically as the data science role. Even if you’re using the data science job category, it’s important to drill down and evaluate the specific skills your organization needs. Here’s a quick primer on key skills to consider for your data teams:
- Data Engineer: Data engineering is the skillset of building data pipelines and transforming data. Some companies refer to these individuals as ETL (extract, transform, load) engineers. This skillset is critical to getting data where you need it, whether that’s a data lake or an operational data store.
- Data Analyst: Data analysts typically specialize in reporting and in providing important analytics and research to the company. They tend to be more generalist than the other roles.
- Machine Learning Engineer: Machine learning (ML) engineers specialize in building scalable machine learning models. This role can be further divided among individuals who specialize deeply in a specific type of ML, such as natural language processing, deep learning, or reinforcement learning.
- Data Visualization: These individuals specialize in the visualization of data. In some cases, this skillset overlaps with that of a user interface (UI) software engineer, but in many data organizations all roles are expected to support some level of data visualization.
Many organizations will also include additional support roles within their data teams, depending on the needs of the organization. These roles can include product managers, project managers, and engineering managers. Some organizations with specific use cases will also employ more traditional roles such as statisticians or econometricians, which overlap heavily with the newer data science roles. The likely difference between someone in a data science track and a statistician or econometrician is the depth of training within a specific discipline.
With these data science roles, some organizations will seek a full stack employee who can own the entire process: building data pipelines, performing analysis, and developing machine learning models. Others prefer to hire specialists who focus on a specific area. There are vocal advocates for each approach. I’d recommend evaluating your organization to determine which is right for you. Above all, remember that your organization is unique and that identifying the data science skills you need to achieve your vision will require thoughtful planning.
Organizational structures
Now that you’re evaluating the specific skillsets you need, the next step is to conceptualize how to structure your data teams within your organization. There are different approaches with different benefits, but the two main flavors of organizational structure are embedded and centralized. Centralized teams are structured so that all the data science roles report to a specific leader in the organization, typically one who leads data technology, such as a Chief Data Officer. Embedded organizations distribute the data science roles within their existing teams.
With the rise of Chief Data Officers and the specialization of data science skillsets, it has become common to leverage a more centralized structure. This can ensure quality across the organization and reduce duplication of technology and infrastructure. An alternative is to have all the data science roles report through a centralized leader but embed them throughout the organization. The goal of this centralized but distributed model is to ensure consistency and coordination while also allowing data scientists to work alongside the people closest to a problem.
A final option is to not create a data team at all. With the rise of AutoML and other cloud-based machine learning technologies, people have begun assessing whether data scientists are still necessary. The core skillsets of knowing a domain, asking the right questions, and knowing how to use technology to solve a problem will remain, but perhaps your organization wants to jump ahead and distribute data science skills within your larger engineering organization. The era of No-DS might not be here quite yet, but it is rapidly approaching. If you are considering a major investment in data science, it might be a good time to consider using that capital to teach numerous team members the basics of data science instead. You could do this through a data science evangelist, or through a few data scientists who consult with numerous teams to help them implement their own solutions.
Regardless of whether you choose a full stack data scientist, an embedded or centralized team, or a No-DS model, it will be important to think critically about your company, your vision and goals for data, and how your data team is structured for success. At HealthSparq, we originally grew organically, with different teams adding people with overlapping skillsets. After reviewing our data strategy and our existing capabilities, we formalized our structure around a series of data skillsets that work closely together, including data analysts, ETL engineers, and ML experts. We continue to review and evolve as our needs change, but our structure has given us a strong foundation on which to keep growing.