Seven Tips for Building a Scalable Data Foundation in Health Care
Today, 80 percent of health data remains unstructured and undigested. The health care sector is expected to produce 2,314 exabytes of data in 2020. One exabyte equals one billion gigabytes. To put that into perspective, there are 2.8 billion monthly active users on Facebook, and all of the photos, videos, comments, likes, ads and other data Facebook stores amount to only 300,000 gigabytes[i].
Health care companies are investing significant amounts of money to develop and execute data strategies that extract valuable information from the surging data ocean while boosting organization-wide growth and success. Better use of Big Data can bring fundamental changes to the health care system, including better outcomes and greater efficiency at lower cost. But many companies are finding a disconnect between the investment they’ve made and the value being generated. In fact, even adoption of initiatives already under way is proving to be complex, with 77 percent of executives reporting that “business adoption of Big Data and AI initiatives remain a major challenge[ii].”
In the race to innovate and maximize the value of data, many companies have failed to set up the right strategy and infrastructure to scale and meet the deluge of impending data demands – not only to accommodate additional use cases for their data, but to accommodate more types of data. This failure renders rapid innovation impossible and can lead to existing use cases becoming outdated. One of the biggest challenges for a company today is building a compelling and scalable data strategy that aligns with overall company goals. To address this issue, below are seven tips learned from experience (aka: the hard way), to help you set a solid foundation for your health care data products and avoid the pitfalls many teams face today.
Tip 1: Know That Data Silos Are Your Enemy
Data silos are the enemy of any data program. They prevent health care organizations from capturing a holistic view of the patient, their history, genomics, socioeconomic factors and other determinants of health. This can lead to inefficiencies in care delivery and management, potentially impacting patient outcomes. Many companies don’t have a data product vision and strategy; instead, they have a vision for a product that is powered by data. They assume the data will be there when they need it, without knowing the product specifics (which often come through user testing, feature enhancements and general iterative growth) or having a data strategy in place. This leads to a reactionary process in which data teams don’t know how they fit into the big picture and are left to react to constant requests. One of the most prevalent problems this structure creates is data silos, which hurt teams now and hurt companies more later.
A common challenge with data silos is multiple teams solving a common problem in unique or duplicative ways. This often leads to wasted effort because multiple teams manage the same process in different areas. Teams either spend a lot of time coordinating with each other to keep their processes in sync, or they don’t coordinate and the processes drift out of sync. When the processes are out of sync, you’ll get different answers to the same question depending on which team you ask. To avoid data silos and the hazards associated with them, your data strategy should focus on data accessibility.
Tip 2: A Successful Data Strategy is Founded on Accessibility
In the past, numerous approaches have been adopted to address data silos and prove a data strategy. Enterprise data warehouses and data lakes became popular as ways to support data accessibility, but they can quickly grow into a change management nightmare or into the dreaded data swamp[iii]. Instead of investing significant resources in single large points of failure such as a data lake, a successful data strategy should focus on the culture and vision around data accessibility and invest in the underlying techniques.
Having data accessibility as a foundational principle of your data teams will influence how data projects are developed. When developing new products or processes, consider additional use cases early. By asking the team and additional stakeholders how they would use this data, you can ensure you’re considering the longer-term vision of a product. In many cases, asking questions up front and making simple tweaks early in the planning process can lead to a lot more long-term value. Without a vision for the future, team members will make decisions that make sense in the short term but box the company into data silos or one-off use cases in the long term.
Tip 3: A Vision is Not a Roadmap
When developing a vision, it’s important to differentiate between a vision and a detailed roadmap. In many organizations, someone will say they have a vision when they instead have detailed requirements or a list of predefined features to be executed. When you believe you already know all the components that need to be developed, you can become rigid and avoid learning about what’s working as you go. Instead, a product vision should represent the core sense of the product, what you aim to achieve, the opportunities and the threats.
Tip 4: Don’t Fail at Failure
While the larger vision is important, it’s also important to support fast iterations and low-cost failures in order to determine which elements are needed to achieve the vision. Product management literature has long focused on how to iterate fast for traditional user interface (UI) based products, but the concept has been slower to develop around data products. Many health care companies aren’t leveraging iterative methodologies and tend to want to solve all edge cases in the first pass, losing focus on their goal. As a result, the health care sector has often had slower and more costly development cycles than other industries, without learning and building a better product for its users along the way. When developing data products, we typically find stakeholders have numerous ideas. It’s important to be able to quickly test those ideas to determine whether they are viable, how you’d want to implement them, whether the use case requires additional work, and what the edge cases are. Having a method to build numerous rapid prototypes is key to scaling and to increasing the velocity of the development cycle. If instead you attempt to productionize all your ideas, you’ll get bogged down with edge cases or with trying to scale something that doesn’t have value.
Tip 5: Invest in the Right Underlying Infrastructure
A key to both rapid prototyping and scaling data products once they have been vetted is investing in the underlying infrastructure. When your data teams are forced to use a single point solution such as an enterprise data warehouse for all projects, they can be boxed in and feel they aren’t able to deliver features that meet expectations. Leveraging cloud technologies such as Google Cloud, Amazon Web Services or Microsoft Azure is a great way to allow your teams to take advantage of the numerous efficiencies gained in this area and focus the team on developing data products rather than managing infrastructure.
Tip 6: Accessibility, Security and Data Privacy Are Not Mutually Exclusive
As a part of investing in the underlying infrastructure, it’s important to build with both accessibility and security in mind. This is particularly true in health care, which was plagued by more breaches than any other sector in 2018, accounting for 25 percent of incidents[iv]. Accessibility, security and data privacy are not mutually exclusive. Avoid the trap of thinking that if something is hard for you to access, it is secure, or that because something is easily accessible to the people who need access, it is somehow insecure. Design and invest in infrastructure that supports fine-grained access and permission controls. Data should be easy for users with permissions to consume and leverage while being inaccessible to those without permissions. Users with permission to access and build on the data should have a walled garden that lets them work quickly without letting them do things their permissions don’t allow. If you build with security in mind, you can also be confident that you are minimizing risk at every step.
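To make the idea of fine-grained permission controls concrete, here is a minimal Python sketch of field-level filtering by role. The role names, field names and the read_record helper are all hypothetical and chosen only for illustration; a real deployment would more likely rely on the access controls built into its cloud platform or database than on hand-rolled code like this.

```python
# Hypothetical mapping of roles to the fields each role may read.
# These names are illustrative assumptions, not taken from any real system.
ROLE_FIELDS = {
    "care_team": {"patient_id", "name", "diagnosis", "zip_code"},
    "analyst": {"diagnosis", "zip_code"},
}


def read_record(role: str, record: dict) -> dict:
    """Return only the fields that the given role is permitted to see."""
    allowed = ROLE_FIELDS.get(role)
    if allowed is None:
        # Unknown roles get nothing: access is denied by default.
        raise PermissionError(f"role {role!r} has no access to this dataset")
    return {key: value for key, value in record.items() if key in allowed}


record = {
    "patient_id": "p-001",
    "name": "Jane Doe",
    "diagnosis": "asthma",
    "zip_code": "02139",
}

# The analyst can work with the data immediately, but never sees identifiers.
analyst_view = read_record("analyst", record)
care_team_view = read_record("care_team", record)
```

The point of the sketch is that the permitted user does not have to jump through hoops: one call returns everything the role is allowed to see, and nothing else.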
Tip 7: Get Hip, Stay Fresh and Keep Learning
Who doesn’t love new things? One of the best things about working at the forefront of data products is that you get to use new tools and techniques. A key element of allowing your data teams to work quickly while emphasizing accessibility and security is investing in techniques like data anonymization, synthetic data and differential privacy. Supplying your data teams with strong tools for generating synthetic data, and allowing them to develop against synthetic data that they can manipulate, can reduce the risk of data exposure while also ensuring they test against more edge cases. Data anonymization can also be a tool for allowing more users to access data for certain use cases without exposing personally identifiable information. It can be especially useful for analytics teams that perform reporting and research. If your reporting and analytics team doesn’t need to know actual users, giving them access to anonymized data, or to results protected by differential privacy, can be a great way to meet their needs while protecting an individual’s privacy.
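As a rough illustration of two of these techniques, the Python sketch below replaces patient identifiers with keyed hashes and adds Laplace noise to a count, which is the textbook mechanism for a differentially private counting query. The function names, the key and the epsilon value are assumptions made for this example. Note also that keyed hashing is pseudonymization rather than full anonymization, since other fields in a record can still identify a person.

```python
import hashlib
import hmac
import random


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 hash (hypothetical helper)."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count using the Laplace mechanism.

    Adding or removing one person changes a count by at most 1, so the
    noise scale is 1 / epsilon. Smaller epsilon means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(2020)  # seeded only so the example is repeatable
key = b"example-key-not-for-production"  # assumption: a real key lives in a secrets manager

token = pseudonymize("p-001", key)
noisy_visits = dp_count(true_count=100, epsilon=1.0, rng=rng)
print(token[:12], round(noisy_visits, 2))
```

An analytics team could join records on the pseudonymized token and report the noisy count without ever handling the underlying patient identifier.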
By focusing on data accessibility and setting up the right infrastructure from the beginning, health care companies can identify viable data products more quickly and bring them to market faster. If harnessed effectively, health care data has the potential to lower costs, reduce hospital re-admissions and emergency department visits, prevent adverse drug effects, improve care coordination, and much more. With improved data sharing, health systems, health plans and other organizations can offer consumers more innovative care and better experiences, and change the way we think about health care.
- [i] https://research.fb.com/facebook-s-top-open-data-problems
- [ii] http://newvantage.com/wp-content/uploads/2018/12/Big-Data-Executive-Survey-2019-Findings-Updated-010219-1.pdf
- [iii] https://www.collibra.com/blog/data-lake-vs-data-swamp-pushing-the-analogy
- [iv] https://www.businessinsider.com/why-healthcare-data-breach-epidemic-will-intensify-2019-4