The frustration of finishing a great show on a streaming platform and not knowing what to watch next is an all-too-relatable inconvenience of the 21st century. Seemingly like magic, a recommendation algorithm will pop up to offer a range of suggestions based on your previous viewing history, allowing your weekend viewing binge to continue with no more than the click of a button.
In 2021, Netflix used 1,300 recommendation clusters built from the viewing data of its 209 million subscribers. According to Netflix, more than 80 percent of what subscribers watch comes from recommendations generated by those algorithms.
With dashboards and algorithms like these so standard in our daily lives, it is hard to disagree that data is now a company’s most important asset, provided it knows how to use it. Turning the huge quantities of data a company already collects into a product or feature that benefits customers is not only a major opportunity for innovation, but a near-surefire way of increasing the overall value of the data itself.
Smarter Sorting Chief Data Officer Russ Foltz-Smith explained, “Our data products emerged like all great data sources — from raw, messy data and the gnarled hands and brains of analysts brave enough to think, type and convert it into useful intelligence.”
According to 2020 research by Insights Association, tech is the fifth industry most served by big data and analytics, and the U.S. Bureau of Labor Statistics predicts that jobs in the data science field will grow by 28 percent by the end of 2026. The appetite for data-driven products is at an all-time high — and it’s not just your favorite streaming platform that has worked it out.
Built In Austin talked to four industry leaders about how they have taken the data they collect and turned it into a new product to provide value to their customers, as well as the challenges they’ve overcome on their way to success.
When did you first realize that your data may have some untapped value?
CCC started with a car valuation product for auto insurers in 1980 and has been a data company ever since. Today, we process over 13 million auto physical damage claims and over half a billion photos every year. AI technology has further enabled the use of data to power key business decisions and processes.
Our very first deep learning model helped us realize what could be achieved by training our AI with photos. With just a single photo, the model was able to predict whether a vehicle was a total loss. This was the “aha moment” for us that opened the door to new possibilities.
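The article doesn’t describe CCC’s model, but a minimal sketch of that kind of single-photo, binary total-loss classifier might look like the following, assuming a generic pretrained backbone from torchvision. The backbone, preprocessing and threshold here are illustrative, and the new head would need to be trained on labeled claim photos before its predictions meant anything.

```python
# Illustrative sketch only (not CCC's model): a single-photo, binary
# total-loss classifier built on a generic pretrained image backbone.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

# Pretrained backbone with a single-logit "total loss" head.
# The head is untrained here; it would need fine-tuning on labeled claim photos.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 1)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def predict_total_loss(photo_path: str, threshold: float = 0.5) -> bool:
    """Return True if a single damage photo is predicted to be a total loss."""
    image = Image.open(photo_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)  # shape: (1, 3, 224, 224)
    with torch.no_grad():
        probability = torch.sigmoid(model(batch)).item()
    return probability >= threshold
```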
We recently launched our first straight-through processing product that helps insurance carriers estimate damages in seconds, allowing drivers to advance quickly to their next steps like repair scheduling or settlement. With the CCC Estimate-STP product, an AI-powered line-level vehicle damage estimate is generated in real time. This has been an aspiration for many companies in the insurance industry for several years.
How did you bring this product to life? How did you collaborate with other teams to do it?
It has been an exciting journey. Everyone involved in the product’s development contributed to its success and the collaboration across teams was critical to helping us realize the vision. Straight-through auto claims processing had never been done before, so creating a line-level estimate from photos was certainly challenging. Even more challenging was orchestrating the entire workflow to enable a touchless experience.
A large team of product managers, engineers, data scientists, business analysts and program managers has worked on this product for over a year to bring it to market. Having been with CCC for a long time certainly helped in connecting the dots with many of our core product capabilities like mobile, parts, audit, workflow and other solutions needed to enable this seamless digital experience.
The core team would meet at a regular cadence to discuss their various dependencies, gaps, challenges and plans. A larger go-to-market team came together to run implementations for multiple customers, enable their configurations and workflows, and troubleshoot scenarios. This rigor enabled us to act on market and internal feedback swiftly.
“The collaboration across teams was critical to helping us realize the vision.”
What’s the biggest technical challenge you faced along the way? How did you overcome it?
Producing a line-level estimate from photos and claim data was certainly challenging. We had to get to the very core of our estimating solution and understand how to integrate AI solutions. Vehicles are getting more complex with changes in design, and the interplay of parts can differ from one vehicle model to another. Damage can have a cascading effect on multiple parts and operations, so understanding this interplay of parts by vehicle model is very complex.
This complexity required combining engineering, data science and vehicle repair subject matter expertise. We identified multiple areas of research, experimented with many iterations and evaluated the results from the perspective of each discipline. We ran regression tests on the entire product to measure its performance and ensure its readiness.
Equally important was building in controls that let insurance carriers use or discard the predictions based on confidence levels; this framework is key.
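The interview doesn’t spell out how those controls are implemented, but the core idea of a confidence gate is simple enough to sketch. In the hypothetical snippet below, the field names, threshold and routing labels are assumptions rather than CCC’s API; it only illustrates letting a carrier accept or route a predicted estimate line based on its confidence.

```python
# Illustrative confidence gate (not CCC's API): a carrier sets its own
# threshold for auto-accepting an AI-generated estimate line.
from dataclasses import dataclass

@dataclass
class EstimateLine:
    part: str
    operation: str
    predicted_cost: float
    confidence: float  # model confidence in [0, 1]

def route_line(line: EstimateLine, carrier_threshold: float) -> str:
    """Accept the prediction automatically or route it for human review."""
    if line.confidence >= carrier_threshold:
        return "auto-accept"
    return "manual-review"

lines = [
    EstimateLine("front bumper cover", "replace", 412.50, 0.93),
    EstimateLine("hood", "repair", 280.00, 0.61),
]
decisions = [route_line(line, carrier_threshold=0.85) for line in lines]
# -> ["auto-accept", "manual-review"]
```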
When did you first realize that your data may have some untapped value?
The company premise of Smarter Sorting is that physical products lacking data are a form of waste. Consumers and businesses add to that landfill every day because retailers, suppliers and governments don’t have the data about what goes into making millions of consumer products, or how to dispose of them. As a society, we have lots of data on where to buy things and how much to pay, but we have almost no data on what to do with the products we buy when we no longer want or need them.
Smarter Sorting began because we recognized the value of knowing the chemical and physical properties of consumer products. By accurately understanding product composition, we can show companies how to ship those products in the most efficient way, minimizing the greenhouse gas footprint of transportation. We help retailers reduce waste by identifying which unsold products could be donated and if that’s not possible, how they can be disposed of in the least harmful way.
Smarter Sorting’s data has untapped value so long as consumer packaged goods are contributing to climate change. We need to slow and then reverse the negative impacts of these goods on the environment.
“As a society, we have lots of data on where to buy things and how much to pay, but we have almost no data on what to do with the products we no longer want or need.”
How did you bring this product to life? How did you collaborate with other teams to do it?
Smarter Sorting crawls the web, uses data from retailers and brands, gathers up government databases and directly measures data from the physical world. We pour each dataset into a data lake where machine learning and statistical analysis can go to work. In just the past six years, we’ve developed over 60 unique machine learning-based algorithms and a set of interlocking product classification properties addressing over 7,100 unique regulatory conditions. Every second, our people, processes and probability converge to create order and address the questions we need answered.
Our teams comprise consumer packaged goods SMEs, library scientists, computational chemists, regulatory experts and software engineers. We work in agile processes to explore data, devise data pipelines, craft MLOps and serve up intelligence APIs across our thousands of retail and supplier customers.
What’s the biggest technical challenge you faced along the way? How did you overcome it?
The biggest technical challenge Smarter Sorting faces every day is the messiness and sprawl of the data. There are millions of consumer goods products, often mislabeled and misattributed in all manner of fascinating ways. No ETL process remains stable week to week without careful coordination of people, processes and probabilistic reasoning.
The usual way of solving data messiness is to restrict the types and formats you’re willing to handle. Smarter Sorting doesn’t place these restrictions on our customers; we are defined by and valued on our ability to handle data others aren’t able to. We have developed some clever algorithms to help us, built up our experience scaling data platforms both wide and deep, and learned the time-saving shortcuts and hacks. All of that knowledge was hard won. We worked long, hard hours to build databases and information processes that would only start paying off at scale. We also know that to get to the next level of scale, we will have to start that cycle all over again.
When did you first realize that your data may have some untapped value?
Imprivata had a classic random forest classifier in place for years before anyone on the current data science team worked here. We knew that our end users didn’t love the product because the machine learning model was just a black box. We also knew that we had the right data streams to create histogram distributions of interesting features quickly and easily. We could build an easy-to-understand anomaly detection algorithm without the data engineering team needing to do anything other than hook up our Jupyter notebooks to the demo data box. So, we just went for it. We started with a proof of concept that our stakeholders got excited about and, most importantly, we got a lot of great feedback from our end users.
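As a rough illustration of why a histogram-based approach is easy to explain, here is a minimal sketch with made-up demo data and column names rather than Imprivata’s: a value’s anomaly score is simply how rarely its histogram bin has been observed.

```python
# Rough sketch (assumed data and column names, not Imprivata's code):
# histogram-based anomaly scoring, easy to explain because a flagged value
# is just one that falls in a rarely-seen bin.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
demo = pd.DataFrame({"logins_per_hour": rng.poisson(lam=4, size=10_000)})

counts, edges = np.histogram(demo["logins_per_hour"], bins=20)
probs = counts / counts.sum()

def anomaly_score(value: float) -> float:
    """Higher score = the value lands in a less frequently observed bin."""
    bin_idx = np.clip(np.digitize(value, edges) - 1, 0, len(probs) - 1)
    return 1.0 - probs[bin_idx]

print(anomaly_score(3))   # common value -> low score
print(anomaly_score(25))  # rare value -> score near 1.0
```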
“We meet with stakeholders at the end of each sprint so we’re always getting good feedback.”
How did you bring this product to life? How did you collaborate with other teams to do it?
We always begin these projects by discussing our ideas for interesting features with stakeholders. The data science team will then flesh out a feature in a Jupyter notebook and work with demo data. Once the feature looks good, we implement it into our application’s code base. The feature developers handle most of the front-end implementation, so we work in two-week sprints, staggered with the feature dev team, to easily pass off any front-end work to them. We meet with stakeholders at the end of each sprint so we’re always getting good feedback.
What’s the biggest technical challenge you faced along the way? How did you overcome it?
I had been using Altair to make plots in my Jupyter notebooks for months before we started on this project, but our application’s plots are all written in JavaScript. This meant that we had to either get our application to work with Altair or the data science team would all need to learn JavaScript. Luckily, the feature dev team understood that it was quite impractical to rewrite all of our plots after many months of working on the project. It took a little bit of time to get Altair up and running smoothly in our application, but now that it is, a feature developer is able to implement an Altair plot without really needing to learn Altair themselves. It’s been a great switch.
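The answer doesn’t detail the integration mechanism, but one plausible version of this handoff is that Altair compiles each chart to a Vega-Lite JSON spec, which a JavaScript front end can render with a standard library such as vega-embed, so feature developers never have to write Altair themselves. The chart, data and file name below are illustrative.

```python
# One plausible handoff (an assumption, not necessarily Imprivata's approach):
# Altair compiles a chart to a Vega-Lite JSON spec that the JavaScript
# front end can render; the data science team only ships the JSON.
import altair as alt
import pandas as pd

df = pd.DataFrame({
    "hour": list(range(24)),
    "anomaly_score": [0.1] * 20 + [0.8, 0.9, 0.7, 0.2],
})

chart = (
    alt.Chart(df)
    .mark_line()
    .encode(x="hour:Q", y="anomaly_score:Q")
)

# Write out the Vega-Lite spec for the front end to render.
with open("anomaly_chart.vl.json", "w") as f:
    f.write(chart.to_json())
```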
When did you first realize that your data may have some untapped value?
From early on, we’ve had our suspicions that the metadata and peripheral information of our product would have its own value. It’s something that people are a lot more cognizant of than they once were. The use case for this kind of data is closely tied to a truism that drives Banyan’s value proposition: water costs are reliably rising. As this scarce resource goes up in price, so will the value of any peripheral data that helps people understand, minimize or avoid costly water utility usage.
One example is an alert we provide that warns users of an imminent change in their water usage pricing, based on their system’s live usage and their tiered utility rates. As a property’s usage approaches a higher cost tier, we alert key personnel so that they can act to avoid or minimize usage at that higher rate.
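As a simplified sketch of how such a tier alert might work, the snippet below checks month-to-date usage against made-up tier breakpoints and warns when usage comes within 10 percent of the next, more expensive tier; the rates, breakpoints and margin are illustrations, not Banyan’s real data.

```python
# Simplified sketch of the tier alert described above; tier breakpoints,
# rates and the warning margin are made-up illustrations, not Banyan's data.
TIERS = [  # (monthly gallons up to this bound, price per 1,000 gallons)
    (5_000, 3.50),
    (15_000, 5.25),
    (float("inf"), 8.10),
]

def next_tier_warning(month_to_date_gallons: float, margin: float = 0.10):
    """Return a warning string when usage is within `margin` of the next tier."""
    for upper_bound, _rate in TIERS:
        if month_to_date_gallons <= upper_bound:
            near_boundary = (
                upper_bound != float("inf")
                and month_to_date_gallons >= upper_bound * (1 - margin)
            )
            if near_boundary:
                return (
                    f"Usage is approaching the {upper_bound:,.0f}-gallon tier; "
                    "consumption above it is billed at a higher rate."
                )
            return None
    return None

print(next_tier_warning(4_600))  # within 10% of the 5,000-gallon tier -> warning
print(next_tier_warning(2_000))  # well below the boundary -> None
```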
This functionality had us cataloging and codifying these usage tiers utility by utility and, lo and behold, we developed an encyclopedia of North American water rates! We also gained a general understanding of how our users fit into those water rate categories relative to our other users. Each of these has its own way of providing value.
How did you bring this product to life? How did you collaborate with other teams to do it?
Our CEO has a vision of helping businesses save water and we’ve enjoyed working as a team to make it happen! We’ve built a centralized IoT platform with different levels of permission, authority and interaction. Post set-up, the system will manage itself in automating schedules, sensing anomalies, reporting usage and more.
The product, design and software teams at Banyan all get to enjoy open access to our different user groups and leadership in defining and building the product, which makes for a really fun and nimble environment. I’ve taken ownership over defining the specific problems we can address to execute on our vision and it’s up to our tech and design teams to decide what the solutions to those problems are. This workflow demands collaboration because you’re always working and communicating across teams just to accomplish daily tasks. Getting a team running like that is really special! We’re pumped about how that is translating into the product and our customers’ reactions, and we’re protective and appreciative of this culture at Banyan.
“This workflow demands collaboration because you’re always working and communicating across teams just to accomplish daily tasks.”
What’s the biggest technical challenge you faced along the way? How did you overcome it?
I think Banyan has been saddled with an abundance of smaller, ever-present obstacles rather than any single technical difficulty. There has never been one giant, scary complexity, just an endless stream of small, mundane complexities that most technologists can relate to. We address these with the tried-and-true method of troubleshooting: re-creating the problem and removing systems one by one to isolate variables and find the source of trouble. Sometimes that process gets pretty complex in and of itself, just re-walking the paths your systems took in order to understand the problem.