Sexy Albeit Arduous? Data Scientists Give Insights Into Their Roles.

Once called “the sexiest job of the 21st Century,” the data scientist role holds an appeal that depends largely on how data-driven the workplace is.

Written by Robert Schaulis
Published on Aug. 22, 2022

This July, the Harvard Business Review marked the tenth anniversary of the publication of the essay “Data Scientist: The Sexiest Job of the 21st Century” with a retrospective. The essay’s authors, Thomas H. Davenport and DJ Patil, returned to ask: “Is Data Scientist Still the Sexiest Job of the 21st Century?”

Their answer to this sensationalistic question? A resounding “yes, but …”

“A decade later, the job is more in demand than ever with employers and recruiters,” Davenport and Patil said, citing a very competitive average salary and figures from the U.S. Bureau of Labor Statistics that predict prodigious growth in the field between now and 2029 as proof of the continued attractiveness of data science. What was once an amorphous and poorly understood field, the authors noted, is now better institutionalized and carries with it a profusion of tech tools to help scientists create actionable insights. 

The job does, however, suffer from some long-standing and serious difficulties, according to the authors. While it certainly has its appeal, the prospect of arduously cleaning and combing through data — which, despite advances in AI, is a very real prospect — is not the stuff of many people’s fantasies. Couple that with the possibility of working for a less-than-data-driven organization that might not act on your insights, and there is real potential for burnout.

Thankfully, there are workplaces, tools and dedicated professionals working hard to get the most out of data. Built In Austin recently reached out to two data scientists at exceptionally data-driven companies to learn about the tools, the best practices and the innovations that successful data scientists and successful data-driven teams are using — and building — to make life easier for clients, colleagues, their companies and themselves. 

 

Further Reading: What Is Data Science? A Complete Guide.


 


 

Ryan Milligan
Senior Director, Revenue Operations • QuotaPath

 

With its automated commission-tracking platform, QuotaPath helps sales teams design customizable compensation plans and forecast commissions in real time with a single easy-to-use dashboard.

 

Describe your data stack, and why you use that combination of tools.

We use Mozart Data to combine our extract, transform and load process (using Fivetran) with our data warehouse (Snowflake) and our nightly data transformations. We then use Mode as our primary business intelligence and data visualization tool for internal reports, dashboards and data investigation.

We use Fivetran to pull data from Salesforce, Salesloft, Amplitude, HubSpot, Intercom and other sources into Snowflake — our data warehouse. We then connect our Snowflake data warehouse to Mode for BI and data visualization. We use this combination of tools to make it simple to pull data from our different tools, transform the data and then share dashboards with the team.

The goal of this stack is to ensure that we can capture and ingest data from all our different data sources, transform that data into usable tables and leverage it to drive action within the organization.
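The capture-transform-leverage flow described above can be sketched in miniature. This is an illustrative, pure-Python stand-in only: the hypothetical CRM and product records play the role of Fivetran feeds, a dict plays the role of a Snowflake table, and the `transform` step mimics the nightly transformation that produces dashboard-ready views.

```python
# Extract: raw records as they might arrive from two hypothetical sources.
crm_records = [
    {"email": "ana@example.com", "stage": "closed_won", "amount": 1200},
    {"email": "bo@example.com", "stage": "open", "amount": 800},
]
product_records = [
    {"email": "ana@example.com", "logins_last_30d": 14},
    {"email": "bo@example.com", "logins_last_30d": 3},
]

# Load: land both feeds in one raw, warehouse-style table keyed on a
# shared identifier.
raw_table = {}
for rec in crm_records:
    raw_table.setdefault(rec["email"], {}).update(rec)
for rec in product_records:
    raw_table.setdefault(rec["email"], {}).update(rec)

# Transform: derive the usable, dashboard-ready view from the raw table.
def transform(raw):
    return [
        {
            "email": email,
            "is_customer": row.get("stage") == "closed_won",
            "engaged": row.get("logins_last_30d", 0) >= 10,
        }
        for email, row in sorted(raw.items())
    ]

report = transform(raw_table)
```

In a real stack each step is a separate managed tool, but the shape is the same: many sources, one warehouse, one transformation layer feeding shared dashboards.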


 

How does QuotaPath use data, and what unique requirements does that place on your team and the technology you use?

We use data to develop an understanding of how the business is performing, why it potentially is deviating from our financial plan and what we are going to do next to drive forward progress.

We have over 50 dashboards built here at QuotaPath that tell us how our sales, marketing, product and engineering teams are doing in relation to their goals. That includes tracking new business bookings, new user acquisition, product exploration and customer renewal rates, among other things.

We place a heavy emphasis on data democratization and access here. Everyone can see every dashboard. That requires us to have easily understandable and interpretable data sets for users of all analytical and technical ability.

 

How has your stack evolved over time, and how do you think it’ll continue to evolve in the next few years?

We built our stack from scratch in the last year, so it has evolved a lot. We are likely going to be adding more data sources from different tools over the next six months, so that will be the primary evolution.

We are also actively hiring a senior analytics engineer to build out the roadmap of our data organization, so we’re excited to have them help define the future of the data organization as well!

 

Vibha Srinivasan
VP of Data Science • Cognite

 

With a platform designed to integrate with existing IT and OT infrastructures, industrial data operations company Cognite offers data-management services for the oil and gas, manufacturing and power, and utilities industries — providing visualizations, analytics and AI services at scale.

 

Describe your data stack, and why you use that combination of tools.

Cognite’s customers include heavy-asset industries in the energy, manufacturing and utilities sectors. These industries have disparate and complex data sources such as sensor readings, equipment health and maintenance data, instrumentation diagrams, 3D computer-assisted design models and more. Therefore, the tech stack our team uses is flexible depending on the needs of the customer.

The general components are Python — and its data science libraries, including pandas, scikit-learn, PyTorch, Keras, spaCy and others — Jupyter Notebooks, Databricks and visualization and web application tools such as Microsoft Power BI, Grafana, Dash and Streamlit. We also make use of commercial physics simulators, such as flow simulators, for specific industrial use cases.

We integrate all of these data sources into our cloud-based industrial DataOps platform, Cognite Data Fusion, which is equipped with custom machine-learning algorithms to build relationships between these data sources and measure data quality. This provides a solid data foundation and greatly reduces the time and effort our team spends building data science and analytics solutions.
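To make “building relationships between data sources” concrete, here is one simple, illustrative version of the idea: matching sensor time-series tags to equipment IDs by fuzzy string similarity, using `difflib` from the Python standard library. The tag and equipment names are invented for the example; Cognite Data Fusion’s actual contextualization uses custom machine-learning models rather than this toy matcher.

```python
from difflib import SequenceMatcher

# Hypothetical inputs: time-series tags from a historian and equipment
# IDs from a maintenance system, named slightly differently.
sensor_tags = ["21-PT-1019.PV", "21-TT-1020.PV"]
equipment_ids = ["21PT1019", "21TT1020", "21FT1055"]

def best_match(tag, candidates):
    # Normalize the tag, score each candidate by character-level
    # similarity and keep the highest-scoring one.
    normalized = tag.replace("-", "").replace(".PV", "")
    scored = [
        (SequenceMatcher(None, normalized, cand).ratio(), cand)
        for cand in candidates
    ]
    return max(scored)  # (score, equipment_id)

matches = {tag: best_match(tag, equipment_ids)[1] for tag in sensor_tags}
```

Linking each tag to a piece of equipment this way is what lets downstream models ask questions like “show me every sensor on this pump” without manual mapping.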


 

How does Cognite use data, and what unique requirements does that place on your team and the technology you use?

Cognite works with heavy-asset industries to enable data-driven decision making for daily operations control, scheduled maintenance activities, longer-term initiatives such as wind farm planning or smart well drilling and more.

In addition to the diverse data sources that our customers have, there is a wealth of academic and empirical knowledge about these industrial systems. As such, our data scientists typically have domain expertise in areas like petroleum, electrical and mechanical engineering, as well as experience with data science and machine-learning techniques.

Our team has the unique opportunity to design hybrid AI models that combine physics-based approaches with machine-learning techniques. The technologies we use include the traditional data science stack as well as established physics simulators such as process simulation software — AspenTech HYSYS, for example — and flow simulators such as GAP, PROSPER and OLGA. Our team also uses mathematical optimization solvers for large-scale planning and scheduling problems.

 

How has your stack evolved over time, and how do you think it’ll continue to evolve in the next few years?

Our goal is to help our industrial customers evolve their tech stack and processes to get maximum business value from their data. We get valuable feedback from our customers on Cognite Hub, our user community, and use this to guide our product roadmap and data science features. 

Over time, our team has created an open-source library for industry-relevant data science algorithms. And in the next few years, we plan to add many more recipes to this library. We also plan to build integrations with more physics solvers and standardize our solutions and visualizations portfolio, so we can readily deploy our solutions to production.

 

Responses have been edited for length and clarity. Images via listed companies and Shutterstock.