« Who thinks Data Scientists are happy in business? » When I ask this question, I generally get a deathly silence that confirms the malaise or embarrassment of a profession that was supposed to be carried to the top of digital jobs by the wave of Big Data and Artificial Intelligence.
We are still far from the initial promise of Data Science, which was supposed to revolutionize business models and turn companies into « data-driven companies ». Nevertheless, I do not believe that the Data Scientist will disappear or be replaced by tools, even though we can observe a huge increase in the supply of solutions promising that anyone will be able to do Data Science tomorrow.
To help any current or future Data Scientist, here are some keys to succeeding in data science assignments. We won’t talk about machine learning, features, labels, supervised mode, etc., but will rather pick up a few elements of « common sense » to cure the feeling of frustration that I often perceive in this job. These keys to success are:
- Understand the Company
- Handle the art of semantics in a hostile environment
- Do not lock yourself in an ivory tower
- Think useful
- Find data deposits
Understand the Company
Which Data Scientist can claim to find a model in the data of a company whose activities and organization are unknown to him or her? Can we imagine asking anybody to make the right diagnosis and prescribe the right treatment from a blood test without knowing anything about how the human body works?
Before starting analytical work on data to uncover insights, my first recommendation is to put aside the statistical models and the Machine Learning arsenal, put on a « coat of humility », and learn about the company.
How do you do it? I will limit myself to three tips here and develop this point in a later post.
First tip: read the company’s latest annual report. It is not always easy reading, especially if it is very financial, but for large companies it also gives interesting information about the strategy, and sometimes a brief history is included.
Then, arrange interviews with various business managers (CFO, Marketing, Customer Service Manager, Compliance Manager, CDO, etc.). Focus on people who have a global, cross-functional view of the company. These meetings also have the advantage of introducing you and « demystifying » you.
Finally, choose a Balanced Scorecard model to take your notes. You will get a first macro-mapping of data blocks and possible uses of Data Science (see example below).
Never forget that it’s up to the Data Scientist to understand the company (and not the other way around)!
Handle the art of semantics in a hostile environment
To convince yourself of the importance of a common language, try putting this simple question to people in the company: « What is the rule for calculating the company’s number of customers? » You’ll get as many answers as points of view. For example: should we count prospects? Customers without products? Those who left during the year? If two agencies have the same customer, is that one customer or two? And so on. Performing this test sheds light on a reality: each employee operates within the paradigm of their own activity and knowledge. To each their own point of view, to each their own truth.
The Data Scientist cannot stop at the mathematical or statistical side of his discipline; he must also care about the meaning of words. This is what I call the art of semantics.
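To make the « number of customers » test concrete, here is a minimal Python sketch. Everything in it is hypothetical: the record fields, the sample data and the three counting rules (labelled here as a sales, a marketing and a finance view) are assumptions chosen only to illustrate how the same data yields different answers depending on the definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    person_id: str        # hypothetical identifier of the physical person
    agency: str           # agency holding the record
    products: int         # number of products currently held
    is_prospect: bool     # True if the person has never bought anything
    left_this_year: bool  # True if the person left during the year


# Hypothetical sample data, invented for the illustration.
RECORDS = [
    Record("P1", "Paris", 2, False, False),
    Record("P1", "Lyon", 1, False, False),   # same person, second agency
    Record("P2", "Paris", 0, False, False),  # customer without products
    Record("P3", "Lyon", 0, True, False),    # prospect
    Record("P4", "Paris", 1, False, True),   # left during the year
]


def count_agency_records(records):
    """Assumed 'sales' rule: every agency record counts, prospects included."""
    return len(records)


def count_active_persons(records):
    """Assumed 'marketing' rule: distinct persons holding a product and still present."""
    return len({r.person_id for r in records
                if r.products > 0 and not r.left_this_year})


def count_persons_in_year(records):
    """Assumed 'finance' rule: distinct non-prospect persons seen during the year."""
    return len({r.person_id for r in records if not r.is_prospect})


def customer_counts(records):
    """Apply every rule to the same records and return the counts side by side."""
    return {
        "agency_records": count_agency_records(records),
        "active_persons": count_active_persons(records),
        "persons_in_year": count_persons_in_year(records),
    }


if __name__ == "__main__":
    print(customer_counts(RECORDS))
```

On this sample, the three rules give 5, 1 and 3 customers respectively. None of them is wrong; they simply answer different questions, which is why the rule has to be agreed on before any modelling starts.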
Why do I speak of a hostile environment in business? For at least three reasons: 1/ a lack of pedagogy and time, because not everyone has the gift, the desire or the time to explain well what they do; 2/ a lack of transparency, because not everybody wishes to reveal their « secrets »; 3/ the way the company itself is built, where siloed structures, contradictory injunctions and divergent or even opposing objectives lead to a fragmented and distorted vision of concepts.
If a Data Governance function is up and running, the question of semantics and business rules is normally addressed. But we all know that this type of structure is still only emerging, so my tip here is to use market standards (for example, interbank exchange formats or the Galia standard in the automotive industry), because they are useful reference frameworks for laying down the basic concepts. They are legitimate enough for people in the field to agree to align with them.
Do not lock yourself in an ivory tower
Many companies have given in to the intellectually attractive idea of setting up DataLabs that concentrate Data Science skills and offer their services to the various business lines in an internal « supplier » model. This has allowed companies to improve their understanding of data issues, and Data Scientists have spent much of their time evangelizing. But after a few years of operating this way, DataLab results are weak and disappointing: about two thirds of data projects are abandoned. The succession of proofs of concept (POCs) has often been a succession of failures, especially when it came to scaling up and deploying. This organization has also contributed to the marginalization of the Data Scientist: his discouragement, then his weariness, and finally his resignation to leave for greener pastures.
My advice here is that every Data Scientist must embed his work in a collaborative team process to which he brings his skills and his mindset. The time for ivory towers is over, because the digital transformation of companies will not stop there. It’s time to work in teams with all the IT players, all the data workers, the business experts, the users … in short, with all the skills that can be put into action for a better use of the data.
Think useful
The company is not a lab! This reality can disappoint young recruits fresh out of school or university. Of course there are R&D, Innovation and Market Research departments, but never forget that a company has a corporate purpose and economic constraints.
Data Science must be an activity that brings value to the company over time. It cannot be, as we have seen above, a reserved area whose activities are understood only by a small group of privileged people. Otherwise, it will eventually suffer from the « Data Science? So what? » syndrome.
The Data Scientist has to keep five concerns in mind:
- Feasibility. « Is the model I imagine achievable? » For example, asking about the freshness of the available data avoids building a model designed for real-time use on data that is refreshed every … 6 months.
- Regulation. « Will the model I imagine be compliant? » Take the GDPR (RGPD in French) into account as soon as personal data are used. GDPR training must be part of any Data Scientist’s background. Beyond the GDPR, many sectors such as banking and insurance are highly regulated, and it will not always be possible to use all the data without going through encryption, grouping …
- Return on investment. « Does the model I imagine bring value to the company? » Ideally, a business plan should be drawn up for your project. For regulatory projects, it may be useful to think about measuring the risk of not doing it. In any case, the indicators that measure your model’s contribution must be planned as soon as it is designed.
- Security. « Am I creating a security breach? » The underlying techniques are recent, and integrating them into the company’s overall IS architecture is often an innovation in itself. Such changes can weaken the defenses of the IS when this point is forgotten.
- Industrialization. « Is the model ready to be released across the enterprise? » The Big Data convention held in Paris in March 2019 highlighted how critical this topic is for the success of data projects. It is necessary to leave the « laboratory » mindset behind and think « industrial ».
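The feasibility concern in the list above can be turned into a very simple check before any modelling starts. The Python sketch below is only an illustration: the source names and their refresh intervals are hypothetical, and the rule (a source is usable only if it is refreshed at least as often as the use case tolerates) is a deliberately simplified assumption.

```python
from datetime import timedelta

# Hypothetical data sources with an assumed refresh interval for each.
SOURCES = {
    "crm_export": timedelta(days=182),   # refreshed roughly every 6 months
    "web_events": timedelta(seconds=5),  # near real-time stream
}


def is_feasible(refresh_interval: timedelta, max_staleness: timedelta) -> bool:
    """Return True if the data is refreshed at least as often as the use case tolerates."""
    return refresh_interval <= max_staleness


def feasible_sources(sources, max_staleness):
    """List, in alphabetical order, the sources fresh enough for the intended use."""
    return sorted(name for name, interval in sources.items()
                  if is_feasible(interval, max_staleness))


if __name__ == "__main__":
    # A real-time use case that tolerates at most one minute of staleness.
    print(feasible_sources(SOURCES, timedelta(minutes=1)))
    # A yearly reporting use case.
    print(feasible_sources(SOURCES, timedelta(days=365)))
```

With these assumed values, only the event stream qualifies for the real-time use case, while both sources are acceptable for yearly reporting. Asking the question this explicitly, source by source, is usually enough to spot the mismatch early.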
Find data deposits
The last key relates to the heart of the Data Scientist’s activity: finding the raw material he needs for his day-to-day work. Every Data Scientist arriving in a company dreams of quickly getting relevant, high-quality data so that he can start, without restraint, on his regression or clustering models. This dream is usually short-lived. He will have to roll up his sleeves and find the data deposits in a company data ecosystem made complicated by three agents of complexity:
- technological evolution, with centralized architectures that have been gradually transformed under the pressure of the arrival of servers, PCs, networks, the Internet and now IoT, cloud and digital.
- the evolution of the IS architecture, with the cohabitation of in-house developments and software packages, but also the evolution of usage from batch to real time, which has successively brought sequential files, databases, and now APIs and data lakes.
- the evolution of the company itself, which has to adapt its business to its ecosystem through mergers and acquisitions, joint ventures with other companies, or the pooling or outsourcing of resources. These transformations end up leaving systems cohabiting in a functional and technical architecture that is often heterogeneous, and from which it will not be easy to select and extract the relevant data.
These 3 agents of complexity often lead to an « informational shambles ». The good news for the Data Scientist is that the emergence of big data challenges is pushing companies to deal with this situation. However, this remains a long-term job and the ability to find data deposits must be a primary quality of the Data Scientist.
I strongly advise taking a close look at the history of the company’s IS, which brings us back to the first key: understanding the company. The circle is complete.
Optimal Profile of a Data Scientist in Business
Finally, what could be the optimal profile of a Data Scientist in business? I take skills in Data Science, in Mathematics and, very importantly, in Computer Science as a given. I cannot imagine a Data Scientist who is not comfortable with programming and IT.
To be effective in business, a Data Scientist will have to be:
- Curious, to understand his environment and find the relevant data for his models
- Empathetic, to understand the people in the company and ease his own integration
- Team-minded, to work in a team
- Pedagogical, to explain what he does and the value it brings to the company
- Responsible, to commit to creating models that will be useful
Feel free to respond and comment on this article.
© Jean Méance, April 2019