Data Science: A multi-disciplinary field in which scientific methodology is applied to data analysis in order to find useful insights and promote evidence-based decision making.

Soft Skills in Data Science

17th March 2018

Author: Trevor Simmons

As I mentioned in my introductory post, I have noticed a large focus on technical skills in data science articles, with the greatest emphasis being on programming. I imagine this is because programming is clear-cut and it can be easier to teach and write about, but this could also give the wrong impression that data science is all about programming. In this post I would like to give credit to soft skills which I feel do not receive enough attention.

A prime example of how a soft skill can develop comes from my own upbringing. My family moved about a lot due to my parents’ jobs, and while I did get tired of having to move schools and keep making new friends, it had an unexpected benefit. I have found as an adult that I can relate to, and communicate with, a wide range of people. I never intended to have this skill, nor made any conscious effort to develop it; it just came about out of necessity. I remember once thinking about my upbringing when reading about a set of people whom Larry Wall, the creator of the Perl language, called glue people, and thinking that the description fitted me quite well.

“But the open source movement is energized by the other sort of joiner. This sort of person joins many tribes. These are the people who inhabit the intersections of the Venn diagrams. They believe in ANDs rather than ORs. They’re a member of more than one subset, more than one tribe. The reason these people are important is, just like merchants who go between real tribes, they carry ideas from one intellectual tribe to another. I call these people ‘glue people’, because they not only join themselves to a tribe, they join tribes together.”

That is also how I see the data scientist: as the glue connecting the objective world of data and the subjective human world, as well as joining together the many disciplines required in order to facilitate this.

Communication skills are among the most valuable skills. Throughout my career, I have found myself working in teams with non-technical people, from sales and marketing to managing directors, and I have had to communicate technical information to them in a concise, non-technical way using language that they would understand. A great example of this can be found in the video series from Wired where scientists describe one concept in five levels of difficulty. Particularly good was a neuroscientist describing the connectome to a 5-year-old, a 13-year-old, a college student, a neuroscience grad student, and a connectome entrepreneur.

Communication skills are also evident in writing code. I imagine most people would think of code as a language for a machine to understand, but in a similar way to the point I made above about joining the objective and the subjective, it is vital to write human-readable code. I have heard it said that you should code with the person who will maintain the code after you in mind, but in practice I have found that I am all too frequently the person who maintains it after me. Sometimes I will have to revisit code I wrote years previously, when I have little or no recollection of writing it. Unbeknown to me, all the time I was doing this I was practising translating technical information for non-technical people.

I was recently reading some sample data science interview questions and one of them asked: Describe linear regression to a business person. I’ll leave the reader to think about that for a second before suggesting an answer.

Answer below.

My answer: “Imagine that you would like to predict house prices based on the number of bedrooms. You could plot the existing data on a graph with the number of bedrooms on the x-axis and the prices on the y-axis. Linear regression would be drawing a straight line through the center of the data so that you can approximate the relationship which you could then use to both describe the current trend, and to make future predictions.”

As you can see, I gave an example from the everyday world in the form of a story. This is what engages people and is something that they can visualise. I kept it short and left out all technical language: nothing about the sum of squared errors or anything else mathematical. I also used the most basic example, with only one independent and one dependent variable. If someone would like more detail then they will usually ask. For instance, if they wanted to know how I found the ‘center of the data’ then I would talk about minimising error and would eventually get to the sum of squared errors. If they asked what happens if you want to base the prediction on more than one variable, say number of bedrooms and location, then I would start to explain multiple regression and how the line would become a plane. At the very start, though, I made it as simple as possible so that I could build upon it if required.
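For readers who do want the more technical follow-up, here is a minimal sketch of the house price example in Python. The figures are invented purely for illustration, and the function names (fit_line, sum_squared_error, predict) are my own rather than part of any library. It uses the standard closed-form least-squares solution for a single variable, which gives the straight line with the smallest sum of squared errors.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    # How x and y vary together, relative to how much x varies on its own.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sum_squared_error(xs, ys, slope, intercept):
    """Total squared vertical distance between the line and the data."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def predict(x, slope, intercept):
    """Read a value off the fitted line."""
    return slope * x + intercept


# Hypothetical data: number of bedrooms and sale price (made up for this post).
bedrooms = [1, 2, 3, 4, 5]
prices = [150_000, 200_000, 240_000, 310_000, 350_000]

slope, intercept = fit_line(bedrooms, prices)
print(f"Each extra bedroom adds roughly {slope:,.0f} to the price")
print(f"Estimated price of a 6-bedroom house: {predict(6, slope, intercept):,.0f}")
print(f"Sum of squared errors: {sum_squared_error(bedrooms, prices, slope, intercept):,.0f}")
```

Nudging the slope or intercept away from the fitted values and recomputing sum_squared_error should only ever make the error larger, which is one way of showing a curious listener what ‘center of the data’ means.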

It can be challenging to know what technical information to leave out, especially as that detail is how one understands a technical process, and particularly when something clever has been done and you would like to tell everybody about it. It has to be remembered, though, that other people are focussed on different goals and do not always care about the same details.

XKCD - Academia vs. Business (Source: XKCD)

Understanding these goals is where business acumen comes in useful. During my career as a software developer, programming was the relatively easy part providing that I had a detailed specification, but it was obtaining that specification that presented the challenge. Some clients would only have a vague idea of what they wanted, and sometimes what they wanted was not what they needed. With an understanding of their business came the ability to interpret their requirements, not only to draw out of them the information needed but also to be able to make suggestions and identify both risks and opportunities. Again, this was developed from necessity, from running a business, from working on a number of projects, and most importantly from talking to and listening to people.

With all of the soft skills, it can be noted that they can be learnt and developed, but not necessarily acquired by teaching. They are acquired through practice, and not always with the tasks that they would be useful for in mind. For example, last year I wrote the following tweet:

I was thinking of the film The Karate Kid (1984), where Mr Miyagi makes Daniel wax cars. Daniel thinks he is being taken advantage of, being made to work when he really wants to be learning karate, only to find later that he has been developing the muscles and movements for karate all along. The day I wrote that tweet, I had asked a client for some specific data only to be given a dump of their entire database, which I then had to pick through to find what I needed. This has happened to me a lot as a developer: clients would send me data in all kinds of formats, sometimes spread across different files, and I would have to dig through it and gather what I needed myself. As frustrating as that could be, especially when trying to meet deadlines, I realised that day, like the character of Daniel in the film, that all along I had been developing valuable skills: how to compile, clean and format data. While that is not a soft skill in itself, it was developed in a similar way, through practice and completely unconsciously, before I had even heard of data science.

The main point is that even in the most boring, mind-numbing tasks there can be value. A person can be developing skills that could be highly useful in the future without being aware of it. I have only covered a small number of soft skills here, and some, like curiosity or patience, I have had since such an early age that I cannot remember ever being without them. I wanted to bring them more into focus because they are skills that a person may not realise they have, being so familiar, but that we would all be much the poorer without.