Data science is a broad field. Not only does it overlap with several other disciplines (machine learning, applied mathematics and statistics, to name a few), but there is also no settled answer to what a data scientist is, or what types of data scientist there are.
Nevertheless, there are core skills anyone interested in the field should have.
The basics - tools
This should go without saying, but whatever your career goals, you’re going to need the basic tools to get started. For data science, this means knowledge of a programming language suited to statistical work and of a database query language.
For the latter, you’ll be expected to write and execute queries in SQL. NoSQL databases and Hadoop are also important in this field, but SQL is where you should start.
For data science work, the most common programming language is probably Python, though enthusiasts still debate whether it’s superior to R. The natural solution is to learn both eventually, and so prove your depth of knowledge to prospective employers. To begin with, though, proficiency in one is essential.
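Since the two tools usually meet in practice, here is a minimal sketch of running SQL from Python using the standard-library `sqlite3` module. It assumes an in-memory SQLite database standing in for a real data store, and the `orders` table and its rows are hypothetical, invented purely for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for a real data store (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and rows, for illustration only.
cur.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5)],
)

# A typical analytical query: total spend per customer, largest first.
cur.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
)
totals = cur.fetchall()
conn.close()

for customer, total in totals:
    print(f"{customer}: {total:.2f}")
```

The `?` placeholders let the database driver handle quoting the values, which is safer than building SQL strings by hand. The same query would run largely unchanged against other SQL databases.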
The basics - stats
This is another no-brainer. A basic understanding of statistics is necessary for a data scientist, and if you’re not sure what might be required, a quick search will point you in the right direction. Stats knowledge matters because when you’re faced with a new project or problem, you’ll need to know which technique to apply to get the most effective results. It will stand you in good stead everywhere, and particularly at companies that rely on data to drive their business but whose product itself is not data-focused. Your expertise will be needed to interpret and explain results so they can be turned into strategy.
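As a small example of choosing a technique, suppose you want to compare the means of two groups whose variances may differ. Student’s t-test assumes equal variances, so Welch’s t-test is often the safer default. The sketch below computes Welch’s t statistic with Python’s standard `statistics` module; the conversion-rate figures are hypothetical.

```python
import math
import statistics

# Hypothetical daily conversion rates (%) for two versions of a web page.
group_a = [2.1, 2.4, 1.9, 2.2, 2.0]
group_b = [2.6, 2.8, 2.5, 2.9, 2.7]


def welch_t(a, b):
    """Welch's t statistic: compares two means without assuming equal variances."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # statistics.variance uses the sample (n - 1) denominator.
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (mean_a - mean_b) / standard_error


t = welch_t(group_a, group_b)
print(f"mean A = {statistics.mean(group_a):.2f}, mean B = {statistics.mean(group_b):.2f}")
print(f"Welch's t = {t:.2f}")
```

A large absolute value of t suggests the difference is unlikely to be chance alone, but turning it into a p-value requires the t distribution. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test and also returns the p-value.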
Roadmap your career
Let’s face it, not everybody is a unicorn. Unicorns might not actually exist, and no one really expects you to be one. One advantage of data science being somewhat hard to pin down is that it gives you room to focus on the areas in which you shine. If the technical aspects are your strength, make sure you keep abreast of new developments. If you’re interested in administering big data processing solutions, you’ll want to become an Apache wizard. If you want some interaction with clients in your work, communication skills and visualisation tools are going to be important. Identify what you’re good at and what you enjoy, and you can find a data science niche that fits just right.
Rise of the machines
It may not be the end of the world if you aren’t versed in every nuance of machine learning algorithms and how they work, since many techniques are available ready-made in Python and R libraries. Having said that, companies that use large amounts of data may expect you to be at least familiar with machine learning methods. You should be able to demonstrate some knowledge and, more importantly, show that you understand when different techniques are needed to solve a problem or answer a question. It’s hard to do that if you’re in the dark.
Don’t be surprised if prospective employers ask about your calculus and linear algebra skills, or even ask you to demonstrate them in an interview. These are the basis of many machine learning and statistical techniques. And yes, Python and R provide ready-made implementations that can be used day-to-day…but that won’t help if your team is later asked to build its own implementations.
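To see where the maths comes in, here is a minimal sketch of fitting a straight line two ways in plain Python. Gradient descent uses calculus (the partial derivatives of the mean squared error), while the closed-form solution is the one-dimensional case of the least-squares normal equations from linear algebra. The toy data, learning rate and step count are illustrative assumptions, not tuned values.

```python
def fit_line_gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error (MSE)."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Residuals of the current model on every data point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Calculus: partial derivatives of MSE = (1/n) * sum(error^2).
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step downhill, against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def fit_line_closed_form(xs, ys):
    """Ordinary least squares in one dimension: w = cov(x, y) / var(x)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    w = s_xy / s_xx
    return w, mean_y - w * mean_x


# Toy data lying exactly on y = 2x + 1, so both methods should recover w = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w_gd, b_gd = fit_line_gradient_descent(xs, ys)
w_cf, b_cf = fit_line_closed_form(xs, ys)
print(f"gradient descent: w={w_gd:.4f}, b={b_gd:.4f}")
print(f"closed form:      w={w_cf:.4f}, b={b_cf:.4f}")
```

The closed form is exact but only exists for simple models; gradient descent generalises to models with no closed-form solution, at the cost of choosing a learning rate that is small enough to converge.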
Wrangling the data
At some point, the data is going to get messy. It’s raw, it has missing values, the formatting is off, and it might just be your job to clean it up. Imperfections in data sets are inevitable, and you’ll need to know how to munge data with the best of them.
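A minimal sketch of that kind of clean-up in plain Python might look like the following. The raw rows, the set of missing-value markers and the choice of median imputation are all assumptions made for illustration; depending on the analysis, dropping or flagging incomplete rows may be the better call.

```python
import statistics

# Hypothetical raw export: inconsistent casing and whitespace, numbers stored
# as strings, and missing values encoded in several different ways.
raw_rows = [
    {"city": " London ", "temp_c": "14.5"},
    {"city": "paris", "temp_c": ""},
    {"city": "LONDON", "temp_c": "13.0"},
    {"city": "Berlin", "temp_c": "N/A"},
    {"city": "berlin ", "temp_c": "11"},
]

# Assumed set of strings that mean "no value" in this export.
MISSING_MARKERS = {"", "n/a", "na", "null", "none"}


def parse_temp(value):
    """Return the value as a float, or None if it is a missing-value marker."""
    cleaned = value.strip().lower()
    if cleaned in MISSING_MARKERS:
        return None
    return float(cleaned)


def clean(rows):
    """Normalise city names, parse temperatures and impute missing ones."""
    parsed = [
        {"city": row["city"].strip().title(), "temp_c": parse_temp(row["temp_c"])}
        for row in rows
    ]
    # Fill gaps with the median of the observed temperatures.
    observed = [r["temp_c"] for r in parsed if r["temp_c"] is not None]
    fill_value = statistics.median(observed)
    for r in parsed:
        if r["temp_c"] is None:
            r["temp_c"] = fill_value
    return parsed


cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```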
Communication is another skill whose importance cannot be overstated. You may be employed at a start-up that hasn’t built its strategy on data before, or simply be dealing with people who have little understanding of what you do. They will never understand it unless you can communicate your work to them. You should be able to present your findings or experiments to others in an accessible way, because other departments may depend on them to do their jobs. Data visualisation tools like ggplot can be useful here, but don’t rely on them to do all the work. If you understand the principles behind presenting and communicating data, you’ll be more effective when you have to do it.
Software engineering skills matter too. In data science, software is what turns one-off analyses into systems: it standardises procedures so that other people can reuse them. If you find yourself working for a smaller company, you may be responsible for developing data-driven products, so a strong grounding in software engineering can only help.
Keep your Data Scientist cap on
Success in this field isn’t just about knowing the technical language. It’s important to be able to look across the disparate skills that make up a data scientist and pick the right one for the situation. At some point in the recruitment process, you’re likely to be presented with a live problem to think about; perhaps a new branch of the company is opening, or a new product is in development. You may be asked how and what you’ll contribute, and it’s important to consider your role in relation to the business as a whole. Will the design team depend on your findings before they can start work? Can you work with approximations rather than exact figures, and when is it appropriate to do so? Which of your skills are best applied to a completely new product? Data science is often about solving problems, and you should be ready to demonstrate your ability to prioritise and find solutions.
Are you a data scientist? Do you think we missed some skills? Join the discussion in the comments below!