Getting into machine learning with Dr. Karthick Perumal, Data Scientist at Porsche Digital
Dr. Karthick Perumal works as a Data Scientist at Porsche Digital in Berlin. He shares his experience of transitioning into machine learning from academic research in Physics and his advice on how to get a job in machine learning in Germany: which topics and skills to focus on, what the interview process for machine learning and data science positions is like, and how to prepare for interviews.
Welcome Karthick 👋. Could you tell us a bit about yourself?
I'm from Kadathur, a small town in Tamil Nadu. I was mainly interested in research during my studies. I did my bachelor's close to my hometown, at the Government Arts College in Dharmapuri. I did two master's degrees: one in physics at the Presidency College in Chennai, and the second in Laser and Electro-Optical Engineering at the main campus of Anna University. After that, I wanted to do a PhD. In India, a PhD usually takes around five to seven years, but I wanted it to be quicker than that. In the US, it takes about as long as in India, five to seven years. I had heard that in Europe, it usually takes three to four years. And within Europe, Germany was my first option as they provide a lot of funding for research. At the time, I was also working at Tata Consultancy Services (TCS), just after my master's degree, so I was simultaneously looking for a PhD position in Germany. Then I heard about the DAAD scholarship, applied for it, and got it. Once I got the PhD offer, I decided to quit my job. Then I came to Berlin for my PhD studies. And that's how I came to Germany.
What do you do in your current role as a Data Scientist?
I am primarily involved in machine learning-related projects at Porsche Digital. Currently, I am focusing mainly on sound-based anomaly detection methods, with a bit of supervised and semi-supervised learning. Right now we are focusing mainly on the automotive industry; Porsche is our main customer, and we are focusing on stress testing of different core components. For the stress testing, we are focusing on doors, e.g., when a new car model has to be tested under different conditions like extreme temperatures. We do these kinds of tests to make sure that the cars that we produce are of the highest standards, and for this we use machine learning. We approach this in both an unsupervised and a semi-supervised fashion. So far the supervised approach looks really promising.
I have been working at Porsche Digital for nearly three years now. At the beginning, we were mainly developing proofs of concept, and every three months we moved on to different topics, mainly focusing on solving problems within Porsche. At that time, I was working on different kinds of problems, from building chatbots to sound-based classification. We also did some driver analysis and crash reduction. I was involved in a lot of projects. So, initially it was more about proofs of concept, but later we slowly decided to build more products. For the last one, or I would say one and a half, years, I have been mainly focused on the current anomaly detection project.
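As a rough illustration of what sound-based anomaly detection can look like in its simplest form, here is a minimal sketch in plain Python. It is not the method used at Porsche Digital, which the interview does not describe in detail. The sketch assumes a recording can be cut into fixed-size frames, summarizes each frame by its RMS energy, learns the mean and spread of that value from a recording assumed to be normal, and flags frames that deviate strongly. The function names, the frame size, the threshold, and the synthetic noise signals are all hypothetical choices made for the example.

```python
import math
import random


def frame_rms(signal, frame_size):
    """Split a signal into non-overlapping frames and return the RMS energy of each."""
    return [
        math.sqrt(sum(s * s for s in signal[i:i + frame_size]) / frame_size)
        for i in range(0, len(signal) - frame_size + 1, frame_size)
    ]


def fit_baseline(normal_signal, frame_size):
    """Estimate mean and standard deviation of frame RMS from a normal recording."""
    rms = frame_rms(normal_signal, frame_size)
    mean = sum(rms) / len(rms)
    std = math.sqrt(sum((r - mean) ** 2 for r in rms) / len(rms))
    return mean, std


def detect_anomalies(signal, frame_size, mean, std, threshold=4.0):
    """Return indices of frames whose RMS is more than `threshold` std devs from the mean."""
    return [
        i for i, r in enumerate(frame_rms(signal, frame_size))
        if abs(r - mean) > threshold * std
    ]


# Synthetic stand-ins for audio: Gaussian noise (an assumption, not real door recordings).
rng = random.Random(0)
FRAME = 200
normal = [rng.gauss(0.0, 1.0) for _ in range(FRAME * 50)]
test = [rng.gauss(0.0, 1.0) for _ in range(FRAME * 10)]

# Inject a loud burst into frame 6 of the test signal to simulate an unusual sound.
for i in range(6 * FRAME, 7 * FRAME):
    test[i] *= 5.0

mean, std = fit_baseline(normal, FRAME)
anomalous_frames = detect_anomalies(test, FRAME, mean, std)
print("Frames flagged as anomalous:", anomalous_frames)
```

Real systems typically use richer features such as spectrograms and learned models instead of a single energy value, but the structure is the same: model what normal sounds like, then score how far new audio departs from it.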
Which technologies do you use to develop machine learning models?
We use TensorFlow currently since we are using deep learning-based methods. Those showed better results than conventional machine learning methods. So we are mainly using TensorFlow at the moment. We are slowly thinking of moving to PyTorch, but right now, all our models are based on TensorFlow.
How did you transition into machine learning?
Before I came to Porsche, I was working as a postdoc at DESY, a research institute in Hamburg, which is home to one of the brightest sources of light in the world. There I was mainly doing structural investigation of crystalline materials, mostly using high-resolution diffraction. I liked the job, but the one drawback was that I had a fixed contract for only five years. My contract was coming to an end and it was time for me to move on to a new job. At that time I didn't want to move to another postdoc, because postdoc contracts are just one-year or, typically, three-year contracts. Plus, I didn't want to move from one place to another, so I decided to look for a job in industry. For someone like me with limited knowledge of German, I thought it would be better if I moved to a position where I don't have to worry much about my language skills. At that time data science was also a fast-growing field. Since I knew a bit of Python and had also been doing a lot of data analysis during my PhD and postdoc, I decided to move into data science.
This was around 2016 when I decided to move. At that point, I was taking some online courses in machine learning. Then in 2017, I actually took a three-month break from my job and did a three-month intensive course in Berlin. After that, I went back to my job while I was simultaneously searching for a machine learning position. Once I found one, I quit my job.
How did you gain skills in machine learning?
Yeah, at the time, I was mainly doing online courses, but I felt I was still lacking something. I applied for a few roles at the time, but I was not getting any replies. So I decided, maybe it's time for me to focus more intensively on the learning. That's when I decided to do this bootcamp called Data Science Retreat in Berlin. It was a bit expensive, but I thought, if I am going to a new job, I should be prepared for it. If I get a job and I don't know what I'm doing, it will be really bad. So I decided I should not be in a position where I am struggling with something, and to focus intensively on data science. That's why I chose to do the course and actually, it helped me a lot.
There are some other options as well, e.g., Science to Data Science, which I think is in London and is free of cost. They help you to do an industry project. Then there is Insight Fellowship, which is also free. But the problem at the time was that it was only in the US, which means you need a visa for it. Since I had a job, I didn't focus on it. I wanted to stay somewhere close and for me, Berlin was closest at that time.
What did you learn during the three months in Data Science Retreat in Berlin?
In the first one and a half or two months, we were taught the concepts of machine learning and deep learning by a few excellent teachers. In the remaining one and a half months or so, each one of us had to pick a problem and try to solve it using machine learning. So the second half of the retreat was focused on building a portfolio project. At the end of the three-month course you have a project which you can show to others, something you did on your own right from the data collection to solving the problem. This way you show your skills and that you've learned something, and it makes it easier for a hiring manager to look into it.
How is a Data Scientist role different from a Machine Learning Engineer role?
I would say a Machine Learning Engineer should be able to write production-level code and should also have the skills necessary for deploying machine learning models, along with some data science skills. On the other hand, Data Scientists focus mainly on building machine learning models. I was mainly involved in data science initially. Later, once I moved to Porsche Digital, since we didn't have anyone with experience in deployment, I decided to focus on the Machine Learning Engineer part. So I was writing production-level code and then deploying models into production. But later, over the course of a few months, we hired people who were focusing on deployment. So now I am back to doing data science.
I would say if you have a Computer Science or DevOps background, the Machine Learning Engineer role is definitely the right path. But if you're coming from a non-Computer Science background like me, e.g., with a background in Physics or Mathematics or Biology, it is better to focus on the Data Scientist role. Maybe later, over time, if you want, you could switch over to a Machine Learning Engineer role.
Which skills and topics in machine learning would you recommend to people looking for jobs in machine learning in Germany?
A vast majority of the people with whom I have interacted use Python for coding. So I would say Python skills are a must. It's also the industry standard for machine learning. Regarding topics to focus on in machine learning, it varies from one industry to another. Some companies are mainly focused on deep learning, whereas others typically use traditional machine learning techniques. So if you're looking for a job, I would suggest you learn the traditional machine learning techniques, as well as some deep learning. That should be sufficient. Reinforcement Learning is something that not so many companies are looking for or using at the moment. So in my opinion, traditional machine learning techniques as well as deep learning should be more than sufficient if you're looking for a job in machine learning in Germany.
Which resources would you recommend to get familiar with data science?
Online courses are the right place to start. There are a lot of free courses. There are some paid courses as well on Coursera. You should be able to read a lot of articles. If you learn a new machine learning technique, for example, random forest or linear regression or logistic regression, learn the basics of what it is doing and how it is working. This helps a lot.
Once you know a bit, for example by doing a few online courses, I would suggest starting to work on a topic that you're interested in. This can be any topic, for example something related to work that you did in the past, to which you try to apply machine learning techniques. Another option I would suggest is to participate in some online competitions like on Kaggle. Whether you succeed or not is not important. But you will learn a lot from other people, like how they approach the problem. So Kaggle is another option where you can learn a lot, and it can also help you to get a job. Also, many universities like Stanford and New York University publish their course content for open access on YouTube. So that is also a good resource which is free.
In the beginning, it is better to focus on the fundamentals. Try to understand what the algorithm does and what the pros and cons of using a particular machine learning algorithm are. You don't necessarily need to understand everything completely, but once you understand the basics, start applying what you have learned. Only through practice will you improve.
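To make the advice about fundamentals concrete, here is a minimal sketch of logistic regression, one of the techniques mentioned above, written from scratch in plain Python. It is an illustration rather than anything from the interview: the toy dataset (hours studied versus pass or fail), the function names, the learning rate, and the number of epochs are all assumptions chosen for readability. The model computes a weighted sum of the inputs, squashes it into a probability with the sigmoid function, and adjusts the weights by gradient descent on the log loss.

```python
import math


def sigmoid(z):
    """Map any real number to (0, 1), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Probability that example x belongs to class 1."""
    return sigmoid(sum(w * xj for w, xj in zip(weights, x)) + bias)


def fit_logistic_regression(X, y, lr=0.1, epochs=5000):
    """Fit weights and bias with batch gradient descent on the mean log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(weights, bias, xi) - yi  # gradient of log loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


# Hypothetical toy data: hours studied -> passed (1) or failed (0).
X = [[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = fit_logistic_regression(X, y)
for hours in (1.0, 4.0):
    print(f"P(pass | {hours} hours) = {predict_proba(weights, bias, [hours]):.3f}")
```

In practice one would use a library such as scikit-learn for this, but writing the loop once by hand shows what the library is doing and where the pros and cons come from, for example that the decision boundary is linear in the inputs.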
Which are some of the companies that one should consider while applying for machine learning jobs in Germany?
In Germany, Zalando has one of the biggest machine learning teams. I have heard that around 100 people are working on machine learning there. Other companies that come to my mind in Germany are SoundCloud, 20 Billion Neurons, and EyeEm. Besides that, Berlin has a rich startup ecosystem, and quite a lot of the startups hire Data Scientists. So if you like the startup culture here, that's also something you should definitely look into. And if you're looking at the automotive sector, which is where I come from, I would say MHP is a good choice if you don't know where to start. MHP is a consultancy mainly focusing on the automotive industry. They hire a lot of people, including Data Scientists, almost on a monthly basis. So that would be a nice option. And if you don't want to focus on product, but are more interested in research, I would suggest Argmax. Finally, I would like to advertise our own company, Porsche Digital. Currently there is a stagnation in hiring due to Corona, but hopefully we'll start hiring once the situation improves.
What is the hiring process for data science positions like in Germany?
When I was looking for a job, I felt that in Germany the hiring process is quite slow. Usually it takes somewhere between one and three months. So it is not very quick and you have to wait a lot. If you are already looking for a switch, try to start applying two or three months in advance. Usually in Germany there are two or three rounds of interviews. The first is usually the HR round. The second and third are mostly the technical rounds. Some companies give you a sample problem to solve. Depending on the problem, they usually give you from one day up to a week to solve it. They look at your solution and try to decide based on how good it is. You don't have to come up with a perfect solution; they basically look at your thought process while coming up with the solution. So in most cases you will get a problem to solve. However, if you are already an experienced developer and people know what you did, or if you have a good portfolio or a good online presence, it might be a bit faster.
How important is it to have a portfolio while applying for data science jobs?
If you are a beginner or if you are making a transition from a different field into data science, it would definitely be helpful. If you already have experience in this field, or if you have a computer science or data science background, it won't matter much. The problem is, if you come from a different background, people do not know how good you are or how well you code. So, if you have a few projects on GitHub, people can have a look at them and can judge your skills based on that. It makes it easier for hiring managers to shortlist candidates.
How should one prepare for data science interview assignments?
First of all, I must say I'm not a fan of this method. But on the other hand, even in our company, sometimes we use this because we don't want to hire someone without knowing what their skills are. Also, we usually give simple tasks. We are mainly interested in their thought process, like how they try to approach a problem, right from the data analysis to building a model. How they approach the problem is more important than the results they get at the end. So, if you practice on a few datasets, it should be more than enough. I would say you don't need any extra preparation.
I say that I am not a fan of this approach because some companies give out datasets for the assignment that are similar to the problem that they are working on. So in the name of hiring, they are indirectly getting ideas from others. I don't like it if they give you a problem that is similar to the topic they are working on. It's better if they give you something completely different, like we do in our company, where we make sure that we don't use anything related to what we are doing. We just use any open source dataset which other people might have used. So we don't try to use their ideas.
Do you have recommendations for people looking to get into data science?
If you're coming from a completely different field, like from a non-Computer Science background, you should definitely learn coding. And for coding, I would suggest starting with Python, as it is easier to learn and because most of the data science companies and most of the companies doing machine learning use Python. So this is a must. Then I would suggest starting with some online courses. As I said before, start with traditional machine learning algorithms. Don't directly dive into deep learning. Start with conventional machine learning and then do a few deep learning courses, and this would be sufficient. After that, start applying your skills, whatever you've learned. Also, try to publish this on GitHub or participate in a few competitions, e.g., Kaggle competitions, to improve your skills. This should be more than enough. So this can definitely help you to switch to a machine learning job. If you perform well in these online competitions or if you build a good portfolio, it can make it easier for you to transition.
How long did it take you to transition to machine learning?
It took me like six months of intensive learning and then searching for jobs afterwards.
Were there any websites that you used for searching for jobs?
When I was looking for a job, I was mainly looking on LinkedIn, and I also just did simple Google searches. I applied to only around 15 or 20 companies.