
The Computer Vision Fellowship

From June to September of this year, I had the good fortune of pursuing a research fellowship with the SAP Machine Learning (ML) Digital Core unit at SAP Labs India, Bengaluru.

 Fig 1. SAP Leonardo Machine Learning Foundation: Breadth of capabilities

The ML Digital Core unit supports various stakeholders at SAP, one of which is the SAP Leonardo Machine Learning Foundation (MLF). My responsibility during the fellowship was to understand how the inference service for Scene Text Recognition (STR) is built. That service is now part of the SAP API Hub and globally available to all SAP customers.

0. Keep reading, keep learning

In true geek fashion, I start off the first section with 0 instead of 1. On Day 1, when I joined this LoB, everything was new. I’d been in my parent team for 2 years and everything was very comfortable there. So on Day 1 I was nervous about the change but excited at the same time, looking forward to new challenges and a new set of problems for the next 4 months. One of the remarkable parts of this fellowship is that I did not have to learn anything new prior to joining the team. At the fellowship interview (1 month earlier), it was determined that my skills were enough for the job at hand, and since I did not have much expertise in computer vision, my mind was a clean slate. I could start fresh. And that’s precisely what I did!

Fig 2. Example of end-to-end scene text detection

After two and a half years, I printed a research paper for the first time, and it felt magical. As I read one research paper after another, recommended by my colleagues, I understood the general theme and slowly got the hang of what computer vision is and, specifically, what STR is. Over the next 3 months, I went on to read various research journals, conference publications and technical articles in the fields of optical character recognition and image recognition. I picked up the fundamental building blocks of deep learning: neural networks, specifically recurrent neural networks (RNNs), and how they are used to build models with TensorFlow.

This fellowship helped me understand how the whole data pipeline is set up, and how models are tuned in TensorFlow and deployed. I learned how TensorBoard is used to monitor the training and inference phases during development. It was really inspiring to know that SAP uses cutting-edge GPU hardware that runs round the clock just to train our deep learning models.

 

Keep reading, be open to listening, you will eventually learn, and so will your model!

 

As you can see, it was an eye-opener, and I learned something new every minute I spent with the team. Honestly, at times it felt like too much, and a lot of it went over my head. However, I realized that this is exactly the challenge: to put yourself in difficult positions, learn something new, and listen rather than talk. Fun fact: I learned quite a lot from memes and comics as well!

1. Math is at the core of everything

The way I see deep learning, a few very smart people came up with RNNs a while ago, and right now we have the business need, the computational power, and the data to play around with these concepts and apply them in practical, real-life scenarios. It’s the best time for it, and deep learning as a technology has come into its own at this crucial juncture.

Unfortunately, I did not have the time to completely dissect our research paper, but I did read it a couple of times and figured out that math is at the core of both the object detection phase and the text deciphering phase. Linear algebra and discrete mathematics are the concepts involved in deep learning models, and if you’re a student at an engineering college, my only request is that you attend those classes with utmost dedication. If I could go back in time, that’s what I’d do.

2. Innovation is important, but perseverance is better

Humanity has been innovating for ages. Right from the wheel to electricity to computers to smartphones – we’ve been doing it and deep learning is just another advancement in the larger scheme of things. It’s just a tool that can help us do something better.

We learn it, use it, make it better and build solutions that help real people solve real problems. The way I see it, innovation is a given. If a set of smart people is put into a room, they will eventually come up with innovative ideas. The caveat, however, is the human effort and dedication the project at hand demands. For example, I’ve never been the brightest student in my graduating class at engineering school, or in my team, but my core strength lies in accurately determining the root cause of a problem and connecting a few dots to solve it. And this needs a lot of focus, passion, and zeal.

If you own a problem, you solve a problem.

Given a choice between a hard-working, passionate person and an expert, I may tend to pick the underdog because, even if it takes some time, the underdog would eventually win! During the course of this fellowship, I put in the extra time and effort necessary to go one mile further, making sure that all my tasks were completed and that my colleagues got all the info they needed from me in a timely manner. For example, I was assigned the task of producing a few thousand synthetically generated images for a specific use case, and I was given 3-4 approaches for accomplishing that feat. I took some time, understood what needed to be done and tried all the approaches. I could’ve stopped at the first approach that worked, but I felt the need to test all the options at hand so that I could pick the best one. After a week, with the help of my colleagues, I was able to accomplish it in the best way possible! And now it’s on GitHub for anyone in the team to pick up and use in any form they see fit.

That’s the beauty of hard work. It pays!

3. The customer is at the center of every single use case 

Everyone in the industry is aware that the customer is the most important stakeholder in any business scenario. At the end of the day, we are all here to solve a customer’s problem, help him/her run better and, as a result, make life easier and better for everyone involved. However, the interesting challenge here is first to identify the customer accurately and second to determine his/her pain points.

Fig 3. Design Thinking process at Enterprise scale

 

Design Thinking is a unique way to address this challenge. Basically, it tells you to put yourself in the customer’s shoes, and with a little bit of empathy, you will be able to single out these pain points. If we design software in a way that solves these concerns or makes the whole process a little bit better, a little bit faster, and a little bit safer, it gives enormous value to the customer and, more importantly, the customer is relieved that his/her problem is solved.

Prior to the development of the STR service, the manager, data scientists and product owners held extensive design thinking workshops, conversations, meetings and discussions with their customers, and during development we shared the results with our internal stakeholders as well. Constant feedback came in from these folks, and that really helped us reach the accuracy and efficiency we were hoping to achieve. I’m really proud and glad that I was in a dynamic and versatile team that could take real feedback, tune our models better and improve efficiency, all in 12 hours! That’s just incredible stuff.

I learned that given enough thought, time and effort, any problem can be solved to a certain extent in this world and if it makes the customers happy, then our job is done.

4. Connections matter. Conversations matter.

SAP has a working culture in which any employee can reach out to any other employee on the whole planet and get an answer. It might take some time initially, but once you have your network in place at a global company like SAP, it really helps change the way you think, and you get help quicker. In that respect, the ML Digital Core unit has a very interesting set-up: product experts, managers, research fellows, interns, data scientists, software developers, security experts, and software architects, all sitting on the same floor next to each other. That’s a treasure chest waiting to be explored for someone like me who’s just completed 2 years in the corporate world. Every time I have a conversation with a member of the team, I feel optimistic, because each one is committed to driving the machine learning topic to the best of their abilities and is really passionate about building the intelligent enterprise that has been envisioned by SAP’s leaders.

During the course of my fellowship and even after that period, all the colleagues in this unit were extremely open to discussing technology, society, and life in general and most importantly, helped me understand the nitty-gritty of building a consumable deep learning service.

I absolutely cherish the moments when we went on a team outing to a resort and played badminton all morning and water polo (kinda) in the swimming pool all afternoon. Those were, indeed, good times!

Fig 6. A photograph to preserve the good times with ML Digital Core, Bengaluru

 

Lastly, I’d like to thank Mahesh Gopalan, my manager, as well as Satyadeep Dey, the fellowship manager, for making these 4 months a seamless experience. Two incredible gentlemen I’d work with any day.

I truly believe that the computer vision fellowship with the ML Digital Core unit was an enriching experience, both personally and professionally. If I had to do it all over again, I would!
