What is machine learning?

“The 21st century is a digital book. … Your bank records, medical histories, voting patterns, emails, phone calls, your damn SAT scores! [The] algorithm evaluates people’s past to predict their future. …”

The quote above is from one of my favorite Marvel movies, Captain America: The Winter Soldier, and I think it captures the idea behind machine learning and artificial intelligence (ignoring that they meant it in a bad destroy-the-world way!). I’m sure it’s no surprise that we are living in an age of rapidly advancing technology. From popular virtual assistants (Alexa, Google Home, Siri) all the way to self-driving car systems like Tesla Autopilot, machine learning and artificial intelligence have made, and are still making, great strides in many different fields. Personally, coming from a medical sciences background, I’ve always been very interested in machine learning’s applications to healthcare and biomedicine. But before talking about this, I first want to take you through a whistle-stop tour of machine learning.

What is machine learning (ML)? Well, let’s think about it this way: suppose I introduced you to a 2-year-old kid (let’s call him Jack) and told you that you have to teach him what a cat is. How would you do it? The easiest way would be to go to Google, show him pictures of cats and say “Look Jack, this is a cat!”. But Jack might look at a dog the next day and call that a cat too, and you don’t want to teach him the wrong thing, so it’s important to show him pictures of animals that are not cats as well. Obviously, Jack will make some mistakes initially, but over time, as we show him more pictures, he will hopefully learn the differences between cats and other animals and be able to identify them easily.

Jack learned through a process called “supervised learning”. We “trained” him with pictures of animals which were labelled, i.e. we told him “Look, this is a cat!” or “No, this is not a cat!”, and as time passed, Jack got more experienced and finally learned to recognize cats. Computers are pretty much like Jack; the difference is that Jack has a brain which learns, while computers learn using mathematical models (the “digital” brains) which are designed using math and programming. And that’s essentially the idea of machine learning: it’s the study of computer algorithms and models that can automatically get better at certain tasks with experience. With supervised learning, we give a computer some labelled data and it learns how to make predictions on new data it hasn’t seen before.
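To make the idea a bit more concrete, here is a minimal sketch of supervised learning in Python. It is only an illustration: the two features (weight and “ear pointiness”) and all the numbers are invented by me, and the method used (copying the label of the closest labelled example, known as 1-nearest neighbour) is just one of the simplest possible choices, not how real image classifiers work.

```python
import math

# Hypothetical labelled examples: (weight in kg, ear pointiness from 0 to 1) -> label.
# All numbers are invented purely for illustration.
training_data = [
    ((4.0, 0.90), "cat"),
    ((3.5, 0.80), "cat"),
    ((5.0, 0.85), "cat"),
    ((30.0, 0.30), "not cat"),
    ((25.0, 0.40), "not cat"),
    ((0.5, 0.10), "not cat"),
]

def predict(features, examples=training_data):
    """Return the label of the closest labelled example (1-nearest neighbour)."""
    nearest = min(examples, key=lambda ex: math.dist(features, ex[0]))
    return nearest[1]

# Two new, unlabelled animals that the "model" has never seen.
print(predict((4.2, 0.88)))
print(predict((28.0, 0.35)))
```

The labelled list plays the role of the pictures we showed Jack, and predict is Jack looking at a new animal and comparing it with what he has already seen. Because weight spans a much larger range than ear pointiness here, it dominates the distance; a real system would usually rescale its features first.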

Getting back to our Jack analogy, supervised learning isn’t the only way to teach Jack. Another way is to take him to a zoo and show him all the animals. We don’t have to tell him what each animal is; just by looking at them, he could probably tell that some animals look similar and belong to the same family, while others belong to some other family. Essentially, he’s learning to cluster the animals, an example of a type of learning called “unsupervised learning”. Putting this in technical terms, machines learn in an “unsupervised” manner by taking data that is unlabelled and learning how to cluster it or find intrinsic patterns in it (patterns that even we may not have noticed!).
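Clustering can also be sketched in a few lines of Python. The example below is a bare-bones version of the k-means algorithm with two clusters, run on a single made-up feature (animal weight). The weights, the function name k_means_1d and the choice to start from the smallest and largest values are all my own assumptions for illustration.

```python
def k_means_1d(values, iterations=10):
    """Split numbers into two clusters with a basic k-means loop (k = 2)."""
    centres = [min(values), max(values)]  # simple deterministic starting guess
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to whichever centre is currently closer.
            idx = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            clusters[idx].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return clusters, centres

# Hypothetical weights (kg) of animals at the zoo, with no labels attached.
weights = [4.0, 3.5, 5.0, 4.5, 30.0, 25.0, 28.0]
clusters, centres = k_means_1d(weights)
print(clusters)
print(centres)
```

Notice that nobody tells the algorithm which animals are “small” or “large”; the two groups emerge from the data alone, much like Jack grouping similar-looking animals at the zoo.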

Finally, another way of teaching Jack is to show him pictures of animals and make him guess whether each one is a cat or not. Every time his answer is correct, I’ll give him a chocolate as a prize, and every time he gets it wrong, I won’t. Jack may make random guesses initially, so he may or may not get his chocolate. What’s interesting is that, because he wants more chocolate, he will gradually learn which answers earn it, and after enough rounds he will guess right most of the time and score a truckload of chocolate! This strategy is called “reinforcement learning”. In the technical sense, we train “agents” (computer algorithms) by making them take “actions” and then giving them a reward or punishment based on whether they took the right action. With experience, these agents learn which actions to take to collect the most reward.
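Here is a rough sketch of the chocolate game in Python. It is a toy, not a serious reinforcement learning setup: there are only two hypothetical pictures, the agent keeps a small table of how much chocolate each answer has earned, and it mostly picks the best-looking answer while occasionally guessing at random (a strategy often called epsilon-greedy). The function name train_jack and every parameter value are my own assumptions.

```python
import random

def train_jack(rounds=1000, epsilon=0.2, learning_rate=0.5, seed=0):
    """Learn which answer earns chocolate for each picture by trial and error."""
    rng = random.Random(seed)
    pictures = {"cat photo": "cat", "dog photo": "not cat"}  # picture -> correct answer
    answers = ["cat", "not cat"]
    # Jack's running estimate of the chocolate each answer earns, per picture.
    value = {p: {a: 0.0 for a in answers} for p in pictures}
    for _ in range(rounds):
        picture = rng.choice(list(pictures))
        if rng.random() < epsilon:
            guess = rng.choice(answers)  # explore: make a random guess
        else:
            guess = max(answers, key=lambda a: value[picture][a])  # exploit best guess so far
        reward = 1.0 if guess == pictures[picture] else 0.0  # chocolate or no chocolate
        # Nudge the estimate for this (picture, guess) pair towards the reward received.
        value[picture][guess] += learning_rate * (reward - value[picture][guess])
    # The learned behaviour: the best-valued answer for each picture.
    return {p: max(answers, key=lambda a: value[p][a]) for p in pictures}

policy = train_jack()
print(policy)
```

The agent is never told the correct answers directly. It only sees whether a guess earned a chocolate, and the occasional random guesses are what let it discover a rewarding answer it would otherwise never try.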

So that’s a quick tour through machine learning! There are many more types of learning methods which I didn’t discuss, but the above three are the most popular ones. Now how is this type of research useful to medicine? Well, one important thing I’ve learnt over the course of my undergraduate degree is that the medical field is a data goldmine. The technology used in labs and hospitals today has become so advanced that collecting data has never been easier. For example, it took around $3 billion and 13 years to work out the entire sequence of human DNA in the “Human Genome Project”, but today we can find out our entire DNA sequence within 1-2 days for $2000. The challenge, however, lies in data analysis: we still haven’t been able to extract the most insight out of medical data. This is where I think the power of ML comes in. Using methods like supervised and unsupervised learning, we can train computers to assist us in deciphering disease patterns, developing better diagnostic tools and creating drugs that are most effective for patients. Here are just a few examples of research domains where ML is being widely used:

Medical Imaging 

Brain Medical Images. Taken from https://i.gifer.com/OB8I.gif

This is a field that I am very passionate about! Medical imaging looks at improving how we can visualize the body and detect diseases. Deep learning and “machine vision” are popularly used here, which I’ll talk more about in future posts. Deep learning is a special type of machine learning which focuses on neural networks, algorithms that loosely mimic the neurons in our brain. For example, researchers at Google and Stanford have developed deep learning systems that analyse medical images to detect breast and skin cancer. Here at UCL, we have the Centre for Medical Image Computing, which conducts research into novel imaging techniques, image reconstruction and machine learning techniques. For my third-year dissertation, I did a project focused on deep learning and cardiac MRI, which I hope to share in a future post.
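As a small taste of what a neural network is built from, here is a sketch of a single artificial “neuron” in Python. It multiplies each input by a weight, adds everything up along with a bias, and squashes the result into a number between 0 and 1 using the sigmoid function. The inputs and weights below are made up; in a real network the weights are learned from data, and thousands or millions of these units are connected together.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of the inputs passed through a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

# Hypothetical inputs (e.g. two pixel intensities) and hand-picked weights.
output = neuron([1.0, 2.0], [0.5, -0.25], bias=0.0)
print(output)
```

Training a deep learning model mostly comes down to adjusting the weights and biases of many such neurons so that the final output matches the labels in the training data.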

Genomics and Cancer research

Cancer is a debilitating disease where cells of an organ divide uncontrollably to form a tumor mass. The disease gets worse over time and, if detected too late, can become difficult to treat. One use of ML is early diagnosis of cancer, which can be done using imaging as I discussed previously. But ML is also valuable for understanding cancer itself. Why is this condition still such a challenge for us to tackle today? It’s because cancer doesn’t have a single well-defined cause; it’s not just one factor but several, like genetic mutations, smoking, diet, viruses, alcohol and radiation, that come into play. And if that’s not complicated enough, a single tumor can contain tons of different genetic mutations, and two people with the same cancer can have completely different mutations, a concept called “tumor heterogeneity”. ML would be beneficial here for identifying and understanding gene interactions. For example, researchers at the Francis Crick Institute published a paper about how esophageal cancer genes were identified using ML tools.

Drug development

Drug Development Pipeline. Taken from https://www.yourgenome.org/facts/how-are-drugs-designed-and-developed

 

Drug development is a painful process that takes around 12 years or longer and costs approximately $2.6 billion! It starts with screening tons of potential drug molecules, followed by years of testing in the lab and on animals, then clinical trials where the drugs are tested on actual patients, and finally approval by the FDA to enter the market. ML can be beneficial at all stages of this process, helping to reduce its duration. It can be used for in silico screening of drugs (screening done on a computer rather than in the lab), selecting the best molecules for testing, and chemometrics (using math and statistics to understand chemical information).

So I’m sure you can appreciate that ML and AI have enormous applications in medical research and healthcare. McKinsey estimated that with better decision analytics systems, the medical domain could potentially generate around $100 billion in value annually, which shows the growing demand for better analytics. Living through this so-called Fourth Industrial Revolution, I am certain that amazing things await!
