Introduction to Transfer Learning

 

Humans are very good at transferring knowledge between tasks. Whenever we encounter a new problem, we recognize it and apply relevant knowledge from previous learning experiences, which makes the work easier and faster to finish. For instance, suppose you know how to ride a bicycle and are asked to ride a motorbike, which you have never done before. Your experience with the bicycle comes into play and helps with tasks like balancing and steering, so you will find it easier than a complete beginner would. Such learning is very useful in real life because it makes us more skilled and lets us build on experience.

Following the same idea, the term transfer learning was introduced in the field of machine learning. The approach takes knowledge learned in one task and applies it to solve a related target task. While most machine learning algorithms are designed to address a single task, the development of algorithms that facilitate transfer learning is a topic of ongoing interest in the machine-learning community.

Why transfer learning?
Many deep neural networks trained on images have a curious phenomenon in common: in the early layers, the network learns low-level features such as edges, colors, and variations in intensity. These features do not appear to be specific to a particular dataset or task: whether we are detecting a lion or a car, we have to detect the same low-level features, and they emerge regardless of the exact cost function or image dataset. Thus, features learned in one task, such as detecting lions, can be reused in other tasks, such as detecting humans. This is what transfer learning is. Nowadays it is rare to see people train a whole convolutional neural network from scratch. It is more common to take a model pre-trained on a large and varied image dataset for a similar task, e.g. models trained on ImageNet (1.2 million images with 1000 categories), and use its features to solve a new task.

[Figure: block diagram]

When dealing with transfer learning, we come across the concept of freezing layers. A layer, which can be a CNN layer, a hidden layer, a block of layers, or any subset of all the layers, is said to be frozen when it is no longer available for training. Hence, the weights of frozen layers are not updated during training, while layers that are not frozen follow the regular training procedure.
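To make freezing concrete, here is a minimal sketch in plain Python. The `ScalarLayer` class, the `sgd_step` function, and the numbers are all made up for illustration: each "layer" is just a single weight, and a frozen layer is one whose weight the update step skips. In real frameworks the same idea is usually expressed with a flag, for example `requires_grad = False` on a parameter in PyTorch or `trainable = False` on a layer in Keras.

```python
from dataclasses import dataclass


@dataclass
class ScalarLayer:
    """A toy 'layer' that computes output = weight * input."""
    weight: float
    trainable: bool = True  # False means the layer is frozen


def forward(layers, x):
    """Return the input of every layer followed by the final output."""
    activations = [x]
    for layer in layers:
        activations.append(layer.weight * activations[-1])
    return activations


def sgd_step(layers, x, target, lr=0.1):
    """One gradient step on squared error; frozen layers are not updated."""
    activations = forward(layers, x)
    grad_out = 2.0 * (activations[-1] - target)  # dLoss/dOutput
    for i in reversed(range(len(layers))):
        layer = layers[i]
        grad_weight = grad_out * activations[i]  # dLoss/dWeight of this layer
        grad_out = grad_out * layer.weight       # gradient passed to the layer below
        if layer.trainable:
            layer.weight -= lr * grad_weight
    return (activations[-1] - target) ** 2


# The first layer plays the role of a frozen pre-trained layer,
# the second layer is trained on the new task.
layers = [ScalarLayer(weight=0.5, trainable=False), ScalarLayer(weight=1.0)]
for _ in range(100):
    loss = sgd_step(layers, x=2.0, target=3.0)

print(layers[0].weight, layers[1].weight, loss)
```

Note that the gradient still flows through the frozen layer; only its weight update is skipped, so the trainable layer alone adapts to the new target.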



When we use transfer learning to solve a problem, we select a pre-trained model as our base model. There are two possible ways to use the knowledge in the pre-trained model. The first is to freeze some layers of the pre-trained model and train the other layers on our new dataset for the new task. The second is to build a new model that takes features from layers of the pre-trained model and uses them as its input. In both cases, we keep some of the learned features and train the rest of the model. This ensures that only the features likely to be shared by both tasks are taken from the pre-trained model, while the rest of the model is adapted to the new dataset by training.
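The second approach, building a new model on top of extracted features, can be sketched as follows. This is only an illustration in plain Python: `pretrained_features` is a hypothetical stand-in for the fixed layers of a real pre-trained network, and the new "model" is a small linear head trained on those features.

```python
def pretrained_features(x):
    """Hypothetical stand-in for the fixed layers of a pre-trained model."""
    return [x, x * x]


def train_head(inputs, targets, lr=0.05, epochs=1000):
    """Fit a new linear head on the fixed features with plain SGD."""
    weights = [0.0, 0.0]
    bias = 0.0
    # The features are computed once, because the base model never changes.
    features = [pretrained_features(x) for x in inputs]
    for _ in range(epochs):
        for feats, target in zip(features, targets):
            prediction = sum(w * f for w, f in zip(weights, feats)) + bias
            error = prediction - target
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias


def predict(x, weights, bias):
    """Run the fixed feature extractor, then the newly trained head."""
    feats = pretrained_features(x)
    return sum(w * f for w, f in zip(weights, feats)) + bias


# New task (made up for this example): learn y = 2x^2 + 1 from five points.
inputs = [-1.0, -0.5, 0.0, 0.5, 1.0]
targets = [2 * x * x + 1 for x in inputs]
weights, bias = train_head(inputs, targets)
print(predict(0.8, weights, bias))
```

Because the extractor is never updated, its outputs can be cached once and reused in every epoch, which is one reason this approach is cheap to train.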

[Figure: frozen and trainable layers]

Now, one may ask how to determine which layers to freeze and which to train. The answer is simple: the more features you want to inherit from the pre-trained model, the more layers you have to freeze. For instance, suppose the pre-trained model detects some flower species and we need to detect some new species. The new dataset shares many features with the one the pre-trained model was trained on, so we freeze more layers and reuse most of the model's knowledge in the new model. Now consider another case: a pre-trained model detects humans in images, and we want to use that knowledge to detect cars. Here the dataset is entirely different, so it is not good to freeze many layers, because freezing a large number of layers keeps not only low-level features but also high-level features such as noses and eyes, which are useless for the new dataset (car detection). Thus, we copy only the low-level features from the base network and train the rest of the network on the new dataset.
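The rule that more inherited features means more frozen layers can be written as a small helper. This is a sketch under assumptions: the layer names and the helper `freeze_first_layers` are hypothetical, and the layers are assumed to be ordered from input to output so that the earliest ones hold the most generic features.

```python
def freeze_first_layers(layer_names, num_frozen):
    """Map each layer name to True (trainable) or False (frozen).

    Layers are assumed to be ordered from input to output, so the first
    `num_frozen` layers are the ones kept fixed.
    """
    if not 0 <= num_frozen <= len(layer_names):
        raise ValueError("num_frozen must be between 0 and the number of layers")
    return {name: index >= num_frozen for index, name in enumerate(layer_names)}


layer_names = ["conv1", "conv2", "conv3", "conv4", "fc"]  # hypothetical names

# Similar task (flowers -> new flowers): inherit a lot, retrain only the head.
similar_task = freeze_first_layers(layer_names, num_frozen=4)

# Different task (humans -> cars): keep only the low-level layer frozen.
different_task = freeze_first_layers(layer_names, num_frozen=1)

print(similar_task)
print(different_task)
```

How many layers to freeze is ultimately a tuning decision; the values 4 and 1 here are arbitrary choices to contrast the two cases.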

Let’s consider all the situations in which the size of the target dataset and its similarity to the base network's dataset vary.

  • Target dataset is small and similar to the base network dataset: Since the target dataset is small, fine-tuning the whole pre-trained network on it may lead to overfitting. The number of classes in the target task may also differ. So, in this case we remove the last one or two fully connected layers and add a new fully connected layer that matches the number of new classes. We then freeze the rest of the model and train only the newly added layers.


  • Target dataset is large and similar to the base training dataset: When the dataset is large enough to train the whole pre-trained model, there is little risk of overfitting. Here, too, the last fully connected layer is removed and a new fully connected layer is added with the proper number of classes. The entire model is then trained on the new dataset. This tunes the model on the new, large dataset while keeping the model architecture the same.


  • Target dataset is small and different from the base network dataset: Since the target dataset is different, the high-level features of the pre-trained model will not be useful. In this case, remove most of the layers from the end of the pre-trained model and add new layers that match the number of classes in the new dataset. This way we can use the low-level features from the pre-trained model and train the rest of the layers to fit the new dataset. Sometimes it is beneficial to train the entire network after adding the new layers at the end.


  • Target dataset is large and different from the base network dataset: Since the target dataset is large and different, the best way is to remove the last layers from the pre-trained network, add layers that match the number of classes, and then train the entire network without freezing any layer.
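The four cases above can be summarized as a small lookup function. This is just a restatement of the list in code; the function name `choose_strategy` and the wording of the returned recipe are made up for this sketch.

```python
def choose_strategy(dataset_is_large, dataset_is_similar):
    """Return the recipe from the four cases above as a small dictionary."""
    if dataset_is_similar and not dataset_is_large:
        return {
            "replace": "last one or two fully connected layers",
            "train": "newly added layers only",
        }
    if dataset_is_similar and dataset_is_large:
        return {
            "replace": "last fully connected layer",
            "train": "entire network",
        }
    if not dataset_is_large:
        return {
            "replace": "most of the later layers",
            "train": "newly added layers, optionally the entire network",
        }
    return {
        "replace": "last layers",
        "train": "entire network",
    }


# Print the recipe for every combination of dataset size and similarity.
for large in (False, True):
    for similar in (True, False):
        print(large, similar, choose_strategy(large, similar))
```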


Transfer learning is a very effective and fast way to get started on a problem. It gives a direction to move in, and most of the time the best results are also obtained with it.
