# Image Embedding Operator with Resnet50

Authors: Kyle, shiyu22

## Overview

This Operator generates feature vectors from the PyTorch pretrained **Resnet50** model, which is trained on the [ImageNet dataset](https://www.image-net.org/).

**Resnet** models were proposed in “[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)”. This model won the ImageNet challenge in 2015. The fundamental breakthrough of ResNet was that it made it possible to train extremely deep neural networks, with 150+ layers, successfully. Before ResNet, training very deep neural networks was difficult because of the vanishing gradient problem.

## Interface

`class Resnet50ImageEmbedding(Operator)` [source](./resnet50_image_embedding.py)

`__init__(self, model_name: str)`

**param:**

- model_name(str): the model name for embedding, like 'resnet50'.

`__call__(self, img_tensor: torch.Tensor)`

**param:**

- img_tensor(torch.Tensor): the normalized image tensor.

**return:**

- cnn(numpy.ndarray): the embedding of the image.

## How to use

### Requirements

The required Python packages are listed in [requirements.txt](./requirements.txt) and [pytorch/requirements.txt](./pytorch/requirements.txt). Towhee installs these packages automatically the first time you load the Operator repo, so you do not need to install them manually; they are listed here for reference only.

- towhee
- torch
- torchvision
- numpy

### How it works

The `towhee/resnet50-image-embedding` Operator implements image embedding and can be added to a pipeline. For example, it is the key Operator, named embedding_model, in the [image_embedding_resnet50](https://hub.towhee.io/towhee/image-embedding-resnet50) pipeline, shown as the red box in the picture below.

![img](./pic/operator.png)

When using this Operator in a pipeline's YAML file, you need to declare the following content according to the interface of the Resnet50ImageEmbedding class:

```yaml
operators:
  - name: 'embedding_model'
    function: 'towhee/resnet50-image-embedding'
    tag: 'main'
    init_args:
      model_name: 'resnet50'
    inputs:
      - df: 'image_preproc'
        name: 'img_tensor'
        col: 0
    outputs:
      - df: 'embedding'
    iter_info:
      type: map
dataframes:
  - name: 'image_preproc'
    columns:
      - name: 'img_transformed'
        vtype: 'torch.Tensor'
  - name: 'embedding'
    columns:
      - name: 'cnn'
        vtype: 'numpy.ndarray'
```

In this YAML, the **operators** part declares the `init_args` of the class and the `inputs` and `outputs` dataframes, and the **dataframes** part declares the `name` and `vtype` of each column.

### File Structure

Here is the main file structure of the `resnet50-image-embedding` Operator. Refer to it if you want to learn more about the source code or modify it yourself.

```bash
├── .gitattributes
├── .gitignore
├── README.md
├── __init__.py
├── requirements.txt #General python dependency package
├── resnet50_image_embedding.py #The python file for Towhee, it defines the interface of the system and usually does not need to be modified.
├── resnet50_image_embedding.yaml #The YAML file contains Operator information, such as model frame, input, and output.
├── pytorch #The directory of the pytorch
│   ├── __init__.py
│   ├── model #The directory of the pytorch model, which can store data such as weights.
│   ├── requirements.txt #The python dependency package for the pytorch model.
│   └── model.py #The code of the pytorch model, including the initialization model and prediction.
├── test_data/ #The directory of test data, including test.jpg
└── test_resnet50_image_embedding.py #The unittest file of this Operator.
```

## Reference

- https://pytorch.org/hub/pytorch_vision_resnet/
- https://arxiv.org/abs/1512.03385