# ResNet Operator
Authors: derekdqc, shiyu22
## Overview
This Operator generates feature vectors from the pretrained PyTorch **ResNet** model[1], which is trained on the [ImageNet dataset](https://image-net.org/download.php).
**ResNet** models were proposed in "[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)"[2], and this model won the ImageNet challenge in 2015. "The fundamental breakthrough with ResNet was it allowed us to train extremely deep neural networks with 150+ layers successfully. Prior to ResNet training very deep neural networks were difficult due to the problem of vanishing gradients"[3].
## Interface
```python
__init__(self, model_name: str, framework: str = 'pytorch')
```
**Args:**
- model_name:
- the model name for embedding
- supported types: `str`, for example 'resnet50' or 'resnet101'
- framework:
- the framework of the model
- supported types: `str`, default is 'pytorch'
```python
__call__(self, image: 'towhee.types.Image')
```
**Args:**
- image:
- the input image
- supported types: `towhee.types.Image`
**Returns:**
The Operator returns a tuple `Tuple[('feature_vector', numpy.ndarray)]` containing the following fields:
- feature_vector:
- the embedding of the image
- data type: `numpy.ndarray`
- shape: (dim,)
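To make the interface contract above concrete, here is a minimal, dependency-free sketch of how the Operator is constructed and called. The class name `ResnetOperatorSketch`, the placeholder zero vector, and the dimension 2048 (ResNet-50's global-pooled feature size) are illustrative assumptions, not the real implementation:

```python
from collections import namedtuple

# Output tuple mirroring Tuple[('feature_vector', numpy.ndarray)]
Outputs = namedtuple('Outputs', ['feature_vector'])

class ResnetOperatorSketch:
    """Illustrative stand-in for the Operator's interface (not the real implementation)."""

    def __init__(self, model_name: str, framework: str = 'pytorch'):
        # model_name: the embedding model, e.g. 'resnet50' or 'resnet101'
        self.model_name = model_name
        self.framework = framework

    def __call__(self, image):
        # The real Operator runs the pretrained model on a towhee.types.Image;
        # here we return a fixed-size placeholder vector of shape (dim,).
        dim = 2048  # assumed: ResNet-50's pooled feature dimension
        return Outputs(feature_vector=[0.0] * dim)

op = ResnetOperatorSketch('resnet50')
result = op(None)  # a real call would pass a towhee.types.Image
print(len(result.feature_vector))
```

In the real Operator the returned `feature_vector` is a `numpy.ndarray`; the named-tuple field here only mirrors its shape and name.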
## Requirements
You can install the required Python packages with [requirements.txt](./requirements.txt).
## How it works
The `towhee/resnet-image-embedding` Operator implements image embedding and can be added to a pipeline. For example, it is the key Operator, named `embedding_model`, within the [image-embedding-resnet50](https://hub.towhee.io/towhee/image-embedding-resnet50) and [image-embedding-resnet101](https://hub.towhee.io/towhee/image-embedding-resnet101) pipelines.
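The feature vectors this Operator produces are typically consumed downstream for similarity search. As a quick, dependency-free illustration of that use (plain Python; the function name is my own and not part of the Operator or Towhee):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors:
    # dot(a, b) / (||a|| * ||b||), in [-1, 1] for non-zero inputs.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Parallel vectors have similarity 1.0, i.e. maximally similar images.
v1 = [1.0, 2.0, 3.0]
v2 = [2.0, 4.0, 6.0]
print(cosine_similarity(v1, v2))  # 1.0
```

In practice the inputs would be the `(dim,)`-shaped `feature_vector` outputs of two images, and a vector database would perform this comparison at scale.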
## References
[1] https://pytorch.org/hub/pytorch_vision_resnet/
[2] https://arxiv.org/abs/1512.03385
[3] https://towardsdatascience.com/understanding-and-coding-a-resnet-in-keras-446d7ff84d33