The main objective of Text-Image Search is to find relevant images from a textual description. Both the image and the textual caption can be embedded into the same embedding space, so their similarity can be computed from the distance between the corresponding embedding vectors.
Image credit: Comment of a cat object
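To make the idea of comparing embeddings concrete, here is a minimal sketch that ranks two images against a text query by cosine similarity. The vectors are made-up 4-dimensional values for illustration only (real CLIP embeddings from the models below are much longer), and cosine_similarity is a hypothetical helper, not a Towhee API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings; in practice these come from an embedding model.
text_vec = [0.9, 0.1, 0.0, 0.4]        # e.g. "A running dog."
dog_image_vec = [0.8, 0.2, 0.1, 0.5]
car_image_vec = [0.0, 0.9, 0.4, 0.1]

scores = {
    'dog.png': cosine_similarity(text_vec, dog_image_vec),
    'car.png': cosine_similarity(text_vec, car_image_vec),
}
# The image whose embedding is closest to the text embedding wins.
best_match = max(scores, key=scores.get)
print(best_match)
```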
| Model(s) | coco_1k_r1 | coco_1k_r5 | coco_1k_r10 | coco_5k_r1 | coco_5k_r5 | coco_5k_r10 | dim | eccv_map_at_r | eccv_rprecision | Model(s) from |
|---|---|---|---|---|---|---|---|---|---|---|
For each text-image search model, we evaluate its performance on the MS COCO dataset using the method in ECCV Caption. For details, refer to its publication and project.
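As a rough illustration of what columns such as coco_1k_r1 or coco_5k_r10 measure, the sketch below computes Recall@K, assuming those columns report the usual retrieval definition: the fraction of queries with at least one relevant image among the top K results. This is not the ECCV Caption evaluation code; the function names and the toy data are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Return 1.0 if any relevant item is in the top-k results, else 0.0."""
    top_k = ranked_ids[:k]
    return 1.0 if any(item in relevant_ids for item in top_k) else 0.0

def mean_recall_at_k(all_ranked, all_relevant, k):
    """Average recall_at_k over all queries."""
    scores = [recall_at_k(r, rel, k) for r, rel in zip(all_ranked, all_relevant)]
    return sum(scores) / len(scores)

# Toy data: one ranked result list and one set of relevant images per query.
all_ranked = [
    ['img3', 'img1', 'img7'],
    ['img2', 'img9', 'img4'],
    ['img5', 'img6', 'img8'],
]
all_relevant = [{'img1'}, {'img2'}, {'img0'}]

r_at_1 = mean_recall_at_k(all_ranked, all_relevant, 1)
r_at_2 = mean_recall_at_k(all_ranked, all_relevant, 2)
```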
We can use the built-in pipelines to generate text and image embeddings for the different modalities, insert the image embeddings into the vector database, and search the vector database for related images by text content. For more details, refer to the Text Image Search Pipeline Example.
We can use the built-in text_image_embedding pipeline to get image modality embeddings. By default it uses the clip_vit_base_patch16 model to generate an embedding for one image, or to batch-generate embeddings for multiple images.
from towhee import AutoPipes
# get the built-in text_image_embedding pipeline
image_embedding = AutoPipes.pipeline('text_image_embedding')
# generate image embedding
embedding = image_embedding('./test1.png').get()
# batch generate image embeddings
embeddings = image_embedding.batch(['./test1.png', './test2.png'])
embeddings = [e.get() for e in embeddings]
The model in the pipeline can be set to any model in the Models list above using the AutoConfig interface; refer to the TextImageEmbeddingConfig Interface. The modality configuration of this pipeline defaults to 'image'.
We can also set modality to 'text' to get text modality embeddings with the default clip_vit_base_patch16 model.
from towhee import AutoPipes, AutoConfig
# set TextImageEmbeddingConfig for the pipeline
text_conf = AutoConfig.load_config('text_image_embedding')
text_conf.modality = 'text'
text_pipe = AutoPipes.pipeline('text_image_embedding', text_conf)
# generate text embedding
embedding = text_pipe('A running dog.').get()
# batch generate text embeddings
embeddings = text_pipe.batch(['A running dog.', 'Puppy Corgi.'])
embeddings = [e.get() for e in embeddings]
We can use the built-in insert_milvus pipeline to insert the image modality embedding into the Milvus vector database; the pipeline requires the name of the collection.
Before running the insert_milvus code below, please make sure you have created a collection, for example one named text_image_search, whose vector dimension (512) matches the model's output and whose fields are id (auto_id), url (DataType.VARCHAR) and embedding (FLOAT_VECTOR).
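One way to prepare such a collection is with the pymilvus client. The following is a sketch under assumptions, not part of the built-in pipelines: create_collection and load_collection are hypothetical helper names, and the max_length of 500, the IVF_FLAT index, the L2 metric and the nlist value are illustrative choices. It needs a running Milvus server and the pymilvus package.

```python
COLLECTION_NAME = 'text_image_search'
EMBEDDING_DIM = 512  # must match the embedding model's output dimension

def create_collection(host='127.0.0.1', port='19530'):
    """Create the collection with id, url and embedding fields, plus an index."""
    # Imported here so the module can be loaded without pymilvus installed.
    from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

    connections.connect(host=host, port=port)
    fields = [
        FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema(name='url', dtype=DataType.VARCHAR, max_length=500),
        FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM),
    ]
    schema = CollectionSchema(fields=fields, description='text image search')
    collection = Collection(name=COLLECTION_NAME, schema=schema)
    # Index settings are illustrative; tune them for your data.
    collection.create_index(
        field_name='embedding',
        index_params={'index_type': 'IVF_FLAT', 'metric_type': 'L2', 'params': {'nlist': 1024}},
    )
    return collection

def load_collection(host='127.0.0.1', port='19530'):
    """Load an existing collection into memory so that it can be searched."""
    from pymilvus import Collection, connections

    connections.connect(host=host, port=port)
    collection = Collection(COLLECTION_NAME)
    collection.load()
    return collection
```

Calling create_collection() once before inserting, and load_collection() before searching, covers the two prerequisites mentioned in this guide.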
from towhee import AutoPipes, AutoConfig
# set MilvusInsertConfig for the built-in insert_milvus pipeline
insert_conf = AutoConfig.load_config('insert_milvus')
insert_conf.collection_name = 'text_image_search'
insert_pipe = AutoPipes.pipeline('insert_milvus', insert_conf)
# generate embedding
embedding = image_embedding('./test1.png').get()[0]
# insert image path (url) and embedding into Milvus
insert_pipe(['./test1.png', embedding])
You can also set the host and port parameters for Milvus, and if you are a Cloud user, there are also user and password parameters; refer to the MilvusInsertConfig Interface.
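To insert more than one image, the single-image call above can be wrapped in a loop. The helper below is a hypothetical sketch (insert_images is not a Towhee function); it takes the two pipelines as arguments and assumes they behave as in the examples above, that is, the embedding pipeline result exposes .get() and the insert pipeline accepts a [url, embedding] list.

```python
def insert_images(image_paths, embedding_pipe, insert_pipe):
    """Embed each image and insert a [path, embedding] row; return the row count."""
    count = 0
    for path in image_paths:
        # .get() returns the pipeline outputs; the embedding is the first one.
        embedding = embedding_pipe(path).get()[0]
        insert_pipe([path, embedding])
        count += 1
    return count

# Usage with the pipelines created earlier (requires Towhee and Milvus):
# insert_images(['./test1.png', './test2.png'], image_embedding, insert_pipe)
```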
After inserting image modality embeddings into Milvus, we can search by text and get the related image results with the built-in search_milvus pipeline, which requires the name of the collection. Set search_params = {'output_fields': ['url']} to return the 'url' field.
Before searching in Milvus, you need to load the collection first.
from towhee import AutoPipes, AutoConfig
# set MilvusSearchConfig for the built-in search_milvus pipeline
search_conf = AutoConfig.load_config('search_milvus')
search_conf.collection_name = 'text_image_search'
search_conf.search_params = {'output_fields': ['url']}
search_pipe = AutoPipes.pipeline('search_milvus', search_conf)
# generate embedding
embedding = text_pipe('A running dog').get()[0]
# search embedding and get results in Milvus
search_pipe(embedding).get_dict()
You can also set the host and port parameters for Milvus, and if you are a Cloud user, there are also user and password parameters; refer to the MilvusSearchConfig Interface.
name: str
The name of the built-in pipeline, such as 'text_image_embedding', 'insert_milvus' and 'search_milvus'.
config: REGISTERED_CONFIG
The AutoConfig registered with the pipeline name, which defaults to AutoConfig.load_config(name). For example, if name is 'text_image_embedding', config defaults to AutoConfig.load_config('text_image_embedding').
model: str
The model name in the text image embedding pipeline, defaults to 'clip_vit_base_patch16'. You can refer to the Model(s) list above to set the model.
modality: str
The modality of the text-image multimodal embedding, defaults to 'image'; it can also be set to 'text'.
normalize_vec: bool
Whether to normalize the embedding vectors, defaults to True.
customize_embedding_op: str
The name of the custom embedding operator, defaults to None.
device: int
The device number, defaults to -1, which means using the CPU. If it is set to a value other than -1, the specified GPU device will be used.
You can also set the above parameters for the text image embedding pipeline. For example, you can set model to 'clip_vit_base_patch32' with AutoConfig and set device to GPU 0:
from towhee import AutoPipes, AutoConfig
config = AutoConfig.load_config('text_image_embedding')
config.model = 'clip_vit_base_patch32'
config.device = 0
image_embedding = AutoPipes.pipeline('text_image_embedding', config=config)
embedding = image_embedding('./test.png').get()
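The normalize_vec option described above can be illustrated with a small sketch, assuming it refers to L2 normalization (scaling each vector to unit length). Once vectors are unit length, their inner product equals their cosine similarity, which is why normalized embeddings work well with inner-product or L2 search. The helper names here are hypothetical and the vectors are toy values.

```python
import math

def l2_normalize(vec):
    """Scale a vector so that its Euclidean length is 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

# For unit vectors, the inner product is the cosine similarity.
self_similarity = dot(a, a)
cross_similarity = dot(a, b)
```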
The code AutoConfig.load_config('insert_milvus') returns an automatically configured MilvusInsertConfig object, which sets some parameters of the insert Milvus pipeline:
host: str
Host of the Milvus vector database, defaults to '127.0.0.1'.
port: str
Port of the Milvus vector database, defaults to '19530'.
collection_name: str
The collection name for the Milvus vector database; required when inserting data into Milvus.
user: str
The user name for Cloud users, defaults to None.
password: str
The user password for Cloud users, defaults to None.
The code AutoConfig.load_config('search_milvus') returns an automatically configured MilvusSearchConfig object, which sets some parameters of the search Milvus pipeline:
host: str
Host of the Milvus vector database, defaults to '127.0.0.1'.
port: str
Port of the Milvus vector database, defaults to '19530'.
collection_name: str
The collection name for the Milvus vector database; required when searching in Milvus.
search_params: dict
The search parameters for the Milvus vector database, defaults to None; refer to its documentation for more details.
user: str
The user name for Cloud users, defaults to None.
password: str
The user password for Cloud users, defaults to None.