VGGish Embedding Operator (PyTorch)
Authors: Jael Gu
Overview
This operator reads the waveform of an audio file and applies VGGish to extract features. The original VGGish model is built on top of TensorFlow [1]; this operator converts VGGish to PyTorch. Given an input, it generates a set of vectors, each representing the features of a non-overlapping clip with a fixed length of 0.96 s, where each clip consists of 64 mel bands and 96 frames. The model is pre-trained on the large-scale audio dataset AudioSet. As suggested by the authors, the model is suitable for extracting high-level features or warming up a larger model.
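As a rough back-of-the-envelope illustration of the shapes described above (the exact framing behaviour, e.g. whether a trailing partial clip is dropped, is an assumption here, not something this README specifies):

```python
# Rough shape illustration; assumes strictly non-overlapping 0.96 s clips
# and that any trailing audio shorter than one clip is dropped.
duration_s = 10.0                            # length of the input audio
clip_len_s = 0.96                            # fixed clip length used by VGGish
num_clips = int(duration_s // clip_len_s)    # -> 10 clips

# Each clip is framed as a 96-frame x 64-mel-band log-mel patch, and the model
# maps every patch to a 128-dimensional embedding, so the operator's output
# has shape (num_clips, 128) -- here (10, 128).
print(num_clips)
```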
Interface
__call__(self, datas: List[NamedTuple('data', [('audio', 'ndarray'), ('sample_rate', 'int')])])

Args:
- datas:
  - a list of named tuples, each holding the audio data as a numpy.ndarray and the sample rate as an int

Returns:
The operator returns a tuple Tuple[('embs', numpy.ndarray)] containing the following fields:
- embs:
  - embeddings of the audio
  - data type: numpy.ndarray
  - shape: (num_clips, 128)
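A minimal calling sketch under the interface above. The operator instance (`op` below) is assumed to have already been created from this repository, and `soundfile` is only one of many ways to obtain a waveform plus sample rate; both are illustrative assumptions rather than part of this operator's requirements.

```python
from typing import List, NamedTuple

import numpy as np
import soundfile as sf  # illustrative; any loader returning (waveform, sample_rate) works

# Mirror the named tuple expected by __call__.
AudioData = NamedTuple('data', [('audio', np.ndarray), ('sample_rate', int)])

waveform, sr = sf.read('example.wav', dtype='float32')  # hypothetical file name
datas: List[AudioData] = [AudioData(audio=waveform, sample_rate=sr)]

outs = op(datas)    # `op` is an instance of this operator, assumed created elsewhere
embs = outs.embs    # field access per the declared return type Tuple[('embs', numpy.ndarray)]
print(embs.shape)   # (num_clips, 128)
```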
Requirements
You can install the required Python packages listed in requirements.txt.
How it works
The towhee/torch-vggish operator implements audio embedding and can be added to a Towhee pipeline. For example, it is the key operator of the audio-embedding-vggish pipeline, as sketched below.
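A hedged sketch of what using that pipeline can look like with the classic towhee.pipeline entry point; the exact pipeline name, namespace, and call signature should be checked against the audio-embedding-vggish pipeline's own README.

```python
from towhee import pipeline

# Pipeline name/namespace is an assumption based on the pipeline mentioned above.
embedding_pipeline = pipeline('towhee/audio-embedding-vggish')

# Feeding an audio file path; the returned embeddings come from this operator.
embs = embedding_pipeline('example.wav')  # hypothetical file name
```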
Reference
[1] https://github.com/tensorflow/models/tree/master/research/audioset/vggish
[2] https://tfhub.dev/google/vggish/1
More Resources
- What is a Transformer Model? An Engineer's Guide: A transformer model is a neural network architecture. It's proficient in converting a particular type of input into a distinct output. Its core strength lies in its ability to handle inputs and outputs of different sequence length. It does this through encoding the input into a matrix with predefined dimensions and then combining that with another attention matrix to decode. This transformation unfolds through a sequence of collaborative layers, which deconstruct words into their corresponding numerical representations. At its heart, a transformer model is a bridge between disparate linguistic structures, employing sophisticated neural network configurations to decode and manipulate human language input. An example of a transformer model is GPT-3, which ingests human language and generates text output.
 - Comparing Different Vector Embeddings - Zilliz blog: Learn about the difference in vector embeddings between models and how to use multiple collections of vector data in one Jupyter Notebook.
 - How to Get the Right Vector Embeddings - Zilliz blog: A comprehensive introduction to vector embeddings and how to generate them with popular open-source models.
 - Exploring OpenAI CLIP: The Future of Multi-Modal AI Learning - Zilliz blog: Multimodal AI learning can get input and understand information from various modalities like text, images, and audio together, leading to a deeper understanding of the world. Learn more about OpenAI's CLIP (Contrastive Language-Image Pre-training), a popular multimodal model for text and image data.
 - Audio Retrieval Based on Milvus - Zilliz blog: Create an audio retrieval system using Milvus, an open-source vector database. Classify and analyze sound data in real time.
 - Sparse and Dense Embeddings: A Guide for Effective Information Retrieval with Milvus | Zilliz Webinar: Zilliz webinar covering what sparse and dense embeddings are and when you'd want to use one over the other.
 - Zilliz partnership with PyTorch - View image search solution tutorial.
 