# VGGish Embedding Operator (PyTorch)

Authors: Jael Gu

## Overview

This operator reads the waveform of an audio file and applies VGGish to extract features. The original VGGish model is built on top of TensorFlow [1]; this operator ports it to **PyTorch**. Given an input audio file, it generates a set of vectors, each describing a non-overlapping clip with a fixed length of 0.96 s, where each clip is composed of 64 mel bands and 96 frames. For example, a 10-second recording yields ten 128-dimensional embedding vectors. The model is pre-trained on the large-scale audio dataset [AudioSet](https://research.google.com/audioset). As suggested, the model is suitable for extracting high-level features or warming up a larger model.

## Interface

```python
__call__(self, filepath: str)
```

**Args:**

- filepath:
  - path to the input audio file
  - supported types: `str`

**Returns:**

The operator returns a tuple `Tuple[('embs', numpy.ndarray)]` containing the following field:

- embs:
  - embeddings of the audio
  - data type: `numpy.ndarray`
  - shape: (num_clips, 128)

## Requirements

You can install the required Python packages from [requirements.txt](./requirements.txt).

## How it works

The `towhee/torch-vggish` operator implements audio embedding and can be added to a towhee pipeline. For example, it is the key operator of the pipeline [audio-embedding-vggish](https://hub.towhee.io/towhee/audio-embedding-vggish). Usage sketches for both the operator and the pipeline are given in the Example section below.

## Reference

[1] https://github.com/tensorflow/models/tree/master/research/audioset/vggish

[2] https://tfhub.dev/google/vggish/1
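
## Example

The snippet below is a minimal sketch of calling the operator directly through the interface described above. The module path and class name (`torch_vggish`, `TorchVggish`) are illustrative assumptions; check the repository for the actual names.

```python
# Hypothetical import; the actual module and class names may differ.
from torch_vggish import TorchVggish

op = TorchVggish()

# __call__ takes the path of an audio file and returns a named tuple whose
# 'embs' field holds one 128-dimensional vector per non-overlapping 0.96 s clip.
output = op('/path/to/audio.wav')
embs = output.embs

print(embs.shape)  # (num_clips, 128)
```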
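
The operator can also be invoked through the published towhee pipeline mentioned in "How it works". The sketch below assumes the `pipeline` entry point of towhee 0.x releases; the exact API may differ across versions.

```python
from towhee import pipeline

# Load the published pipeline that wraps this operator.
embedding_pipeline = pipeline('towhee/audio-embedding-vggish')

# Run it on an audio file; the result carries the same (num_clips, 128)
# VGGish embeddings produced by the operator.
embs = embedding_pipeline('/path/to/audio.wav')
```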