VGGish Embedding Operator (PyTorch)

Authors: Jael Gu

Overview

This operator reads the waveform of an audio file and applies VGGish to extract features. The original VGGish model is built on top of TensorFlow [1]; this operator uses a PyTorch port of it. Given an input, it generates a set of vectors. Each vector represents the features of one non-overlapping clip with a fixed length of 0.96 s, and each clip is represented by 96 frames of 64 mel bands. The model is pre-trained on AudioSet, a large-scale audio dataset. As suggested by its authors, the model is suitable for extracting high-level features or for warming up a larger model.
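The clip layout described above can be sketched with NumPy. This is a minimal illustration, not the operator's own code: the function name split_into_clips is hypothetical, the input is assumed to be a log-mel spectrogram that has already been computed, and the 10 ms frame hop (so that 96 frames span 0.96 s) and the dropping of trailing frames are assumptions taken from the reference VGGish parameters [1].

```python
import numpy as np

FRAMES_PER_CLIP = 96   # assumed: 96 frames x 10 ms hop = 0.96 s per clip
NUM_MEL_BANDS = 64     # mel bands per frame, as stated in the overview


def split_into_clips(log_mel: np.ndarray) -> np.ndarray:
    """Split a (num_frames, 64) spectrogram into non-overlapping (96, 64) clips.

    Frames left over after the last complete clip are dropped (assumption).
    """
    if log_mel.ndim != 2 or log_mel.shape[1] != NUM_MEL_BANDS:
        raise ValueError(f"expected shape (num_frames, {NUM_MEL_BANDS})")
    num_clips = log_mel.shape[0] // FRAMES_PER_CLIP
    trimmed = log_mel[: num_clips * FRAMES_PER_CLIP]
    return trimmed.reshape(num_clips, FRAMES_PER_CLIP, NUM_MEL_BANDS)


# Dummy spectrogram with 300 frames (about 3 s of audio at a 10 ms hop).
dummy = np.random.rand(300, NUM_MEL_BANDS)
clips = split_into_clips(dummy)
print(clips.shape)  # (3, 96, 64)
```

Each of the resulting clips would then be mapped by the model to one 128-dimensional vector, which is why the number of output vectors grows with the audio duration.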

Interface

__call__(self, audio_path: str)

Args:

audio_path:
- the input audio path
- supported types: str

Returns:

The operator returns a tuple Tuple[('embs', numpy.ndarray)] containing the following fields:

embs:
- embeddings of the audio
- data type: numpy.ndarray
- shape: (num_clips,128)
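Because the number of rows in embs depends on the audio length, downstream code often needs a fixed-size vector per file. The sketch below shows one common way to get that, mean pooling over clips. It is an illustration under stated assumptions: Outputs is a hypothetical stand-in for the operator's return value, pool_embeddings is not part of this operator, and the embeddings are dummy data rather than real model output.

```python
from typing import NamedTuple

import numpy as np


class Outputs(NamedTuple):
    """Hypothetical stand-in for the operator's return value."""
    embs: np.ndarray  # shape: (num_clips, 128)


def pool_embeddings(embs: np.ndarray) -> np.ndarray:
    """Average per-clip embeddings into a single 128-dimensional vector."""
    if embs.ndim != 2 or embs.shape[1] != 128:
        raise ValueError("expected shape (num_clips, 128)")
    if embs.shape[0] == 0:
        raise ValueError("audio is shorter than one 0.96 s clip")
    return embs.mean(axis=0)


# Dummy result with 5 clips; a real call would produce model embeddings.
result = Outputs(embs=np.ones((5, 128), dtype=np.float32))
vector = pool_embeddings(result.embs)
print(vector.shape)  # (128,)
```

Mean pooling discards temporal order, so keep the per-clip embeddings when the position of a sound within the file matters.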

Requirements

Install the required Python packages listed in requirements.txt, for example with pip install -r requirements.txt.

How it works

The towhee/torch-vggish operator implements audio embedding and can be added to a Towhee pipeline. For example, it is the key operator of the audio-embedding-vggish pipeline.

Reference

[1] https://github.com/tensorflow/models/tree/master/research/audioset/vggish
[2] https://tfhub.dev/google/vggish/1
