# VGGish Embedding Operator (PyTorch)

Authors: Jael Gu

## Overview

This operator reads the waveform of an audio file and then applies VGGish to extract features. The original VGGish model is built on top of TensorFlow.[1] This operator ports VGGish to **PyTorch**. Given an input audio file, it generates a set of vectors. Each vector represents the features of a non-overlapping clip with a fixed length of 0.96 s, and each clip is composed of 64 mel bands and 96 frames. The model is pre-trained on the large-scale audio dataset [AudioSet](https://research.google.com/audioset). As the original release suggests, the model is suitable for extracting high-level features or for warming up a larger model.
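To make the clip layout concrete, here is a minimal sketch of how a log-mel spectrogram could be cut into non-overlapping clips of 96 frames by 64 mel bands. It is an illustration only, not this operator's actual preprocessing code: the helper name `split_into_clips` is hypothetical, the 10 ms frame step is inferred from 0.96 s / 96 frames, and dropping the incomplete trailing clip is an assumption.

```python
import numpy as np

FRAMES_PER_CLIP = 96  # from the overview: 96 frames per clip
NUM_MEL_BANDS = 64    # from the overview: 64 mel bands per frame


def split_into_clips(log_mel: np.ndarray) -> np.ndarray:
    """Cut a (num_frames, num_mel_bands) array into non-overlapping clips.

    Hypothetical helper: trailing frames that do not fill a whole clip
    are dropped (an assumption, not confirmed by the operator's source).
    """
    num_clips = log_mel.shape[0] // FRAMES_PER_CLIP
    trimmed = log_mel[: num_clips * FRAMES_PER_CLIP]
    return trimmed.reshape(num_clips, FRAMES_PER_CLIP, log_mel.shape[1])


# Stand-in for a real spectrogram: 1000 frames, about 10 s at 10 ms per frame.
rng = np.random.default_rng(0)
fake_log_mel = rng.random((1000, NUM_MEL_BANDS))

clips = split_into_clips(fake_log_mel)
print(clips.shape)  # (10, 96, 64)
```

Under these assumptions, a 10-second input yields 10 clips, so the model would produce 10 embedding vectors.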

## Interface

```python
__call__(self, audio_path: str)
```

**Args:**

- audio_path:
  - the input audio path
  - supported types: str

**Returns:**

The operator returns a named tuple `Tuple[('embs', numpy.ndarray)]` containing the following field:

- embs:
  - embeddings of the audio
  - data type: `numpy.ndarray`
  - shape: (num_clips,128)
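
Because `embs` holds one 128-dimensional vector per clip, downstream code often needs a single vector per audio file. The sketch below shows one possible way to do that by averaging over clips. The helper `pool_embeddings` is hypothetical and not part of this operator, and the input array is a stand-in for real operator output.

```python
import numpy as np


def pool_embeddings(embs: np.ndarray) -> np.ndarray:
    """Average per-clip embeddings of shape (num_clips, 128) into one vector.

    Hypothetical helper for illustration; mean pooling is one common
    choice, not something this operator prescribes.
    """
    if embs.ndim != 2 or embs.shape[1] != 128:
        raise ValueError("expected an array of shape (num_clips, 128)")
    if embs.shape[0] == 0:
        raise ValueError("audio is too short to produce any clip")
    return embs.mean(axis=0)


# Stand-in for the operator's output on an audio file with 3 clips.
embs = np.ones((3, 128), dtype=np.float32)
audio_vector = pool_embeddings(embs)
print(audio_vector.shape)  # (128,)
```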

## Requirements

Install the required Python packages listed in [requirements.txt](./requirements.txt).

## How it works

The `towhee/torch-vggish` operator implements audio embedding and can be added to a Towhee pipeline. For example, it is the key operator of the pipeline [audio-embedding-vggish](https://hub.towhee.io/towhee/audio-embedding-vggish).

## Reference

[1]. https://github.com/tensorflow/models/tree/master/research/audioset/vggish
[2]. https://tfhub.dev/google/vggish/1