VGGish Embedding Operator (Pytorch)
Authors: Jael Gu
Overview
This operator reads the waveform of an audio file and applies VGGish to extract features. The original VGGish model is built on top of TensorFlow [1]; this operator ports VGGish to PyTorch. Given an input, it generates a set of vectors, each representing the features of a non-overlapping clip with a fixed length of 0.96 s; each clip is composed of 64 mel bands and 96 frames. The model is pre-trained on the large-scale audio dataset AudioSet. As suggested by its authors, this model is suitable for extracting high-level features or for warming up a larger model.
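The 0.96 s clip length follows from VGGish's framing: 96 log-mel frames at a 10 ms hop span 96 × 0.01 s = 0.96 s. The sketch below computes the expected output shape for a given audio duration; the framing constants are taken from the upstream VGGish repository [1], not from this README, so treat them as assumptions.

```python
# Expected VGGish embedding shape for a mono waveform, using the
# standard VGGish framing constants (assumptions from the upstream repo).
STFT_HOP_SECONDS = 0.010  # 10 ms hop between log-mel frames
NUM_FRAMES = 96           # frames per clip -> 96 * 10 ms = 0.96 s
NUM_BANDS = 64            # mel bands per frame
EMBEDDING_SIZE = 128      # output embedding dimension

def expected_embedding_shape(duration_seconds: float) -> tuple:
    """Number of non-overlapping 0.96 s clips and the embedding width."""
    clip_seconds = NUM_FRAMES * STFT_HOP_SECONDS        # 0.96 s
    num_clips = int(duration_seconds // clip_seconds)   # trailing partial clip is dropped
    return (num_clips, EMBEDDING_SIZE)

print(expected_embedding_shape(10.0))  # a 10 s recording yields (10, 128)
```

Audio shorter than 0.96 s therefore produces zero clips, which is worth checking before feeding very short recordings to the operator.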
Interface
__call__(self, datas: List[NamedTuple('data', [('audio', 'ndarray'), ('sample_rate', 'int')])])
Args:
- datas:
- a list of named tuples, each containing audio data as a numpy.ndarray and a sample rate as an int
Returns:
The operator returns a tuple Tuple[('embs', numpy.ndarray)] containing the following fields:
- embs:
- embeddings of the audio
- data type:
numpy.ndarray
- shape: (num_clips, 128)
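A sketch of how the `datas` argument can be assembled to match the interface above. Only the input construction is runnable here; the operator instantiation and call are shown as comments because they require the Towhee runtime, and the class name `TorchVggish` is an assumption based on the `torch_vggish.py` filename.

```python
from typing import List, NamedTuple
import numpy as np

# Input record matching the interface signature above.
AudioData = NamedTuple('data', [('audio', np.ndarray), ('sample_rate', int)])

# One second of silence at 16 kHz as a stand-in waveform.
waveform = np.zeros(16000, dtype=np.float32)
datas: List[AudioData] = [AudioData(audio=waveform, sample_rate=16000)]

# Hypothetical operator call (requires the Towhee runtime; class name assumed):
# op = TorchVggish()
# embs = op(datas)                 # Tuple[('embs', numpy.ndarray)]
# assert embs[0].shape[1] == 128   # (num_clips, 128)
```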
Requirements
You can install the required Python packages with requirements.txt.
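Assuming a standard pip setup, the dependencies can be installed from the repository root with:

```shell
# Install the operator's Python dependencies
pip install -r requirements.txt
```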
How it works
The towhee/torch-vggish operator implements audio embedding and can be added to a Towhee pipeline. For example, it is the key operator of the pipeline audio-embedding-vggish.
Reference
[1] https://github.com/tensorflow/models/tree/master/research/audioset/vggish
[2] https://tfhub.dev/google/vggish/1