MelMage

See What Your Audio CNN Actually Hears

Real-time audio classification with complete neural network transparency. Understand deep learning through interactive visualizations and explainable AI.

[Live audio waveform visualization]

83.75% Accuracy
50 Sound Classes
Real-time Processing

Powerful Features for AI Transparency

MelMage combines cutting-edge deep learning with explainable AI techniques, making neural network behavior accessible to everyone.

🧠 Deep Audio CNN

A ResNet-style architecture achieving 83.75% accuracy on the ESC-50 dataset, fed by mel-spectrogram features.

👁️ Layer Visualization

See feature maps from every convolutional layer to understand how your model learns audio patterns.
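For the curious, here is a minimal PyTorch sketch of how per-layer feature maps can be captured with forward hooks; the `model` variable and the hooked layer names are illustrative assumptions, not MelMage's actual modules:

```python
import torch
import torch.nn as nn

# Illustrative sketch: capture intermediate activations with forward hooks.
# `model` and the layer names are assumptions, not MelMage's actual code.
feature_maps: dict[str, torch.Tensor] = {}

def make_hook(name: str):
    def hook(module, inputs, output):
        feature_maps[name] = output.detach().cpu()
    return hook

def register_hooks(model: nn.Module, layer_names: set[str]):
    handles = []
    for name, module in model.named_modules():
        if name in layer_names:
            handles.append(module.register_forward_hook(make_hook(name)))
    return handles  # call handle.remove() on each when done

# Usage:
#   handles = register_hooks(model, {"conv1", "layer1", "layer2", "layer3", "layer4"})
#   logits = model(spectrogram)  # feature_maps now holds one tensor per hooked layer
```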

🎵 Real-time Classification

Upload a WAV file and get top-3 predictions within 100 ms from a serverless, GPU-powered inference pipeline.

Serverless Infrastructure

Built on Modal Labs: compute scales to zero when idle, with GPU acceleration attached only when a request needs it.
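As a rough illustration of what a scale-to-zero Modal deployment looks like (the app name, image contents, and function body are assumptions, not MelMage's actual deployment code):

```python
import modal

# Hypothetical sketch of a Modal Labs GPU function.
image = modal.Image.debian_slim().pip_install("torch", "torchaudio")
app = modal.App("melmage-inference", image=image)

@app.function(gpu="T4")  # a GPU is attached only while a request is running
def classify(wav_bytes: bytes) -> list[dict]:
    # Load the model, preprocess the audio, and run inference here.
    # Between requests the container scales to zero, so idle time is free.
    raise NotImplementedError("model loading and inference omitted")
```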

📊 Complete Transparency

Follow the full pipeline from raw waveform → mel-spectrogram → CNN feature maps → final prediction, visualized in a custom React dashboard.

🎨 Interactive Dashboard

Built with Next.js and Tailwind CSS. Displays the waveform, spectrogram, and per-layer feature maps so you can explore how an audio signal transforms across layers.

100 Training Epochs
2,000 Audio Samples
<100ms Inference Time
Open Source Code

How MelMage Works

Three simple steps to understand your audio CNN completely

1. 📁 Upload Audio

Drag and drop a WAV file or record directly in your browser; clips at different sample rates are handled automatically.

Built-in audio preprocessing and validation
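A sketch of the kind of loading and validation this step implies, using torchaudio (the target sample rate is an assumed placeholder, not a documented MelMage setting):

```python
import torch
import torchaudio
import torchaudio.functional as AF

TARGET_SR = 44_100  # assumed target rate; ESC-50 ships at 44.1 kHz

def load_and_validate(path: str) -> torch.Tensor:
    waveform, sr = torchaudio.load(path)           # (channels, samples)
    if waveform.numel() == 0:
        raise ValueError("empty audio file")
    waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
    if sr != TARGET_SR:                            # normalize sample rate
        waveform = AF.resample(waveform, orig_freq=sr, new_freq=TARGET_SR)
    return waveform
```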

2. 🧠 AI Processing

The CNN analyzes the mel-spectrogram through its stacked ResNet layers on serverless GPU infrastructure.

Real-time feature extraction and classification
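Roughly what that mel-spectrogram step looks like in torchaudio; the n_fft, hop_length, and n_mels values are common defaults assumed for illustration, not MelMage's exact settings:

```python
import torch
import torchaudio.transforms as T

# Assumed-typical mel-spectrogram settings.
to_mel = torch.nn.Sequential(
    T.MelSpectrogram(sample_rate=44_100, n_fft=1024, hop_length=512, n_mels=128),
    T.AmplitudeToDB(),
)

waveform = torch.randn(1, 44_100 * 5)  # stand-in for a 5-second mono clip
mel = to_mel(waveform)                 # (1, n_mels, time)
batch = mel.unsqueeze(0)               # (1, 1, n_mels, time) for the CNN
# logits = model(batch)                # fed through the ResNet layers
```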

3. 👁️ Visualize Results

See predictions alongside internal feature maps from every layer of the neural network.

Complete transparency from input to output
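Rendering one captured activation is then a few lines with matplotlib (this assumes the `feature_maps` dict from the hook sketch above):

```python
import matplotlib.pyplot as plt

# Assumes `feature_maps` was populated by the forward-hook sketch above.
fmap = feature_maps["layer1"][0]   # (channels, H, W) for one input
plt.imshow(fmap[0], aspect="auto", origin="lower", cmap="magma")
plt.title("layer1, channel 0")
plt.colorbar()
plt.show()
```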

Watch MelMage in Action

See the complete workflow from audio upload to neural network visualization

Model Internals

A custom ResNet-inspired architecture built from scratch in PyTorch, with every layer designed for audio feature extraction.

3-4-6-3 Residual Blocks: layer distribution matching ResNet-34
~33 Conv Layers: each of the 16 residual blocks contains 2 Conv2d layers, plus the 7x7 stem
~23M Parameters: trainable model weights
Feature Maps at 5 Depths: captured at conv1 and layer1-4

Architecture Details

Input Processing: 7x7 conv, stride 2, 64 filters → MaxPool
Residual Layers: Layer1 64 → Layer2 128 → Layer3 256 → Layer4 512 channels, with downsampling at the start of each layer except Layer1
Global Pooling: AdaptiveAvgPool2d → Flatten → Dropout(0.5)
Classifier: Linear(512 → 50 classes)
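Putting the table above together, a condensed PyTorch sketch of such a model (a reconstruction from the specs listed here, not MelMage's verbatim source):

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 convs with a skip connection, as in ResNet-34."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.down = None
        if stride != 1 or in_ch != out_ch:  # match shapes for the skip path
            self.down = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        identity = x if self.down is None else self.down(x)
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + identity)

def make_layer(in_ch, out_ch, blocks, stride):
    layers = [BasicBlock(in_ch, out_ch, stride)]
    layers += [BasicBlock(out_ch, out_ch) for _ in range(blocks - 1)]
    return nn.Sequential(*layers)

class AudioResNet(nn.Module):
    """3-4-6-3 blocks, single-channel spectrogram input, 50 classes."""
    def __init__(self, num_classes=50):
        super().__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 64, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(3, 2, 1))
        self.layer1 = make_layer(64, 64, 3, stride=1)
        self.layer2 = make_layer(64, 128, 4, stride=2)
        self.layer3 = make_layer(128, 256, 6, stride=2)
        self.layer4 = make_layer(256, 512, 3, stride=2)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Dropout(0.5), nn.Linear(512, num_classes))

    def forward(self, x):
        x = self.conv1(x)
        x = self.layer4(self.layer3(self.layer2(self.layer1(x))))
        return self.head(x)
```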

Training Specifications

Optimizer: AdamW with weight decay 0.01
Learning Rate: OneCycleLR, 0.0005 → 0.002 → 0.0005
Regularization: Dropout(0.5) + label smoothing (0.1)
Data Augmentation: FrequencyMasking + TimeMasking + Mixup
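A sketch of that training configuration in PyTorch. The steps_per_epoch, mask sizes, and mixup alpha are assumed placeholders; the div_factor/final_div_factor values are chosen so the schedule reproduces the 0.0005 → 0.002 → 0.0005 curve above:

```python
import torch
import torchaudio.transforms as T

model = AudioResNet()  # from the architecture sketch above
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.01)
# div_factor=4 starts at 0.002/4 = 0.0005; final_div_factor=1 ends there too
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=2e-3, epochs=100, steps_per_epoch=100,  # steps assumed
    div_factor=4, final_div_factor=1)
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

# SpecAugment-style masking on training spectrograms (mask sizes assumed)
augment = torch.nn.Sequential(
    T.FrequencyMasking(freq_mask_param=24),
    T.TimeMasking(time_mask_param=48),
)

def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    """Blend random pairs of examples; blend the loss with the same lambda."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], y, y[idx], lam
```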

Built with Cutting-Edge Tech

MelMage leverages the latest in machine learning, serverless infrastructure, and modern web development to deliver a world-class experience.

Machine Learning

PyTorch: deep learning framework
ResNet Architecture: convolutional neural network design
Mel-Spectrograms: audio feature extraction
TensorBoard: model visualization

Infrastructure

Modal Labs: serverless GPU compute
FastAPI: high-performance API

Frontend

Next.js: React framework
React: UI library
Tailwind CSS: styling framework
TypeScript: type safety

Data & Training

ESC-50 Dataset: environmental sound classification
Audio Augmentation: data enhancement techniques
Cross-validation: model validation
Mel-frequency Analysis: audio preprocessing

Ready to Explore AI Transparency?

Start understanding how your audio CNN actually works. Upload an audio file and see every layer in action.