MelMage

See What Your Audio CNN Actually Hears

Real-time audio classification with complete neural network transparency. Understand deep learning through interactive visualizations and explainable AI.

[Live audio waveform visualization]

83.75% Accuracy
50 Sound Classes
Real-time Processing

Powerful Features for AI Transparency

MelMage combines cutting-edge deep learning with explainable AI techniques, making neural network behavior accessible to everyone.

🧠 Deep Audio CNN

A ResNet-style architecture achieving 83.75% accuracy on the ESC-50 dataset, fed by mel-spectrogram features.

👁️ Layer Visualization

See feature maps from every convolutional layer to understand how your model learns audio patterns.
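For the curious, here is a minimal PyTorch sketch of how per-layer feature maps can be captured with forward hooks; the `model` variable and the hooked layer names are illustrative assumptions, not MelMage's actual modules:

```python
import torch
import torch.nn as nn

# Illustrative sketch: capture intermediate activations with forward hooks.
# `model` and the layer names are assumptions, not MelMage's actual code.
feature_maps: dict[str, torch.Tensor] = {}

def make_hook(name: str):
    def hook(module, inputs, output):
        feature_maps[name] = output.detach().cpu()
    return hook

def register_hooks(model: nn.Module, layer_names: set[str]):
    handles = []
    for name, module in model.named_modules():
        if name in layer_names:
            handles.append(module.register_forward_hook(make_hook(name)))
    return handles  # call handle.remove() on each when done

# Usage:
#   handles = register_hooks(model, {"conv1", "layer1", "layer2", "layer3", "layer4"})
#   logits = model(spectrogram)  # feature_maps now holds one tensor per hooked layer
```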

🎵 Real-time Classification

Upload a WAV file and get top-3 predictions within 100 ms from a serverless, GPU-powered inference pipeline.

Serverless Infrastructure

Built on Modal Labs: compute scales to zero when idle, with GPU acceleration attached only when a request needs it.
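As a rough illustration of what a scale-to-zero Modal deployment looks like (the app name, image contents, and function body are assumptions, not MelMage's actual deployment code):

```python
import modal

# Hypothetical sketch of a Modal Labs GPU function.
image = modal.Image.debian_slim().pip_install("torch", "torchaudio")
app = modal.App("melmage-inference", image=image)

@app.function(gpu="T4")  # a GPU is attached only while a request is running
def classify(wav_bytes: bytes) -> list[dict]:
    # Load the model, preprocess the audio, and run inference here.
    # Between requests the container scales to zero, so idle time is free.
    raise NotImplementedError("model loading and inference omitted")
```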

📊 Complete Transparency

Follow the full pipeline from raw waveform → mel-spectrogram → CNN feature maps → final prediction, visualized in a custom React dashboard.

🎨 Interactive Dashboard

Built with Next.js and Tailwind CSS. Displays the waveform, spectrogram, and per-layer feature maps so you can explore how an audio signal transforms across layers.

100 Training Epochs
2,000 Audio Samples
<100ms Inference Time
Open Source Code

How MelMage Works

Three simple steps to understand your audio CNN completely

1. 📁 Upload Audio

Drag and drop a WAV file or record directly in your browser; clips at different sample rates are handled automatically.

Built-in audio preprocessing and validation
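A sketch of the kind of loading and validation this step implies, using torchaudio (the target sample rate is an assumed placeholder, not a documented MelMage setting):

```python
import torch
import torchaudio
import torchaudio.functional as AF

TARGET_SR = 44_100  # assumed target rate; ESC-50 ships at 44.1 kHz

def load_and_validate(path: str) -> torch.Tensor:
    waveform, sr = torchaudio.load(path)           # (channels, samples)
    if waveform.numel() == 0:
        raise ValueError("empty audio file")
    waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
    if sr != TARGET_SR:                            # normalize sample rate
        waveform = AF.resample(waveform, orig_freq=sr, new_freq=TARGET_SR)
    return waveform
```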

2. 🧠 AI Processing

The CNN analyzes the mel-spectrogram through its stacked ResNet layers on serverless GPU infrastructure.

Real-time feature extraction and classification
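Roughly what that mel-spectrogram step looks like in torchaudio; the n_fft, hop_length, and n_mels values are common defaults assumed for illustration, not MelMage's exact settings:

```python
import torch
import torchaudio.transforms as T

# Assumed-typical mel-spectrogram settings.
to_mel = torch.nn.Sequential(
    T.MelSpectrogram(sample_rate=44_100, n_fft=1024, hop_length=512, n_mels=128),
    T.AmplitudeToDB(),
)

waveform = torch.randn(1, 44_100 * 5)  # stand-in for a 5-second mono clip
mel = to_mel(waveform)                 # (1, n_mels, time)
batch = mel.unsqueeze(0)               # (1, 1, n_mels, time) for the CNN
# logits = model(batch)                # fed through the ResNet layers
```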

3. 👁️ Visualize Results

See predictions alongside internal feature maps from every layer of the neural network.

Complete transparency from input to output
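Rendering one captured activation is then a few lines with matplotlib (this assumes the `feature_maps` dict from the hook sketch above):

```python
import matplotlib.pyplot as plt

# Assumes `feature_maps` was populated by the forward-hook sketch above.
fmap = feature_maps["layer1"][0]   # (channels, H, W) for one input
plt.imshow(fmap[0], aspect="auto", origin="lower", cmap="magma")
plt.title("layer1, channel 0")
plt.colorbar()
plt.show()
```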

Watch MelMage in Action

See the complete workflow from audio upload to neural network visualization

Model Internals

A custom ResNet-inspired architecture built from scratch in PyTorch, with every layer designed for audio feature extraction.

3-4-6-3 Residual Blocks: layer distribution matching ResNet-34
~33 Conv Layers: each of the 16 residual blocks contains 2 Conv2d layers, plus the 7x7 stem
~23M Parameters: trainable model weights
Feature Maps at 5 Depths: captured at conv1 and layer1-4

Architecture Details

Input Processing: 7x7 conv, stride 2, 64 filters → MaxPool
Residual Layers: Layer1 64 → Layer2 128 → Layer3 256 → Layer4 512 channels, with downsampling at the start of each layer except Layer1
Global Pooling: AdaptiveAvgPool2d → Flatten → Dropout(0.5)
Classifier: Linear(512 → 50 classes)
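Putting the table above together, a condensed PyTorch sketch of such a model (a reconstruction from the specs listed here, not MelMage's verbatim source):

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 convs with a skip connection, as in ResNet-34."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.down = None
        if stride != 1 or in_ch != out_ch:  # match shapes for the skip path
            self.down = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        identity = x if self.down is None else self.down(x)
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + identity)

def make_layer(in_ch, out_ch, blocks, stride):
    layers = [BasicBlock(in_ch, out_ch, stride)]
    layers += [BasicBlock(out_ch, out_ch) for _ in range(blocks - 1)]
    return nn.Sequential(*layers)

class AudioResNet(nn.Module):
    """3-4-6-3 blocks, single-channel spectrogram input, 50 classes."""
    def __init__(self, num_classes=50):
        super().__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 64, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(3, 2, 1))
        self.layer1 = make_layer(64, 64, 3, stride=1)
        self.layer2 = make_layer(64, 128, 4, stride=2)
        self.layer3 = make_layer(128, 256, 6, stride=2)
        self.layer4 = make_layer(256, 512, 3, stride=2)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Dropout(0.5), nn.Linear(512, num_classes))

    def forward(self, x):
        x = self.conv1(x)
        x = self.layer4(self.layer3(self.layer2(self.layer1(x))))
        return self.head(x)
```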

Training Specifications

Optimizer: AdamW with weight decay 0.01
Learning Rate: OneCycleLR, 0.0005 → 0.002 → 0.0005
Regularization: Dropout(0.5) + label smoothing (0.1)
Data Augmentation: FrequencyMasking + TimeMasking + Mixup
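A sketch of that training configuration in PyTorch. The steps_per_epoch, mask sizes, and mixup alpha are assumed placeholders; the div_factor/final_div_factor values are chosen so the schedule reproduces the 0.0005 → 0.002 → 0.0005 curve above:

```python
import torch
import torchaudio.transforms as T

model = AudioResNet()  # from the architecture sketch above
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.01)
# div_factor=4 starts at 0.002/4 = 0.0005; final_div_factor=1 ends there too
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=2e-3, epochs=100, steps_per_epoch=100,  # steps assumed
    div_factor=4, final_div_factor=1)
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

# SpecAugment-style masking on training spectrograms (mask sizes assumed)
augment = torch.nn.Sequential(
    T.FrequencyMasking(freq_mask_param=24),
    T.TimeMasking(time_mask_param=48),
)

def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    """Blend random pairs of examples; blend the loss with the same lambda."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], y, y[idx], lam
```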

Built with Cutting-Edge Tech

MelMage leverages the latest in machine learning, serverless infrastructure, and modern web development to deliver a world-class experience.

Machine Learning

PyTorch: deep learning framework
ResNet Architecture: convolutional neural network design
Mel-Spectrograms: audio feature extraction
TensorBoard: model visualization

Infrastructure

Modal Labs: serverless GPU compute
FastAPI: high-performance API

Frontend

Next.js: React framework
React: UI library
Tailwind CSS: styling framework
TypeScript: type safety

Data & Training

ESC-50 Dataset: environmental sound classification
Audio Augmentation: data enhancement techniques
Cross-validation: model validation
Mel-frequency Analysis: audio preprocessing

Ready to Explore AI Transparency?

Start understanding how your audio CNN actually works. Upload an audio file and see every layer in action.