This document proposes a Pivot Vector Space Approach for automatic audio-video mixing. The approach uses high-level perceptual descriptors of audio and video to select, for a given video shot, the audio clip that matches it best. Aesthetic features of both video and music are mapped into a common pivot vector space, where segments are matched according to cinematographic heuristics. The technique gives amateur video editors a way to automatically add music to their video shots, selected on the principles a professional would apply.
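
The matching idea summarized above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the proposal: the pivot dimensions (`energy`, `tempo`, `brightness`), the descriptor names (`motion`, `cut_rate`, `luminance`, `loudness`, `beat_rate`, `pitch_height`), the one-to-one mappings into the pivot space, and the use of cosine similarity as the matching score. The actual approach may use different attributes, mappings, and a different similarity measure.

```python
import math

# Hypothetical pivot dimensions, chosen only for illustration.
PIVOT_DIMS = ("energy", "tempo", "brightness")


def video_to_pivot(shot):
    """Map assumed video descriptors (each in [0, 1]) to a pivot vector."""
    return (shot["motion"], shot["cut_rate"], shot["luminance"])


def audio_to_pivot(clip):
    """Map assumed audio descriptors (each in [0, 1]) to a pivot vector."""
    return (clip["loudness"], clip["beat_rate"], clip["pitch_height"])


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_clip(shot, clips):
    """Return the audio clip whose pivot vector is closest to the shot's."""
    target = video_to_pivot(shot)
    return max(clips, key=lambda clip: cosine(target, audio_to_pivot(clip)))
```

With such a sketch, a fast-moving, rapidly cut shot would be paired with a loud, fast clip rather than a quiet, slow one, because both land near each other in the shared space even though their raw descriptors are not directly comparable.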