The document describes a saliency-based video object extraction (VOE) framework that automatically identifies foreground objects in videos without user interaction or training data. It uses visual and motion saliency to distinguish foreground from background and combines these features in a conditional random field (CRF). The proposed method shows promising results in preserving spatial continuity and temporal consistency across a variety of video types.
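
To make the idea of combining two saliency cues in a conditional random field more concrete, here is a minimal single-frame sketch. It is not the framework's actual formulation: the equal-weight blend, the negative-log unary cost, the Potts pairwise penalty, the iterated conditional modes (ICM) solver, and the names `combine_saliency`, `icm_segment`, `w_visual`, and `smoothness` are all assumptions chosen for illustration. It also covers only spatial smoothing within one frame, not temporal consistency across frames.

```python
import math


def combine_saliency(visual, motion, w_visual=0.5):
    """Blend two saliency maps (nested lists, values in [0, 1]) pixel by pixel.

    The linear blend and its weight are illustrative assumptions.
    """
    return [
        [w_visual * v + (1.0 - w_visual) * m for v, m in zip(v_row, m_row)]
        for v_row, m_row in zip(visual, motion)
    ]


def icm_segment(fg_score, smoothness=1.0, n_iters=5):
    """Label each pixel 1 (foreground) or 0 (background) with ICM.

    Unary cost: negative log of the blended score for each label.
    Pairwise cost: Potts penalty for each 4-connected neighbour that disagrees.
    """
    eps = 1e-6
    height, width = len(fg_score), len(fg_score[0])
    # Clamp scores so the logarithms below stay finite.
    prob = [[min(max(p, eps), 1.0 - eps) for p in row] for row in fg_score]
    # Start from a plain threshold of the blended score.
    labels = [[1 if p > 0.5 else 0 for p in row] for row in prob]
    for _ in range(n_iters):
        changed = False
        for y in range(height):
            for x in range(width):
                neighbours = [
                    labels[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < height and 0 <= nx < width
                ]
                n_fg = sum(neighbours)
                n_bg = len(neighbours) - n_fg
                # Each label pays its unary cost plus one penalty per
                # neighbour that currently holds the other label.
                cost_bg = -math.log(1.0 - prob[y][x]) + smoothness * n_fg
                cost_fg = -math.log(prob[y][x]) + smoothness * n_bg
                new_label = 1 if cost_fg < cost_bg else 0
                if new_label != labels[y][x]:
                    labels[y][x] = new_label
                    changed = True
        if not changed:
            break  # No pixel changed in a full sweep, so ICM has converged.
    return labels
```

In this sketch, a pixel that is visually salient but shows little motion gets only a weak foreground score, so the pairwise term can flip it to background when all of its neighbours are background. Raising `smoothness` favours larger uniform regions, while setting it to 0 reduces the result to thresholding the blended score.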