1) The document covers Watson capabilities, including microservices for language, speech, vision, and data, as well as embodied cognition. It illustrates them with four use cases: speech-to-text with multiple speakers, a school navigator chatbot, an expertise finder, and multimedia enrichment.
2) Live demos are shown for speech-to-text with diarization, speech-to-speech translation, a school finder application, and multimedia processing of video and audio.
3) The multimedia enrichment pipeline is described in detail, outlining how video and audio inputs are passed through several Watson APIs to extract metadata such as transcripts, entities, keywords, and visual recognition results.
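
The enrichment pipeline in point 3 can be pictured as a chain of stages that each add one kind of metadata to a media item. The sketch below is illustrative only and assumes a simple sequential design. `MediaItem`, `enrich`, and the three stage functions are hypothetical names, and the stages return placeholder values rather than calling any Watson API, since the summary does not say which calls the pipeline actually makes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MediaItem:
    """Hypothetical container for one input file and its extracted metadata."""
    source: str
    kind: str  # "video" or "audio"
    metadata: Dict[str, object] = field(default_factory=dict)


# A stage takes a media item and adds metadata to it in place.
Stage = Callable[[MediaItem], None]


def transcribe(item: MediaItem) -> None:
    # Placeholder: a real stage would send the audio track to a
    # speech-to-text service and store the returned transcript.
    item.metadata["transcript"] = f"<transcript of {item.source}>"


def extract_entities_and_keywords(item: MediaItem) -> None:
    # Placeholder: a real stage would run language analysis on the
    # transcript. Empty lists stand in for the results here.
    item.metadata["entities"] = []
    item.metadata["keywords"] = []


def recognize_visuals(item: MediaItem) -> None:
    # Visual recognition applies only to inputs that have frames.
    if item.kind == "video":
        item.metadata["visual_recognition"] = []


DEFAULT_STAGES: List[Stage] = [
    transcribe,
    extract_entities_and_keywords,
    recognize_visuals,
]


def enrich(item: MediaItem, stages: List[Stage]) -> MediaItem:
    """Run each stage in order and return the enriched item."""
    for stage in stages:
        stage(item)
    return item


if __name__ == "__main__":
    video = enrich(MediaItem("talk.mp4", "video"), DEFAULT_STAGES)
    print(sorted(video.metadata))
```

Keeping each stage as a plain function means an audio-only input can reuse the same list of stages, with the visual step simply adding nothing.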