An algorithm is proposed for automatic summarization of multimedia content by segmenting digital video into semantic scenes using HMMs. Various multi-modal low-level features are extracted to determine state transitions in HMMs for summarization. Advantage of using different model topologies and observation sets in order to segment different content types is emphasized and verified by simulations. Performance of the proposed algorithm is also compared with a deterministic scene segmentation method. A better performance is observed due to the flexibility of HMMs in modeling different content types.