In this paper, a structural and event-based multimodal video data model (SEBM) is proposed. For video database systems, SEBM supports three modalities (visual, auditory, and textual) and dissolves them into a single structure. This dissolving procedure is intended to mimic how humans interpret video data. The SEBM video data model is used to answer content-based, spatio-temporal, and fuzzy queries about video data. A prototype SEBM system was developed to evaluate the practical use of the model for storing and querying video data.