People can benefit from today's video archives of huge sizes only through appropriate and effective ways of querying the video data. In order to query the video data through high-level semantic entities such as objects, events, and relations, these entities should be properly extracted and the corresponding video shots should be annotated accordingly. Video texts, which comprise the caption texts on the frames as well as transcription texts obtained through automatic speech recognition techniques, are valuable sources of information for semantic modeling of the videos. In this paper, we present an approach for the extraction of semantic objects from videos by utilizing lexical resources along with the identification of coreference chains in the corresponding video texts. Coreference is a phenomenon in natural language texts where a number of entities in the text refer to the same real world entity. Therefore, while the domain-specific lexical resources aid in the determination of salient entities in the video text, the identification of coreference chains prevents the superfluous extraction of the same underlying entities due to their different surface forms in the video texts. The proposed approach is significant for its being the first attempt to address the importance of coreference phenomenon in video texts for precise entity extraction during the semantic modeling of news videos with a hands-on application. The approach has been evaluated on Turkish political news texts from the METU Turkish corpus and a number of evaluation problems faced such as sparseness of annotated evaluation data for Turkish are also pointed out together with further research directions to pursue.