An event model learning framework is proposed for indoor and outdoor surveillance applications in order to reduce human intervention in the modeling process. The resulting framework makes event detection and recognition flexible and independent of domain and scene. A set of predicate types is introduced that defines basic spatio-temporal relations and interactions between objects and people in the videos, and a set of policies is proposed for choosing the appropriate predicates during event learning. First, the video data is converted into a set of Markov Logic Network (MLN) predicates. Then these policies, together with a discriminative weight-learning algorithm, are used to infer the relevance of the predicates to the events being queried. Finally, the event model is generated. The proposed framework is applied to the generation of three different event models from the CANTATA dataset and our own datasets; in particular, model generation for the left-object event is discussed in detail.
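The first step of the pipeline, converting tracked video data into ground MLN predicates, might be sketched as follows. This is a minimal illustration under assumed inputs: the function name `ground_atoms`, the predicate name `Close`, the track format, and the distance threshold are all hypothetical, not the paper's actual predicate set.

```python
import math

def ground_atoms(tracks, dist_thresh=50.0):
    """Convert per-frame object positions into MLN-style ground atoms.

    tracks: {object_id: {frame: (x, y)}} -- assumed tracker output format.
    Emits a ground atom Close(a, b, t) whenever two tracked objects are
    within dist_thresh pixels of each other at frame t.
    """
    atoms = []
    ids = sorted(tracks)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # Only frames where both objects are observed.
            common = sorted(set(tracks[a]) & set(tracks[b]))
            for t in common:
                (xa, ya), (xb, yb) = tracks[a][t], tracks[b][t]
                if math.hypot(xa - xb, ya - yb) <= dist_thresh:
                    atoms.append(f"Close({a},{b},{t})")
    return atoms
```

Atoms produced this way would then be fed, together with event labels, to a discriminative weight learner (e.g. an MLN toolkit such as Alchemy), which assigns high weights to the predicates relevant to the queried event.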