Multimedia sensor networks are becoming popular because they can extract detailed information about the environment. However, the unique properties of multimedia data delivery pose novel challenges for resource-constrained sensor networks. Because multimedia frames demand high bandwidth, transmitting the raw data collected at sensor nodes is costly. On the other hand, processing limitations prohibit the use of sophisticated multimedia processing at individual nodes to reduce the amount of data that needs to be communicated. In this paper, we present a novel framework for multimedia processing in wireless sensor networks that considers the needs of surveillance video applications. In our framework, automatically extracted moving objects are treated as intruder events, and their positions are exploited for efficient communication. We then jointly process the collected data at the sink to identify events using fuzzy memberships and to decide which multimedia data should actually be sent from the sensors to the sink.
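
To make the sink-side idea concrete, the following is a minimal sketch and not the method developed in this paper. It assumes that each sensor reports an estimated object position in a shared 2-D coordinate frame, that a Gaussian function of distance serves as the fuzzy membership for "these two reports describe the same event", and that one sensor per event is asked to send its frames. The function names, the `scale` and `threshold` parameters, and the first-reporter selection rule are all hypothetical choices made for illustration.

```python
import math

# Assumption: membership that two reported positions belong to the same event,
# modeled as a Gaussian of their distance (1.0 when they coincide).
def same_event_membership(p, q, scale=5.0):
    d = math.dist(p, q)
    return math.exp(-(d / scale) ** 2)

# Assumption: greedy grouping. A report joins the first existing event whose
# first report it matches with membership >= threshold; otherwise it opens
# a new event.
def group_reports(reports, threshold=0.5, scale=5.0):
    events = []
    for sensor_id, pos in reports:
        for event in events:
            if same_event_membership(pos, event[0][1], scale) >= threshold:
                event.append((sensor_id, pos))
                break
        else:
            events.append([(sensor_id, pos)])
    return events

# Assumption: the sink requests frames from one sensor per event, here simply
# the first sensor that reported it.
def select_senders(events):
    return [event[0][0] for event in events]

# Hypothetical reports: (sensor id, estimated object position).
reports = [("s1", (10.0, 10.0)), ("s2", (11.0, 9.5)), ("s3", (40.0, 5.0))]
events = group_reports(reports)
senders = select_senders(events)
print(senders)
```

In this toy input, sensors `s1` and `s2` report nearby positions and are grouped into one event, so only one of them would be asked to transmit, while `s3` forms a separate event. A practical selection rule would likely weigh factors such as view quality or residual energy rather than report order.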