The use of wireless multimedia sensor networks (WMSNs) for surveillance applications has attracted the interest of many researchers. Like traditional sensor networks, WMSNs are easy to deploy and operate. Including multimedia devices in wireless sensor networks makes it possible to provide users with more meaningful data than scalar sensor-based systems can provide alone; however, producing, storing, processing, analyzing, and transmitting multimedia data in sensor networks requires consideration of additional constraints, including energy, processing power, storage capacity, and communication. Furthermore, because multimedia sensors produce much more data than scalar sensors, more manpower is required to analyze the data. To address these constraints and challenges, this paper proposes a system architecture and a set of procedures for WMSNs that facilitate automatic classification of moving objects using scalar and multimedia sensors. Methods and standards for detecting and classifying a moving object, and for transmitting the results, are described in detail. The hardware of each sensor node includes a built-in camera, a passive infrared motion sensor, a vibration sensor, and an acoustic sensor. An application implementing the proposed methods was developed and embedded in the multimedia sensor node. In addition, a sink station was set up as a server to collect the data produced by the sensor network. The classification performance of the application was tested using video recorded by the sensor node, and the effect of the proposed methods on power consumption was measured. The experimental results show that the proposed approach is sufficiently lightweight to be used for real-world surveillance applications.