Wireless Multimedia Sensor Networks (WMSNs) are characterized by a large number of resource-constrained camera sensors. In remote surveillance applications, these resource constraints necessitate lightweight solutions to traditional problems such as object localization and real-time tracking. In this paper, we propose an energy-efficient object localization and multiple-object tracking scheme for WMSNs. Object localization is performed by the individual camera sensors. To this end, the approach first extracts the detected object from the video frame and finds its boundary using frame differencing. The location of the object is then estimated from the camera sensor's location, the distance of the object from the camera, and the camera and frame size properties. After a detected object is localized, its boundary information is used to perform fuzzy object classification at the camera sensor. Finally, only limited information, namely the location and classification of the object, is transmitted to the sink node for use in real-time object tracking. In this way, without receiving raw video data from the camera sensors, the sink can identify a specific object even when information about it arrives from several camera sensors at different times, and can determine the object's path for tracking. Extensive simulations show that the proposed approach provides higher localization and identification accuracy at little cost to the camera sensors, even under low camera coverage.
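As a rough illustration of the frame-differencing step mentioned above, the following minimal sketch finds the bounding box of the pixels that differ from a background frame. It is not the paper's implementation: the representation of frames as nested lists of grayscale intensities, the function name `bounding_box`, and the threshold value of 30 are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Assumption: a grayscale frame is a list of rows of pixel intensities (0-255).
Frame = List[List[int]]


def bounding_box(background: Frame, current: Frame,
                 threshold: int = 30) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of the pixels whose intensity differs
    from the background by more than `threshold`, or None if nothing changed.

    The threshold of 30 is an illustrative value, not one taken from the paper.
    """
    # Rows that contain at least one changed pixel.
    rows = [r for r, (b_row, c_row) in enumerate(zip(background, current))
            if any(abs(c - b) > threshold for b, c in zip(b_row, c_row))]
    if not rows:
        return None
    # Columns that contain at least one changed pixel within those rows.
    cols = [c for c in range(len(current[0]))
            if any(abs(current[r][c] - background[r][c]) > threshold
                   for r in rows)]
    return rows[0], cols[0], rows[-1], cols[-1]
```

Under these assumptions, a camera sensor would only need to keep the four box coordinates for the later localization and classification steps, rather than the frame itself.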