In vision-based robot actuation, manipulating objects in an environment requires background separation and object selection, two fundamental tasks that must be carried out quickly and efficiently. In this paper, we propose a method that segments candidate object locations in the scene and recognizes the objects via a local-point-based representation. Using the 3D structure of the scene obtained from a time-of-flight camera, we eliminate background regions under the assumption that objects are placed on planar surfaces. Object recognition is then performed using scale-invariant features extracted from high-resolution images captured by a standard camera. Preliminary experimental results show that the proposed system is promising for background segmentation and object recognition, especially in service robot environments, and that it could also serve as a pre-processing step in path planning and 3D scene map generation.
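The planar-surface assumption can be illustrated with a minimal sketch. The abstract does not say how the supporting plane is estimated, so the RANSAC plane fit below is an assumption chosen for illustration, not necessarily the method used in the paper. The function names (`fit_plane_ransac`, `remove_planar_background`), the distance threshold, and the noise-free synthetic scene are all hypothetical.

```python
import numpy as np


def fit_plane_ransac(points, n_iters=200, threshold=0.01, seed=0):
    """Estimate the dominant plane n.x + d = 0 in an (N, 3) point cloud.

    Returns (normal, d, inlier_mask). RANSAC is an assumed choice here;
    the source text only states that objects rest on planar surfaces.
    """
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(n_iters):
        # Three distinct points define a candidate plane.
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:
            continue  # (nearly) collinear sample, no unique plane
        normal = normal / norm
        d = -normal.dot(sample[0])
        # Points closer to the plane than the threshold count as inliers.
        mask = np.abs(points @ normal + d) < threshold
        if mask.sum() > best_mask.sum():
            best_mask, best_model = mask, (normal, d)
    if best_model is None:
        raise ValueError("no non-degenerate sample found")
    return best_model[0], best_model[1], best_mask


def remove_planar_background(points, **kwargs):
    """Drop the points on the dominant plane and keep the rest."""
    _, _, on_plane = fit_plane_ransac(points, **kwargs)
    return points[~on_plane]


# Synthetic, noise-free scene (illustrative only): a table at z = 0
# and a small object whose points lie 5 to 15 cm above it.
rng = np.random.default_rng(1)
table = np.column_stack([rng.uniform(0, 1, 500),
                         rng.uniform(0, 1, 500),
                         np.zeros(500)])
obj = np.column_stack([rng.uniform(0.4, 0.5, 50),
                       rng.uniform(0.4, 0.5, 50),
                       rng.uniform(0.05, 0.15, 50)])
scene = np.vstack([table, obj])

foreground = remove_planar_background(scene)
print(foreground.shape)
```

With real time-of-flight data the depth noise would require tuning the threshold, and the remaining points would still need to be clustered into candidate object regions before recognition.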