The performance of object recognition and classification on remote sensing imagery depends heavily on the quality of the extracted features, the amount of labelled data and the priors defined for contextual models. In this study, we examine representation learning opportunities for remote sensing. First, we address the localization of contextual cues for complex object detection using disentangling factors learnt from a small amount of labelled data. The complex object, which consists of several sub-parts, is further represented under the Conditional Markov Random Fields framework. As a second task, we analyse end-to-end target detection using convolutional sparse auto-encoders (CSA) trained on a large amount of unlabelled data. The proposed methodologies are tested on the complex airfield detection problem using Conditional Random Fields, and on the recognition of dispersal areas, park areas, taxi routes and airplanes using CSA. The method is also tested on the detection of dry docks in harbours. The performance of the proposed method is compared with standard feature engineering methods and is found to be competitive with currently used rule-based and supervised methods.