In this study, a new building detection framework for monocular satellite images, called self-supervised decision fusion (SSDF), is proposed. The model is based on the idea of self-supervision, which aims to generate training data automatically from each individual test image, without human interaction. This approach allows us to exploit the advantages of supervised classifiers in a fully automated framework. We combine our previous supervised and unsupervised building detection frameworks into a self-supervised learning architecture. In doing so, we borrow the major strength of the unsupervised approach to obtain one of the most important cues: the relation between a building and its cast shadow. This information is then used to meet the need for training sample selection. Finally, an ensemble learning algorithm, called fuzzy stacked generalization (FSG), fuses a set of supervised classifiers trained on the automatically generated dataset with various shape, color, and texture features. We assessed the building detection performance of the proposed approach over 19 test sites and compared our results with state-of-the-art algorithms. Our experiments show that the supervised building detection method requires more than 30% of the ground-truth (GT) training data to reach the performance of the proposed SSDF method. Furthermore, the SSDF method increases the F-score by 2 percentage points (p.p.) on average compared with the unsupervised method.
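For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F-score can be computed from a predicted building mask and a GT mask. It assumes a pixel-based evaluation with equal weighting of precision and recall (the F1 form), which the text above does not specify; the function name `f_score` and the toy masks are hypothetical and serve only as illustration.

```python
def f_score(predicted, ground_truth):
    """Return (precision, recall, F-score) for two binary masks.

    Both masks are lists of rows holding 0/1 values of identical shape.
    Assumption: pixel-based evaluation with the balanced F1 definition.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1  # building pixel correctly detected
            elif p and not g:
                fp += 1  # false alarm
            elif g and not p:
                fn += 1  # missed building pixel

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Toy 3x3 example (illustrative values, not data from the study).
gt_mask = [[1, 1, 0],
           [1, 1, 0],
           [0, 0, 0]]
pred_mask = [[1, 1, 1],
             [1, 0, 0],
             [0, 0, 0]]

precision, recall, f = f_score(pred_mask, gt_mask)
print(f"precision={precision:.2f} recall={recall:.2f} F={f:.2f}")
```

Under this definition, a gain of 2 p.p. is an absolute difference in the F-score expressed as a percentage, for example a change from 80% to 82%, rather than a relative improvement of 2%.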