Biclustering groups the samples and the features of a dataset simultaneously. A biclustering therefore yields both clusters of samples and clusters of features, where the feature clusters determine how the samples are partitioned into the underlying classes. We focus on a supervised biclustering problem that leads to supervised feature selection. We formulate this problem as an optimization model that selects a small subset of features while maximizing classification accuracy, and we solve the model with exact and inexact solution methods based on optimization techniques. We evaluate our approach on microarray cancer datasets.
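To make the idea of "feature clusters determining the sample partition" concrete, the following is a minimal, hypothetical sketch rather than the model described here. It assumes: (1) each feature is assigned to the class in which its mean value is highest; (2) a sample is classified into the class whose selected features have the highest average value for that sample; (3) accuracy is measured on the training data only. The names `feature_classes`, `exact_select`, and `greedy_select` are illustrative. Exhaustive enumeration stands in for an exact method, and greedy forward selection stands in for an inexact one.

```python
from itertools import combinations


def feature_classes(X, y, n_classes):
    """Assumption: assign each feature to the class where its mean value is highest."""
    assignment = []
    for j in range(len(X[0])):
        means = []
        for k in range(n_classes):
            vals = [row[j] for row, label in zip(X, y) if label == k]
            means.append(sum(vals) / len(vals))
        # Ties go to the lowest class index.
        assignment.append(max(range(n_classes), key=lambda k: means[k]))
    return assignment


def predict(x, selected, assignment, n_classes):
    """Classify one sample by the selected feature cluster with the highest average value."""
    best_class, best_score = None, float("-inf")
    for k in range(n_classes):
        feats = [j for j in selected if assignment[j] == k]
        if not feats:
            continue  # class k has no selected features, so it cannot be predicted
        score = sum(x[j] for j in feats) / len(feats)
        if score > best_score:
            best_class, best_score = k, score
    return best_class


def accuracy(X, y, selected, assignment, n_classes):
    """Training accuracy of the rule above for a given feature subset."""
    hits = sum(predict(x, selected, assignment, n_classes) == t for x, t in zip(X, y))
    return hits / len(y)


def exact_select(X, y, n_classes, max_size):
    """Exhaustive search: the smallest subset (size <= max_size) with the best accuracy."""
    assignment = feature_classes(X, y, n_classes)
    best = (-1.0, ())
    for size in range(1, max_size + 1):  # smaller subsets are tried first
        for subset in combinations(range(len(X[0])), size):
            acc = accuracy(X, y, subset, assignment, n_classes)
            if acc > best[0]:  # strict improvement keeps the smaller subset on ties
                best = (acc, subset)
    return best


def greedy_select(X, y, n_classes, max_size):
    """Greedy forward selection: add the feature that most improves accuracy."""
    assignment = feature_classes(X, y, n_classes)
    selected, best_acc = [], -1.0
    remaining = set(range(len(X[0])))
    while remaining and len(selected) < max_size:
        # Prefer higher accuracy, then the lower feature index on ties.
        acc, j = max(
            ((accuracy(X, y, selected + [j], assignment, n_classes), j) for j in remaining),
            key=lambda t: (t[0], -t[1]),
        )
        if acc <= best_acc:
            break  # no candidate improves accuracy, so stop early
        selected.append(j)
        remaining.remove(j)
        best_acc = acc
    return best_acc, tuple(selected)
```

On a toy matrix with six samples in two classes, where feature 0 is high in class 0, feature 1 is high in class 1, and features 2 and 3 are uninformative, both searches should select features 0 and 1. The exhaustive search scales combinatorially with the number of features, which is why inexact methods matter for microarray data with thousands of genes.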