One of the major drawbacks of brain decoding from functional magnetic resonance imaging (fMRI) data is the very high dimensionality of the feature space, which consists of thousands of voxels in a sequence of brain volumes recorded during a cognitive stimulus. In this study, we propose a new architecture, called the Sparse Temporal Mesh Model (STMM), which reduces the dimensionality of the feature space by combining voxel selection methods with the mesh learning method. We first select the "most discriminative" voxels using state-of-the-art feature selection methods, namely Recursive Feature Elimination (RFE), one-way Analysis of Variance (ANOVA), and Mutual Information (MI). After selecting the most informative voxels, we form a star mesh around each selected voxel with its functional neighbors. Then, we estimate the mesh arc weights, which represent the relationships among the voxels within a neighborhood. We further prune the estimated arc weights using ANOVA to discard redundant relationships among the voxels. By doing so, we obtain a sparse representation of information in the brain for discriminating cognitive states. Finally, we train k-Nearest Neighbor (kNN) and Support Vector Machine (SVM) classifiers on the feature vectors of sparse mesh arc weights. We test the STMM architecture on a visual object recognition experiment. Our results show that forming meshes around the selected voxels leads to a substantial increase in classification accuracy compared to forming meshes around all the voxels in the brain. Furthermore, pruning the mesh arc weights by ANOVA alleviates the curse of dimensionality and leads to a slight increase in classification performance. We also find that the resulting networks of sparse temporal meshes are quite similar under all three voxel selection methods, namely RFE, ANOVA, and MI.
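The pipeline described above can be sketched end to end on synthetic data. This is a minimal illustration, not the authors' implementation: the trial structure, window length, number of selected voxels, neighborhood size, and the use of trial-averaged correlations to define "functional neighbors" are all assumptions made for the sketch. ANOVA stands in for the voxel selection step (RFE or MI could be substituted), arc weights are estimated per trial by ordinary least squares over the temporal window, and a linear SVM serves as the final classifier.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for fMRI data: each trial is a short temporal
# window of brain volumes (n_trials x window_len x n_voxels).
n_trials, win, n_voxels = 60, 10, 100
X = rng.standard_normal((n_trials, win, n_voxels))
y = rng.integers(0, 2, n_trials)
X[y == 1, :, :8] += 1.0  # inject class signal into a few voxels

# Step 1: voxel selection via one-way ANOVA on trial-averaged responses
# (RFE or mutual information could be swapped in here).
X_mean = X.mean(axis=1)
n_select, p = 15, 4  # assumed mesh size p and number of seed voxels
sel_idx = SelectKBest(f_classif, k=n_select).fit(X_mean, y) \
                                            .get_support(indices=True)

# Step 2: functional neighbors of each seed voxel, taken here as the
# p most correlated voxels across trials (an assumption of this sketch).
corr = np.corrcoef(X_mean.T)
neighbors = {}
for i in sel_idx:
    order = np.argsort(-np.abs(corr[i]))
    neighbors[i] = order[order != i][:p]

# Step 3: per trial, estimate star-mesh arc weights by least squares
# over the temporal window: voxel_i(t) ≈ sum_j w_ij * voxel_j(t).
F = np.zeros((n_trials, n_select * p))
for s in range(n_trials):
    row = []
    for i in sel_idx:
        A = X[s][:, neighbors[i]]                      # win x p design
        w, *_ = np.linalg.lstsq(A, X[s][:, i], rcond=None)
        row.append(w)
    F[s] = np.concatenate(row)

# Step 4: prune the arc weights with ANOVA to get a sparse mesh
# representation (keeping half of the weights is an arbitrary choice).
F_sparse = SelectKBest(f_classif, k=n_select * p // 2).fit_transform(F, y)

# Step 5: classify cognitive states from the sparse arc weights.
acc = cross_val_score(SVC(kernel="linear"), F_sparse, y, cv=5).mean()
print(f"CV accuracy: {acc:.2f}")
```

A kNN classifier can be evaluated on the same `F_sparse` features by replacing the `SVC` with `sklearn.neighbors.KNeighborsClassifier`, mirroring the two classifiers compared in the study.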