We propose a method for computing disparity maps from a multi-modal stereo-vision system composed of an infrared-visible camera pair. The method uses mutual information (MI) as the basic similarity measure where a segment-based adaptive windowing mechanism is proposed along with a novel MI computation surface with joint prior probabilities incorporated. The computed cost confidences are aggregated using a novel adaptive cost aggregation method, and the resultant minimum cost disparities in segments are plane-fitted in their respective segments which are iteratively refined by merging and splitting segments reducing dependency to initial segmentation. Finally, the estimated disparities are iteratively refined by repeating all the steps. On an artificially-modified version of the Middlebury dataset and a Kinect dataset that we created in this study, we show that (i) our proposal improves the quality of existing MI formulation, and (ii) our method can provide depth comparable to the quality of Kinect depth data. (C) 2014 Elsevier Inc. All rights reserved.