Video compression systems use prediction to reduce the temporal and spatial redundancies present in video sequences. Standard video coding systems apply either temporal or spatial prediction on a per-block basis: if temporal prediction is used, spatial information is ignored, and if spatial prediction is used, temporal information is ignored. While computationally efficient, this approach does not exploit temporal and spatial information jointly. In this letter, we provide a framework in which the available temporal and spatial information is combined to perform joint spatial and temporal prediction in video coding. Experimental results obtained from one sample realization of this framework demonstrate its potential.
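To make the contrast concrete, the following is a minimal illustrative sketch (not the letter's actual scheme) of per-block prediction: a temporal predictor that reuses the co-located block from the previous frame, a spatial DC-style predictor that averages reconstructed neighbor pixels, and a hypothetical joint predictor that blends the two with an assumed weight `alpha`. All function names and the blending rule are assumptions for illustration only.

```python
def temporal_pred(prev_block):
    """Temporal prediction: reuse the co-located block from the previous frame."""
    return [row[:] for row in prev_block]


def spatial_pred(left_col, top_row, h, w):
    """Spatial (DC-style) prediction: mean of reconstructed left/top neighbors."""
    neighbors = list(left_col) + list(top_row)
    dc = sum(neighbors) / len(neighbors) if neighbors else 128  # 128: assumed default
    return [[dc] * w for _ in range(h)]


def joint_pred(prev_block, left_col, top_row, alpha=0.5):
    """Hypothetical joint predictor: weighted blend of temporal and spatial
    predictions; `alpha` is an illustrative parameter, not from the letter."""
    h, w = len(prev_block), len(prev_block[0])
    t = temporal_pred(prev_block)
    s = spatial_pred(left_col, top_row, h, w)
    return [[alpha * t[i][j] + (1 - alpha) * s[i][j] for j in range(w)]
            for i in range(h)]
```

A standard codec would pick either `temporal_pred` or `spatial_pred` per block; the framework described in the letter instead uses both sources of information, which the blend above only loosely approximates.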