In the aftermath of the 26 December 2004 tsunami, several quantitative predictions of inundation for historic events were presented at international meetings that differed substantially from the corresponding well-established paleotsunami measurements. These significant differences attracted press attention, reducing the credibility of all inundation modeling efforts. Without exception, the predictions were made using models that had not been benchmarked. Since an increasing number of nations are now developing tsunami mitigation plans, it is essential that all numerical models used in emergency planning be subjected to validation (the process of ensuring that the model accurately solves the parent equations of motion) and verification (the process of ensuring that the model represents geophysical reality). Here, we discuss analytical, laboratory, and field benchmark tests with which tsunami numerical models can be validated and verified. This is a continuous process; even proven models must be subjected to additional testing as new knowledge and data are acquired. To date, only a few existing numerical models have met current standards, and these models remain the only defensible choice for real-world forecasts, whether short-term or long-term. Short-term forecasts involve data assimilation to improve forecast-system robustness, and this requires additional benchmarks, which we also discuss here. This painstaking process may appear onerous, but it is the only defensible methodology when human lives are at stake. Model standards and procedures as described here have been adopted for implementation in the U.S. tsunami forecasting system under development by the National Oceanic and Atmospheric Administration; they are also being adopted by the U.S. Nuclear Regulatory Commission and by the appropriate subcommittees of the Intergovernmental Oceanographic Commission of UNESCO.
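To illustrate what an analytical benchmark comparison of this kind looks like in practice, the sketch below checks a model's maximum runup against the solitary-wave runup law of Synolakis (1987), R/d = 2.831 (cot β)^{1/2} (H/d)^{5/4}, one of the standard analytical tests for inundation models. The specific model prediction and tolerance used here are illustrative assumptions, not values from the text.

```python
import math


def analytical_runup(h_over_d: float, beach_slope_deg: float) -> float:
    """Non-dimensional runup R/d for a non-breaking solitary wave of
    height-to-depth ratio H/d on a plane beach of the given slope,
    from the Synolakis (1987) runup law."""
    cot_beta = 1.0 / math.tan(math.radians(beach_slope_deg))
    return 2.831 * math.sqrt(cot_beta) * h_over_d ** 1.25


def benchmark_relative_error(model_runup: float, h_over_d: float,
                             beach_slope_deg: float) -> float:
    """Relative error of a model's predicted runup versus the
    analytical benchmark value."""
    reference = analytical_runup(h_over_d, beach_slope_deg)
    return abs(model_runup - reference) / reference


# Example: H/d = 0.0185 wave on a 1:19.85 plane beach (the canonical
# laboratory configuration), with a made-up model prediction.
slope_deg = math.degrees(math.atan(1.0 / 19.85))
model_prediction = 0.080  # hypothetical model output for R/d
error = benchmark_relative_error(model_prediction, 0.0185, slope_deg)
print(f"relative error vs. analytical benchmark: {error:.1%}")
```

A real validation exercise would compare full time series and spatial profiles, not a single scalar, but the pass/fail logic is the same: the model result must fall within a stated tolerance of the benchmark.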