The advance growth of cybercrime in recent years especially in high critical networks becomes an urgent issue to the security authorities. They compromised computer system, targeting especially to government sector, ecommerce and banking networks rigorously and made it difficult to detect the perpetrators. Attackers used a powerful technique, by embedding a malicious code in a normal webpage that resulted harder detection. Early detection and act on such threats in a timely manners is vital in order to reduce the losses which have caused billions of dollars every year. Previously, the detection of malicious is done through the use of blacklisting repository. The repository or database was compiled over time through crowd sourcing solution (e.g.: PishTank, Zeus Tracker Blacklist, StopBadWare.. etc.). However, such technique cannot be exhaustive and unable to detect newly generated malicious URL or zero-day exploit. Therefore, this paper aims to provide a comprehensive survey and detailed understanding of malicious code and URL features which have been extracted from the web content and structures of the websites. We studied the characteristic of malicious webpage systematically and syntactically and present the most important features of malicious threats in web pages. Each category will be presented along with different dimensions (features representation, algorithm design, etc.).