Web Content Mining

As the research stresses, search methods, along with their underlying methodologies and supporting implementations, vary greatly in structure. Search engines adopt the least structured methods, whereas portals and directories attempt a fully structured, taxonomy-based approach. Wikis have evolved as a semi-structured approach that has gained popularity as an effective means of topic-based information retrieval.
According to the paper’s findings, web pages (the information resources or documents) can also be searched using more advanced methods that use Boolean expressions to specify inclusions and/or exclusions, as well as to indicate terms that are required to appear in the results. Not all search engines accept the same advanced search syntax. Typically, most users repeatedly try simple keywords until they get an interesting set of documents. No matter how sophisticated the search algorithms are, the searcher’s sifting through text within billions of indexed pages is a highly nondeterministic process. There are also commercial factors involved in the relative ranking of pages. Many search engine optimization (SEO) firms now specialize in maintaining a good rank in search results for their clients’ sites. Submitting a site or a page to a search engine in such a way that it will actually be found is no simple process. SEO has become a discipline that involves statistical, marketing, and financial aspects.
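
The Boolean inclusion and exclusion described above can be illustrated with a minimal sketch. It is not the method of any particular search engine: the "+term" (required) and "-term" (excluded) query syntax, the function names build_index and boolean_search, and the four sample documents are all assumptions made for this example. Bare terms are treated as alternatives, so at least one of them must appear.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def boolean_search(index, all_ids, query):
    """Evaluate a query such as '+search +rank -spam' (hypothetical syntax).

    '+term' must appear, '-term' must not appear, and bare terms are
    alternatives: if any are given, at least one must appear.
    """
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)

    results = set(all_ids)
    for term in required:
        # Intersection: keep only documents that contain every required term.
        results &= index.get(term, set())
    if optional:
        # Union of the alternatives, then intersect with the current results.
        matches = set()
        for term in optional:
            matches |= index.get(term, set())
        results &= matches
    for term in excluded:
        # Difference: drop documents that contain an excluded term.
        results -= index.get(term, set())
    return sorted(results)


# Illustrative documents only; they are not taken from the paper.
docs = {
    1: "Web content mining extracts information from web pages",
    2: "Search engine optimization improves page rank",
    3: "Wikis support topic based information retrieval",
    4: "Spam pages abuse search engine rank",
}
index = build_index(docs)

print(boolean_search(index, docs.keys(), "+search +rank -spam"))
print(boolean_search(index, docs.keys(), "+information -wikis"))
```

With these sample documents, the first query returns only document 2, because document 4 also matches both required terms but contains the excluded word. Real engines differ in the operators they accept, which is one reason users so often fall back on plain keywords.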