Web Page Structure

 

To search Web pages effectively, it helps to have a basic understanding of their structure. Web pages resemble written documents: most have a title, a publication date, and sections of text. In addition, a Web page has a document address (the URL) and hyperlinks to other documents. These parts of a Web document are referred to as fields. Many search engines capture this information.

The title is by far the most useful field. Web page authors mark the title with a pair of tags; in HTML it looks like <title> This is the title </title>. You can see the title of a page by looking at the top of the browser window, above the toolbar and menu items.

In the image below, the title is Field Searching, which is also the title of the page you are reading at this moment.
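To make the title field concrete, here is a minimal sketch of how a program might pull the title out of a page's HTML. It uses Python's standard html.parser module; the names TitleParser and extract_title, and the sample markup in the usage note, are made up for illustration and do not describe how any particular search engine works.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False  # True while the parser is inside the title tags
        self.parts = []        # pieces of title text seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the page title with surrounding whitespace removed."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

For example, extract_title("<html><head><title> This is the title </title></head></html>") returns the string "This is the title".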

Most authors of Web pages give each page a short descriptive title. We can narrow our search to just the words in the title to locate pages that have a higher probability of being relevant. We gain a lot in precision, though we lose recall and may miss some relevant documents whose titles do not contain our search words.
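The trade-off between precision and recall can be sketched with a toy example. Everything below is an assumption made for illustration: the five pages and their relevance labels are invented, and the matching rule (every query word must appear in the chosen field) is far simpler than what a real search engine does.

```python
# Toy collection: (title, body text, is_relevant). All entries are invented.
PAGES = [
    ("Field Searching", "how to search the title and url fields", True),
    ("Search Tips", "field searching narrows results to titles", True),
    ("Gardening Basics", "searching a field for wild flowers", False),
    ("Field Searching Tutorial", "restrict a query to one field", True),
    ("Sports News", "the team left the field after searching for the ball", False),
]


def matches(text, query):
    """True if every query word appears as a word in the text."""
    words = text.lower().split()
    return all(term in words for term in query.lower().split())


def search(query, field):
    """Search only the title ("title") or the title plus body ("any")."""
    results = []
    for page in PAGES:
        title, body, _ = page
        text = title if field == "title" else title + " " + body
        if matches(text, query):
            results.append(page)
    return results


def precision_recall(results):
    """Precision: share of results that are relevant.
    Recall: share of all relevant pages that were found."""
    relevant_total = sum(1 for page in PAGES if page[2])
    hits = sum(1 for page in results if page[2])
    precision = hits / len(results) if results else 0.0
    recall = hits / relevant_total if relevant_total else 0.0
    return precision, recall
```

With the query "field searching", the "any" search returns all five pages (precision 0.6, recall 1.0), while the "title" search returns only two pages, both relevant (precision 1.0, recall about 0.67). The title search misses the relevant page titled Search Tips, which is the loss of recall described above.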

The URL, or Web address, may also contain clues about the contents and purpose of the page. The domain usually indicates the company, institution, or person responsible for the page. Other parts of the URL often hint at purpose through popular naming conventions such as archives, documents, library, and info.
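As a rough illustration of reading clues from a URL, the sketch below splits an address into its domain and path segments with Python's standard urllib.parse module. The function name url_clues, the HINT_WORDS set (taken from the naming conventions mentioned above), and the example address in the usage note are all hypothetical.

```python
from urllib.parse import urlsplit

# Directory names that often hint at a page's purpose (from the text above).
HINT_WORDS = {"archives", "documents", "library", "info"}


def url_clues(url):
    """Split a URL into its domain, its path segments, and any hint words."""
    parts = urlsplit(url)
    # Lowercase the path and drop empty pieces left by leading or doubled slashes.
    segments = [s for s in parts.path.lower().split("/") if s]
    hints = [s for s in segments if s in HINT_WORDS]
    return {"domain": parts.hostname, "segments": segments, "hints": hints}
```

Calling url_clues("http://www.example.edu/Library/archives/guide.html") gives the domain "www.example.edu", the segments "library", "archives", and "guide.html", and the hint words "library" and "archives". A real search engine may treat URLs quite differently; this only shows which parts of the address carry the clues.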


Next: Title searching.