
Thursday, August 30, 2012

Basics of HTML – XML – XHTML


What is HTML?
HTML is a computer language devised to allow website creation. These websites can then be viewed by anyone else connected to the Internet. It is relatively easy to learn, with the basics being accessible to most people in one sitting, and quite powerful in what it allows you to create. It is constantly undergoing revision and evolution to meet the demands and requirements of the growing Internet audience under the direction of the W3C, the organisation charged with designing and maintaining the language.
HTML stands for HyperText Markup Language.

  • HyperText is the method by which you move around on the web — by clicking on special text called hyperlinks which bring you to the next page. The fact that it is hyper just means it is not linear — i.e. you can go to any place on the Internet whenever you want by clicking on links — there is no set order to do things in.
  • Markup is what HTML tags do to the text inside them. They mark it as a certain type of text (italicised text, for example).
  • HTML is a Language, as it has codewords and syntax like any other language.
     
HTML Tags
What are HTML tags?

  • HTML tags are used to mark up HTML elements
  • HTML tags are surrounded by the two characters < and >
  • The surrounding characters are called angle brackets
  • HTML tags normally come in pairs like <b> and </b>
  • The first tag in a pair is the start tag, the second tag is the end tag
  • The text between the start and end tags is the element content
  • HTML tags are not case sensitive, <b> means the same as <B>


Logical vs. Physical Tags
In HTML there are both logical tags and physical tags. Logical tags are designed to describe (to the browser) the enclosed text's meaning. An example of a logical tag is the <strong> tag. By placing text between <strong> and </strong> you are telling the browser that the text has some greater importance. By default, most browsers make the text between these tags appear bold.
Physical tags on the other hand provide specific instructions on how to display the text they enclose. Examples of physical tags include:

  • <b>: Makes the text bold.
  • <big>: Makes the text usually one size bigger than what's around it.
  • <i>: Makes text italic.

Physical tags were invented to add style to HTML pages because style sheets were not around, though the original intention of HTML was to not have physical tags. Rather than use physical tags to style your HTML pages, you should use style sheets.

HTML Elements
Remember the HTML example from the previous page:
<html>
<head>
<title>My First Webpage</title>
</head>
<body>
This is my first homepage. <b>This text is bold</b>
</body>
</html>

This is an HTML element:
<b>This text is bold</b>
The HTML element begins with a start tag: <b>
The content of the HTML element is: This text is bold
The HTML element ends with an end tag: </b>
The purpose of the <b> tag is to define an HTML element that should be displayed as bold.

This is also an HTML element:
<body>
This is my first homepage. <b>This text is bold</b>
</body>
This HTML element starts with the start tag <body>, and ends with the end tag </body>. The purpose of the <body> tag is to define the HTML element that contains the body of the HTML document.
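
The start tag, content, and end tag of an element can also be observed programmatically. The sketch below is only an illustration and assumes Python's standard html.parser module, which this tutorial does not otherwise use; the class name EventLogger and the function log_events are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Records start tags, text, and end tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_data(self, data):
        # Skip whitespace-only text such as the newlines between tags.
        if data.strip():
            self.events.append(("text", data.strip()))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def log_events(html):
    """Feed an HTML string to the logger and return the recorded events."""
    parser = EventLogger()
    parser.feed(html)
    parser.close()
    return parser.events


for event in log_events("<b>This text is bold</b>"):
    print(event)
```

Running it on the <b> element prints a start event, a text event, and an end event, which matches the three parts of an element described above.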

Nested Tags
You may have noticed in the example above that the <body> tag also contains other tags, like the <b> tag. When you enclose an element in multiple tags, the last tag opened should be the first tag closed. For example:
<p><b><em>This is NOT the proper way to close nested tags.</p></em></b>
<p><b><em>This is the proper way to close nested tags. </em></b></p>
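
The "last opened, first closed" rule is the behaviour of a stack, so it can be checked mechanically. The following is a minimal sketch, assuming Python's standard html.parser module; NestingChecker and is_properly_nested are hypothetical names, and the check only understands paired tags (an unclosed <br> would be reported as an error).

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Tracks open tags on a stack and records end tags that arrive out of order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tags that are currently open
        self.errors = []     # (end tag seen, tag that should have been closed)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            expected = self.open_tags[-1] if self.open_tags else None
            self.errors.append((tag, expected))


def is_properly_nested(html):
    """Return True when every end tag closes the most recently opened tag."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return not checker.errors and not checker.open_tags


print(is_properly_nested("<p><b><em>text</p></em></b>"))   # False
print(is_properly_nested("<p><b><em>text</em></b></p>"))   # True
```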

Why Use Lowercase Tags?
You may notice we've used lowercase tags even though HTML tags are not case sensitive: <B> means the same as <b>. The World Wide Web Consortium (W3C), the group responsible for developing web standards, recommends lowercase tags in its HTML 4 recommendation, and XHTML (the next generation HTML) requires lowercase tags.
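
The difference in case handling can be seen by comparing an HTML parser with an XML parser. This sketch assumes Python's standard html.parser and xml.etree.ElementTree modules; TagCollector, html_start_tags, and is_well_formed_xml are hypothetical helper names. It only shows that XML treats <B> and <b> as different names; the lowercase requirement itself comes from the XHTML vocabulary.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the names of start tags as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(html):
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def is_well_formed_xml(text):
    """Return True if an XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The HTML parser reports tag names in lowercase, whatever case was typed.
print(html_start_tags("<B>bold</B>"))       # ['b']

# The XML parser treats <B> and <b> as different names, so they do not match.
print(is_well_formed_xml("<b>bold</b>"))    # True
print(is_well_formed_xml("<B>bold</b>"))    # False
```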

Tag Attributes
Tags can have attributes. Attributes can provide additional information about the HTML elements on your page. The <tag> tells the browser to do something, while the attribute tells the browser how to do it. For instance, if we add the bgcolor attribute, we can tell the browser that the background color of your page should be blue, like this: <body bgcolor="blue">.

This tag defines an HTML table: <table>. With an added border attribute, you can tell the browser that the table should have no borders: <table border="0">. Attributes always come in name/value pairs like this: name="value". Attributes are always added to the start tag of an HTML element and the value is surrounded by quotes.

Quote Styles, "red" or 'red'?
Attribute values should always be enclosed in quotes. Double style quotes are the most common, but single style quotes are also allowed. In some rare situations, like when the attribute value itself contains quotes, it is necessary to use single quotes:
name='George "Machine Gun" Kelly'
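
To see how attributes arrive as name/value pairs, and that both quote styles give the same value, here is a small sketch. It assumes Python's standard html.parser module; AttributeCollector and collect_attributes are hypothetical names, and the <input> tag in the last call is just a convenient carrier for the name attribute.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Records each start tag together with its attributes as a dictionary."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes removed.
        self.found.append((tag, dict(attrs)))


def collect_attributes(html):
    collector = AttributeCollector()
    collector.feed(html)
    collector.close()
    return collector.found


print(collect_attributes('<body bgcolor="blue">'))
print(collect_attributes("<body bgcolor='blue'>"))
print(collect_attributes("<input name='George \"Machine Gun\" Kelly'>"))
```

The first two calls report the same value, blue, and the third keeps the inner double quotes as part of the value.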

Basic HTML Tags
The most important tags in HTML are tags that define headings, paragraphs and line breaks.
Tag              Description
<html>           Defines an HTML document
<body>           Defines the document's body
<h1> to <h6>     Defines heading 1 to heading 6
<p>              Defines a paragraph
<br>             Inserts a single line break
<hr>             Defines a horizontal rule
<!-- ... -->     Defines a comment


Headings
Headings are defined with the <h1> to <h6> tags. <h1> defines the largest heading while <h6> defines the smallest. Browsers automatically add an extra blank line before and after a heading. A useful heading attribute is align.

<h5 align="left">I can align headings </h5>
<h5 align="center">This is a centered heading </h5>
<h5 align="right">This is a heading aligned to the right </h5>

Paragraphs
Paragraphs are defined with the <p> tag. Think of a paragraph as a block of text. You can use the align attribute with a paragraph tag as well.
<p align="left">This is a paragraph</p>
<p align="center">this is another paragraph</p>

Line Breaks
The <br> tag is used when you want to start a new line, but don't want to start a new paragraph. The <br> tag forces a line break wherever you place it. It is similar to single spacing in a document.
The <br> tag has no closing tag.

Horizontal Rule
The <hr> element is used for horizontal rules that act as dividers between sections.

The horizontal rule does not have a closing tag. It takes attributes such as align and width.

Comments in HTML
The comment tag, written <!-- like this -->, is used to insert a comment in the HTML source code. A comment can be placed anywhere in the document and the browser will ignore everything between <!-- and -->. You can use comments to write notes to yourself, or write a helpful message to someone looking at your source code.
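
A parser reports comments separately from the text a reader would see, which is why they never show up on the page. The sketch below assumes Python's standard html.parser module; CommentSplitter and split_comments are hypothetical names used only for this illustration.

```python
from html.parser import HTMLParser


class CommentSplitter(HTMLParser):
    """Separates comment text from the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.visible = []

    def handle_comment(self, data):
        self.comments.append(data)

    def handle_data(self, data):
        if data.strip():
            self.visible.append(data.strip())


def split_comments(html):
    """Return (comments, visible text) found in the HTML string."""
    splitter = CommentSplitter()
    splitter.feed(html)
    splitter.close()
    return splitter.comments, splitter.visible


comments, visible = split_comments("<!-- note to self --><p>Visible text</p>")
print(comments)  # [' note to self ']
print(visible)   # ['Visible text']
```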


What is XML?
  • XML stands for Extensible Markup Language
  • XML is a markup language much like HTML
  • XML was designed to carry data, not to display data
  • XML tags are not predefined. You must define your own tags
  • XML is designed to be self-descriptive
  • XML is a W3C Recommendation

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all of which are free open standards.

The design goals of XML emphasize simplicity, generality, and usability over the Internet. It is a textual data format with strong support via Unicode for the languages of the world. Although the design of XML focuses on documents, it is widely used for the representation of arbitrary data structures, for example in web services.

1. Markup and Text
Here's a complete (but very simple) XML document:

    <?xml version="1.0"?>

    <contact-info>
    <name>Jane Smith</name>
    <company>AT&amp;T</company>
    <phone>(212) 555-4567</phone>
    </contact-info>

There are two different kinds of information in this example:

  • markup, like “<contact-info>” and “&amp;”; and

  • text (also known as character data), like “Jane Smith” and “(212) 555-4567”.

XML documents mix markup and text together into a single file: the markup describes the structure of the document, while the text is the document's content (actually, sometimes markup can also represent content, as in the case of references: more on this point below).
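
One way to see the split between markup and text is to load the contact document with an XML parser and look at what it hands back. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which is not part of the original tutorial; DOCUMENT and contact are names chosen for the example.

```python
import xml.etree.ElementTree as ET

# The contact document from this section, unchanged.
DOCUMENT = """<?xml version="1.0"?>

<contact-info>
<name>Jane Smith</name>
<company>AT&amp;T</company>
<phone>(212) 555-4567</phone>
</contact-info>
"""

root = ET.fromstring(DOCUMENT)

# Element names come from the markup; the values are the character data.
contact = {child.tag: child.text for child in root}

print(root.tag)
for name, value in contact.items():
    print(name, "=", value)
```

The parser returns the element names as structure and the character data as values, and the reference &amp; in the company name comes back as a plain & character.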

The rest of this tutorial shows you how to use different kinds of markup and text in an XML document:

  • the XML declaration;
  • tags and elements;
  • attributes;
  • references; and
  • text.

2. The XML Declaration
All XML documents can optionally begin with an XML declaration. The XML declaration provides at a minimum the number of the version of XML in use:

    <?xml version="1.0"?>

Version 1.0 is by far the most widely used version of XML. XML 1.1 is also a W3C Recommendation, but it is rarely needed in practice.

The XML declaration can also specify the character encoding used in the document:

    <?xml version="1.0" encoding="UTF-8"?>

All XML parsers are required to support the Unicode “UTF-8” and “UTF-16” encodings; many XML parsers support other encodings, such as “ISO-8859-1”, as well.

There are a few other important rules to keep in mind about the XML declaration:

  • the XML declaration is case sensitive: it may not begin with “<?XML” or any other variant;
  • if the XML declaration appears at all, it must be the very first thing in the XML document: not even whitespace or comments may appear before it; and
  • it is legal for a transfer protocol like HTTP to override the encoding value that you put in the XML declaration, so you cannot guarantee that the document will actually use the encoding provided in the XML declaration.
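
The first two rules can be tried out directly. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name parses and the sample documents are hypothetical. It also decodes a small ISO-8859-1 document to show the encoding declaration being honoured by this particular parser, which, as noted above, is optional support rather than something every parser must offer.

```python
import xml.etree.ElementTree as ET


def parses(document):
    """Return True if the XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# A declaration at the very start of the document is accepted.
print(parses('<?xml version="1.0"?><doc/>'))    # True

# Whitespace before the declaration is rejected.
print(parses(' <?xml version="1.0"?><doc/>'))   # False

# The declaration is case sensitive.
print(parses('<?XML version="1.0"?><doc/>'))    # False

# Bytes encoded as ISO-8859-1: 0xE9 is the single-byte form of e-acute.
latin1_document = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'
decoded_text = ET.fromstring(latin1_document).text
print(decoded_text)
```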

3. Tags and elements
XML tags begin with the less-than character (“<”) and end with the greater-than character (“>”). You use tags to mark the start and end of elements, which are the logical units of information in an XML document.

An element consists of a start tag, possibly followed by text and other complete elements, followed by an end tag. The following example shows tags mixed with text:

    <p><person>Tony Blair</person> is <function>Prime
    Minister</function> of <location><country>Great
    Britain</country></location>.</p>
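
Because elements nest, a parsed document forms a tree that can be walked. The following sketch assumes Python's standard xml.etree.ElementTree module; SENTENCE is the example above written on one line, with the full stop placed inside the p element so that nothing follows the outermost end tag.

```python
import xml.etree.ElementTree as ET

SENTENCE = (
    "<p><person>Tony Blair</person> is <function>Prime Minister</function>"
    " of <location><country>Great Britain</country></location>.</p>"
)

root = ET.fromstring(SENTENCE)

# iter() visits the root element and every element nested inside it,
# in the order their start tags appear in the document.
tags = [element.tag for element in root.iter()]
print(tags)

# itertext() yields only the character data, skipping all tags.
text = "".join(root.itertext())
print(text)
```

The first line of output lists the elements in document order, and the second is the sentence with all markup removed.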

4. Attributes
In addition to marking the beginning of an element, XML start tags also provide a place to specify attributes. An attribute specifies a single property for an element, using a name/value pair. One very well known example of an attribute is href in HTML:

    <a href="http://www.yahoo.com/">Yahoo!</a>

In this example, the content of the a element is the text “Yahoo!”; the attribute href provides extra information about the element (in this case, the Web page to load when a user selects the link).

Every attribute assignment consists of two parts: the attribute name (for example, href), and the attribute value (for example, http://www.yahoo.com/). There are a few rules to remember about XML attributes:

  • Attribute names in XML (unlike HTML) are case sensitive: HREF and href refer to two different XML attributes.

  • You may not provide two values for the same attribute in the same start tag. The following example is not well-formed because the b attribute is specified twice:

       <a b="x" c="y" b="z">....</a>

  • Attribute names should never appear in quotation marks, but attribute values must always appear in quotation marks in XML (unlike HTML) using the " or ' characters. The following example is not well-formed because there are no delimiters around the value of the b attribute:

       <!-- WRONG! -->
       <a b=x>...</a>

You can use the pre-defined entities “&quot;” and “&apos;” when you need to include quotation marks within an attribute value (see References for details).

Some attributes have special constraints on their allowed values: for more information, refer to the documentation provided with your document type.
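
The attribute rules above can be checked against a real parser. This is a sketch only, assuming Python's standard xml.etree.ElementTree module; parse_error is a hypothetical helper, and the exact error messages depend on the parser, so the code only checks whether an error occurred.

```python
import xml.etree.ElementTree as ET


def parse_error(fragment):
    """Return the parser's error message, or None if the fragment is well-formed."""
    try:
        ET.fromstring(fragment)
        return None
    except ET.ParseError as error:
        return str(error)


# Names are case sensitive, so href and HREF are two separate attributes.
link = ET.fromstring('<a href="http://www.yahoo.com/" HREF="other">Yahoo!</a>')
print(sorted(link.attrib))

# The same attribute given twice is not well-formed.
print(parse_error('<a b="x" c="y" b="z">....</a>') is not None)   # True

# An unquoted value is not well-formed.
print(parse_error('<a b=x>...</a>') is not None)                  # True

# &quot; puts a double quote inside a double-quoted value.
quoted = ET.fromstring('<a title="say &quot;hi&quot;">...</a>')
print(quoted.get("title"))
```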

5. References
A reference allows you to include additional text or markup in an XML document. References always begin with the character “&” (which is specially reserved) and end with the character “;”.

XML has two kinds of references:

  • entity references

    An entity reference, like “&amp;”, contains a name (in this case, “amp”) between the start and end delimiters. The name refers to a predefined string of text and/or markup, like a macro in the C or C++ programming languages.

  • character references

    A character reference, like “&#38;”, contains a hash mark (“#”) followed by a number. The number always refers to the Unicode code point for a single character, such as 65 for the letter “A”, 233 for the letter “é”, or 8211 for an en-dash.

For advanced uses, XML provides a mechanism for declaring your own entities, but that is outside the scope of this tutorial. XML also provides five pre-declared entities that you can use to escape special characters in an XML document:

Character    Predeclared Entity
&            &amp;
<            &lt;
>            &gt;
"            &quot;
'            &apos;

For example, the corporate name “AT&T” should appear in the XML markup as “AT&amp;T”: the XML parser will take care of changing “&amp;” back to “&” automatically when the document is processed.
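
Both kinds of reference, and the escaping that produces them, can be demonstrated in a few lines. This sketch assumes Python's standard xml.etree.ElementTree module and the escape function from xml.sax.saxutils; text_of is a hypothetical helper and the element name c is arbitrary.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def text_of(fragment):
    """Parse a one-element fragment and return its text after references are resolved."""
    return ET.fromstring(fragment).text


# An entity reference and a character reference for the same character.
print(text_of("<c>AT&amp;T</c>"))   # AT&T
print(text_of("<c>AT&#38;T</c>"))   # AT&T

# Character references for A (65), e-acute (233), and an en-dash (8211).
print(text_of("<c>&#65;&#233;&#8211;</c>"))

# Going the other way: escape() replaces &, <, and > before text is
# written into a document.
print(escape("AT&T"))      # AT&amp;T
print(escape("x < (x + 1)"))
```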

6. Text
If you are working with 8-bit characters, you can usually type printing characters from the 7-bit (non-accented) US-ASCII character set directly into an XML document, except for the special characters “<” and “&”, and sometimes, “>” (it's best to escape it as well just to be safe). Whenever you need to include one of these three characters in the text of an XML document, simply escape it using an entity reference as described in the References section:

    <formula>x &lt; (x + 1)</formula>

For “<”, use “&lt;”, for “&”, use “&amp;”, and for “>”, use “&gt;”.

Above character position 127, things become a little trickier on some systems, because by default XML uses UTF-8 for 8-bit character encoding rather than ISO-8859-1 (Latin Alphabet No. 1), which HTML and many computer operating systems use by default. UTF-8 and ISO-8859-1 are both essentially identical with US-ASCII up to position 127; for higher characters (those with accents), UTF-8 uses multi-byte sequences.

That means that in a UTF-8 XML document, you cannot simply use a single byte with decimal value 233 to represent “é” (and there is no predefined &eacute; entity as there is in HTML); instead, you must either enter the UTF-8 multi-byte sequence, or use a special kind of XML reference called a character reference:

    <p>That is everyone's favourite caf&#233;.</p>

When your text consists primarily of unaccented Roman characters, this is often the easiest way to escape the occasional accented or non-Roman character. Since “é” appears at position 233 in Unicode (as in ISO-8859-1), the XML parser will read the string correctly as “That is everyone's favourite café.”
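
The byte-level difference between the two encodings is easy to inspect. The sketch below assumes Python and its standard xml.etree.ElementTree module; the variable names are chosen for the example, and accepts_bytes is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

E_ACUTE = "\u00e9"  # the letter e with an acute accent

code_point = ord(E_ACUTE)                     # 233
latin1_bytes = E_ACUTE.encode("iso-8859-1")   # one byte:  b'\xe9'
utf8_bytes = E_ACUTE.encode("utf-8")          # two bytes: b'\xc3\xa9'
print(code_point, latin1_bytes, utf8_bytes)


def accepts_bytes(document):
    """Return True if the parser accepts the raw bytes."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# With no encoding declared the parser assumes UTF-8, so the lone
# ISO-8859-1 byte is rejected while the two-byte UTF-8 form is accepted.
print(accepts_bytes(b"<p>caf\xe9.</p>"))       # False
print(accepts_bytes(b"<p>caf\xc3\xa9.</p>"))   # True

# A character reference works whatever the file's encoding is.
paragraph = ET.fromstring("<p>That is everyone's favourite caf&#233;.</p>")
print(paragraph.text)

# Serializing to ASCII bytes turns the accented letter back into a reference.
serialized = ET.tostring(paragraph)
print(serialized)
```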

What is XHTML?
XHTML (Extensible HyperText Markup Language) is a family of XML markup languages that mirror or extend versions of the widely used Hypertext Markup Language (HTML), the language in which web pages are written.

While HTML (prior to HTML5) was defined as an application of Standard Generalized Markup Language (SGML), a very flexible markup language framework, XHTML is an application of XML, a more restrictive subset of SGML. Because XHTML documents need to be well-formed, they can be parsed using standard XML parsers—unlike HTML, which requires a lenient HTML-specific parser.

XHTML 1.0 became a World Wide Web Consortium (W3C) Recommendation on January 26, 2000. XHTML 1.1 became a W3C Recommendation on May 31, 2001. XHTML5 is undergoing development as of September 2009, as part of the HTML5 specification.

Basics of XHTML
Here are some simple, basic rules for you to follow when changing your Web design from HTML to XHTML:
  1. Use All Lowercase Tags: All tags must be written in lowercase letters. It is no longer allowable to write <HTML>; from now on it is written <html>.
  2. Nest Elements Correctly: HTML will forgive you but XHTML will not. Here is a common problem: <b><i>This is wrong.</b></i>. You may notice in this example that the bold and italic elements overlap. Here is the right way to nest these elements: <b><i>This is right.</i></b>.
  3. Always Use End Tags: Every tag must have an end tag. When you start a paragraph you use the <p> tag, and when you end a paragraph you must use the </p> tag. The same goes for the <li> tag and all other tags.
  4. End Empty Elements: Elements like the <br> tag need an end tag too. You can either write <br></br> or use the shorter alternative <br />. This can be used in both HTML and XHTML, so start using it now to get used to it. This feature can be used with other empty tags too, such as the <hr> tag.
  5. Use Quotes for Values: When you write something like <table border=1 bgcolor=red>, you've left something out. Values must be surrounded by quotation marks. The proper way to write this would be: <table border="1" bgcolor="red">.
  6. Give Every Attribute a Value: Here's where things start to get different. In HTML some attributes have no value. One such attribute is disabled. When using such an attribute, assign it a value that is the same as the name of the attribute itself. This is how it would look: disabled="disabled".
  7. Use Code for Special Characters: XHTML can get confused when you use such symbols as < or & inside attribute values. Use entity references to write them instead. Instead of writing <img src="my_picture.gif" alt="Me & My Son" /> you would write <img src="my_picture.gif" alt="Me &amp; My Son" />.
  8. Use id Instead of name: The <a>, <frame>, and <img> elements have an attribute called name that is used to specify a location within the HTML page. In XHTML the id attribute is used instead. It's recommended that you start using the id attribute now, instead of the name attribute. It's not mandatory for now but it will make it easier for you later.
  9. Separate Styles and Scripts: If you are using CSS, JavaScript or another type of language in your Web pages, put that code in separate files. Link to the files where you need them on your page, but keep them separate.
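
Several of these rules are simply XML well-formedness rules, so an ordinary XML parser can catch violations of rules 2 to 7. The sketch below assumes Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper. It does not check the lowercase rule, the id rule, or the separation of styles and scripts, because those are not matters of well-formedness.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if an XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Each pair shows a form the parser rejects, then the corrected form.
EXAMPLES = [
    ("<b><i>This is wrong.</b></i>", "<b><i>This is right.</i></b>"),
    ("<p>one<br>two</p>", "<p>one<br />two</p>"),
    ("<table border=1 bgcolor=red></table>",
     '<table border="1" bgcolor="red"></table>'),
    ("<input disabled />", '<input disabled="disabled" />'),
    ('<img src="my_picture.gif" alt="Me & My Son" />',
     '<img src="my_picture.gif" alt="Me &amp; My Son" />'),
]

for rejected, accepted in EXAMPLES:
    print(is_well_formed(rejected), is_well_formed(accepted))
```

Each line of output should read False True: the first form of every pair is rejected and the corrected form is accepted.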
