What is HTML?
HTML
is a computer language devised to allow website creation. These
websites can then be viewed by anyone else connected to the Internet. It
is relatively easy to learn, with the basics being accessible to most
people in one sitting; and quite powerful in what it allows you to
create. It is constantly undergoing revision and evolution to meet the
demands and requirements of the growing Internet audience under the
direction of the W3C, the organisation charged with designing and
maintaining the language.
HTML stands for HyperText Markup Language.
- HyperText is the method by which you move around on the web — by clicking on special text called hyperlinks which bring you to the next page. The fact that it is hyper just means it is not linear — i.e. you can go to any place on the Internet whenever you want by clicking on links — there is no set order to do things in.
- Markup is what HTML tags do to the text inside them. They mark it as a certain type of text (italicised text, for example).
- HTML is a Language, as it has codewords and syntax like any other language.
HTML Tags
What are HTML tags?
- HTML tags are used to mark up HTML elements
- HTML tags are surrounded by the two characters < and >
- The surrounding characters are called angle brackets
- HTML tags normally come in pairs like <b> and </b>
- The first tag in a pair is the start tag, the second tag is the end tag
- The text between the start and end tags is the element content
- HTML tags are not case sensitive, <b> means the same as <B>
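The points above can be checked with a small sketch. It assumes Python's standard html.parser module as a stand-in for a browser; the class name TagLogger and the helper tag_events are made up for this illustration.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record each start tag, end tag, and run of text the parser sees."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("content", data))


def tag_events(markup):
    """Parse a piece of HTML and return the recorded events in order."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


lower = tag_events("<b>This text is bold</b>")
upper = tag_events("<B>This text is bold</B>")

print(lower)           # start tag, element content, end tag
print(lower == upper)  # the parser reports tag names in lowercase either way
```

Both spellings produce the same events because this parser normalises tag names to lowercase, which mirrors how HTML treats <b> and <B> as the same tag.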
Logical vs. Physical Tags
In
HTML there are both logical tags and physical tags. Logical tags are
designed to describe (to the browser) the enclosed text's meaning. An
example of a logical tag is the <strong> </strong> tag. By
placing text in between these tags you are telling the browser that the
text has some greater importance. By default, browsers make the text
appear bold when it is placed between the <strong> and </strong> tags.
Physical
tags on the other hand provide specific instructions on how to display
the text they enclose. Examples of physical tags include:
- <b>: Makes the text bold.
- <big>: Makes the text usually one size bigger than what's around it.
- <i>: Makes text italic.
Physical
tags were invented to add style to HTML pages because style sheets were
not around, though the original intention of HTML was to not have
physical tags. Rather than use physical tags to style your HTML pages,
you should use style sheets.
HTML Elements
Remember the HTML example from the previous page:
<html>
<head>
<title>My First Webpage</title>
</head>
<body>
This is my first homepage. <b>This text is bold</b>
</body>
</html>
This is an HTML element:
<b>This text is bold</b>
The HTML element begins with a start tag: <b>
The content of the HTML element is: This text is bold
The HTML element ends with an end tag: </b>
The purpose of the <b> tag is to define an HTML element that should be displayed as bold.
This is also an HTML element:
<body>
This is my first homepage. <b>This text is bold</b>
</body>
This
HTML element starts with the start tag <body>, and ends with the
end tag </body>. The purpose of the <body> tag is to define
the HTML element that contains the body of the HTML document.
Nested Tags
You
may have noticed in the example above that the <body> element also
contains other tags, like the <b> tag. When you enclose text in
multiple tags, the last tag opened should be the first tag
closed. For example:
<p><b><em>This is NOT the proper way to close nested tags.</p></em></b>
<p><b><em>This is the proper way to close nested tags. </em></b></p>
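Browsers usually tolerate badly nested tags, so the mistake is easy to miss. One way to catch it is to run the markup through a strict XML parser. The following is only a sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_nested is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_nested(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Raised, among other things, when an end tag does not match
        # the most recently opened start tag.
        return False


wrong = "<p><b><em>Closed in the wrong order.</p></em></b>"
right = "<p><b><em>Closed in the right order.</em></b></p>"

print(is_well_nested(wrong))
print(is_well_nested(right))
```

The first string is rejected because </p> arrives while <em> and <b> are still open; the second closes the tags in reverse order of opening and is accepted.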
Why Use Lowercase Tags?
You
may notice that this tutorial uses lowercase tags even though HTML tags
are not case sensitive: <B> means the same as <b>. The World
Wide Web Consortium (W3C), the group responsible for developing web
standards, recommends lowercase tags in its HTML 4 recommendation, and
XHTML (the XML-based reformulation of HTML) requires lowercase tags.
Tag Attributes
Tags
can have attributes. Attributes can provide additional information
about the HTML elements on your page. The <tag> tells the browser
to do something, while the attribute tells the browser how to do it. For
instance, if we add the bgcolor attribute, we can tell the browser that
the background color of your page should be blue, like this: <body
bgcolor="blue">.
This
tag defines an HTML table: <table>. With an added border
attribute, you can tell the browser that the table should have no
borders: <table border="0">. Attributes always come in name/value
pairs like this: name="value". Attributes are always added to the start
tag of an HTML element and the value is surrounded by quotes.
Quote Styles, "red" or 'red'?
Attribute
values should always be enclosed in quotes. Double style quotes are the
most common, but single style quotes are also allowed. In some rare
situations, like when the attribute value itself contains quotes, it is
necessary to use single quotes:
name='George "machine Gun" Kelly'
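To see attributes as name/value pairs, here is a minimal sketch that assumes Python's standard html.parser module. The class AttributeCollector and the helper attributes_of are made up for this illustration, and the <a> tag carrying the name attribute is an assumption, since the example above does not say which element it belongs to.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Remember the attributes of every start tag, keyed by tag name."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes removed.
        self.found[tag] = dict(attrs)


def attributes_of(markup):
    """Parse the markup and return a mapping of tag name to attributes."""
    collector = AttributeCollector()
    collector.feed(markup)
    collector.close()
    return collector.found


found = attributes_of(
    '<body bgcolor="blue">'
    '<table border="0">'
    "<a name='George \"machine Gun\" Kelly'>"
)

for tag, attrs in found.items():
    print(tag, attrs)
```

The single-quoted value keeps its inner double quotes intact, which is the situation described above.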
Basic HTML Tags
The most important tags in HTML are tags that define headings, paragraphs and line breaks.
Tag             Description
<html>          Defines an HTML document
<body>          Defines the document's body
<h1> to <h6>    Defines heading 1 to heading 6
<p>             Defines a paragraph
<br>            Inserts a single line break
<hr>            Defines a horizontal rule
<!-- ... -->    Defines a comment
Headings
Headings
are defined with the <h1> to <h6> tags. <h1> defines
the largest heading while <h6> defines the smallest. Browsers automatically add an extra blank line before and after a heading. A useful heading attribute is align.
<h5 align="left">I can align headings </h5>
<h5 align="center">This is a centered heading </h5>
<h5 align="right">This is a heading aligned to the right </h5>
Paragraphs
Paragraphs
are defined with the <p> tag. Think of a paragraph as a block of
text. You can use the align attribute with a paragraph tag as well.
<p align="left">This is a paragraph</p>
<p align="center">this is another paragraph</p>
Line Breaks
The
<br> tag is used when you want to start a new line, but don't
want to start a new paragraph. The <br> tag forces a line break
wherever you place it. It is similar to single spacing in a document.
The <br> tag has no closing tag.
Horizontal Rule
The <hr> element is used for horizontal rules that act as dividers between sections.
The horizontal rule does not have a closing tag. It takes attributes such as align and width.
Comments in HTML
The
comment tag is used to insert a comment in the HTML source code. A
comment starts with <!-- and ends with -->. It can be placed anywhere
in the document, and the browser will ignore everything between those
markers. You can use comments to write notes to yourself, or write a
helpful message to someone looking at your source code.
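As a rough illustration of how a parser keeps comments apart from visible text, here is a sketch that assumes Python's standard html.parser module. The class name CommentSplitter and the sample page are made up for this example.

```python
from html.parser import HTMLParser


class CommentSplitter(HTMLParser):
    """Collect comments and visible text in two separate lists."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.visible = []

    def handle_comment(self, data):
        # Called with the text between <!-- and -->.
        self.comments.append(data.strip())

    def handle_data(self, data):
        if data.strip():
            self.visible.append(data.strip())


page = "<p>Welcome!</p><!-- TODO: add a photo here --><p>More soon.</p>"

splitter = CommentSplitter()
splitter.feed(page)
splitter.close()

print(splitter.comments)
print(splitter.visible)
```

The note to yourself ends up only in the comments list, never in the text a visitor would read.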
What is XML?
- XML stands for Extensible Markup Language
- XML is a markup language much like HTML
- XML was designed to carry data, not to display data
- XML tags are not predefined. You must define your own tags
- XML is designed to be self-descriptive
- XML is a W3C Recommendation
Extensible
Markup Language (XML) is a markup language that defines a set of rules
for encoding documents in a format that is both human-readable and
machine-readable. It is defined in the XML 1.0 Specification produced by
the W3C, and several other related specifications, all gratis open
standards.
The
design goals of XML emphasize simplicity, generality, and usability
over the Internet. It is a textual data format with strong support via
Unicode for the languages of the world. Although the design of XML
focuses on documents, it is widely used for the representation of
arbitrary data structures, for example in web services.
1. Markup and Text
Here's a complete (but very simple) XML document:
<?xml version="1.0"?>
<contact-info>
<name>Jane Smith</name>
<company>AT&amp;T</company>
<phone>(212) 555-4567</phone>
</contact-info>
There are two different kinds of information in this example:
- markup, like “<contact-info>” and “&amp;”; and
- text (also known as character data), like “Jane Smith” and “(212) 555-4567”.
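The split between markup and text becomes visible once the document is parsed. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the variable names are made up for this illustration, and the ampersand in the company name is written as the entity reference &amp; so that the document is well-formed.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<contact-info>
<name>Jane Smith</name>
<company>AT&amp;T</company>
<phone>(212) 555-4567</phone>
</contact-info>"""

root = ET.fromstring(document)

# Markup: the element names that give the document its structure.
markup_names = [root.tag] + [child.tag for child in root]

# Text: the character data inside each child element.
text_values = [child.text for child in root]

print(markup_names)
print(text_values)
```

Note that the parser hands back the company name with a plain ampersand: the reference is markup, but what it stands for is content.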
XML
documents mix markup and text together into a single file: the markup
describes the structure of the document, while the text is the
document's content (actually, sometimes markup can also represent
content, as in the case of references: more on this point below). Here's
the same XML document again, with the markup highlighted to distinguish
it from the text:
<?xml version="1.0"?>
<contact-info>
<name>Jane Smith</name>
<company>AT&amp;T</company>
<phone>(212) 555-4567</phone>
</contact-info>
The rest of this tutorial shows you how to use different kinds of markup and text in an XML document:
- the XML declaration;
- tags and elements;
- attributes;
- references; and
- text.
2. The XML Declaration
All
XML documents can optionally begin with an XML declaration. The XML
declaration provides at a minimum the number of the version of XML in
use:
<?xml version="1.0"?>
XML 1.0 is by far the most widely used version. XML 1.1 was later published as a W3C Recommendation, but it is rarely used in practice.
The XML declaration can also specify the character encoding used in the document:
<?xml version="1.0" encoding="UTF-8"?>
All
XML parsers are required to support the Unicode “UTF-8” and “UTF-16”
encodings; many XML parsers support other encodings, such as
“ISO-8859-1”, as well.
There are a few other important rules to keep in mind about the XML declaration:
- the XML declaration is case sensitive: it may not begin with “<?XML” or any other variant;
- if the XML declaration appears at all, it must be the very first thing in the XML document: not even whitespace or comments may appear before it; and
- it is legal for a transfer protocol like HTTP to override the encoding value that you put in the XML declaration, so you cannot guarantee that the document will actually use the encoding provided in the XML declaration.
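The first two rules can be tried out directly. The following sketch assumes Python's standard xml.etree.ElementTree module (which uses the Expat parser); the helper name parses and the <note> element are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def parses(document):
    """Return True if the XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


with_declaration = b'<?xml version="1.0" encoding="UTF-8"?><note>hi</note>'
no_declaration = b"<note>hi</note>"
leading_space = b' <?xml version="1.0"?><note>hi</note>'
uppercase = b'<?XML version="1.0"?><note>hi</note>'

print(parses(with_declaration))  # declaration is the very first thing
print(parses(no_declaration))    # the declaration is optional
print(parses(leading_space))     # whitespace before the declaration
print(parses(uppercase))         # wrong case in the declaration
```

The first two documents are accepted and the last two are rejected, matching the rules listed above.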
3. Tags and elements
XML
tags begin with the less-than character (“<”) and end with the
greater-than character (“>”). You use tags to mark the start and end
of elements, which are the logical units of information in an XML
document.
An
element consists of a start tag, possibly followed by text and other
complete elements, followed by an end tag. The following example
highlights the tags to distinguish them from the text:
<p><person>Tony Blair</person> is <function>Prime
Minister</function> of <location><country>Great
Britain</country></location></p>.
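To make the element structure of this example concrete, here is a short sketch that assumes Python's standard xml.etree.ElementTree module. The variable names are made up, and the trailing full stop after </p> is left out because text after the outermost element would not be well-formed.

```python
import xml.etree.ElementTree as ET

sentence = (
    "<p><person>Tony Blair</person> is <function>Prime Minister</function>"
    " of <location><country>Great Britain</country></location></p>"
)

root = ET.fromstring(sentence)

# Every element in document order, starting with the outermost one.
element_names = [element.tag for element in root.iter()]

# The text that remains when all tags are stripped away.
plain_text = "".join(root.itertext())

print(element_names)
print(plain_text)
```

The country element sits inside location, which in turn sits inside p: each element is complete before its parent ends.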
4. Attributes
In
addition to marking the beginning of an element, XML start tags also
provide a place to specify attributes. An attribute specifies a single
property for an element, using a name/value pair. One very well known
example of an attribute is href in HTML:
<a href="http://www.yahoo.com/">Yahoo!</a>
In
this example, the content of the a element is the text “Yahoo!”; the
attribute href provides extra information about the element (in this
case, the Web page to load when a user selects the link).
Every
attribute assignment consists of two parts: the attribute name (for
example, href), and the attribute value (for example,
http://www.yahoo.com/). There are a few rules to remember about XML
attributes:
- Attribute names in XML (unlike HTML) are case sensitive: HREF and href refer to two different XML attributes.
- You may not provide two values for the same attribute in the same start tag. The following example is not well-formed because the b attribute is specified twice:
<a b="x" c="y" b="z">....</a>
- Attribute names should never appear in quotation marks, but attribute values must always appear in quotation marks in XML (unlike HTML) using the " or ' characters. The following example is not well-formed because there are no delimiters around the value of the b attribute:
<!-- WRONG! -->
<a b=x>...</a>
You
can use the pre-defined entities “&quot;” and “&apos;” when you
need to include quotation marks within an attribute value (see
References for details).
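The attribute rules above can be checked with a strict parser. This is a sketch only, assuming Python's standard xml.etree.ElementTree module; the helper name well_formed and the title attribute are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def well_formed(markup):
    """Return True if the XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Names are case sensitive, so href and HREF are two separate attributes.
link = ET.fromstring('<a href="http://www.yahoo.com/" HREF="other">Yahoo!</a>')
print(link.get("href"), link.get("HREF"))

# The same attribute name may not appear twice in one start tag.
print(well_formed('<a b="x" c="y" b="z">....</a>'))

# Attribute values must be quoted.
print(well_formed("<a b=x>...</a>"))

# Quotation marks inside a value can be written as &quot;.
quoted = ET.fromstring('<a title="George &quot;machine Gun&quot; Kelly"/>')
print(quoted.get("title"))
```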
Some
attributes have special constraints on their allowed values: for more
information, refer to the documentation provided with your document
type.
5. References
A
reference allows you to include additional text or markup in an XML
document. References always begin with the character “&” (which is
specially reserved) and end with the character “;”.
XML has two kinds of references:
- entity references
An entity reference, like “&amp;”, contains a name (in this case,
“amp”) between the start and end delimiters. The name refers to a
predefined string of text and/or markup, like a macro in the C or C++
programming languages.
- character references
A
character reference, like “&#38;”, contains a hash mark (“#”)
followed by a number. The number always refers to the Unicode code point of a
single character, such as 65 for the letter “A”, 233 for the letter
“é”, or 8211 for an en-dash.
For
advanced uses, XML provides a mechanism for declaring your own
entities, but that is outside the scope of this tutorial. XML also
provides five pre-declared entities that you can use to escape special
characters in an XML document:
Character   Predeclared Entity
&           &amp;
<           &lt;
>           &gt;
"           &quot;
'           &apos;
For
example, the corporate name “AT&T” should appear in the XML markup
as “AT&amp;T”: the XML parser will take care of changing “&amp;”
back to “&” automatically when the document is processed.
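Here is a small sketch of both kinds of reference being resolved by a parser, assuming Python's standard xml.etree.ElementTree module. The element names sample and escaped and the helper well_formed are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def well_formed(markup):
    """Return True if the XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Entity reference: the parser turns &amp; back into a plain ampersand.
company = ET.fromstring("<company>AT&amp;T</company>").text

# Character references: 65, 233 and 8211 are Unicode code points.
characters = ET.fromstring("<sample>&#65; &#233; &#8211;</sample>").text

# Four more of the five predeclared entities.
escaped = ET.fromstring("<escaped>&lt; &gt; &quot; &apos;</escaped>").text

print(company)
print(characters == "A \u00e9 \u2013")
print(escaped)

# A bare ampersand is not allowed in the markup itself.
print(well_formed("<company>AT&T</company>"))
```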
6. Text
If
you are working with 8-bit characters, you can usually type printing
characters from the 7-bit (non-accented) US-ASCII character set directly
into an XML document, except for the special characters “<” and
“&”, and sometimes, “>” (it's best to escape it as well just to
be safe). Whenever you need to include one of these three characters in
the text of an XML document, simply escape it using an entity reference
as described in the References section:
<formula>x &lt; (x + 1)</formula>
For “<”, use “&lt;”, for “&”, use “&amp;”, and for “>”, use “&gt;”.
Above
character position 127, things become a little trickier on some
systems, because by default XML uses UTF-8 for 8-bit character encoding
rather than ISO-8859-1 (Latin Alphabet # 1), which HTML and many
computer operating systems use by default. UTF-8 and ISO-8859-1 are both
essentially identical with US-ASCII up to position 127; for higher
characters (those with accents), UTF-8 uses multi-byte escape sequences.
That
means that in a UTF-8 XML document, you cannot simply use a single byte
with decimal value 233 to represent “é” (and there is no predefined
&eacute; entity as there is in HTML); instead, you must either enter
the UTF-8 multi-byte escape sequence, or use a special kind of XML
reference called a character reference:
<p>That is everyone's favourite caf&#233;.</p>
When
your text consists primarily of unaccented Roman characters, this is
often the easiest way to escape the occasional accented or non-Roman
character. Since “é” appears at position 233 in Unicode (as in
ISO-8859-1), the XML parser will read the string correctly as “That is
everyone's favourite café.”
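The difference between the two encodings can be seen at the byte level. This is a sketch under the assumption that Python's standard xml.etree.ElementTree module is used as the parser; the variable names are made up for this illustration.

```python
import xml.etree.ElementTree as ET

e_acute = "\u00e9"  # the accented letter at Unicode position 233

utf8_bytes = e_acute.encode("utf-8")         # a two-byte sequence
latin1_bytes = e_acute.encode("iso-8859-1")  # the single byte 233

# Three ways of getting the accented letter into a document.
via_reference = ET.fromstring(b"<p>caf&#233;</p>").text
via_utf8 = ET.fromstring(b"<p>caf\xc3\xa9</p>").text
via_latin1 = ET.fromstring(
    b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'
).text

# A lone byte 233 without a matching encoding declaration is rejected,
# because the parser assumes UTF-8 by default.
try:
    ET.fromstring(b"<p>caf\xe9</p>")
    lone_byte_accepted = True
except ET.ParseError:
    lone_byte_accepted = False

print(len(utf8_bytes), len(latin1_bytes))
print(via_reference == via_utf8 == via_latin1)
print(lone_byte_accepted)
```

The character reference has the advantage that it works regardless of how the file itself is encoded.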
What is XHTML?
XHTML
(Extensible HyperText Markup Language) is a family of XML markup
languages that mirror or extend versions of the widely used Hypertext
Markup Language (HTML), the language in which web pages are written.
While
HTML (prior to HTML5) was defined as an application of Standard
Generalized Markup Language (SGML), a very flexible markup language
framework, XHTML is an application of XML, a more restrictive subset of
SGML. Because XHTML documents need to be well-formed, they can be parsed
using standard XML parsers—unlike HTML, which requires a lenient
HTML-specific parser.
XHTML
1.0 became a World Wide Web Consortium (W3C) Recommendation on January
26, 2000. XHTML 1.1 became a W3C Recommendation on May 31, 2001. XHTML5
is undergoing development as of September 2009, as part of the HTML5
specification.
Basics of XHTML
Here are some simple, basic rules for you to follow when changing your Web design from HTML to XHTML:
- Use All Lowercase Tags All tags must be written in lowercase letters. No longer is it allowable to write <HTML>, from now on it will be written <html>.
- Nest Elements Correctly HTML will forgive you but XHTML will not. Here is a common problem: <b><i>This is wrong.</b></i>. You may notice in this example that the bold and italic elements overlap. Here is the right way to nest these elements: <b><i>This is right.</i></b>.
- Always Use End Tags Every tag must have an end tag. When you start a paragraph you use the <p> tag, when you end a paragraph you must use the </p> tag. Same goes for the <li> tag and all other tags.
- End Empty Elements Elements like <br> have no content, but in XHTML they still need to be closed. You can write <br></br>, or you can use the shorter form <br />. The shorter form can be used in both HTML and XHTML, so start using it now to get used to it. The same applies to other empty elements such as <hr>.
- Use Quotes for Values When you write something like: <table border=1 bgcolor=red>, you've left something out. Values must be surrounded by quotation marks. The proper way to write this would be: <table border="1" bgcolor="red">.
- Give Every Attribute a Value Here's where things start to get different. In HTML some attributes have no value. One such attribute is disabled. In XHTML you should give such an attribute a value that is the same as its own name. This is how it would look: disabled="disabled".
- Use Code for Special Characters XHTML can get confused when you use such symbols as < or & inside attribute values. Write them as entity references instead. Instead of writing: <img src="my_picture.gif" alt="Me & My Son" />, you would write: <img src="my_picture.gif" alt="Me &amp; My Son" />.
- Use id Instead of name The <a>, <frame>, and <img> elements have an attribute called name that is used to specify a location within the HTML page. In XHTML the id attribute is used instead. It's recommended that you start using the id attribute now, instead of the name attribute. It's not mandatory for now but it will make it easier for you later.
- Separate Styles and Scripts If you are using CSS, JavaScript or another type of language in your Web pages you need to put them in a separate file. Link to them where you want them to show up on your page but keep them separate.
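Several of these rules are plain XML well-formedness rules, so they can be checked mechanically. The sketch below assumes Python's standard xml.etree.ElementTree module and a made-up helper named well_formed. It only tests well-formedness: rules that come from the XHTML document type itself, such as lowercase tag names or using id instead of name, are not covered by this check.

```python
import xml.etree.ElementTree as ET


def well_formed(fragment):
    """Return True if a strict XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Each fragment is paired with the outcome the rules above predict.
expected = {
    "<p>one<br>two</p>": False,                   # empty element not ended
    "<p>one<br />two</p>": True,
    "<ul><li>item</ul>": False,                   # missing end tag
    "<ul><li>item</li></ul>": True,
    "<table border=1></table>": False,            # value without quotes
    '<table border="1"></table>': True,
    "<input disabled />": False,                  # attribute without value
    '<input disabled="disabled" />': True,
    '<img src="my_picture.gif" alt="Me & My Son" />': False,  # bare ampersand
    '<img src="my_picture.gif" alt="Me &amp; My Son" />': True,
}

results = {fragment: well_formed(fragment) for fragment in expected}

for fragment, accepted in results.items():
    print("ok " if accepted else "bad", fragment)
```

Running a page through a check like this before publishing is one way to catch the mistakes that HTML would have forgiven.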