The terms taxonomy, ontology, directory, cataloging, categorization and classification are often
confused and used interchangeably. All of them are ways of organizing information (or things, or
animals) into categories.
Categorization is the process of associating an object with one or more
subject categories. So the
entry for a page on cross trainer shoes could go into Running, Manufacturing, Sports Medicine, or
Rushkoff, Douglas! All of these are legitimate, depending on the context.
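As a minimal sketch of the idea that one entry can sit in several subject categories, the
Python below stores the cross trainer example as a mapping and inverts it into a
category index. The names categories_by_entry and build_category_index are
hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Illustrative data only: the entry and category names come from the example above.
categories_by_entry = {
    "cross trainer shoes": {"Running", "Manufacturing", "Sports Medicine"},
}

def build_category_index(categories_by_entry):
    """Invert entry -> categories into category -> sorted list of entries."""
    index = defaultdict(list)
    for entry, categories in categories_by_entry.items():
        for category in categories:
            index[category].append(entry)
    return {category: sorted(entries) for category, entries in index.items()}

category_index = build_category_index(categories_by_entry)
```

Looking up category_index["Running"] then returns every entry filed under Running,
and the same entry also appears under the other two categories.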
Cataloging and Classification come from libraries, where specialists
enter the metadata (such as
author, date, title and edition) for a document, apply subject categories to it, and place it into a
class (such as a call number) for later retrieval. These tend to be used interchangeably with
Categorization.
Clustering is the process of grouping documents based on similarity of
words, or the concepts in
the documents as interpreted by an analytical engine. These engines use complex algorithms
including Natural Language Processing, Latent Semantic Analysis, Bayesian statistical analysis,
and so on.
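To make grouping by word similarity concrete, here is a deliberately simple sketch in
Python. It does not use any of the techniques named above; it assumes plain word
overlap (Jaccard similarity) and a greedy single pass, and the function names, sample
documents and the 0.3 threshold are all assumptions made for illustration.

```python
def tokenize(text):
    """Lowercase a document and split it into a set of words."""
    return set(text.lower().split())

def jaccard(a, b):
    """Share of words two sets have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster(documents, threshold=0.3):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose first member is similar
    enough; otherwise it starts a new cluster. Clusters hold document indexes.
    """
    clusters = []
    word_sets = [tokenize(d) for d in documents]
    for i, words in enumerate(word_sets):
        for members in clusters:
            if jaccard(words, word_sets[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([i])
    return clusters

# Hypothetical sample documents.
documents = [
    "running shoes for trail running",
    "trail running shoes review",
    "bayesian statistics for beginners",
]
clusters = cluster(documents)  # [[0, 1], [2]]
```

The two shoe documents share three of five distinct words and end up together, while
the third shares only "for" with the first and forms its own cluster. Real engines
weight words and model concepts rather than counting raw overlap.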
A Thesaurus is a set of related terms describing a set of documents.
This is not hierarchical: it
describes the standard terms for concepts in a controlled vocabulary. Thesauri include synonyms
and more complex relationships, such as broader or narrower terms, related terms and other forms
of words.
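One possible way to represent such a thesaurus entry is sketched below in Python. The
ThesaurusEntry class, its field names and the sample terms are assumptions for
illustration, not a standard schema.

```python
from dataclasses import dataclass, field

@dataclass
class ThesaurusEntry:
    """One controlled-vocabulary term and its relationships (hypothetical schema)."""
    preferred: str
    synonyms: set = field(default_factory=set)
    broader: set = field(default_factory=set)
    narrower: set = field(default_factory=set)
    related: set = field(default_factory=set)

def build_lookup(entries):
    """Map every preferred term and synonym (lowercased) to its preferred term."""
    lookup = {}
    for entry in entries:
        lookup[entry.preferred.lower()] = entry.preferred
        for synonym in entry.synonyms:
            lookup[synonym.lower()] = entry.preferred
    return lookup

# Illustrative entry only.
entries = [
    ThesaurusEntry(
        preferred="Athletic shoes",
        synonyms={"sneakers", "trainers"},
        broader={"Footwear"},
        narrower={"Cross trainer shoes"},
    ),
]
lookup = build_lookup(entries)
```

With this lookup, a document tagged "sneakers" and a query for "trainers" both resolve
to the standard term "Athletic shoes".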
Taxonomy is the organization of a particular set of information for a
particular purpose. It comes
from biology, where it's used to define the single location for a species within a complex hierarchy.
Biologists have arguments about where various species belong, although DNA analysis can resolve
most of the questions. In informational taxonomies, items can fit into several taxonomic categories.
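The contrast between a single location and several locations can be sketched in Python
as two mappings: one parent per node for a biological-style hierarchy, and a list of
parents per item for an informational taxonomy. The names parent_of, parents_of and
lineage are hypothetical, and the informational categories are made up for the example.

```python
# Biological style: every node has exactly one parent.
parent_of = {
    "Homo sapiens": "Homo",
    "Homo": "Hominidae",
    "Hominidae": "Primates",
    "Primates": "Mammalia",
}

def lineage(node, parent_of):
    """Walk from a node up to the top of a single-parent hierarchy."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path

# Informational style: an item may be filed under several categories.
parents_of = {
    "cross trainer shoes": ["Running", "Sports Medicine"],
}

species_path = lineage("Homo sapiens", parent_of)
shoe_locations = parents_of["cross trainer shoes"]
```

The species has one path to the top, while the shoe entry has two equally valid
locations, which is why a plain tree is often too strict for informational taxonomies.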
Ontology is the study of the categories of things within a domain. It
comes from philosophy and
provides a logical framework for academic research on knowledge representation. Work on
ontologies involves schemas and diagrams that show relationships, such as Venn diagrams, trees,
lattices and so on.