The MetaData Song
Updated: Feb 25, 2022
Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including:
Descriptive metadata — the descriptive information about a resource. It is used for discovery and identification. It includes elements such as title, abstract, author, and keywords.
Structural metadata — metadata about containers of data and indicates how compound objects are put together, for example, how pages are ordered to form chapters. It describes the types, versions, relationships and other characteristics of digital materials.
Administrative metadata — the information to help manage a resource, like resource type, permissions, and when and how it was created.
Reference metadata — the information about the contents and quality of statistical data.
Statistical metadata, also called process data, may describe processes that collect, process, or produce statistical data.
Legal metadata — provides information about the creator, copyright holder, and public licensing, if provided.
Metadata is not strictly bounded to one of these categories, as it can describe a piece of data in many other ways.
In Popular Culture
One of the first satirical examinations of the concept of Metadata as we understand it today is American Science Fiction author Hal Draper's short story, MS Fnd in a Lbry (1961). Here, the knowledge of all Mankind is condensed into an object the size of a desk drawer, however the magnitude of the metadata (e.g. catalog of catalogs of... , as well as indexes and histories) eventually leads to dire yet humorous consequence for the human race. The story prefigures the modern consequences of allowing metadata to become more important than the real data it is concerned with, and the risks inherent in that eventuality as a cautionary tale.
Administration and Management
Metadata can be stored either internally, in the same file or structure as the data (this is also called embedded metadata), or externally, in a separate file or field from the described data. A data repository typically stores the metadata detached from the data, but can be designed to support embedded metadata approaches. Each option has advantages and disadvantages:
Internal storage means metadata always travels as part of the data they describe; thus, metadata is always available with the data, and can be manipulated locally. This method creates redundancy (precluding normalization), and does not allow managing all of a system's metadata in one place. It arguably increases consistency, since the metadata is readily changed whenever the data is changed.
External storage allows collocating metadata for all the contents, for example in a database, for more efficient searching and management. Redundancy can be avoided by normalizing the metadata's organization. In this approach, metadata can be united with the content when information is transferred, for example in Streaming media; or can be referenced (for example, as a web link) from the transferred content. On the down side, the division of the metadata from the data content, especially in standalone files that refer to their source metadata elsewhere, increases the opportunities for misalignments between the two, as changes to either may not be reflected in the other.
Metadata can be stored in either human-readable or binary form. Storing metadata in a human-readable format such as XML can be useful because users can understand and edit it without specialized tools. However, text-based formats are rarely optimized for storage capacity, communication time, or processing speed. A binary metadata format enables efficiency in all these respects, but requires special software to convert the binary information into human-readable content.
Each relational database system has its own mechanisms for storing metadata. Examples of relational-database metadata include:
Tables of all tables in a database, their names, sizes, and number of rows in each table.
Tables of columns in each database, what tables they are used in, and the type of data stored in each column.
In database terminology, this set of metadata is referred to as the catalog. The SQL standard specifies a uniform means to access the catalog, called the information schema, but not all databases implement it, even if they implement other aspects of the SQL standard. For an example of database-specific metadata access methods, see Oracle metadata. Programmatic access to metadata is possible using APIs such as JDBC, or SchemaCrawler.
Metadata means "data about data". Although the "meta" prefix means "after" or "beyond", it is used to mean "about" in epistemology. Metadata is defined as the data providing information about one or more aspects of the data; it is used to summarize basic information about data which can make tracking and working with specific data easier. Some examples include:
Means of creation of the data
Purpose of the data
Time and date of creation
Creator or author of the data
Location on a computer network where the data was created
Source of the data
Process used to create the data
For example, a digital image may include metadata that describes the size of the image, its color depth, resolution, when it was created, the shutter speed, and other data. A text document's metadata may contain information about how long the document is, who the author is, when the document was written, and a short summary of the document. Metadata within web pages can also contain descriptions of page content, as well as key words linked to the content. These links are often called "Metatags", which were used as the primary factor in determining order for a web search until the late 1990s. The reliance of metatags in web searches was decreased in the late 1990s because of "keyword stuffing". Metatags were being largely misused to trick search engines into thinking some websites had more relevance in the search than they really did.
Metadata can be stored and managed in a database, often called a metadata registry or metadata repository. However, without context and a point of reference, it might be impossible to identify metadata just by looking at it. For example: by itself, a database containing several numbers, all 13 digits long could be the results of calculations or a list of numbers to plug into an equation - without any other context, the numbers themselves can be perceived as the data. But if given the context that this database is a log of a book collection, those 13-digit numbers may now be identified as ISBNs - information that refers to the book, but is not itself the information within the book. The term "metadata" was coined in 1968 by Philip Bagley, in his book "Extension of Programming Language Concepts" where it is clear that he uses the term in the ISO 11179 "traditional" sense, which is "structural metadata" i.e. "data about the containers of data"; rather than the alternative sense "content about individual instances of data content" or metacontent, the type of data usually found in library catalogues. Since then the fields of information management, information science, information technology, librarianship, and GIS have widely adopted the term. In these fields the word metadata is defined as "data about data". While this is the generally accepted definition, various disciplines have adopted their own more specific explanation and uses of the term.
Slate reported in 2013 that the United States government's interpretation of "metadata" could be broad, and might include message content such as the subject lines of emails.