When I refer to a meaning model, I am talking about the set of semantic references to factual statements contained within the metadata of a text document.
Humans would still consume natural language text documents, but machines would be able to index, parse, and validate the explicit "aboutness" of each statement. Processing and publishing documents in this manner would require a Semantic Content Management System (SCMS). Reporters writing within an SCMS would embed RDF tags in the markup that carries the document's natural language text.
To explain what this entails, we'll examine the lead paragraph of the top story on Google News at 5:25 p.m. EST on Jan. 19, 2011:
WASHINGTON (AP) — Eager to honor their campaign pledge, Republicans pushed legislation to repeal the nation's year-old health care law toward House passage Wednesday despite implacable opposition in the Senate and a veto threat from President Barack Obama.
The meaning model of this paragraph would include semantic tags around the following terms:
- campaign pledge
- pushed legislation to repeal (Wednesday)
- year-old (health care law)
- opposition in the (Senate)
- veto threat
- Barack Obama
Each of those tags would include the URI that best and most completely expresses the concept referenced by the word or phrase in the story. Since the knowledge archived at those URIs will be expressed in RDF, most of these URIs will reference RDF triples, and in many cases, triples about triples.
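As a minimal sketch of what an SCMS might do when a reporter tags a phrase, the Python below wraps each tagged phrase in a `span` whose `resource` attribute holds the concept's URI, loosely modeled on RDFa. The `example.org` URIs and the `tag_phrases` helper are hypothetical; a real system would draw its URIs from the organization's own vocabulary and would need to handle repeated phrases, which this sketch tags only once.

```python
import html

# Hypothetical namespace; a real SCMS would use URIs from its own vocabulary.
BASE = "http://example.org/meaning/"


def tag_phrases(text: str, concepts: dict) -> str:
    """Wrap the first occurrence of each phrase in a span whose
    RDFa-style `resource` attribute holds the concept URI."""
    # Locate every phrase in the untouched text first, so URIs inserted
    # by earlier tags can never be matched by later searches.
    spans = []
    for phrase, uri in concepts.items():
        start = text.find(phrase)
        if start != -1:
            spans.append((start, start + len(phrase), uri))
    spans.sort()

    parts, pos = [], 0
    for start, end, uri in spans:
        if start < pos:
            continue  # overlaps a phrase that is already tagged
        parts.append(html.escape(text[pos:start], quote=False))
        parts.append(f'<span resource="{html.escape(uri)}">'
                     f"{html.escape(text[start:end], quote=False)}</span>")
        pos = end
    parts.append(html.escape(text[pos:], quote=False))
    return "".join(parts)


concepts = {
    "campaign pledge": BASE + "Republican_2010_campaign_pledge",
    "veto threat": BASE + "Obama_veto_threat",
    "Barack Obama": BASE + "Barack_Hussein_Obama",
}
sentence = ("...despite implacable opposition in the Senate and a veto threat "
            "from President Barack Obama.")
print(tag_phrases(sentence, concepts))
```

Phrases that do not occur in the text (here, "campaign pledge") are simply left untagged.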
As the list of factual references needed to derive a meaning model for this 38-word paragraph shows, the task is daunting. This is why an SCMS is required. Beyond editing and publishing the news document itself, an SCMS would also edit and publish all the additions and updates needed to connect the story's meaning model to its directory of meaning.
Some of the concepts in the meaning model are simple ones. For instance, when the writer mentions Republicans, he is referring to the Republican Party (of the United States). The meaning model could reference a triple such as:
Republican Party | has function | political party
Alternatively, the meaning model could be more specific, since the writer is actually talking about Republican members of Congress:
112th Congress | has House majority | Republican Party
In either case, these extremely simple statements merely begin the process of assigning relationships between facts.
United States House of Representatives | has Congressional class | 112th Congress
112th Congress | has convening date | ----Z03012011
112th Congress | has total House members | 435
112th Congress | has House members | 242 Republicans
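The pipe-separated triples above are shorthand. As a rough sketch of how a system might store them and write them out in a standard serialization, the Python below turns a few of them into N-Triples, one of the W3C's plain-text RDF formats. The `example.org` namespace, the label-to-IRI rule (spaces become underscores) and the choice to type 435 as `xsd:integer` are assumptions for illustration, not part of any published vocabulary.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical namespace for the organization's vocabulary.
BASE = "http://example.org/meaning/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: Union[str, int]  # str names another concept; int is a numeric literal


def iri(label: str) -> str:
    """Mint an IRI from a human-readable label (spaces become underscores)."""
    return f"<{BASE}{label.replace(' ', '_')}>"


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a line of N-Triples."""
    if isinstance(t.obj, int):
        obj = f'"{t.obj}"^^<{XSD_INTEGER}>'
    else:
        obj = iri(t.obj)
    return f"{iri(t.subject)} {iri(t.predicate)} {obj} ."


congress = [
    Triple("United States House of Representatives", "has Congressional class", "112th Congress"),
    Triple("112th Congress", "has House majority", "Republican Party"),
    Triple("112th Congress", "has total House members", 435),
]
for t in congress:
    print(to_ntriples(t))
```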
And the beauty of RDF is that we can keep creating and applying relevant true statements about these subjects as necessary. When a statement is new, it must not only be expressed in the meaning model of the story that introduces it, but also be added to the directory of meaning that comprises the sum total of knowledge expressed by the organization. Once we've created an RDF triple, all future references to that statement must reuse the original triple.
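The reuse rule can be sketched as a store that mints a URI for a statement the first time it appears and hands back that same URI every time afterward. The `DirectoryOfMeaning` class, the `build_meaning_model` helper and the sequential `example.org` statement URIs below are hypothetical; they illustrate the rule, not any particular product.

```python
# Hypothetical namespace for statement URIs.
STATEMENT_BASE = "http://example.org/statement/"


class DirectoryOfMeaning:
    """Toy directory of meaning: each distinct triple is minted a URI once."""

    def __init__(self):
        self._uris = {}  # (subject, predicate, object) -> statement URI

    def uri_for(self, triple):
        # Reuse the existing URI if this exact statement is already known;
        # otherwise it is new, so add it to the directory.
        if triple not in self._uris:
            self._uris[triple] = f"{STATEMENT_BASE}{len(self._uris) + 1}"
        return self._uris[triple]

    def __len__(self):
        return len(self._uris)


def build_meaning_model(directory, triples):
    """Return the statement URIs a story's meaning model would cite."""
    return [directory.uri_for(t) for t in triples]


directory = DirectoryOfMeaning()
story_a = [
    ("Republican Party", "has function", "political party"),
    ("112th Congress", "has House majority", "Republican Party"),
]
story_b = [
    ("112th Congress", "has House majority", "Republican Party"),
    ("112th Congress", "has total House members", "435"),
]
model_a = build_meaning_model(directory, story_a)
model_b = build_meaning_model(directory, story_b)
print(model_a, model_b, len(directory))
```

The second story cites the House-majority statement by the URI the first story created, so the directory ends up holding three statements, not four.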
Other concepts are a bit more complex. For instance, "nation's health care law" is a general reference to two specific pieces of legislation: The Patient Protection and Affordable Care Act, and the Health Care and Education Reconciliation Act of 2010. Each of these can be used as a subject or an object, but we could also write triples that combine both:
2010 health care reform | has component | The Patient Protection and Affordable Care Act
2010 health care reform | has component | Health Care and Education Reconciliation Act of 2010
2010 health care reform | has slang name | Obamacare
Why combine them? Well, you don't have to. There are multiple ways of expressing meaning within this grammar system. What's important is that organizations that publish directories of meaning use professional standards for markup and structure, so that the information they contain may be read and used by humans and robots alike.
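To see what a combined concept buys, here is a minimal pattern-matching sketch over the three triples above, in which `None` acts as a wildcard. The `match` helper is hypothetical and far simpler than a real query language such as SPARQL, but it shows the same node serving as a subject in one query and being reached through an object in another.

```python
def match(triples, s=None, p=None, o=None):
    """Yield triples that fit a pattern; None acts as a wildcard."""
    for t in triples:
        if ((s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)):
            yield t


facts = [
    ("2010 health care reform", "has component", "The Patient Protection and Affordable Care Act"),
    ("2010 health care reform", "has component", "Health Care and Education Reconciliation Act of 2010"),
    ("2010 health care reform", "has slang name", "Obamacare"),
]

# The combined concept as a subject: what does it consist of?
components = [o for _, _, o in match(facts, s="2010 health care reform", p="has component")]

# The same concept reached from an object: what does "Obamacare" refer to?
(reform,) = [s for s, _, _ in match(facts, p="has slang name", o="Obamacare")]

print(components)
print(reform)
```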
It's best when organizations use standards for creating these triples, as this allows multiple sources to benefit from the same directory of meaning: enormous databases of triples expressing every known fact about a subject. So while you might begin with
Barack Hussein Obama | has job | President of the United States
you would proceed to:
Barack Hussein Obama | has birthdate | 0524Z04081961
Barack Hussein Obama | has home address | 1600 Pennsylvania Ave., Washington, DC 20500
Barack Hussein Obama | has spouse | Michelle Robinson
And so on.
Some of these statements are symmetric (Michelle Robinson also has spouse Barack Obama); others are not (1600 Pennsylvania Avenue does not have the home address "Barack Hussein Obama"). And once each triple has its own URI, we can begin to assign relationships to the triples themselves.
(Barack Hussein Obama | has spouse | Michelle Robinson) | has wedding date | ----Z03101992
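Both ideas can be sketched in a few lines of Python: infer the reverse of any triple whose predicate is declared symmetric, then give every statement a URI so that a new triple can use a whole statement as its subject. The `SYMMETRIC` set, the helper names and the numbered `example.org` statement URIs are assumptions for illustration. Minting a URI per statement is only one simple approach; RDF also defines a reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) for the same purpose.

```python
# Hypothetical vocabulary choice: predicates that hold in both directions.
SYMMETRIC = {"has spouse"}


def with_symmetric_inverses(triples):
    """Add the reversed triple for every symmetric predicate."""
    expanded = set(triples)
    for s, p, o in triples:
        if p in SYMMETRIC:
            expanded.add((o, p, s))
    return expanded


facts = {
    ("Barack Hussein Obama", "has spouse", "Michelle Robinson"),
    ("Barack Hussein Obama", "has home address",
     "1600 Pennsylvania Ave., Washington, DC 20500"),
}
expanded = with_symmetric_inverses(facts)

# Give every statement its own (hypothetical) URI so statements can be talked about.
statement_uri = {
    t: f"http://example.org/statement/{n}"
    for n, t in enumerate(sorted(expanded), start=1)
}

# A triple about a triple: the marriage statement is the subject.
marriage = ("Barack Hussein Obama", "has spouse", "Michelle Robinson")
wedding = (statement_uri[marriage], "has wedding date", "----Z03101992")
print(wedding)
```

The home-address triple is left alone because its predicate is not in `SYMMETRIC`.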
So back to our meaning model. We have a set of things to which we need to apply meanings. While writing and editing our story, our SCMS will prompt us to select appropriate triples or semantic expressions. In this case, we would be searching for the best expression of these concepts:
- campaign pledge | Republican opposition to health care reform expressed during 2010 midterm elections
- Republicans | Republican members of the U.S. House of Representatives
- pushed legislation to repeal (Wednesday) | "Repealing the Job-Killing Health Care Law Act"
- nation's | United States of America
- year-old (health care law) | 2010 health care reform
- House | U.S. House of Representatives
- opposition in the (Senate) | Senate has Democratic majority | Senate Democrats oppose "Repealing the Job-Killing Health Care Law Act."
- veto threat | Barack Obama opposes "Repealing the Job-Killing Health Care Law Act."
- Barack Obama | POTUS
And so on.
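The prompting step might start with something as crude as word overlap between the tagged phrase and the concept labels already in the directory. The `suggest` function below is a hypothetical stand-in for real entity linking, and its label list is drawn from the concepts above.

```python
import re


def words(text):
    """Lowercased alphanumeric tokens of a label or phrase."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suggest(phrase, concepts, limit=3):
    """Rank directory concepts by how many words they share with the phrase."""
    target = words(phrase)
    scored = [(len(target & words(label)), label) for label in concepts]
    scored = [(score, label) for score, label in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [label for _, label in scored[:limit]]


concepts = [
    "2010 health care reform",
    "Repealing the Job-Killing Health Care Law Act",
    "U.S. House of Representatives",
    "United States of America",
    "Barack Hussein Obama",
]
print(suggest("year-old health care law", concepts))
print(suggest("Barack Obama", concepts))
```

For "year-old health care law" this naive ranking puts the repeal bill ahead of 2010 health care reform, which is why the writer, not the software, should confirm each selection.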
Once this meaning model is in place, anyone wishing to trace a statement to its origins in our directory of meaning may go to the URI in the tag and follow the chain of triples back to the original statements. More to the point, semantic robots may do the same.
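A robot's version of that trace can be sketched as a depth-first walk: whenever a subject or object is itself a statement URI, follow it to the statement it names. The `DIRECTORY` contents and the shortened `st:N` identifiers are hypothetical.

```python
# Hypothetical directory: statement URI (shortened to "st:N") -> triple.
# A subject or object that is itself a statement URI points to an earlier statement.
DIRECTORY = {
    "st:1": ("Barack Hussein Obama", "has spouse", "Michelle Robinson"),
    "st:2": ("st:1", "has wedding date", "----Z03101992"),
    "st:3": ("Senate", "has majority", "Democratic Party"),
}


def trace(uri, directory, seen=None):
    """Return the statement at `uri` plus every statement it builds on, depth-first."""
    if seen is None:
        seen = set()
    if uri in seen or uri not in directory:
        return []  # already visited, or a plain concept rather than a statement
    seen.add(uri)
    subject, predicate, obj = directory[uri]
    chain = [(uri, (subject, predicate, obj))]
    for part in (subject, obj):
        chain.extend(trace(part, directory, seen))
    return chain


for uri, triple in trace("st:2", DIRECTORY):
    print(uri, triple)
```

The `seen` set keeps the walk from looping if two statements ever refer to each other.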
While this isn't a magic bullet for bad journalism or sloppy rhetoric, it does raise the bar: a person claiming that the Patient Protection and Affordable Care Act includes jail time for non-compliance would have to cite a section of the law that supports the claim. News organizations that do not connect their natural language statements to their meaning models, and those models to their directory of meaning, within a set period would fail machine audits of their archives, which could directly affect their standards-based certification.
In other words, the goal of a meaning model is to produce a machine-readable set of RDF statements that would validate (parse as well-formed RDF) if run through a program such as an RDF/XML parser.
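As a rough sketch of what such a machine audit might check, assuming the directory of meaning is published as N-Triples and each story's meaning model is a mapping from tagged phrase to URI, the Python below flags directory lines that don't parse and tags whose URI the directory never describes. The regular expression covers only a small subset of N-Triples (no blank nodes, no full escape handling), and `audit` is a hypothetical helper; a production audit would rely on a complete RDF parser, such as the one in rdflib.

```python
import re

# Small subset of N-Triples: IRI subject and predicate; IRI or literal object.
IRI = r"<[^<>\"{}|^`\\\s]*>"
LITERAL = r'"(?:[^"\\]|\\.)*"(?:\^\^' + IRI + r"|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?"
LINE = re.compile(rf"^\s*({IRI})\s+({IRI})\s+({IRI}|{LITERAL})\s*\.\s*$")


def audit(meaning_model, directory_lines):
    """Return problems: malformed directory lines and tags whose URI is unknown."""
    problems = []
    known_subjects = set()
    for n, line in enumerate(directory_lines, start=1):
        m = LINE.match(line)
        if not m:
            problems.append(f"line {n}: not valid N-Triples")
            continue
        known_subjects.add(m.group(1)[1:-1])  # strip the angle brackets
    for phrase, uri in meaning_model.items():
        if uri not in known_subjects:
            problems.append(f"{phrase!r}: {uri} not found in directory")
    return problems


directory_lines = [
    "<http://example.org/meaning/Barack_Hussein_Obama> "
    "<http://example.org/meaning/has_spouse> "
    "<http://example.org/meaning/Michelle_Robinson> .",
    "<http://example.org/meaning/112th_Congress> "
    "<http://example.org/meaning/has_total_House_members> "
    '"435"^^<http://www.w3.org/2001/XMLSchema#integer> .',
    "Republican Party | has function | political party",  # shorthand, not RDF
]
meaning_model = {
    "Barack Obama": "http://example.org/meaning/Barack_Hussein_Obama",
    "veto threat": "http://example.org/meaning/Obama_veto_threat",
}
for problem in audit(meaning_model, directory_lines):
    print(problem)
```

Passing a check like this shows only that the markup is well-formed and connected to the directory, not that the statements themselves are true.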