RDFS. The Resource Description Framework Schema
(RDFS) is a W3C standard for knowledge representation.
It is used in most major KBs. RDFS is based on a set U
of resources. In most applications, the resources are partitioned into instances I, relations R, literals L, and classes
C, with U = I ∪ R ∪ L ∪ C. An instance is any entity of the
world, such as a person, a city, or a movie. A class (or
type) is a name for a set of instances. The class city, e.g.,
is the name for the set of all instances that are cities. A
relation is a name for a relationship between resources, such
as loves or livesIn. Every relation r comes with a domain
dom(r) ∈ C and a range ran(r) ∈ C. A literal is a number, a
string, or a date. Literals usually take the form "string"^^
datatype. Here, string is the string representation of a
number, date, or other literal, and datatype is a resource. For
YAGO, the data types behave exactly like classes: every literal is considered an instance of its datatype. Usually, instances, classes, and relations are prefixed by a namespace.
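To make the literal format concrete, the following minimal Python sketch splits a literal of the form "string"^^datatype into its two parts. It also encodes the YAGO convention that a literal is an instance of its datatype. It assumes that the string part contains no escaped quotes. The names `parse_literal` and `is_instance_of`, and the datatype `xsd:date` used in the usage line, are illustrative choices and are not taken from YAGO's code.

```python
import re
from typing import NamedTuple

# Matches literals of the form "string"^^datatype, as described above.
# Simplifying assumption: the string part contains no escaped double quotes.
LITERAL_RE = re.compile(r'^"(?P<value>[^"]*)"\^\^(?P<datatype>\S+)$')


class Literal(NamedTuple):
    value: str     # string representation of the number, date, etc.
    datatype: str  # the datatype resource


def parse_literal(text: str) -> Literal:
    """Split a typed literal into its string part and its datatype."""
    m = LITERAL_RE.match(text)
    if m is None:
        raise ValueError(f"not a typed literal: {text!r}")
    return Literal(m.group("value"), m.group("datatype"))


def is_instance_of(lit: Literal, cls: str) -> bool:
    """Following YAGO, a literal counts as an instance of its datatype class."""
    return lit.datatype == cls


lit = parse_literal('"1935-01-08"^^xsd:date')
print(lit)                              # Literal(value='1935-01-08', datatype='xsd:date')
print(is_instance_of(lit, "xsd:date"))  # True
```

Membership is checked by a function here and is not stored as a triple. The reason is that, under the definition that follows, a literal cannot be the subject of a statement.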
We omit the namespace in this paper for legibility. In all
of the following, we assume fixed sets U, I, R, L, C. A statement (or fact) is a triple s ∈ (U \ L) × R × U, and
for most statements s, s ∈ I × R × (I ∪ L). The statement
says that the first component (the subject) stands in the relation given by the second component (the predicate) with
the third component (the object), as in ⟨Elvis, marriedTo,
Priscilla⟩. We use the statement ⟨Elvis, type, singer⟩
to say that Elvis is an instance of the class singer. We
use ⟨singer, subClassOf, person⟩ to say that singer is a
subclass of person. The subClassOf relationship is transitive. A knowledge base (KB) is a set of statements.
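The definitions above can be illustrated with a minimal Python sketch. The sketch treats a KB as a plain set of (subject, predicate, object) triples and follows subClassOf transitively to decide class membership. It also checks a statement against the domain and range of its relation. The toy KB, the extra class entity, and the `dom`/`ran` dictionaries are hypothetical illustrations and do not come from YAGO.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def superclasses(kb: Set[Triple], cls: str) -> Set[str]:
    """cls plus every class reachable from it via subClassOf (transitive closure)."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in kb:  # linear scan per step; fine for a small illustration
            if p == "subClassOf" and s == current and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def is_instance(kb: Set[Triple], entity: str, cls: str) -> bool:
    """True if entity has a type whose superclasses (including itself) contain cls."""
    return any(cls in superclasses(kb, o) for s, p, o in kb if s == entity and p == "type")


def fits_signature(kb: Set[Triple], t: Triple, dom: Dict[str, str], ran: Dict[str, str]) -> bool:
    """Hypothetical check: subject is in dom(r) and object is in ran(r)."""
    s, r, o = t
    return is_instance(kb, s, dom[r]) and is_instance(kb, o, ran[r])


# Toy KB; "entity" is a hypothetical superclass added only to show transitivity.
kb = {
    ("Elvis", "marriedTo", "Priscilla"),
    ("Elvis", "type", "singer"),
    ("Priscilla", "type", "person"),
    ("singer", "subClassOf", "person"),
    ("person", "subClassOf", "entity"),
}
dom = {"marriedTo": "person"}
ran = {"marriedTo": "person"}

print(is_instance(kb, "Elvis", "entity"))                                # True
print(fits_signature(kb, ("Elvis", "marriedTo", "Priscilla"), dom, ran))  # True
```

The class itself is included in its own superclass set, so a direct type statement also counts as membership. A production KB would precompute the closure once and would not rescan the triples at every step.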
Wikipedia. The online encyclopedia Wikipedia is written
by a community of volunteers. It is available in 287 languages, and 9 of them have more than 1m articles.
The English edition currently has 4.5m articles. The articles are
written in the Wiki markup language. Each article usually
describes one concept or entity. Most articles are members
of one or several categories. The article about Elvis Presley, e.g., is in the categories American baritones and 1935
births. Furthermore, many articles have an infobox. An infobox is a set of attribute-value pairs with information about
the article entity, such as {birthplace = Tupelo, birthdate = 8 January 1935, ...}. The infoboxes are grouped
into templates, which often carry the name of the class of the
article entity. For example, the infobox for Elvis belongs to
the template singer. The templates define which attributes
may be used. However, the templates are not used consistently, and the attributes vary widely across articles.
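The following simplified Python sketch shows how categories and infobox attribute-value pairs can be read from Wiki markup. It assumes that the infobox opens with "{{Infobox <template>" and that each attribute sits on its own "| key = value" line. It also assumes that the infobox closes with "}}" on a separate line. Nested templates that span several lines, comments, and references are not handled. The wikitext sample is a toy snippet modeled on the example in the text and is not the actual markup of the Elvis Presley article. The function names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Simplified patterns; real Wiki markup is considerably more irregular.
HEADER_RE = re.compile(r"\{\{\s*Infobox\s+([^|\n}]+)", re.IGNORECASE)
ATTR_RE = re.compile(r"^\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")


def parse_infobox(wikitext: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (template name, attribute-value pairs), or None if there is no infobox."""
    header = HEADER_RE.search(wikitext)
    if header is None:
        return None
    template = header.group(1).strip()
    attrs: Dict[str, str] = {}
    for line in wikitext[header.end():].splitlines():
        if line.strip() == "}}":  # assumed: the infobox closes on its own line
            break
        m = ATTR_RE.match(line)
        if m and m.group(2):      # skip attributes left empty
            attrs[m.group(1)] = m.group(2)
    return template, attrs


def extract_categories(wikitext: str) -> List[str]:
    """Category names from [[Category:...]] links, ignoring optional sort keys."""
    return [c.strip() for c in CATEGORY_RE.findall(wikitext)]


sample = """{{Infobox singer
| name       = Elvis Presley
| birth_date = 8 January 1935
| birthplace = Tupelo
}}
[[Category:American baritones]]
[[Category:1935 births]]"""

print(parse_infobox(sample))
# ('singer', {'name': 'Elvis Presley', 'birth_date': '8 January 1935', 'birthplace': 'Tupelo'})
print(extract_categories(sample))
# ['American baritones', '1935 births']
```

The attribute keys are kept exactly as they appear in the markup. Because templates are used inconsistently, mapping them to relations (e.g., birth_date to a birth-date relation) would require an additional mapping step.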