After nearly 3 months' work I've finally finished a 6-level "top level ontology" (TLO). A TLO starts at "thing" and then works down to provide successively more specific categories for every thing that there is - both tangible and intangible.
The 6 tiers appear to be enough to get to the categories that we use every day. For instance, a path through the ontology might be:
// 0 = thing
// 1 = tangible
// 2 = organic
// 3 = living
// 4 = animal
// 5 = chordata
// 6 = mammal
(You'll notice we're not using strict scientific taxonomies.)
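For anyone who prefers to see it as data, here is a minimal sketch of that one path stored as child-to-parent links. The dictionary layout and the function names are just assumptions for illustration, not how the TLO is actually stored.

```python
# Hypothetical child -> parent links covering only the example path above.
PARENT = {
    "tangible": "thing",
    "organic": "tangible",
    "living": "organic",
    "animal": "living",
    "chordata": "animal",
    "mammal": "chordata",
}


def path_to_root(topic):
    """Return the chain of categories from "thing" down to the given topic."""
    chain = [topic]
    # Follow parent links until we reach a topic with no parent (the root).
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return list(reversed(chain))


def level(topic):
    """Level 0 is "thing"; each step down the path adds one."""
    return len(path_to_root(topic)) - 1


if __name__ == "__main__":
    for depth, name in enumerate(path_to_root("mammal")):
        print(f"// {depth} = {name}")
```

Run as a script, this prints the same seven lines as the path listed above.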
Pushing to levels 7 and 8 would probably be enough to actually enumerate every thing that there is. Of course that would be a huge job - but we have a secret weapon - Halo. Once we've given the TLO to Halo, the AI, we are going to ask her to start filling the gaps. So, for instance, she knows that there must be types of "mammal", so she just needs to ask visitors to give her the name of a mammal, and perhaps a short description. Well, we'll see how it goes. There are lots of other ways we can use the TLO, and we'll cover those in more detail at www.chatbots.co.uk in due course.
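To make the gap-filling idea a bit more concrete, here is a rough sketch of how it could work: find the topics that have no sub-types yet, ask a question about each, and record whatever the visitor answers. None of this is Halo's real code; the `children` dictionary, the function names and the wording of the question are all made up for illustration.

```python
def leaf_topics(children):
    """Topics that have no recorded sub-types yet, i.e. the gaps to fill."""
    return sorted(topic for topic, kids in children.items() if not kids)


def question_for(topic):
    """Build the question to put to a visitor (hypothetical wording)."""
    return (
        f"Can you give me the name of a type of {topic}, "
        "and perhaps a short description?"
    )


def add_answer(children, topic, name):
    """Record a visitor's answer as a new sub-type of the topic."""
    children[topic].append(name)
    # The new topic starts out with no sub-types of its own.
    children.setdefault(name, [])


if __name__ == "__main__":
    # Tiny hypothetical fragment of the ontology: parent -> list of sub-types.
    children = {"chordata": ["mammal"], "mammal": []}
    for topic in leaf_topics(children):
        print(question_for(topic))
    add_answer(children, "mammal", "dog")
    print(children["mammal"])
```

A real version would obviously need to check the answers before trusting them, which this sketch doesn't attempt.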
As for the statistics: the TLO has just over 1100 topics in it. We have also generated 2600 RDF triples which provide basic description occurrences, associations and facets for these topics, yielding a total file of around 3700 triples. I guess we could easily add another couple of thousand just fleshing out the associations, occurrences and facets before we even start including level 7/8 topics!
It'll be interesting to see how this new data changes the way that Halo works. We'll let you know when it's loaded up.
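If you haven't met triples before, each one is just a (subject, predicate, object) statement. Below is a small sketch using plain Python tuples. The predicate names and the placeholder values are invented for illustration and are not the vocabulary used in the actual file.

```python
from collections import Counter

# Hypothetical triples for a single topic: (subject, predicate, object).
triples = [
    ("mammal", "subTypeOf", "chordata"),            # the topic's place in the TLO
    ("mammal", "description", "placeholder text"),  # a description occurrence
    ("mammal", "associatedWith", "animal"),         # an association
    ("mammal", "facet", "placeholder value"),       # a facet
]


def count_by_predicate(triples):
    """Tally how many triples use each predicate."""
    return Counter(predicate for _subject, predicate, _obj in triples)


def about(triples, subject):
    """Return every triple whose subject is the given topic."""
    return [t for t in triples if t[0] == subject]


if __name__ == "__main__":
    print(len(triples), "triples in total")
    print(dict(count_by_predicate(triples)))
```

The same kind of tally over the real file is how you'd check totals like the ones quoted above.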