OMCSNet-WNLG is a semantic network of common sense reasoning information, based on OMCSNet 1.2, WordNet 2.0 and the Link Grammar Parser.

WordNet is an online lexical reference system that in recent years has become a popular tool for AI researchers. The Link Grammar Parser is a syntactic parser for English that handles a wide variety of syntactic constructions and is considered quite robust. These resources are used together with OMCSNet to build automated processes for part-of-speech tagging, WordNet sense tagging, and other tasks.

The goals of the OMCSNet-WNLG project are:
  • Part-of-speech tag the entire OMCSNet dataset.
  • WordNet sense tag the entire OMCSNet dataset.
  • Import additional semantic linkage data from WordNet.
  • Improve the quality of the OMCSNet dataset (remove bad links).

This is an ongoing project. The OMCSNet-WNLG v1.3 release contains part-of-speech tagging for 70% of OMCSNet and WordNet sense tagging for 25%. This release also contains over 30,000 assertions imported from WordNet, and removes around 7,000 bad assertions. This brings the total number of common sense reasoning statements to over 307,000.

Author:

OMCSNet-WNLG was written by Elliot Turner (ell_NOSPAM_iot@eturner.net).

General:

OMCSNet-WNLG is part of the OMCSNetCPP project.

News:
  • NEW! Oct 1, 2003 - OMCSNet-WNLG v1.3 has been released! This version uses a new automated scoring algorithm to determine WordNet senses for OMCSNet assertions. More than 23,000 words have been sense-tagged using this new methodology. Around 7,000 "bad" assertions have also been removed from OMCSNet-WNLG v1.3, further improving the quality of the dataset.
  • Sept 26, 2003 - OMCSNet-WNLG v1.2 is now available! This release contains part-of-speech tagging for 10,000+ additional words, WordNet sense tagging for 3,000+ additional words, identifying tags for each OMCSNet-WNLG assertion indicating its import source (OMCSNet/WordNet), licensing information for both OMCSNet and WordNet data and lots of other general cleanups.
  • Sept 24, 2003 - A new predicates.txt file that contains preliminary part-of-speech and WordNet sense tagging of OMCSNet data is now available. Note that this file is not compatible with the current (1.0a) version of OMCSNetCPP. A version of OMCSNetCPP that is compatible with this updated file format will be available soon.
  • Sept 22, 2003 - Now using the Link Grammar Parser to assist with word type/sense tagging. An updated predicates.txt file containing preliminary tagged sentence/concept information will be available soon! This project is now known as OMCSNet-WNLG.
  • Sept 19, 2003 - OMCSNet+WordNet v1.0 released! This release adds over 30,000 new assertions to OMCSNet 1.2.
Project Statistics:

172,908  Unique concept/data strings
158,186  Concept/data strings containing part-of-speech tagging
 14,722  Concept/data strings needing part-of-speech tagging
    91%  Concept/data string part-of-speech coverage

172,908  Unique concept/data strings
 67,035  Concept/data strings containing WordNet sense tagging
105,873  Concept/data strings with no WordNet tagging
    38%  Concept/data string WordNet coverage

527,368  Words contained within OMCSNet-WNLG
 59,623  Untagged function words
312,190  Words that are part-of-speech tagged
155,555  Words with no part-of-speech tagging
    70%  OMCSNet-WNLG part-of-speech coverage

527,368  Words contained within OMCSNet-WNLG
 59,623  Untagged function words
 72,314  Words that are WordNet sense tagged
395,431  Words with no WordNet sense tagging
    25%  OMCSNet-WNLG WordNet sense coverage
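
The coverage percentages above can be reproduced from the raw counts. The sketch below is an inference from the published numbers, not a documented formula: the word-level figures only match if the untagged function words are counted as covered, and all four percentages appear to be truncated rather than rounded. The function name coverage_percent is hypothetical.

```python
def coverage_percent(covered: int, total: int) -> int:
    """Return coverage as a whole percentage, truncated (not rounded)."""
    return 100 * covered // total


TOTAL_STRINGS = 172_908   # unique concept/data strings
TOTAL_WORDS = 527_368     # words contained within OMCSNet-WNLG
FUNCTION_WORDS = 59_623   # untagged function words; ASSUMED to count as covered

# String-level coverage: tagged strings divided by all strings.
string_pos = coverage_percent(158_186, TOTAL_STRINGS)
string_sense = coverage_percent(67_035, TOTAL_STRINGS)

# Word-level coverage: tagged words plus function words, divided by all words.
word_pos = coverage_percent(312_190 + FUNCTION_WORDS, TOTAL_WORDS)
word_sense = coverage_percent(72_314 + FUNCTION_WORDS, TOTAL_WORDS)

print(string_pos, string_sense, word_pos, word_sense)
```

Under these assumptions the four values come out as 91, 38, 70 and 25, matching the table. Excluding function words from the denominator instead would give lower word-level figures, so that reading does not fit the published numbers.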

Features:
  • Uses the Link Grammar Parser for extracting part-of-speech information from OMCSNet assertions.
  • Automated processes for "grafting" WordNet semantic data onto existing OMCSNet concepts.
  • Inference techniques for determining the WordNet sense of words within OMCSNet.
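
The project's sense-inference method is not described on this page. As a rough illustration of what scoring candidate senses can look like, the sketch below uses a simplified Lesk-style gloss overlap, a well-known baseline that is swapped in here and is not the OMCSNet-WNLG algorithm. The sense inventory, glosses, stopword list and function names are hypothetical toy data, not actual WordNet 2.0 entries.

```python
from typing import Dict, Set

# Toy sense inventory: sense id -> gloss. HYPOTHETICAL data, not real WordNet entries.
TOY_SENSES: Dict[str, Dict[str, str]] = {
    "bank": {
        "bank#1": "a financial institution that accepts deposits of money",
        "bank#2": "sloping land beside a river or lake",
    }
}

# Small illustrative stopword list (an assumption, not taken from the project).
STOPWORDS = {"a", "an", "the", "of", "or", "that", "to", "in", "is", "at"}


def content_words(text: str) -> Set[str]:
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def score_senses(word: str, context: str) -> Dict[str, int]:
    """Score each candidate sense by how many context words appear in its gloss."""
    ctx = content_words(context) - {word}
    return {
        sense: len(ctx & content_words(gloss))
        for sense, gloss in TOY_SENSES[word].items()
    }


def best_sense(word: str, context: str) -> str:
    """Return the highest-scoring sense; ties go to the first sense listed."""
    scores = score_senses(word, context)
    return max(scores, key=scores.get)


print(best_sense("bank", "you can deposit money at a bank"))
print(best_sense("bank", "a river flows beside the bank"))
```

Exact word matching is deliberately naive here ("deposit" does not match "deposits"); a real system would at least need stemming and a far richer context than a single sentence.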
Downloads & Documentation:

Download: OMCSNet-WNLG 1.3 Predicates Data

Note: OMCSNet-WNLG is distributed under the terms of the GNU General Public License. Some assertions within the OMCSNet-WNLG dataset are distributed under a different license (specifically the WordNet license). Licensing information for these assertions is included with the OMCSNet-WNLG distribution, and all assertions are tagged to indicate their origin.