OMCSNet-WNLG is a semantic network of common sense reasoning information, based on OMCSNet 1.2, WordNet 2.0 and the Link Grammar Parser.
WordNet is an online lexical reference system that in recent years has become a popular tool for AI researchers. The Link Grammar Parser is a syntactic parser of the English language that is capable of handling a wide variety of syntactic constructions and is considered quite robust. These resources are being utilized in combination with OMCSNet to build automated processes that perform part-of-speech tagging, WordNet sense tagging and other tasks.
The goals of the OMCSNet-WNLG project are:
- Part-of-speech tag the entire OMCSNet dataset.
- WordNet sense tag the entire OMCSNet dataset.
- Import additional semantic linkage data from WordNet.
- Improve the quality of the OMCSNet dataset by removing bad links.
This is an ongoing project. The OMCSNet-WNLG v1.3 release contains part-of-speech tagging for 70% of OMCSNet and WordNet sense tagging for 25%. This release also contains over 30,000 assertions imported from WordNet and removes around 7,000 bad assertions. This brings the total number of common sense reasoning statements to over 307,000.
Author:
OMCSNet-WNLG was written by Elliot Turner (ell_NOSPAM_iot@eturner.net).
General:
OMCSNet-WNLG is part of the OMCSNetCPP project.
News:
- NEW! Oct 1, 2003 - OMCSNet-WNLG v1.3 has been released! This version uses a new automated scoring algorithm to determine WordNet senses for OMCSNet assertions. 23,000+ words have been sense-tagged using this new methodology. Around 7,000 "bad" assertions have also been removed from OMCSNet-WNLG v1.3, further increasing the quality of the dataset.
- Sept 26, 2003 - OMCSNet-WNLG v1.2 is now available! This release contains part-of-speech tagging for 10,000+ additional words, WordNet sense tagging for 3,000+ additional words, a tag on each OMCSNet-WNLG assertion identifying its import source (OMCSNet/WordNet), licensing information for both OMCSNet and WordNet data, and many other general cleanups.
- Sept 24, 2003 - A new predicates.txt file that contains preliminary Part-Of-Speech and WordNet sense tagging of OMCSNet data is now available. Note that this file is not compatible with the current (1.0a) version of OMCSNetCPP. A version of OMCSNetCPP that is compatible with this updated file format will be available soon.
- Sept 22, 2003 - Now making use of the Link Grammar Parser to assist with word type/sense tagging. An updated predicates.txt file containing preliminary tagged sentence/concept information will be available soon! This project is now known as OMCSNet-WNLG.
- Sept 19, 2003 - OMCSNet+WordNet v1.0 released! This release adds over 30,000 new assertions to OMCSNet 1.2.
Project Statistics:
| 172,908 | Unique concept/data strings |
| 158,186 | Concept/data strings containing part-of-speech tagging |
| 14,722 | Concept/data strings needing part-of-speech tagging |
| 91% | Concept/data string part-of-speech coverage |

| 172,908 | Unique concept/data strings |
| 67,035 | Concept/data strings containing WordNet sense tagging |
| 105,873 | Concept/data strings with no WordNet tagging |
| 38% | Concept/data string WordNet coverage |

| 527,368 | Words contained within OMCSNet-WNLG |
| 59,623 | Untagged function words |
| 312,190 | Words that are part-of-speech tagged |
| 155,555 | Words with no part-of-speech tagging |
| 70% | OMCSNet-WNLG part-of-speech coverage |

| 527,368 | Words contained within OMCSNet-WNLG |
| 59,623 | Untagged function words |
| 72,314 | Words that are WordNet sense tagged |
| 395,431 | Words with no WordNet sense tagging |
| 25% | OMCSNet-WNLG WordNet sense coverage |
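
The coverage percentages above can be reproduced from the listed counts under two assumptions that the page itself does not state and that are inferred only from the numbers: percentages are rounded down to a whole number, and the word-level figures count untagged function words as covered. The function and constant names in this minimal Python sketch are made up for illustration.

```python
def coverage_percent(covered: int, total: int) -> int:
    """Whole-number percentage, rounded down (assumed rounding rule)."""
    return covered * 100 // total


# Counts copied from the statistics tables.
UNIQUE_STRINGS = 172_908
STRINGS_POS_TAGGED = 158_186
STRINGS_SENSE_TAGGED = 67_035
TOTAL_WORDS = 527_368
FUNCTION_WORDS = 59_623
WORDS_POS_TAGGED = 312_190
WORDS_SENSE_TAGGED = 72_314

# String-level coverage: tagged strings over all unique strings.
string_pos = coverage_percent(STRINGS_POS_TAGGED, UNIQUE_STRINGS)
string_sense = coverage_percent(STRINGS_SENSE_TAGGED, UNIQUE_STRINGS)

# Word-level coverage: assumes function words count as covered,
# which is the reading that matches the published figures.
word_pos = coverage_percent(WORDS_POS_TAGGED + FUNCTION_WORDS, TOTAL_WORDS)
word_sense = coverage_percent(WORDS_SENSE_TAGGED + FUNCTION_WORDS, TOTAL_WORDS)

print(string_pos, string_sense, word_pos, word_sense)  # prints: 91 38 70 25
```

If function words are instead excluded from the numerator, the word-level results come out lower than the published 70% and 25%, which is why the sketch adopts the first reading.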
Features:
- Uses the Link Grammar Parser for extracting part-of-speech information from OMCSNet assertions.
- Automated processes for "grafting" WordNet semantic data onto existing OMCSNet concepts.
- Inference techniques for determining the WordNet sense of words within OMCSNet.
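
This page does not describe the inference or scoring method used for sense tagging. As a rough illustration of what scoring candidate senses can look like, the sketch below uses simplified gloss overlap (a Lesk-style heuristic). This is a stand-in technique, not necessarily what OMCSNet-WNLG does. The sense labels and glosses are toy data written for this example and are not WordNet text. All names are hypothetical.

```python
from typing import Dict, Optional, Set

# Toy sense inventory for illustration only; NOT taken from WordNet.
TOY_SENSES: Dict[str, Dict[str, str]] = {
    "bank": {
        "bank#1": "financial institution that accepts deposits and lends money",
        "bank#2": "sloping land beside a river or lake",
    }
}

STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "in", "is"}


def content_words(text: str) -> Set[str]:
    """Lowercased whitespace tokens with stopwords removed."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def score_senses(word: str, context: str) -> Dict[str, int]:
    """Score each candidate sense by word overlap between gloss and context."""
    ctx = content_words(context)
    return {
        sense: len(ctx & content_words(gloss))
        for sense, gloss in TOY_SENSES[word].items()
    }


def best_sense(word: str, context: str) -> Optional[str]:
    """Return the highest-scoring sense, or None when nothing overlaps."""
    scores = score_senses(word, context)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(best_sense("bank", "people deposit money in a bank"))
print(best_sense("bank", "a river flows past the bank"))
```

Returning `None` when no gloss overlaps the context leaves a word untagged rather than guessing, so partial sense coverage is an expected outcome of this kind of heuristic.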
Downloads & Documentation:
Download: OMCSNet-WNLG 1.3 Predicates Data
Note: OMCSNet-WNLG is distributed under the terms of the GNU General Public License. Specific assertions within the OMCSNet-WNLG dataset are distributed under different licenses (specifically the WordNet license). Licensing information for these assertions is included with the OMCSNet-WNLG distribution and all assertions are tagged to indicate their origin.
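
Because each assertion carries an origin tag, a consumer of the data could separate assertions by source before applying the matching license terms. The sketch below shows one way to do that in Python. The on-disk format of the predicates file is not described on this page, so the `Assertion` record, its field names, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Assertion:
    """Hypothetical in-memory form of one assertion (not the real file format)."""
    relation: str
    left: str
    right: str
    source: str  # origin tag, assumed to be "OMCSNet" or "WordNet"


def group_by_source(assertions: Iterable[Assertion]) -> Dict[str, List[Assertion]]:
    """Bucket assertions by their origin tag."""
    groups: Dict[str, List[Assertion]] = defaultdict(list)
    for assertion in assertions:
        groups[assertion.source].append(assertion)
    return dict(groups)


# Made-up sample rows for illustration only.
sample = [
    Assertion("IsA", "dog", "animal", "WordNet"),
    Assertion("UsedFor", "pen", "write", "OMCSNet"),
    Assertion("IsA", "oak", "tree", "WordNet"),
]

groups = group_by_source(sample)
for source, items in sorted(groups.items()):
    print(source, len(items))
```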
