Natural language processing
One of HILab's major research interests is the creation of Natural Language Processing tools, and more specifically the application of machine learning techniques to extract linguistic knowledge from Modern Greek text. The most significant NLP activities undertaken by HILab are listed below. The learning datasets, where completed, are freely available for research and experimentation upon request.
PP Attachment in Modern Greek
Prepositional Phrase (PP) attachment remains an open research challenge in syntactic disambiguation. HILab has proposed a machine learning approach to resolving PP attachment in Modern Greek. The dataset is available for experimentation upon request. The approach was published at the Panhellenic Conference on Informatics (PCI 2010). Please cite the publication when using the data.
Shallow Parsing in Modern Greek
HILab has investigated the automatic identification of subject-verb-object dependencies in Modern Greek text. The dataset for learning these syntactic relations is available for experimentation upon request. The shallow parser was published at the Artificial Intelligence Applications and Innovations Conference (AIAI 2011). Please cite the publication when using the data.
Morphological Case Tagging in Modern Greek
Morphological case ambiguity in Modern Greek is a research challenge, as many words have the same orthographic form in more than one case. At the same time, case tagging is essential for identifying the syntactic and semantic roles of the constituents within a Modern Greek sentence, as these roles are determined by the constituents’ morphology rather than by their position in the sentence. HILab has been applying machine learning techniques to identify the case value of Modern Greek nouns, adjectives, articles, pronouns and numerals. The dataset is available for experimentation. The approach was published at the Panhellenic Conference on Artificial Intelligence. Please cite the publication when using the data.
Identifying Personality Traits From Linguistic Data
Several studies have indicated a link between the linguistic properties of an author’s work and his/her personality. HILab proposes the use of machine learning to identify the value of each of the Big Five personality traits of an author through linguistic processing of his/her Modern Greek text. The dataset is available for experimentation. The approach was published at the 1st Workshop on Mining Humanistic Data, organized by HILab. Please cite the publication when using the data.
Automatic Spelling Correction in Greek Homophone Words
Machine learning techniques are employed for the automatic correction of spelling errors in Greek adjectives and verbs that sound alike but are spelled differently (homophones), using minimal linguistic information. The dataset is available for experimentation in CSV and ARFF formats. The files named adataset are for learning the spelling of orthographically ambiguous adjectives, while those named vdataset are for learning the spelling of orthographically ambiguous verbs. The *_f.arff files contain the datasets after applying the Synthetic Minority Oversampling Technique (SMOTE).
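For readers unfamiliar with SMOTE, the following is a minimal sketch of the idea, not the procedure HILab used to produce the *_f.arff files. It assumes purely numeric feature vectors and picks base samples at random rather than iterating over every minority sample. The function name `smote`, its parameters and the toy data are hypothetical and serve only as an illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and one of their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the base-neighbour segment
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy feature vectors for an under-represented spelling class.
minority_class = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority_class, n_synthetic=6, k=2)
print(len(new_points))
```

Each synthetic point lies on the line segment between a real minority sample and one of its neighbours, so the oversampled class stays within the region already covered by the original minority data.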