Natural language processing

One of HILab's major research interests is the creation of Natural Language Processing (NLP) tools, and more specifically the application of machine learning techniques to extract linguistic knowledge from Modern Greek text. HILab's most significant NLP activities are listed below. Completed learning datasets are freely available for research and experimentation upon request.

PP Attachment in Modern Greek

Prepositional Phrase (PP) attachment is an ongoing research challenge in syntactic disambiguation. HILab has proposed a machine learning approach to resolving PP attachment in Modern Greek. The dataset is available for experimentation upon request. The approach has been documented and published in the proceedings of the Panhellenic Conference on Informatics (PCI 2010). Please cite the publication when using the data.

Pavlos Nalmpantis, Romanos Kalamatianos, Konstantinos Kordas and Katia Kermanidis. 2010. Low Resources Prepositional Phrase Attachment. Proceedings of the Panhellenic Conference on Informatics (PCI). Tripolis, Greece, September 2010.

Shallow Parsing in Modern Greek

HILab has investigated the automatic identification of subject-verb-object dependencies in Modern Greek text. The dataset for learning these syntactic relations is available for experimentation upon request. The shallow parser has been documented and published at the Artificial Intelligence Applications and Innovations Conference (AIAI 2011). Please cite the publication when using the data.

A. Karozou and K. Kermanidis. 2011. Learning Shallow Syntactic Dependencies from Imbalanced Datasets: A Case Study in Modern Greek and English. In Proceedings of the Joint International Conferences on Engineering Applications of Neural Networks (EANN) and Artificial Intelligence Applications and Innovations (AIAI). Corfu, Greece, September 15-18 2011.

Morphological Case Tagging in Modern Greek

Morphological case ambiguity in Modern Greek is a research challenge, as many words have the same orthographic form in more than one case. At the same time, case tagging is essential for identifying the syntactic and semantic roles of the constituents within a Modern Greek sentence, as these roles are determined by the constituents’ morphology rather than by their position in the sentence. HILab has been applying machine learning techniques to identify the case value of Modern Greek nouns, adjectives, articles, pronouns and numerals. The dataset is available for experimentation. The approach was published at the Panhellenic Conference on Artificial Intelligence (SETN 2012). Please cite the publication when using the data.

Antonis Koursoumis, Evangelia Gkatzou, Antigoni M. Founta, Vassiliki I. Mavriki, Karolos Talvis, Spyros Mprilis, Ahmad A. Aliwat, Katia Lida Kermanidis. Learning to Case-Tag Modern Greek Text. SETN 2012. Lamia, Greece, pp. 353-360.

Identifying Personality Traits From Linguistic Data

Several studies have indicated a link between the linguistic properties of an author’s work and the author’s personality. HILab proposes using machine learning to identify the value of each of the Big Five personality traits of an author through linguistic processing of the author’s Modern Greek text. The dataset is available for experimentation. The approach was published at the 1st Workshop on Mining Humanistic Data, organized by HILab. Please cite the publication when using the data.

Vasileios Komianos, Eleni Moustaka, Maria Andreou, Eirini Banou, Sofia Fanarioti, Katia L. Kermanidis. Predicting Personality Traits from Spontaneous Modern Greek Text: Overcoming the Barriers. Artificial Intelligence Applications and Innovations – AIAI 2012 International Workshop: MHDW, Halkidiki, Greece, September 27-30, 2012, Proceedings, Part II. 2012.

Automatic Spelling Correction of Greek Homophone Words

Machine learning techniques are employed to automatically correct spelling errors in Greek adjectives and verbs that sound alike but are spelled differently (homophones), using minimal linguistic information. The dataset is available for experimentation in CSV and ARFF format. The files named adataset are for learning the spelling of orthographically ambiguous adjectives, while those named vdataset are for learning the spelling of orthographically ambiguous verbs. The *_f.arff files contain the datasets after applying the Synthetic Minority Oversampling Technique (SMOTE).

Spyridon Sagiadinos, Petros Gasteratos, Vasileios Dragonas, Athanasia Kalamara, Antonia Spyridonidou, Katia Kermanidis. Knowledge-Poor Context-Sensitive Spelling Correction for Modern Greek. Artificial Intelligence: Methods and Applications. Lecture Notes in Computer Science, Volume 8445, 2014, pp. 360-369. Springer.
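
For readers unfamiliar with the oversampling step mentioned for the *_f.arff files, the following is a minimal sketch of the general idea behind SMOTE: each synthetic minority-class example is placed at a random point on the line between an existing minority example and one of its nearest minority neighbours. It is a simplified illustration, not the procedure or tool HILab used. The function name smote, its parameters, and the toy feature vectors are assumptions made for this example. It handles numeric features only and picks the base example at random rather than generating a fixed number of synthetic examples per minority example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class samples.

    Simplified sketch: numeric features only, base sample chosen at random.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the base sample, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point somewhere on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy feature vectors, not taken from the HILab datasets.
minority_class = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_samples = smote(minority_class, n_synthetic=6, k=3)
```

Because every synthetic point lies between two existing minority examples, the new data stays inside the region the minority class already occupies. This is why the technique is commonly applied to imbalanced training sets before learning.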