Automatic Diacritizer for Arabic Texts

Thesis Title: 
Automatic Diacritizer for Arabic Texts
Name: 
Mohammad Ahmed Sayed Ahmed
Date of Birth: 
Tue, 11/08/1981
Nationality: 
Egyptian
E-mail: 
Degree: 
Master
Previous Degrees: 
B.Sc. (ELC) 2003 - Cairo
Registration Date: 
Wed, 01/10/2003
Awarding Date: 
Tue, 14/07/2009
Supervisors: 
Examiners: 

Dr. Fahmy, A. A.
Dr. Fakhr, M. W.
Dr. Rashwan, M. A.

Key Words: 

Automatic Arabic diacritization, Factorization, Unfactorization,
Morphological analysis, Natural language processing, Human
language technology

Summary: 

The choice between factorizing and unfactorizing entities is one of the main
problems facing researchers in the field of human language technologies. As a
case study of this problem, this thesis examines the automatic diacritization of
Arabic text. It compares diacritization through word factorization, using
morphological analysis, with diacritization through unfactorized, full-form
words. The experimental results show that for a small training corpus the
unfactorizing system is better, since it reaches saturation faster than the
factorizing one, although it may suffer from the out-of-vocabulary (OOV)
problem. For a very large training corpus the two systems perform almost the
same, except that the cost of the unfactorizing system is lower. The best
strategy is therefore a hybrid of the two systems, which combines the fast
learning and low cost of the unfactorizing system with the wide coverage of the
factorizing one.
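
The hybrid strategy described in the summary can be illustrated with a toy
sketch. This is not the thesis implementation: the lookup tables, the
simplified Latin transliteration, and the function names below are all
hypothetical, and a real system would rank candidates statistically instead of
returning the first match. The sketch only shows the control flow: try the
full-form (unfactorized) lookup first, and fall back to a prefix/stem/suffix
(factorized) analysis when the word is out of vocabulary.

```python
from typing import Optional

# Hypothetical toy data in a simplified Latin transliteration.
# Full-form table: undiacritized word -> diacritized word seen in training.
FULL_FORM = {
    "ktb": "kataba",
    "wktb": "wakataba",
}

# Factorized tables: undiacritized morpheme -> diacritized morpheme.
PREFIXES = {"": "", "w": "wa"}
STEMS = {"ktb": "katab", "drs": "daras"}
SUFFIXES = {"": "a", "t": "tu"}


def diacritize_full_form(word: str) -> Optional[str]:
    """Unfactorized path: look up the whole word directly."""
    return FULL_FORM.get(word)


def diacritize_factorized(word: str) -> Optional[str]:
    """Factorized path: try every prefix/stem/suffix split of the word."""
    n = len(word)
    for i in range(n + 1):
        for j in range(i, n + 1):
            prefix, stem, suffix = word[:i], word[i:j], word[j:]
            if prefix in PREFIXES and stem in STEMS and suffix in SUFFIXES:
                return PREFIXES[prefix] + STEMS[stem] + SUFFIXES[suffix]
    return None


def diacritize_hybrid(word: str) -> str:
    """Use the full-form lookup first; fall back to factorization on OOV."""
    result = diacritize_full_form(word)
    if result is None:
        result = diacritize_factorized(word)
    # Leave the word undiacritized if neither path can analyze it.
    return result if result is not None else word


if __name__ == "__main__":
    for w in ["ktb", "wdrs", "drst", "xyz"]:
        print(w, "->", diacritize_hybrid(w))
```

In this sketch, "wdrs" and "drst" are absent from the full-form table, so they
are handled by the factorized path, while "xyz" is covered by neither table and
is returned unchanged.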