Digital Language and Knowledge Contents Research Center (DICORA), Prof. Jee Sun NAM

Information Extraction (IE)

Information extraction (IE) is a type of information retrieval whose goal is to extract structured information from unstructured or semi-structured machine-readable documents. In most cases, this involves processing human-language texts by means of natural language processing (NLP). Recent work in multimedia document processing, such as automatic annotation and concept extraction from images, audio, and video, can also be considered a subfield of information extraction.

IE is the automatic identification of selected types of entities, relations, or events in free text. It covers a wide range of tasks, from finding all the company names in a text to finding all the murders it reports, including who killed whom, when, and where. Such capabilities are increasingly important for sifting through the enormous volumes of on-line text to find the specific information that is required.
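The simplest task mentioned above, finding company names in a text, can be illustrated with a minimal rule-based sketch. It assumes that a company name is a run of capitalized words ending in a corporate suffix; the suffix list, the function name `extract_companies`, and the sample text are all hypothetical and serve only as illustration. Real IE systems typically rely on much larger lexicons or on trained named-entity recognizers.

```python
import re

# Hypothetical, illustrative list of corporate suffixes (an assumption,
# not taken from any particular IE system).
COMPANY_SUFFIXES = ("Inc", "Corp", "Ltd", "LLC", "Co")

# One or more capitalized words followed by one of the suffixes above.
COMPANY_PATTERN = re.compile(
    r"\b((?:[A-Z][A-Za-z&]*\s+)+(?:" + "|".join(COMPANY_SUFFIXES) + r"))\b"
)


def extract_companies(text):
    """Return company-name candidates in order of first appearance, without duplicates."""
    found = []
    for match in COMPANY_PATTERN.finditer(text):
        name = match.group(1)
        if name not in found:
            found.append(name)
    return found


# Made-up sample text for demonstration purposes.
sample = (
    "Acme Widgets Inc. agreed to buy Northwind Traders Ltd on Monday. "
    "Analysts said Acme Widgets Inc. paid too much."
)
print(extract_companies(sample))
# prints ['Acme Widgets Inc', 'Northwind Traders Ltd']
```

The output is a structured list drawn from unstructured text, which is the essence of IE. A pattern this simple misses companies without a listed suffix and cannot capture relations or events such as who acquired whom, which is one reason the harder tasks call for fuller NLP analysis.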

Its applications are still scarce. A few well-known examples exist, and other, classified systems may also be in operation. The technology has certainly not reached the point where systems for new tasks are easy to build, nor are performance levels high enough for use in fully automatic systems.

References

● www.wikipedia.org.

● Cowie, J. and Wilks, Y. (2000) Information extraction. In Dale, R. et al. (eds) Handbook of Natural Language Processing. Marcel Dekker, New York.

● Mitkov, R. (ed.) (2004) The Oxford Handbook of Computational Linguistics. Oxford University Press, Oxford.