Digital Language Knowledge Contents Research Center (DICORA), Prof. Jee Sun NAM

Information Retrieval (IR)

Information Retrieval (IR) is concerned with the indexing and retrieval of information from heterogeneous, mostly textual information resources. The term was coined by Calvin Mooers in 1951. He advocated applying it to the “intellectual aspects” of describing information and of the systems used to search for it.

IR consists of retrieving information from stored data in media or formats such as text, images, video, speech, and databases, and it often combines several media. The field has a well-established history and has reached an initial level of maturity, with systems already deployed in industry and business. In the past, IR was needed mainly by specialists in fields such as business, law, and medicine. Today, ordinary users who simply want effective Internet search are pushing the research community to meet their information-finding needs. Increasing network transmission speed and capacity promise to give the field further impetus. Finally, globalization adds yet another dimension: the need for powerful information retrieval across languages.

Many different measures for evaluating the performance of information retrieval systems have been proposed. The measures below require a collection of documents, a query, and a judgment of which documents in the collection are relevant to that query.

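● Precision = |{Relevant Documents} ∩ {Retrieved Documents}| / |{Retrieved Documents}| (the fraction of retrieved documents that are relevant)

● Recall = |{Relevant Documents} ∩ {Retrieved Documents}| / |{Relevant Documents}| (the fraction of relevant documents that are retrieved)

● F-measure (F) = 2 · Precision · Recall / (Precision + Recall) (the harmonic mean of precision and recall)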
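The three measures can be computed directly from sets of document IDs. The following is a minimal Python sketch; the document IDs and relevance judgments are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here (some evaluations treat those cases as undefined instead).

```python
def precision(relevant: set[str], retrieved: set[str]) -> float:
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0  # assumed convention: nothing retrieved -> precision 0
    return len(relevant & retrieved) / len(retrieved)


def recall(relevant: set[str], retrieved: set[str]) -> float:
    """Fraction of relevant documents that were retrieved."""
    if not relevant:
        return 0.0  # assumed convention: no relevant documents -> recall 0
    return len(relevant & retrieved) / len(relevant)


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# Hypothetical relevance judgments and system output for one query
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d2", "d3", "d5"}

p = precision(relevant, retrieved)  # 2 of 3 retrieved are relevant
r = recall(relevant, retrieved)     # 2 of 4 relevant were retrieved
f = f_measure(p, r)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.667 R=0.500 F=0.571
```

Here two documents (d2, d3) are both relevant and retrieved, so precision is 2/3, recall is 2/4, and F is 4/7. Because F is a harmonic mean, it stays close to the lower of the two values.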

Traditionally, IR has concentrated on finding whole documents consisting of written text. Much current IR research focuses more specifically on text retrieval: the computerized retrieval of machine-readable text without human indexing.
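Retrieval without human indexing means the index terms are derived automatically from the text itself. One common structure for this is an inverted index, which maps each term to the documents containing it. The sketch below is only an illustration under stated assumptions: a naive ASCII-only tokenizer, simple Boolean AND matching, and a hypothetical three-document collection. The names `tokenize`, `build_index`, and `search` are illustrative, not from any library.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase, keep ASCII letters/digits only.
    # Real systems usually add stemming, stop-word removal, Unicode handling, etc.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map each term to the set of document IDs that contain it.
    # Terms come straight from the text; no human-assigned index terms are used.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # Boolean AND retrieval: return documents containing every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical document collection
docs = {
    "d1": "Information retrieval from text collections",
    "d2": "Speech and video retrieval",
    "d3": "Indexing machine-readable text",
}
index = build_index(docs)
print(sorted(search(index, "text retrieval")))  # ['d1']
print(sorted(search(index, "text")))            # ['d1', 'd3']
```

Boolean matching returns an unordered set; practical systems typically rank the matching documents (for example with TF-IDF or BM25 weighting) rather than stopping at set intersection.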

References

● Wikipedia. http://www.wikipedia.org

● Mitkov, R. (ed.) 2004. The Oxford Handbook of Computational Linguistics. Oxford: Oxford University Press.

● Text REtrieval Conference (TREC). http://trec.nist.gov