How do search engines work?

Fábio Ricotta

17 years ago

Hello everyone!

I finally had time to put together the final PDF of my undergraduate thesis (Trabalho de Conclusão de Curso), and I am now making it available to everyone. I hope this study helps many other students, and anyone else interested in the field, learn more about how search engines work.

PDF file: TCC – Como os search engines funcionam?

Author: Fábio Carvalho Motta Ricotta
Advisor: Roberto Affonso Costa Junior
Universidade Federal de Itajubá
Instituto de Ciências Exatas
Departamento de Matemática e Computação

Summary:

A search engine is a website that specializes in finding and listing web pages based on keywords supplied by the user.
Search engines emerged to provide an important service: finding any information on the web and presenting the results in an organized way, quickly and efficiently. A search engine lets a person request content that meets a specific criterion (typically, containing a given word or phrase) and responds with a list of references that match that criterion.
Search engines collect pages using a robot that scans the Internet for new pages and adds them to the database automatically. They keep constantly updated indexes so that they can operate quickly and efficiently.
When a user runs a search, typically by typing keywords, the system looks up the index and returns a list of the pages that best match the criterion, usually with a short summary containing the document title and, sometimes, parts of its text.
This work covers the three main areas of search engine architecture (Web Crawling, Web Indexing, and Web Search) and then presents an illustrative example of this architecture.

Abstract:

A search engine is a website that specializes in searching for and retrieving pages from the Internet based on keywords supplied by the user.
Search engines emerged to provide an extremely important service: finding any information on the web and presenting the results in an organized way, quickly and efficiently. A search engine lets a person request content that meets a specific criterion (typically, containing a given word or phrase) and returns a list of references that match that criterion.
Search engines build their collection of pages automatically, using a robot that sweeps the Internet looking for new pages to add to their database. They keep their indexes constantly updated so that they can operate quickly and efficiently.
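The robot described above can be pictured with a minimal sketch. It is not taken from the thesis: the `TOY_WEB` dictionary, the `crawl` function, and the `example` URLs are all hypothetical, and a real crawler would fetch pages over HTTP and extract links from the HTML instead of reading them from memory. The idea it tries to show is only the frontier queue and the set of already seen URLs.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# This stands in for fetching and parsing real pages.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/", "a.example/"],
    "c.example/": [],
}

def crawl(seed, fetch_links, max_pages=100):
    """Breadth-first crawl from `seed`; returns URLs in the order they were visited."""
    frontier = deque([seed])   # URLs waiting to be visited
    seen = {seed}              # URLs already queued, so no page is visited twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

pages = crawl("a.example/", lambda url: TOY_WEB.get(url, []))
print(pages)
```

Breadth-first order is just one possible visiting policy, chosen here for simplicity; the `max_pages` limit is an assumed safeguard so the loop always terminates.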
When a user performs a search, typically by typing keywords, the system consults the index and returns a list of the pages that best match the criterion, usually with a short summary containing the document title and, sometimes, parts of the page text.
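To make the index lookup concrete, here is a small sketch of an inverted index and a keyword query over it. Everything in it is an assumption for illustration: the `DOCS` collection, the function names, and especially the ranking, which simply counts how many distinct query terms a page contains. Real search engines use far more elaborate ranking.

```python
import re
from collections import defaultdict

# Hypothetical mini-collection standing in for crawled pages: URL -> (title, text).
DOCS = {
    "a.example/": ("Search basics", "A search engine lists pages that match keywords."),
    "b.example/": ("Web crawling", "A robot sweeps the web looking for new pages."),
    "c.example/": ("Indexing", "The index maps each word to the pages that contain it."),
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Inverted index: term -> set of URLs whose title or text contains the term."""
    index = defaultdict(set)
    for url, (title, body) in docs.items():
        for term in tokenize(title + " " + body):
            index[term].add(url)
    return index

def search(index, docs, query, snippet_len=40):
    """Return (url, title, snippet) tuples, best match first."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for url in index.get(term, ()):
            scores[url] += 1          # assumed score: number of query terms matched
    ranked = sorted(scores, key=lambda u: (-scores[u], u))
    return [(u, docs[u][0], docs[u][1][:snippet_len]) for u in ranked]

index = build_index(DOCS)
results = search(index, DOCS, "index pages")
for url, title, snippet in results:
    print(url, "|", title, "|", snippet)
```

Because the index is built ahead of time, a query only touches the entries for its own terms instead of scanning every page, which is the reason the lookup stays fast as the collection grows.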
This work covers the three main areas of search engine architecture (Web Crawling, Web Indexing, and Web Search) and then presents an illustrative example of this architecture.