Prospects for the development of search engines. Modern search engines and development trends of Yandex, one of the market leaders - Abstract

The many technologies and methods created over years of development in the theory and practice of information retrieval find application in modern information systems. Alongside classic library information retrieval systems (IPS), which continue to improve, global Internet IPS are developing intensively and have become the main driving force of modern information retrieval technology. The gigantic volume of available information resources requires scalable search algorithms. Hypertext allows fundamentally new search models based on the semantic analysis of document collections. The high rate at which pages are updated, their free placement, and the lack of any guarantee of permanent access make it necessary to re-index the relevant information resources constantly.

Finally, the heterogeneous composition of users, who often do not have the skills to work with a search engine, forces us to look for effective ways to formulate queries that work with minimal initial information.

6.1. Dictionary information retrieval systems

Dictionary IPS are to date the fastest and most effective search engines and the ones most widely used on the Internet. A dictionary IPS searches for the required information by keywords. Search results are produced by a search algorithm working with the dictionary and with a query that the user composes in the information retrieval language (ISL).

The structure of the vocabulary IPS (Figure 13) consists of the following components: document viewer, user interface, search engine, search image database, and indexing agent.

The information array includes the information resources potentially available to the user: text and graphic documents, multimedia information, and so on. For a global IPS, this is the entire Internet, where every document is identified by a unique URL (Uniform Resource Locator).

The search engine interface defines the way the user interacts with the IPS. This includes the rules for generating queries, the mechanism for viewing search results, etc. The interface of Internet search engines is usually implemented in a web browser environment. Appropriate software is used to work with sound and video information.

The main function of the search engine is to implement the adopted search model. First, the user's query, written in the information retrieval language (ISL), is translated according to established rules into a formal query. Then, as the search algorithm runs, the query is compared with the search images of documents in the database. The results of the comparison are used to build the final list of documents found. For each document, the list usually contains its name, size, creation date, a brief annotation, a link to it, and the value of the similarity measure between the document and the query.

Fig.13. The structure of the vocabulary IPS.

The list is subject to ranking (ordering by some criterion, usually by the value of formal relevance).
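The comparison and ranking steps described above can be sketched in a few lines. This is only an illustrative sketch, not the search model of any particular IPS: the function names `build_index` and `search` are hypothetical, the "search image" of a document is assumed to be simply the set of its words, and the similarity measure is assumed to be the number of query words a document contains.

```python
from collections import defaultdict

def build_index(documents):
    """Map each term to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return (doc_id, score) pairs ranked by the number of matched query terms."""
    scores = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

documents = {
    "a": "search engine index",
    "b": "search robot",
    "c": "weather forecast",
}
ranked = search(build_index(documents), "search index")
```

Here document "a" matches both query words and is ranked above "b", which matches only one; "c" does not appear in the list at all.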

The database of search images of documents is designed to store descriptions of indexed documents. The structure of a typical IPS vocabulary database is described in detail in Part 1 of the Guidelines.

The indexing agent indexes the available documents in order to compile their search images. In local systems this operation is usually carried out once: after the document array has been formed, all information is indexed and the search images are entered into the database. The dynamic, decentralized information array of the Internet calls for a different approach. A special robot program, called a spider or crawler, continuously traverses the network, moving from document to document by following the hyperlinks they contain. How quickly the information in the search engine database is updated depends directly on how quickly the network is crawled. For example, a powerful indexing robot can traverse the entire Internet in a few weeks. With each new crawl cycle, the database is updated and old, invalid addresses are removed.
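A crawl cycle of this kind can be illustrated on a toy in-memory "network". The sketch below is an assumption-laden simplification: `link_graph` is a hypothetical dictionary that maps each existing page to the links it contains, breadth-first order is just one possible traversal order, and no real network access takes place.

```python
from collections import deque

def crawl(link_graph, seeds):
    """Breadth-first traversal: return pages in the order they are first visited."""
    visited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if url not in link_graph:
            # Dead link: the page no longer exists, so it is not indexed.
            continue
        visited.append(url)
        for target in link_graph[url]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited

def refresh(indexed_urls, link_graph, seeds):
    """One crawl cycle: report the new set of addresses, what was added, what was removed."""
    current = set(crawl(link_graph, seeds))
    return current, current - indexed_urls, indexed_urls - current

graph = {"a": ["b", "c"], "b": ["a", "d"], "c": []}  # "d" is linked to but does not exist
current, added, removed = refresh({"a", "x"}, graph, ["a"])
```

After the cycle, the pages "b" and "c" have been added to the database, and the stale address "x" has been removed.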

Some documents are closed to search engines. This is information that is reached not by following a link but by submitting a request through a form. Intelligent methods for scanning this hidden part of the Internet are currently being developed, but they are not yet widely used.

To index hypertext documents, agent programs use the following sources: hypertext links (href), titles (title), headings (H1, H2, etc.), annotations, keyword lists (keywords), and image captions. URLs are used to index non-text information (for example, files transferred via the FTP protocol).

The possibilities of semi-automatic or manual indexing are also used.

In the first case, administrators leave messages about their documents, which the indexing agent processes after some time; in the second, administrators enter the necessary information into the IPS database on their own.

An increasing number of IPSs are performing full-text indexing. In this case, the entire text of the document is used to compile the search image. Formatting, links, etc. in this case become an additional factor affecting the significance of a particular term. The title term will receive more weight than the figure caption term.
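The idea that formatting affects the significance of a term can be shown with a small sketch. The field names and the numeric weights in `FIELD_WEIGHTS` are assumptions chosen for illustration only; the source does not state which values real systems use.

```python
# Assumed, illustrative weights: a term in the title counts more than one in a caption.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0, "caption": 0.5}

def term_weights(document):
    """document maps a field name to its text; returns term -> accumulated weight."""
    weights = {}
    for field, text in document.items():
        field_weight = FIELD_WEIGHTS.get(field, 1.0)  # unknown fields count as body text
        for term in text.lower().split():
            weights[term] = weights.get(term, 0.0) + field_weight
    return weights

page = {
    "title": "Search engines",
    "body": "search works",
    "caption": "engines diagram",
}
weights = term_weights(page)
```

In this example "search" (title and body) ends up with weight 4.0, while "diagram" (caption only) gets 0.5.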

Modern large IPS must process hundreds of queries per second. Any delay can therefore drive users away and, as a result, make the system unpopular and commercially unsuccessful. Architecturally, such IPS are implemented as distributed computing systems made up of hundreds of computers located around the world. Their search algorithms and program code are optimized extremely thoroughly.

IPS with a large document base use two techniques to speed up their work: separation and pruning.

Separation consists of dividing the database into a part known in advance to be more relevant and a part that is less relevant. The IPS first searches the first part of the database. If no documents, or too few documents, are found there, the search continues in the second part.

With pruning (from the English word for cutting back or removal), processing of the query is terminated automatically once a sufficient number of relevant documents has been found.
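Both techniques can be combined in one short sketch. The function name `tiered_search`, the `matches` predicate and the `wanted` parameter are hypothetical; how a real IPS splits its database or decides that enough documents have been found is not specified in the text.

```python
def tiered_search(primary, secondary, matches, wanted):
    """Search the more relevant part first and stop as soon as enough documents are found.

    primary, secondary: lists of documents (the two parts of the database).
    matches: predicate deciding whether a document satisfies the query.
    wanted: number of results after which processing stops (pruning).
    """
    results = []
    if wanted <= 0:
        return results
    for part in (primary, secondary):  # separation: the second part is a fallback
        for doc in part:
            if matches(doc):
                results.append(doc)
                if len(results) >= wanted:
                    return results  # pruning: enough documents have been found
    return results

primary = ["search engine", "weather"]
secondary = ["search robot", "search index"]
found = tiered_search(primary, secondary, lambda doc: "search" in doc, 2)
```

With `wanted=1` the second part of the database is never touched; with `wanted=2` the search falls through to it because the first part yields only one match.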

Threshold search models are also widely used. They set threshold values for the characteristics of the documents returned to the user. For example, the relevance of returned documents is usually bounded by some threshold value, and only documents whose relevance reaches that threshold are offered to the user.

In the case of ranking search results by date, the threshold values determine the time interval for the date the documents were modified. For example, the IPS can automatically cut off documents that have not changed in the last three years.
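A threshold filter of this kind might look as follows. This is a sketch under stated assumptions: the record fields `relevance` and `modified`, the function name `apply_thresholds`, and the choice of an inclusive comparison are all illustrative and not taken from the source.

```python
from datetime import date, timedelta

def apply_thresholds(results, min_relevance, max_age_days, today):
    """Keep results whose relevance and modification date both pass their thresholds."""
    oldest_allowed = today - timedelta(days=max_age_days)
    return [
        r for r in results
        if r["relevance"] >= min_relevance and r["modified"] >= oldest_allowed
    ]

results = [
    {"id": "a", "relevance": 0.9, "modified": date(2023, 6, 1)},
    {"id": "b", "relevance": 0.2, "modified": date(2023, 6, 1)},  # relevance too low
    {"id": "c", "relevance": 0.8, "modified": date(2019, 1, 1)},  # not changed for years
]
# Roughly three years expressed in days; the reference date is fixed for reproducibility.
kept = apply_thresholds(results, 0.5, 3 * 365, date(2024, 1, 1))
```

Only document "a" passes both the relevance threshold and the three-year date cut-off.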

The main advantage of dictionary-type IPS is its almost complete automation. The system independently analyzes search resources, compiles and stores their descriptions, and searches among these descriptions. The wide coverage of Internet resources is also one of the advantages of such systems. Significant volumes of databases make vocabulary IPS especially useful for exhaustive searches, complex queries, or for localizing obscure information.

At the same time, the huge number of documents in the system database often leads to too many documents being found. This makes it difficult for most users to analyze the information found and makes a quick search impossible. Automatic indexing methods cannot take into account the specifics of individual documents, and the number of non-pertinent documents among those found by such a system is often large.

Another disadvantage of the dictionary IPS is the need to formulate queries to the system in a special language. Although there is a trend towards the convergence of ISL with natural languages, today the user must have certain skills in formulating queries.

KOVROV STATE TECHNOLOGICAL ACADEMY

Information and analytical report on informatics

on the topic: “Modern search engines and development trends of Yandex, one of the market leaders”.

Completed by: 1st year student

3 academic groups

Makarov Ivan

Introduction.

Main part.

Conclusion.

Introduction.

Yandex is a Russian IT company that owns a search engine of the same name on the Web and an Internet portal. The Yandex search engine is the eighth largest search site in the world in terms of the number of processed search queries (1.290 billion, statistics for August 2009) and the second largest non-English search server after the Chinese Baidu.

The company's website opened on September 23, 1997. Yandex was established as a company in 2000; it was founded by CompTek, the company that had developed and supported the Yandex search engine. The company became self-sustaining in 2002. Turnover in 2006 was $72.6 million with a net profit of $29.9 million; in 2005 turnover was $35.6 million with a net profit of $13.6 million.

The company's main priority is the development of the search engine, but over the years Yandex has grown into a multi-service portal. As of 2009, Yandex has more than 30 services. The most popular include Yandex.News, Yandex.Fotki, Yandex.Toys and others.

The main office of the company is located in Moscow. The company has offices in St. Petersburg, Yekaterinburg, Odessa, Simferopol and Kyiv. In mid-June 2008, the company announced the opening of Yandex Labs - an office in the US, California.

Main part.

History of the company.

The Yandex.Ru search engine was officially announced on September 23, 1997 at the Softool exhibition. Its main distinguishing features at the time were checking documents for uniqueness (excluding copies in different encodings) and the key properties of the Yandex search engine: taking into account the morphology of the Russian language (including search by exact word form), search that takes distance into account (including within a paragraph and by exact phrase), and a carefully developed algorithm for assessing relevance (how well the answer corresponds to the query). The algorithm considers not only the number of query words found in the text but also the "contrast" of a word (its relative frequency in the given document), the distance between words, and the position of the word in the document.

A little later, in the section "Tales" (observations on the content of the Russian Internet), the first tale of the Runet appeared - "Web - humanism or chernukha?". And in the "Numbers" section - the first estimate of the volume of the Runet, 5 thousand servers and 4 GB of texts.

Two months later, in November 1997, natural language queries were implemented. From then on, Yandex.Ru could be addressed simply “in Russian”, with long queries such as “where to buy a computer”, “genetically modified products” or “international telephone communication”, and return accurate answers. The average length of a query in Yandex.Ru is now 2.7 words. In 1997, when search engine users were accustomed to a telegraphic style, it was 1.2 words.

In 1998, Yandex.Ru introduced the ability to “find a similar document”, a list of found servers, search within a given date range, and sorting of search results by time of last change. During that year the "volume" of the Russian Internet doubled, which made it necessary to optimize the search. Both then and now (with a volume of 200 GB), a search on Yandex.Ru takes a fraction of a second.

During 1999, the Runet grew by an order of magnitude, both in the volume of texts and in the number of users. It was a year of rapid development for Yandex.Ru as well. The new search robot made it possible to optimize and speed up the bypass of Runet sites. Today, the Yandex.Ru search base is twice as large as that of its closest competitors.

The new robot made it possible to offer users new features: search in different text areas (headings, links, annotations, addresses, picture captions), restricting the search to a group of sites, searching for links and images, and singling out documents in Russian. Search within catalog categories appeared, and for the first time in the Runet the concept of a "citation index" was introduced: the number of resources that link to a given one.

Throughout the year, work continued on the quantitative and qualitative analysis of the Runet. The NINI-index was opened (index "Inconsistency of Interests of the Population of the Internet"), showing the dynamics of changes in the interests of Internet users. A search Forum and a new service have been opened - a subscription to a request, that is, you can leave your request on Yandex.Ru and regularly receive information by e-mail about the appearance of new and / or modified documents corresponding to this request. By the beginning of the school year, the "Family Yandex" was opened, filtering search results from obscene language and pornography.

The origin of the word "Yandex".

Today "Yandex" is a word from the everyday vocabulary of an Internet user. Phrases such as “What, has Yandex been canceled already?”, “Loneliness is when Yandex is the first to congratulate you on your birthday”, and “All questions to Yandex” are common on the Web. Many already think that this has always been the case. In a way, this is true: Yandex really did appear at the same time as the mass Internet, when access to the network ceased to be the preserve of a few technical specialists. But the word "Yandex" itself is artificial and has its own authors and its own history.

In 1993 Arkady Volozh, the future CEO of the future company Yandex, and Ilya Segalovich, the company's future director of technology, developed what later turned out to be its core technology: search over unstructured information that takes the Russian language into account.

The development had to be given a name. Ilya recalls writing down in a column various derivatives of words that described what the technology did. It quickly became clear that the word "search" sounds too dissonant in Russian and that no successful combination could be built on it. The word "index" was a better fit. That is how "yandex" appeared in the list of names: "yet another indexer" (or "Language index"). Both Ilya and Arkady liked this option because it is easy to pronounce and easy to write. Arkady also suggested keeping the first letter of the name as the specifically Russian letter "Я", for clarity. So the word "Yandex" was invented, and the program file was accordingly called yandex.exe.

In 1996, when search was first offered to the general public as a technology rather than as part of a content product (earlier products were the International Classifier of Inventions and the Bible Computer Reference), the line of programs was named Yandex, and the name was explained as Language iNDEX. The first programs in the line were Yandex.Site (search within your own site; this product is now called Yandex.Server) and Yandex.Dict (a morphological add-on for AltaVista, the only search engine that could work with Cyrillic at all at the time).

But, of course, the word "Yandex" has become widespread since September 1997, after the launch of the search engine www.yandex.ru. Since then, users of the system have been offering us their interpretations. For example, Tyoma Lebedev, preparing to draw the first version home page Yandex website, said: “Ah, I understand, if the first “I” in the word index is translated into Russian, it will be “I”, that is, this will be “Yandex”. The authors honestly admitted that they did not think about it, but - a good interpretation, is accepted. Then someone on the Web suggested another option, seeing the two sides of the Internet, INdex and YANDEX. This word has already appeared derivatives, for example, Yandex employees are often called "Yandexoids" and less often - "Yandexians".

Search "Yandex".

Yandex search allows you to search the Runet, Uanet, and Kaznet (since October 14, 2009) for documents in Russian, Ukrainian, Belarusian, Romanian, English, German and French, taking into account the morphology of Russian and English and the proximity of words in a sentence. Since the beginning of 2006, Yandex search has been installed on the Mail.ru portal.

In addition to HTML web pages, Yandex indexes documents in PDF (Adobe Acrobat), Rich Text Format (RTF), the binary Microsoft Word, Microsoft Excel and Microsoft PowerPoint formats, SWF (Macromedia Flash), and RSS (blogs and forums).

A distinctive feature of Yandex is the ability to fine-tune the search query, which is implemented through a flexible query language. For example, the scope of the exclusion operation can be specified: the query A ~~ B finds documents (pages) that contain A but do not contain B, while the query A ~ B finds documents in which the word B does not occur in the same sentence as the word A. Similarly, the & operator looks for combinations of keywords within a sentence, and && within the entire document.

The ! operator disables morphology for a specific word, and the !! operator specifies the normal (dictionary) form, which gets around some problems caused by homonymy. For example, the query !!Ivanov will find Ivanov and Ivanova, but not Ivanovo.

By default, Yandex displays 10 links on each results page; in the search results settings, you can increase the page size to 20, 30, or 50 found documents. Sometimes the order of the sites on these pages may differ, since the databases for these results are not updated at the same time.

If there are a lot of links found for the query, the results page suggests limiting the search range - by region (that is, by IP range) or by date. If nothing is found for any word or words, it is proposed to replace it / them with similar ones (since the proposed options depend on the frequency of finding similar words, funny situations sometimes arise). Also, it is proposed to correct the words typed in the wrong keyboard layout.

From time to time, the Yandex algorithms responsible for the relevance of search results change, which changes the results returned for search queries. The most recent officially announced changes were in March 2004, April 2005 and January 2007; according to unofficial information, there have been many more (for example, the latest in August-September 2007).

In particular, these changes are directed against search spam, which leads to irrelevant results for some queries (less often for entire families of queries). Search spam that is not filtered out automatically is handled by semi-automatic and manual moderation of the search results (with the help of so-called "white hat optimizers") and by outright refusal to index "malicious" sites.

Owners, management and performance indicators.

More than 30% of the company, according to its own data, belongs to the investment funds ru-Net Holdings and Baring Vostok Capital Partners, 15% - to the Tiger Technologies fund, about 30% - to the founders of the company and 20% - to managers and other minority shareholders.

In mid-September 2009, it became known that the parent company of Yandex, the Dutch company Yandex N.V., issued a priority share, which was transferred to Sberbank for a symbolic 1 euro. The only right that a share gives is to veto the sale of more than 25% of the company's shares.

Management: Arkady Volozh - General Director, Ilya Segalovich - Technical Director, Elena Kolmanovskaya - Editor-in-Chief, Alexei Tretyakov - Commercial Director, Svetlana Kondrashova - Advertising Director.

All Yandex services.

Information retrieval:

Search and ya.ru

Directory - a directory of websites sorted by citation index. It is replenished manually by catalog editors, there is a possibility of paid registration.

News - The top news of the day, sourced from the mainstream media featured on the Internet. It is possible to search by news, as well as subscribe to news for a given search query.

Yandex.XML - using this service, you can make automatic search queries to Yandex in xml format.

Search on blogs and forums - search for resources that have an RSS-representation, as well as a rating of current queries, popular categories and news.

Market - search for offers for the sale of goods and services, selection of models.

"Meditative" search is the only search service in the world that has a "Search" button, but no search bar.

Dictionaries - encyclopedias, reference books, translation dictionaries.

Pictures - image search.

Video - video search.

Maps - maps of Europe and Russia, maps of major cities of the Russian Federation (up to the house), search on the map, as well as the ability to "wander" through the streets of some cities.

Addresses - search for contact information by the names of firms and organizations.

Poster - information about available events: cinema, theater, concerts, sports, clubs, etc.

Weather - weather forecast.

TV program - programs of central, regional and satellite TV channels.

Timetables - timetables for trains and planes.

Personalized:

Yandex.Video - video hosting and video search.

Mail - email.

Ya.ru is a blogging service.

Yandex.Fotki - photo hosting.

Spam defense - spam filtering.

People - free hosting for personal web pages, as well as a file storage service.

Yandex.Money - a payment system that allows you to pay for goods and services on the Internet.

Bookmarks - a bookmark storage system integrated with Yandex.Bar.

Subscriptions - subscription to news.

Feed - online RSS reader

Yandex.Direct is a system for placing contextual advertising with pay per click.

The Cup is a regular Internet search competition.

Cities - Internet indices of Russian cities.

Tariff - search by tariffs of Internet providers.

Postcards

Spring - automatic generation of philosophical essays.

Internet - measures the speed of the Internet connection.

Mirror - A mirror of major Linux OS distributions, as well as FreeBSD and other projects.

Yandex.Local network - makes it possible to use all Yandex services at the local rate rather than the federal one.

Metrica - allows you to measure traffic, analyze user behavior and evaluate the effectiveness of advertising campaigns.

Software products:

Spam filter Spamodefense for corporate use (paid).

Yandex Desktop Search - a program for searching files on a computer.

Ya.Online - an instant messaging program based on Jabber. It also delivers notifications about new messages in Yandex.Mail and about new events on the sites Odnoklassniki.ru and VKontakte.

The Punto Switcher program is an automatic layout switcher.

Widgets for operating systems Mac OS X and Windows Vista, as well as for the Opera browser: Search, Traffic, Clock, News.

Yandex ICQ - a special version of the ICQ client with symbols and integration of some services from Yandex.

Interesting facts.

1) The average length of a query in Yandex.Ru now is 2.7 words. In 1997, it was 1.2 words, when search engine users were accustomed to telegraphic style.

2) Yandex appeared before www.yandex.ru. The word "Yandex" was invented in 1993 and first used publicly in 1996, when it referred not to a company or a search engine but to a technology for searching one's own server and a morphological add-on for the Altavista.com search engine.

3) www.yandex.ru was launched to demonstrate the capabilities of the Yandex technology; no one was thinking about making money from advertising.

4) The slogan “There is everything” was invented in 2000. In the same year, Yandex launched the first advertisement for the website on Russian television.

5) According to Yandex itself, about 80 percent of its audience is from Russia, about 3 percent from Europe, and just over 1 percent from the United States.

6) Some of the Yandex technical support staff operates under the collective pseudonym "Platon Shchukin".

Conclusion.

So now we have full information about Yandex. We know who manages it, how it works from the inside, what is the history of the company's development and much more. Now we can easily understand why Yandex is the leader in the Russian and global markets. I think the main reason for the success of Yandex is that the search engine copes well with the complexities of the Russian language. That is why search engines that were developed for English cannot index and rank Russian-language documents as well. The second advantage I see is the creative, friendly, cheerful slogans with which Yandex attracts users to use its services. Thematic pictures that Yandex places near its search line are much more accessible for a Russian user.

Modern search engines are powerful hardware and software systems whose purpose is to index documents on the Internet and to serve data in response to user queries.

To provide quality and relevant information, search engines have to constantly improve their ranking formulas. Ensuring the highest possible quality of search results for users and preventing optimizers from manipulating it are the key goals for the development of search engines.

At a time when search engines were just beginning to appear, their ranking algorithms were very primitive. Thanks to this, the most resourceful optimizers began to promote their sites so that they appear in the search results for the queries they are interested in. As a result, this led to the fact that resources that often did not carry any useful information to the user became the first, thereby relegating more useful sites to the background.

In response, search engines went on the defensive: they improved their ranking algorithms, introduced more and more variables into the formulas, and took ever more factors into account. Over time, this struggle between optimizers and search engines moved to a new level and contributed to the emergence of more advanced algorithms, including ones based on machine learning.

Stages of development of search engines:

As you can see from the diagram, the development of search engines and their algorithms goes in circles. Some create new algorithms, others adapt to them. It is difficult to say whether this process will ever stop, but personally I am inclined to believe that it will not. Despite the fact that search engine ranking algorithms have recently not only changed the significance of various factors, but also changed qualitatively, this does not frighten optimizers: their arsenal is constantly replenished with more and more new techniques.

How often do search engines change their algorithms?

Let's turn to the main search engine of the Runet - Yandex. Qualitative and fundamental changes in the ranking formulas in it occur on average once a year. Not so long ago, Yandex introduced a new search platform called Kaliningrad. Its essence lies in the formation of personal results for each user based on his search history and preferences.

In addition, do not forget that every search engine, including Yandex, constantly "tweaks" its ranking formulas: in automatic or semi-automatic mode, the influence of some factors is reduced while that of others is increased. All this is done with a single goal: to improve the search results as much as possible by ridding them of sites that do not meet users' needs, and thereby to increase their relevance.

Looking at the changes in the Google search engine, you can see that the transformation of the ranking formula also occurs constantly, and Google itself reports hundreds of small changes from year to year. But if we are not talking about the ranking formula, but about the filters that help Google clear the search results from low-quality sites, then new versions of algorithms, such as Panda or Penguin, appear every 3-6 months.

The answer to the above question can be as follows: search engines are constantly improving their ranking algorithms, and cardinal changes occur on average once every 6-12 months.

Which search engine algorithms pose a real threat to promotion?

I would like to answer right away: none. Still, let's work it out. To do so, we need to ask ourselves a question: do search engines make it their goal to prevent search promotion?

I think not. There are several reasons for this:

1. Optimizers help search engines improve their algorithms, which ultimately leads to better search results. After all, if there were no optimizers, then search engines, most likely, would have stopped in their development in the year 2000.

2. Without optimizers, the results for many commercial queries would look like a collection of abstracts and useless informational articles.

If search promotion did not exist in principle, then it would not make sense for search engines to grow and develop as intensively as they do now.

Thus, we come to the following conclusion:

Search engines and SEO are closely and inextricably linked. That is why, if you follow the rules that search engines establish, you need not fear their algorithms at all: search engines do not aim to destroy SEO as such.

Development of search engine services

Speaking of search engines, do not forget that Yandex, Google and Bing have their own services designed to help users. Beyond the search results themselves, search engines have spent years studying the behavior of their users in order to increase satisfaction with those results.

For this purpose, the Yandex search engine came up with the mechanism of so-called "Sorcerers", which help the user get an answer to a question quickly. For example, for the query "weather forecast", Yandex displays the weather for the current date directly on the search results page, saving the user from having to click through the results.

Other search engines, such as Google, went further and instead of "Sorcerers" offered a more interesting solution - the "Knowledge Graph".

The “Knowledge Graph” is the first step on Google's path to intelligent search. Thanks to this innovation, the search engine displays not only standard links in the search results but also direct answers to user questions, a brief reference card about the subject of the query, and related facts. Technically, the “Knowledge Graph” is a semantic network that links together various entities: people, events, spheres of life, things, categories. Its information base draws on a whole range of sources: the open semantic database Freebase, Wikipedia, the CIA's open data collection, and others.

What conclusions can be drawn, you ask?

The answer is simple: search engines and their services will continue to develop towards quick and relevant answers to user questions, making it possible to get all the necessary information directly on the search results page (SERP) and eliminating the need to go to other sites.

There is an opinion that search engines, in their desire to answer the user's question here and now, may destroy search engine optimization by becoming a kind of global knowledge base. Such fears are groundless: to become global knowledge bases, search engines need information, and that information is stored on the very sites that optimizers work on. Optimizers are also part of the reason search engines do not stand still but constantly evolve.

As you can see, SEO and search engines are links in the same chain and cannot exist without each other. Therefore, thoughts about the imminent death of SEO are unfounded. It is quite possible that search engine optimization will evolve over time, for example into consulting, but it certainly will not die. I wish you all a successful promotion to the TOP!

To search the index, the user must formulate a query and send it to the search engine. The query can be very simple, but it must consist of at least one word. To build a more complex query, you use Boolean operators to refine and expand the search conditions.

The most commonly used Boolean operators are:

AND - all expressions connected by the "AND" operator must be present on the searched pages or documents. Some search engines use the "+" operator instead of the AND word.
OR - at least one of the expressions connected by the "OR" operator must be present on the searched pages or documents.
NOT - the expression or expressions following the "NOT" operator must not appear on the searched pages or documents. Some search engines use the "-" operator instead of the word NOT.
FOLLOWED BY - one of the expressions must immediately follow the other.
NEAR - one of the expressions must be at a distance from the other, no more than the specified number of words.
Quotes - Quoted words are treated as a phrase to be found in a document or file.
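
The way AND, OR and NOT combine can be illustrated with a minimal sketch over a tiny in-memory index. The function names (`build_index`, `search_and`, `search_or`, `search_not`) and the three sample documents are illustrative assumptions, not the interface of any real search engine.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_and(index, *words):
    """AND: documents that contain every one of the words."""
    sets = [index.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()


def search_or(index, *words):
    """OR: documents that contain at least one of the words."""
    return set().union(*(index.get(w, set()) for w in words))


def search_not(index, all_ids, word):
    """NOT: documents that do not contain the word."""
    return all_ids - index.get(word, set())


# Hypothetical sample collection, used only for illustration.
docs = {
    1: "flower bed in the garden",
    2: "bed and breakfast hotel",
    3: "garden hotel with flower shop",
}
index = build_index(docs)
all_ids = set(docs)

and_result = search_and(index, "flower", "bed")                         # {1}
or_result = search_or(index, "bed", "hotel")                            # {1, 2, 3}
not_result = search_and(index, "bed") & search_not(index, all_ids, "hotel")  # {1}
```

Each operator reduces to a set operation on the lists of documents stored for each word, which is why this kind of search is fast but purely literal.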

Prospects for the development of search engines

Searches built from Boolean operators are literal: the engine looks for words or phrases exactly as they are entered. This can cause problems when the entered words are ambiguous. For example, the English word "bed" can mean a piece of furniture, a flower bed, a place where fish spawn, and much more. If the user is interested in only one of these meanings, pages that use the word in its other meanings are of no use. It is possible to build a literal query that cuts off the unwanted meanings, but it would be better if the search engine itself could help.

One variant of search is conceptual search. Part of this approach involves statistical analysis of pages containing the words or phrases the user entered, in order to find other pages that might interest that user. Clearly, conceptual search needs to store more information about each page, and each query requires more computation. Many development teams are currently working on improving the results and performance of this type of search engine. Other researchers have focused on a different area: natural-language queries.

The idea behind natural-language queries is that the user formulates the query the same way they would ask a person sitting next to them, without having to keep track of Boolean operators or complex query structures. The most popular natural-language search site today is AskJeeves.com, which analyzes the query to identify keywords and then uses them to search the site index it has built. This site handles only simple queries, but developers are competing intensely to build a natural-language search engine capable of handling very complex queries.

Introduction

3.1 Gopher

3.2 WAIS

3.4 AltaVista

3.5 Opentext

3.6 Infoseek

4. Search robots

5.1 Rambler

5.2 Yandex

5.3 Aport

6.1 Google

6.2 Yahoo

7.1 Baidu search engine

8. Prospects for the development of search engines

Conclusion

Bibliography

Introduction

Every Internet user can find a great deal of diverse and interesting information and make use of the network's rich possibilities. The topic of this essay is highly relevant today because search engines have become indispensable given how often people use the World Wide Web. Internet resources have become an everyday working tool for people in many professions. The rapid growth of information on the web has turned it into an ocean of diverse data whose importance grows in proportion to its volume. According to experts, the volume of information transmitted over the Internet doubles every six months. Every day millions of new documents appear on the web, and without search engines the vast majority of them would remain unclaimed: nobody would find them, and that huge amount of information would be useless. Tools were needed that would make it easy to navigate the information resources of global networks and to find the required information quickly and reliably, and such special search tools now exist on the Internet. A few years ago, a common opinion was that the Internet has everything, but it is impossible to find anything there. With the advent and rapid development of search directories, search engines, and all kinds of search programs, the situation has changed, and urgently needed information can now sometimes be found on the Web faster than in a book lying on the table.

Unfortunately, search engines often fail to accurately and fairly interpret resources. As a result, sites "far" from the issue being solved often appear in the first positions of the search. At the same time, resources that are of real use are "overboard" in the search.


The reason for this situation is simple and lies in the technology search engines use to obtain and present results. Paradoxical as it may seem, this is not the fault of the search engines, since they are obliged to keep the rules for building their search indexes hidden. The fault lies in the way the search technology itself is organized.

A search engine is software that provides access to a collection of semi-structured information. This orientation toward semi-structured data, i.e. data that cannot be represented as a relational table, is what distinguishes a search engine from a DBMS.

In this definition, information means data of various kinds: text, audio, video, images, and so on. However, textual data is the best fit for describing the full functionality of a search engine, because multimedia search algorithms are primarily based on text search algorithms.

The main task of a search engine is to minimize the time the user spends searching for the desired information. The question is, what information does the user want? In some circumstances, relevant information can be defined as all information in the database that is relevant to a query. Traditionally, two main characteristics are applied to a search engine: accuracy (precision) and completeness (recall), or rather, the dependence between them. Each time the user submits a query and thereby initiates a search, all documents in the search engine's collection are divided into four parts: relevant documents that were found, relevant documents that were not found, irrelevant documents that were found, and irrelevant documents that were not found. Accuracy describes one aspect of search, namely how well the search engine returns only information relevant to the given query, and therefore how little time the user wastes on irrelevant results. Completeness describes another aspect: how well the system is able to find all the information relevant to the query. A query is optimal when every document found is relevant and every relevant document is found.
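
Accuracy and completeness, usually called precision and recall, can be computed directly from the set of documents found and the set of documents that are actually relevant. The sketch below is a minimal illustration; the function name and the sample document ids are assumptions made for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents found / all documents found
    recall    = relevant documents found / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: the engine returned documents 1-4,
# while documents 2, 4 and 5 are the ones that are really relevant.
precision, recall = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
# precision is 2/4, recall is 2/3
```

Both values equal 1.0 only in the optimal case, when every document found is relevant and every relevant document is found.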

Search engines play a very important role in the use of the Internet. There is so much information on the Internet that searching it has become a separate task that takes a lot of time. Search engines return thousands of links per query instead of the few pages that really contain the necessary information. Users of the World Wide Web who understand the benefits of analyzing spatial data need a tool that lets them quickly and easily find and access digital terrain images and other spatial information held by many government, commercial, and academic organizations.

1. The history of the development of search engines

One of the first ways to organize access to information resources of the network was the creation of catalogs of sites in which links to resources were grouped according to the subject. The first such project was the Yahoo website, which opened in April 1994. After the number of sites in the Yahoo directory increased significantly, the ability to search for information in the directory was added. This, of course, was not a search engine in the full sense, since the scope of the search was limited only to the resources present in the catalog, and not to all resources on the Internet.

Link directories were widely used in the past but have now practically lost their popularity. The reason is simple: even today's catalogs, which contain a huge number of resources, cover only a very small part of the Internet. The largest directory, DMOZ (the Open Directory Project), contains information about 5 million resources, while Google's search database consists of more than 8 billion documents.

The first full-fledged search engine was the WebCrawler project, which appeared in 1994.

In 1995, the search engines Lycos and AltaVista appeared. The latter has been a leader in the field of information search on the Internet for many years.

In 1997, Sergey Brin and Larry Page created Google, the most popular search engine in the world today.

In September 1997, the Yandex search engine, the most popular in the Russian-speaking part of the Internet, was officially announced.

Currently, there are 3 major international search engines - Google, Yahoo and MSN Search - which have their own databases and search algorithms. Most other search engines (and there are many) use the results of these 3 in one form or another. For example, AOL search (search.aol.com) and Mail.ru use the Google database, while AltaVista, Lycos and AllTheWeb use the Yahoo database.

In Russia, the main search engine is Yandex, followed by Rambler, Google.ru, Aport, Mail.ru and KM.ru.

AltaVista is a search engine. The name "AltaVista" literally translates as "view from above".

Initially, the AltaVista search engine was a true innovator in search technology. AltaVista was created in 1995 at the research laboratory of Digital Equipment Corporation (DEC). Once launched, it quickly gained recognition from users and became the leader among search engines. AltaVista's main merit was its support for many languages, including Chinese, Japanese, and Korean. In 1997, no other search engine on the Web worked with several languages, let alone rare ones.

In 1998, Compaq Computer Corporation bought DEC (along with AltaVista), and in early 1999 AltaVista received the status of an independent division. That same year, Microsoft licensed the AltaVista search engine for use on its MSN site. Many people immediately began to use its ability to index large amounts of information and search huge databases instantly. At the time, the address of the search engine remained the same: altavista.digital.com.

Typing altavista.com into the address bar led to the website of a different company, AltaVista Technology. The popularity of the search engine therefore brought a huge influx of visitors to the AltaVista Technology site and cost the search engine potential users. As a result, Compaq purchased the altavista.com domain for $3.35 million in August 1998 (the largest deal of its kind at that time). Despite this, Compaq never managed to profit from the search engine. In June 1999, negotiations began between Compaq and CMGI Corporation to form a strategic network alliance, under which AltaVista was sold to CMGI. On August 19, 1999, CMGI Corporation announced that it had acquired an 83% stake in AltaVista from Compaq.

In February 2003, AltaVista was acquired by Overture Services, Inc., which was in turn acquired by Yahoo in July 2003. From May 2011, AltaVista switched to Yahoo's search technology.

The AltaVista search engine, on the other hand, sought to become a one-stop portal that included an online store, a radio station, forums, chat rooms, personal photo albums, and more. But because of the huge cash injections this required, competition with other giant portals, and published criticism from those same competitors, the company spent 2001 under the motto of abandoning its claims to portal status and "returning to the roots".

The company changed direction. Now www.altavista.com promotes its search engine to individual Internet users and licenses its search technology to businesses, including for use on intranets. The main source of funding for the consumer version of the AltaVista search engine was advertising revenue, including paid links. For example, the actual search results are now placed after a link for which the owner of the corresponding resource pays AltaVista.

Along with its efforts to become a portal, AltaVista continued to improve its search technology.

Another source of income for AltaVista is the development of internal corporate search engines.

Despite obviously lagging behind its competitors, www.altavista.com is absolutely confident in its abilities. We hope that AltaVista will carry out all its plans and successfully return to its roots. The AltaVista search engine (www.altavista.com) won the hearts of Internet users at an early stage of its existence. Its history is a classic example of good technology combined with vague positioning.

2. How search engines work

Search and structuring tools, sometimes referred to as search engines, are used to help people find the information they need. Search tools such as agents, spiders, crawlers and robots collect information about documents located on the Internet. These are special programs that search for pages on the Web, extract the hypertext links on those pages, and automatically index the information they find to build a database. Each search engine has its own set of rules that determine how documents are found and processed. Some follow every link on every page they find, then examine every link on each of the new pages, and so on. Some ignore links that lead to graphics, sound, and animation files; others ignore links to resources such as WAIS databases; still others are instructed to look at the most popular pages first.

Agents are the most "intelligent" of search tools. They can do more than just search: they can even carry out transactions on your behalf. Already, they can search for specific sites and return lists of sites sorted by their traffic. Agents can process the content of documents, find and index other types of resources, not just pages. They can also be programmed to extract information from pre-existing databases. Whatever information the agents index, they pass it back to the search engine database.

The general search for information on the Web is carried out by programs known as spiders. The spiders report the content of the found document, index it, and extract the resulting information. They also look at the titles, some of the links, and send the indexed information to the search engine's database.

Crawlers look at headers and only return the first link.

Robots can be programmed to follow links to different nesting depths, perform indexing, and even check the links in a document. By their nature they can get stuck in cycles, and they consume significant Web resources when following links. There are, however, methods designed to keep robots away from sites whose owners do not want them to be indexed.

Agents extract and index different kinds of information. Some, for example, index every single word in each document they encounter, while others index only the 100 most important words in each document, along with the document's size and word count, title, headings and subheadings, and so on. The type of index built determines what kind of searches the search engine can perform and how the resulting information will be interpreted.
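
The behavior described here, following links to a limited depth without getting stuck in cycles and skipping pages that owners have excluded, can be sketched on a small in-memory link graph. This is only an illustration under stated assumptions: the `crawl` function, the `web` dictionary and the `disallowed` set are hypothetical, and a real robot would fetch pages over the network and typically read exclusion rules from each site's robots.txt file.

```python
from collections import deque


def crawl(links, start, max_depth, disallowed=frozenset()):
    """Breadth-first traversal of a link graph.

    links      -- dict mapping a page to the pages it links to
    max_depth  -- how many links away from the start page to follow
    disallowed -- pages whose owners do not want them indexed
    """
    visited = []                  # pages indexed, in visiting order
    seen = {start}                # guards against cycles such as a -> b -> a
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if page in disallowed:
            continue              # respect the exclusion, do not index or expand
        visited.append(page)
        if depth == max_depth:
            continue              # depth limit reached, do not follow links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# Hypothetical site structure; "a" and "b" link to each other (a cycle).
web = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["private"],
    "d": ["e"],
    "private": ["secret"],
}
order = crawl(web, "a", max_depth=2, disallowed={"private"})
# order == ["a", "b", "c", "d"]
```

The `seen` set is what prevents the robot from looping forever, and the depth limit bounds how much of the Web a single starting page can pull in.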

Agents can also surf the Internet and find information and then put it into a search engine database. Search engine administrators can determine which sites or types of sites agents should visit and index. The indexed information is sent to the search engine database in the same way as described above.

People can put information directly into the index by filling out a special form for the section in which they would like to put their information. This data is passed to the database.

When someone wants to find information available on the Internet, he visits a search engine page and fills out a form detailing the information he needs. Keywords, dates, and other criteria can be used here. The criteria in the search form must match the criteria used by agents when indexing the information they find while navigating the Web.

The database looks up the subject of the request based on the information provided in the completed form and outputs the corresponding documents prepared by the database. To determine the order in which the list of documents will be shown, the database uses a ranking algorithm. Ideally, the documents most relevant to the user's query will be placed first in the list. Different search engines use different ranking algorithms, but the basic principles for determining relevance are as follows:

The number of query words in the text content of the document (i.e. in the html code).

Tags in which these words are located.

The location of the search words in the document.

The share of words with respect to which relevance is determined in the total number of words in the document.

These principles apply to all search engines. And the ones below are used by some, but quite well-known (like AltaVista, HotBot).

Time - how long the page has been in the search server's database. At first glance, this seems a rather nonsensical principle. But consider how many sites on the Internet live for a month at most. If a site has been around for a long time, its owner is probably experienced in the topic, and a site that has been telling the world about table manners for a couple of years will suit the user better than one on the same topic that appeared a week ago.

Citation index - how many links lead to this page from other pages registered in the search engine database.
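
The principles listed above can be combined into a single relevance score. The sketch below is a toy illustration only: the `relevance` function, its weights (2.0 for title matches, 10.0 for the share of query words, 0.1 per inbound link) and the sample documents are all assumptions made for the example, since real search engines do not publish their ranking formulas.

```python
def relevance(query, title, body, inbound_links=0):
    """Toy relevance score built from the principles listed above."""
    terms = query.lower().split()
    title_words = title.lower().split()
    body_words = body.lower().split()
    if not body_words:
        return 0.0

    # Number of query words in the text of the document.
    count = sum(body_words.count(t) for t in terms)
    # Share of query words in the total number of words.
    share = count / len(body_words)
    # Tags: words found in the title weigh more than words in the body.
    in_title = sum(1 for t in terms if t in title_words)
    # Location: the earlier the first match, the larger the bonus.
    positions = [body_words.index(t) for t in terms if t in body_words]
    early = 1.0 / (1 + min(positions)) if positions else 0.0

    # Citation index: links from other pages add a small bonus.
    return count + 2.0 * in_title + early + 10.0 * share + 0.1 * inbound_links


# Hypothetical documents for the query "table manners".
score_a = relevance("table manners", "table manners",
                    "table manners for formal dinners")
score_b = relevance("table manners", "cooking",
                    "recipes and a note on table setting")
# score_a is larger than score_b, so document A would be listed first.
```

Sorting documents by such a score in descending order gives the ranked list; changing the weights changes which of the principles dominates.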

3. Comparative review of reference and search systems

3.1 Gopher

Gopher was once widespread on the Internet and was the forerunner of the World Wide Web. According to some reports, until 1995 Gopher was the fastest growing technology on the Internet: the number of Gopher servers grew faster than that of all other types of servers. In 1993, there were more than one and a half thousand Gopher servers in the world. In fact, it was a system for distributed search and document delivery at the same time. Moreover, these features were not implemented as add-on services, as in modern search engines, but were built into the system as its basic functions.

With the help of a special program, Veronica, searches were carried out directly in the Gopher system using a special keyword-based query language. This system worked not only long before the advent of ... Gopher (RFC 1436) is a system for searching and delivering documents stored in distributed repositories. The system was developed at the University of Minnesota (Minnesota is known as the Gopher State, and the gopher is the university's mascot). The Gopher program presents the user with a sequence of menus from which they can select a topic or article of interest. The object found can be a text or binary file (in many repositories even text files are stored archived, and therefore in binary form), or a graphic or sound file. Gopher also offers gateways to other search systems such as WWW, WAIS, Archie, and Whois, and to network utilities such as telnet or FTP. Gopher offers more convenient directory browsing than FTP. To access the global network, Gopher uses a client-server model. The Gopher system is now outdated, and many of its servers have been integrated into the Web. But Gopher was the prototype of modern WWW interfaces, and that is what makes it interesting.

3.2 WAIS

WAIS is one of the most sophisticated search engines on the Internet. The only things it does not implement are fuzzy-set search and probabilistic search. Unlike many search engines, the system lets you not only build nested Boolean queries, calculate formal relevance by various proximity measures, and weight query and document terms, but also refine the query using relevance feedback. The system also supports term truncation, splitting documents into fields, and maintaining distributed indexes. It is no coincidence that this particular system was chosen as the main search engine for the Internet edition of the Encyclopaedia Britannica.

The distributed information system WAIS was conceived as a network analogue of traditional information retrieval systems (IRS), allowing network users to search full-text databases using the information retrieval language traditional for IRS, whose queries are built from keywords and/or their truncations, connected by the logical operators OR or AND.

Initially, the WAIS system was developed by four firms: Dow Jones and Co. (business databases), Thinking Machines Corporation (information retrieval systems), Apple Computer (user interface), and KPMG Peat Marwick (work with a large number of users). The first prototype of WAIS was a semi-commercial, semi-research system with severe restrictions on use by both users and database administrators. The WAIS prototype understood natural English quite well and translated it into the system's queries. In reality, WAIS became widely used only with the advent of the FreeWAIS version for UNIX operating systems. Today there is a large number of implementations of WAIS, mainly commercial, and the system has become a kind of standard information retrieval engine on the Internet.

When working with WAIS, users do not have to spend a lot of time to find the materials they need.

There are more than 300 WAIS libraries on the Internet. But since the information is presented mainly by volunteers from academic organizations, most of the material is in the field of research and computer science.

3.3 WWW

WWW is a system for working with hypertext. It is potentially the most powerful search tool. Hypertext connects different documents based on a predefined set of words. For example, when a new word or concept is encountered in a text, a hypertext system makes it possible to navigate to another document in which that word or concept is discussed in more detail. The WWW is often used as an interface to WAIS databases, but the lack of hypertext links there limits the WWW to simple browsing, as in Gopher.

The user, for their part, can use the WWW's hypertext capabilities to create links between their own data and WAIS and WWW data, so that the user's own records appear to be integrated into publicly accessible information. In fact, of course, this does not happen, but it is perceived that way.

3.4 AltaVista

Indexing in this system is carried out using a robot. In this case, the robot has the following priorities:

key phrases at the top of the page;

key phrases by the number of occurrences/presence of words/phrases;

If there are no tags on the page, the robot indexes the first 30 words and shows them in place of a description (the description tag).

The most interesting feature of AltaVista is advanced search. It is worth mentioning right away that, unlike many other systems, AltaVista supports a unary NOT operator. There is also the NEAR operator, which implements contextual search, where the terms must be located close to each other in the text of the document. AltaVista allows searching by key phrases and has a fairly large phrase dictionary. In addition, when searching in AltaVista you can specify the field in which the word should occur: hypertext link, applet, image name, title, and a number of other fields. Unfortunately, the ranking procedure is not described in detail in the system documentation, but it is clear that ranking is applied both in simple search and in advanced queries. In essence, this system can be classified as an extended Boolean search system.
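
A NEAR-style contextual search relies on knowing where each word occurs in the document, not just whether it occurs. The sketch below shows the idea with a per-document positional index; the function names and the `max_distance` parameter are assumptions for illustration and do not reproduce AltaVista's actual implementation or its distance limit.

```python
def positions(text):
    """Map each lowercase word to the list of positions where it occurs."""
    index = {}
    for pos, word in enumerate(text.lower().split()):
        index.setdefault(word, []).append(pos)
    return index


def near(text, first, second, max_distance):
    """True if some occurrence of `first` lies within `max_distance`
    words of some occurrence of `second`."""
    index = positions(text)
    return any(
        abs(p - q) <= max_distance
        for p in index.get(first.lower(), [])
        for q in index.get(second.lower(), [])
    )


# Hypothetical document used only for illustration.
doc = "the search engine builds an index of every page"
close = near(doc, "search", "index", max_distance=4)   # positions 1 and 5
far = near(doc, "search", "page", max_distance=4)      # positions 1 and 8
```

Here `close` is true and `far` is false, which is exactly the distinction a plain AND query cannot make.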

3.5 Opentext

The OpenText information system is the most commercialized information product on the Internet. All of its descriptions read more like advertising than an informative user guide. The system lets you search using logical connectors, but the query size is limited to three terms or phrases; this applies to advanced search. Search results report how well each document matches the query and the size of the document. The system also lets you refine search results in the style of a traditional Boolean search. OpenText could be classified as a traditional information retrieval system if it were not for its ranking mechanism.

3.6 Infoseek

In this system, the index is created by a robot, but it does not index the entire site, but only the specified page. In this case, the robot has the following priorities:

words in the title have the highest priority;</p><p>words in the keywords and description tags, and the frequency of occurrences/repetitions in the text itself;</p><p>a word repeated several times in a row is dropped from the index;</p><p>allows up to 1024 characters for the keywords tag, 200 characters for the description tag;</p><p>if no tags were used, indexes the first 200 words on the page and uses them as a description;</p><p>The Infoseek system has a fairly advanced information retrieval language that allows you not only to indicate which terms should be found in documents, but also to weight them in a particular way. This is achieved with the special signs "+" (the term must be in the document) and "-" (the term must be absent from the document). In addition, Infoseek allows what is called contextual search. This means that, using a special query form, you can require words to occur consecutively. You can also specify that some words should occur together not only in one document, but even in a single paragraph or heading. It is possible to specify key phrases that are treated as a single whole, down to the order of words. Results are ranked by the number of query terms in the document and by the number of query phrases, not counting common words. All of these factors are used as nested procedures. In a nutshell, Infoseek belongs to the <a href="https://polarize.ru/en/office/avtorskie-akusticheskie-sistemy-avtorskaya-akustika-ili/">traditional systems</a> with an element of term weighting when searching.</p><p><b><i>4. Search robots</i> </b> <br></p><p>In recent years <a href="https://polarize.ru/en/internet/internet-vsemirnaya-pautina-eto-edinstvo-informacionnyh-resursov/">the World Wide Web</a> has become so popular that the Internet is now one of the main means of publishing information. 
As the size of the Web grew from a few servers and a small number of documents to an enormous scale, it became clear that manual navigation of much of the hypertext link structure was no longer possible, let alone an <a href="https://polarize.ru/en/mobile/ubiraem-carapiny-s-ekrana-telefona-effektivnye-sposoby-i-metody/">effective method</a> of exploring resources.</p><p>This problem prompted Internet researchers to experiment with automated Web navigation by programs called "robots". A web robot is a program that navigates the hypertext structure of the Web, requests a document, and recursively retrieves all the documents that the document refers to. These programs are also sometimes called "spiders", "wanderers", or "worms". These names are perhaps more attractive but can be misleading, since the terms "spider" and "wanderer" create the false impression that the robot itself moves, and the term "worm" could imply that the robot also reproduces like an Internet worm virus. In reality, robots are implemented as a simple <a href="https://polarize.ru/en/program/otechestvennoe-programmnoe-obespechenie-dlya-kompyutera-rossiiskoe-programmnoe/">software system</a>, which requests information from remote parts of the Internet using standard network protocols.</p><p><b><i>5. The most popular Russian-language reference and search engines on the Internet</i> </b> <br></p><p><b><i>5.1 Rambler</i> </b> <br></p><p>The Rambler search engine was launched in 1996. Today it is one of the most popular in RuNet, second in popularity only to Yandex. According to SpyLog, Rambler accounts for 20-25% of all RuNet search queries.</p><p>The Rambler search engine takes the morphology of the Russian language into account when searching, which provides more opportunities for effective information retrieval. 
A system of so-called "dressings" has also been implemented, which makes it possible to display in the search results not only pages containing the query words, but also pages containing their synonyms. Another function of "dressings", in my opinion the more significant one, is serving <a href="https://polarize.ru/en/photoshop/zarabotok-v-internete-s-momentalnoi-vyplatoi-luchshie-igry-s-vyvodom-deneg/">contextual advertising</a> not only for a specific query, but also for queries closely related to the original one, which makes it possible to reach a <a href="https://polarize.ru/en/game/mobilnyi-brauzer-s-bolshim-kolichestvom-nastroek/">large share</a> of the target audience.</p><p>Rambler is rightfully considered the first major advertising platform of the <a href="https://polarize.ru/en/windows/luchshie-platezhnye-sistemy-elektronnye-dengi-i-kripto-valyuty/">Russian Internet</a> and stands at the origins of the classic network advertising business. <br></p><p><b><i>5.2 Yandex</i> </b> <br></p><p>Today Yandex has the <a href="https://polarize.ru/en/graphics/kak-eksportirovat-dannye-v-mysql-kak-importirovat-bolshuyu-bazu/">largest database</a>, which has a cluster structure and is hosted on several servers.</p><p>In 1996, the company CompTek, created with 100% American participation, officially announced the existence of Yandex at the Internetcom exhibition. It was a morphological add-on for "AltaVista", distinguished by its speed and its ability to build hypotheses. The word index for unfamiliar words is organized in the same way as for dictionary words - this distinguishes Yandex from other search engines.</p><p>In September 1997, "Yandex" became an Internet project. The relevance of documents was calculated from the <a href="https://polarize.ru/en/photoshop/analiz-achh-amplitudno-chastotnaya-harakteristika-rea-i-ee/">frequency characteristics</a> of the search words, the weight of a word or expression, the proximity of the search words to each other in the text of the document, and so on. 
The main innovation of this search engine, which required an inevitable restructuring of the core, is ranking by links. Other innovations relate mainly to the reformulation of user queries by the system: "what is an item" is converted to "item is", and if the query begins with the word "how", the engine first tries to return a FAQ or other reference document. The new "Yandex" began to "understand" alternative vocabulary, which occurs in 5 percent of queries. Only in the latest version of Yandex did the search engine begin to use the citation index directly.</p><p>Currently, Yandex has the most complete database of documents among Russian search engines, as well as the most recognizable brand. <br></p><p><b><i>5.3 Aport</i> </b> <br></p><p>The Aport search engine was first demonstrated in February 1996 at the Agama press conference on the opening of the Russian Club. At that time it searched only the site russia.agama.com. The creator of the system was the company "Agama", a developer of <a href="https://polarize.ru/en/game/harakteristiki-programmnogo-obespecheniya-gis-primery-programm/">software</a> for the Windows platform, whose main product was the Spelling Corrector. The linguistic developments of "Agama" were used to create a search engine in which, unlike "Rambler", the morphology of words was taken into account from the start and the spelling of the query was checked at the user's request.</p><p>The most important features of the first version of "Aport" were the translation of the query and search results into English and back, as well as the reconstruction of all indexed pages from its own database (which means the ability to view pages that no longer exist in the original).</p><p>"Aport 2000" became the first Russian search engine built around presenting results grouped by individual sites. 
To divide resources into sites, "Aport" uses information supplied by the AtRus catalog or entered into "Aport" by the owners of the resources. Failing that, it has to rely on an algorithm that identifies individual sites by formal criteria.</p><p>Users of "Aport" (unlike the regulars of "Yandex") make little use of advanced search: for every 8,000 loads of the simple search page, there are 300 visits to the "Advanced search" page.</p><p><b><i>6. The most popular foreign search engines for a Russian-speaking user</i> </b> <br></p><p><b><i>6.1 Google</i> </b> <br></p><p>The name of the Google search engine is a play on the word "googol". With it, the company wants to emphasize its intention to index and process large amounts of information.</p><p>You can search Google in 10 different languages. You can also switch the interface to the language you need. For example, if you are looking for a German site, you can enter the query in German, and all the interface labels will be in German as well.</p><p>A very handy feature is the "cache". Thanks to it, the user can view an indexed page even if the page has been deleted or the server hosting it is unavailable. You can also use this feature to research your competitors, and it helps you better understand how a page is indexed by the search spider (robot).</p><p>With Google you can find pages that are not contained in its database. This is possible because the search spider indexes the text of links, so a page can be found through the links that point to it. <br></p><p><b><i>6.2 Yahoo</i> </b> <br></p><p>Surprisingly, this incredibly popular system, which serves millions of requests daily, began as a simple collection of bookmarks maintained by just two people, David Filo and Jerry Yang. 
Today, Yahoo is no longer just a directory: it is a whole group of services, including the Yahooligans directory (Yahoo for children), the My Yahoo personal channel system, a free e-mail service, the "Shop with Yahoo" system, MTV unfURLed (a joint project with MTV), and much more. Among all the systems reviewed, Yahoo is the only purely directory-based one; Yahoo does not have its own search engine. But the list of categories on Yahoo is the most complete and the simplest: unlike other directories, on Yahoo it is always easy to determine which section contains the information you need. The Yahoo home page loads very quickly: although it has a lot of links, they are all text. The central part of the page is, of course, occupied by a search box and a list of categories. The graphical links at the top of the page provide access to information such as "what's new", "what's good", and "More Yahoos". The last link is worth visiting: it leads to a page with a huge number of links to various Yahoo directories and services. When setting search criteria on Yahoo, keep in mind that Yahoo looks for the words only in the title and description of a page, because Yahoo has no full-text index. Therefore, you should not specify too many terms or synonyms when searching, or the number of results will decrease or even drop to zero. The number of search results on Yahoo is naturally small, but most of them are relevant. For advanced search, Yahoo offers a small but very useful set of tools. To get to the advanced search page, follow the "options" link on the main Yahoo page.</p><p><b><i>7. 
Search engine market in China</i> </b> <br></p><p><b><i>7.1 Baidu search engine</i> </b> <br></p><p>Baidu was founded in 2000, much later than the world leaders in web search. Nevertheless, it quickly broke into the top ten most visited sites in the world, helped by the rapid growth of the Internet audience in China (360 million users as of January 2010).</p><p>The Baidu.com site is known to all Internet users in China: it is not only the most popular Chinese search engine but also the most visited site in China (according to statistics from Alexa, the Web Information Company, at the beginning of March 2010 Baidu was the 8th most visited site in the world). The Baidu index contains about 800 million web pages (including more than 100 million in Chinese), about 100 million images, and over 15 million media files.</p><p>According to the comScore agency, Baidu processes over 10 billion search queries every month (for comparison, Yandex processes about 3 billion queries per month).</p><p>According to Shanghai-based iResearch, Baidu controls 63% of the Chinese Internet search market (Google is in 2nd place with 33%).</p><p>In addition to its main purpose, search, Baidu provides users with the following services:</p><p>Baidupedia - a free and "correct" encyclopedia;</p><p>Baidu.Posts - numerous forums on various topics;</p><p>Baidu.Space - blog and photo album;</p><p>Baidu.Money - payment system;</p><p>Baidu.Download - its own file-sharing system;</p> <script>document.write("<img style='display:none;' src='//counter.yadro.ru/hit;artfast_after?t44.1;r"+ escape(document.referrer)+((typeof(screen)=="undefined")?"": ";s"+screen.width+"*"+screen.height+"*"+(screen.colorDepth? 
screen.colorDepth:screen.pixelDepth))+";u"+escape(document.URL)+";h"+escape(document.title.substring(0,150))+ ";"+Math.random()+ "border='0' width='1' height='1' loading=lazy loading=lazy>");</script> </article> </div> </article> </div> <footer class="footer"> <div class="footer__wrapper"> <div class="footer__box"> <p class="footer__copyright">© 2022. Hardware and software setup</p> </div> </div> </footer> </body> </html>