How does Google decide what comes up first?


Sergey Brin and Lawrence Page, of Stanford’s Computer Science Department, present Google as a prototype of a large-scale search engine that makes heavy use of the structure present in hypertext, most notably through PageRank. Google is designed to crawl and index the Web efficiently and to return better-sorted, more satisfying results to the user.

Its aim is to produce better results for every search. The authors also analyze how to build a practical large-scale system that can exploit the additional information present in hypertext, which is far less well defined than structured data. This matters because the Web is a vast collection of uncontrolled hypertext documents, where anyone can publish anything without permission.

Web search engine – Google 

The founders chose the name Google for a search engine intended to overcome these challenges and to perform as the best search engine in the world. As the amount of information on the Internet grows, and as more new users arrive on the Web, the Web creates new challenges for information retrieval.

Search engines that rely on automated keyword matching often return too many low-quality matches, which wastes users’ time. Even worse, some search systems let advertisers attract people’s attention by placing their products at the top of the search results in exchange for payment. That is why Brin and Page developed an efficient, scalable system to deal with the problems of contemporary search engines.

Google makes use of the structure present in hypertext to produce better results.

The main aim of Google is to address problems of both efficiency and flexibility. Because of rapid technological progress and the proliferation of the Web, a search system has to improve its capacity dramatically to keep up with the pace of the Web. The authors illustrate the growth in inexperienced users and new Web documents with a series of statistics.

Google: scaling with the Web

The authors describe the performance challenges their search engine faces and how their design overcomes them. For instance, fast crawling requires the system to gather all relevant pages quickly and keep them in storage. That storage must be large and efficient enough to hold the indexes, and the indexing system must process all the data as quickly as possible. The key is speed.

The further problem is staying fast as the Web grows. Google is designed to scale to very large data sets, taking into account both the growth of the Web and technological changes in underlying factors such as disk seek time and processing speed. It makes efficient use of storage space to hold its data, and its data structures are optimized for fast and efficient access. Furthermore, the lower the cost the better, where cost means the time and storage space needed to process data and store indexes. The more the authors reduce this cost, the better Google scales.
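The indexing idea above can be illustrated with a toy inverted index, which maps each word to the documents that contain it so that a query only touches the relevant entries. This is a minimal in-memory sketch under simplifying assumptions (whitespace tokenization, a Python dict standing in for disk storage); the function names `build_inverted_index` and `search` are hypothetical, and this is not the index format Google actually uses.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the sorted list of document IDs containing it.

    `docs` is a dict of {doc_id: text}. Tokenization is a naive
    lowercase whitespace split, an assumption made for illustration.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    # Sorted posting lists keep lookups and merges predictable.
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query):
    """Return the IDs of documents that contain every query word."""
    postings = [set(index.get(word, [])) for word in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


docs = {
    1: "Stanford web search engine",
    2: "large scale web crawler",
    3: "search engine ranking with PageRank",
}
index = build_inverted_index(docs)
print(search(index, "search engine"))  # [1, 3]
```

Only the posting lists for "search" and "engine" are read to answer the query, which is why an index like this stays fast as the collection grows.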

Design goals 

One of the most important goals is to improve search quality, since contemporary search engines did not perform as well as users expected. Some people argue that the completeness of the index is the only factor that determines the quality of a search engine, because with a complete index anything can be found on the Web.

However, the article makes clear that other factors determine whether a search engine presents the best results. For instance, if junk results crowd out what users are actually looking for, users will gradually lose interest in the product. Since a query can match thousands of documents, users have neither the time nor the energy to look through all of them and usually check only the first page of results. Therefore, any company that wants to build a high-quality search engine has to meet users’ needs and improve the precision of its results, especially among the top results.
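Precision over the first page of results can be made concrete with the common precision-at-k measure: the fraction of the top k results that are relevant. The sketch below is a generic illustration, not a metric taken from the article; the function name, the ranked result list, and the relevance judgments are all made up.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranked results that are relevant.

    `ranked` lists document IDs in ranked order; `relevant` is the set
    of IDs a human judged relevant (hypothetical data here).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    return hits / k


ranked = list("abcdefghijkl")   # 12 results, best first
relevant = {"a", "c", "k"}      # "k" is relevant but only appears on page two
print(precision_at_k(ranked, relevant))  # 0.2
```

A relevant result that falls outside the first page, like `k` here, does nothing for this score, which mirrors the article’s point that users rarely look past the first page.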

Another main goal in designing Google was to build an environment where researchers, or even students, can propose and run interesting experiments on Google’s large-scale Web data.

Such experiments would otherwise be difficult to carry out. The goal also includes building an architecture that supports novel research activities. With the boom in technology, the Web has shifted over time from being largely academic to being more commercial. As a result, little technical information about how search engines are built has been made public, and search engine development has focused more on advertising and profit than on the foundations of search.

System features

The Google search engine has two important features that help it produce high-precision results. First, it makes heavy use of the link structure of the Web to calculate a quality ranking for each Web page. This ranking is called PageRank [1], and it saves users a great deal of time in finding what they need. Second, Google uses the text of links (anchor text) to further improve the quality of its search results.

The link structure of the Web is an important resource that existing search engines have largely left unused. Recognizing its value, the authors created maps containing as many as 518 million hyperlinks, a significant sample of the total. These maps allow rapid calculation of a page’s PageRank, an objective measure of its citation importance that corresponds well with people’s subjective idea of importance.


Because of this correspondence, PageRank is an excellent way to prioritize the results of Web keyword searches. For most popular subjects, even a simple text-matching search restricted to page titles performs very well when PageRank orders the results. For full-text searches, as in the main Google system, PageRank also helps a great deal.
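A rough sketch of the title-search idea: keep only pages whose titles contain every query word, then order them by PageRank. The page IDs, titles, and scores below are made up for illustration and are assumed to have been computed beforehand; this is not Google’s actual ranking function, which combines many more signals.

```python
# Hypothetical pages: page ID -> (title, precomputed PageRank score).
# The scores are invented numbers, not real PageRank values.
PAGES = {
    "a.example/ml": ("Machine learning course notes", 2.4),
    "b.example/ml": ("Machine learning FAQ", 0.3),
    "c.example/db": ("Database systems", 1.8),
    "d.example/ml": ("Intro to machine learning", 5.1),
}


def title_search(query, pages=PAGES):
    """Return pages whose title contains every query word, highest PageRank first."""
    words = query.lower().split()
    matches = [
        (page_id, score)
        for page_id, (title, score) in pages.items()
        if all(word in title.lower().split() for word in words)
    ]
    matches.sort(key=lambda item: item[1], reverse=True)
    return [page_id for page_id, _ in matches]


print(title_search("machine learning"))
# ['d.example/ml', 'a.example/ml', 'b.example/ml']
```

Text matching decides which pages qualify, and PageRank alone decides their order.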

How does PageRank work? 

Academic citation analysis has been applied to the Web, largely by counting citations, or backlinks, to a given page. This gives an approximation of a page’s importance or quality. PageRank extends this idea by not counting links from all pages equally and by normalizing by the number of links on a page. It can be calculated with a simple iterative algorithm and corresponds to the principal eigenvector of the normalized link matrix of the Web. PageRank is defined as follows:
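The difference between plain backlink counting and normalizing by the number of outgoing links can be seen on a small hypothetical link map. This sketch implements only that one idea: it ignores the importance of the linking page itself, which the full PageRank formula adds. The page names and both helper functions are invented for illustration.

```python
from collections import defaultdict

# Hypothetical link map as (source, target) pairs.
LINKS = [
    ("dir", "x"), ("dir", "a"), ("dir", "b"), ("dir", "c"),
    ("list", "x"), ("list", "a"), ("list", "b"), ("list", "c"),
    ("fan", "y"),
]


def backlink_counts(links):
    """Plain citation counting: every incoming link counts as 1."""
    counts = defaultdict(int)
    for _, target in links:
        counts[target] += 1
    return dict(counts)


def normalized_votes(links):
    """Each source splits one vote evenly across its C(source) outgoing links."""
    out_degree = defaultdict(int)
    for source, _ in links:
        out_degree[source] += 1
    votes = defaultdict(float)
    for source, target in links:
        votes[target] += 1.0 / out_degree[source]
    return dict(votes)


counts = backlink_counts(LINKS)
votes = normalized_votes(LINKS)
print(counts["x"], counts["y"])  # 2 1
print(votes["x"], votes["y"])    # 0.5 1.0
```

Under raw counting `x` (two citations) beats `y` (one citation); once each linking page’s vote is split across its links, `y`’s single citation from a page that links nowhere else outweighs `x`’s two diluted ones.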

Assume page A has pages T1, …, Tn that point to it (that is, cite it). The parameter d is a damping factor that can be set between 0 and 1; it is usually set to 0.85. C(A) is defined as the number of links going out of page A.

The PageRank of page A is then given by:

$$PR(A) = (1 - d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right)$$
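As a sketch of the simple iterative algorithm mentioned above, the following Python code repeatedly applies the formula until the scores stop changing. It assumes the link graph is a dict mapping each page to the pages it links to; `pagerank` and its parameters are hypothetical names. Pages without outgoing links are left as they are, so this follows the formula literally rather than any particular production treatment of such pages.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    `links` maps each page to the list of pages it links to. Following the
    formula literally, the scores sum to the number of pages (when every
    page has an outgoing link) rather than to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    # Backlinks: for each page A, the pages T1..Tn that point to it.
    incoming = {p: [] for p in pages}
    for source, targets in links.items():
        for target in targets:
            incoming[target].append(source)
    out_count = {p: len(links.get(p, [])) for p in pages}  # C(T)

    pr = {p: 1.0 for p in pages}  # starting guess; the iteration converges from any start
    for _ in range(max_iter):
        new_pr = {
            a: (1 - d) + d * sum(pr[t] / out_count[t] for t in incoming[a])
            for a in pages
        }
        delta = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if delta < tol:
            break
    return pr


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(graph)
for page in sorted(scores):
    print(page, round(scores[page], 3))
# A 1.163
# B 0.644
# C 1.192
```

For this three-page graph, C collects links from both A and B and ends up with the highest score, while B, which receives only half of A’s vote, ends up lowest. Because every page here has an outgoing link, the scores sum to 3, the number of pages.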

To understand how PageRank works, imagine a model of user behavior: a “random surfer” who starts on a random page and keeps clicking on links, never hitting the back button, until eventually getting bored and starting again on another random page. The probability that the random surfer visits a page is its PageRank. The damping factor d captures the surfer’s boredom: in the formula above, the surfer follows a link from the current page with probability d and jumps to a random page with probability 1 − d. One important variation is to direct the random jump only to a single page or a group of pages. This allows for personalization and can make it nearly impossible to deliberately mislead the system in order to obtain a higher ranking.
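The random surfer model can be checked by simulation. The sketch below assumes, consistent with the formula above, that at each step the surfer follows a random outgoing link with probability d and otherwise jumps to a page chosen uniformly at random; on a page with no links it always jumps. The function name, seed, and step count are arbitrary choices for illustration.

```python
import random


def random_surfer(links, d=0.85, steps=200_000, seed=0):
    """Estimate how often a random surfer visits each page.

    With probability d the surfer follows a random outgoing link; otherwise
    (or on a page with no links) it jumps to a uniformly random page. When
    every page has outgoing links, the visit frequencies approximate
    PR(A) / N for the PageRank formula, where N is the number of pages.
    """
    rng = random.Random(seed)
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        out = links.get(current, [])
        if out and rng.random() < d:
            current = rng.choice(out)   # click a link on the current page
        else:
            current = rng.choice(pages)  # bored: start over on a random page
    return {p: count / steps for p, count in visits.items()}


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
freq = random_surfer(graph)
n = len(freq)
for page in sorted(freq):
    print(page, round(freq[page] * n, 2))
```

For this small graph, multiplying the visit frequencies by the number of pages should give values close to the fixed point of the formula (about 1.16 for A, 0.64 for B, and 1.19 for C); more steps give a closer match.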

Another intuitive justification is that a page can have a high PageRank if many pages point to it, or if some pages with a high PageRank point to it. Pages that are well cited from many places around the Web are worth looking at, and so are pages with perhaps only a single citation from something like the Yahoo! homepage: if a page were low quality or a broken link, Yahoo’s homepage would probably not link to it. PageRank handles both cases, and everything in between, by recursively propagating weights through the link structure of the Web.
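The Yahoo! example can be reproduced on a toy graph. In the sketch below, the hypothetical page `portal` stands in for a well-linked homepage: it receives links from five pages and links only to `x`, while `y` is cited by two obscure pages. The compact `pagerank` helper simply iterates the PageRank formula a fixed number of times; all names are illustrative.

```python
def pagerank(links, d=0.85, iterations=100):
    """PR(A) = (1 - d) + d * sum(PR(T) / C(T)), iterated a fixed number of times."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        # The comprehension reads the previous `pr` before it is replaced.
        pr = {
            a: (1 - d) + d * sum(pr[t] / len(links[t]) for t in links if a in links[t])
            for a in pages
        }
    return pr


links = {f"p{i}": ["portal"] for i in range(1, 6)}  # five pages cite the portal
links["portal"] = ["x"]                             # the portal cites only x
links["q1"] = ["y"]                                 # two obscure pages cite y
links["q2"] = ["y"]

scores = pagerank(links)
print(round(scores["x"], 3), round(scores["y"], 3))  # 0.819 0.405
```

Page `x` has only one backlink but outranks `y`, which has two, because its single citation comes from a page with a comparatively high PageRank.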

Google anatomy 

In this section, we give a high-level overview of how the whole Google system works, as shown in Figure 1. Most of Google is implemented in C or C++ and runs on either Linux or Solaris. Google’s data structures are optimized so that large document collections can be crawled, indexed, and searched with little cost in time and space. One of the major challenges is the disk seek: even though hardware keeps improving, a disk seek still takes a relatively long time to complete. The authors therefore designed Google to avoid disk seeks whenever possible, which had a considerable influence on the design of its data structures.

Figure 1: High-level Google architecture

Crawling the Web 

Web crawling (downloading pages so that their data can be extracted) is performed by crawler programs. The goal is to fetch Web pages and store their HTML on disk so they can be indexed. Running a Web crawler is a challenging task. There are tricky performance and reliability issues and, more importantly, social issues. Crawling is the most fragile application because it involves interacting with hundreds of thousands of Web servers and various name servers, all of which are beyond the control of the system.
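The step of extracting data from a downloaded page can be sketched with Python’s standard `html.parser` module. This example only collects each link’s absolute URL together with its anchor text, the link structure and link text the article says Google relies on; fetching pages, politeness, and storage are left out, and the `LinkExtractor` class is a hypothetical helper, not part of Google’s crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect (absolute URL, anchor text) pairs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # href of the <a> tag we are currently inside, if any
        self._text = []    # text fragments seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, anchor))
            self._href = None


html = """
<html><body>
  <p>See the <a href="/papers/anatomy.html">Google paper</a> and
  <a href="http://example.org/">another <b>site</b></a>.</p>
</body></html>
"""
parser = LinkExtractor("http://example.com/index.html")
parser.feed(html)
print(parser.links)
# [('http://example.com/papers/anatomy.html', 'Google paper'), ('http://example.org/', 'another site')]
```

The extracted pairs are enough to build both a link map for PageRank and an index of anchor text for the pages being pointed to.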

The authors accomplished this challenging task with a crawler that connects to more than half a million servers run by many different webmasters. As a result, crawling involves a great deal of contact with people. Almost every day the authors receive an email from a webmaster remarking that the crawler looked at a lot of pages on their site and asking how the authors liked it. Other messages raise copyright concerns, and obscure errors may appear on just a few pages of the whole Web. Since large, complex systems such as crawlers inevitably cause problems, significant time and resources must be devoted to reading email and solving these problems as they come up.

High quality searching 

The biggest concern for users is which search engine delivers the highest-quality results. While results from existing engines are sometimes amusing, they are often frustrating and waste precious time. Google is therefore designed to provide the best matches and the most helpful information users need. As the Web grows at an unprecedented rate, finding relevant information places ever higher demands on the search engine.

Google aims to accomplish this task, and is capable of doing so, by making heavy use of hypertextual information consisting of link structure and link (anchor) text. Google also uses font information and the proximity of words in a document. Together these techniques improve the precision of its results without sacrificing speed: PageRank allows Google to rank the quality of Web pages, while font and proximity information help filter out irrelevant results.
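Proximity can be illustrated with one simple measure: the length of the shortest stretch of a document that contains every query word. This is a generic sliding-window sketch, not the proximity scoring described in the paper; the function name and example documents are made up. A ranker might reward shorter spans, for example by favoring a document where "bill" and "clinton" appear side by side over one where they are several words apart.

```python
def min_span(words, query_terms):
    """Length of the shortest window of `words` containing every query term.

    Returns None if the query is empty or some term never occurs.
    """
    needed = set(query_terms)
    if not needed:
        return None
    counts = {}  # occurrences of each query term inside the current window
    best = None
    left = 0
    for right, word in enumerate(words):
        if word in needed:
            counts[word] = counts.get(word, 0) + 1
        # Shrink from the left while the window still covers every term.
        while len(counts) == len(needed):
            span = right - left + 1
            if best is None or span < best:
                best = span
            left_word = words[left]
            if left_word in counts:
                counts[left_word] -= 1
                if counts[left_word] == 0:
                    del counts[left_word]
            left += 1
    return best


print(min_span("bill clinton visited stanford".split(), ["bill", "clinton"]))  # 2
print(min_span("bill gates met hillary clinton".split(), ["bill", "clinton"]))  # 5
```

Each word enters and leaves the window at most once, so the scan is linear in the document length.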

Conclusion 

In conclusion, Google aims to be the world’s greatest search engine and to provide the highest-quality search results in a world of misinformation and disinformation. Google employs a number of techniques to improve search quality, including PageRank, anchor text, and proximity information. Lastly, Google is a revolutionary product: a complete architecture for gathering Web pages, indexing them, and performing accurate searches over them.


By Hong Cao

References

[1] Sergey Brin and Lawrence Page. “The anatomy of a large-scale hypertextual Web search engine”. In: Computer Networks and ISDN Systems 30 (1998). url: http://snap.stanford.edu/class/cs224w-readings/Brin98Anatomy.pdf.
