In-depth guide to how Google Search works
Google Search is a fully-automated search engine that uses software known as web crawlers to explore the web regularly and find pages to add to the Google index. In fact, the vast majority of pages listed in Google Search results aren't manually submitted, but are found and added automatically when our web crawlers explore the web. This document explains the stages of how Search works in the context of your website. Having this base knowledge can help you fix crawling issues, get your pages indexed, and learn how to optimize how your site appears in Google Search results.
A few notes before we get started
Before we get into the details of how Search works, it's important to note that Google doesn't accept payment to crawl a site more frequently or rank it higher. If anyone tells you otherwise, they're wrong.
Google doesn't guarantee that it will crawl, index, or serve your page, even if your page follows the Google Search Essentials.
Introducing the three stages of Google Search
Google Search works in three stages, and not all pages make it through each stage:

- Crawling: Google downloads text, images, and videos from pages it found on the internet with automated programs called crawlers.
- Indexing: Google analyzes the text, images, and video files on the page, and stores the information in the Google index, a large database.
- Serving search results: When a user searches on Google, Google returns information that's relevant to the user's query.
Crawling
The first stage is finding out what pages exist on the web. There isn't a central registry of all web pages, so Google must constantly look for new and updated pages and add them to its list of known pages. This process is called "URL discovery". Some pages are known because Google has already visited them. Other pages are discovered when Google extracts a link from a known page to a new page: for example, a hub page, such as a category page, links to a new blog post. Still other pages are discovered when you submit a list of pages (a sitemap) for Google to crawl.
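A sitemap is simply an XML list of page URLs. As a minimal sketch (the site and URLs below are hypothetical examples), Python's standard library can generate one in the sitemaps.org format:

```python
# Sketch: building a minimal sitemap that lists pages for crawlers to discover.
# The URLs are hypothetical examples, not real pages.
import xml.etree.ElementTree as ET

def build_sitemap(urls):
    """Build a sitemap XML document from a list of page URLs."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for page_url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = page_url
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap([
    "https://example.com/",
    "https://example.com/blog/new-post",
])
print(sitemap)
```

The resulting file is typically served at the site root so crawlers can fetch it along with robots.txt.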
Once Google discovers a page's URL, it may visit (or "crawl") the page to find out what's on it. We use a huge set of computers to crawl billions of pages on the web. The program that does the fetching is called Googlebot (also known as a crawler, robot, bot, or spider). Googlebot uses an algorithmic process to determine which sites to crawl, how often, and how many pages to fetch from each site. Google's crawlers are also programmed to avoid crawling a site too fast, so that the site doesn't receive too many requests. This mechanism is based on the responses of the site (for example, HTTP 500 errors mean "slow down").
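The "slow down on errors" behavior can be illustrated with a toy sketch. This is not Googlebot's actual logic; `fetch` is a hypothetical stand-in for an HTTP client, and the back-off policy is a deliberately simple exponential one:

```python
# Toy sketch of a polite crawler that backs off when the server signals overload.
# `fetch` is a hypothetical callable returning an HTTP status code.
def crawl(urls, fetch, base_delay=1.0, max_delay=60.0):
    """Fetch each URL; return the delay (seconds) to wait after each request."""
    delay = base_delay
    delays = []
    for url in urls:
        status = fetch(url)
        if status == 500:                      # server error: "slow down"
            delay = min(delay * 2, max_delay)
        else:                                  # healthy response: recover gradually
            delay = max(base_delay, delay / 2)
        delays.append(delay)
    return delays

# Three consecutive 500s double the wait each time: 2.0, 4.0, 8.0 seconds.
print(crawl(["a", "b", "c"], lambda url: 500))
```

A real crawler would sleep for the returned delay between requests; the sketch just records it so the policy is easy to inspect.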
However, Googlebot doesn't crawl all the pages it discovers. Some pages may be disallowed for crawling by the site owner, and other pages may not be accessible without logging in to the site.
During the crawl, Google renders the page and runs any JavaScript it finds using a recent version of Chrome, similar to how your browser renders pages you visit. Rendering is important because websites often rely on JavaScript to bring content to the page, and without rendering, Google might not see that content.
Crawling depends on whether Google's crawlers can access the site. Some common issues with Googlebot accessing sites include:

- Problems with the server handling the site
- Network issues
- robots.txt rules preventing Googlebot's access to the page
Indexing
After a page is crawled, Google tries to understand what the page is about. This stage is called indexing, and it includes processing and analyzing the textual content and key content tags and attributes, such as <title> elements and alt attributes, images, videos, and more.
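As a rough illustration of this kind of processing (a minimal sketch, not Google's actual pipeline), an HTML parser can pull out the title text and image alt attributes that indexing analyzes:

```python
# Sketch: extracting <title> text and img alt attributes from a page's HTML.
from html.parser import HTMLParser

class PageSignals(HTMLParser):
    """Collect the page title and image alt text, two inputs indexing looks at."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.alts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alts.append(alt)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

html = ('<html><head><title>Bike repair tips</title></head>'
        '<body><img src="wrench.jpg" alt="A wrench"></body></html>')
signals = PageSignals()
signals.feed(html)
print(signals.title)  # Bike repair tips
print(signals.alts)   # ['A wrench']
```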
During the indexing process, Google determines whether a page is a duplicate of another page on the internet or the canonical page.
The canonical is the page that may be shown in search results. To select the canonical, we first group together (also known as clustering) the pages we found on the internet that have similar content, and then select the one that's most representative of the group. The other pages in the group are alternate versions that may be served in different contexts, for example when the user is searching from a mobile device or looking for a very specific page from that cluster.
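The clustering idea can be sketched very loosely. This is an illustrative toy, not Google's actual duplicate detection: it groups pages whose normalized text matches exactly, then picks one representative per group (here, arbitrarily, the shortest URL):

```python
# Toy sketch of canonical selection: cluster pages by normalized content,
# then pick one representative URL per cluster. Not Google's real algorithm.
from collections import defaultdict

def pick_canonicals(pages):
    """pages: dict of {url: content}. Returns {normalized_content: canonical_url}."""
    clusters = defaultdict(list)
    for url, content in pages.items():
        key = " ".join(content.lower().split())  # crude normalization
        clusters[key].append(url)
    return {key: min(urls, key=len) for key, urls in clusters.items()}

pages = {
    "https://example.com/post": "Bike repair tips",
    "https://example.com/post?ref=feed": "Bike  repair tips",
    "https://example.com/about": "About us",
}
print(pick_canonicals(pages))
```

The two near-duplicate URLs fall into one cluster, and the tracking-parameter variant survives only as an alternate version, mirroring the grouping described above.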
Google also collects signals about the canonical page and its contents, which may be used in the next stage, where we serve the page in search results. Some signals include the language of the page, the country the content is local to, and the usability of the page.
The collected information about the canonical page and its cluster may be stored in the Google index, a large database hosted on thousands of computers. Indexing isn't guaranteed; not every page that Google processes will be indexed.
Indexing also depends on the content of the page and its metadata. Some common indexing issues can include:

- The quality of the content on the page is low
- Robots meta rules disallow indexing
- The design of the website might make indexing difficult
Serving search results
When a user enters a query, our machines search the index for matching pages and return the results we believe are the highest quality and most relevant to the user's query. Relevancy is determined by hundreds of factors, which could include information such as the user's location, language, and device (desktop or phone). For example, searching for "bicycle repair shops" would show different results to a user in Paris than it would to a user in Hong Kong.
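As a deliberately simplified illustration of this (real ranking uses hundreds of signals; this toy scores only query-term overlap plus a location match, and every page, URL, and field in it is hypothetical):

```python
# Toy relevance sketch: rank indexed pages by query-term overlap, with a
# small boost for pages local to the searcher. Not Google's ranking.
def score(page, query_terms, user_country):
    words = set(page["text"].lower().split())
    overlap = len(words & set(query_terms))
    local_boost = 1 if page.get("country") == user_country else 0
    return overlap + local_boost

def serve(index, query, user_country):
    terms = query.lower().split()
    ranked = sorted(index, key=lambda p: score(p, terms, user_country), reverse=True)
    return [p["url"] for p in ranked]

index = [
    {"url": "https://example.fr/velo", "text": "bicycle repair shop in Paris",
     "country": "FR"},
    {"url": "https://example.hk/bike", "text": "bicycle repair shop in Hong Kong",
     "country": "HK"},
]
# A searcher in France sees the Paris page first; a searcher in Hong Kong
# would see the Hong Kong page first, with the same query.
print(serve(index, "bicycle repair shops", "FR"))
```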
Based on the user's query, the search features that appear on the search results page also change. For example, searching for "bicycle repair shops" will likely show local results and no image results; however, searching for "modern bicycle" is more likely to show image results, but not local results. You can explore the most common UI elements of Google web search in our Visual Element gallery.
Search Console might tell you that a page is indexed, but you don't see it in search results. This might be because:

- The content on the page is irrelevant to users' queries
- The quality of the content is low
- Robots meta rules prevent serving
While this guide explains how Search works, we are always working on improving our algorithms. You can keep track of these changes by following the Google Search Central blog.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated (UTC): 2025-08-04.