Analyzing “How Google Search Works” Changes from Google

Google has made some substantial new changes to their “How Google Search Works” documentation for website owners. And as always when Google changes important documents that affect SEO, such as How Search Works and the Quality Rater Guidelines, there are some key insights SEOs can glean from the changes.

 

Of particular note: Google now details how it views a “document” as potentially comprising more than one webpage, explains what it considers primary and secondary crawls, and updates its reference to “more than 200 ranking factors,” which has been present in this document since 2013.

Here are the changes and what they mean for SEOs.


Contents

· 1 Crawling
· 1.1 Improving Your Crawling
· 2 The Long Version
· 3 Crawling
· 3.1 How does Google find a page?
· 3.2 Improving Your Crawling
· 4 Indexing
· 4.1 Improving your Indexing
· 4.1.1 What is a document?
· 5 Serving Results
· 6 Final Thoughts


-------------------------------------------------------------

Crawling

Google has greatly expanded this section.

They made a slight change to wording: “some pages are known because Google has already crawled them before” became “some pages are known because Google has already visited them before.” This is a fairly minor change, made primarily because Google decided to include an expanded section detailing what crawling actually is.


Google removed:


This process of discovery is called crawling.


The crawling definition was removed simply because it was redundant. In the expanded crawling section, Google included a much more detailed definition and description of crawling instead.


The added definition:


Once Google discovers a page URL, it visits, or crawls, the page to find out what’s on it. Google renders the page and analyzes both the text and non-text content and overall visual layout to decide where it should appear in Search results. The better that Google can understand your site, the better we can match it to people who are looking for your content.


There is still a great debate about how much page layout is taken into account. The page layout algorithm, released many years ago, penalized pages that pushed content well below the fold to increase the odds that a visitor would click an advertisement above the fold instead. But with more traffic moving to mobile, and the addition of mobile-first indexing, the above-and-below-the-fold distinction seemingly became less important.


When it comes to page layout and mobile first, Google says:


Don’t let ads harm your mobile page ranking. Follow the Better Ads Standard when displaying ads on mobile devices. For example, ads at the top of the page can take up too much room on a mobile device, which is a bad user experience.


But in How Google Search Works, Google specifically ties the “overall visual layout” to “where it should appear in Search results.”


It also brings attention to “non-text” content. While the most obvious example is image content, the reference is quite open-ended. Could it refer to OCR as well, which we know Google has been dabbling in?


Improving Your Crawling


Google has significantly expanded the “to improve your site crawling” section as well.


Google has added this point:


Verify that Google can reach the pages on your site, and that they look correct. Google accesses the web as an anonymous user (a user with no passwords or information). Google should also be able to see all the images and other elements of the page to be able to understand it correctly. You can do a quick check by typing your page URL in the Mobile-Friendly test tool.


This is a good point. Many new site owners end up accidentally blocking Googlebot from crawling, or don’t realize their site is set to be viewable only by logged-in users. This makes it clear that site owners should try viewing their site while logged out, to see if there are any unexpected accessibility or other issues that aren’t noticeable when logged in as an admin or high-level user.


Recommending that site owners check their site with the Mobile-Friendly testing tool is also good, since even seasoned SEOs use the tool to quickly see whether there are Googlebot-specific issues with how Google is able to see, render and crawl a specific webpage, or a competitor’s page.


Google expanded its note about submitting a single page to the index.


If you’ve created or updated a single page, you can submit an individual URL to Google. To tell Google about many new or updated pages at once, use a sitemap.


Previously, it just mentioned submitting changes to a single page using the submit URL tool. This clarifies for those newer to SEO that they do not need to submit every new or updated page to Google individually, and that a sitemap is the best way to do that. There have definitely been new site owners who add each page to Google using that tool because they don’t realize sitemaps are a thing. Part of the problem is that WordPress is such a prevalent way to create a new website, yet it does not have native support for sitemaps (yet), so site owners need to either install a dedicated sitemap plugin or use one of the many SEO plugins that offer sitemaps as a feature.
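For site owners without a sitemap plugin, a sitemap is just a small XML file listing URLs. The following is a minimal sketch in Python of building one with the standard library. The build_sitemap helper and the example.com URLs and dates are made up for illustration; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return sitemap XML for a list of (loc, lastmod) pairs.

    Hypothetical helper for illustration; lastmod may be None.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

# Example pages (assumed, not taken from any real site).
sitemap_xml = build_sitemap([
    ("https://example.com/", "2020-01-15"),
    ("https://example.com/new-page", None),
])
print(sitemap_xml)
```

The resulting file would typically be saved as sitemap.xml at the site root and submitted once, rather than submitting each URL individually.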


This change also highlights using the tool for newly created pages, instead of just the previous reference to “changes to a single page.”


Google has also changed the “if you ask Google to crawl only one page” section. It now references what Google views as a “small site”: according to Google, a smaller site is one with fewer than 1,000 pages.


Google also stresses the importance of a strong navigation structure, even for sites it considers “small.” It says site owners of small sites can just submit their homepage to Google, “provided that Google can reach all your other pages by following a path of links that start from your homepage.”


With so many sites on WordPress, it is less likely that there will be random orphaned pages that can’t be reached by following links from the homepage. But depending on the specific WordPress theme used, pages can sometimes be orphaned when they are created but not manually added to the pages menu. In these cases, if a sitemap is used as well, those pages shouldn’t be missed even if they are not linked from the homepage.
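The “path of links that start from your homepage” condition is easy to check if you have a map of your internal links. Here is a rough sketch in Python that walks a link graph breadth-first from the homepage and reports pages that are never reached. The site_links data and the reachable_from helper are hypothetical; in practice the graph would come from your own crawl or CMS export.

```python
from collections import deque

def reachable_from(start, links):
    """Return every page reachable from start by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical internal link graph: page -> pages it links to.
site_links = {
    "/": ["/about", "/blog"],
    "/about": [],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/"],
    "/landing-old": ["/"],  # links out, but nothing links to it
}

# Pages that exist but cannot be reached from the homepage.
orphans = sorted(set(site_links) - reachable_from("/", site_links))
print(orphans)
```

Any page this reports is one that only a sitemap or an individual URL submission would surface.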


In the “get your page linked to by another page” section, Google has added that links in “advertisements, links that you pay for in other sites, links in comments, or other links that don’t follow the Google Webmaster Guidelines won’t be followed by Google.” A small change, but Google is making it clear that not following these links is Google-specific behavior, and that other search engines might follow them.


But perhaps the most telling part comes at the end of the crawling section, where Google adds:


Google doesn’t accept payment to crawl a site more frequently, or rank it higher. If anyone tells you otherwise, they’re wrong.


Scammy SEO companies have long been an issue: guaranteeing first position on Google, promising to increase rankings, or requiring payment to submit a site to Google. And with the ambiguous Google Partner badge for AdWords, many use the badge to imply they are certified by Google for SEO and organic ranking purposes. That said, most of those reading How Search Works are probably already aware of this. But it is nice to see Google put this in writing again, for times when SEOs need to prove to clients that there is no “pay to win” option outside of AdWords, or simply to show someone who might be falling for a scammy SEO company’s claims about Google rankings.


The Long Version


Google then gets into what it calls the “long version” of How Google Search Works, with more detail on the sections above, covering more nuances that impact SEO.


Crawling


Google has changed how it refers to the “algorithmic process.” Previously, the document stated “Googlebot uses an algorithmic process: computer programs determine which sites to crawl, how often and how many pages to fetch from each site.” Curiously, Google removed the reference to “computer programs,” which had provoked the question of exactly which computer programs Google was using.


The new updated version simply states:


Googlebot uses an algorithmic process to determine which sites to crawl, how often, and how many pages to fetch from each site.


Google also updated the wording for the crawl process, changing “augmented with sitemap data” to “augmented by sitemap data.”


Google also changed the wording that Googlebot “detects” links to “finds” links, and changed Googlebot visiting “each of these websites” to the much more specific “page.” This second change is more accurate and specific for webmasters, since Google visiting a website doesn’t necessarily mean it crawls all links on all pages.


Previously it read:


As Googlebot visits each of these websites it detects links on each page and adds them to its list of pages to crawl.


Now it reads:


When Googlebot visits a page it finds links on the page and adds them to its list of pages to crawl.
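To make the “finds links and adds them to its list” step concrete, here is a toy sketch in Python of that loop for a single page. It is only an illustration of the general idea, not how Googlebot is implemented; the LinkFinder class, find_links helper, and sample HTML are all made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkFinder(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

def find_links(base_url, html):
    finder = LinkFinder(base_url)
    finder.feed(html)
    return finder.links

# Hypothetical page content.
page_html = (
    '<a href="/pricing">Pricing</a> '
    '<a href="https://example.com/">Home</a> '
    '<a href="blog/post-1">Post</a>'
)

crawl_list = ["https://example.com/"]
for link in find_links("https://example.com/", page_html):
    if link not in crawl_list:  # skip pages already queued
        crawl_list.append(link)
print(crawl_list)
```

Note that the unit of work here is a single page, which matches the more precise wording Google switched to.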


Google has added a new section about using Chrome to crawl:

During the crawl, Google renders the page using a recent version of Chrome. As part of the rendering process, it runs any page scripts it finds. If your site uses dynamically-generated content, be sure that you follow the JavaScript SEO basics.


By referencing a recent version of Chrome, this addition reflects the change from last year, when Googlebot was finally upgraded to the latest version of Chromium for crawling after years of crawling only with Chrome 41.


Google also notes it runs “any page scripts it finds,” and advises site owners to be aware of possible crawl issues that result from dynamically generated content that relies on JavaScript, specifying that site owners should ensure they follow Google’s JavaScript SEO basics.


Google also details the primary and secondary crawls, a topic that has caused much confusion since Google first revealed them. The details in this How Google Search Works document differ from how some SEOs previously interpreted the terms.


Here is the entire new section for primary and secondary crawls:


Primary crawl / secondary crawl


Google uses two different crawlers for crawling websites: a mobile crawler and a desktop crawler. Each crawler type simulates a user visiting your page with a device of that type.


Google uses one crawler type (mobile or desktop) as the primary crawler for your site. All pages on your site that are crawled by Google are crawled using the primary crawler. The primary crawler for all new websites is the mobile crawler.


In addition, Google recrawls a few pages on your site with the other crawler type (mobile or desktop). This is called the secondary crawl, and is done to see how well your site works with the other device type.


In this section, Google refers to primary and secondary crawls as being specific to its two crawlers: the mobile crawler and the desktop crawler. Many SEOs think of primary and secondary crawling in terms of Googlebot making two passes over a page, with JavaScript rendered on the secondary crawl. So while Google clarifies its use of desktop and mobile Googlebots, the language here does cause confusion for those who use these terms to refer to the two passes for JavaScript purposes. To be clear, Google’s reference to primary and secondary crawls has nothing to do with JavaScript rendering, but only with how it uses both mobile and desktop Googlebots to crawl and check a page.


What Google is clarifying in this specific reference to primary and secondary crawls is that it uses two crawlers, the mobile and desktop versions of Googlebot, and will crawl sites using a combination of both.


Google did specifically state in its “Mobile-First Indexing Best Practices” document, as of July 2019, that new websites are crawled with the mobile crawler. But this is the first time it has appeared in the How Google Search Works document.


Google does go into more detail about how it uses both the desktop and mobile Googlebots, particularly for sites that Google currently considers mobile first. It wasn’t clear how much Google was checking the desktop versions of mobile-first sites, and some have tried to take advantage of this by presenting a spammier version to desktop users, or in some cases completely different content. But Google is confirming it still checks the alternate version of the page with its crawlers.


So sites that are mobile first will see some of their pages crawled with the desktop crawler. However, it still isn’t clear how Google handles cases where the two versions are vastly different, especially when this is done for spam reasons. There doesn’t seem to be any penalty for doing so, aside from a possible spam manual action if the site is checked or a spam report is submitted. This would have been a perfect opportunity to be clearer about how Google handles pages whose content differs vastly depending on whether it is viewed on desktop or on mobile. Even in the mobile-friendly documents, Google only warns about ranking differences if content is on the desktop version of the page but missing from the mobile version.


How does Google find a page?


Google has removed this section entirely from the new version of the document.


Here is what was included in it:


How does Google find a page?


Google uses many techniques to find a page, including:

· Following links from other sites or pages

· Reading sitemaps


It isn’t clear why Google removed this specifically. It was slightly redundant, and it was also missing the option of submitting a URL.


Improving Your Crawling


Google makes the use of hreflang a bit clearer by providing a bit more detail, especially for those who might just be learning what hreflang is and how it works.


Formerly it said “Use hreflang to point to alternate language pages.” Now it states “Use hreflang to point to alternate versions of your page in other languages.”


Not a huge change, but a bit clearer.
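For those still learning hreflang, the markup itself is a set of link elements in the page head, one per language version. Below is a small sketch in Python that generates them. The hreflang_tags helper, language codes, and example.com URLs are invented for illustration; the rel="alternate" hreflang attribute pattern is the standard one.

```python
from html import escape

def hreflang_tags(alternates):
    """Build one <link rel="alternate"> tag per language version.

    alternates maps a language code to the URL of that version.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in alternates.items()
    ]

# Hypothetical language versions of one page.
tags = hreflang_tags({
    "en": "https://example.com/en/page",
    "de": "https://example.com/de/page",
})
print("\n".join(tags))
```

Each language version of the page would carry the same full set of tags, so the versions reference each other.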


Google has also added two new points, providing more detail about ensuring Googlebot is able to access all the content on the page, not just the text content (words).


First, Google added:


Be sure that Google can access the key pages, and also the important resources (images, CSS files, scripts) needed to render the page properly.


So Google is stressing the importance of ensuring Google can access all the important content. It is also specifically calling attention to the other types of elements on the page that Google needs access to in order to properly crawl the page, including images, CSS and scripts. Webmasters who went through the whole “mobile first indexing” launch are fairly familiar with issues surrounding blocked files, especially CSS and scripts, which some CMSs blocked Googlebot from crawling by default.


But newer site owners might not realize this is possible, or that they might be doing it. It would have been nice to see Google add specific information on how those newer to SEO can check for this, particularly for those who might not be clear on what exactly “rendering” means.
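One quick way to spot accidentally blocked resources is to test your robots.txt rules against the URLs of your CSS, script and image files. The sketch below uses Python’s standard urllib.robotparser. The robots.txt rules and resource URLs are made up, and Python’s parser is simpler than Google’s own robots.txt matching (for example, it does not treat wildcards the same way), so take it as a rough first check rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a script directory.
robots_txt = """\
User-agent: *
Disallow: /wp-includes/
Disallow: /assets/js/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Resources the page needs in order to render (assumed URLs).
resources = [
    "https://example.com/index.html",
    "https://example.com/assets/js/app.js",
    "https://example.com/assets/css/site.css",
]

# Anything listed here is disallowed for the "Googlebot" user agent.
blocked = [u for u in resources if not parser.can_fetch("Googlebot", u)]
print(blocked)
```

A blocked script or stylesheet in this list would be a candidate for the rendering problems described above.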


Google also added:


Confirm that Google can access and render your page properly by running the URL Inspection tool on the live page.


Here Google does add specific information about using the URL Inspection tool to see what site owners are blocking, or what content causes issues when Google tries to render it. I think these last two new points could have been combined, and made slightly clearer about how site owners can use the tool to check for all these issues.


Indexing


Google has made significant changes to this section as well, starting with major changes to the first paragraph. Here is the original version:


Googlebot processes each of the pages it crawls in order to compile a massive index of all the words it sees and their location on each page. In addition, we process information included in key content tags and attributes, such as <title> tags and alt attributes.


The updated version now reads:


Googlebot processes each page it crawls in order to understand the content of the page. This includes processing the textual content, key content tags and attributes, such as <title> tags and alt attributes, images, videos, and more.


Google no longer states it processes pages to “compile a massive index of all the words it sees and their location on each page.” This was always a curious way to describe it, suggesting Google simply indexes every word it comes across and its position on a page, when in reality it is a lot more complex than that. So the change definitely clears that up.


They have also added that they are processing “textual content,” which basically calls attention to the fact that Google indexes the words on the page, something everyone assumed. But it does differentiate text from the new addition later in the paragraph regarding images, videos and more.


Previously, Google simply made reference to title tags and alt attributes. Now it is getting more granular, specifically referring to “images, videos, and more.” This does mean Google is considering images, videos and “more” to understand the content on the page, which could affect rankings.


Improving your Indexing


Google changed “read our SEO guide for more tips” to “Read our basic SEO guide and advanced user guide for more tips.”


What is a document?


Google has added a massive section here called “What is a document?” It talks specifically about how Google determines what a document is, and also includes details about how Google views multiple pages with identical content as a single document, even with different URLs, and how it determines canonicals.


First, here is the first part of this new section:

What is a “document”?


Internally, Google represents the web as an (enormous) set of documents. Each document represents one or more web pages. These pages are either identical or very similar, but are essentially the same content, reachable by different URLs. The different URLs in a document can lead to exactly the same page (for instance, example.com/dresses/summer/1234 and example.com?product=1234 might show the same page), or the same page with small variations intended for users on different devices (for example, example.com/mypage for desktop users and m.example.com/mypage for mobile users).


Google chooses one of the URLs in a document and defines it as the document’s canonical URL. The document’s canonical URL is the one that Google crawls and indexes most often; the other URLs are considered duplicates or alternates, and may occasionally be crawled, or served according to the user request: for instance, if a document’s canonical URL is the mobile URL, Google will still probably serve the desktop (alternate) URL for users searching on desktop.


Most reports in Search Console attribute data to the document’s canonical URL. Some tools (such as the Inspect URL tool) support testing alternate URLs, but inspecting the canonical URL should provide information about the alternate URLs as well.


You can tell Google which URL you prefer to be canonical, but Google may choose a different canonical for various reasons.


So the tl;dr is that Google will view pages with identical or near-identical content as the same document, regardless of how many of them there are. Seasoned SEOs know this as internal duplicate content.
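As a way to picture the “document” concept, here is a simplified sketch in Python that groups URLs whose content is the same after basic normalization. This is only an illustration and is not Google’s method: it catches exact duplicates only, whereas near-identical pages would need fuzzier matching. The group_into_documents helper and the page contents are hypothetical; the first two URLs are the examples from Google’s text.

```python
import hashlib
from collections import defaultdict

def group_into_documents(pages):
    """Group URLs whose normalized content is identical.

    pages maps URL -> page text. Returns a list of URL groups,
    each group standing in for one "document".
    """
    documents = defaultdict(list)
    for url, body in pages.items():
        # Collapse whitespace and case so trivial differences don't matter.
        normalized = " ".join(body.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        documents[digest].append(url)
    return list(documents.values())

# Hypothetical page contents.
pages = {
    "https://example.com/dresses/summer/1234": "Summer dress, blue",
    "https://example.com?product=1234": "Summer  dress,  BLUE",
    "https://example.com/about": "About us",
}

docs = group_into_documents(pages)
print(docs)
```

In this picture, choosing a canonical amounts to picking one URL from each group to represent it.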


Google also states that once it identifies these duplicates, they may not be crawled as often. This is important to note for site owners who are working to de-duplicate content that Google considers duplicate. It becomes more important to submit these URLs for recrawling, or to link the newly de-duplicated pages from the homepage, to ensure Google recrawls and indexes the new content and de-dupes the pages properly.


It also brings up an important note about desktop versus mobile: Google will still likely serve the desktop version of a page instead of the mobile version to desktop users when a site has two different URLs for the same page, where one is designed for mobile users and the other for desktop. While many websites have changed to serving the same URL and content for both using responsive design, some sites still run two completely different sites and URLs for desktop and mobile users.


Google also mentions that you can tell Google the URL you prefer it to use as the canonical, but states it can choose a different URL “for various reasons.” While Google doesn’t detail why it might choose a different canonical than the one the site owner specifies, it is usually due to http vs https, whether a page is included in a sitemap, page quality, the pages appearing to be completely different and therefore not suitable for canonicalization, or significant incoming links to the non-canonical URL.


Google has also included definitions for many of the terms used by SEOs and in Google Search Console.


Document: A collection of similar pages. Has a canonical URL, and possibly alternate URLs, if your site has duplicate pages. URLs in the document can be from the same or different organization (the root domain, for example “google” in www.google.com). Google chooses the best URL to show in Search results according to the platform (mobile/desktop), user language‡ or location, and many other variables. Google discovers related pages on your site by organic crawling, or by site-implemented features such as redirects or <link rel=alternate/canonical> tags. Related pages on other organizations can only be marked as alternates if explicitly coded by your site (through redirects or link tags).


Again, Google is talking about the fact that a single document can encompass more than a single URL, as Google will consider a single document to potentially include many duplicate or near-duplicate pages as well as pages assigned via canonical. Google makes specific mention of “alternates” that appear on other sites, which can only be considered alternates if the site owner specifically codes them that way. And Google will choose the best URL from within that collection of pages to show.


But it fails to mention that Google can also treat pages on other sites as duplicates and will not show those duplicates, even when they aren’t from the same site. Site owners see this happen frequently when someone steals content, and sometimes the stolen version ranks over the original.


A footnote was added to the above, dealing with hreflang.


‡Pages with the same content in different languages are stored in different documents that reference each other using hreflang tags; this is why it’s important to use hreflang tags for translated content.


Google shows that it doesn’t group identical content under the same “document” when it is simply in a different language, which is interesting. But Google is stressing the importance of using hreflang in these cases.


URL: The URL used to reach a given piece of content on a site. The site might resolve different URLs to the same page.


Pretty self-explanatory, although it does reference the fact that different URLs can resolve to the same page, presumably through redirects or aliases.


Page: A given web page, reached by one or more URLs. There can be different versions of a page, depending on the user’s platform (mobile, desktop, tablet, and so on).


Also pretty self-explanatory, noting that users can be served different versions of the same page, such as when they view it on a mobile device versus a desktop computer.


Version: One variation of the page, typically categorized as “mobile,” “desktop,” and “AMP” (although AMP can itself have mobile and desktop versions). Each version can have a different URL (example.com vs m.example.com) or the same URL (if your site uses dynamic serving or responsive web design, the same URL can show different versions of the same page) depending on your site configuration. Language variations are not considered different versions, but different documents.


This simply clarifies in greater detail the different versions of a page, and how Google typically categorizes them as “mobile,” “desktop,” and “AMP”.


Canonical page or URL: The URL that Google considers as most representative of the document. Google always crawls this URL; duplicate URLs in the document are occasionally crawled as well.


Google states here again that non-canonical pages are not crawled as often as the main canonical URL that a site owner assigns to a group of pages.  Google does not specifically mention here that it sometimes chooses a different page as the canonical, even when the site owner has designated a specific page as the canonical one.


Alternate/duplicate page or URL: The document URL that Google might occasionally crawl. Google also serves these URLs if they are appropriate to the user and request (for example, an alternate URL for desktop users will be served for desktop requests rather than a canonical mobile URL).


The key takeaway here is that Google “might” occasionally crawl a site’s duplicate or alternate pages.  Google also stresses that it will serve these alternate URLs “if they are appropriate.”  It is unfortunate that Google doesn’t go into greater detail on why it might serve these pages instead of the canonical, beyond the mention of desktop versus mobile, as we have seen many cases where Google picks a page other than the canonical to show, for a myriad of reasons.


Google also fails to mention how this impacts duplicate content found on other sites, but we do know Google will crawl those less often as well.
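
The canonical and alternate definitions quoted above can be pictured with a small data model. This is only a toy sketch for reasoning about the terminology, not a description of how Google's systems actually work: the Document class, its field names, and the example.com URLs are all invented for this example, and it covers only the one case Google spells out (a desktop alternate served in place of a mobile canonical).

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Toy model of a 'document': one canonical URL plus its alternates."""

    canonical: str                      # URL treated as most representative
    canonical_version: str              # e.g. "mobile"
    alternates: dict[str, str] = field(default_factory=dict)  # version -> URL

    def url_to_serve(self, requested_version: str) -> str:
        """Pick the alternate matching the request, else fall back to the canonical."""
        if requested_version == self.canonical_version:
            return self.canonical
        return self.alternates.get(requested_version, self.canonical)


# Hypothetical site with a mobile canonical and a desktop alternate.
doc = Document(
    canonical="https://m.example.com/page",
    canonical_version="mobile",
    alternates={"desktop": "https://example.com/page"},
)

print(doc.url_to_serve("desktop"))  # the desktop alternate
print(doc.url_to_serve("mobile"))   # the canonical itself
print(doc.url_to_serve("amp"))      # no AMP alternate, so the canonical
```

The model deliberately leaves out everything Google does not explain, such as the cases where a page other than the designated canonical is chosen.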


Site: Usually used as a synonym for a website (a conceptually related set of web pages), but sometimes used as a synonym for a Search Console property, although a property can actually be defined as only part of a site. A site can span subdomains (and even domains, for properly linked AMP pages).


It is interesting to note here what Google considers a website, a conceptually related set of web pages, and how that relates to the usage of a Google Search Console property, as “a property can actually be defined as only part of a site.”


Google does mention that AMP pages, which technically appear on a different domain, are considered part of the main site.


Serving Results


Google has made a pretty interesting and specific change here with regard to its ranking factors.  Previously, Google stated:


Relevancy is determined by over 200 factors, and we always work on improving our algorithm.


Google has now replaced this “over 200 factors” wording with a less specific one.


Relevancy is determined by hundreds of factors, and we always work on improving our algorithm.


The “200 factors” reference in How Google Search Works dates back to 2013, when the document was launched. At that time it also made reference to PageRank (“Relevancy is determined by over 200 factors, one of which is the PageRank for a given page”), which Google removed when it redesigned the document in 2018.


While Google no longer goes into specifics on the number, it can be assumed that a significant number of ranking factors have been added since 2013, when the figure was first stated in this document.  But I am sure some SEOs will be disappointed that we don’t get a shiny new number, like “over 500” ranking factors, to obsess about.


Final Thoughts


There are some pretty significant changes in this document that SEOs can gain a bit of insight from.


Google’s description of what it considers a document, and how that relates to other identical or near-identical pages on a site, is interesting, as is Google’s crawling behavior towards the pages within a document that it considers alternates.  While this behavior has often been observed, this is more concrete information on how site owners should handle duplicate and near-duplicate pages, particularly when they are trying to un-duplicate those pages and see them crawled and indexed as their own documents.


Google added a lot of useful advice for newer site owners, which is particularly helpful with so many new websites coming online this year due to the global pandemic.  This includes things such as checking a site without being logged in and how to submit both pages and sites to Google.


The mention of what Google considers a “small site” is interesting because it gives a more concrete reference point for how Google sees large versus small sites.  For some, a small site could mean under 30 pages, with the idea of a site with millions of pages being unfathomable.  And the reinforcement of strong navigation, even for “small sites,” is useful to show to site owners and clients who might push for navigation that is more aesthetic than practical for both usability and SEO.


The primary and secondary crawl additions will probably cause some confusion for those who think of primary and secondary in terms of how Google processes scripts on a page when it crawls it.  But it is nice to have more concrete information on how and when Google will crawl with the alternate version of Googlebot on sites that are usually crawled with either the mobile Googlebot or the desktop one.


Lastly, the change from “200 ranking factors” to a less specific, but presumably much higher, number of ranking factors will disappoint some SEOs who liked having a specific number of potential ranking factors to work out.


Source: Jennifer Slegg


