Google Search Engine Problems

With the recent explosion in the number of websites, search engines such as Google return an ever-increasing number of hits.

The problem is that the majority of the top-ranked websites end up being useless to the person seeking relevant information.

One side of irrelevance, which Google does handle, is the question of a website’s importance, which partly determines its ranking. In its algorithm, Google considers, in a weighted manner, an index of the websites that link to the website in question.

The other side of irrelevance is more subtle: besides the words included in the search, there is the issue of context, which is known only to the individual performing the search, so Google has no way of guessing it.

For example, when the user types Canon A470, he may want:

  • To buy the camera in a store
  • To find the lowest price
  • To see reviews written by experts
  • To read readers’ opinions about the camera
  • To understand how the camera works
  • Etc.

These purposes cannot be properly resolved by simply adding more words to the search. For example, e-commerce websites may include review sections (most often with no review posted), or a site may offer some product advice, but not expert advice, etc.

However, in the life cycle of a camera, there is only one moment in which a person is buying the camera; at all other moments its owner is using the camera, not buying it. Continuously trying to sell a camera to someone who is not looking to buy one is like selling movie tickets to someone already watching the movie.

Furthermore, for most people, thinking of additional words to refine a search is a difficult process. Very few people have the ability, experience and persistence to succeed consistently in complex searches.

In many cases, it is not possible to navigate websites precisely and quickly using words alone, in a way that contextualizes the search well enough to serve the user.

The consequence is that many users give up and end up looking at one of their favorite websites, postponing the search, or even taking a different course of action.

Irrelevance increases further with the use of SEO (Search Engine Optimization) techniques, which tend to improve the rank of websites that have the resources and the desire to use them.


Proposal: Topic Browsing in Google Search

Contrary to what it may seem at first, this idea is not a repeat of anything that has existed before.

The Google Directory was discontinued in July 2011. The directory model based on volunteer work, created in 1998, is doomed to fail: it cannot maintain a standard of quality, consistency and continuity for its users now that there are hundreds of millions of content items, including sites, blogs and social networks. Today this directory (the Open Directory) can be accessed at Dmoz.

Another model is the directory maintained by the search engine's own staff. This model is what built Yahoo!'s reputation from 1994 onward. However, it is too expensive a mechanism: a minimum quality standard cannot be guaranteed without the cost of continuous updating becoming prohibitive. The Yahoo! Directory still exists today, but it has been off Yahoo!'s homepage since 2002.

The proposal here is a search engine using words, like Google, combined with a web directory or directory search. However, it is totally different from previous efforts because the classification will be maintained not by a small group of people, but by millions of content owners. This proposal can be divided into several steps:

1 – Google creates a Classification Tree

Google must create a classification tree for the websites in the style of Google Directory. This topic or theme tree, if so desired, can be somewhat simpler, in order to facilitate classification for those responsible for each of the websites.

The major themes will appear in the first level (e-commerce, references, education, etc.). Each theme opens up to sub-themes, and so forth, until reaching the leaf classification, assumed to be the most detailed level of classification desired.

It is not necessary that the tree have the same number of levels in each of its “branches”.

The same final classification may appear at more than one point of the tree, as already happens in Google Directory and the like.
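
As a rough illustration of the tree described above, here is a minimal sketch in Python. The class name TopicNode, its methods and the sample themes are all hypothetical; the text does not prescribe any particular data structure. It only shows that branches can have different depths and that the same leaf name can be repeated under different parents.

```python
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    """One theme in the classification tree (hypothetical structure)."""
    name: str
    children: dict[str, "TopicNode"] = field(default_factory=dict)

    def add_path(self, *names: str) -> "TopicNode":
        """Create any missing nodes along the path and return the last one."""
        node = self
        for name in names:
            node = node.children.setdefault(name, TopicNode(name))
        return node

    def leaf_paths(self, prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
        """Return the full path of every leaf below this node."""
        if not self.children:
            return [prefix]
        paths: list[tuple[str, ...]] = []
        for child in self.children.values():
            paths.extend(child.leaf_paths(prefix + (child.name,)))
        return paths


# Illustrative themes only; the real themes would be chosen by Google.
root = TopicNode("root")
root.add_path("Reference", "Languages", "Dictionary")  # three levels deep
root.add_path("Education", "Dictionary")               # same leaf name, replicated
root.add_path("E-commerce")                            # a branch may stop early

for path in root.leaf_paths():
    print(" > ".join(path))
```

A path such as Reference > Languages > Dictionary is what a site owner would pick in the next step.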

2 – Google develops a cloud Classification Application

Google should develop a simple application that allows the person responsible for a website or piece of content to classify it under one theme from the classification tree. (For example, Dictionary within Reference => Languages.)

If a site addresses more than one theme in different sections, other registrations may be made, provided there is some type of separation of URL addresses (sub-sites). This is the case, for example, for any blog hosted on a generic domain.

This process is technically done by incorporating a specific meta tag provided by Google that uniquely identifies the site or sub-site that is doing the classification.

To prevent the information from being cloned by a competitor (such as Bing), the meta tag is encoded and does not contain the theme. The code should be stored on Google's servers, along with the classification assigned by the user.

This procedure practically guarantees the authenticity of the classification, because it can only be done by someone who has the authority to update the site or sub-site. Subsequently, if the meta tag is removed from the content, there is no problem, because certification of authenticity already happened upon registration. Only the person responsible for the content will be able to change the classification of the website.
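
The registration flow above could look roughly like the following sketch. Everything named here is an assumption made for illustration: the meta tag name topic-classification-id, the ClassificationRegistry class and the in-memory dictionaries that stand in for Google's servers. The point it tries to show is that the tag carries only an opaque random code, while the theme stays on the server side.

```python
import secrets
from html.parser import HTMLParser

META_NAME = "topic-classification-id"  # hypothetical meta tag name


class MetaTokenFinder(HTMLParser):
    """Collects the content of every meta tag named META_NAME."""

    def __init__(self) -> None:
        super().__init__()
        self.tokens: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == META_NAME:
            self.tokens.append(attributes.get("content") or "")


class ClassificationRegistry:
    """In-memory stand-in for the server-side store (an assumption)."""

    def __init__(self) -> None:
        # opaque code -> (site or sub-site prefix, chosen theme)
        self._pending: dict[str, tuple[str, tuple[str, ...]]] = {}
        # site or sub-site prefix -> confirmed theme
        self.confirmed: dict[str, tuple[str, ...]] = {}

    def register(self, site_prefix: str, theme: tuple[str, ...]) -> str:
        """Store the theme and hand back a random code for the meta tag."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = (site_prefix, theme)
        return token

    def confirm(self, site_prefix: str, page_html: str) -> bool:
        """Confirm the classification if the page carries a matching code."""
        finder = MetaTokenFinder()
        finder.feed(page_html)
        for token in finder.tokens:
            pending = self._pending.get(token)
            if pending is not None and pending[0] == site_prefix:
                del self._pending[token]
                self.confirmed[site_prefix] = pending[1]
                return True
        return False


registry = ClassificationRegistry()
token = registry.register("example.com/blog", ("Reference", "Languages", "Dictionary"))
page = f'<html><head><meta name="{META_NAME}" content="{token}"></head></html>'
ok = registry.confirm("example.com/blog", page)
print(ok, registry.confirmed)
```

Once confirmed, the entry stays in the store even if the tag is later removed from the page, which matches the behavior described above.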

3 – Website is classified using the tree

Soon after the user uses this application and uploads the updated website, the website will be “available” for automatic topic classification, to be performed during the next round of page indexing done by Google.

Thus, every time Google re-indexes the websites, it also re-indexes the theme linked to each website's classification, using its internal database.
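
One way to picture the lookup done during indexing is the sketch below. The function theme_for_url and the rule "the longest registered prefix wins" are my assumptions, chosen so that a sub-site registration takes priority over the registration of the whole domain; pages with no registration fall back to Not Classified.

```python
from urllib.parse import urlsplit

NOT_CLASSIFIED = ("Not Classified",)


def theme_for_url(url: str, classifications: dict[str, tuple[str, ...]]) -> tuple[str, ...]:
    """Return the theme of the most specific registered site or sub-site."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    segments = [segment for segment in parts.path.split("/") if segment]
    # Try the longest prefix first, then shorten it one path segment at a time.
    for depth in range(len(segments), -1, -1):
        prefix = "/".join([host, *segments[:depth]])
        if prefix in classifications:
            return classifications[prefix]
    return NOT_CLASSIFIED


# Hypothetical registrations: a whole domain and one sub-site inside it.
registered = {
    "example.com": ("E-commerce",),
    "example.com/blog": ("Reference", "Languages"),
}

print(theme_for_url("https://example.com/blog/2011/post.html", registered))
print(theme_for_url("https://example.com/shop/item", registered))
print(theme_for_url("https://other.org/", registered))
```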

4 – User benefits from Classification

When the user performs any search on Google, it returns the results that it has always returned. So far, nothing has changed.

However, there is a major difference. Google would place on the left, in an organized and non-intrusive fashion, links that correspond to the first level of classification by themes.

 

Below is an example in which some possible first levels are listed, simply as an example, in the form of links. In reality there would be more links. Here, I am not suggesting what the links should be; this would be irrelevant to grasping the idea.

Google 1

If the user clicks on Products, he drops a level in the classification tree and the websites change accordingly:

 

Google 2

This is the great innovation. The user now has access to websites that contain the desired words, but also relate to the topics that he would like to see. Thus, the results are much more useful to the user than the initial page that also lists hits related to blogs, news, chronicles, which simply “pollute” the results. All with a single click.

Note that the second view shows the “sub-branches” of Products, while the branch Products is displayed on the top of the frame or table as a link, allowing users to return to the main topic with just one click.

Thus, the user can refine their search either by using the classification tree and quickly reducing the number of websites involved or by adding new words to their previous search, as can already be done.

With another mouse click the user has access to, for example, various Review websites, all together. In total, besides typing the words “Canon”, “Rebel” and “Eos”, he performs just two clicks.
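
The click-by-click refinement can be sketched as a simple filter over the word-search results. The function browse, the sample hits and their themes are invented for illustration; the real result set would come from the normal word search. Each call returns the hits under the selected branch plus the sub-branches to show as links.

```python
from collections import Counter

Hit = tuple[str, tuple[str, ...]]  # (url, theme path assigned by the site owner)


def browse(results: list[Hit], selected: tuple[str, ...] = ()) -> tuple[list[str], Counter]:
    """Keep hits under the selected branch and count the next-level themes."""
    depth = len(selected)
    matching = [(url, theme) for url, theme in results if theme[:depth] == selected]
    sub_branches = Counter(theme[depth] for _, theme in matching if len(theme) > depth)
    return [url for url, _ in matching], sub_branches


# Hypothetical hits for a camera search.
hits = [
    ("shop.example/camera", ("Products", "Online Stores")),
    ("reviews.example/camera", ("Products", "Reviews")),
    ("blog.example/my-holiday", ("Blogs",)),
    ("lab.example/camera-test", ("Products", "Reviews")),
]

urls, links = browse(hits)                           # initial results page
print(len(urls), dict(links))
urls, links = browse(hits, ("Products",))            # first click
print(len(urls), dict(links))
urls, links = browse(hits, ("Products", "Reviews"))  # second click
print(urls)
```

Adding words to the query and clicking a branch are independent filters, so they can be combined in any order.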


Google 3

 



Parallel Trees

Based on this mechanism, Google may decide to create other parallel trees with new kinds of classification.

For example, as many things in life revolve around business, there is the very interesting possibility of creating a parallel tree, in the same manner as the topic tree, specifically for all of the sites that provide a product or service, paid or free: sales, specialized reviews, price comparison, etc.

Google would benefit from this by having access to a powerful commercial tool while the user would have an even more powerful tool for filtering information.

Google could sell “Ad-What”, which would allow a site to be classified under the things that it sells.

This is quite interesting for websites, because it is very common for users to be interested in some product or service and not have a clear idea of how to proceed.

For example, if the user wants to find all of the websites that sell cameras, he doesn’t need to choose any words. He just clicks on Products: Online Stores in one tree and, with another click, on Products End Consumer: Electronics: Cameras in the other. Once this has been done, only the sites that sell cameras will appear.

Details of Implementation

  • If the owner does not classify their website, it remains on the first level, under Not Classified.
  • If the owner classifies their website incorrectly by accident, he can reclassify it. He only has to wait a few days for Google to perform the next round of indexing to have it corrected on Google’s classification tree.
  • If the owner intentionally classifies their site incorrectly, other users can report the classification error.
  • To reduce fraud, the complaint form should be robot-proof, for example by using a CAPTCHA.
  • The number of complaints should be counted per distinct IP or, even better, per distinct fixed IP, that is, using a fixed node linked to the dynamic IP of each user. This entire process serves to minimize fraud.
  • The index to be calculated, within a prescribed amount of time, is
           Complaint Index = Number of fixed IPs that filed complaints /
           Total number of fixed IPs that accessed the website via Google
    If this index passes a threshold percentage, the website automatically changes to Not Classified, and the user will receive an automatic e-mail from Google. The website will remain Not Classified until the user reclassifies the website.
  • The second invalid classification, reported in the manner described above, generates a quarantine period, in which the website remains as Not Classified, regardless of reclassification.
  • If the complaint, despite all of the precautions, is fraudulent, the website, if it has a large audience, may be singled out for auditing by Google.
  • If it is singled out for auditing, the Google team will definitively classify the website and send an e-mail to the person responsible for the website, who will have a certain amount of time to accept the website’s reclassification or suggest a proper classification, which will be subject to review.
  • The details of the operation described above follow just one line of thought, and can have another approach, either more or less automated.
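
The Complaint Index and its automatic consequences can be sketched as follows. The threshold of 5%, the names SiteStatus, review and reclassify, and the rule that a complaint only counts when the same fixed IP also visited the site are all assumptions of mine; the text above only fixes the ratio itself and the sequence "Not Classified first, quarantine on the second occurrence". How long the quarantine lasts and how it ends are not modelled here.

```python
from dataclasses import dataclass

NOT_CLASSIFIED = ("Not Classified",)
THRESHOLD = 0.05  # hypothetical value; the text only says "a threshold percentage"


@dataclass
class SiteStatus:
    theme: tuple[str, ...]
    invalid_classifications: int = 0
    quarantined: bool = False


def complaint_index(complaining_ips: set[str], visiting_ips: set[str]) -> float:
    """Fixed IPs that complained / fixed IPs that reached the site via Google."""
    if not visiting_ips:
        return 0.0
    # Assumption: a complaint only counts if that IP also visited the site.
    return len(complaining_ips & visiting_ips) / len(visiting_ips)


def review(status: SiteStatus, index: float, threshold: float = THRESHOLD) -> None:
    """Apply the automatic consequences of one measurement period."""
    if index <= threshold:
        return
    status.theme = NOT_CLASSIFIED
    status.invalid_classifications += 1
    if status.invalid_classifications >= 2:
        status.quarantined = True  # second occurrence: reclassification is ignored


def reclassify(status: SiteStatus, theme: tuple[str, ...]) -> bool:
    """Owner picks a new theme; refused while the site is in quarantine."""
    if status.quarantined:
        return False
    status.theme = theme
    return True


# Made-up traffic: 100 visiting fixed IPs, 10 of which filed a complaint.
visitors = {f"10.0.0.{i}" for i in range(100)}
complainers = {f"10.0.0.{i}" for i in range(10)}

site = SiteStatus(theme=("Products", "Reviews"))
index = complaint_index(complainers, visitors)
review(site, index)
print(index, site.theme)
```

Because the index is a ratio rather than a raw count, a site with a large audience is not penalized simply for receiving more complaints in absolute terms.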

The important thing is to have a process that requires almost no manual work by Google, while using the help of other users to progressively improve the quality of classification.
