When semantic search works it appears to be, depending upon your perspective, either magical or intelligent in the way a person is. As it turns out, neither perspective is correct. Semantic search, just like Boolean search before it, is based upon mathematics, statistical models and probabilities. What is different about it, however, is the degree of complexity and self-checking that goes on in the background before an answer appears in the search results for a query.
The machine learning algorithms that power all this are trained on a three-step program that uses different sets of Training and Test Data to perform a sequence of (a minimal code sketch follows the list):
- Training
- Validation
- Testing
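To make the sequence concrete, here is a minimal sketch of those three steps, assuming Python with scikit-learn and a synthetic dataset. The names, split proportions and model are illustrative only, not a description of any real search engine's pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A synthetic stand-in for real-world data.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

# First carve off the Test Data the model will not see during development.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Then split the remainder into Training and Validation sets.
X_train, X_val, y_train, y_val = train_test_split(X_dev, y_dev, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # 1. Training
print("validation accuracy:", clf.score(X_val, y_val))         # 2. Validation
print("test accuracy:      ", clf.score(X_test, y_test))       # 3. Testing
```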
Remember that, in the wild, data is subject to the 4Vs of:
- Volume
- Velocity
- Variety
- Veracity
Solving for the last one first allows us to filter out many of the ambiguities produced by the first three. But here's the problem: Testing is only as good as the Classification we started off with. If the Classifier we have put in place is not good enough to recognize things it has never seen before, it will be unable to deal with the new data that appears on the web all the time, and search will be unable to handle queries it has never encountered before.
This may seem like a simple problem to solve: test the outcomes of validation, see where the algorithm has failed and then tweak the parameters so that it performs better. The problem is that as we tweak the parameters we are, incrementally, informing the algorithm of the nature of our Test Data. When we then test it, it appears to perform better and better against the Test Data we are using because it has come to 'see' it, yet it performs poorly in the wild against data it has never encountered before. What has happened in this case is that the algorithm knows exactly what we are testing for, so it is 'cheating': it scores well on the test but is unable to generalize sufficiently to use its training to understand new data.
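Here is a hypothetical illustration of that 'cheating', again assuming scikit-learn: we tweak one parameter dozens of times against the same Test Data, keep whichever setting happened to score highest, and then compare against data the tweaking never touched. The exact numbers will vary from run to run, but the score on the tuned-against set is systematically optimistic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# One pool of data: part for training, part we repeatedly test
# against while tweaking, and part standing in for "the wild".
X, y = make_classification(n_samples=4_000, n_features=20,
                           n_informative=5, random_state=0)
X_dev, X_wild, y_dev, y_wild = train_test_split(X, y, test_size=0.5, random_state=0)
X_train, X_tune, y_train, y_tune = train_test_split(X_dev, y_dev,
                                                    test_size=0.5, random_state=0)

# Tweak the parameter again and again against the same Test Data,
# keeping whatever setting happens to score best on it.
best_k, best_score = 1, 0.0
for k in range(1, 50):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    score = model.score(X_tune, y_tune)
    if score > best_score:
        best_k, best_score = k, score

final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print("score on the data we tuned against:", round(best_score, 3))
print("score in the wild:                 ", round(final.score(X_wild, y_wild), 3))
```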
It’s a case of programmer bias affecting the performance of the algorithm.
The way around it is a paradox of sorts. Instead of using two data sets we use three: one we train on, one we test against and tweak, and one we never look at during development. When we finally test against that unseen set we can determine whether the algorithm has 'cheated' or whether it is genuinely capable of generalizing to correctly understand new data. This practice of holding data back to check generalization is called Cross Validation.
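A minimal sketch of that safeguard, under the same scikit-learn assumptions: the development data is rotated through train/validate folds while we tweak, and the third set stays hidden until the very end.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)

# Hold back a set the algorithm never sees while we are tweaking.
X_dev, X_hidden, y_dev, y_hidden = train_test_split(X, y, test_size=0.2, random_state=0)

# Rotate the rest through five train/validate folds (cross-validation).
clf = LogisticRegression(max_iter=1000)
print("cross-validated accuracy:", cross_val_score(clf, X_dev, y_dev, cv=5).mean())

# Only once all tweaking is finished do we look at the hidden set.
print("accuracy on unseen data: ", clf.fit(X_dev, y_dev).score(X_hidden, y_hidden))
```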
Why is all of this important? Because if you are asking just how much data about you semantic search needs, the answer is: as much as you can throw at it. There is never a specific amount of data that is enough when semantic search relies on constant judgements based upon its ability to generalize and project. When you consider that it can take more than 30,000 examples of one type of data to train an algorithm, you begin to realize the scale of the task at hand and why doing as much as possible to help search understand you will only help you and your online business.
Get smart: SEO Help: 20 Semantic Search Steps that Will Help Your Business Grow is a practical step-by-step guide to applying semantic search principles to your business.