How To Improve Your Apps Search Functionality

Most applications have the need to provide a search feature for a person. This may be to send them a message, add them to a group or team, grant them access, or a hundred other possible reasons. Applications implement this search functionality in a variety of different ways. Some being far better than others. In this article I am going to document a few different ways to search a database for a person. Each way will become progressively more advanced and cover more use cases.

You can find the code to create the database and load the test data here. The test data contains 100,000 randomly generated users and 9 specific users that always get added to the test set. These 9 users are specified in the table at the end of this article.

I will be using a SQLite database for this article. The concepts and techniques should be applicable to any database although you will need to alter the SQL syntax.

Let’s start by defining our Person table.

CREATE TABLE IF NOT EXISTS person
(
    person_id     varchar(26) PRIMARY KEY,
    first_name    varchar(80),
    middle_name   varchar(80),
    last_name     varchar(80),
    preferred_name varchar(80),
    person_name   varchar(320),
    email_address varchar(320),
    account_name  varchar(320),
    account_id    varchar(256)
);

The column names describe their purpose pretty well, but I will call out preferred_name. That column holds a nickname that the person may prefer to go by such as William preferring to go by Bill or Patricia preferring to go by Patti.

Name Begins With

Our first attempt at searching for a person will perform a begins with search on the person_name column which contains the person’s full name. We use the LOWER function on both the query and the person_name column as well as use the TRIM function to remove any leading or trailing whitespace. It amazes me how many applications don’t perform the TRIM and LOWER functions. The query below is hard enough for the user to try and find the right format to enter the name, adding case sensitivity and random spaces in the mix makes it near impossible to find some people in the application. Our SQL looks like:

SELECT person_id, person_name
FROM person
WHERE LOWER(person_name) LIKE LOWER(TRIM(@query)) || '%';

There are several problems with this approach. The first being that the user performing the search has no way of knowing the format of the person_name column. It could be "FirstName LastName" or "FirstName MiddleName LastName" or "LastName, FirstName" or some other format. The user doesn’t know and will have to attempt formatting their search differently until they discover it. One way to assist the user is to provide placeholder text showing an example search.

Another problem is what if you only know one part of their name. Here are several use cases were the user has issues:

  • If the person_name is formatted "FirstName LastName" and you only know their last name.
  • If the person_name is formatted "FirstName MiddleName LastName" and you know their first name and last name, but not their middle name.
  • If the person_name is formatted "LastName, FirstName" and you only know their first name.
  • If the person goes by their middle name and a user searches any of these formats using the middle name they all will fail to show the person in the results.

First Name Begins With and Last Name Begins With

Our next attempt will perform a begins with search on the first_name and last_name columns.

select person_id, person_name
from person
where (lower(first_name) like LOWER(TRIM(@queryPart1)) || '%' or
       lower(last_name) like LOWER(TRIM(@queryPart1)) || '%')
  and (lower(first_name) like LOWER(TRIM(@queryPart2)) || '%' or
       lower(last_name) like LOWER(TRIM(@queryPart2)) || '%');

This is a large improvement over the previous example. The user doesn’t need to enter the FirstName and LastName in a specific order. The user can enter their search :

  • "FirstName"
  • "LastName"
  • "FirstName LastName"
  • "LastName FirstName"
  • "LastName, FirstName"

This SQL does make some assumptions. First that the user will only enter two search parts such as FirstName and LastName. If the user types in more that two search parts then the SQL above is likely to fail to return the desired results. Second it is not checking the middle_name and preferred_name columns. Let’s fix the later one first. We will update the SQL to include checking middle_name and preferred_name.

select person_id, person_name
from person
where (lower(first_name) like LOWER(TRIM(@queryPart1)) || '%' or
       lower(last_name) like LOWER(TRIM(@queryPart1)) || '%' or
       lower(middle_name) like LOWER(TRIM(@queryPart1)) || '%' or
       lower(preferred_name) like LOWER(TRIM(@queryPart1)) || '%')
  and (lower(first_name) like LOWER(TRIM(@queryPart2)) || '%' or
       lower(last_name) like LOWER(TRIM(@queryPart2)) || '%' or
       lower(middle_name) like LOWER(TRIM(@queryPart2)) || '%' or
       lower(preferred_name) like LOWER(TRIM(@queryPart2)) || '%');

Now the user can enter one or two search parts and if any of the name columns in the person table begins with those search parts the person will be returned. Let’s now handle the issue that the user may enter more than two search parts such as FirstName MiddleName LastName.

Name Contains Search One or More Parts

The SQL below is quite a bit more complex than the previous one. Let’s walk through it. First we don’t know how many search parts the user may enter. It may be 1, 2, 3, 4, 5, or more search parts. So we will need to except a variable length parameter and search over all the parts. One way to do this is to accept the whole search query as one parameter and then split up the user defined query so that we can use the parts to search with. To do this we will use two common table expressions (CTE). The first CTE ‘split’ takes in the users query and recursively searches for spaces and splits off new words at the spaces. In addition to the LOWER and TRIM functions we are also using the REPLACE function to remove any commas in the case they type "LastName, FirstName". The second CTE "query_parts" simply removes any empty strings from the first table so we are only left with non-empty strings. For example the query "FirstName LastName" would become a table with two rows containing "firstname" and "lastname".

We then define our select which checks the first_name, middle_name, last_name, and preferred_name and compares them to each row in the query_parts table. For each query part that matches the match_count will increase. This gives us a nice benefit of being able to rank the results by the match_count so that the most relevant matches are first.

WITH split (item, query) AS (SELECT '', LOWER(REPLACE(TRIM(:query), ',', ' ')) || ' '
                             UNION ALL
                             SELECT SUBSTR(query, 1, INSTR(query, ' ') - 1),
                                    SUBSTR(query, INSTR(query, ' ') + 1)
                             FROM split
                             WHERE query != ''),
     query_parts AS (SELECT item AS value FROM split WHERE item != '')
SELECT person_id,
       person_name,
       (SELECT COUNT(*) AS match_count
        FROM (SELECT value
              FROM query_parts
              WHERE LOWER(first_name) LIKE value || '%'
                 OR LOWER(middle_name) LIKE value || '%'
                 OR LOWER(last_name) LIKE value || '%'
                 OR LOWER(preferred_name) LIKE value || '%')) AS match_count
from person
where match_count > 0
order by match_count desc;

We now have a comprehensive query which does the following:

  • Trims spaces
  • Case insensitive
  • Searches all of the different name columns in the table
  • Allows the user to type in the name in any format
  • Ranks the results based on matched query parts

Concluding Remarks

If all applications provided this minimal level of search functionality user’s would rejoice. I know I would. I keep a list of all of the applications at my company that don’t provide this level of functionality and it is long. Our time reporting system, expense system, authorization system, training system, human resources system, change management system, and the list keeps going. There are plenty of applications that go further and implement full-text search which adds support for misspelled words, phonetic names (Cindy, Cindey, Cindi), and other tools which help the user discover the person they are searching for.

Below is a table showing a number of different test cases, the search query used, and whether the correct user was returned in the top 10 results. The results shown are for our query above. A properly configured full-text search should pass every use case.

FirstName MiddleName LastName PreferredName Query Description Query Example Returned In Top 10 Results
Alice Walker Jones Name Parts Order – FirstName LastName Alice Jones Yes
Alice Walker Jones Name Parts Order – LastName FirstName Jones Alice Yes
Alice Walker Jones Name Parts Separated By Punctionation Jones,Alice Yes
Alice Walker Jones Space After Punctuation Jones, Alice Yes
Alice Walker Jones Spaces Begin, Middle, and End   Alice Jones   Yes
Alice Walker Jones Include MiddleName Alice Walker Jones Yes
Stephen Franklin Glover Jr. Suffix – Jr, Sr, III Glover Jr., Stephen Franklin Yes
Brad Edward St. Phillips Prefix – Dr, MD, St, Brad St. Phillips Yes
Brad Edward St. Phillips Missing Prefix Brad Phillips No
Kurt Jadyn Smitham Query Contains Initials K J Smitham Yes
Charles Robert Stone Chris NickName or PreferredName Chris Stone Yes
Ann Marie Sophia Miller Name Part With a Space Ann Marie Miller No
Mary Beth Sophia Sutton Chapman Two Name Parts With a Space Mary Beth Sutton Chapman Yes
Mary Beth Sophia Sutton Chapman Missing Name Parts Beth Chapman No
Mary Beth Sophia Sutton Chapman Concatenated name parts MaryBeth SuttonChapman No
Charlotte Isabel Smith-Collins Hyphenated Charlotte Smith-Collins Yes
Charlotte Isabel Smith-Collins Missing Hyphen Charlotte Smith Collins Yes
Charlotte Isabel Smith-Collins Missing Hyphenated Name Part Charlotte Collins No
William "Jim" David Thomas Non-alpha charaters William "Jim" Thomas Yes
William "Jim" David Thomas Missing non-alpha characters Jim Thomas No
William "Jim" David Thomas Misspelled david tomas No
Stephen Franklin Glover Jr. Phonetic Spelling Steven Glover No

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: