Most applications have the need to provide a search feature for a person. This may be to send them a message, add them to a group or team, grant them access, or a hundred other possible reasons. Applications implement this search functionality in a variety of different ways. Some being far better than others. In this article I am going to document a few different ways to search a database for a person. Each way will become progressively more advanced and cover more use cases.
You can find the code to create the database and load the test data here. The test data contains 100,000 randomly generated users and 9 specific users that always get added to the test set. These 9 users are specified in the table at the end of this article.
I will be using a SQLite database for this article. The concepts and techniques should be applicable to any database although you will need to alter the SQL syntax.
Let’s start by defining our Person table.
CREATE TABLE IF NOT EXISTS person ( person_id varchar(26) PRIMARY KEY, first_name varchar(80), middle_name varchar(80), last_name varchar(80), preferred_name varchar(80), person_name varchar(320), email_address varchar(320), account_name varchar(320), account_id varchar(256) );
The column names describe their purpose pretty well, but I will call out preferred_name. That column holds a nickname that the person may prefer to go by such as William preferring to go by Bill or Patricia preferring to go by Patti.
Name Begins With
Our first attempt at searching for a person will perform a begins with search on the person_name column which contains the person’s full name. We use the LOWER function on both the query and the person_name column as well as use the TRIM function to remove any leading or trailing whitespace. It amazes me how many applications don’t perform the TRIM and LOWER functions. The query below is hard enough for the user to try and find the right format to enter the name, adding case sensitivity and random spaces in the mix makes it near impossible to find some people in the application. Our SQL looks like:
SELECT person_id, person_name FROM person WHERE LOWER(person_name) LIKE LOWER(TRIM(@query)) || '%';
There are several problems with this approach. The first being that the user performing the search has no way of knowing the format of the person_name column. It could be "FirstName LastName" or "FirstName MiddleName LastName" or "LastName, FirstName" or some other format. The user doesn’t know and will have to attempt formatting their search differently until they discover it. One way to assist the user is to provide placeholder text showing an example search.
Another problem is what if you only know one part of their name. Here are several use cases were the user has issues:
- If the person_name is formatted "FirstName LastName" and you only know their last name.
- If the person_name is formatted "FirstName MiddleName LastName" and you know their first name and last name, but not their middle name.
- If the person_name is formatted "LastName, FirstName" and you only know their first name.
- If the person goes by their middle name and a user searches any of these formats using the middle name they all will fail to show the person in the results.
First Name Begins With and Last Name Begins With
Our next attempt will perform a begins with search on the first_name and last_name columns.
select person_id, person_name from person where (lower(first_name) like LOWER(TRIM(@queryPart1)) || '%' or lower(last_name) like LOWER(TRIM(@queryPart1)) || '%') and (lower(first_name) like LOWER(TRIM(@queryPart2)) || '%' or lower(last_name) like LOWER(TRIM(@queryPart2)) || '%');
This is a large improvement over the previous example. The user doesn’t need to enter the FirstName and LastName in a specific order. The user can enter their search :
- "FirstName LastName"
- "LastName FirstName"
- "LastName, FirstName"
This SQL does make some assumptions. First that the user will only enter two search parts such as FirstName and LastName. If the user types in more that two search parts then the SQL above is likely to fail to return the desired results. Second it is not checking the middle_name and preferred_name columns. Let’s fix the later one first. We will update the SQL to include checking middle_name and preferred_name.
select person_id, person_name from person where (lower(first_name) like LOWER(TRIM(@queryPart1)) || '%' or lower(last_name) like LOWER(TRIM(@queryPart1)) || '%' or lower(middle_name) like LOWER(TRIM(@queryPart1)) || '%' or lower(preferred_name) like LOWER(TRIM(@queryPart1)) || '%') and (lower(first_name) like LOWER(TRIM(@queryPart2)) || '%' or lower(last_name) like LOWER(TRIM(@queryPart2)) || '%' or lower(middle_name) like LOWER(TRIM(@queryPart2)) || '%' or lower(preferred_name) like LOWER(TRIM(@queryPart2)) || '%');
Now the user can enter one or two search parts and if any of the name columns in the person table begins with those search parts the person will be returned. Let’s now handle the issue that the user may enter more than two search parts such as FirstName MiddleName LastName.
Name Contains Search One or More Parts
The SQL below is quite a bit more complex than the previous one. Let’s walk through it. First we don’t know how many search parts the user may enter. It may be 1, 2, 3, 4, 5, or more search parts. So we will need to except a variable length parameter and search over all the parts. One way to do this is to accept the whole search query as one parameter and then split up the user defined query so that we can use the parts to search with. To do this we will use two common table expressions (CTE). The first CTE ‘split’ takes in the users query and recursively searches for spaces and splits off new words at the spaces. In addition to the LOWER and TRIM functions we are also using the REPLACE function to remove any commas in the case they type "LastName, FirstName". The second CTE "query_parts" simply removes any empty strings from the first table so we are only left with non-empty strings. For example the query "FirstName LastName" would become a table with two rows containing "firstname" and "lastname".
We then define our select which checks the first_name, middle_name, last_name, and preferred_name and compares them to each row in the query_parts table. For each query part that matches the match_count will increase. This gives us a nice benefit of being able to rank the results by the match_count so that the most relevant matches are first.
WITH split (item, query) AS (SELECT '', LOWER(REPLACE(TRIM(:query), ',', ' ')) || ' ' UNION ALL SELECT SUBSTR(query, 1, INSTR(query, ' ') - 1), SUBSTR(query, INSTR(query, ' ') + 1) FROM split WHERE query != ''), query_parts AS (SELECT item AS value FROM split WHERE item != '') SELECT person_id, person_name, (SELECT COUNT(*) AS match_count FROM (SELECT value FROM query_parts WHERE LOWER(first_name) LIKE value || '%' OR LOWER(middle_name) LIKE value || '%' OR LOWER(last_name) LIKE value || '%' OR LOWER(preferred_name) LIKE value || '%')) AS match_count from person where match_count > 0 order by match_count desc;
We now have a comprehensive query which does the following:
- Trims spaces
- Case insensitive
- Searches all of the different name columns in the table
- Allows the user to type in the name in any format
- Ranks the results based on matched query parts
If all applications provided this minimal level of search functionality user’s would rejoice. I know I would. I keep a list of all of the applications at my company that don’t provide this level of functionality and it is long. Our time reporting system, expense system, authorization system, training system, human resources system, change management system, and the list keeps going. There are plenty of applications that go further and implement full-text search which adds support for misspelled words, phonetic names (Cindy, Cindey, Cindi), and other tools which help the user discover the person they are searching for.
Below is a table showing a number of different test cases, the search query used, and whether the correct user was returned in the top 10 results. The results shown are for our query above. A properly configured full-text search should pass every use case.
|FirstName||MiddleName||LastName||PreferredName||Query Description||Query Example||Returned In Top 10 Results|
|Alice||Walker||Jones||Name Parts Order – FirstName LastName||Alice Jones||Yes|
|Alice||Walker||Jones||Name Parts Order – LastName FirstName||Jones Alice||Yes|
|Alice||Walker||Jones||Name Parts Separated By Punctionation||Jones,Alice||Yes|
|Alice||Walker||Jones||Space After Punctuation||Jones, Alice||Yes|
|Alice||Walker||Jones||Spaces Begin, Middle, and End||Alice Jones||Yes|
|Alice||Walker||Jones||Include MiddleName||Alice Walker Jones||Yes|
|Stephen||Franklin||Glover Jr.||Suffix – Jr, Sr, III||Glover Jr., Stephen Franklin||Yes|
|Brad||Edward||St. Phillips||Prefix – Dr, MD, St,||Brad St. Phillips||Yes|
|Brad||Edward||St. Phillips||Missing Prefix||Brad Phillips||No|
|Kurt||Jadyn||Smitham||Query Contains Initials||K J Smitham||Yes|
|Charles||Robert||Stone||Chris||NickName or PreferredName||Chris Stone||Yes|
|Ann Marie||Sophia||Miller||Name Part With a Space||Ann Marie Miller||No|
|Mary Beth||Sophia||Sutton Chapman||Two Name Parts With a Space||Mary Beth Sutton Chapman||Yes|
|Mary Beth||Sophia||Sutton Chapman||Missing Name Parts||Beth Chapman||No|
|Mary Beth||Sophia||Sutton Chapman||Concatenated name parts||MaryBeth SuttonChapman||No|
|Charlotte||Isabel||Smith-Collins||Missing Hyphen||Charlotte Smith Collins||Yes|
|Charlotte||Isabel||Smith-Collins||Missing Hyphenated Name Part||Charlotte Collins||No|
|William "Jim"||David||Thomas||Non-alpha charaters||William "Jim" Thomas||Yes|
|William "Jim"||David||Thomas||Missing non-alpha characters||Jim Thomas||No|
|William "Jim"||David||Thomas||Misspelled||david tomas||No|
|Stephen||Franklin||Glover Jr.||Phonetic Spelling||Steven Glover||No|