Exploring SQLite’s new JSON Library

I was recently working on a console application that needed to store data locally. I had initially started by storing the data in files serialized as JSON, but the program needed the ability to perform queries over the data. I could have used LINQ to Objects after reading in the existing JSON files, but I have never used SQLite before and wanted to see how quickly I could implement something using SQLite. I started writing the SQL to generate the tables, but I wondered since I already had the logic written to generate the JSON could I just use SQLite as a document storage system and still be able to write the queries I needed. It turned out that I could. In December of 2015 a new version of SQLite was released, 1.0.99.0, that included new JSON SQL functions. Here are the release notes. I started reading through the documentation on the new functions and it appeared all the functionality I needed was available. Before I started down this path I wanted to build a simple proof of concept that verified the functionality worked as expected and would meet the needs of the project.

For this example I choose to use the Windows EventLog data as the sample data set. I am using the System.Data.SQLite nuget package which is an ADO.NET provider. I am also using Dapper which is a micro-ORM that handles a lot of the boiler plate data access code. The first thing I needed to do was create the SQLite database file. I created a simple class called SqLiteBaseRepository to contain two helper functions for returning the DBFile location and returning a new SQLite connection.

public class SqLiteBaseRepository
{
    public static string DbFile
    {
        get { return Settings.FilePath + "EventLogCollectorDb.sqlite"; }
    }

    public static SQLiteConnection SimpleDbConnection()
    {
        return new SQLiteConnection("Data Source=" + DbFile);
    }
}

And here is the code to create the SQLite database file and table with the ‘Data’ column that will be storing our JSON object.

using (var cnn = SqLiteBaseRepository.SimpleDbConnection())
{
    cnn.Open();
    cnn.Execute(
        @"create table EventLogRecord
          (
             ID                                  INTEGER PRIMARY KEY AUTOINCREMENT,
             Data                                Text,
             ComputerName                        Text,
             CreateDate                          datetime not null
          )");

}

I wanted to see what the performance characteristics of SQLite were and wrote a simple insertion test that inserted 1,000,000 new rows. The test consistently took around 25 seconds which is 40,000 inserts a second. For more information on performance tuning SQLite here is an excellent StackOverflow post. I want to point out a few things in the code below the first is that you need to load the JSON extension module by calling the LoadExtension method and make sure to enable extensions by calling the EnableExtensions method. You also need to use the json function when inserting your data into the table.

using (var cnn = SqLiteBaseRepository.SimpleDbConnection())
{
    cnn.Open();
    cnn.EnableExtensions(true);
    cnn.LoadExtension(@"SQLite.Interop.dll", "sqlite3_json_init");
    using (var transaction = cnn.BeginTransaction())
    {
        for (int x = 0; x < 1000000; x++)
        {
            cnn.Execute(
                @"INSERT INTO EventLogRecord 
                ( Data, CreateDate) VALUES 
                ( json(@Data), @CreateDate )", new { Data = "{\"key\": \"Value\"}", CreateDate = DateTime.UtcNow.ToString("o") });
        }
        transaction.Commit();
    }
}

I am not going to walk through getting the entries from the EventLog, but if you are interested here is the documentation. I will show the structure of the JSON that we will be storing and querying. Note that this is a remote services event identifying that someone remotely connected to this machine. The properties array contains the user id, domain of the user id, and the IP address the user connected from.

 {
    "Id": 1149,
    "Version": 0,
    "Qualifiers": null,
    "Level": 4,
    "Task": 0,
    "Opcode": 0,
    "Keywords": 1152921504606846976,
    "RecordId": 8,
    "ProviderName": "Microsoft-Windows-TerminalServices-RemoteConnectionManager",
    "ProviderId": "c76baa63-ae81-421c-b425-340b4b24157f",
    "LogName": "Microsoft-Windows-TerminalServices-RemoteConnectionManager/Operational",
    "ProcessId": 1004,
    "ThreadId": 2960,
    "MachineName": "TestMachine1",
    "UserId": {
      "BinaryLength": 12,
      "AccountDomainSid": null,
      "Value": "S-1-5-20"
    },
    "TimeCreated": "2015-11-25T14:22:00.0844329-05:00",
    "ActivityId": "f420e78d-0bdf-40b4-9922-219581100000",
    "RelatedActivityId": null,
    "ContainerLog": "Microsoft-Windows-TerminalServices-RemoteConnectionManager/Operational",
    "MatchedQueryIds": [],
    "Bookmark": {
      "BookmarkText": "<BookmarkList>\r\n  <Bookmark Channel='Microsoft-Windows-TerminalServices-RemoteConnectionManager/Operational' RecordId='8' IsCurrent='true'/>\r\n</BookmarkList>"
    },
    "LevelDisplayName": "Information",
    "OpcodeDisplayName": "Info",
    "TaskDisplayName": null,
    "KeywordsDisplayNames": [],
    "Properties": [
      {
        "Value": "johndoe"
      },
      {
        "Value": "domainname"
      },
      {
        "Value": "123.123.123.123"
      }
    ]
  }

Now that we have our database and have inserted some test data lets perform some queries on the data. First lets get all events for a particular user id. We will use the json_tree function to recursively walk through the JSON substructure looking for the user id.

using (var cnn = SqLiteBaseRepository.SimpleDbConnection())
{
    cnn.Open();
    cnn.EnableExtensions(true);
    cnn.LoadExtension(@"SQLite.Interop.dll", "sqlite3_json_init");
    var results = cnn.Query("SELECT EventLogRecord.data, ComputerName, CreateDate from EventLogRecord, json_tree(EventLogRecord.data, '$.Properties') where upper(json_tree.value) = upper(@UserId)", new { UserId = userId});
}

Now lets query for the max RecordId and pull out just the RecordId from the JSON structure using the json_extract function.

using (var cnn = SqLiteBaseRepository.SimpleDbConnection())
{
    cnn.Open();
    cnn.EnableExtensions(true);
    cnn.LoadExtension(@"SQLite.Interop.dll", "sqlite3_json_init");
    var results = cnn.Query("SELECT max(json_extract(EventLogRecord.data,'$.RecordId')) as RecordId, ComputerName from EventLogRecord group by ComputerName");
}

Finally lets modify our query by user id to filter the results to a certain timespan.

using (var cnn = SqLiteBaseRepository.SimpleDbConnection())
{
    cnn.Open();
    cnn.EnableExtensions(true);
    cnn.LoadExtension(@"SQLite.Interop.dll", "sqlite3_json_init");
    var results = cnn.Query("SELECT EventLogRecord.data, ComputerName, CreateDate from EventLogRecord, json_tree(EventLogRecord.data, '$.Properties' ) ,json_each(EventLogRecord.data)  where upper(json_tree.value) = upper(@UserId) and (json_each.key='TimeCreated' and (json_each.value < @EndDate and json_each.value > @StartDate))", new { UserId = userId, EndDate = endDateTime, StartDate = beginDateTime });
}

There are many other functions that I didn’t cover here including functions to modify and validate the json structure. You can read about all of the available functions and see examples in the documenation.

Once I had validated my projects use cases in the proof of concept I moved forward with implementing SQLite as the projects data store and completed it without any major issues. Through this experience I have added SQLite as a new tool in my toolbelt that I can use on future projects.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s