I am trying to parse Android notification logs into a database. The columns are uneven because different notifications omit certain attributes. Essentially, it is a CSV file of key-value pairs. To put it another way, it is a really poorly formatted (& syntactically incorrect) JSON file. Maybe it would be easier to show you...
{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}
{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}
I have truncated them for the sake of brevity, but they do not line up toward the end of the rows. Some text values are quoted, some are not. All of the keys are quoted. I can use string functions to trim the extraneous characters once I get the columns sorted, but therein lies the rub: How do I sort the values into columns? I can do this in about 4 different programming languages, but I am just learning T-SQL.
How can I parse this to SQL Server (2016) in a single query? Is it even possible?
I could import it to a temporary table & then transfer it to the permanent one, but I would like to have one query to digest the info, since this will be repeated on a daily (maybe more) basis. The magic would be if I could declare a second-level delimiter, since all values are preceded by a colon. Declaring the colon as the delimiter breaks it the other way...arg(s)!
The data is all around ragged. Even the notifications from the same app may be drastically different. The way I would do it in PHP, C, JS, etc would be to loop through the keys until I hit the ones I needed, grab the appropriate values following them (or the whole pair), then trim the extraneous chars. I just don't know how to do it in T-SQL & the online docs leave something to be desired.
The expected result would be something like this:
Package Name | Post Time | System Time | Offset | NotificationText
-----------------------------------------------------------------
automatic | 1489795 | 786083 | -1400 | this is the note
gameservice | 1489756 | 786090 | -1400 | this is the note
automatic | 1489799 | 786045 | -1400 | this is the note
The big problem is that the info I need is scattered throughout the rows. I don't need much of it, but it is buried.
Comments:

Yeah, don't do that with T-SQL. Process your text files and import the cleaned versions. – Erik Reasonable Rates Darling, Mar 22, 2017 at 1:21

@sp_BlitzErik Not possible, or prohibitively complicated? The goal is to eliminate steps from the workflow. I can code things that are complicated, I just don't know T-SQL very intimately, & I can't seem to figure out the conditional logic & string matching functions. Thanks for the help all the same though! I totally appreciate it. – AVLien, Mar 22, 2017 at 1:30

I think what @sp_BlitzErik is trying to say is that SQL Server is a ridiculously expensive way to do row-by-row string parsing. Best practice is to do the parsing in a procedural language before you insert it into the database table(s) and leave the database server to do database stuff, like storage, sorting, aggregating. – Daniel Hutmacher, Mar 22, 2017 at 13:18

Hahaha, I fell asleep before I could respond last night. But yeah, @DanielHutmacher is right. I've tried to do stuff like this early on in my T-SQL days as well, and it was so bad I resorted to doing it with PowerShell instead. PowerShell! That's how bad it was. – Erik Reasonable Rates Darling, Mar 22, 2017 at 13:44
3 Answers
You can use the technique shown in Loading line-delimited JSON files in SQL Server 2016 on the SQL Server Database Engine Blog, written by Jovan Popovic.
For example, given a file import.txt:

{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}
{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}

and a format file import.fmt:

13.0
1
1   SQLCHAR   0   0   "\r\n"   1   json   ""
The file can be bulk imported and parsed with:
SELECT
Parsed.packageName,
Parsed.postTime,
Parsed.systemTime,
Parsed.offset,
Parsed.isOngoing
FROM OPENROWSET
(
BULK 'C:\Temp\import.txt',
FORMATFILE= 'C:\Temp\import.fmt'
) AS BulkLog
CROSS APPLY OPENJSON(BulkLog.[json])
WITH
(
packageName varchar(50),
postTime bigint,
systemTime bigint,
offset bigint,
isOngoing bit
) AS Parsed;
Giving:

packageName               postTime       systemTime     offset     isOngoing
------------------------  -------------  -------------  ---------  ---------
com.automatic.mustang     1489795785973  1489795786083  -14400000  0
com.enhance.gameservice   1489040766066  1489040766136  -18000000  0
com.automatic.mustang     1489795785973  1489795786083  -14400000  0
com.enhance.gameservice   1489040766066  1489040766136  -18000000  0
Not sure why you think this is syntactically incorrect JSON; according to json.org, it looks fine. Well, except for two minor things:

- commas are needed between the { } objects
- [ and ] are needed around the entire thing

But outside of that, it is supposed to be that all keys are quoted, and that number and boolean values are not quoted.
SQL Server 2016 has built-in JSON parsing. After fixing those two minor issues with the sample data in the question, the below parses it just fine:
DECLARE @JSON NVARCHAR(MAX) = N'
[{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false},
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false},
{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false},
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}]
';
SELECT [PackageName],
[IsOngoing]
FROM OPENJSON(@JSON)
WITH (
[PackageName] VARCHAR(MAX) '$.packageName',
[IsOngoing] BIT '$.isOngoing'
);
Returns:

PackageName               IsOngoing
------------------------  ---------
com.automatic.mustang     0
com.enhance.gameservice   0
com.automatic.mustang     0
com.enhance.gameservice   0
This format is called JSON lines and is the JSON format used by several tools such as Spark and spyql.
I suggest using spyql to generate INSERT statements that you should be able to import into any SQL database:
$ spyql -Otable=my_table "SELECT json->packageName, json->postTime, json->systemTime, json->offset, json->isOngoing FROM json TO sql" < my_file.json
INSERT INTO "my_table"("packageName","postTime","systemTime","offset","isOngoing") VALUES ('com.automatic.mustang',1489795785973,1489795786083,-14400000,False),('com.enhance.gameservice',1489040766066,1489040766136,-18000000,False),('com.automatic.mustang',1489795785973,1489795786083,-14400000,False),('com.enhance.gameservice',1489040766066,1489040766136,-18000000,False);
You would have to create the table in your database beforehand, but that's it.
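If installing spyql is not an option, a minimal plain-Python sketch of the same idea could look like this. The table name and column list are assumptions copied from the example output above, and the literal-quoting helper is deliberately simple (strings quoted with doubled apostrophes, booleans rendered as 0/1 for SQL Server's BIT type):

```python
import json

COLUMNS = ["packageName", "postTime", "systemTime", "offset", "isOngoing"]

def sql_literal(value):
    """Render a Python value as a SQL literal."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):           # check bool before int: bool is a subclass of int
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def jsonl_to_insert(lines, table="my_table"):
    """Build one multi-row INSERT statement from JSON-lines input."""
    rows = []
    for line in lines:
        line = line.strip()
        if line:
            record = json.loads(line)
            values = ", ".join(sql_literal(record.get(c)) for c in COLUMNS)
            rows.append(f"({values})")
    cols = ", ".join(f'"{c}"' for c in COLUMNS)
    return f'INSERT INTO "{table}" ({cols}) VALUES\n' + ",\n".join(rows) + ";"
```

The resulting statement can be piped into sqlcmd or any other SQL client, just like the spyql output.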
Disclosure: I am the author of spyql.