Monday, March 15, 2021
After 20+ years in IT .. I finally discovered this useful command
Very similar to my After 20+ years in IT .. I finally discovered this... post, I discovered yet another command I should have known about
Originally I was not going to write this post, but after I found out that several other people didn't know this, I figured: what the heck, why not? Maybe someone will think this is cool as well
Open up a command prompt or PowerShell window, navigate to a folder, type in tree... hit enter
Here is what I see
I was watching a Pluralsight course and the person typed in the tree command.. and I was like whoaaaa.. How do I not know this? Perhaps it's because I don't use the command window all that much? Anyway, I thought that this was pretty cool
As you can see, tree lists all the directories and subdirectories in a tree-like structure. This is great to quickly see all the directories in one shot
The same command will work in Powershell
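tree also takes a couple of switches: /F lists the files inside each folder in addition to the directories, and /A draws the tree with plain ASCII characters instead of line-drawing characters, which is handy when you redirect the output to a file. The folder path below is just a placeholder:

```bat
rem list folders and files, ASCII output, saved to a text file
tree C:\SomeFolder /F /A > tree.txt
```

Run `tree /?` to see the full help for the command.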
Finally, here is an image of the command window output as well as the explorer navigation pane side by side
Hopefully this will be useful to someone
Thursday, February 14, 2019
Finding rows where the column starts or ends with a 'bad' character
Aha... I have plenty of war stories with Excel, so I said it's probably some non-printable character in the column.. either a tab (char(9)) or a non-breaking space (char(160)).. especially if the value was copied from the internet
He said ISNUMERIC was returning 0 for rows that looked valid, so I told him to run this query on those rows
SELECT ASCII(LEFT(SomeColumn,1)),
       ASCII(RIGHT(SomeColumn,1)),
       *
FROM StagingData s
That would give them the ASCII numerical value of the first and last character. For example, a tab is 9, a linefeed is 10....
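If you want to sanity-check those numbers yourself, the ASCII and CHAR functions convert between a character and its code in both directions:

```sql
-- ASCII() returns the code of a character, CHAR() does the reverse
SELECT ASCII(CHAR(9))   AS Tab,              -- 9
       ASCII(CHAR(10))  AS LineFeed,         -- 10
       ASCII(CHAR(13))  AS CarriageReturn,   -- 13
       ASCII(CHAR(160)) AS NonBreakingSpace  -- 160
```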
Here is a chart for the characters between 0 and 32
Binary | Oct | Dec | Hex | Abbreviation (1963 / 1965 / 1967) | Symbol | Caret | C escape | Name (1967)
---|---|---|---|---|---|---|---|---
000 0000 | 000 | 0 | 00 | NULL | NUL | ␀ | ^@ | 0円 | Null | |
000 0001 | 001 | 1 | 01 | SOM | SOH | ␁ | ^A | Start of Heading | ||
000 0010 | 002 | 2 | 02 | EOA | STX | ␂ | ^B | Start of Text | ||
000 0011 | 003 | 3 | 03 | EOM | ETX | ␃ | ^C | End of Text | ||
000 0100 | 004 | 4 | 04 | EOT | ␄ | ^D | End of Transmission | |||
000 0101 | 005 | 5 | 05 | WRU | ENQ | ␅ | ^E | Enquiry | ||
000 0110 | 006 | 6 | 06 | RU | ACK | ␆ | ^F | Acknowledgement | ||
000 0111 | 007 | 7 | 07 | BELL | BEL | ␇ | ^G | \a | Bell | |
000 1000 | 010 | 8 | 08 | FE0 | BS | ␈ | ^H | \b | Backspace [e] [f] | |
000 1001 | 011 | 9 | 09 | HT/SK | HT | ␉ | ^I | \t | Horizontal Tab [g] | |
000 1010 | 012 | 10 | 0A | LF | ␊ | ^J | \n | Line Feed | ||
000 1011 | 013 | 11 | 0B | VTAB | VT | ␋ | ^K | \v | Vertical Tab | |
000 1100 | 014 | 12 | 0C | FF | ␌ | ^L | \f | Form Feed | ||
000 1101 | 015 | 13 | 0D | CR | ␍ | ^M | \r | Carriage Return [h] | ||
000 1110 | 016 | 14 | 0E | SO | ␎ | ^N | Shift Out | |||
000 1111 | 017 | 15 | 0F | SI | ␏ | ^O | Shift In | |||
001 0000 | 020 | 16 | 10 | DC0 | DLE | ␐ | ^P | Data Link Escape | ||
001 0001 | 021 | 17 | 11 | DC1 | ␑ | ^Q | Device Control 1 (often XON) | |||
001 0010 | 022 | 18 | 12 | DC2 | ␒ | ^R | Device Control 2 | |||
001 0011 | 023 | 19 | 13 | DC3 | ␓ | ^S | Device Control 3 (often XOFF) | |||
001 0100 | 024 | 20 | 14 | DC4 | ␔ | ^T | Device Control 4 | |||
001 0101 | 025 | 21 | 15 | ERR | NAK | ␕ | ^U | Negative Acknowledgement | ||
001 0110 | 026 | 22 | 16 | SYNC | SYN | ␖ | ^V | Synchronous Idle | ||
001 0111 | 027 | 23 | 17 | LEM | ETB | ␗ | ^W | End of Transmission Block | ||
001 1000 | 030 | 24 | 18 | S0 | CAN | ␘ | ^X | Cancel | ||
001 1001 | 031 | 25 | 19 | S1 | EM | ␙ | ^Y | End of Medium | ||
001 1010 | 032 | 26 | 1A | S2 | SS | SUB | ␚ | ^Z | Substitute | |
001 1011 | 033 | 27 | 1B | S3 | ESC | ␛ | ^[ | \e [i] | Escape [j] | |
001 1100 | 034 | 28 | 1C | S4 | FS | ␜ | ^\ | File Separator | ||
001 1101 | 035 | 29 | 1D | S5 | GS | ␝ | ^] | Group Separator | ||
001 1110 | 036 | 30 | 1E | S6 | RS | ␞ | ^^ [k] | Record Separator | ||
001 1111 | 037 | 31 | 1F | S7 | US | ␟ | ^_ | Unit Separator |
Source: https://en.wikipedia.org/wiki/ASCII
He then ran the following to grab all the rows that ended or started with tabs
SELECT *
FROM StagingData s
WHERE LEFT(SomeColumn,1) = char(9)
   OR RIGHT(SomeColumn,1) = char(9)
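Once you have found the rows, you still have to clean them up. If you are on SQL Server 2017 or later, one option is TRIM with an explicit character list; the character list and the UPDATE below are just a sketch of what that could look like, adjust it to the characters you actually find:

```sql
-- sketch: strip leading/trailing tabs, spaces and non-breaking spaces (SQL Server 2017+)
UPDATE StagingData
SET SomeColumn = TRIM(CHAR(9) + CHAR(160) + ' ' FROM SomeColumn)
WHERE LEFT(SomeColumn,1)  IN (CHAR(9), CHAR(160), ' ')
   OR RIGHT(SomeColumn,1) IN (CHAR(9), CHAR(160), ' ')
```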
So let's take another look at this to see how we can make this a little better
Let's create a table that will hold these bad characters that we don't want, in my case ASCII values 1 through 32, plus 160
Here is what we will do to create and populate the table
CREATE TABLE BadCharacters(
  BadChar char(1) NOT NULL,
  ASCIINumber int NOT NULL,
  CONSTRAINT pk_BadCharacters
    PRIMARY KEY CLUSTERED( BadChar )
)
GO

INSERT BadCharacters
SELECT char(number), number
FROM master..SPT_VALUES
WHERE type = 'P'
AND (number BETWEEN 1 AND 32 OR number = 160)

A quick look at the data looks like this
SELECT * FROM BadCharacters
Now let's create our staging table and insert some data so that we can do some tests
CREATE TABLE StagingData (SomeColumn varchar(255))

INSERT StagingData
SELECT CONVERT(VARCHAR(10), s1.number) + '.' + CONVERT(VARCHAR(10), s2.number)
FROM master..SPT_VALUES s1
CROSS JOIN master..SPT_VALUES s2
WHERE s1.type = 'P'
AND s2.type = 'P'
That inserted 4194304 rows on my machine
Time to insert some of that bad data
Here is what some of the data inserted will look like
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
And this is the query to generate and insert those bad rows; on my machine it generated 1089 such rows
;WITH cte as (
  SELECT CONVERT(VARCHAR(10), number) as num
  FROM master..SPT_VALUES
  WHERE type = 'P'
  AND number BETWEEN 1 AND 1000
)
INSERT StagingData
SELECT b.BadChar + c1.num + '.' + c2.num + b2.BadChar
FROM cte c1
CROSS JOIN cte c2
JOIN BadCharacters b on c1.num = b.ASCIINumber
JOIN BadCharacters b2 on c2.num = b2.ASCIINumber

The query creates a value by concatenating a bad character, a number, a dot, a number and another bad character; you can see those values above
Now it's time to find these bad rows, but before we do that, let's add an index
CREATE INDEX ix_StagingData on StagingData(SomeColumn)
OK, we are ready...
Of course I hear you saying, why don't we just do this?
SELECT *
FROM StagingData
WHERE TRY_CONVERT(numeric(20,10), SomeColumn) IS NULL
Before we start, let's set statistics io on so that we can look at some performance
SET STATISTICS IO ON
GO
Here are the queries to find the bad characters at the start
SELECT *
FROM StagingData s
JOIN BadCharacters b on b.BadChar = LEFT(s.SomeColumn,1)

SELECT *
FROM StagingData s
JOIN BadCharacters b on s.SomeColumn like b.BadChar + '%'
Here is what the reads look like
(1089 row(s) affected)
Table 'BadCharacters'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'StagingData'. Scan count 9, logical reads 10851, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1089 row(s) affected)
Table 'BadCharacters'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'StagingData'. Scan count 33, logical reads 135, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
As you can see from the stats, the top query is non-SARGable and generates a lot more reads; the bottom query can use the index. Always make sure to write your queries in a way that SQL Server can use an index
What about the last character? How can we find those rows?
SELECT *
FROM StagingData s
JOIN BadCharacters b on b.BadChar = RIGHT(s.SomeColumn,1)

SELECT *
FROM StagingData s
JOIN BadCharacters b on s.SomeColumn like '%' + b.BadChar
Here are the stats again
(1089 row(s) affected)
Table 'BadCharacters'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'StagingData'. Scan count 9, logical reads 10851, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1089 row(s) affected)
Table 'BadCharacters'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'StagingData'. Scan count 33, logical reads 445863, physical reads 0, read-ahead reads 13, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
So both of these queries suck the life out of your SQL Server instance, so what can be done?
One thing we can do is add a computed column to the table that will hold just the last character of the column, then we can index the computed column
Here are the commands to do that
ALTER TABLE StagingData ADD RightChar as RIGHT(SomeColumn,1)
GO

CREATE INDEX ix_RightChar on StagingData(RightChar)
GO
And now we can just run the same queries again
SELECT *
FROM StagingData s
JOIN BadCharacters b on b.BadChar = RIGHT(s.SomeColumn,1)

SELECT *
FROM StagingData s
JOIN BadCharacters b on s.SomeColumn like '%' + b.BadChar
Here are the stats
(1089 row(s) affected)
Table 'StagingData'. Scan count 33, logical reads 1223, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'BadCharacters'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1089 row(s) affected)
Table 'StagingData'. Scan count 33, logical reads 1223, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'BadCharacters'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Did you expect to get the same exact reads for both queries?
So what is going on? Well, let's take a look
In both cases, the optimizer was smart enough to use the index on the computed column
Hopefully this will make someone's life easier. You can expand the table to add other characters you consider bad, add a constraint to reject the values outright, or add triggers that move those bad rows to a bad-rows table
Finally, if you need to worry about unicode, you might want to change the columns to nvarchar
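With nvarchar columns you would switch from ASCII/CHAR to their Unicode counterparts UNICODE/NCHAR, since Unicode has additional invisible space characters beyond code point 160 (for example 8199, figure space, and 8239, narrow no-break space). A sketch of what that check could look like, the exact code-point list here is my own assumption and should be adapted:

```sql
-- sketch: inspect the Unicode code point of the first and last character
SELECT UNICODE(LEFT(SomeColumn,1))  AS FirstCharCode,
       UNICODE(RIGHT(SomeColumn,1)) AS LastCharCode,
       *
FROM StagingData
WHERE UNICODE(LEFT(SomeColumn,1))  IN (9, 10, 13, 160, 8199, 8239)
   OR UNICODE(RIGHT(SomeColumn,1)) IN (9, 10, 13, 160, 8199, 8239)
```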
Enjoy importing that data..... we all know it's only getting bigger and bigger
Monday, February 04, 2019
After 20+ years in IT .. I finally discovered this...
Last week I was on a remote session with 2 clients, each running the Advent program. The team I am part of provides a script to run the Advent (APX or Axys) executable. This will then generate the portfolios, composites, price, security master, splits and other files. We then zip it all up and sftp it over for ingestion so that we can run analytics and attribution
During these calls I interact with system administrators, because usually they need to give permissions so that the script runs correctly
None of these admins knew that what I will show you existed. All the co-workers I asked didn't know this either (This could be because they are developers and not admins)
Back in the day (Win 98 or perhaps NT 4), there was a Windows PowerToy that you could install, and if you right-clicked on a folder you would get an option to open a command window in the path that you right-clicked on
Those PowerToys don't exist anymore; you can still get the same behavior by hacking the registry, but it's like a 16-step process
But there is a faster way.....
So what I usually did before 2 months ago is that I would select the path
And then I would open a command prompt, type CD and then paste the path...not too complicated
But here is the faster way.... instead of copying the path...just type in cmd in the address bar and hit enter
Boom shakalaka... a command prompt is opened immediately and you are in the same path
This also works with PowerShell. Eric Darling left me a comment on twitter informing me that typing powershell in the address bar works as well
Here is what you see after typing it
So there you have it... hopefully it will save you some minutes of valuable time in a year
Also, whether you knew about this or not, leave a comment and let me know
Tuesday, January 08, 2019
Print.. the disruptor of batch deletes in SQL
Someone had an issue where a batched delete script was not deleting anything. I looked over some code in our repository and noticed two patterns the way queries are written to handle batch deletes
One is a while loop that runs while @@rowcount is greater than 0
WHILE @@rowcount > 0
BEGIN
  DELETE TOP (5000) FROM SomeTable
END
The other way is to run a while loop which is always true and then check if @@rowcount is 0, if it is 0 then break out of the loop
WHILE 1 = 1
BEGIN
  DELETE TOP (5000) FROM SomeTable
  IF @@ROWCOUNT = 0
    BREAK
END
I have always used WHILE @@rowcount > 0 but you have to be careful because @@rowcount could be 0 when your while loop starts
Let's take a look at an example. This is a simplified example without a where clause. But let's say you have to delete several million rows from a table with many more millions of rows, and the table is replicated. In that case you want to batch the deletes so that your log file doesn't fill up, replication has a chance to catch up, and in general the deletes run faster
SELECT TOP 20000 row_number() OVER(ORDER BY t1.id) AS SomeId,
       getutcdate() AS SomeDate,
       newid() AS SomeValue
INTO SomeTable
FROM sys.sysobjects t1
CROSS JOIN sys.sysobjects t2

SELECT COUNT(*) from SomeTable

SELECT * FROM SomeTable WHERE 1 = 0

WHILE @@rowcount > 0
BEGIN
  DELETE TOP (5000) FROM SomeTable
END

SELECT COUNT(*) from SomeTable
DROP TABLE SomeTable -- Added here as cleanup in case people run the example
SELECT TOP 20000 row_number() OVER(ORDER BY t1.id) AS SomeId,
       getutcdate() AS SomeDate,
       newid() AS SomeValue
INTO SomeTable
FROM sys.sysobjects t1
CROSS JOIN sys.sysobjects t2

SELECT COUNT(*) from SomeTable

PRINT 'Starting my update now....'

WHILE @@rowcount > 0
BEGIN
  DELETE TOP (5000) FROM SomeTable
END

SELECT COUNT(*) from SomeTable
DROP TABLE SomeTable -- Added here as cleanup in case people run the example
The count is 20000 before and after the loop; nothing got deleted. This is because a print statement will reset @@rowcount to 0, which you can see with this quick test
SELECT 1
UNION ALL
SELECT 2

SELECT @@rowcount as 'Rowcount'

PRINT '1'

SELECT @@rowcount as 'RowcountAfterPrint'
SELECT TOP 20000 row_number() OVER(ORDER BY t1.id) AS SomeId,
       getutcdate() AS SomeDate,
       newid() AS SomeValue
INTO SomeTable
FROM sys.sysobjects t1
CROSS JOIN sys.sysobjects t2

SELECT COUNT(*) from SomeTable

PRINT 'Starting my update now....'

DELETE TOP (5000) FROM SomeTable

WHILE @@rowcount > 0
BEGIN
  DELETE TOP (5000) FROM SomeTable
END

SELECT COUNT(*) from SomeTable
If you run the delete this way and there was something to delete, the while loop would be entered; if the table was empty then there would be no need to enter the while loop.

DROP TABLE SomeTable -- Added here as cleanup in case people run the example
Also keep in mind that it is not just PRINT that will reset @@rowcount back to 0.
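Another way to sidestep the problem, an alternative of my own rather than something from the scripts in our repository, is to capture @@ROWCOUNT into a local variable immediately after the DELETE; then a PRINT or any other statement inside the loop can no longer disturb the loop condition:

```sql
-- sketch: capture @@ROWCOUNT right away so later statements can't reset it
DECLARE @rc int = 1

WHILE @rc > 0
BEGIN
    DELETE TOP (5000) FROM SomeTable
    SET @rc = @@ROWCOUNT          -- must be the very next statement after the DELETE
    PRINT 'Deleted a batch...'    -- safe now, the count is already saved in @rc
END
```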
From Books On Line:
Statements such as USE, SET <option>, DEALLOCATE CURSOR, CLOSE CURSOR, BEGIN TRANSACTION, or COMMIT TRANSACTION reset the ROWCOUNT value to 0.
Wednesday, September 26, 2018
A little less hate for: String or binary data would be truncated in table
The error String or binary data would be truncated is one of the more annoying things in SQL Server. You would insert some data, get this error back, and then have to spend a good amount of time figuring out what caused it.
I even posted about this as part of T-SQL Tuesday #86: String or binary data would be truncated
I read the SQL Server 2019 CTP 2 whitepaper and on page 17 it has the following
Improve truncation message for ETL DW scenarios—the error message ID 8152 String or binary data would be truncated is familiar to many SQL Server developers and administrators who develop or maintain data movement workloads; the error is raised during data transfers between a source and a destination with different schemas when the source data is too large to fit into the destination data type. This error message can be time-consuming to troubleshoot because of its lack of specificity. SQL Server 2019 introduces a new, more specific error message for this scenario: ID 2628 String or binary data would be truncated in table '%.*ls', column '%.*ls'. Truncated value: '%.*ls'. The new error message provides more context for the problem, simplifying the troubleshooting process. So that it cannot break existing applications parsing message ID 8152, this new message ID 2628 is an opt-in replacement, which can be enabled with trace flag 460.
Oh really... they fixed this? Let's take a look
First I downloaded SQL Server 2019 CTP 2 and installed it. Then I created a database with a simple table, I also inserted some data that wouldn't fit
CREATE DATABASE truncatetest
GO

USE truncatetest
GO

CREATE TABLE TruncateMe(
  somevalue varchar(5),
  somevalue2 varchar(5),
  somedecimal decimal(5,2))
GO

INSERT TruncateMe
SELECT '333333','444444',5.3
I then received the following error message, so this is the same as in SQL Server 2017 and earlier, notice message id 8152
Msg 8152, Level 16, State 30, Line 10
String or binary data would be truncated.
The statement has been terminated.
To enable the new functionality, we need to enable trace flag 460, you can do that by running the DBCC TRACEON command like this
DBCC TRACEON(460)
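Note that DBCC TRACEON(460) by itself enables the flag only for the current session; passing -1 as a second argument turns it on globally for all connections, and you can confirm the state with DBCC TRACESTATUS:

```sql
-- enable trace flag 460 for all sessions, not just the current one
DBCC TRACEON(460, -1)

-- check whether the trace flag is enabled, and whether it is global
DBCC TRACESTATUS(460)
```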
Now let's try that insert statement again
INSERT TruncateMe
SELECT '333333','444444',5.3

And there we go, you get the table name, the column name as well as the value; notice that the message id changed from 8152 to 2628
Msg 2628, Level 16, State 1, Line 20
String or binary data would be truncated in table 'truncatetest.dbo.TruncateMe', column 'somevalue'. Truncated value: '33333'.
The statement has been terminated.
So it looks like it only returns the first value that generates the error; let's change the first value to fit into the column and execute the insert statement again
INSERT TruncateMe
SELECT '3','444444',5.3

Now you will see that the error is for the somevalue2 column
Msg 2628, Level 16, State 1, Line 27
String or binary data would be truncated in table 'truncatetest.dbo.TruncateMe', column 'somevalue2'. Truncated value: '44444'.
The statement has been terminated.
What will happen if you have more than one row that fails?
insert TruncateMe
select '333333','444444',5.3
union all
select '3','444444',5.3

Here is the error
Msg 2628, Level 16, State 1, Line 37
String or binary data would be truncated in table 'truncatetest.dbo.TruncateMe', column 'somevalue'. Truncated value: '33333'.
The statement has been terminated.
What about Table Variables, will you also get an error with the column and value like for real tables?
declare @table table (
  somevalue varchar(5),
  somevalue2 varchar(5),
  somedecimal decimal(5,2))

insert @table
select '333333','444444',5.3
Here is the error
Msg 2628, Level 16, State 1, Line 53
String or binary data would be truncated in table 'tempdb.dbo.#A6AD698B', column 'somevalue'. Truncated value: '33333'.
As you can see you also get the error; the table name is the internal table name for the table variable tied to your session
What about Table Valued Parameters?
CREATE TYPE TestTypeTrunc AS TABLE
  (somevalue varchar(5), somevalue2 varchar(5));
GO

DECLARE @table TestTypeTrunc

INSERT @table
SELECT '555555555','444444'
Here is the error for that
Msg 2628, Level 16, State 1, Line 58
String or binary data would be truncated in table 'tempdb.dbo.#AC6642E1', column 'somevalue'. Truncated value: '55555'.
Pretty much the same message as for the table variable
I do applaud Microsoft for finally fixing this.
Here it is also in beautiful technicolor