Monday, February 20, 2017
The strange case of the missing indexes....
This past week I needed to run some queries on production to verify there were indexes added on a table. There were several scripts that needed to be run and the last one was the addition of the indexes. The query given to me was something like the following
SELECT * FROM LinkedServerName.DatabaseName.sys.indexes WHERE object_id =(OBJECT_ID('TableName'))
So I ran the query..nothing. Aha maybe they are still running the scripts before that, setting up replication, snapshotting the table etc etc. I will check again in a bit I thought.
Then I checked 2 more times nothing. So I then contacted them and asked if they ran everything..yes they said and no errors. Mmm, okay..what could be wrong. The only way I can access this server is through a linked server call. I decided to look at the query again.... bingo.. I see what the problem is.....
Let's take a look. First create the following database with 2 tables and 2 indexes on the Test table
CREATE DATABASE Test1 GO USE Test1 GO CREATE TABLE Foo(ID int) GO CREATE TABLE Test(ID int, ID2 int, ID3 int) GO CREATE INDEX ix_Test on Test(ID) CREATE INDEX ix_Test2 on Test(ID2)
Now if you run the following query against the sys.indexes object catalog view
SELECT FROM sys.indexes WHERE object_id = OBJECT_ID('Test')
You will get back the following information
object_id name index_id type type_desc
901578250 NULL 0 0 HEAP
901578250 ix_Test 2 2 NONCLUSTERED
901578250 ix_Test2 3 2 NONCLUSTERED
As you can see we have information about our two indexes and the table itself, since there is no clustered index on this table, we get a row back that shows that the table is a heap
Now let us connect to another database on the same server, in this case we will use the tempdb database
Create a table with the same name
Create a table with the same name
USE tempdb GO CREATE TABLE Test(foo int) GO
Now run the same query again but point to the Test1 database
SELECT * FROM Test1.sys.indexes WHERE object_id = OBJECT_ID('Test')
And you get nothing back. What does the OBJECT_ID() function return?
SELECT OBJECT_ID('Test')
For me it returns the number 965578478. The problem is that that object_id is the id for that object in the tempdb database, not for the one in the Test1 database
So what can you do? There are two ways to do this
One way is to join sys.indexes with the sys.tables object catalog view
Here is what the query looks like
SELECT i.* FROM Test1.sys.indexes i JOIN Test1.sys.tables t ON i.object_id = t.object_id WHERE t.name = 'Test'
Running the query like that displays the correct information
object_id name index_id type type_desc
901578250 NULL 0 0 HEAP
901578250 ix_Test 2 2 NONCLUSTERED
901578250 ix_Test2 3 2 NONCLUSTERED
If you want to use this against a linked server, use 4 part notation, just prefix Test1.sys with the linked server name, I used LinkedServerName as an example
SELECT i.* FROM LinkedServerName.Test1.sys.indexes i JOIN LinkedServerName.Test1.sys.tables t ON i.object_id = t.object_idWHERE t.name = 'Test'
That query with the join between sys.indexes and sys.tables can be used for a linked server as well as a different database on the same instance, if you just have to go to another database like we have done before, you can simplify it like this
SELECT * FROM Test1.sys.indexes WHERE object_id = OBJECT_ID('Test1.dbo.Test')
Basically, you pass in the database name, schema name and object name to the OBJECT_ID() function
That's all for today..
Importing The Metropolitan Museum of Art's Open Access Initiative data into SQL Server
The Met just made some data available for download.
The Metropolitan Museum of Art presents over 5,000 years of art from around the world for everyone to experience and enjoy. The Museum lives in three iconic sites in New York City—The Met Fifth Avenue, The Met Breuer, and The Met Cloisters. Millions of people also take part in The Met experience online.
Since it was founded in 1870, The Met has always aspired to be more than a treasury of rare and beautiful objects. Every day, art comes alive in the Museum's galleries and through its exhibitions and events, revealing both new ideas and unexpected connections across time and across cultures. The Metropolitan Museum of Art provides select datasets of information on more than 420,000 artworks in its Collection for unrestricted commercial and noncommercial use.
I decided to take a look. The first thing we will do is download the CSV file from their GitHub repository. You can find that here: https://github.com/metmuseum/openaccess
If you go to that link, you will see the following, grab the MetObjects.csv file
Be warned, this file is 214 MB.
One word of caution... if you try to import this file with a regular BULK INSERT command....good luck...let me know how many tries you need. No problem, I will just use a format file... and now you have 2 problems.. The real issue is that the file is somewhat problematic, there are quotes where there shouldn't be quotes, there are no quotes where there should be quotes. So what do you do?
Are you on SQL Server vNext 1.1 or higher? If you are, good news, you can use BULK INSERT and csv format, this is new in vNext 1.1
Ok let's get started, first create the following table
Now it is time to import the file
Just to let you know, you will get a couple of errors, however all data except for these 4 rows will be imported
Msg 4864, Level 16, State 1, Line 62
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 213266, column 25 (Medium).
Msg 4864, Level 16, State 1, Line 62
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 217661, column 25 (Medium).
Msg 4863, Level 16, State 1, Line 62
Bulk load data conversion error (truncation) for row 226222, column 16 (ArtistDisplayBio).
Msg 4863, Level 16, State 1, Line 62
Bulk load data conversion error (truncation) for row 258639, column 16 (ArtistDisplayBio).
Let's do a quick count
The file that I imported resulted in 446026 rows. I downloaded this file on 2/10/2017, your file might have more data if they updated the file after the date I downloaded it
Now that we have the data we need, we can run some queries.
Let's see what kind of objects are in the collection, we will grab the top 15 objects
Here is what the results looks like
I don't know why..but I somehow thought painting would be the most occuring object..but what do I know
You can also treat this table as you own museum catalog, let's say you want to look at van Gogh's Madame Roulin and Her Baby painting? No problem, run this query
Scroll to the LinkResource column, you will see the following: http://www.metmuseum.org/art/collection/search/459123
Clicking on that link will give you the following
Now you can download this image and do something with it, it is after all in the public domain
Here are a couple of more queries you can play around with
I am not a big art person, but if you are and you have some interesting queries that you ran against this data please let me know in the comments
Also if you manage to get this file to import with plain old BCP or BULK INSERT with or without a format file...let me know the magic you used.... :-)
The Metropolitan Museum of Art presents over 5,000 years of art from around the world for everyone to experience and enjoy. The Museum lives in three iconic sites in New York City—The Met Fifth Avenue, The Met Breuer, and The Met Cloisters. Millions of people also take part in The Met experience online.
Since it was founded in 1870, The Met has always aspired to be more than a treasury of rare and beautiful objects. Every day, art comes alive in the Museum's galleries and through its exhibitions and events, revealing both new ideas and unexpected connections across time and across cultures. The Metropolitan Museum of Art provides select datasets of information on more than 420,000 artworks in its Collection for unrestricted commercial and noncommercial use.
I decided to take a look. The first thing we will do is download the CSV file from their GitHub repository. You can find that here: https://github.com/metmuseum/openaccess
If you go to that link, you will see the following, grab the MetObjects.csv file
Be warned, this file is 214 MB.
One word of caution... if you try to import this file with a regular BULK INSERT command....good luck...let me know how many tries you need. No problem, I will just use a format file... and now you have 2 problems.. The real issue is that the file is somewhat problematic, there are quotes where there shouldn't be quotes, there are no quotes where there should be quotes. So what do you do?
Are you on SQL Server vNext 1.1 or higher? If you are, good news, you can use BULK INSERT and csv format, this is new in vNext 1.1
Ok let's get started, first create the following table
CREATE TABLE MetOpenData( ObjectNumber nvarchar(4000), IsHighlight nvarchar(4000), IsPublicDomain nvarchar(4000), ObjectID nvarchar(4000), Department nvarchar(4000), ObjectName nvarchar(4000), Title nvarchar(4000), Culture nvarchar(4000), Period nvarchar(4000), Dynasty nvarchar(4000), Reign nvarchar(4000), Portfolio nvarchar(4000), ArtistRole nvarchar(4000), ArtistPrefix nvarchar(4000), ArtistDisplayName nvarchar(4000), ArtistDisplayBio nvarchar(4000), ArtistSuffix nvarchar(4000), ArtistAlphaSort nvarchar(4000), ArtistNationality nvarchar(4000), ArtistBeginDate nvarchar(4000), ArtistEndDate nvarchar(4000), ObjectDate nvarchar(4000), ObjectBeginDate nvarchar(4000), ObjectEndDate nvarchar(4000), Medium nvarchar(4000), Dimensions nvarchar(4000), CreditLine nvarchar(4000), GeographyType nvarchar(4000), City nvarchar(4000), State nvarchar(4000), County nvarchar(4000), Country nvarchar(4000), Region nvarchar(4000), Subregion nvarchar(4000), Locale nvarchar(4000), Locus nvarchar(4000), Excavation nvarchar(4000), River nvarchar(4000), Classification nvarchar(4000), RightsandReproduction nvarchar(4000), LinkResource nvarchar(4000), MetadataDate nvarchar(4000), Repository nvarchar(4000)) GO
Now it is time to import the file
Just to let you know, you will get a couple of errors, however all data except for these 4 rows will be imported
Msg 4864, Level 16, State 1, Line 62
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 213266, column 25 (Medium).
Msg 4864, Level 16, State 1, Line 62
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 217661, column 25 (Medium).
Msg 4863, Level 16, State 1, Line 62
Bulk load data conversion error (truncation) for row 226222, column 16 (ArtistDisplayBio).
Msg 4863, Level 16, State 1, Line 62
Bulk load data conversion error (truncation) for row 258639, column 16 (ArtistDisplayBio).
Here is what the BULK INSERT with FORMAT= 'CSV' command looks like. Change the filepath to point to the location where you have the MetObjects.csv file saved
BULK INSERT MetOpenData FROM 'c:\Data\MetObjects.csv' WITH (FORMAT = 'CSV');
Let's do a quick count
SELECT COUNT(*) FROM MetOpenData
The file that I imported resulted in 446026 rows. I downloaded this file on 2/10/2017, your file might have more data if they updated the file after the date I downloaded it
Now that we have the data we need, we can run some queries.
Let's see what kind of objects are in the collection, we will grab the top 15 objects
SELECT TOP 15 ObjectName,count(*) FROM MetOpenData GROUP BY ObjectName ORDER BY 2 DESC
Here is what the results looks like
ObjectName Count
Print 88582
Photograph 28071
Drawing 24905
Book 13360
Fragment 9405
Piece 8638
Negative 6258
Painting 5862
Baseball card, print 4985
Bowl 3534
Figure 3081
Baseball card 3046
Polaroid 2706
Vase 2698
Dress 2473
I don't know why..but I somehow thought painting would be the most occuring object..but what do I know
You can also treat this table as you own museum catalog, let's say you want to look at van Gogh's Madame Roulin and Her Baby painting? No problem, run this query
SELECT * FROM MetOpenData WHERE ArtistDisplayName like'%van%gogh%' and title = 'Madame Roulin and Her Baby'
Scroll to the LinkResource column, you will see the following: http://www.metmuseum.org/art/collection/search/459123
Clicking on that link will give you the following
Now you can download this image and do something with it, it is after all in the public domain
Here are a couple of more queries you can play around with
SELECT city, count(*) FROM MetOpenData GROUP BY city ORDER BY 2 DESC SELECT Dynasty, count(*) FROM MetOpenData GROUP BY Dynasty ORDER BY 2 DESC SELECT Period, count(*) FROM MetOpenData GROUP BY Period ORDER BY 2 DESC SELECT ArtistNationality, count(*) FROM MetOpenData GROUP BY ArtistNationality ORDER BY 2 DESC SELECT * FROM MetOpenData WHERE ArtistDisplayName like'%pablo picasso%' SELECT * FROM MetOpenData WHERE ArtistDisplayName like'%rembrandt van rijn%' SELECT * FROM MetOpenData WHERE ObjectName like'%Postage stamps%'
I am not a big art person, but if you are and you have some interesting queries that you ran against this data please let me know in the comments
Also if you manage to get this file to import with plain old BCP or BULK INSERT with or without a format file...let me know the magic you used.... :-)
Subscribe to:
Posts (Atom)