I have to provide reports on file system usage.
I'm collecting statistics on file server usage down to the individual file level so we can see who is using which files and folders, how much storage they're using, how many files they have, and when those files were created and last used.
To do this I have two PowerShell scripts.
The first reads through the file system, captures the attributes I want, and saves them to a file.
dir -rec G:\ | Select LastWriteTime, Directory, Name, Extension, Length, @{Name="Owner";Expression={(Get-Acl $_.FullName).Owner}} | Export-Csv FileInfo.csv -NoTypeInformation
The second script reads the CSV file and inserts the data into a table.
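For illustration, the loader is along these lines (not my exact script; the connection string and the dbo.FileInfo staging table are placeholders, and the staging table is assumed to be all nvarchar since the real parsing happens later in SQL):
# Sketch only: read the CSV produced above and insert it row by row
# into a pre-created all-nvarchar staging table (names are placeholders).
$conn = New-Object System.Data.SqlClient.SqlConnection('Server=MySqlServer;Database=FileStats;Integrated Security=True')
$conn.Open()
$cmd = $conn.CreateCommand()
$cmd.CommandText = 'INSERT INTO dbo.FileInfo (LastWriteTime, Directory, Name, Extension, Length, Owner)
                    VALUES (@LastWriteTime, @Directory, @Name, @Extension, @Length, @Owner)'
$columns = 'LastWriteTime','Directory','Name','Extension','Length','Owner'
foreach ($c in $columns) { [void]$cmd.Parameters.Add("@$c", [System.Data.SqlDbType]::NVarChar, 4000) }
Import-Csv .\FileInfo.csv | ForEach-Object {
    foreach ($c in $columns) { $cmd.Parameters["@$c"].Value = [string]$_.$c }
    [void]$cmd.ExecuteNonQuery()
}
$conn.Close()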
Once the data is in SQL I can parse the text and split it into various columns, and then produce a variety of reports and analyse the data in different ways. My approach works, but it's cumbersome.
Is there a better way to collect NTFS information and save it into SQL Server? What are the alternatives? SSIS?
Edit: Could this all be combined to operate together in a single process?
- Take a look at this link, in which the MSFT guys talk about loading disk space stats into SQL Server using just PS: blogs.technet.com/b/heyscriptingguy/archive/2010/11/01/… – beeks (Jan 6, 2016 at 22:44)
2 Answers
SSIS is well equipped to handle CSV files and load them into SQL Server.
You can have a very simple package using the Flat File Source.
The dialogue and setup is a familiar Windows "wizard"-like process, and most of it is automated. What you need to pay attention to is whether it has correctly guessed the column lengths and data types in your file. You can either adjust the settings in the connection manager or change the data types later with SSIS tasks. Note that if you have, say, 10,000 rows of integers and then start getting characters, the Flat File Source may easily assign an integer data type to that column and then fail when it encounters the characters. So with large files that may not be well structured, you have to pay more attention to these settings. The Suggest Types... button allows you to increase the number of inspected rows, but I have found that even this can still recommend the wrong data types.
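One cheap way to double-check those guesses before the package runs is to measure the file yourself. The PowerShell sketch below (file name taken from the question; it loads the whole CSV into memory, so it is only meant as a spot check) reports the widest value seen in each column:
# Report the maximum observed width of each column in the CSV so the
# Flat File Connection Manager's column widths and types can be set with confidence.
$rows = Import-Csv .\FileInfo.csv
foreach ($col in $rows[0].PSObject.Properties.Name) {
    $max = ($rows | ForEach-Object { ([string]$_.$col).Length } | Measure-Object -Maximum).Maximum
    [pscustomobject]@{ Column = $col; MaxLength = $max }
}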
SSIS is a huge tool, and you can perform data clean-up tasks or even split data from the single CSV into different tables. If you are loading different tables, use tasks like Multicast or Conditional Split. You may also find that Data Conversion and Derived Column help you efficiently produce the data you need as it moves through your package.
I wouldn't do much more than clean, split, modify, and load the data into SQL Server with SSIS, though. SQL Server is highly optimized to produce aggregates, sorts, etc., while SSIS is less capable for such tasks. Tasks like Aggregate are blocking transforms, which essentially means they can stall your SSIS package and consume a lot of memory.
As an example the below SSIS dataflow performs the following tasks:
- Reads a CSV file
- Creates derived columns which are just trimmed versions of the originals
- Performs a look-up to see if the record already exists in the destination
- If the record was not found then it is inserted in the destination
- Thanks for the reply, good post. I'm familiar with SSIS and realise I could use it for the transform/load, but it's still several steps, and I'm really trying to reduce the number of steps. Could SSIS replace or incorporate the first PowerShell part? – Sir Swears-a-lot (Jan 4, 2016 at 23:51)
- You should be able to by using the Execute Process Task in the control flow, then call PowerShell with the appropriate arguments. Something like: PowerShell.exe -file "C:\Scripts\NTFSUsage.ps1" – Dave (Jan 5, 2016 at 0:00)
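To flesh that comment out a little, here is a rough sketch of what such an NTFSUsage.ps1 could look like if it also skips the CSV and writes straight to SQL Server, so collection and load happen in one step (the connection string and the dbo.FileInfo table are hypothetical, and the table is assumed to already exist with compatible column types and the same column order):
# Sketch only (PowerShell 3.0+): walk the share, collect the attributes in a DataTable,
# then bulk-load the whole thing into a pre-created dbo.FileInfo table.
$table = New-Object System.Data.DataTable
[void]$table.Columns.Add('LastWriteTime', [datetime])
[void]$table.Columns.Add('Directory', [string])
[void]$table.Columns.Add('Name', [string])
[void]$table.Columns.Add('Extension', [string])
[void]$table.Columns.Add('Length', [long])
[void]$table.Columns.Add('Owner', [string])
Get-ChildItem -Recurse -File G:\ | ForEach-Object {
    [void]$table.Rows.Add($_.LastWriteTime, $_.DirectoryName, $_.Name, $_.Extension, $_.Length, (Get-Acl $_.FullName).Owner)
}
# SqlBulkCopy maps columns by ordinal by default, so the DataTable column order
# must match the destination table (or add explicit ColumnMappings).
$bulk = New-Object System.Data.SqlClient.SqlBulkCopy('Server=MySqlServer;Database=FileStats;Integrated Security=True')
$bulk.DestinationTableName = 'dbo.FileInfo'
$bulk.WriteToServer($table)
$bulk.Close()
For a very large share you would flush the DataTable to the server in batches rather than buffering every file in memory, but the shape of the solution stays the same.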
Another option that side-steps the external call, the CSV file, SSIS, etc. is to use SQLCLR. You would use either the DirectoryInfo.EnumerateFiles method (newer; streams results as they are found, before the full list is built) or the DirectoryInfo.GetFiles method (older; returns only after the whole array has been built). The EnumerateFiles method is new as of .NET 4.0, hence it is only available if you are using SQL Server 2012 or newer.
Those methods return a collection of FileInfo objects that will get you most of the properties directly. In order to get the Owner, you need to do a little more work, similar to what you are doing in your PowerShell script: you would use the FileInfo.GetAccessControl method, then call the GetOwner method. GetOwner takes a parameter of type Type (the identity type you want back, such as typeof(NTAccount)), and the MSDN documentation does not have any examples, but according to this S.O. answer, Find out File Owner/Creator in C#, it should just be:
// Assumes "using System.IO;" at the top of the class; FileProperties is the struct/class described below.
FileProperties _Obj = new FileProperties();
DirectoryInfo _Directory = new DirectoryInfo(@"G:\");
// EnumerateFiles streams results as it walks the tree; SearchOption.AllDirectories makes it recursive.
foreach (FileInfo _File in _Directory.EnumerateFiles("*", SearchOption.AllDirectories))
{
    _Obj.Size = _File.Length;
    _Obj.Owner = _File.GetAccessControl().GetOwner(typeof(System.Security.Principal.NTAccount)).ToString();
    // ... assign the remaining properties (Name, Extension, LastWriteTime, etc.) the same way ...
    yield return _Obj;
}
The code above assumes you have a struct or class named FileProperties that will be used to pass back the rows in a streaming TVF.
Using this method, the values returned can (and should) be strongly-typed. Hence, you can populate your table as follows:
INSERT INTO dbo.FileProperties (Name, Length, Path, Owner, ...)
SELECT Name, Length, Path, Owner, ...
FROM dbo.GetFileProperties();
And GetFileProperties can even be updated to accept an input parameter for the starting directory :-).