I was messing around with some log files for the podcast and exported a CSV. There's lots of ways for us to get stats, but I was just poking around.
Given a CSV format like this:
"File","Hits","Bandwidth"
"/hanselminutes_0026_lo.wma","78173","163625808"
"/hanselminutes_0076_robert_pickering.wma","24626","-1789110063"
"/hanselminutes_0077.wma","17204","1959963618"
"/hanselminutes_0076_robert_pickering.mp3","15796","-55874279"
"/hanselminutes_0078.wma","14832","-1241370004"
"/hanselminutes_0075.mp3","13685","-1840937989"
"/hanselminutes_0075.wma","12129","1276597408"
"/hanselminutes_0078.mp3","11058","-1186433073"
And noticing that there are different file extensions, but one logical show...for example there's an MP3 of Show 76 and WMA of that same show I wanted to find out how many downloads per show.
I fired up PowerShell. First I can bring in the CSV file and notice that it's parsed into a PowerShell object list nicely:
PS C:\> import-csv File.csv | Select File,Hits
File Hits
---- ----
/hanselminutes_0026_lo.wma 78173
/hanselminutes_0076_robert_pickering.wma 24626
/hanselminutes_0077.wma 17204
/hanselminutes_0076_robert_pickering.mp3 15796
/hanselminutes_0078.wma 14832
/hanselminutes_0075.mp3 13685
/hanselminutes_0075.wma 12129
/hanselminutes_0078.mp3 11058
Notice that there are headings in this output? PowerShell's Import-CSV is smart enough to infer the headers and positions and split things up. After the first | pipe I don't ever have to think CSV again.
I'm going to need a new column, though, called Show with the Show number it in. I'll probably glean it from the Filename. Just to get started, I'll make a new "calculated column."
PS C:\> import-csv file.csv | select File, Hits, @{Name="Show";Expression={ $_.File } }
File Hits Show
---- ---- ----
/hanselminutes_0026_lo.wma 78173 /hanselminutes_0026_lo.wma
/hanselminutes_0076_robert_pickering... 24626 /hanselminutes_0076_robert_pickering...
/hanselminutes_0077.wma 17204 /hanselminutes_0077.wma
/hanselminutes_0076_robert_pickering... 15796 /hanselminutes_0076_robert_pickering...
/hanselminutes_0078.wma 14832 /hanselminutes_0078.wma
/hanselminutes_0075.mp3 13685 /hanselminutes_0075.mp3
/hanselminutes_0075.wma 12129 /hanselminutes_0075.wma
/hanselminutes_0078.mp3 11058 /hanselminutes_0078.mp3
/hanselminutes_0077.mp3 11003 /hanselminutes_0077.mp3
/hanselminutes_0074.mp3 6494 /hanselminutes_0074.mp3
Except you can see that while I have a new column called "Show" it's just got the File duplicated. Notice the format for making a new Column:
select File, Hits, @{Name="Show";Expression={ $_.File } }.
Inside that { } scriptblock I need to do my work to get a show number.
It's a classic joke, so I'll say it again: "So you've got a problem and you've decided to use Regular Expressions to solve it. Now you've got two problems."
The show format has changed a bit over the years and some have leading zeros, some have names at the end. It's organic. So, in front of the expression we add a Regular Expression. You can make them easily in PowerShell by putting [regex] in front of a string. The ; is just a separator between statements. The first statement is just the assignment of a RegEx to a variable called $re. Later in my script block I apply that RegEx to the $_.File (remember that $_ is the equivalent of "this" in PowerShell, so $_.File is this.File).
The @{Name="Value";Name="Value"} format is how to express a Hashtable to PowerShell.
PS C:\> $re =[regex]"\d{2}(?=[_.])"; import-csv file.csv |
select File, Hits, @{Name="Show";Expression={$re.matches($_.File)[0] } }
File Hits Show
---- ---- ----
/hanselminutes_0026_lo.wma 78173 26
/hanselminutes_0076_robert_pickering... 24626 76
/hanselminutes_0077.wma 17204 77
/hanselminutes_0076_robert_pickering... 15796 76
/hanselminutes_0078.wma 14832 78
/hanselminutes_0075.mp3 13685 75
/hanselminutes_0075.wma 12129 75
/hanselminutes_0078.mp3 11058 78
/hanselminutes_0077.mp3 11003 77
/hanselminutes_0074.mp3 6494 74
The RegEx "\d{2}(?=[_.])" says "find me the first two decimals to the left of either a "_" or a "." It's not foolproof, at least not until show 100, but this is a Poor Man's Solution.
Next, I'll sort by Show descending then group all the like shows together.
PS C:\> $re =[regex]"\d{2}(?=[_.])"; import-csv file.csv |
select File, Hits, @{Name="Show";Expression={$re.matches($_.File)[0] } } |
sort Show -desc | group Show
Count Name Group
----- ---- -----
9 79 {@{File=/Hanselminutes_0079.wma; Hits=27; Show=79},...
12 78 {@{File=/hanselminutes_0078.m4b.torrent; Hits=18; Show=78},...
12 77 {@{File=/hanselminutes_0077.wma.torrent; Hits=52; Show=77},...
18 76 {@{File=/hanselminutes_76.mp3; Hits=1; Show=76}, ...
11 75 {@{File=/hanselminutes_0075_lo.wma; Hits=468; Show=75},...
In this listing I can see via Count that there were 9 different formats of Show 79 downloaded in this time period. However, the good data is trapped in the Group column. All my previous line items are grouped in there. I need to sum up all the Hits for all the downloads of the the 9 different formats of Show 76...and all the shows.
PS C:\> $re =[regex]"\d{2}(?=[_.])"; import-csv file.csv |
select File, Hits, @{Name="Show";Expression={$re.matches($_.File)[0] } } |
sort Show -desc | group Show |
select Name,{($_.Group | Measure-Object -Sum Hits).Sum }
Name ($_.Group | Measure-Object -Sum Hits).Sum
---- ------------------------------------------
79 9205
78 27575
77 29807
76 42798
75 27174
74 13060
73 10532
72 10145
71 9826
70 8745
69 8065
68 8132
67 7024
66 8535
65 13041
Now, I select out Name and then take each item's Group (remember $_ is this, so $_.Group means "the current item's Group." Each item will go through Measure-Object which has options like -sum and -average, etc. We grab them up and Sum them.
But the Column Header looks silly. I want a pretty name.
PS C:\> $re =[regex]"\d{2}(?=[_.])"; import-csv file.csv |
select File, Hits, @{Name="Show";Expression={$re.matches($_.File)[0] } } |
sort Show -desc | group Show |
select Name,
@{Name="Hits";Expression={($_.Group | Measure-Object -Sum Hits).Sum }}
I just make a Hashtable with the Name "Hits" and put the scriptblock in a key called "Expression." Add some more sorting and:
Name Hits
---- ----
26 78673
76 42798
77 29807
78 27575
75 27174
74 13060
65 13041
73 10532
Now I can see that show 76 got 42798 downloads this time period and the classic show 26 did nicely with 78673.
It this representative of the way you SHOULD do things in PowerShell? No, but it's a transcript of a real-life problem being together at the command line. It took less time than it would take to write a C# program and it's easily modified ad-hoc. Here's what Lee Holmes said he would have done, accomplishing the same thing not as a one-liner but as a proper PowerShell function:
function Get-ShowHits
{
$regex = '/hanselminutes_(\d+).*'
$shows = Import-CSv File.csv | Select File,Hits | Group { $_.File -replace $regex,'$1' }
foreach($show in $shows)
{
$showOutput = New-Object System.Management.Automation.PsObject
$showOutput | Add-Member NoteProperty Name $show.Name
$showOutput | Add-Member NoteProperty Hits ($show.Group | Measure-Object -Sum Hits).Sum
$showOutput
}
}
Get-ShowHits | Sort -Desc Hits
Both are great examples of why PowerShell rocks my world. When are you going to try it? I swear, everyone has "been meaning to try PowerShell but I haven't had the time." Just do it.
Hosting By