The Get-ExifDateTaken PowerShell Script Cmdlet

(Part 2 of 3) (Part 1) (Part 3)

{Updated: 08/2013}

The last blog post described how to update a whole bunch of photos using two script cmdlets called Get-ExifDateTaken and Update-ExifDateTaken.   This post describes the first of these script cmdlets in more detail.

There are several Windows APIs (WIA, GDI, etc.) that can read Exif values from image files.  I tend to avoid using COM if there’s a .Net alternative, so I used the [System.Drawing.Imaging.Metafile] class and its associated GetPropertyItem() method to read the Exif data I needed.

The script needs to handle pipeline input, so the first thing to do is to get the script Param() statement coded correctly:

[CmdletBinding()]
Param (
    [Parameter(Mandatory=$True,
        ValueFromPipeline=$True,
        ValueFromPipelineByPropertyName=$True)
    ]
    [Alias('FullName', 'FileName')]
    $Path
)

This allows image file names to be passed in as normal parameters or to be read from the pipeline using various aliases if needed.
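As a minimal sketch of what those attributes buy you, here is a throwaway function with the same Param() block. The name Show-BoundPath and the file names are hypothetical, and $Path is typed as [string] here (an assumption; the original leaves it untyped) so that binding by property name is easy to see:

```powershell
# Hypothetical demo function; only the Param() block mirrors the real script.
function Show-BoundPath {
    [CmdletBinding()]
    Param (
        [Parameter(Mandatory=$True,
            ValueFromPipeline=$True,
            ValueFromPipelineByPropertyName=$True)]
        [Alias('FullName', 'FileName')]
        [string]$Path    # typed for the demo; untyped in the original
    )
    Process { "Bound: $Path" }
}

# 1. As a normal named parameter
$ByName = Show-BoundPath -Path 'a.jpg'

# 2. From the pipeline by value
$ByValue = 'b.jpg' | Show-BoundPath

# 3. From the pipeline by property name, via the FullName alias
$ByProperty = New-Object PSObject -Property @{ FullName = 'c.jpg' } | Show-BoundPath

$ByName, $ByValue, $ByProperty
```

The third call is the one that matters for Get-ChildItem output, where each object carries a FullName property.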

The script cmdlet’s Process{} block needs to accept multiple file names, arrays of names, wildcards and so on.  This is all handled with a call to the Resolve-Path cmdlet and we iterate over the results in a foreach:

Process
{
    # Cater for arrays of filenames and wild-cards by using Resolve-Path
    Write-Verbose "Processing input item '$Path'"

    $PathItems=Resolve-Path $Path -ErrorAction SilentlyContinue -ErrorVariable ResolveError
    If ($ResolveError) {
        Write-Warning "Bad path '$Path' ($($ResolveError[0].CategoryInfo.Category))"
    }

    Foreach ($PathItem in $PathItems) {
        # Read the current file and extract the Exif DateTaken property
        # (……SNIP…)
    } # End Foreach Path
} # End Process Block
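To see how Resolve-Path behaves with wildcards and bad paths, here is a small self-contained sketch. It creates a scratch folder under the temp directory (the folder and file names are made up for the demo) and then resolves one good pattern and one missing file:

```powershell
# Scratch folder with two empty files (hypothetical names, demo only)
$Dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $Dir | Out-Null
'one.jpg', 'two.jpg' | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $Dir $_) | Out-Null
}

# A wildcard expands to one PathInfo object per matching file
$Good = Resolve-Path (Join-Path $Dir '*.jpg') -ErrorAction SilentlyContinue -ErrorVariable ResolveError
$GoodCount = @($Good).Count
$GoodHadError = [bool]$ResolveError

# A missing literal path yields no output; the error lands in $ResolveError
$Bad = Resolve-Path (Join-Path $Dir 'missing.jpg') -ErrorAction SilentlyContinue -ErrorVariable ResolveError
$BadCategory = $ResolveError[0].CategoryInfo.Category

Remove-Item $Dir -Recurse
"Resolved $GoodCount files; bad path category: $BadCategory"
```

Note that a wildcard that matches nothing is not an error, so the warning in the Process block only fires for literal paths that don't exist.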

Within the loop we open each image file into a FileStream (this is quicker than using the $Img.FromFile(…) method) and get the Exif DateTaken value by using GetPropertyItem(‘36867’) – Exif property 36867 is DateTaken; just love those magic numbers…

The value is converted to a [DateTime] and passed down the pipeline attached to the PathInfo object that we got from Resolve-Path.  Passing rich objects in this way (rather than just outputting the DateTaken value on its own, for example) ensures that stages further down the pipeline have access to all the information they might need.

Note that in the code snippet here the error checking code has been removed for clarity – see the full listing in the last post for the complete source:

$ImageFile=$PathItem.Path
$FileStream=New-Object System.IO.FileStream($ImageFile,
    [System.IO.FileMode]::Open,
    [System.IO.FileAccess]::Read,
    [System.IO.FileShare]::Read,
    1024,     # Buffer size
    [System.IO.FileOptions]::SequentialScan
)
$Img=[System.Drawing.Imaging.Metafile]::FromStream($FileStream)
$ExifDT=$Img.GetPropertyItem('36867')

# Convert the raw Exif data
$ExifDtString=[System.Text.Encoding]::ASCII.GetString($ExifDT.Value)

# Convert the result to a [DateTime]
# Note: This looks like a string, but it has a trailing zero (0x00)
# character that confuses ParseExact unless we include the zero
# in the ParseExact pattern….
$OldTime=[datetime]::ParseExact($ExifDtString,"yyyy:MM:dd HH:mm:ss`0",$Null)

$FileStream.Close(); $Img.Dispose()

Write-Verbose "Extracted EXIF information from $ImageFile"
Write-Verbose "Original Time is $($OldTime.ToString('F'))"

# Decorate the path object with the EXIF dates and pass it on…

$PathItem | Add-Member -MemberType NoteProperty -Name ExifDateTaken -Value $OldTime
Write-Output $PathItem

Note how easily PowerShell enables calls to .Net methods – a great feature of PowerShell.
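The conversion and decoration steps can be tried without an image file. The sketch below fakes the raw property value as ASCII bytes with a trailing zero (the date and the file name are made-up sample values), and it takes a slightly different route from the script above: it trims the zero off and parses with the invariant culture, on the assumption that the ‘:’ in a custom format string otherwise follows the current culture’s time separator:

```powershell
# Simulated Exif value: ASCII text plus a trailing 0x00 byte (sample data)
[byte[]]$Raw = [System.Text.Encoding]::ASCII.GetBytes('2011:07:10 14:27:21') + 0

# Decode, then strip the trailing zero instead of putting it in the pattern
$Text = [System.Text.Encoding]::ASCII.GetString($Raw).TrimEnd([char]0)

# Parse with the invariant culture so ':' is always a literal colon
$Invariant = [System.Globalization.CultureInfo]::InvariantCulture
$Taken = [datetime]::ParseExact($Text, 'yyyy:MM:dd HH:mm:ss', $Invariant)

# Stand-in for the PathInfo object that Resolve-Path would supply
$Item = New-Object PSObject -Property @{ Path = 'img001.jpg' }
$Item | Add-Member -MemberType NoteProperty -Name ExifDateTaken -Value $Taken

$Item.ExifDateTaken.ToString('yyyy-MM-dd HH:mm:ss', $Invariant)   # 2011-07-10 14:27:21
```

Either approach gets rid of the zero; trimming just keeps the format pattern free of control characters.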

The final part will look at updating Exif dates and will include the full code for the Get-ExifDateTaken and Update-ExifDateTaken cmdlets.

Reading and Changing Exif Photo Times with PowerShell

(Part 1 of 3) (Part 2) (Part 3)

{Updated: 08/2013}

Back in 2011 I went off with a bunch of like-minded folk on a long-distance cycle (see, for example, here).  We all had a great time, but with six separate cameras in use we came back with a whole load of photos that needed sorting out.

We wanted to combine all the photos from the trip into a single consolidated collection, but apart from renaming all the image files it was also going to be necessary to change the times (the Date Taken values) so the photos all appeared in the correct order when viewed.

Rather than trying to accurately synchronise the times on all of the cameras, we used a simple trick of simultaneously pointing them all at a scene and taking a common image; this would allow us to later correct the times of each image to at least the nearest second.  (See below for the actual photos!)

The obvious question is then how to actually change the times and rename the files.  Well, there are countless photo programs (e.g. Windows Live Photo Gallery, Picasa, etc.) that would do this, but for batch updates it’s a lot easier to use PowerShell.

Here’s how to rename the files:

$FormStr="LeJog 2011 {0:MM-dd HH.mm.ss dddd} ({1}).jpg"
gci *.jpg | Get-ExifDateTaken | Ren -New {$FormStr -f $_.ExifDateTaken,(Split-Path (Split-Path $_) -Leaf)}

There are a couple of things to notice here.  Firstly, the new file name is specified in a scriptblock; this returns a dynamically calculated file name based on the original PathInfo item that was passed to the Rename cmdlet (in the $_ variable) and the time the photo was taken (passed in $_.ExifDateTaken).

The PowerShell format operator (-f) combines the template string, in the $FormStr variable, with the values necessary to calculate the new name. In this case, the photos from each camera were held in a folder named for the camera’s owner (so my photos were all in ..\Chris\img001.jpg … etc); so ‘Split-Path (Split-Path $_) –Leaf’ just returns the parent folder names – in this case, ‘Chris’, ‘David’, ‘Mick’, ‘Mike’ or ‘Andy’ [name plugs].

The date-time part of the name just uses standard DateTime format strings, so the resulting renamed image might be something like:

“LeJog 2011 07-10 14.27.21 Sunday (Chris).jpg”

…the important point here is that you can create any name you want; you might not like my choice of filename (although it is at least sortable!) but flexibility is unlimited here.
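To try the naming logic on its own, here is a minimal sketch that needs no image files. The folder layout and the date are sample values chosen to match the example name above; the day name comes out in whatever language the current culture uses:

```powershell
$FormStr = 'LeJog 2011 {0:MM-dd HH.mm.ss dddd} ({1}).jpg'

# Sample date and a made-up relative path in the "owner folder" layout
$Taken = New-Object DateTime 2011, 7, 10, 14, 27, 21
$File  = Join-Path (Join-Path 'Photos' 'Chris') 'img001.jpg'

# Inner Split-Path drops the file name; outer -Leaf keeps the last folder
$Owner = Split-Path (Split-Path $File) -Leaf

# {0:...} formats the DateTime, {1} inserts the owner name
$NewName = $FormStr -f $Taken, $Owner
$NewName
```

On an English-language system this gives the same name as the example above.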

The other thing to notice is the Get-ExifDateTaken cmdlet.  The source for this, along with the companion Update-ExifDateTaken cmdlet, will be published in the next two blog entries (Part 2) (Part 3).

The example above shows how to rename a bunch of image files.  To actually change the Date Taken value needs a different approach.  Photos store meta-data such as the Date Taken value as Exif information which is actually combined with the image data within the image file, so the image file must be opened and saved to modify the Date Taken Exif value.  The Update-ExifDateTaken script cmdlet does this:

gci *.jpg | Update-ExifDateTaken -Offset '-0:07:10' -PassThru | ft Path, ExifDateTaken

Here, we specify the amount of time the Date Taken value on each image file should be changed by.  In the example the offset is negative, so any Date Taken values will be moved back to earlier times (presumably this particular camera’s clock was too fast).
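The arithmetic behind an offset like this is easy to check by hand. The sketch below assumes the offset string is treated as a [TimeSpan] and simply added to the original time (how Update-ExifDateTaken does it internally isn’t shown in this post); the date is a sample value:

```powershell
# '-0:07:10' reads as minus 0 hours, 7 minutes, 10 seconds
$Offset = [TimeSpan]'-0:07:10'

# Sample Date Taken value
$Old = New-Object DateTime 2011, 7, 10, 14, 27, 21

# Adding a negative TimeSpan moves the time earlier
$New = $Old.Add($Offset)

$Offset.TotalSeconds   # -430
$New.Hour, $New.Minute, $New.Second -join ':'   # 14:20:11
```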

Note that this technique can also be useful if you go across time zones and forget to change your camera’s clock…

Finally, here’s an example that shows how to modify the Exif Date Taken meta-data and rename the image file in the same command:

$FormStr="LeJog 2011 {0:MM-dd HH.mm.ss dddd} ({1}).jpg"
gci *.jpg | Update-ExifDateTaken -Offset '-0:07:10' -PassThru |
    Ren -New {$FormStr -f $_.ExifDateTaken,(Split-Path (Split-Path $_) -Leaf)}

The Get-ExifDateTaken and the Update-ExifDateTaken script cmdlets were written with input from James O’Neill’s session at the 2011 European PowerShell Deep Dive event in Frankfurt; James has written-up the session in a blog here: “Maximize the reuse of your PowerShell”.

The script cmdlets will be published in the follow-up blog entries: (Part 2) (Part 3)

Oh, and the image we used to sync the camera times?  Here’s one of our tireless support drivers, Andy, holding up the lunchtime sausage:

LeJog 2011 07-06 14.13.54 Wednesday (Andy) LeJog 2011 07-06 14.13.54 Wednesday (Chris) LeJog 2011 07-06 14.13.54 Wednesday (David)
LeJog 2011 07-06 14.13.54 Wednesday (Mick) (2) LeJog 2011 07-06 14.13.54 Wednesday (Mick) LeJog 2011 07-06 14.13.54 Wednesday (Mike)

Regex Toolkit, Prayer-Based Parsing, Bad Examples

There was a recent entry on the Scripting Guy blog showing how to use PowerShell to parse email message headers. I have a number of problems with the script itself in that article, but it was the regex that really caught my eye. You need to read the Scripting Guy article to understand the context, but here’s the regex:

‘Received: from([\s\S]*?)by([\s\S]*?)with([\s\S]*?);([(\s\S)*]{32,36})(?:\s\S*?)’

Unfortunately there are serious issues with the Regex. I explain why and present an alternative later in this post.

What looked immediately strange to me in the regex was the character set ‘[\s\S]’. This matches a single character that is either a whitespace character (‘\s’) or is not a whitespace character (‘\S’) – in other words it matches *any* single character (which is [almost] the same as the ‘.’ matching character; by default ‘.’ doesn’t match a newline).

It’s clear, too, that the regex will likely fail whenever the server names in the email headers contain the substrings ‘by’ or ‘with’ as there are no delimiters around these characters (it would be better/safer/more correct to test for white space around the delimiters using ‘\s+’ – which means ‘match one or more white space characters’; so ‘\s+by\s+’ and ‘\s+with\s+’)
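Here’s a cut-down sketch of that failure, using only the first part of each pattern and a made-up header whose sending host happens to contain ‘by’ (the host names are hypothetical):

```powershell
# Hypothetical header; 'derby' contains the substring 'by'
$Header = 'Received: from derby.example.com by mx.example.net with ESMTP'

# Front part of the original pattern: no delimiters around 'by'
$Loose = 'from([\s\S]*?)by([\s\S]*?)with'
$null = $Header -match $Loose
$LooseFrom = $Matches[1]      # ' der'  (stopped at the 'by' inside 'derby')

# Same idea with white space required around the keywords
$Strict = 'from\s+(\S+)\s+by\s+(\S+)\s+with'
$null = $Header -match $Strict
$StrictFrom = $Matches[1]     # 'derby.example.com'
$StrictBy   = $Matches[2]     # 'mx.example.net'

"Loose: '$LooseFrom'  Strict: '$StrictFrom' / '$StrictBy'"
```

The lazy ‘*?’ stops at the very first ‘by’ it can find, which is why the loose version captures only part of the host name.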

Looking further on I was struggling to see what this part of the regex was supposed to do: ‘([(\s\S)*]{32,36})’, so I broke it down … The surrounding parens in this case mean it captures something – taking that away leaves ‘[(\s\S)*]{32,36}’.

The {32,36} part says ‘match the preceding pattern between 32 and 36 times’; the pattern actually being matched in this case is then ‘[(\s\S)*]’.

Because this pattern is enclosed in square brackets it means that ‘[(\s\S)*]’ is actually any single one of a set of characters – matching any of the single characters in the set. The characters it will match in this case are therefore: an open paren, ‘any space character’, ‘any non-space character’, a close paren or an asterisk. By inspection you can see that this matches any character (repeated 32-36 times).
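A quick way to convince yourself is to test the character sets against a string that contains a newline, which is the one thing ‘.’ won’t match by default. This is just a sketch with a made-up five-character sample:

```powershell
# Five characters: a, b, newline, c, d
$Sample = "ab`ncd"

$DotMatches   = $Sample -match '^.{5}$'           # False: '.' skips newlines
$AnyMatches   = $Sample -match '^[\s\S]{5}$'      # True
$WeirdMatches = $Sample -match '^[(\s\S)*]{5}$'   # True: parens and * add nothing

$DotMatches, $AnyMatches, $WeirdMatches
```

Inside square brackets the parens and the asterisk are just literal characters, and they’re already covered by ‘\S’.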

Huh?

At this point I was thoroughly confused. This is a good time to say that (a) I have no affiliation with the RegexBuddy company and (b) I’m not getting any payment for a plug! I can say that I’m a big fan of the RegexBuddy program; if you need to write a regex that is more complex than the average then RegexBuddy is a real help (and the documentation and regex library are great too).  So, I started up RegexBuddy. Its decoding of the regular expression is included at the end of this post but it confirmed what I thought: seriously broken…

So, in the Scripting Guy article, the author is right when he says ‘If you are good at Windows PowerShell and still haven’t used regular expressions, you are missing an important weapon in your Windows PowerShell arsenal’.  But, unfortunately, his solution is misleading and dangerous as an example.

Where’s the QA on the article??  This is posted on a Microsoft site; a site aimed at beginners – unfortunately not good.

A Fixed Regex

Here’s a better way.  Initially this is a simple version that defines a domain name (e.g. outgoing.red.com) as any sequence of any non-blank characters; we’ll refine that in a moment:

Received: from\s+([^\s]+)\s+by\s+([^\s]+)\s+with\s+([^;]+);\s(.+)

I’m not claiming this is perfect; Regex gurus will undoubtedly have refinements. However, I will say that in comparison to the original this is shorter, more robust and, importantly, correct.
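As a rough check, here’s the simple pattern run against a made-up header (the host names, id and date are sample values, not taken from the Scripting Guy article):

```powershell
$Simple = 'Received: from\s+([^\s]+)\s+by\s+([^\s]+)\s+with\s+([^;]+);\s(.+)'

# Hypothetical header line
$Header = 'Received: from mail.example.com by mx.example.net with ESMTP id ABC123; Mon, 11 Jul 2011 10:00:00 +0100'

$Found = $Header -match $Simple
$From  = $Matches[1]    # mail.example.com
$By    = $Matches[2]    # mx.example.net
$With  = $Matches[3]    # ESMTP id ABC123
$Date  = $Matches[4]    # Mon, 11 Jul 2011 10:00:00 +0100

$From, $By, $With, $Date
```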

Improving Further

We can refine this further by using RegexBuddy’s library (or looking on the web) for a domain name pattern. RegexBuddy suggests this:

\b([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b

This matches ‘a-z’ or ‘0-9’ characters one or more times and allows embedded hyphens, all this followed by a dot. This pattern is then repeated (it must occur at least once). It then matches ‘a-z’ characters (at least 2 of them) in order to match the top-level domain name.

If you haven’t yet got the regex way of things, this looks incomprehensible. Breaking it down into chunks makes it manageable. If you’re trying to learn regex then buy yourself a copy of “Mastering Regular Expressions” by Jeffrey Friedl. This is the regex reference, no question. If you can, get a copy of the second edition which covers .Net regex. (Or get a copy of RegexBuddy and read the extensive help).

Non-Capturing Parens

We need to revise this domain name pattern slightly because some of the parenthesised parts of the regex here are used for grouping. Because it isn’t explicitly stated otherwise these grouping parens will, by default, also capture whatever they happen to match. This isn’t a big deal but at the very least it means that referring to the captured groups will need to use different indexes. To avoid this we can modify the grouping parens so that they don’t capture by changing them from ‘(…)’ to ‘(?:…)’, so:

\b(?:[a-z0-9]+(?:-[a-z0-9]+)*\.)+[a-z]{2,}\b
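The effect on group numbering can be checked directly by asking the .Net Regex object how many groups each version defines. A small sketch (the host name is a made-up sample):

```powershell
$Capturing    = [regex]'\b([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}\b'
$NonCapturing = [regex]'\b(?:[a-z0-9]+(?:-[a-z0-9]+)*\.)+[a-z]{2,}\b'

# Group 0 is always the whole match; anything beyond that is a capture
$CapturingGroups    = $Capturing.GetGroupNumbers().Length      # 3
$NonCapturingGroups = $NonCapturing.GetGroupNumbers().Length   # 1

# Both versions still match the same text
$Matched = $NonCapturing.Match('mail.example.com').Value       # mail.example.com

$CapturingGroups, $NonCapturingGroups, $Matched
```

Dropped into the header regex twice, the capturing version would add four unwanted groups and push the ‘by’ host, the SMTP system name and the date to different indexes.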

Even more gobbledygook!

Outstanding Issues

This is pretty robust now, but I can spot at least one outstanding risk. If the name of the SMTP system includes a semicolon followed by white space (unlikely I agree) then this will be taken to be the delimiter between the SMTP system name and the date. This could be fixed by parsing the date explicitly, but because the date is in the rather unfortunate RFC822 format [(Use RFC 3339/ISO 8601 format people!)] it’s not so easy to tie down. Instead, we can make sure that the semicolon we match as a delimiter is the last semicolon before the end of the string (of course, this fix assumes the date will never contain a semicolon!)

To modify the regex to do this we can change the delimiter and the final capture to:

‘;\s([^;]+)’

This gives us the following as the final regex:

Received: from\s+(\b(?:[a-z0-9]+(?:-[a-z0-9]+)*\.)+[a-z]{2,}\b)\s+by\s+(\b(?:[a-z0-9]+(?:-[a-z0-9]+)*\.)+[a-z]{2,}\b)\s+with\s+([^;]+);\s([^;]+)

Oh dear.  We’ve fixed a bunch of things, but this is not very user friendly (even if you have got the regex way of thinking…)

Layout

Things can be made better by splitting the regex over multiple lines. In order to be able to do this we first have to include the ‘(?x)’ flag at the start of the regex. This turns on ‘Extended mode’ and allows white space (including newlines) in the regex pattern, as well as allowing comments. Here’s the final formatted pattern, enclosed in a Here-string. This is more complex than the original working solution but it’s also likely to work more of the time…

$regex= @'
(?x)Received:\sfrom\s+  # Starting delimiter
(
\b(?:[a-z0-9]+(?:-[a-z0-9]+)*\.)+[a-z]{2,}\b  # Match domain name and capture
)
\s+by\s+    # Delimiter between domain names
(
\b(?:[a-z0-9]+(?:-[a-z0-9]+)*\.)+[a-z]{2,}\b  # Match domain name and capture
)
\s+with\s+   # Delimiter between domain names and SMTP system name
(
[^;]+   # Capture SMTP system name
)
;\s     # Delimiter between SMTP system name and date
(
[^;]+   # Capture the date
)
'@
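Here’s a sketch of the pattern in use. To keep it short it builds the same extended-mode regex from a $Domain variable rather than a Here-string (that’s my shortcut for the demo, not a change to the pattern), and runs it against a made-up header; the property names on the result are hypothetical:

```powershell
# Domain name pattern, defined once and reused for both hosts
$Domain = '\b(?:[a-z0-9]+(?:-[a-z0-9]+)*\.)+[a-z]{2,}\b'

# (?x) lets the spaces between the pieces be ignored
$Pattern = '(?x) Received:\sfrom\s+ (' + $Domain + ') \s+by\s+ (' + $Domain + ') \s+with\s+ ([^;]+) ;\s ([^;]+)'

# Hypothetical header line
$Header = 'Received: from mail.example.com by mx.example.net with ESMTP id ABC123; Mon, 11 Jul 2011 10:00:00 +0100'

if ($Header -match $Pattern) {
    # Turn the captures into an object so later pipeline stages get names
    $Parsed = New-Object PSObject -Property @{
        FromHost = $Matches[1]
        ByHost   = $Matches[2]
        Protocol = $Matches[3]
        Date     = $Matches[4]
    }
}
$Parsed
```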

Aren’t objects great?!  If the SMTP headers were emitted as objects we wouldn’t need to do the ‘Prayer-based Parsing’ that James O’Neill has recently posted on.

Finally

Finally, note that this parsing is still potentially broken. The RFC2822 header format (RFC2822 supersedes RFC822) defines a number of optional components in the header. Here’s an example extracted from the RFC:

Received: from x.y.test
    by example.net
    via TCP
    with ESMTP
    id ABC12345

Oops!  Where did that ‘via TCP’ part come from??!  Obviously Prayer-Based parsing is – errr, pretty shaky.
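To see the problem in miniature, here’s a sketch using the RFC example folded onto one line and a simplified pattern (plain ‘\S+’ stands in for the domain name pattern to keep things readable). The second pattern, with an optional ‘via’ clause, is only an illustration of how one such component might be tolerated; it doesn’t cover the other optional parts the RFC allows:

```powershell
# The RFC example from above, on a single line
$Header = 'Received: from x.y.test by example.net via TCP with ESMTP id ABC12345'

# Expects 'with' straight after the 'by' host
$Strict = 'Received:\s+from\s+(\S+)\s+by\s+(\S+)\s+with\s+'
$StrictFound = $Header -match $Strict      # False: 'via TCP' gets in the way

# Same pattern with an optional, captured 'via' clause
$Tolerant = 'Received:\s+from\s+(\S+)\s+by\s+(\S+)(?:\s+via\s+(\S+))?\s+with\s+'
$TolerantFound = $Header -match $Tolerant  # True
$Via = $Matches[3]                         # TCP

$StrictFound, $TolerantFound, $Via
```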

 

—————

Here’s what RegexBuddy says when decoding the faulty regex :

Received: from([\s\S]*?)by([\s\S]*?)with([\s\S]*?);([(\s\S)*]{32,36})(?:\s\S*?)

Match the characters “Received: from” literally «Received: from»

Match the regular expression below and capture its match into backreference number 1 «([\s\S]*?)»

Match a single character present in the list below «[\s\S]*?»

Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»

A whitespace character (spaces, tabs, line breaks, etc.) «\s»

Any character that is NOT a whitespace character «\S»

Match the characters “by” literally «by»

Match the regular expression below and capture its match into backreference number 2 «([\s\S]*?)»

Match a single character present in the list below «[\s\S]*?»

Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»

A whitespace character (spaces, tabs, line breaks, etc.) «\s»

Any character that is NOT a whitespace character «\S»

Match the characters “with” literally «with»

Match the regular expression below and capture its match into backreference number 3 «([\s\S]*?)»

Match a single character present in the list below «[\s\S]*?»

Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»

A whitespace character (spaces, tabs, line breaks, etc.) «\s»

Any character that is NOT a whitespace character «\S»

Match the character “;” literally «;»

Match the regular expression below and capture its match into backreference number 4 «([(\s\S)*]{32,36})»

Match a single character present in the list below «[(\s\S)*]{32,36}»

Between 32 and 36 times, as many times as possible, giving back as needed (greedy) «{32,36}»

The character “(” «(»

A whitespace character (spaces, tabs, line breaks, etc.) «\s»

Any character that is NOT a whitespace character «\S»

One of the characters “)*” «)*»

Match the regular expression below «(?:\s\S*?)»

Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.) «\s»

Match a single character that is a “non-whitespace character” «\S*?»

Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»

Created with RegexBuddy

Scam Phone Calls Really Do Happen.

I just answered the phone to a rather helpful chap who proceeded to try to get me to install software on my PC.  I’ve read stories about these calls and always thought it would be fun to play along, so as soon as I realised what was going on I changed from my usual, “What the hell are you doing calling me and where did you get my number from” attitude towards cold-callers and swapped into, “Oh dear, how did my PC become so messed-up” innocence.

The call appeared to come from the number  0000000 (I assume they’re spoofing the CLI but they could’ve chosen something slightly less noticeable…)

Once the chap had convinced me I was a valid registered Windows user he offered to help me to check if my PC had issues.  I was told to press Win-R (get a Run box) then to enter the characters I, N and F (he said this was short for “Infection”) and press enter.

Once a list of all the .INF files on my system was displayed I was told that these were the infected files.  I’ve got around 1200 matching files – man, I must be in trouble.  He asked if I recognised any of the files, to which I replied no.

I was then invited back to the Run box and asked to enter http://www.ammyy.com.  At this point I’d had enough of playing along so I asked him a few times where he’d got my details, receiving some blustery reply.  I then asked him if he felt guilty about causing people problems that would likely cost them money and time to resolve.  Our conversation ended shortly after and I was left thinking how easy it would’ve been for my parents (or many other people I’m sure) to have been convinced by the scam.

I checked later and the ammyy website appears to offer a remote control download package.

Scary stuff…

Land’s End–John O’Groats, Day 13, Thursday, John o’Groats

There was a strong 20mph+ headwind blowing us back up the road – but earlier days’ progress meant we only had around 40 miles to go to reach John o’Groats… possible surely?!

With around 5 miles left to go the target came into view!

LeJog Day 13 012 LeJog Day 13 014

Andy stayed close by in the car in case any of us expired, but at around 2pm we made it!  Mick is clearly pleased…

LeJog Day 13 015 LeJog Day 13 018

12 1/2 days, just over 985 miles and almost 53,000ft of climbing – I think we were all glad to arrive :)

LeJog Day 13 020

Day 13, http://connect.garmin.com/activity/98096428  40.75 Miles, 1742 ft climbing

Thanks to everyone who supported and special thanks, of course, to our tireless drivers/logistics handlers/shoppers/cooks/proxy-mothers – a big hand for Andy and Andrew – we couldn’t have done it without you!

Land’s End–John O’Groats, Day 12, Wednesday, The North Coast

We set off from the Invershin Hotel where we’d spent a pleasant evening in front of the fire in the bar with some decent real ale and a take-away curry :)  Then it was up the hill and into the moors.

LeJog Day 12 004 LeJog Day 12 005
LeJog Day 12 006 LeJog Day 12 013

The weather wasn’t as good as normal – but still no rain (the only rain to speak of we had all week was, fittingly, in Manchester; Mick maintains it rained in Cornwall but it was just a heavy mist!)

We were heading for a place called Tongue on the north coast, another watershed point in the journey and although we missed the village itself it was great to reach the north coast road and to see the sea – and the first mention of John o’Groats!

LeJog Day 12 015 LeJog Day 12 027

Day 12, http://connect.garmin.com/activity/97281059  66.46 Miles, 3,885 ft climbing

Land’s End–John O’Groats, Day 11, Tuesday, The Great Glen

Up and over from the campsite and back down to the Caledonian Canal – where the swing bridge was open, luckily!  We could still get across quite easily on the next lock gate but it was an excuse to stop for a photo…  A Wire-Mesh Nessie?!

LeJog day 11 001 LeJog day 11 008

Along to Loch Ness and Urquhart Castle

LeJog day 11 011 LeJog day 11 017

…Mike shows his usual road sense and we move north from Loch Ness to Dingwall

LeJog day 11 019 LeJog day 11 022

Unfortunately, accommodation was hard to find, so we pressed on (another thirty b*&^*$” miles) to Invershin and the Invershin Hotel.  Nothing to do with sheer gratitude at having arrived – this was the most hospitable place we stayed all week.

Day 11, http://connect.garmin.com/activity/97281087  95.08 Miles, 4,167 ft climbing