Scott Hanselman

The Weekly Source Code 26 - LINQ to Regular Expressions and Processing in Javascript

May 10, '08 Comments [11] Posted in ASP.NET | Javascript | LINQ | Silverlight | Source Code

I've been getting more and more interested in how folks extend their applications using plugins and things. In my new ongoing quest to read source code to be a better developer, Dear Reader, I present to you the twenty-sixth (half a year!) in an infinite number of posts of "The Weekly Source Code."

Sometimes when I read code, I kick myself (mentally) and say "Man, I should have thought of that!" Then I realize I'm not nearly as good a programmer as I think I am, and then I just let the source wash over my brain.

Here's some source by smart people I've been reading this week that I should have thought of. ;) Coincidentally they are both examples of languages ported or re-imagined in another language.

"Processing" in JavaScript

When I say "Processing" I mean the open-source Java-based visualization language from http://processing.org/. Jeff calls it "more akin to sketching than coding" while I say it's sketching with code! Jeff yearns: "for the day when web pages are regularly illustrated with the kind of beautiful, dynamic visualizations that Ben Fry creates."


Well, John Resig, arguably already considered one of the best JavaScript coders on the planet after he gave us the tour de force that is jQuery, has ported Processing to JavaScript and gives us Processing.js.

You can interact with it in two ways. First, as an elegant and tight Javascript API:

var p = Processing(CanvasElement);
p.size(100, 100);
p.background(0);
p.fill(255);
p.ellipse(50, 50, 50, 50);

Or, you can tunnel the actual Processing language like this:

Processing(CanvasElement, "size(100, 100); background(0);" + "fill(255); ellipse(50, 50, 50, 50);");

This release is specifically targeted to Firefox3, Opera 9.5 and the Webkit Nightlies (Safari) - all unreleased, beta browsers. I'm going to try it under the DLR with Javascript in Silverlight. Heh heh.

Here are his demos. Remember, these don't work in IE7.

There's a load of demos, but here's a powerful one. A working clock in 17 lines of code.

void setup() {
  size(200, 200);
  stroke(255);
  smooth();
}
void draw() {
  background(0);
  fill(80);
  noStroke();
  // Angles for sin() and cos() start at 3 o'clock;
  // subtract HALF_PI to make them start at the top
  ellipse(100, 100, 160, 160);
  float s = map(second(), 0, 60, 0, TWO_PI) - HALF_PI;
  float m = map(minute(), 0, 60, 0, TWO_PI) - HALF_PI;
  float h = map(hour() % 12, 0, 12, 0, TWO_PI) - HALF_PI;
  stroke(255);
  strokeWeight(1);
  line(100, 100, cos(s) * 72 + 100, sin(s) * 72 + 100);
  strokeWeight(2);
  line(100, 100, cos(m) * 60 + 100, sin(m) * 60 + 100);
  strokeWeight(4);
  line(100, 100, cos(h) * 50 + 100, sin(h) * 50 + 100);
}
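The trigonometry in that sketch is easy to check by hand. Here is a minimal JavaScript sketch of what Processing's map() does (a linear re-mapping from one range to another), plus a small helper that computes where a hand ends. The helper name handEnd and its parameters are my own invention for illustration; they are not part of Processing or Processing.js.

```javascript
// Linearly re-map a value from the range [low1, high1] to [low2, high2],
// which is what Processing's map() does.
function map(value, low1, high1, low2, high2) {
  return low2 + (high2 - low2) * ((value - low1) / (high1 - low1));
}

const TWO_PI = 2 * Math.PI;
const HALF_PI = Math.PI / 2;

// Hypothetical helper (not a Processing function): the end point of a clock
// hand, mirroring the line() calls in the clock sketch.
function handEnd(unit, unitsPerTurn, length, cx = 100, cy = 100) {
  // Angle 0 points at 3 o'clock, so subtract a quarter turn to start at 12.
  const angle = map(unit, 0, unitsPerTurn, 0, TWO_PI) - HALF_PI;
  return {
    x: Math.cos(angle) * length + cx,
    y: Math.sin(angle) * length + cy,
  };
}

const atTwelve = handEnd(0, 60, 72); // second hand at :00, straight up
const atThree = handEnd(15, 60, 72); // second hand at :15, pointing right
console.log(atTwelve, atThree);
```

Canvas y coordinates grow downward, which is why "straight up" lands at a y value smaller than the center's.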

His code leans heavily on the Canvas which is why IE7 doesn't work. Much of the processing.js file is mapping from one API (the processing API) to Javascript constructs, usually canvas ones. For example, making a point(x,y) is:

p.point = function point( x, y )
{
  var oldFill = curContext.fillStyle;
  curContext.fillStyle = curContext.strokeStyle;
  curContext.fillRect( Math.round( x ), Math.round( y ), 1, 1 );
  curContext.fillStyle = oldFill;
}

Note the rectangle that is 1 by 1. That's funny, but that's the life of an API mapper. Remind me someday to tell you, Dear Reader, how I got filled pie charts working on an Original Palm Pilot that not only didn't support Put/GetPixel but didn't have floating point math. That was a hoot.
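If you want to poke at the 1-by-1 trick without a browser, here is a rough sketch that runs the same logic against a stand-in context. The makeRecordingContext function is a hypothetical test double I made up for this example; a real page would get its context from canvas.getContext("2d").

```javascript
// Hypothetical stand-in for a canvas 2D context: it only remembers the
// rectangles it was asked to fill and the fill color in effect at the time.
function makeRecordingContext() {
  return {
    fillStyle: "#ffffff",
    strokeStyle: "#ff0000",
    calls: [],
    fillRect(x, y, w, h) {
      this.calls.push({ x, y, w, h, fillStyle: this.fillStyle });
    },
  };
}

// Same idea as p.point: paint a 1x1 rectangle in the stroke color, then
// restore the fill color so later shapes are unaffected.
function point(context, x, y) {
  const oldFill = context.fillStyle;
  context.fillStyle = context.strokeStyle;
  context.fillRect(Math.round(x), Math.round(y), 1, 1);
  context.fillStyle = oldFill;
}

const recordingContext = makeRecordingContext();
point(recordingContext, 10.4, 19.6);
console.log(recordingContext.calls[0]); // the rectangle that was recorded
console.log(recordingContext.fillStyle); // fill color after the call
```

The save-and-restore of fillStyle is the part that matters: the canvas has one fill color at a time, so borrowing it for a "stroke-colored pixel" has to be undone.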

Anyway, one really good example of this guy's clean cleverness is the triangle function. Remember, this is a processing function and he's not only got to implement it, but also make the building blocks for doing it cleanly.

To start:

p.triangle = function triangle( x1, y1, x2, y2, x3, y3 )
{
  p.beginShape();
  p.vertex( x1, y1 );
  p.vertex( x2, y2 );
  p.vertex( x3, y3 );
  p.endShape();
}

Obvious, right? Well, not really, considering that the 2D Canvas doesn't have any of those three higher-level methods. Begin and EndShape are fairly clean. However, he had to implement a nice Fill, Stroke and ClosePath to do this cleanly.

p.beginShape = function beginShape( type )
{
  curShape = type;
  curShapeCount = 0;
}

p.endShape = function endShape( close )
{
  if ( curShapeCount != 0 )
  {
    curContext.lineTo( firstX, firstY );

    if ( doFill )
      curContext.fill();

    if ( doStroke )
      curContext.stroke();

    curContext.closePath();
    curShapeCount = 0;
    pathOpen = false;
  }

  if ( pathOpen )
  {
    curContext.closePath();
  }
}

It's about four layers deep, each primitive building on the next until he gets a nice clean triangle implementation, but then he can use it for quad(), and the same method handles bezierVertex as well. It would do you well to FireBug your way through his code. It's a wonderfully fun way to re-learn Javascript from a gentleman who knows what he's doing.

LINQ to RegEx and Fluent Regular Expressions

I was trying to re-re-re-learn Regular Expressions again this week for a small task. It's funny how Regular Expressions are the first thing to leave my brain even though there are a bunch of Regular Expression Tools out there. Josh Flanagan came up with a Fluent Interface for Regular Expressions. With it, this:

Regex socialSecurityNumberCheck = new Regex(@"^\d{3}-?\d{2}-?\d{4}$");

would look like this:

Regex socialSecurityNumberCheck = new Regex(Pattern.With.AtBeginning
    .Digit.Repeat.Exactly(3)
    .Literal("-").Repeat.Optional
    .Digit.Repeat.Exactly(2)
    .Literal("-").Repeat.Optional
    .Digit.Repeat.Exactly(4)
    .AtEnd);

It took me a second to like this. OK, it took me a while. Breathe for a minute, and read it out loud. It kind of makes sense, actually, although there is a reasonable argument against in the comments of Josh's post:

[It] strikes me that a user of this library needs to learn a fairly complex syntax which is almost as far from "plain english" as regex, when they could simply learn how to do regex.

Sure, but it's fun to try new things. If you look at his source, it's really just a very smart string concatenator. I think it would actually be a very interesting way to teach or learn regular expressions, especially if you're a casual RegEx'er like me.
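To show what "smart string concatenator" means, here is a minimal sketch of the same idea in JavaScript. This is not Josh's implementation; the class and member names just imitate the shape of his API, and every step does nothing more than append a fragment to a string.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeLiteral(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical fluent builder: each member returns a new Pattern whose
// source is the old source plus one more regex fragment.
class Pattern {
  constructor(source = "") {
    this.source = source;
  }
  static get With() {
    return new Pattern();
  }
  append(fragment) {
    return new Pattern(this.source + fragment);
  }
  get AtBeginning() {
    return this.append("^");
  }
  get AtEnd() {
    return this.append("$");
  }
  get Digit() {
    return this.append("\\d");
  }
  Literal(text) {
    const escaped = escapeLiteral(text);
    // Group multi-character literals so a following repeat applies to all of it.
    return this.append(text.length > 1 ? `(?:${escaped})` : escaped);
  }
  get Repeat() {
    const pattern = this;
    return {
      Exactly: (n) => pattern.append(`{${n}}`),
      get Optional() {
        return pattern.append("?");
      },
    };
  }
  toString() {
    return this.source;
  }
}

const ssnPattern = Pattern.With.AtBeginning
  .Digit.Repeat.Exactly(3)
  .Literal("-").Repeat.Optional
  .Digit.Repeat.Exactly(2)
  .Literal("-").Repeat.Optional
  .Digit.Repeat.Exactly(4)
  .AtEnd;

const ssnCheck = new RegExp(ssnPattern.toString());
console.log(ssnPattern.toString());
console.log(ssnCheck.test("123-45-6789"));
```

Either way you end up with an ordinary pattern string, so everything downstream (the regex engine, its quirks, its performance) is unchanged.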

Krzysztof Koźmic created a similar API in 2007. His fluent interface over RegEx looks like this:

Pattern pattern = Pattern.Define().
    As("Kot".Count(Times.AtLeast(2))).
    FollowedBy(Any.Except('a','b','c')).
    Start(At.BeginingOfStringOrLine);

Then Roy Osherove took Josh's Fluent Interface to RegEx from 2006 a step further and applied LINQ query syntax to it, creating in the process LINQ to Regex.

Here's Roy's example:

public void FindEmailUsingPattern()
{
    var query = from match in
                    RegexQuery.Against("sdlfjsfl43r3490r98*(*Email@somewhere.com_dakj3j")
                where match.Word.Repeat.AtLeast(1)
                    .Literal("@")
                    .Word.Repeat.AtLeast(1)
                    .Literal(".")
                    .Choice.Either(
                        Pattern.With.Literal("com"),
                        Pattern.With.Literal("net"))
                    .IsTrue()
                select match;
    foreach (var match in query)
    {
        Assert.AreEqual("Email@somewhere.com", match.Value);
    }
}

After the "from match in", the simple heart of it is Roy's static Against() call that returns a RegexQuery that is IEnumerable of Match, thereby supporting the foreach later on:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Text.RegularExpressions;

namespace Osherove.LinqToRegex
{
    public class RegexQuery : IEnumerable<Match>
    {
        private readonly string input;
        private object lastPatternRetVal;
        private RegexQuery(string input)
        {
            this.input = input;
        }
        public static RegexQuery Against(string input)
        {
            return new RegexQuery(input);
        }

        private string _regex;
        // Translates the expression tree from the "where" clause into a regex string.
        public RegexQuery Where(Expression<Func<Pattern, bool>> predicate)
        {
            _regex = new PatternVisitor().VisitExpression(predicate).ToString();
            return this;
        }

        public RegexQuery Select<T>(Expression<Func<Pattern, T>> selector)
        {
            return this;
        }
        #region IEnumerable Members

        IEnumerator IEnumerable.GetEnumerator()
        {
            return ((IEnumerable<Match>)this).GetEnumerator();
        }

        // The regex only runs when someone enumerates the query.
        public IEnumerator<Match> GetEnumerator()
        {
            MatchCollection matches = Regex.Matches(input, _regex);
            foreach (Match found in matches)
            {
                yield return found;
            }
        }
        #endregion
    }
}
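The shape of that class (build the pattern in Where, run nothing until somebody enumerates) can be sketched in JavaScript too. Big caveat: JavaScript has no expression trees, so where Roy's version inspects the lambda with a visitor, this sketch simply calls the callback with a builder. Every name here is hypothetical and only loosely mirrors his API.

```javascript
// Hypothetical minimal builder: each method appends one regex fragment.
class PatternBuilder {
  constructor() {
    this.source = "";
  }
  add(fragment) {
    this.source += fragment;
    return this;
  }
  wordAtLeastOnce() {
    return this.add("\\w+");
  }
  literal(text) {
    return this.add(text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  }
  // Options are inserted as-is, so they are assumed to be plain text.
  either(...options) {
    return this.add(`(?:${options.join("|")})`);
  }
}

class RegexQuery {
  constructor(input) {
    this.input = input;
    this.source = "";
  }
  static against(input) {
    return new RegexQuery(input);
  }
  // Runs the callback against a builder and keeps the resulting pattern.
  where(describe) {
    this.source = describe(new PatternBuilder()).source;
    return this;
  }
  // Nothing is matched until the query is iterated.
  *[Symbol.iterator]() {
    const regex = new RegExp(this.source, "g");
    for (const match of this.input.matchAll(regex)) {
      yield match[0];
    }
  }
}

const emailQuery = RegexQuery
  .against("sdlfjsfl43r3490r98*(*Email@somewhere.com_dakj3j")
  .where((p) =>
    p.wordAtLeastOnce().literal("@").wordAtLeastOnce().literal(".").either("com", "net"));

const foundEmails = [...emailQuery];
console.log(foundEmails);
```

Making the query object iterable is what lets a plain for...of loop (or foreach, in the C# original) drive the matching.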

You can find all the source for Roy's project up at his assembla.com project site, and Josh's source is on his blog. It is worth noting, though, that you can combine LINQ queries with Regular Expressions without any tricks, because matches are returned in a MatchCollection and LINQ loves things that are IEnumerable.

You can use LINQ projections to pull objects out of a collection of matches like:

List<YourType> results = (from Match m in matches
                          select new YourType
                          {
                              Id = m.Groups[1].Value,
                              Something = m.Groups[2].Value
                          }).ToList();
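The same projection idea works in JavaScript, since matchAll() hands back an iterator of matches that Array.from() can map over. The input string, the pattern and the property names below are made up purely for illustration.

```javascript
// Sample input and pattern, invented for this example.
const sampleText = "id=7 name=alpha; id=12 name=beta";
const itemPattern = /id=(\d+) name=(\w+)/g;

// Project every match into a plain object built from its capture groups.
const projectedItems = Array.from(sampleText.matchAll(itemPattern), (m) => ({
  Id: m[1],
  Something: m[2],
}));

console.log(projectedItems);
```

Note that the captured values are strings, just like Group.Value in .NET, so any conversion to numbers is up to you.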

So, we've got two sides of the coin here. First, the creation of the Regular Expression. That can be the standard way, or with a fluent interface. Either way, you end up with a string. Second, you've got the extraction of the information. Most often you'll care about the MatchCollection that comes back. You'll usually want to pull information out, so while you're foreach'ing your way over the collection, you can use LINQ to create an object projection that's chopped up and sorted and grouped all with one query, regardless of how you created the query in the first place.

Choice is good.

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. I am a failed stand-up comic, a cornrower, and a book author.

Saturday, May 10, 2008 8:49:25 AM UTC
Something weird is going on with your source there. In your LinqToRegex code, it finishes off with a bunch of literal HTML end tags that seem to be matching up to the generic types:

</match></match></pattern></func></t></pattern></func></match>
Saturday, May 10, 2008 12:38:15 PM UTC
Maybe Microsoft can create these kind of methods in a RegularExpressions.Linq namespace as extension methods, and then make the compiler translate query expressions to these methods so we can truly have a dsl for querying text, after all, regexes have the same downsides as sql strings when used in code. Maybe something for .NET 4.0?
mike
Saturday, May 10, 2008 12:52:52 PM UTC
Maybe IE should get onboard with other better browsers ?
Steve
Saturday, May 10, 2008 6:08:38 PM UTC
Really what's happening here is that LINQ is enabling the creation of a DSL.

Nothing but respect for Roy & Josh, but personally I don't think creating another way of expressing Regex helps. It's not the syntax of the language that's all that difficult, it's the behavior of the engine itself that's tough to wrap your head around. Replacing symbols with words doesn't help, IMO, and arguably makes it harder to comprehend. But I applaud the attempt -- it's an interesting exercise.

Saturday, May 10, 2008 7:37:40 PM UTC
Dave - Exactly. I'm not touting this as THE way, or even a good way, but it's a VERY interesting exercise as we all try to be more effective.

Gwyn - Yes, I have some encoding problems with < and > that I'll fix tonight. Thanks!
Sunday, May 11, 2008 9:10:02 PM UTC
Scott, you're starting to have too many ads on your site. I understand bills need to be paid somehow, but there's too many ads on this blog now.
Dean
Monday, May 12, 2008 12:37:52 AM UTC
Dean - Thanks for the feedback. One question, though. The number of ads on this blog hasn't changed in two years. Which ads are bothering you most, or which ads do you think are new?
Monday, May 12, 2008 3:30:58 PM UTC
@Dave,

"It's not the syntax of the language that's all that difficult, it's the behavior of the engine itself that's tough to wrap your head around."

I couldn't disagree more. I have a decent understanding of regular expression behavior, but every time I find myself needing to write a regular expression, I have to drag out the docs so I can remember the syntax that I need to use to get the behavior I want. Having a discoverable, intellisense-enabled library that I can use without having to drag out documentation is potentially very useful.
Wednesday, May 14, 2008 4:31:38 AM UTC
I think regex expressions are easier to replace, test, store and pass around compared to the FindEmailUsingPattern() method, where the same logic is strongly typed. It is also easy to scan regex expressions, look for patterns and copy and paste. I think the example given is a good demonstration of the language syntax and capabilities. MS should build and provide a regex-building tool in their development environment to expose regex to developers who are reluctant to use it. Just my 3 cents.
Adnan
Wednesday, May 14, 2008 5:07:17 AM UTC
Thanks for another outstanding Weekly Source Code.

This is really helpful and very much appreciated. I've been following the browser/client-side graphics support thing for a while, and while some folks like dojo have been able to translate to vml when in IE, it hasn't caught on at all. Thanks for the Processing.js pointer, that really proves the point well.

I linked to Josh's original article (see my post titled A simple example of a fluent interface) and got a lot of angry comments, but I really think that there's no contest. Regex is powerful, but it's assembly language for pattern matching. A friendlier pattern matching DSL only makes sense.

Thanks again for all your work on these.
Thursday, May 15, 2008 8:18:16 AM UTC
It should be possible to use VML in IE to do the same. I remember Google having an open source library to abstract over the canvas tag and VML

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.