Scott Hanselman

HTTP, HEAD, and Range Requests...

July 9, '03 Comments [0] Posted in Web Services

Venkat writes that he has a text file (CSV) containing over 50,000 URLs. “I want to run a program that will take this file as input and output a text file which contains only the valid URLs. Basically I need a URL/Link Validator that can perform this job.  I tried to put together a custom C# program to do this, but it takes several minutes just to do a hundred URL. Is there any program/code you are aware that can do this?”

I recommended a Range Retrieval Request, the technique that download managers such as GetRight use to fetch only part of a file. In .NET, HttpWebRequest exposes this through its AddRange method; Range is a restricted header there, so it generally can't be set by adding a name/value pair directly to the Headers collection. NOTE: The server CAN (and many will) ignore this request. If you do get partial content, you won't get a 200 OK; you'll get a 206 Partial Content, and Content-Length will hold the amount of data actually included.
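To make the Range idea concrete, here is a minimal sketch in Python (not .NET) using only the standard library. The function names `build_range_request`, `describe_status`, and `probe_with_range` are made up for illustration, and asking for just the first byte (`bytes=0-0`) is my assumption about what a link checker would want, not something GetRight is known to do.

```python
import urllib.request


def build_range_request(url, first_byte=0, last_byte=0):
    """Build a GET request that asks only for bytes first_byte..last_byte (inclusive)."""
    return urllib.request.Request(
        url,
        headers={"Range": f"bytes={first_byte}-{last_byte}"},
        method="GET",
    )


def describe_status(status):
    """Interpret the status code of a response to a Range request."""
    if status == 206:
        return "partial content"               # server honored the Range header
    if status == 200:
        return "full content (Range ignored)"  # server sent the whole body anyway
    return "unexpected"


def probe_with_range(url, timeout=5.0):
    """Send the Range request; return (status, Content-Length header or None)."""
    request = build_range_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.headers.get("Content-Length")
```

Calling `probe_with_range` on a URL whose server honors ranges should give a 206 and a small Content-Length; a server that ignores the header answers 200 and sends the whole body, which is exactly the cost this technique was meant to avoid.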

However, another fellow, more clever than I, wrote me to say that a HEAD (rather than a GET) should provide enough information - namely the headers - to determine whether a page exists, without the trouble of the HTTP body content. Good stuff!
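A rough sketch of the HEAD approach, again in Python with only the standard library. The names `build_head_request`, `url_exists`, and `filter_valid` are hypothetical, and treating "any 2xx answer" as valid is my assumption; some servers reject HEAD outright, so a real validator might fall back to a GET before giving up on a URL.

```python
import http.client
import urllib.request


def build_head_request(url):
    """Build a HEAD request: the server returns headers only, no body."""
    return urllib.request.Request(url, method="HEAD")


def url_exists(url, timeout=5.0):
    """Return True if the HEAD request is answered with a 2xx status.

    Assumes http:// or https:// URLs.
    """
    try:
        request = build_head_request(url)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (ValueError, OSError, http.client.HTTPException):
        # ValueError: malformed URL. OSError covers urllib's URLError and
        # HTTPError (4xx/5xx), DNS failures, refused connections, timeouts.
        return False


def filter_valid(urls, timeout=5.0):
    """Keep only the URLs that answer the HEAD probe successfully."""
    return [url for url in urls if url_exists(url, timeout)]
```

For Venkat's 50,000-line file, the idea would be to read the URLs from the CSV, pass them through `filter_valid`, and write the survivors to the output file. This version checks one URL at a time, so most of the remaining wait is network latency rather than downloading page bodies.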

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.2
http://www.vbip.com/winsock/winsock_http_08_01.asp

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.


Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.