Scott Hanselman

You're just another carriage return line feed in the wall

February 20, 2013 Posted in Musings

An image of a screaming face from Pink Floyd's "The Wall" album, coming out of a wall as if the wall were elastic, with the characters CR/LF in its mouth

I love getting pull requests on GitHub. It's such a lovely gift when someone wants to contribute their code to my code. However, it seems there are three kinds of pull requests that I get.

  1. Awesome, appreciated and wanted.
  2. Not so good, thanks for trying, but perhaps another time.
  3. THE WALL OF PINK

I'd like to talk about The Wall of Pink. This is a pull request that is possibly useful, possibly awesome, but I'll never know, because GitHub tells me 672 lines changed, all because they used CRs and I used LFs, or I used CRLF and they used LF, or I used...well, you get the idea.

There is definitely a problem here. But what's the problem? Well, it's kind of like endianness, except we're still talking about it in 2013.

"A big-endian machine stores the most significant byte first—at the lowest byte address—while a little-endian machine stores the least significant byte first." - Endianness

Did you know that for a long time Apple computers were big endian and Intel computers were little endian? The Java VM is big endian. Sixteen years ago I wrote a shareware code generator that generated a byte array on an Intel PC that was later entered into a PalmPilot running a Motorola 68328. That was the last time I thought about endianness in my career. Folks working on lower-level stuff do still think about this sometimes, admittedly, but most of us don't sweat endianness day to day.

TCP/IP itself is, in fact, big endian. There was a time when we had to really think about the measurable performance hit involved in using TCP/IP on a little-endian processor. But we don't think about that anymore. It's there but the abstraction is not very leaky.
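For anyone who hasn't had to think about byte order in a while, here's a minimal Python sketch (my own illustration, not from any real project) of the same 32-bit number laid out both ways. The value 0x0A0B0C0D is arbitrary; struct's "!" format is Python's spelling of network byte order.

```python
import struct

value = 0x0A0B0C0D

# Big-endian: most significant byte first (also "network byte order").
big = value.to_bytes(4, byteorder="big")
# Little-endian (Intel x86): least significant byte first.
little = value.to_bytes(4, byteorder="little")

print(big.hex(" "))     # 0a 0b 0c 0d
print(little.hex(" "))  # 0d 0c 0b 0a

# struct's "!" prefix packs in network (big-endian) order, the layout
# TCP/IP header fields use on the wire; "<" forces little-endian.
assert struct.pack("!I", value) == big
assert struct.pack("<I", value) == little
```

The library hides the swap for you, which is exactly why most of us never think about it.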

It's years later, but CR/LF issues plague us weekly. That Wall of Pink I mentioned? It looks like this. I had to scroll through 672 lines before I saw any green, where lines were actually added. Who knows what really changed here? I can't tell, since this diff tool thinks every line changed.
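Here's a minimal Python sketch of why a line-based diff produces the Wall of Pink. The two file contents are made up for illustration: one uses LF, the other CRLF, and only the last line has a real edit. Because difflib compares whole lines including their terminators, every line counts as changed.

```python
import difflib

# Made-up contents: my copy uses LF, the contributor's uses CRLF,
# and only the last line has a real edit.
mine = "line 1\nline 2\nline 3\nline 4\n"
theirs = "line 1\r\nline 2\r\nline 3\r\nline four\r\n"

def pink_lines(a: str, b: str) -> int:
    """Count the removed ('-') lines a line-based unified diff would show."""
    diff = difflib.unified_diff(
        a.splitlines(keepends=True),  # keep each line's \n or \r\n ending
        b.splitlines(keepends=True),
    )
    return sum(1 for d in diff if d.startswith("-") and not d.startswith("---"))

print(pink_lines(mine, theirs))  # 4: every line looks changed, not just one
```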

Sigh.

Whose fault is this?

Perhaps we blame Émile Baudot in 1870 and Donald Murray in 1899 for adding control characters to instruct a typewriter carriage to return to the home position plus a line feed to advance the paper on the roller. Or we blame Teletype machines. Or the folks at DEC, or perhaps Gary Kildall and CP/M for following DEC's convention. Then the bastards at IBM who moved to ASCII from EBCDIC and needed a carriage return when punch-cards fell out of favor.

The text files we have today on Windows still have a CR LF (0D 0A) after every line. But Apple uses just a line feed (LF) character. There's no carriage to return, but there are lines to advance, so it's a logical savings.

Typewriter picture via Wikimedia Commons
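To make the 0D 0A point concrete, this small Python sketch (an illustration, not tied to any real tool) dumps the bytes of the same two lines saved both ways.

```python
# The same two-line text saved with Windows (CRLF) and Unix/Mac (LF) endings.
windows = "first\r\nsecond\r\n".encode("ascii")
unix = "first\nsecond\n".encode("ascii")

print(windows.hex(" "))  # 66 69 72 73 74 0d 0a 73 65 63 6f 6e 64 0d 0a
print(unix.hex(" "))     # 66 69 72 73 74 0a 73 65 63 6f 6e 64 0a

# CRLF costs one extra byte per line: the 0D (carriage return).
assert len(windows) - len(unix) == windows.count(b"\r\n")
```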

Macs and PCs are sharing text more than ever. We live in a world where Git is FTP for code. We're up a level, above TCP/IP where endianness is hidden, but still in text, where CR LFs aren't.

We store our text files in different formats on disk, but later when the files are committed to Git, how are they stored? It depends on your settings and the defaults are never what's recommended.

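You can set up a .gitattributes file per repo to do things like this, which turns off line-ending conversion for .txt files (-crlf is the older spelling of -text):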

*.txt -crlf

Or you can do what GitHub for Windows suggests with text=auto.

# Auto detect text files and perform LF normalization
* text=auto

What does text=auto do?

This ensures that all files that git considers to be text will have normalized (LF) line endings in the repository. The core.eol configuration variable controls which line endings git will use for normalized files in your working directory; the default is to use the native line ending for your platform, or CRLF if core.autocrlf is set.
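Conceptually, text=auto plus a CRLF working-directory setting behaves like the two functions below. This is a simplified Python model I'm assuming for illustration, not Git's implementation: it skips Git's binary detection, its safety checks, and everything else in the real conversion code. The function names are hypothetical.

```python
import re

def to_repository(text: str) -> str:
    # Commit side (hypothetical helper): CRLF becomes LF, so the stored
    # content is identical no matter which platform it came from.
    return text.replace("\r\n", "\n")

def to_working_tree(text: str, eol: str = "\r\n") -> str:
    # Checkout side (hypothetical helper): each lone LF becomes the
    # configured ending; an LF already preceded by CR is left alone.
    if eol == "\n":
        return text
    return re.sub(r"(?<!\r)\n", eol, text)

windows_edit = "a\r\nb\r\n"
mac_edit = "a\nb\n"

# Both platforms produce the same normalized text in the repository...
assert to_repository(windows_edit) == to_repository(mac_edit) == "a\nb\n"
# ...and each platform gets its native endings back on checkout.
assert to_working_tree(to_repository(mac_edit)) == windows_edit
assert to_working_tree(to_repository(windows_edit), eol="\n") == mac_edit
```

The point of the model: the repository only ever sees LF, so a Windows contributor and a Mac contributor can't produce a Wall of Pink against each other.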

GitHub for Windows offers to normalize the repository's line endings. It uses the native line ending for your platform. But if you spend a few minutes googling around, you'll find arguments every which way with no 100% clear answer, although most folks seem to believe GitHub has the right one.

If this is the right answer, why isn't it a default? Is it time to make this the default?

This is such a problem that GitHub for Windows has dedicated "normalize your repo's CRLF" code. It'll fix them all and make a one-time commit to fix the line endings.

I think a more complete solution would also include improvements to the online diff tool. If the GitHub repo and server know something is wrong, that's a great chance for the server to proactively suggest a fix.

Solutions

Here are some possible solutions as I see it.

  • Make Windows switch all text files and decades of convention to use just LF
  • Git needs to install with correct platform-specific defaults, without needing a .gitattributes file
  • Have the GitHub web application be more proactive in suggesting solutions and preventing badness
  • Have the GitHub for Windows desktop application proactively notice issues (before I go to settings) and offer to help
  • Make the diff tool CR/LF aware and "do the right thing" like desktop diff tools that can ignore line ending issues
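As a sketch of that last bullet, here's what an EOL-insensitive diff might look like in Python, using only difflib. The function name diff_ignoring_eol and the sample strings are hypothetical. The trick is to compare lines after str.splitlines() has stripped whatever terminator they had.

```python
import difflib

def diff_ignoring_eol(a: str, b: str) -> list[str]:
    """Unified diff that treats CRLF, CR and LF endings as equivalent."""
    # splitlines() without keepends drops each line's terminator, so only
    # the visible text of the lines is compared.
    return list(difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm=""))

mine = "line 1\nline 2\nline 3\nline 4\n"
theirs = "line 1\r\nline 2\r\nline 3\r\nline four\r\n"

for line in diff_ignoring_eol(mine, theirs):
    print(line)
```

Only "-line 4" and "+line four" show up as changes; the three untouched lines appear as context, which is the one real edit the contributor made.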

Until something is done, I'll always tense up when I see an incoming pull request and hope it's not a Wall of Pink.

Thoughts?

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

February 20, 2013 12:46
Doesn't WinMerge already do this? There's an option for ignoring CR/LF and other whitespace differences

http://manual.winmerge.org/Configuration.html
February 20, 2013 12:47
Then I read your last line :-) Sorry!
February 20, 2013 13:22
Given my earlier comprehension failure (note to self: read, comprehend then reply) I'd like to point out that the last point, making the diff tool CR/LF and whitespace aware, is essential. Unless these characters have actual meaning in your language (e.g. coding in Whitespace), it makes pragmatic sense to be able to ignore differences resulting from them.

At worst it could be an option!

As you've pointed out, on many occasions, it's a diverse world out there, and, as GitHub can and does accept contributions from multiple platforms, ignoring minor differences in formatting when diffing files is a way of removing one of those annoyances we get when sharing things between OS's.

Imagine rejecting someone's submission because they'd revised the code under Mono on Linux and the file was originally edited in Visual Studio on Windows leading to the 'sea of red'. Sure, the line endings may be different, but they don't matter :-)

So my vote, in this world of lots of different people on lots of different operating systems, is to make the diff tool aware of these potential differences and to allow them to be ignored. Make it an on/off button to keep people (and Whitespace developers) happy. I don't really like the other options as they are more a "my platform is right, yours is wrong" viewpoint.

Dash

PS I really, really wish I could remove my first two comments :-D

February 20, 2013 13:45
Thought: DIY, GIT is open-source, so fix it when it annoys you and put some normalization code in there that is run by default without the need for a config (or in the DIFF which is probably also open source).
February 20, 2013 13:47
Franky - I can fix my own, but the solution is to solve it for everyone. That means either they take a theoretical pull request from me, or GitHub takes a leadership role in sensible defaults for all.
February 20, 2013 13:50
I haven't been working much on Visual Studio for a year at least, but I remember it used to prompt a lot "This file has a different line ending, do you want to normalize it?"

http://stackoverflow.com/questions/553548/what-does-visual-studio-mean-by-normalize-inconsistent-line-endings
February 20, 2013 14:05
That's all very nice, Dash, but in a contest between the quest for normalization and the quest for awareness of all possible differences, the latter loses from the start.

Besides that, any language where non-visible characters (indentation included, and mainly) are relevant for syntax is bad, to say the least.
February 20, 2013 14:09
Basically when you create a pull request you are responsible. That includes checking the actual final contents of your pull request on Github.

When you notice CR/LF problems popping up: clean your code, rebase it to squash commits and force push to update your feature branch from where the pull request was made.

Then check your pull request again and repeat when needed.

When I create a pull request I want the receiving end to focus on my functional changes. And we have the tools to prevent this diff explosion from happening. Although I agree it would be nice if Github would catch this in their diff.

In a way it reminds me how Pascal would complain about a missing period '.' at the end of programs, but never default to putting it there for me ;)
February 20, 2013 14:10
I think the last option would be the option, as the path of least resistance right now. I use WinMerge for comparing, which probably means that I am frequently unaware of line ending issues, and for me the most logical solution would be for the online diff tools to at least include an option to ignore line-ending differences.

Regarding languages with significant whitespace like Python: I would be surprised if any of these languages didn't have some kind of compiler fix for different line endings. The tabs should be the same on all platforms shouldn't they?

That said: I think that windows (which is my platform of choice) should stop using CRLF for line endings, it seems a bit silly, using a standard tied to an old mechanical platform.

Jesper Hauge
February 20, 2013 14:19
If only there were computers to solve and normalize these straightforward and understood problems. ;)

The tools should handle it. It shouldn't be the pull request-maker's job.
February 20, 2013 15:01
The "best" solution will be: Windows, please use everyone else's convention

The "good" solution will be: Developers, configure all your tools to use LF. Any developer tool minimally mature is able to handle "the problem" and transform the code transparently...

The "reasonable" solution I'm afraid is to make GitHub aware of the differences. Ideally, it should be possible to "set your project" to use LF (or, gods forbid CR/LF)
At the very least, GitHub should give you the option to ignore LF differences in diff... But diff in GitHub is a joke. Curiously, I wrote about it earlier this week on my blog (http://wrongsideofmemphis.com/2013/02/19/github-for-reviewing-code/)
In short, the diff page and review needs a lot of work, and most of it is pretty basic (like side-by-side diff) and I don't know why GitHub hasn't done that yet...
The problem with that is that I can't see any movement around that recently in GitHub....
February 20, 2013 15:11
'Make the diff tool CR/LF aware and "do the right thing" like desktop diff tools that can ignore line ending issues'

That's the way it should be, right? I mean, you can't expect all developers in the world to be aware of this, and change their development environment!

People saying windows should change to only LF, keep on dreaming, what about the million lines of legacy code?

I noticed something strange in Chrome lately, the error text from failed ASP.NET requests shows up as one big line of error text in the developer console, is this also related to these CRLFs?



February 20, 2013 15:21
@Fred Hoogduin: is there any chance of an English translation of this comment? You've basically summed up all the problems with Git in one comment: you need to study a Masters level course to understand how to use it.

In other words: if you're a windows developer posting an open source C# project on GitHub and forcing me to use an arcane version control system (from the commandline, no less) then you should consider yourself lucky you get any pull requests at all.
Kat
February 20, 2013 16:42
Hey Scott,

Why don't you fork git, fix the problem, and send them a pull request?
February 20, 2013 16:54
I really think when Unicode was created it should have solved this problem. There should have been a single newline character. According to Wikipedia, these are all supported as a new line in Unicode:
LF: Line Feed, U+000A
VT: Vertical Tab, U+000B
FF: Form Feed, U+000C
CR: Carriage Return, U+000D
CR+LF: CR (U+000D) followed by LF (U+000A)
NEL: Next Line, U+0085
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029

But as a pragmatic approach setting the diff tool to be new line aware would be best in my opinion.
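For what it's worth, Python's str.splitlines() already treats every one of the Unicode terminators listed above as a line boundary, which makes for a quick demonstration. The sample string is made up; compare it with a naive split on "\n", which only finds three pieces.

```python
# Made-up text using several of the terminators listed above:
# CR, LF, CRLF, VT, FF, NEL, LS and PS.
text = "cr\rlf\ncrlf\r\nvt\x0bff\x0cnel\x85ls\u2028ps\u2029end"

print(text.splitlines())
# ['cr', 'lf', 'crlf', 'vt', 'ff', 'nel', 'ls', 'ps', 'end']

# Splitting on LF alone only finds two of the boundaries.
print(len(text.split("\n")))  # 3
```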
February 20, 2013 17:00
It's going to be hard solving this one globally. As with many solutions to previous issues, it could take time. I don't think the *nix world is going to change its beloved ways, and neither will the Windows community. I would suggest a universal solution, one that respects each other's ways. Maybe select a specific character from the ASCII or Unicode spectrum, say $chr(x), to be used instead of the line feed, and then have all text editors interpret that mystery character as their preferred option.

February 20, 2013 17:02
But CR and LF are different. CR moves the cursor to the beginning of the line, and LF moves it down a line. Arguably using just LF or just CR is wrong because you either need infinite columns, or you overwrite the same line over and over.

Or use Winmerge ;)
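The CR-versus-LF distinction is easy to see with a toy teletype. This Python sketch is a hypothetical simulation (the render function and its 20-column width are my assumptions, and many real terminal setups translate LF for you on output): CR only moves the carriage back to column 0, and LF only advances the paper.

```python
def render(stream: str, width: int = 20) -> list[str]:
    """Replay text on a simulated teletype where CR and LF are separate
    motions: CR returns the carriage, LF advances the paper."""
    rows = [[" "] * width]
    row = col = 0
    for ch in stream:
        if ch == "\r":
            col = 0                       # carriage return: back to column 0
        elif ch == "\n":
            row += 1                      # line feed: down one line, same column
            if row == len(rows):
                rows.append([" "] * width)
        else:
            if col < width:               # characters past the edge are lost
                rows[row][col] = ch       # overwrite whatever was printed there
            col += 1
    return ["".join(r).rstrip() for r in rows]

print(render("ab\ncd"))    # ['ab', '  cd']  LF alone: a staircase
print(render("ab\r\ncd"))  # ['ab', 'cd']    CR+LF: what we actually want
print(render("abc\rX"))    # ['Xbc']         CR alone: overwrite the line
```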
February 20, 2013 17:04
Kat, Git is anything but arcane and running from a command line doesn't make something old or outdated. :-)

My thought is that normalization should occur on the server IF it makes it past the developer, based on preconfigured defaults, i.e. let Git be configured with a default EOL and normalize to it on commit.

Outside of that, just making the "right" settings the "default" within the tools would probably solve a lot of headaches.
February 20, 2013 17:13
But CR and LF are different. CR moves the cursor to the beginning of the line, and LF moves it down a line.


That is the archaic meaning of CR and LF. I doubt that there is any modern use for using two characters to indicate a newline--everyone always wants the text to begin one line down, at the left side.
February 20, 2013 17:35
Don't we have the same issue with tabs vs. spaces? I see that a lot too. I use tabs, but others use spaces. When someone adds or updates my files and I then "ReSharper" them, I notice changes in odd areas of the file. It turns out it was because the other developer used spaces instead of tabs. Just wondering if you think that is an issue similar to line endings. I know we have more control over tabs vs. spaces, but do we? Programmers are hard to change and/or control. :)
February 20, 2013 17:40
Any sane tool should support mixed line endings... There is nothing more useless than commits that "fix" line endings. They just add noise to the repository history
February 20, 2013 17:57
I don't like the "Make the diff tool CR/LF aware" solution. It's just a workaround, hiding the issue instead of solving it. Git has a feature to deal with this problem, everyone should use it. Git is just as important tool in your toolbelt as your IDE or compiler - learn to use it properly and don't be lazy to configure it.

And developers should check their diffs before submitting a change: if you changed 20 lines of code but the diff shows 600, don't open a pull request, fix it first. And when you receive a pull request like that you can kindly ask the sender to clean up the change first, maybe with a link to this blog entry or some related articles/man pages.
February 20, 2013 18:20
I'm always a big fan of Ron Popeil solutions. Using Text=auto and autocrlf=true seems to be a nice combo.

Set it....and forget it.
February 20, 2013 18:20
Another example of the wall of pink:

https://twitter.com/codinghorror/status/298913370646642689
February 20, 2013 18:32
You need a better diff tool. I abandoned the Visual Studio built-in one pretty quickly. A good diff tool will let you ignore line endings and/or white space changes.
Sam
February 20, 2013 18:46
@Scott Have you tried viewing your GitHub pull request with a "w=t" query parameter? This may help.

https://github.com/blog/967-github-secrets
February 20, 2013 19:02
That is the archaic meaning of CR and LF. I doubt that there is any modern use for using two characters to indicate a newline--everyone always wants the text to begin one line down, at the left side.


The Windows terminal still uses this convention, and for text-based applications it makes sense (if you want to overwrite the current line, use CR; if you need a new line, use CR+LF).
February 20, 2013 19:26
EXCELLENT comment by @AnthonyCapone!

I'll second the link: https://github.com/blog/967-github-secrets
February 20, 2013 19:35
I just think that the git clients on Windows should all be auto-configured to convert to <LF> on commit (or, git should just do that automatically anyway). It's the way it's done in Unix-land, and by making that the convention, we wouldn't have this problem.

Incidentally, it is better than it used to be. Pre-OS X, Macs used <CR> for line endings, with *nix on <LF> and Windows using <CR+LF>.
February 20, 2013 19:43
@michiel - "I noticed something strange in Chrome lately, the error text from failed ASP.NET requests shows up as one big line of error text in the developer console, is this also related to these CRLFs?"

Absolutely! I'm facing an issue now with imported data in my web app having a mixture of CR, CR+LF and LF which renders differently in text boxes in different browsers. Some show the text as one-line and others (correctly) split the text across lines.

However, even if I do replace CR+LF and CR to just LF in my import (or whatever I decide to use) - I still have the different browsers (IE, Chrome, FF, Opera, Safari etc.) submitting either CR, CR+LF, LF when the form is posted. There's no consistency across browser versions either.

@Scott - it must be National CRLF week, as I've literally hit this issue in the last 24 hours and my weekly checkup of your blog came up with this!

CR... LF........ pah!

Mal
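One way to cope with mixed input like this is to collapse every variant into a single ending on the way in. Here's a minimal Python sketch; normalize_newlines is a hypothetical helper, and choosing LF as the default is just an assumption.

```python
import re

def normalize_newlines(text: str, eol: str = "\n") -> str:
    """Collapse CRLF, lone CR and lone LF into one chosen line ending."""
    # "\r\n?" matches CRLF (or a lone CR); the "\n" branch catches lone LFs.
    return re.sub(r"\r\n?|\n", eol, text)

posted = "line one\r\nline two\rline three\n"
print(repr(normalize_newlines(posted)))
# 'line one\nline two\nline three\n'
```

Because CRLF is matched as a unit before a lone CR or LF can be, a Windows-style ending becomes one newline rather than two.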
February 20, 2013 19:58
We live in an HTTP world, so if standardising on CRLF as end-of-line marker was good enough for the folks writing the HTTP spec, it's good enough for me :)

February 20, 2013 20:05
I agree with @AnthonyCapone and @jer0enh, the whitespace parameter will probably be of use to you.

https://github.com/blog/967-github-secrets

February 20, 2013 20:35
Isn't it also possible that the problem is caused by the age-old tabs vs. spaces debate? We use Bitbucket, and some people here have Productivity Power Tools for VS 2012 installed, which helpfully offers to tabify files that you are working on. Since some of us are set to save files with tabs and others are set to save files with spaces (because it's the default for some strange reason), I've had more than a few pull requests in Bitbucket come through with the wall of pink.

I think online repos like github and bitbucket just need to be more intelligent with treating whitespace. I don't have these issues in Winmerge or Beyond Compare.
February 20, 2013 20:44
That is the archaic meaning of CR and LF. I doubt that there is any modern use for using two characters to indicate a newline--everyone always wants the text to begin one line down, at the left side.


As someone who's built a telnet server and online multi-player RPG, I have to disagree with the statement made above as posted. Terminal emulation is still used in many places where CR and LF have very different meanings.

That being said, I think if we dropped the blanket "everyone always wants" and instead said for source code repo purposes we could all agree on one standardized end of line character, most would be agreeable.
February 20, 2013 21:47
I second https://github.com/blog/967-github-secrets. I wrote an extension for Chrome to add &w=1 automatically. It's not published in Google's WebStore though because I don't get why I should pay US$5.00 for that.
February 20, 2013 22:12
This is actually a larger problem than just the visualization of changes to a repository. There are concerns here with index growth just based on inconsistent line endings.

This is normally not a problem for people who are all developing in the same environments, but this can become a much larger problem on projects where there are differing operating systems underneath the commits. With the rise of people working on Ruby and node in windows, this is an issue that more people need to be monitoring.

While it is very simple to add the proper .gitattributes file and re-normalize a repository, it's highly deterring to people wanting to contribute to be shot down only because of something as inconsequential as a line ending configuration.

February 20, 2013 22:17
For GitHub specifically, you can simply add ?w=1 to the end of the URL to have the diff ignore whitespace. This helps, but still isn't perfect. Also, you can't make inline comments in this "ignore whitespace" mode (I've submitted this as a feature request). Check out https://github.com/blog/967-github-secrets.
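If you'd rather not edit the address bar by hand, here's a small Python sketch that sets w=1 on a pull request URL while keeping any existing query parameters. The owner/repo/123 URL is a placeholder and the helper name is hypothetical; the parameter itself is the one described in the GitHub post linked above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def ignore_whitespace_url(url: str) -> str:
    """Return url with w=1 added (or overwritten) in its query string."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))  # keep any existing parameters
    query["w"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))

print(ignore_whitespace_url("https://github.com/owner/repo/pull/123/files"))
# https://github.com/owner/repo/pull/123/files?w=1
```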
February 20, 2013 23:20
@shanselman wish ya'd just make a declarative statement on what you think should be the convention/fix so we can just follow it and link to it. :) Yes the tech is broken, but no one taking a decisive stand on a proposed solution is the larger issue to be fixed first, then the tech tools, config defaults, and OS will eventually follow.
February 21, 2013 0:10
Unless you enjoy herding cats, the diff application should handle it. At work, my diff application is Araxis Merge and it includes options to ignore the line ending differences as well as ignore whitespace differences at the start of lines (for the tabs vs. spaces issue). Otherwise, you'll have to deal with programmers who use a variety of software that would all be required to follow whatever standard is chosen. If I saw your "Wall of Pink" and had no way to get the diff application to ignore the line ending differences, I would regard that as a failure of the diff application.
February 21, 2013 0:37
Hi Scott,

A bit off topic, but the code looks like it's open to SQL injection attacks?

unless of course, tablename, column and where were checked against schema.

cheers.
YS
February 21, 2013 1:51
Isn't it simpler to just do git config --global core.autocrlf true and forget about the problem?
BK
February 21, 2013 2:24
Hi Scott,

I'm not 100% sure that this fixes your problem, but Editor Config is a way of specifying this for your project.

You specify in a text file in the root of your project what should be used for line feeds as well as tabs and spaces.

There is a plugin for VS that works quite well. Other tools support it as well, for example Sublime Text.

Thanks,
Don
February 21, 2013 4:03
For some reason, this issue falls into the blind spot of otherwise intelligent people. Here's my tally of responses to this post so far:
  • "Why don't you just use my one-off personal fix to fix this just for yourself?" - 22
  • "I don't have a fix to propose but I'm commenting anyway" - 10
  • "I agree, we need a universal fix" - 6
That's less than 1 in 5 of your readers who "got it" and agreed that a bigger-picture fix is needed.

Sometimes it seems like the majority of programmers are narrow-minded "I have hammer! Bang!" individuals who rather enjoy smashing the same nail over and over again, not realizing that millions of others are also wasting time smashing the same exact nail, each in their own way.

I liked Jaime Buelta's best/good/reasonable breakdown. It turns out that "best" and "good" aren't actually reasonable in this case, causing people to despise the "reasonable" solution.

I also agree with Noah Coad that it would help if you made a declarative statement about the convention/fix. As evidenced by the responses to this post, this is a social/cultural problem as much as a technical problem.
February 21, 2013 4:13

There is no way a system will change its convention. The barriers are only partially technical.

Which system should switch? Browsing the comments, I see quite a few "Windows should ..."
Why? Why would an OS with 92% market share change its ways? What is wrong with the Windows (and HTTP) convention?

And btw, Mac used CR until OS X, it only moved to LF when it switched to a Unix kernel. And for a while it was quite painful, as various files used/required different conventions on the same machine.


The pragmatic solution would probably be a mixture:
* decent defaults ("auto") for git and GitHub (few people change the defaults if they are not bothered by them)
* a diff/merge tool that ignores line ending type by default
* maybe some kind of "normalize everything to my settings before showing it to me" option, also by default
February 21, 2013 4:22
When I install MSysgit, which is also part of GitExtensions, it offers me what sounds like a very nice default. It says it'll convert line endings to Unix/Mac format on commit, and convert them to Windows format on checkout / pull.

Msysgit is the "official" git for Windows. If all clients use the official offering like Gitextensions (and maybe TortoiseGit) do, we shouldn't have this problem.
February 21, 2013 5:30
On GitHub Pull Requests, just append ?w=1 to the URL and whitespace/line ending changes are ignored.
February 21, 2013 5:31
I just accessed your site with fiddler open in the background, and a message popped up warning me of a protocol violation accessing the url: http://www.assoc-amazon.com/e/ir?t=diabeticbooks&l=as2&o=1&a=0805091106

It says:

The Server did not return properly formatted HTTP Headers. HTTP headers should be terminated with CRLFCRLF. These were terminated with LFLF.

Must be ironic
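A sketch of the check Fiddler is effectively doing: HTTP/1.1 requires the header block to end with CRLF CRLF (though the spec lets recipients tolerate a bare LF). This Python function is hypothetical and only looks for the first blank line in a raw message.

```python
def header_terminator(raw: bytes) -> str:
    """Report how the header block of a raw HTTP/1.x message is terminated."""
    crlf = raw.find(b"\r\n\r\n")
    lf = raw.find(b"\n\n")
    if crlf == -1 and lf == -1:
        return "incomplete"            # no blank line seen yet
    if lf == -1 or (crlf != -1 and crlf < lf):
        return "CRLFCRLF"              # what the spec asks for
    return "LFLF"                      # what Fiddler flagged

print(header_terminator(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"))  # CRLFCRLF
print(header_terminator(b"HTTP/1.1 200 OK\nContent-Length: 0\n\n"))        # LFLF
```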
February 21, 2013 5:58
I love you guys but if you think this is solved by a diff tool, you've missed the point. This is a systemic defaults issue. Git needs better defaults and tools need to hide these issues. @Mihai gets it.
February 21, 2013 6:23
While it doesn't address the conflict at hand, I agree with the others that the responsibility lies with the pull-requestor to make their request not be Pink.
February 21, 2013 13:43
There are a couple of considerations here, as I see it:

As a repository maintainer...
- I don't want commits that change every line. What the diff tool is capable of ignoring is beside the point.
- I don't want a mixture of CRLF and LF in the same file. That's just untidy.
- I don't want some files with CRLF and some files with LF line endings. That's also untidy.

As a developer...
- I don't want to think about line endings
- I don't care whether a file has CRLF or LF endings
- I want my tools (be it editor or source control system) to handle line endings automagically for me

Most editors are quite happy to open a file with any style of line endings and go with the flow. I'm actually quite surprised to find that the Wall of Pink turns up at all. However, I have a suspicion: On Windows, people use msysgit, and msysgit asks a very difficult question during install: What do you want to do about line endings?

http://uncod.in/images/msysgit7.png

How on earth should you know what to do here? There is a valid argument for all three options.

Personally, I would go for the option to check out as is, check in normalized LF. Editors should deal with files containing just LF and do the same when you edit the files, while Git should detect that you are trying to commit a text file with CRLF and normalize that to LF.
February 21, 2013 15:23
@Ove Gram Nipen

I know exactly what you mean with msysgit, and I always have to take a break and think when I see it :-)

==== Rant part, not targeted at @Ove =====

The clean thing (I think) is to "check-out using platform preferred style" and "I don't give a dime how you check-in". If git decides to convert all line endings to the Unicode line separator (U+2028) or something else, I don't care, as long as it is done consistently. A bit like the libraries dealing with TCP/IP, where you don't get to "see" the endianness that the TCP/IP headers use on the wire.

That would be the cleanest thing. There are some exceptions though, here are some examples:
* when you deal with other "smart" tools (for instance cygwin will ask you at install time what line ending to use, and if you choose UNIX, then some of the tools will choke on the Windows native conventions)
* sometimes this might break unit testing, if your unit test produces files that you compare in binary mode with reference files from source control

Also, you don't want to convert what looks like new-lines inside a .jpg file :-) So you need control by type, with local overrides at file level if the developer wants that.
This is something that other tools (e.g. Perforce) solved long ago. Maybe because it was designed with cross-platform in mind, because they make money from it, and because they don't come with the "we are right, Windows should change" attitude. Things are what they are: just think about it and solve it, don't pass the buck to thousands of developers by asking silly questions.
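Here's a rough Python sketch of that "control by type" idea, done by sniffing content instead of file extensions. It assumes a simple heuristic, similar in spirit to what Git does, that a NUL byte near the start means binary; the function names and the 8000-byte window are illustrative assumptions. PNG makes the danger vivid: its 8-byte signature deliberately contains 0D 0A, so converting it would corrupt the file.

```python
def looks_binary(data: bytes, sniff: int = 8000) -> bool:
    """Assumed heuristic: a NUL byte near the start means binary content."""
    return b"\x00" in data[:sniff]

def normalize_if_text(data: bytes) -> bytes:
    """Convert CRLF to LF for text, but never touch binary content."""
    if looks_binary(data):
        return data
    return data.replace(b"\r\n", b"\n")

# The real PNG signature plus the start of its first (IHDR) chunk.
png_start = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"

print(normalize_if_text(png_start) == png_start)  # True: left alone
print(normalize_if_text(b"a\r\nb\r\n"))           # b'a\nb\n'
```

A heuristic can guess wrong, which is why per-file overrides (like the .gitattributes entries discussed above) still matter.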

February 21, 2013 15:50
Make the diff tool CR/LF aware and "do the right thing" like desktop diff tools that can ignore line ending issues.

I think you're asking why you've never tried using Beyond Compare as your diff tool. You won't use the built in tools again after you have and no, I don't work for them.
February 21, 2013 15:51
1. why is this a problem in the first place? more than 50 comments and nobody had an argument why this (having to distinguish between crlf/lf) is a useful thing!
2. is crlf a skeuomorphism?
3. @Anthony Capone: the secret feature for github needs ?w=1 as parameter, not w=t
4. sometimes i lie awake at night and dream about the idea that someone sometime fixes these kind of problems (utf vs. ascii, bom, cr/lf, ...). at least i can dream it
5. i am with @mihai:
The pragmatic solution would probably be a mixture:
* decent defaults ("auto") for git and GitHub (few people change the defaults if they are not bothered by them)
* a diff/merge tool that ignores line ending type by default
* maybe some kind of "normalize everything to my settings before showing it to me" option, also by default
February 21, 2013 17:56
Pragmatic and easy short-term solution: configure your tools and be responsible -- always check your diff before you push. This is primarily a human problem; thinking about line endings is just one detail in a whole list of things people don't bother to do when changing code, albeit a detail for which there might be a long-term systematic solution.

<rant>Also, anyone who believes that Git is an "arcane version control system" and that command line tools are obsolete/implicitly hard to use should get out of the business, right now. A pointing device is not a tool you _want_ to use in development, it's for games and badly designed web sites/applications.</rant>
February 21, 2013 21:36
+1 for this being a whitespace issue, not really a crlf/lfcr/lf issue...
February 21, 2013 23:52
Dan Ludwig: touche!
February 22, 2013 0:10
We had a problem where the proposed solution of defaults actually ended up breaking our systems:

We're building an app on Windows, but one of the systems we talk to is written in Java, presumably from a more *nix-based world. The encryption config files it uses have LF line endings. When the build server pulled those files out of git it changed the line endings, and the library couldn't read the files any more.

Whilst the library was clearly at fault, most of the solutions proposed here, which change the source code as you commit and pull it, would have left us in a bad state if we couldn't override them somehow.
February 22, 2013 2:46
A problem with converting line-endings on checkout is that the checksums won't match. So you're debugging C++ in VS, and VS tells you that the .cpp doesn't match.
February 22, 2013 6:12
Though I agree that automatic normalization, possibly at the repo level, would be great, the underlying issue is that we are all looking at diffs line by line, and there are valid cases where a character-level diff within a line would be superior. It's also ugly to see the same thing happen when converting tabs to spaces, or with other minor formatting changes in code that is not strict about types of whitespace.
February 22, 2013 19:18
Hey Scott,

I know this problem only too well and completely agree ...

You could fork the Git source, find and replace all CRLF with LF (or vice versa), then check it back in again.

See how the git guys handle that ... lol.

Probably not worth wasting your time on it though ... smarter tooling is definitely the way to go ;)
February 23, 2013 2:16
I think if LF were treated as pure line feed and CR were treated as pure carriage return a consensus on CRLF would be reached quicker.
February 23, 2013 12:33
@Kevin Smyth
A problem with converting line-endings on checkout is that the checksums won't match. So you're debugging C++ in VS, and VS tells you that the .cpp doesn't match.


If the conversion happens at check-in and checkout, then VS will not notice anything (it was CRLF before check-in, and it is CRLF after checkout). It can be a problem if the checksum is verified cross-platform (compute the checksum on Windows with CRLF, submit, check out on UNIX with LF, and then verify the checksum).

But I have never seen such a workflow. And I would probably hate to work with it :-) Why would I need a checksum that has to be kept in sync with my sources?
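The cross-platform case is easy to demonstrate. A checksum is computed over bytes, so the same source text hashes differently with CRLF and with LF line endings. Here is a minimal Python sketch. SHA-1 is an arbitrary choice and is not meant to be the checksum Visual Studio uses. `normalize_eol` is a hypothetical helper.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def normalize_eol(data: bytes) -> bytes:
    """Hypothetical helper: turn CRLF and lone CR into LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


windows_copy = b"int main() {\r\n    return 0;\r\n}\r\n"
unix_copy = b"int main() {\n    return 0;\n}\n"

# Same text, different bytes, so the raw checksums disagree...
print(sha1_hex(windows_copy) == sha1_hex(unix_copy))  # False
# ...but they agree if both sides normalize line endings before hashing.
print(sha1_hex(normalize_eol(windows_copy)) == sha1_hex(normalize_eol(unix_copy)))  # True
```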
February 23, 2013 12:44
@Zhaph - Ben Duguid
We had a problem where the proposed solution of defaults actually ended up breaking our systems:

...

Whilst the library was clearly at fault, most of the solutions proposed here (changing the source code as you commit and pull it) would have left us in a bad state if we couldn't override them somehow.


This is why version control should allow the file type to be overridden at the file level, like Perforce does.
Perforce has "text files" (where line endings get converted) and "binary files" (where line endings stay untouched: just a bunch of bytes).
It could be smarter, for instance binary, textCrLf, textCr, textLf, but the idea is the same: allow for a file-level override.

On top of decent defaults, of course.
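A file-level override on top of a default can be sketched in a few lines of Python. The policy table, the type names (borrowed from the comment above) and the `checkout` function are all hypothetical. The sketch assumes the repository stores text files with LF. Git's real per-path mechanism is the `.gitattributes` file, with attributes such as `text`, `eol=lf`, `eol=crlf` and `binary`. This code does not reproduce Git's exact rules.

```python
from fnmatch import fnmatch

# Hypothetical per-file policy, in the spirit of Perforce file types.
# The first matching pattern wins; the final "*" entry is the repository default.
POLICY = [
    ("*.png", "binary"),    # never touch the bytes
    ("*.conf", "textLf"),   # e.g. config read by a *nix/Java library
    ("*.bat", "textCrLf"),  # Windows batch files
    ("*", "text"),          # default: the platform's native line ending
]

EOL_FOR_TYPE = {"textLf": b"\n", "textCrLf": b"\r\n", "textCr": b"\r"}


def file_type(path: str) -> str:
    for pattern, kind in POLICY:
        if fnmatch(path, pattern):
            return kind
    return "binary"  # fallback if the table has no catch-all entry


def checkout(path: str, stored: bytes, native_eol: bytes) -> bytes:
    """Turn LF-normalized repository content into working-copy bytes."""
    kind = file_type(path)
    if kind == "binary":
        return stored
    eol = EOL_FOR_TYPE.get(kind, native_eol)
    return stored.replace(b"\n", eol)
```

In this sketch, a Windows build server (`native_eol=b"\r\n"`) would still hand the `.conf` files to the Java library with plain LF. Ordinary source files would get CRLF.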

==========

@DaveWill
I think if LF were treated as pure line feed and CR were treated as pure carriage return

++1;
Because they are. I have seen (and done it myself) CR used without LF to go back to the beginning of the line and overwrite stuff (for a progress report).
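That progress-report trick looks like this in Python. It is a minimal sketch, and the function name is illustrative. On a terminal, each `\r` moves the cursor back to the start of the line, so the next write overwrites the previous percentage. Only the final `\n` moves to a new line.

```python
import io
import sys


def report_progress(total: int, out=sys.stdout) -> None:
    """Redraw one status line using CR alone (no LF) between updates."""
    for done in range(1, total + 1):
        # "\r" returns to column 0 without advancing a line, so this write
        # overwrites the previous percentage on a terminal.
        out.write(f"\rProgress: {done * 100 // total:3d}%")
        out.flush()
    out.write("\n")  # only now advance to a fresh line


report_progress(4)

# Captured as raw text, every intermediate state is still there,
# separated by bare CRs rather than by line breaks.
buf = io.StringIO()
report_progress(4, buf)
print(repr(buf.getvalue()))
```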
February 23, 2013 16:32
The issue here is the Git installation default on Windows, which lets you commit CR/LF but stores it as LF; when you check out elsewhere *and* have changed the Git defaults, you don't get the CR/LF transformation.

If everyone used "commit as is, checkout as is" this problem would never happen as all Windows devs would use CR/LF and Git would store it as such.
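It helps to spell out what Git's `core.autocrlf` setting does. The rough picture: `true` converts CRLF to LF on commit and LF to CRLF on checkout. `input` converts only on commit. `false` ("commit as-is, checkout as-is") converts nothing. The Python model below is simplified and the function names are hypothetical. It ignores Git's binary detection, `core.safecrlf` and `.gitattributes`. It keeps one real detail: on checkout, Git leaves a blob alone if it already contains CR.

```python
def on_commit(worktree: bytes, autocrlf: str) -> bytes:
    """Bytes stored in the repository for a text file (simplified model)."""
    if autocrlf in ("true", "input"):
        return worktree.replace(b"\r\n", b"\n")  # CRLF -> LF on the way in
    return worktree                              # "false": stored as-is


def on_checkout(stored: bytes, autocrlf: str) -> bytes:
    """Bytes written to the working tree (simplified model)."""
    # Roughly like Git: only convert if the blob has no CR already.
    if autocrlf == "true" and b"\r" not in stored:
        return stored.replace(b"\n", b"\r\n")   # LF -> CRLF on the way out
    return stored                               # "input"/"false": as-is


windows_file = b"one\r\ntwo\r\n"

stored = on_commit(windows_file, "true")
print(stored)                         # b'one\ntwo\n'
print(on_checkout(stored, "true"))    # b'one\r\ntwo\r\n' (Windows dev)
print(on_checkout(stored, "input"))   # b'one\ntwo\n' (everyone else)

# "Commit as-is, checkout as-is" keeps CRLF in the repository itself.
print(on_commit(windows_file, "false"))  # b'one\r\ntwo\r\n'
```

The mismatch in the comment above corresponds to one side committing under `false` while the other side expects LF in the repository.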
February 23, 2013 16:39
Plus, having this transformation by default is very annoying for developers using Windows, as your code will use CR/LF but a large number of the libraries you include use LF (JavaScript frameworks, for example), and you get a bunch of warnings from Git because of that.
February 24, 2013 4:03
TortoiseSVN and TortoiseGit have options to ignore whitespace changes. Especially for OSS pull requests, users can use this feature to save the author the time of checking lots of pull requests like these.
February 24, 2013 14:15
Screw all of that, lets change all tools to use the <br/> tag instead :)
February 25, 2013 17:19
So... you are a developer and you use "Windows"? o_O

Anyway, I feel it's a configuration problem. I like GitHub's solution.
February 26, 2013 18:10
In 40 years we still can't agree on a text file format. So what hope is there for HTML? Mankind is doomed!
February 27, 2013 0:59
I know the feeling, luckily some software does support the "ignore line endings" flag.

VS2010, however, keeps insisting on converting my coworkers' line endings every time I open a file... :)
March 01, 2013 0:08
I don't think this is a particularly difficult issue to solve. The code repository could simply include a check-in approval process that has an option to ignore line endings or treat them all the same (CR=LF=CRLF) while viewing changes prior to approval. Same for the other white space, it could have an option to ignore it or treat it as equivalent (4 spaces = tab and so on). Finally, there could be an option to normalize the line endings and white space prior to approving the check-in.

All of these options could be presented automatically as part of the check-in and approval workflow, unless they have specifically been set as defaults by the repository owner. You could even have the option to normalize everything to the repository defaults on check-in by the developer. You could maybe even detect the code editor default on check-out, normalize to the editor settings, and then automatically revert to the repository default on check-in. None of these things are difficult features to implement, and many of us say good software would do a much better job of abstracting everything to make the user experience painless.

But this really all goes back to an old difference of ideology between UNIX and Windows, and Git has certainly gone with the UNIX ideology, giving you maximum flexibility with the tradeoff of minimum abstraction. This is the very reason I just don't use Git. I prefer the abstraction provided by my code repository of choice over the flexibility of Git. Unlike the Windows API battle, which appears to have finally declared a winner, I don't expect to see a clear winner anytime soon among code repository architectures. We've had this difference in ideology for at least the past thirty years. If, like me, you prefer the abstraction, it's probably not a good idea to use software built with the UNIX ideology, like Git.

Now, I could be completely missing something, but I believe there is an organization working on exactly what you're asking for Git to implement in their repository solution. Someone finally realized that it's actually pretty simple to integrate the abstraction based architecture with the other architecture. There are others that have discovered this same thing and turned it into an opportunity.
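The "treat as equivalent" review option from the comment above (CR = LF = CRLF, tab = 4 spaces) amounts to comparing texts after normalizing them. A minimal Python sketch follows, with a tab width of 4 assumed and hypothetical function names. Note that `str.expandtabs` expands to the next tab stop, so a tab equals exactly four spaces only at the start of a line or after a multiple of four characters.

```python
def normalize(text: str, tab_width: int = 4) -> str:
    """Canonical form for review: LF line breaks and tabs expanded to spaces."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # CR = LF = CRLF
    return "\n".join(line.expandtabs(tab_width) for line in text.split("\n"))


def equivalent(a: str, b: str) -> bool:
    """True if two versions differ only in line endings and tabs vs. spaces."""
    return normalize(a) == normalize(b)


windows_tabs = "if x:\r\n\treturn 1\r\n"
unix_spaces = "if x:\n    return 1\n"
print(equivalent(windows_tabs, unix_spaces))  # True
print(equivalent("a\n", "b\n"))               # False
```

The same `normalize` output could double as the "normalize to the repository defaults on check-in" step. Whether that is appropriate depends on the repository's conventions.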
March 14, 2013 2:57
Then the bastards at IBM who moved to ASCII from EBCDIC and needed a carriage return when punch-cards fell out of favor.


Actually, EBCDIC differentiates between the control characters CR (carriage return), LF (linefeed) and NL (new line) at 0x0D, 0x25 and 0x15 respectively.

Myself, I use Beyond Compare and dial in the whitespace importance I want. One of the settings for whitespace importance is 'Compare Line Endings (PC/Mac/Unix)'. Unchecked, it treats them all as the same and pays them no nevermind, rather like the CLR does.
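Python ships a codec for EBCDIC code page 037 (US/Canada), named `cp037`, which makes it easy to check the three distinct EBCDIC control codes mentioned above. In this code page's standard Unicode mapping, EBCDIC NL corresponds to Unicode NEL (U+0085). Some EBCDIC conversion tables swap the NL and LF mappings, so treat this as one convention rather than the only one.

```python
# Encode CR, LF and NEL with Python's EBCDIC code page 037 codec.
for name, ch in [("CR", "\r"), ("LF", "\n"), ("NL/NEL", "\x85")]:
    ebcdic = ch.encode("cp037")
    print(f"{name:7} U+{ord(ch):04X} -> EBCDIC 0x{ebcdic.hex().upper()}")
```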
March 15, 2013 0:53
Solution: Everyone should always retool every line ending to CRLF. If anything breaks, file a bug against whatever broke.

CR != LF. They mean *utterly* different things.

The biggest complaint I see against 'CR' is 'Well, we don't use typewriter carriages.' So then say 'Cursor Return,' if you want. It doesn't matter. If I ask for an LF at Col 47, then output 20 more characters, I should be able to assume I'm on Col 67, not Col 20.
K
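The column arithmetic above can be checked with a tiny model of an idealized teletype where CR and LF are "pure". Here CR only returns to column 0, and LF only moves down one row while keeping the column. This is a hypothetical sketch. Real Unix terminals usually don't behave this way by default, because the tty driver's `ONLCR` output setting turns each LF into CR LF.

```python
def cursor_after(text: str, start: tuple[int, int] = (0, 0)) -> tuple[int, int]:
    """Track (row, col) where CR and LF each do exactly one thing."""
    row, col = start
    for ch in text:
        if ch == "\r":
            col = 0      # carriage return: back to column 0, same row
        elif ch == "\n":
            row += 1     # line feed: down one row, same column
        else:
            col += 1     # printable character: advance one column
    return row, col


print(cursor_after("x" * 47 + "\n" + "y" * 20))    # (1, 67)
print(cursor_after("x" * 47 + "\r\n" + "y" * 20))  # (1, 20)
```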
April 16, 2013 10:29
Next up - normalizing on tabs instead of spaces :D

But newlines are a very specific problem, because in most cases they're super easy to detect, and they only affect one when they aren't handled properly. Otherwise they're completely transparent. That's why I see a chance to make things right someday (when everyone finally agrees on \n).

Frankly, I would like to see a common configuration file for all IDEs and all platforms that specifies common formatting for everything and the particularities of each language/format.
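The point that line endings are easy to detect holds up: a few byte counts are enough to classify a file. Here is a short Python sketch with illustrative function names. It is the kind of check an editor or a pre-commit hook could run.

```python
from collections import Counter


def line_ending_counts(data: bytes) -> Counter:
    """Count CRLF, lone LF and lone CR terminators in raw file bytes."""
    crlf = data.count(b"\r\n")
    return Counter({
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # LFs not preceded by CR
        "CR": data.count(b"\r") - crlf,  # CRs not followed by LF
    })


def describe(data: bytes) -> str:
    counts = +line_ending_counts(data)  # unary + drops zero counts
    if not counts:
        return "no line breaks"
    if len(counts) == 1:
        return next(iter(counts))
    return "mixed: " + ", ".join(f"{k}={v}" for k, v in counts.most_common())


print(describe(b"a\r\nb\r\n"))     # CRLF
print(describe(b"a\r\nb\nc\n"))    # mixed: LF=2, CRLF=1
```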
April 18, 2013 2:32
"Git is FTP for code."

Part of the rub being that FTP *had* an ASCII transfer mode to mediate the CR-CRLF conflict, yes?

:)
April 19, 2013 14:14
It seems SemanticMerge solves this in an entirely different way, with a nice intro by Jon Skeet.
September 14, 2013 2:22
By the way, to compound the problem, normalizing line endings using .gitattributes is broken on Git for Windows (see Issue #57). They have, however, taken a fix and it should be resolved in version 1.8.4.


Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.