Scott Hanselman

Cloud Database? NoSQL? Nah, just use CSVs and CsvHelper

September 02, 2017 Comment on this post [35] Posted in Open Source

KISS - Keep it Simple, Stupid. While I don't like calling people stupid, I do like to Keep it Super Simple!

I was talking to Jeff Fritz on my team about a new system we're architecting. I suggested CosmosDB or perhaps Azure Table Storage. Then we considered the amount of data we were storing (less than 100 megs) and Jeff said...let's just use CSV files and CsvHelper.

First I was shocked. SHOCKED I SAY.

Then I was offended.

But finally I was like...hey, that's a good idea.

A fine idea, in fact. Why use more moving parts than needed? Sure, we could use XML or JSON, but for our project we decided that rather than even bother with an admin site, we'd use Excel for administration! It edits CSV files nicely, thank you very much.

Can you parse CSV files yourself? Sure, but it'll start getting complex as you move between data types, think about quotes, deal with headers, whitespace, encoding, dates, etc. CSV files can be MUCH more complex and subtle than you'd think. Really.

Here's what CsvHelper can do for you:

var csv = new CsvReader( textReader );
var records = csv.GetRecords<MyClass>();
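
A fuller sketch of the read side, assuming a hypothetical people.csv on disk (note that GetRecords is lazy, so materialize the results before the reader is disposed):

using System.IO;
using System.Linq;
using CsvHelper;

using( var textReader = File.OpenText( "people.csv" ) ) // hypothetical file name
using( var csv = new CsvReader( textReader ) )
{
    // GetRecords streams lazily, so ToList() pulls everything while the file is still open
    var people = csv.GetRecords<Person>().ToList();
}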

Either way, you get back a sequence of your class - if your class's structure maps 1:1 with your CSV file. If not, you can map your class onto the columns of the CSV file.

public sealed class PersonMap : CsvClassMap<Person>
{
    public PersonMap()
    {
        Map( m => m.Id );
        Map( m => m.Name );
        References<AddressMap>( m => m.Address );
    }
}

public sealed class AddressMap : CsvClassMap<Address>
{
    public AddressMap()
    {
        Map( m => m.Street );
        Map( m => m.City );
        Map( m => m.State );
        Map( m => m.Zip );
    }
}
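
Once a map is defined, you register it on the reader's configuration before reading. A quick sketch against the CsvHelper 2.x-era API shown in this post (check the docs for your version):

csv.Configuration.RegisterClassMap<PersonMap>();
var people = csv.GetRecords<Person>();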

And finally, just want to export a CSV from an IEnumerable that mirrors what you want? Boom.

var csv = new CsvWriter( textWriter );
csv.WriteRecords( records );
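
As with reading, the TextWriter can come from anywhere. A minimal sketch writing to a hypothetical output file:

using( var textWriter = File.CreateText( "people.csv" ) ) // hypothetical output path
using( var csv = new CsvWriter( textWriter ) )
{
    csv.WriteRecords( records );
}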

Or do it manually if you like (hardcode some, pull from multiple sources, whatever):

var csv = new CsvWriter( textWriter );
foreach( var item in list )
{
    csv.WriteField( "a" );
    csv.WriteField( 2 );
    csv.WriteField( true );
    csv.NextRecord();
}

It won't replace SQL Server, but it may just replace one-table SQLite databases and "using a JSON file as a database" for some of your smaller projects. Check out CsvHelper's site and excellent docs, along with the CsvHelper GitHub repo at https://github.com/JoshClose/CsvHelper.


Sponsor: Check out JetBrains Rider: a new cross-platform .NET IDE. Edit, refactor, test and debug ASP.NET, .NET Framework, .NET Core, Xamarin or Unity applications. Learn more and download a 30-day trial!

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

September 04, 2017 2:45
If the goal was read/write to Excel - Excel has full REST endpoints on Microsoft Graph, so you don't even need to write your own endpoints.

September 04, 2017 3:07
Love CsvHelper. I recently used this on a project to import/migrate some data. It was great. And Kimmy Schmidt is uber funny.
September 04, 2017 6:52
I had a website tracking 200k+ product prices on Amazon. I first used SQL but ran out of the 2GB space limit on my host; this was before the cloud era and hosting was expensive. So I switched to text files and got pretty good performance out of it.
September 04, 2017 9:24
We use CSV for margin calculation (a ruleset) on customer prices. First we used Azure Table Storage, but that was more complex to access, use, and maintain than a CSV file. It now works like a charm. The ruleset doesn't exceed a size limit of 200 entries, so this was a KISS decision that lets a suit-wearer make changes with a tool he already knows.
September 04, 2017 9:34
Another great option for CSV handling - filehelpers.net.
Its ORM implementation is great, a better interface than CsvHelper IMO (read/write entire records, and very customizable).
September 04, 2017 10:23
Hi,

There is something wrong with your certificate. When I enter the page by using this URL: https://feeds.hanselman.com/~/456706218/0/scotthanselman~Cloud-Database-NoSQL-Nah-just-use-CSVs-and-CsvHelper.aspx there is a certificate warning (Could not verify identity feeds.hanselman.com) and the certificate is for *.feedblitz.com

BR,
Szymon
September 04, 2017 10:29
If CSV is an option then any other DB was never really an option.
September 04, 2017 11:18
I wonder whether you considered the drawbacks of the CSV format, because you did not mention any of them. The encoding and the separator being used are specific to a machine's configuration, so reading, writing, or editing CSV in Excel might be broken on machines other than your own. The default encoding is not Unicode, so there might be a problem storing some characters in a CSV file.
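
CsvHelper does let you pin both of those down explicitly instead of relying on machine defaults - a sketch against the 2.x-era configuration API (the file name and delimiter choice are illustrative):

using System.IO;
using System.Text;
using CsvHelper;

// Explicit encoding and delimiter, so results don't depend on regional settings
using( var reader = new StreamReader( "data.csv", Encoding.UTF8 ) )
using( var csv = new CsvReader( reader ) )
{
    csv.Configuration.Delimiter = ";"; // e.g. a file written on a semicolon-separator locale
    var records = csv.GetRecords<MyClass>();
}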
September 04, 2017 13:08
Whilst this article is about using CSV as the data store, I've also found that CsvHelper is fantastic for loading data into SQL Server too.

Read the data using CsvHelper, process the stream of data using ordinary C# code and bulk load it into SQL Server.

The bulk copy interface doesn't work with IEnumerable<T> though and expects a data reader. The code linked below converts an IEnumerable<T> back into an IDataReader.

https://github.com/jsnape/deeply/blob/master/src/Deeply/EnumerableDataReader.cs or
https://gist.github.com/jsnape/56f1fb4876974de94238
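
For context, the shape of that pattern - a sketch where EnumerableDataReader is the class from the links above (its exact constructor may differ; the connection string and table name are hypothetical):

using System.Data.SqlClient;

var connectionString = "Server=.;Database=Staging;Trusted_Connection=True;"; // hypothetical

// Stream records out of the CSV with CsvHelper...
var records = csv.GetRecords<Person>();

// ...then wrap them as an IDataReader and bulk load them
using( var reader = new EnumerableDataReader<Person>( records ) ) // from the linked source
using( var bulk = new SqlBulkCopy( connectionString ) )
{
    bulk.DestinationTableName = "dbo.People"; // hypothetical table
    bulk.WriteToServer( reader );
}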
September 04, 2017 13:13
I first came across CsvHelper a few years ago; it was the de facto way of accessing CSVs at a client because the implementation made mapping to objects super easy and it just worked. If you go back and read Robert Martin's book, he mentions that with FitNesse they kept to text files instead of a database after not having enough data to warrant the extra complexity. I like this idea. Too often we rush in without considering the requirements of a small system, and the database is usually one of the first things we decide upon; perhaps we should really decide if we need it at all.
September 04, 2017 14:43
Hmmm, I imagine that you have considered multithreading on the files, right? How do you manage that? And do you write the whole file every time you update something in an object? How did you manage these things?

I ask because I found this so interesting. KISS is the rule.
September 04, 2017 15:40
For something simple like this, I would default to using EF Core with SQLite. Easy to set up with just some classes, plenty of simple tools for doing admin on the database, and always scope to move to something more substantial in the future without much rework.

I use CSV when I'm forced to, as people tend to open it in Excel and then break the formatting (e.g. adding commas at the end of lines), and they generally have no idea that it's a text file.
September 04, 2017 16:27
For a tiny site, how would you feel about storing user accounts with (properly hashed) passwords in text (csv, json, XML, whatever) files? 🤔
September 04, 2017 18:56
@Torleif how is that any less secure than storing them in a database?
September 04, 2017 19:29
@Torleif How do you feel about storing user accounts with (properly hashed) passwords in database files?
September 04, 2017 20:36
A better option would be SQLite with EF Core. It is just as easy, if not easier, to implement and your data access is more future-proof. Besides that, you cannot index a CSV file or query a subset of your data.
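
For reference, a minimal sketch of that setup (assuming the Microsoft.EntityFrameworkCore.Sqlite package; type and file names are illustrative):

using Microsoft.EntityFrameworkCore;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class AppDbContext : DbContext
{
    public DbSet<Person> People { get; set; }

    protected override void OnConfiguring( DbContextOptionsBuilder options )
        => options.UseSqlite( "Data Source=app.db" ); // a single local file, like a CSV, but indexed and queryable
}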
September 04, 2017 23:54
Usually my go-to for little data is Newtonsoft JSON. Can anyone elaborate on the technical pros of CSV in contrast to, e.g., JSON? The only thing I can think of is file size. Tbh I might be a bit narrow-minded, so please enlighten me! :)
September 05, 2017 0:21
Just be sure to abstract out the data access so you can replace it when the requirements change next week.
September 05, 2017 1:45
@James Snape - just had a look at your code. Thanks for that - something to keep in the back pocket for a rainy day!

CsvHelper has indeed been quite helpful. We use it to parse data from MIMS (a medication information database in Australia and other places) for use in our software. They release a new set of files monthly, so our app just parses the files as they come in, maps them over to useful objects, and calls the job done.
September 05, 2017 5:51
@Torleif @OwenO The biggest difference is that databases give you ACID guarantees out of the box, while flat files don't. We can't tell if CSV is a good option for their project since we don't know any details about the project they talked about in the post.

But anyway, I agree we should KISS. Storing data in a CSV file is not that bad in some circumstances.
September 05, 2017 11:27
@Danny Chen

Yes, I assumed @Torleif was insinuating there was an issue from a security perspective though; my point was, how is it any less secure to store hashes in a CSV file than a database?
September 05, 2017 16:20
And you can even use it from Cake: https://www.nuget.org/packages/Cake.CsvHelper
September 05, 2017 17:24
I don't know really. Somehow it feels more secure to have something in a database than in flat files, but on the other hand I'm not really sure that feeling is warranted. And maybe it actually is more secure, but maybe flat files are still secure enough.

As to read/write/sync/etc, that's definitely something of value in databases. But yeah, that's why I specified "tiny". Not really an issue when the site will have a maximum of 30 users, and not really anything happening other than logging in and out.
September 05, 2017 17:50
Ignore what techies tell you about JSON, XML, YAML, SQL, NoSQL - CSV is the most basic and universal data standard. It is simple, works everywhere, and can be implemented by any client trivially. I wish more people promoted and understood this. We send and receive hundreds of data files with clients each day, and I always insist that they use CSV rather than their custom-formatted Excel du jour. It certainly does not handle every use case, but it's great for sending and receiving data.

CsvHelper is great, but I wish they would add Async variants for the methods already and stop walking around the issue. https://github.com/JoshClose/CsvHelper/issues/202

@Pato You could have an encoding issue with *any* type of file or data. Not just CSV.

@MarkAdamson Sounds like you're doing something wrong - Excel does not let you "break" CSV files by saving commas at the end of contents. It'll quote the contents if your content ends with a comma, as it should. It would be a pretty $hitty spreadsheet application if it didn't work with CSV files. Is your file extension .csv?

Sam
September 05, 2017 18:45
I use LiteDB, which is a simple NoSQL db stored in a .db file that I put in app_data for small cloud sites. http://www.litedb.org/ It has OWIN authentication Identity middleware and everything; it's pretty damn good.
September 05, 2017 18:47
I can see the advantages on a small project, perhaps a non-profit with a site that might need some configurable options in memory. Obviously storing credentials in this format is problematic, but for simple configuration that doesn't contain sensitive data, CSV would make more sense to a non-technical layperson than JSON or a SQL datastore. Almost everyone can manipulate a spreadsheet these days.
September 05, 2017 22:09
CSV is great for small-ish tabular or list data. It is, in that sense, almost exactly the same topology as old-school INI files, except a lot more tools support CSV.

One place CSV (and INI) fall down is any kind of hierarchy and references. As soon as you introduce those, you end up kludging together all sorts of monstrosities to make it work. For those purposes (and again, small-ish projects) I use JSON blobs.

Of course, as soon as data requirements get "large" (for some definition of that word), and I need indexing, multi-thread read/write updates, transactions, etc. it is time to move to a real database of some sort.

But for small-ish data requirements: zipped up CSVs and simple JSON blobs are a great choice.
September 06, 2017 15:18
Sam: You can have an encoding issue with any text file only if you don't specify what encoding is being used. And that is exactly the problem with CSV files that "can be" also edited in Excel. Applications often expect UTF-8 encoding (which is the best choice for text files), but Excel will use the system encoding, which is not the same on all machines and, most importantly, is never UTF-8. There is a similar issue with the separator: it can be a comma on your machine and a semicolon on mine. This makes CSV a horrible choice for any data exchange that is not completely under your control.
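
One common mitigation when you control the writing side (a sketch): emit a UTF-8 byte order mark, which Excel uses to detect UTF-8.

using System.IO;
using System.Text;
using CsvHelper;

// new UTF8Encoding( true ) writes a BOM at the start of the file;
// records is whatever collection you're exporting
using( var writer = new StreamWriter( "export.csv", false, new UTF8Encoding( true ) ) )
using( var csv = new CsvWriter( writer ) )
{
    csv.WriteRecords( records );
}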
September 07, 2017 1:07
Looks like a good library, but it should be easier to define other ways to convert, say, decimals.

I had to write a mapping class just because I had two decimal fields, and I also needed to turn all fields into properties.

Neither this blog nor the documentation states that one needs to call AutoMap() before mapping the individual fields; I was hoping that the auto map CsvHelper creates automatically would be applied before my map.
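
A sketch of that pattern, based on the 2.x-era class-map API (the Product type is hypothetical, and the TypeConverterOption overload may vary by version - check the docs):

using System.Globalization;

public class Product // hypothetical type
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public sealed class ProductMap : CsvClassMap<Product>
{
    public ProductMap()
    {
        AutoMap(); // map all properties by convention first
        // ...then override only the decimal field that needs a specific culture
        Map( m => m.Price ).TypeConverterOption( CultureInfo.GetCultureInfo( "de-DE" ) );
    }
}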

But an attribute to handle other decimal separators would be nicer (at least for my use case).

Otherwise, super library.
September 07, 2017 13:20
I always get into problems with Unicode when using CSV. Excel doesn't support CSV with Unicode encoding AFAIK.
September 07, 2017 17:49
One of the real failings of the csv format is that people don't know how to handle whitespace correctly.
September 07, 2017 18:44
I do this often for small utilities. At least for trivial cases, you can make it even KISSier by just reading input line by line and applying string.Split() with the correct options. For writing, just use string.Join() and write line by line. No need for any lib.

And there is a catch: many cultures use ',' as the decimal separator, so be careful when you have columns containing floats.
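
A sketch of that approach with the culture pitfall handled (invariant culture for floats; note this naive split has no quoting support, so it only works when fields can't contain commas):

using System.Globalization;
using System.IO;
using System.Linq;

// Write: format floats with InvariantCulture so '.' is always the decimal separator
var line = string.Join( ",", new[] { 1.5, 2.25 }.Select( v => v.ToString( CultureInfo.InvariantCulture ) ) );
File.WriteAllText( "data.csv", line );

// Read: parse with the same culture, regardless of the machine's locale
var values = File.ReadAllText( "data.csv" )
    .Split( ',' )
    .Select( s => double.Parse( s, CultureInfo.InvariantCulture ) )
    .ToArray(); // [ 1.5, 2.25 ]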
September 08, 2017 0:27
Don't disagree with the sentiment here at all, but one advantage of even a small RDBMS must be basic ACID compliance. This may or may not matter in any given scenario, though I would say that even with only 100MB of data it might.
September 09, 2017 23:01
I believe CSV or any simple text-delimited flat file has its place, but I don't think it's wise to use it to store domain-level data. In most applications you'll have some combination of data including configuration, logging, import/export, and domain-level. Flat files are best used in EDI import/export situations where information is to be loaded into other systems. Flat files are also useful in places where users can set parameters on an application's behavior, such as configuration files, or for logging errors in log files that can be easily read by users. Other than that, it makes much more sense to put your domain-level data into a storage system that is designed for efficient insertion and querying of data, even if it's a simple database library like SQLite or a remote NoSQL web service. IMHO there is little reason to store domain-level data in a text file given the variety of efficient data storage options today.
September 10, 2017 19:21
We tend to over-engineer very often. I had the same reactions as you (with all the funny parts of it) when I read "Agile Principles, Patterns, and Practices in C#" by Robert C. Martin. He said something like: don't use a database. Start simple, as simple as possible. You just need to save data persistently, nothing else, so why use a database with all the bells and whistles? I was positively shocked.

Then I implemented a very tiny application to help me learn more vocabulary in German. I started coding and eventually I had to implement persistence, so, just a simple table in some DBMS and... hold on, why so complicated? Why not just a simple text file?

I did so, and now I have a very tiny, simple, fast application that can be easily installed anywhere :).

Comments are closed.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.