visitor (0 QPoints)
  • FR
  • EN
  • NL
  • DE
  • ES
315 experts, 1193 registered users, 1659 questions already answered
European Experts Exchange, the very best site for high-quality IT solutions

New Improved Search!

 


05/10/2011 1h30 : Steve Jobs is dead, the father of Apple ][ is gone, we are all orphaned.

Languages :: Csharp :: .Net :: Read CSV file to array


By: Bernard France  Date: 21/11/2005 08:50:10  English  Points: 20 Status: Answered
Quality : Excellent
Hello,
I know how to read a file.
But what I don't know is how to split the contents?

As normal CSV format, the fields are separated by commas, but the actual data also contains commas. What should I do?

I need also to store the contents into a matrix (2D array).

Thanks
By: VGR Date: 21/11/2005 09:06:05 English  Type : Answer
Hi ;-)

Basically your logic is classical :
- read each line of the file - this is your first dimension of the final array
- split each line on the delimiter - this is your second dimension of the array

So as you said you knew how to read a file, only the second part is problematic. You probably understand that the solution has to do with Split() - like methods

You could use regEx.Matches() but this is unefficient

Classically, .Net programmers would use a custom CSV parser for this kind of line format : $100,"$2,350",a,15/04/1998
Example :

Regex r = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))"); string s = "\"a\",b,\"c, d, e\",,f"; string[] sAry = r.Split(s); for(int i=0;i < sAry.Length;i++) { Console.WriteLine(sAry); }


See :
link 1
link 2
link 3

Others propose classical VB6 code, but it fails miserably on your data as it contains commas in the litterals.

I have seen once someone[alex_developer] proposing to use this kind of ODBC code to solve elegantly the issue :

string ConnectionString = @"Driver={Microsoft Text Driver (*.txt; *.csv)};DBQ=c:\"; OdbcConnection connection = new OdbcConnection(ConnectionString); connection.Open(); OdbcCommand command = new OdbcCommand("Select * FROM Subscribers.csv", connection); OdbcDataReader reader = command.ExecuteReader(); while (reader.Read()) { // loop through every field for (int x = 0; x < reader.FieldCount; x++) { // add code for puting fields into array.... Console.WriteLine(reader.GetDataTypeName(x) + " - " + reader.GetValue(x)); } } // Close the connection connection.Close();



and I think this is right direction to go if you want ease of use, performance and reliability.

My own advice is even better if you have access to a native MySql driver. This is quite common nowadays.

Just read the CSV file in MySql with something like :

LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';



And your problem is gone ;-)

Afterwards, just read the contents of the table into your 2D-array and it's done. This way you also will have a logic (their names) for retrieval of the columns' values in stead of having to handle numerical nonsense indices in the second dimension.

This also enables you to sort, arrange, sub-select from the data read in stead of doing it in your .Net script. It will also be faster. You may even perform fast "LIKE 'XXX%'" operations on your textual column containing commas, something you would not be able to do as easily in "standard" .Net code

Best regards,

VGR

Do register to be able to answer

EContact
browser fav
page generated in 331.146960 milliseconds

Why Google AdSense ads ?

compteur
 Ranking-Hits PageRank for this page