Posted (edited)

When you are stuck in the following situation:

  • When you need to feed your system a huge amount of data, say 200 MB or more, once or twice a month, what is your solution?
  • What if you need to convert that amount of data from a txt file or Excel file and put it into a database server such as MySQL, MS SQL, or PostgreSQL?
  • What if, during conversion, your system has to check multiple conditions on that amount of data before inserting it into the database?
  • If your requirements are not supported by tools like MySQL Workbench or dbForge, how do you accomplish the conversion and transfer of huge data on the fly?
  • What if you need a hybrid technology that is supported on the web as well as on the desktop?

 

Guys, I am expecting your thoughts and ideas.

Please take a few minutes and share your ideas. I want to know what your solution is when you are in this situation.

Edited by Samiul Haque
Posted (edited)
  On 2/13/2014 at 10:50 AM, Samiul Haque said:

When you need to feed your system a huge amount of data, say 200 MB or more, once or twice a month, what is your solution?

A 200 MB text file is not much.

Or did you mean 200 GB?

 

If it's 200 MB, you can load it entirely into memory in a few seconds in C/C++ code.

 

If you're after high performance and file size is not an issue, remember NOT to use a

while( fgets( line, sizeof( line ), file ) ) { ... } loop

which loads one row at a time.

200 MB will contain millions of lines, and that method will be very slow.

Load the file at once, then process it from memory.

Reading chars from memory is fast, and even faster if the data is in the CPU cache. Reading row by row performs poorly.

Run a test and you will see it with your own eyes.
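To make "load the file at once" concrete, here is a minimal sketch in plain C. The function name read_whole_file is made up for illustration, and it assumes the file fits comfortably in memory and that its size can be found with fseek()/ftell().

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a heap buffer and NUL-terminate it.
 * Returns NULL on failure; on success the caller must free() the result.
 * If out_size is non-NULL it receives the number of bytes read.
 * (read_whole_file is a hypothetical helper name.) */
char *read_whole_file(const char *path, size_t *out_size)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    /* Find the file size by seeking to the end. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return NULL; }
    rewind(f);

    /* One extra byte for the terminating NUL. */
    char *buf = malloc((size_t)size + 1);
    if (buf == NULL) { fclose(f); return NULL; }

    /* A single fread() call pulls in the whole file. */
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) { free(buf); return NULL; }

    buf[got] = '\0';
    if (out_size != NULL)
        *out_size = got;
    return buf;
}
```

The NUL terminator is there so the buffer can be scanned afterwards with ordinary string functions.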

 

There is a way to avoid loading the whole file at once: mapping the file into memory. But that is OS-dependent. I am not sure whether you're writing for Windows. If so, here is a description of the memory-mapping functions:

http://msdn.microsoft.com/en-us/library/windows/desktop/aa366537(v=vs.85).aspx
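For comparison, on a POSIX system such as Linux the same idea can be sketched with mmap(). That platform is an assumption here, since the link above covers the Windows API, and count_lines_mmap is a made-up name. The sketch only counts newlines, to show that the mapped file can be read like an ordinary array.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Count '\n' characters in a file by mapping it read-only (POSIX only).
 * Returns -1 on error. count_lines_mmap is a hypothetical helper name. */
long count_lines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  /* mmap rejects length 0 */

    const char *data = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (data == MAP_FAILED)
        return -1;

    /* The OS pages the file in on demand as the loop touches it. */
    long lines = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n')
            lines++;

    munmap((void *)data, (size_t)st.st_size);
    return lines;
}
```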

 

  On 2/13/2014 at 10:50 AM, Samiul Haque said:

What if you need to convert that amount of data from a txt file or Excel file and put it into a database server such as MySQL, MS SQL, or PostgreSQL?

I don't know about Excel, but for txt the most natural, obvious method is a C/C++ program that runs a

while( fgets( line, sizeof( line ), file ) ) { ... } loop

but as I said, it's slow.

If you don't mind spending a couple of minutes each time, go ahead and use it.

Otherwise, load the whole file into memory at once.

Then, in a loop, search for the EOL from the current memory location (e.g. with strchr() in C/C++).

If you find it, you have a whole row; copy it to a buffer, replacing the EOL with 0.

The buffer will then hold the same text fgets() would normally return, minus the trailing newline.

Then repeat with the rest of the original file, advancing the current memory location to the char right after the EOL of the previous row.
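The loop described above could look roughly like this. It is a sketch with one deliberate variation: instead of copying each row to a separate buffer, it overwrites the EOL with 0 in place, which only works if the buffer is writable. The names for_each_row, row_fn and add_length are invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Callback type (hypothetical), called once per row with a NUL-terminated string. */
typedef void (*row_fn)(const char *row, void *user);

/* Walk a NUL-terminated, writable buffer row by row. Each '\n' is replaced
 * with '\0' in place, so no per-row copy is made. Returns the row count. */
size_t for_each_row(char *buf, row_fn fn, void *user)
{
    size_t rows = 0;
    char *cur = buf;

    while (*cur != '\0') {
        char *eol = strchr(cur, '\n');
        if (eol != NULL) {
            *eol = '\0';
            /* Also drop the '\r' of a Windows-style "\r\n" line ending. */
            if (eol > cur && eol[-1] == '\r')
                eol[-1] = '\0';
        }
        fn(cur, user);
        rows++;
        if (eol == NULL)
            break;          /* last row had no trailing newline */
        cur = eol + 1;      /* char right after the EOL of this row */
    }
    return rows;
}

/* Example callback: adds each row's length to a size_t total. */
void add_length(const char *row, void *user)
{
    *(size_t *)user += strlen(row);
}
```

A real importer would pass a callback that parses and validates the row instead of add_length.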

 

  On 2/13/2014 at 10:50 AM, Samiul Haque said:

What if, during conversion, your system has to check multiple conditions on that amount of data before inserting it into the database?

I don't understand the question. You are the one who knows what the data file contains and what format your program will understand.

Naturally, the program must check the syntax of each row of txt data for errors. Otherwise bad data could corrupt the database or crash the application.
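As an illustration of such a syntax check, here is a sketch for an invented row format, "name,quantity", where the name must be non-empty and the quantity must be a non-negative integer. The format and the function name row_is_valid are assumptions made for the example; the real rules depend on your data.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Validate one row of a hypothetical "name,quantity" format.
 * Returns true only if the row has exactly two fields, a non-empty name,
 * and a quantity made of decimal digits that fits in a long. */
bool row_is_valid(const char *row)
{
    const char *comma = strchr(row, ',');
    if (comma == NULL || comma == row)
        return false;                     /* no separator, or empty name */
    if (strchr(comma + 1, ',') != NULL)
        return false;                     /* more than two fields */

    const char *num = comma + 1;
    if (*num < '0' || *num > '9')
        return false;                     /* empty, signed, or non-numeric */

    errno = 0;
    char *end;
    (void)strtol(num, &end, 10);          /* only the parse result matters */
    if (errno == ERANGE)
        return false;                     /* number too large for a long */
    return *end == '\0';                  /* reject trailing junk like "4x" */
}
```

Rows that fail the check can be logged and skipped before anything is sent to the database.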

 

  On 2/13/2014 at 10:50 AM, Samiul Haque said:

If your requirements are not supported by tools like MySQL Workbench or dbForge, how do you accomplish the conversion and transfer of huge data on the fly?

The most efficient way is to write a C/C++ application that uses the MySQL (or other database) functions, most probably through a linked library.

 

When the whole file is in memory, the db engine supports concurrent connections, and checking the syntax of a row takes a lot of time, you can even make the program multi-threaded. In the best case that speeds it up by about the number of threads your CPU has (e.g. 8-12 times on a Core i7+). Simply have the 1st thread analyze the first 1,000,000 rows, the 2nd thread the next million rows, and so on; the 12th thread analyzes the rows at offsets 11,000,000-11,999,999.
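Before starting any threads, the buffer has to be cut into pieces that do not split a row in half. Here is a sketch of just that partitioning step, under the assumption that rows end with '\n'. It divides by bytes rather than by row count, which is a variation on the fixed million-row ranges above, and the names chunk and split_on_rows are invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* One contiguous slice of the buffer; always starts at the beginning of a row. */
struct chunk {
    const char *start;
    size_t len;
};

/* Split buf[0..size) into at most n chunks of roughly equal byte size.
 * Each boundary is moved forward to just past the next '\n', so no row is
 * cut in half. out must have room for n entries.
 * Returns the number of chunks written (may be fewer than n). */
size_t split_on_rows(const char *buf, size_t size, size_t n, struct chunk *out)
{
    size_t count = 0;
    size_t begin = 0;

    for (size_t i = 1; i <= n && begin < size; i++) {
        /* Ideal boundary; the last chunk always runs to the end. */
        size_t end = (i == n) ? size : size / n * i;
        if (end < begin)
            end = begin;   /* previous chunk already ran past this point */

        if (end < size) {
            /* Start at end-1 so a boundary already sitting right after
             * a '\n' is kept where it is. */
            size_t from = (end > begin) ? end - 1 : end;
            const char *nl = memchr(buf + from, '\n', size - from);
            end = (nl != NULL) ? (size_t)(nl - buf) + 1 : size;
        }

        if (end > begin) {
            out[count].start = buf + begin;
            out[count].len = end - begin;
            count++;
        }
        begin = end;
    }
    return count;
}
```

Each resulting chunk could then be handed to its own worker thread, for example with pthread_create() on Linux, with every thread validating only the rows inside its chunk.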

 

  On 2/13/2014 at 10:50 AM, Samiul Haque said:

What if you need a hybrid technology that is supported on the web as well as on the desktop?

If a PHP script has to work on a 200 MB file, it sounds like a performance nightmare.

It may well run hundreds, if not thousands, of times slower than C/C++ code, even C/C++ code that uses a line-by-line fgets() loop.

 

It would be better to write a CGI program (in C/C++), executed by PHP, just to process the data.

Edited by Sensei
Posted (edited)

@Sensei
I am really thrilled to see your valuable reply. My concern is a web application; my portal runs on LAMP (Linux, Apache, MySQL, PHP). I know that processing 200 MB of data in C/C++ is not a big deal, but feeding 200 MB or more into a web application is still a problem. I also need to check the data while converting and transferring it from .txt or .xls to the MySQL database. For example, the text data arrives as .csv and .xls. So I need to map the incoming fields to the MySQL table's field names before inserting, and check the data to create new data on the fly depending on conditions. So it's not a direct import. And an .xls file is not as simple as a .txt file.

Remember, the website is on a cloud server, not on a local machine.

Edited by Samiul Haque
