Samiul Haque Posted February 13, 2014

What is your solution when you are stuck with the following situation?

When you need to feed your system a huge amount of data, say 200 MB or more, once or twice a month, then what is your solution?

When you need to convert that amount of data from a txt or Excel file and put it into a database server such as MySQL, MS SQL or PostgreSQL, then what?

When, at the time of converting the data, your system has to check multiple conditions before inserting that amount of data into the database, then what?

When your requirements are not supported by a tool like MySQL Workbench or dbForge, then how do you accomplish the conversion and transfer of huge data on the fly?

When you need a hybrid technology that is supported on the web as well as on the desktop, then what?

Guys, I am expecting your thoughts and ideas. Please take a few minutes and share them. I want to know: when you are in this situation, what is your solution?

Edited February 13, 2014 by Samiul Haque
Sensei Posted February 13, 2014

"When you need to feed your system a huge amount of data, say 200 MB or more, once or twice a month, then what is your solution?"

A 200 MB text file is not much. Or did you mean 200 GB? If it is 200 MB, you can load the entire file into memory in a few seconds with C/C++ code. If you are after high performance and file size is not an issue, remember NOT to use a while( !feof( file ) ) { fgets() } loop that reads one row at a time. A 200 MB file will have millions of lines, and that method is very slow. Load the file in one go, then process it from memory. Reading characters from memory is fast, and even faster when the data is in the CPU cache; reading row by row from disk gives poor performance. Run a test and you will see for yourself.

There is also a way to avoid loading the whole file at once: mapping the file into memory. But that is OS-dependent. I am not sure whether you are writing for Windows; if so, here is the description of the memory-mapping functions: http://msdn.microsoft.com/en-us/library/windows/desktop/aa366537(v=vs.85).aspx

"When you need to convert that amount of data from a txt or Excel file and put it into a database server such as MySQL, MS SQL or PostgreSQL, then what?"

I don't know about Excel, but for txt the most natural, obvious method is a C/C++ program built around a while( !feof( file ) ) { fgets() } loop, although, as I said, it is slow. If you don't mind spending a couple of minutes each time, go ahead and use it. Otherwise, load the whole file into memory at once. Then loop, searching for the EOL from the current memory location (e.g. with strchr() in C/C++). When you find it, you have a whole row: copy it to a buffer, replacing the EOL with 0. The buffer then contains the same thing fgets() would normally return. Repeat with the rest of the file, advancing the current memory location to the character right after the EOL of the previous row.

"When, at the time of converting the data, your system has to check multiple conditions before inserting that amount of data into the database, then what?"

I don't understand the question. You are the one who knows what the data file contains and what format your program will understand. Naturally the program must check the syntax of each row of txt data for errors; otherwise bad data could trash the database or crash the application.

"When your requirements are not supported by a tool like MySQL Workbench or dbForge, then how do you accomplish the conversion and transfer of huge data on the fly?"

The most efficient way is to write a C/C++ application that uses the MySQL (or other database) functions, most probably through a linked library. When the whole file is in memory, the database engine supports multi-tasking, and checking a row's syntax takes a lot of time, you can even make the program multi-threaded and speed it up by up to the number of threads your CPU has (e.g. 8-12 times faster on a Core i7+). Simply put, the 1st thread analyzes, say, the first 1,000,000 rows, the 2nd thread the next million rows, and so on; the 12th thread analyzes rows 11,000,000-11,999,999.

"When you need a hybrid technology that is supported on the web as well as on the desktop, then what?"

If a PHP script has to work on a 200 MB file, that sounds like a performance nightmare. It will run hundreds if not thousands of times slower than C/C++ code, even C/C++ that uses the while( !feof() ) { fgets(); } loop. Better would be to write a CGI program (in C/C++), executed from PHP, just to process the data.

Edited February 14, 2014 by Sensei
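A minimal C sketch of the "load the whole file once, then scan rows in memory with strchr()" approach described above; it is an illustration of the idea, not a drop-in importer. The file name data.txt and the process_row() helper are placeholders, and error handling is abbreviated.

```c
/*
 * Minimal sketch: read the file in one go, then split rows in memory
 * instead of calling fgets() once per line.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder: validate one row here, then hand it to the database layer. */
static void process_row(const char *row)
{
    printf("row: %s\n", row);
}

int main(void)
{
    FILE *f = fopen("data.txt", "rb");   /* hypothetical input file */
    if (!f) { perror("fopen"); return 1; }

    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);

    char *buf = malloc((size_t)size + 1);
    if (!buf) { fclose(f); return 1; }
    size_t got = fread(buf, 1, (size_t)size, f);
    buf[got] = '\0';                     /* terminate so strchr() can stop */
    fclose(f);

    /* Walk the buffer row by row, exactly as described in the post above. */
    char *cur = buf;
    while (*cur) {
        char *eol = strchr(cur, '\n');
        if (eol) {
            *eol = '\0';                 /* replace the EOL with 0 */
            if (eol > cur && eol[-1] == '\r')
                eol[-1] = '\0';          /* trim a Windows CR as well */
        }
        process_row(cur);
        if (!eol)
            break;
        cur = eol + 1;                   /* continue right after the EOL */
    }

    free(buf);
    return 0;
}
```

Because the whole buffer stays resident, it can later be split between worker threads at row boundaries, each thread starting at its own offset, along the lines of the multi-threaded variant described above.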
Samiul Haque Posted February 15, 2014 (Author)

@Sensei, I am really thrilled to see your valuable reply. My concern is a web application: my portal runs on LAMP (Linux, Apache, MySQL, PHP). I know that processing 200 MB of data in C/C++ is not a big deal, but feeding 200 MB or more of data into a web application is still a problem. I also need to check the data at the time of converting and transferring the .txt or .xls data to the MySQL database.

For example, the text data arrives as .csv and .xls files. So before inserting I need to map the fields to the MySQL table's field names, and check the data to create new data depending on conditions, on the fly. So it is not a direct import. A .xls file is not as simple as a .txt file. Remember, the website is on a cloud server, not on a local machine.

Edited February 15, 2014 by Samiul Haque
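To make the field-mapping and check-before-insert step concrete, here is a rough, hypothetical C sketch in the spirit of the earlier suggestion to link against the database library directly (libmysqlclient in this case). The table import_target, its columns, the connection details and the validation rule are all made-up placeholders; a real importer would loop over every CSV row rather than a single hard-coded one.

```c
/*
 * Rough sketch: map CSV fields to table columns, check them, then insert
 * through a MySQL prepared statement. All names below are placeholders.
 */
#include <mysql/mysql.h>   /* may be <mysql.h> depending on the install */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    MYSQL *db = mysql_init(NULL);
    if (!mysql_real_connect(db, "localhost", "user", "password",
                            "mydb", 0, NULL, 0)) {
        fprintf(stderr, "connect: %s\n", mysql_error(db));
        return 1;
    }

    /* Field mapping: CSV column 0 -> name, CSV column 1 -> amount. */
    MYSQL_STMT *stmt = mysql_stmt_init(db);
    const char *sql = "INSERT INTO import_target (name, amount) VALUES (?, ?)";
    if (mysql_stmt_prepare(stmt, sql, strlen(sql)) != 0) {
        fprintf(stderr, "prepare: %s\n", mysql_stmt_error(stmt));
        return 1;
    }

    char row[] = "Alice,42";             /* one row taken from the CSV */
    char *name = strtok(row, ",");
    char *amount_str = strtok(NULL, ",");

    /* "Check before insert": both fields present, amount numeric. */
    if (!name || !amount_str) {
        fprintf(stderr, "bad row, skipped\n");
        return 1;
    }
    int amount = (int)strtol(amount_str, NULL, 10);

    MYSQL_BIND bind[2];
    memset(bind, 0, sizeof(bind));
    unsigned long name_len = (unsigned long)strlen(name);
    bind[0].buffer_type   = MYSQL_TYPE_STRING;
    bind[0].buffer        = name;
    bind[0].buffer_length = name_len;
    bind[0].length        = &name_len;
    bind[1].buffer_type   = MYSQL_TYPE_LONG;   /* 4-byte int column */
    bind[1].buffer        = &amount;

    mysql_stmt_bind_param(stmt, bind);
    if (mysql_stmt_execute(stmt) != 0)
        fprintf(stderr, "insert: %s\n", mysql_stmt_error(stmt));

    mysql_stmt_close(stmt);
    mysql_close(db);
    return 0;
}
```

For a real 200 MB import, the row loop would batch many inserts inside a single transaction (or fall back on MySQL's LOAD DATA INFILE when no per-row logic is needed) rather than round-tripping one row at a time.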