Sensei Posted May 17, 2018

Hi!

My program, written in managed C++ (.NET Framework) as a quick-and-dirty replacement for a script, was processing a lot of data using this code:

    FileStream ^stream = gcnew FileStream( this->Filename, FileMode::Open );
    BufferedStream ^buffered_stream = gcnew BufferedStream( stream );

and later:

    int data;
    while( ( data = buffered_stream->ReadByte() ) != -1 )
    {
        [....]
    }

The example test file was processed in 18 seconds; a bigger file took twice that. With thousands of such files, it would take "ages" to process them all.

While waiting over 4 hours for an entire folder of such files to finish, I had an illumination (or rather ran out of patience) about how to quickly speed up the process without changing much of the code, and replaced the two lines above with:

    array<unsigned char> ^raw_data = File::ReadAllBytes( this->Filename );
    MemoryStream ^buffered_stream = gcnew MemoryStream( raw_data );

Nothing else in the code was changed. Instead of ~18 seconds per test file, it now takes ~3 seconds: about six times faster.

Best Regards!

ps. I wish you all great illuminations while programming. If something runs slow, drink a beer and, instead of impatiently waiting for it to finish, improve your code.
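To make the comparison concrete, here is a minimal, self-contained C++/CLI sketch of the before/after change. ProcessStream() is a hypothetical stand-in for the real parsing loop, which the post doesn't show; both approaches feed it the same ReadByte() interface, and only the backing stream differs.

    // Minimal sketch; ProcessStream() is a hypothetical placeholder
    // for the actual parsing loop described in the post.
    using namespace System;
    using namespace System::IO;

    static long long ProcessStream( Stream ^stream )
    {
        long long count = 0;
        int data;
        while( ( data = stream->ReadByte() ) != -1 )
            count++;                      // real code would parse 'data' here
        return count;
    }

    int main( array<String ^> ^args )
    {
        String ^filename = args[ 0 ];

        // Old approach: byte-by-byte reads through a BufferedStream
        // wrapped around the FileStream.
        FileStream ^file = gcnew FileStream( filename, FileMode::Open );
        ProcessStream( gcnew BufferedStream( file ) );
        file->Close();

        // New approach: read the whole file into memory once, then
        // serve every ReadByte() straight from the in-memory array.
        ProcessStream( gcnew MemoryStream( File::ReadAllBytes( filename ) ) );

        return 0;
    }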
Thorham Posted May 17, 2018

But why did you think it was a good idea to read whole files one byte at a time? That's just about never useful.
Sensei Posted May 18, 2018 (Author)

1 hour ago, Thorham said:
But why did you think it was a good idea to read whole files one byte at a time? That's just about never useful.

It depends on what you are trying to read. If a file is too big (a few GB of data or more), streaming it directly from disk, instead of loading it into memory all at once, is the only option. (That's the general rule; fortunately the files I am processing are smaller than 10 MB, but in principle they can be any size.) So when writing a real parser you have to implement two methods of reading a file: 1) fast: load the entire file into memory; 2) slow: read directly from the file. If you implement only the first option, it can fail when a file is too big to fit in memory entirely (out of memory, or memory too fragmented to allocate one large contiguous block). The second option will almost always work, except under really critical conditions. A sketch of this fallback logic follows below.

Reading an entire line at a time is not an option with this file format: it's a mixture of text and binary data, and you have no idea in advance where you are or what you are processing. Reading in blocks of, say, 1024 bytes would introduce a lot of complications (end-of-line markers are optional after control strings/commands, and come in three different variants: \r, \n and \r\n), and this was supposed to be a quick-to-write, dirty parser, not a full implementation (writing a full implementation of this file format would probably take half a year).

The BufferedStream class was supposed to do the double-buffering for me: read data in blocks and hand out one byte at a time from the buffer. Maybe it has to flush that buffer because I seek in the stream so often.
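A hedged sketch of the two-method idea mentioned above (the function name and the 256 MB threshold are illustrative, not from the thread): try the fast in-memory path for files under a size limit, and fall back to streaming from disk when the file is too large or the allocation fails.

    // Illustrative sketch; OpenForParsing and MAX_IN_MEMORY are
    // hypothetical names, not from the original code.
    using namespace System;
    using namespace System::IO;

    static Stream ^OpenForParsing( String ^filename )
    {
        const long long MAX_IN_MEMORY = 256LL * 1024 * 1024;  // arbitrary cutoff

        FileInfo ^info = gcnew FileInfo( filename );
        if( info->Length <= MAX_IN_MEMORY )
        {
            try
            {
                // Method 1: fast path, whole file in one contiguous block.
                return gcnew MemoryStream( File::ReadAllBytes( filename ) );
            }
            catch( OutOfMemoryException ^ )
            {
                // Low or fragmented memory: fall through to streaming.
            }
        }

        // Method 2: slower but robust, read directly from disk.
        return gcnew BufferedStream(
            gcnew FileStream( filename, FileMode::Open, FileAccess::Read ) );
    }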
Thorham Posted May 18, 2018

Code that reads a fixed number of bytes, including seeking, and then parses the partial data isn't complicated, regardless of what's being parsed. Also, writing quick and dirty code may seem like a good idea sometimes, but I've found that it leads to frustration almost every time.
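A minimal sketch of the kind of block reader Thorham describes (the class name and the 4 KB block size are illustrative): a fixed-size buffer that is refilled on demand and simply discarded on Seek, so the parser still consumes one byte at a time without a disk hit per byte.

    // Illustrative block reader; refills a fixed buffer on demand
    // and rebases it on Seek. Not code from the thread.
    using namespace System;
    using namespace System::IO;

    ref class BlockReader
    {
        FileStream ^file;
        array<unsigned char> ^block;
        int filled;      // valid bytes currently in 'block'
        int position;    // next byte to hand out from 'block'

    public:
        BlockReader( String ^filename ) : filled( 0 ), position( 0 )
        {
            file  = gcnew FileStream( filename, FileMode::Open, FileAccess::Read );
            block = gcnew array<unsigned char>( 4096 );
        }

        // Returns the next byte, or -1 at end of file.
        int ReadByte()
        {
            if( position >= filled )
            {
                filled   = file->Read( block, 0, block->Length );
                position = 0;
                if( filled == 0 )
                    return -1;
            }
            return block[ position++ ];
        }

        // Jump to an absolute offset; the buffered block is discarded.
        void Seek( long long offset )
        {
            file->Seek( offset, SeekOrigin::Begin );
            filled   = 0;
            position = 0;
        }
    };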