Posted (edited)

Hi!

 

My program, written in managed C++ on the .NET Framework (a quick and dirty replacement for a script), was processing a lot of data using this code:

	FileStream ^stream = gcnew FileStream( this->Filename, FileMode::Open );
	BufferedStream ^buffered_stream = gcnew BufferedStream( stream );
	

and later:

	int data;
	while( ( data = buffered_stream->ReadByte() ) != -1 )
	{
	[....]
	}
	

The example test file was processed in 18 seconds; a bigger file took twice as long.

With thousands of such files, processing them all takes "ages".

While waiting over 4 hours for an entire folder of such files to finish processing, I had an illumination (or rather a fit of impatience) about how to speed things up without changing much code. I changed the two stream-creation lines above to:

	array<unsigned char> ^raw_data = File::ReadAllBytes( this->Filename );
	MemoryStream ^buffered_stream = gcnew MemoryStream( raw_data );
	

Nothing else was changed in the code.

Instead of ~18 seconds per test file, it now takes ~3 seconds: roughly a 6x speedup.
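For readers outside .NET, the same swap can be sketched in standard C++ (not C++/CLI, so this is an analogue rather than the code from my program). `read_all_bytes` and `MemoryReader` are hypothetical names standing in for `File::ReadAllBytes` and `MemoryStream`. The point is only that the byte-by-byte loop stays the same while the bytes come from memory instead of from a file stream.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: load a whole file into memory in one pass.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> read_all_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Hypothetical stand-in for a memory stream: same "next byte or -1" contract
// as the ReadByte() loop, but every read and seek is just index arithmetic.
class MemoryReader {
public:
    explicit MemoryReader(std::vector<unsigned char> data)
        : data_(std::move(data)) {}

    // Returns the next byte as 0..255, or -1 once the data is exhausted.
    int read_byte() { return pos_ < data_.size() ? data_[pos_++] : -1; }

    // Seeking costs nothing here: no buffer to discard, no disk access.
    void seek(std::size_t pos) {
        assert(pos <= data_.size());
        pos_ = pos;
    }

private:
    std::vector<unsigned char> data_;
    std::size_t pos_ = 0;
};
```

The parsing loop would then read `while ((data = reader.read_byte()) != -1) { ... }`, unchanged from the streamed version.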

 

Best Regards!

P.S. I wish you all great illuminations while programming. If something runs slowly, drink a beer and, instead of impatiently waiting for it to finish, improve your code.

Edited by Sensei
Posted (edited)

But why did you think it was a good idea to read whole files one byte at a time? That's just about never useful.

Edited by Thorham
Posted (edited)
1 hour ago, Thorham said:

But why did you think it was a good idea to read whole files one byte at a time? That's just about never useful.

It depends on what you are trying to read.

If a file is too big (a few GB or more), reading the stream directly from disk instead of loading it all into memory at once is the only option. That is the general rule; fortunately the files I am processing are smaller than 10 MB, but in general they can be any size. So a real parser needs two ways of reading a file: 1) fast: load the entire file into memory; 2) slow: read directly from the file. If you implement only the first option, it can sometimes fail because the file is too big to load in full (out of memory, or memory too fragmented to allocate one large contiguous block). The second option will almost always work (except under really critical conditions).
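The two-path idea above can be sketched in standard C++ roughly like this. Everything here is an assumption made for illustration: the names `ReadPath` and `for_each_byte`, the caller supplying the file size, the 4096-byte chunk, and picking the path by a simple size limit (my real program has no such limit yet).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a real file
#include <string>
#include <vector>

enum class ReadPath { InMemory, Streamed };

// Feeds every byte of `in` to `handle` and reports which path was taken.
// `size` is the expected byte count; `in_memory_limit` is an arbitrary cutoff.
template <typename Handler>
ReadPath for_each_byte(std::istream& in, std::uint64_t size,
                       std::uint64_t in_memory_limit, Handler handle) {
    assert(in.good());  // the caller is expected to pass an opened, healthy stream

    if (size <= in_memory_limit) {
        // Path 1 (fast): one big read, then walk the bytes in memory.
        std::vector<char> all(static_cast<std::size_t>(size));
        std::size_t got = 0;
        if (!all.empty()) {
            in.read(all.data(), static_cast<std::streamsize>(all.size()));
            got = static_cast<std::size_t>(in.gcount());
        }
        for (std::size_t i = 0; i < got; ++i)
            handle(static_cast<unsigned char>(all[i]));
        return ReadPath::InMemory;
    }

    // Path 2 (slow but safe): a small fixed buffer, refilled until the end.
    std::vector<char> chunk(4096);
    for (;;) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::streamsize got = in.gcount();
        if (got <= 0) break;
        for (std::streamsize i = 0; i < got; ++i)
            handle(static_cast<unsigned char>(chunk[static_cast<std::size_t>(i)]));
    }
    return ReadPath::Streamed;
}
```

A more defensive variant could also catch `std::bad_alloc` on the first path and fall back to the second, which covers the fragmented-memory case.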

Reading an entire line at a time is not an option with this file format (it is a mixture of text and binary data, and you don't know in advance where you are or what you are processing).

Reading in blocks of, say, 1024 bytes would introduce a lot of complications: end-of-line markers are optional after control strings (commands) and come in three different variants, \r, \n and \r\n. And this was supposed to be a quick (to write) and dirty parser, not a full implementation (a full implementation of this file format would probably take half a year to write).
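To show the kind of complication I mean: with block reads, a \r\n pair can be split across two blocks, so the parser has to carry state from one block to the next. Here is a minimal sketch in standard C++ that only counts line breaks. `EolCounter` is a hypothetical name, and a real parser for this format would need far more state than one flag.

```cpp
#include <cassert>
#include <cstddef>

// Counts line breaks in data that arrives in arbitrary blocks, treating
// "\r", "\n" and "\r\n" each as a single break.
class EolCounter {
public:
    void feed(const char* data, std::size_t n) {
        assert(data != nullptr || n == 0);
        for (std::size_t i = 0; i < n; ++i) {
            const char c = data[i];
            if (c == '\r') {
                ++breaks_;           // a lone "\r" or the first half of "\r\n"
                prev_was_cr_ = true;
            } else if (c == '\n') {
                // Skip the "\n" of a "\r\n" pair: its "\r" was already counted,
                // possibly at the very end of the previous block.
                if (!prev_was_cr_) ++breaks_;
                prev_was_cr_ = false;
            } else {
                prev_was_cr_ = false;
            }
        }
    }

    std::size_t line_breaks() const { return breaks_; }

private:
    bool prev_was_cr_ = false;  // state carried across block boundaries
    std::size_t breaks_ = 0;
};
```

Feeding `"a\r"` and then `"\nb\nc\r"` yields three breaks, the first being a \r\n pair that straddles the block boundary.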

The BufferedStream class was supposed to do the double buffering for the programmer: read data in blocks and just hand out bytes from its buffer.

Maybe it has to flush its buffer because I am seeking in the stream so often.
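To illustrate that guess (and it is only a guess: I have not checked how BufferedStream actually handles seeks), here is a toy buffered reader in standard C++ that throws its buffer away on every seek. `NaiveBufferedReader` and its `refills()` counter are hypothetical and exist only to make the cost visible.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a real file
#include <vector>

// Toy reader: serves bytes from a block buffer, but discards the buffer on
// every seek, so each seek forces a fresh block read from the source.
class NaiveBufferedReader {
public:
    NaiveBufferedReader(std::istream& in, std::size_t capacity)
        : in_(in), buf_(capacity) {
        assert(capacity > 0);
    }

    // Returns the next byte as 0..255, or -1 at the end of the stream.
    int read_byte() {
        if (pos_ == len_ && !refill()) return -1;
        return static_cast<unsigned char>(buf_[pos_++]);
    }

    // Seeks to an absolute offset and drops whatever was buffered.
    void seek(std::streamoff offset) {
        in_.clear();  // reset eof/fail flags so seekg can succeed
        in_.seekg(offset, std::ios::beg);
        pos_ = len_ = 0;
    }

    // Number of block reads issued against the underlying stream so far.
    std::size_t refills() const { return refills_; }

private:
    bool refill() {
        in_.read(buf_.data(), static_cast<std::streamsize>(buf_.size()));
        len_ = static_cast<std::size_t>(in_.gcount());
        pos_ = 0;
        if (len_ == 0) return false;
        ++refills_;
        return true;
    }

    std::istream& in_;
    std::vector<char> buf_;
    std::size_t pos_ = 0;
    std::size_t len_ = 0;
    std::size_t refills_ = 0;
};
```

With a reader like this, a loop that seeks before almost every byte pays for a block read per byte, which would explain why swapping in an in-memory stream helps so much.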

Edited by Sensei
Posted

Code that reads a fixed number of bytes, including seeking, and then parses the partial data, isn't complicated, regardless of what's being parsed. Also, writing quick and dirty code may seem like a good idea sometimes, but I've found that it leads to frustration almost every time.
