Compression slow but good

fredreload · October 4, 2016

And here it is, the long waited algorithm thanks to Staphen that provided the algorithm in Crunchyroll. You are free to test it out on bigger files, currently it only runs for Blank.gif without going too slow. You are free to criticize only if you found a way to make this faster. Starting with iteratedbitcount. Thanks in advance!

using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;
using System.Threading.Tasks;

namespace HelloWorld
{
class Hello
{
public static byte[] bytes = System.IO.File.ReadAllBytes(@"E:\data\Blank.gif");

public static byte[] bytes_new;

static void Main()
{
BigInteger number_compress = new BigInteger(bytes);

Console.WriteLine(number_compress);
Console.WriteLine();

int[] one = new int[50];
int[] zero = new int[50];
int[] position = new int[50];

for (int i = 0; i < 50; i++)
{
//Console.WriteLine(i);

one = IteratedBitcount(number_compress);
zero = IteratedZerocount(number_compress);
position = one + zero;

BigInteger fac = Factorial(one);
BigInteger n = FindN(number_compress, one, position);

number_compress = n;

}

Console.WriteLine(number_compress);
Console.WriteLine();

BigInteger number_decompress = 0;

for (int i = 50-1; i >= 0; i--)
{
//Console.WriteLine(i);

number_decompress = FindNumber(number_compress, one, position);

number_compress = number_decompress;
}

Console.WriteLine(number_decompress);
Console.WriteLine();

bytes_new = number_decompress.ToByteArray();

File.WriteAllBytes(@"E:\data\New_Blank.gif", bytes_new);

}

static BigInteger FindNumber(BigInteger n, int one, int position)
{
BigInteger number=0;
int k = one;

for (int j = position-1; j >=0 ; j--)
{
BigInteger mult = j;
BigInteger num = 0;

for (BigInteger i = 1; i < (BigInteger)k; i++)
{
mult *= (j - i);
}

num = (mult / Factorial(k));

if (num <= n)
{
number = number + BigInteger.Pow(2, j);
k--;
n = n - num;
}
if (k == 0)
break;
}
return number;
}

static BigInteger FindN(BigInteger number, int one, int position)
{
BigInteger findMSB = number;
BigInteger test = number;
var numBits = (int)Math.Ceiling(BigInteger.Log(2));
BigInteger MSB=1;
while (findMSB > 0)
{
MSB <<= 1;
findMSB >>= 1;
}
MSB >>= 1;

BigInteger n = 0;
int k = one;
int pos = position;

while (MSB != 0)
{
if ((test & MSB) > 0)
{
int a = pos-1;
BigInteger mult = a;

for (BigInteger i = 1; i < (BigInteger)k; i++)
{
mult *= (a - i);
}

n += (mult/Factorial(k));
k--;

}
MSB >>= 1;
pos--;

}
return n;
}

static int IteratedBitcount(BigInteger n)
{
BigInteger test = n;
int count = 0;

while (test != 0)
{
if ((test & 1) == 1)
{
count++;
}
test >>= 1;
}
return count;
}

static int IteratedZerocount(BigInteger n)
{
BigInteger test = n;
int count = 0;

while (test != 0)
{
if ((test & 1)==0)
{
count++;
}
test >>= 1;
}
return count;
}

static BigInteger Factorial(int arg)
{
BigInteger value = 1;
for (int index = 2; index <= arg; index++)
{
//Console.WriteLine(index);
value *= index;
}
return value;
}

}
}

**Strange** · October 4, 2016

You are free to test it out on bigger files, currently it only runs for Blank.gif without going too slow.

What language is this written in?

How large is "blank.gif"?

Have you checked that the output of compressing and then decompressing the file is identical to the input?

Have you tested it with other files of a similar size (e.g. a sequence of random bytes)?

You are free to criticize only if you found a way to make this faster.

Not if we find it doesn't work?

Sensei · October 4, 2016

Again, and again, the same..

You have no idea whether it truly works, as you have no decompression function...

Writing off data to file, disallows you checking whether every single byte is equal to original (after compressing, then decompressing data).

Result from this line of code is nowhere used, as fac is nowhere else referenced:

BigInteger fac = Factorial(one);

What language is this written in?

C# .NET Framework Edited October 4, 2016 by Sensei

fredreload · October 5, 2016

What language is this written in?

How large is "blank.gif"?

Have you checked that the output of compressing and then decompressing the file is identical to the input?

Have you tested it with other files of a similar size (e.g. a sequence of random bytes)?

Not if we find it doesn't work?

Yes the output is identical to the input Strange . Gonna try optimizing this weekend, let me know if you can help lol

Again, and again, the same..

You have no idea whether it truly works, as you have no decompression function...

Writing off data to file, disallows you checking whether every single byte is equal to original (after compressing, then decompressing data).

Result from this line of code is nowhere used, as fac is nowhere else referenced:

BigInteger fac = Factorial(one);

C# .NET Framework

Cool, I'll take it out, thanks!

fredreload · October 5, 2016

This step is taking the most amount of my time. My best bet is a multiplication table stored in database, anyone else got any idea? The for loop here is taking too long, no for loop

for (int i = 1; i < k; i++)
{
    mult *= (a - (BigInteger)i);
}

timo · October 5, 2016

As far as I can see you are calculating mult = a*(a-1)*...*(a-k+1) in that loop. Special cases like k>a+1 and k<2 aside (consider if they can appear in your code), this is the same as mult=a!/k!. You already have a factorial function in your code: Use it, then optimize it with a lookup-table if needed. Note: You are dividing by another k! later on. You only need to calculate it once, of course.

**Strange** · October 6, 2016

I would still like to see some evidence that the algorithm works, and how well it works. What is the size of the inout file? What is the size of the compressed file? How does that change for (a) random data; (b) ordered data; (c ) all zeroes; etc. Is the result of uncompressing the data identical in all cases?

fredreload · October 9, 2016

Let's see, I got a question first, I converted bytes to BigInteger like this

BigInteger number_compress = new BigInteger(bytes.Concat(new byte[] { 0 }).ToArray());

Now the byte array is [378] byte array, as you know byte array is 8 bits each, so that's a total of 3024 bits. The thing is after conversion to number_compress, this BigInteger has 911 base 10 digits, so I'm not sure if BigInteger actually made the value bigger since I can't convert BigInteger to bits in c#. Below is the number.

10511498515079005786001923156952989929760765099887642049490645075407147670074787123453081119964496075798572683908765969937737368112628851278719151548358246532408860460103033076417881911156352613863717173123230737730516444716700786319847260734876596336573683014182274170198577808390323212579797450822781696155303651128091350419064337358263354652969443506081373456328170981909504950042936463577106765595222710454155651057937886003720893239530662179029448821080213785811445808269660735189910921681751787499664128734166678222596180132877088540155530232472178185615057339811168768357276447018187314658464330483545764600882883706505887000156889348127462631364561412835589414227905626001556124920721381180916353004195357239703395912173839712380569892467769276028873585749529029704900277393477308046170568306899216653561856937648186383356499772886802103617644079045242679876437789706198682378675257965584060895470702729

**Strange** · October 9, 2016

You can use the BigInterger.ToString method to convert it to hex. Extracting binary from hex is trivial.

fredreload · October 9, 2016

I used a StringBuilder to append the bits, the compression rate is much better. But I still need to deal with the time = =

As far as I can see you are calculating mult = a*(a-1)*...*(a-k+1) in that loop. Special cases like k>a+1 and k<2 aside (consider if they can appear in your code), this is the same as mult=a!/k!. You already have a factorial function in your code: Use it, then optimize it with a lookup-table if needed. Note: You are dividing by another k! later on. You only need to calculate it once, of course.

Hmm, the thing is it is (a - i), and it could be something like mult = 15*14*13/5!, I'm interested in using a look up table, possibly a database? But I am a newbie at that, how do you think I should start? Other than the look up table, is there other possible methods?

Btw I'm applying the recursive "a" computation I asked here.

As an example of the multiplication

BigInteger mult = 274341;

for (int j = 1; j < 157594; j++)
{
mult *= (BigInteger)(274341 - j);
}

P.S. Right I see it, mult=a!/k!, looks like I need a look up table

Edited October 9, 2016 by fredreload

Sign In

Compression slow but good

Recommended Posts

fredreload

Strange

Sensei

fredreload

fredreload

timo

Strange

fredreload

Strange

fredreload

Create an account or sign in to comment

Create an account

Sign in

Browse

Activity

Important Information