Jump to content

Recommended Posts

Posted

I'm interested in the following problem: given a random number n (n can be gigantic), how do we find a pair function+input(s) whose output is n such that the input(s) are relatively small in size?

This problems arises in data compression; consider the bits that make up a file (or a substring of bits of the file) and treat it as a number (i.e. the bits are the binary representation of this number). If we could write a pair function+input(s) whose output happens to be the substring, this whole substring can be replaced by the function+input(s).

I've thought of expressing the number as sums (or differences) of relative big powers of prime numbers. Is this a good approach? And, if not, what would be a good one? And how to proceed?

Motivation of the question: A simples function like raising the nth prime number to a power S can result (depending on the values of p and S) on various outputs, each of which is unique (given that any number has only one prime factorization). If we pick p = 17 and S = 89435, for example, that's computationally trivial to compute (takes logarithmic time), and will result in a somewhat gigantic number. We can then generate a file whose bits are the same of the binary representation of this number (or at least some of the bits are). (This is just a rough example). The problem is going the other way: Given a bit string (hence, a number), how to express this specific bitstring with less bits (very few, actually) through a function that results in the number.

Any ideas/answers/comments are welcome!

Posted (edited)

Even if you will manage to make such compression algorithm, decompression will be fast, but compression will be very slow. Do you want to make pre-calculated array of primes in memory? How many primes do you want to store?

2 hours ago, DaviFN said:

how to express this specific bitstring with less bits (very few, actually) through a function that results in the number. 

Classical approach of compression is to find pattern in data, and store indexes or so, to book of sub-strings.

f.e.

0xFF00FF00 = %1111000011110000

we see there is repeated 0xFF and 0x00, and 0xF and 0x0, and 0xFF00,

data stream %00 and book with 0xFF00, or

data stream %0101 and book with 0xFF and 0x00, or

data stream %00110011 and book with 0xF and 0x0.

Additionally there can be variable number of bits for index in book. The more often repeated pattern, the shorter sequence of bits to store it in the stream.

Such methods work especially good for txt file. The most used words as represented by the smallest number of bits in the stream.

 

Edited by Sensei
Posted

Sensei, the goal is compressing a specific file the most. Compression is indeed slow, but once compresses it's trivial to decompress.

I'm kind of aware of how the vast majority of compression takes place; search for patterns, build a probabilistic model, represent more likely things with less units of storage. That's not my approach; I'm taking a more numerical approach that may work on any (specific) data.

A pre-calculated array of primes would take too much space, but they could be stored on disk (focus is on decompression, not on compression). Compression's goal is to find a function and parameters to it such that the number that makes up the file is the output. Of course, there are infinite functions and arguments that satisfy this. Hence, we want one that doesn't take much space on disk.

Is there any theory that states this approach is impossible/possible?

Posted
11 minutes ago, DaviFN said:

A pre-calculated array of primes would take too much space,

If you want to base compression algorithm on primes you must 1) precalculate them 2) calculate at run time. What else options do you have?!

Programmers precalculate data to have quick access to them, as calculation at run time would take significant much more time..

17 minutes ago, DaviFN said:

the goal is compressing a specific file the most.

...but it must be done in less than minutes or seconds... if it would be taking hours, days, months, or years.. useless for anybody..

 

Sounds you have no experience in programming..

Maybe you should start from making brute-force algorithm finding primes, for as start? To see how many primes it's able to find in let's say 60 seconds of execution?

Posted (edited)

Sensei, this can be useful even if compression takes a long tine. Primes and their power are cheap to compute. They are computed in logarithmic time regarding to the power. Famous big files used worldwide could be compressed with joint effort, thus lowering down cost of data storage.

 

If this can be done, even if it takes a long time to compress, decompression will be fast.

And I have started it. Running for three days so far, managed to get a power of a prime whose binary representation happens to be that of a substring of 34 bits of the famous Lena image. That's not relevant at all, hence I came here for math help.

Edited by DaviFN
Posted

Endy0816, I do understand a general algorithm to compress everything doesn't exist. But, given a specific file, how can you compress it the most? That's the question.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.