DaviFN Posted June 26, 2019

I'm interested in the following problem: given a random number n (n can be gigantic), how do we find a pair function+input(s) whose output is n, such that the input(s) are relatively small in size?

This problem arises in data compression: consider the bits that make up a file (or a substring of the file's bits) and treat them as a number (i.e. the bits are the binary representation of this number). If we could find a pair function+input(s) whose output happens to be that substring, the whole substring could be replaced by the function+input(s).

I've thought of expressing the number as sums (or differences) of relatively big powers of prime numbers. Is this a good approach? If not, what would be a good one, and how should I proceed?

Motivation for the question: a simple function like raising a prime number p to a power S can result (depending on the values of p and S) in various outputs, each of which is unique (since any number has only one prime factorization). If we pick p = 17 and S = 89435, for example, the power is computationally trivial to compute (it needs only a logarithmic number of multiplications via repeated squaring), and will result in a somewhat gigantic number. We can then generate a file whose bits are the same as the binary representation of this number (or at least some of whose bits are). This is just a rough example. The problem is going the other way: given a bit string (hence, a number), how do we express this specific bit string with fewer bits (very few, actually) through a function whose output is that number?

Any ideas/answers/comments are welcome!
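To make the forward direction concrete, here is a rough sketch in Python (just an illustration of the idea, not a finished scheme; the values 17 and 89435 are the example above):

p, S = 17, 89435            # the example values above
n = p ** S                  # Python's big integers handle this directly

bits = bin(n)[2:]           # the binary representation of the number, as a string of '0'/'1'
print(len(bits), "bits")    # this is the bit string a file (or a substring of it) would have to equal

# "Decompression" is just: given (p, S), recompute p ** S and emit its bits.

Storing (p, S) takes a handful of bytes, while p ** S here is hundreds of thousands of bits long; the hard part is going the other way.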
Sensei Posted June 26, 2019 (edited)

Even if you manage to make such a compression algorithm, decompression will be fast, but compression will be very slow. Do you want to keep a pre-calculated array of primes in memory? How many primes do you want to store?

2 hours ago, DaviFN said:
how do we express this specific bit string with fewer bits (very few, actually) through a function whose output is that number?

The classical approach to compression is to find patterns in the data and store indexes into a "book" of sub-strings. E.g. 0xFF00FF00 = %11111111000000001111111100000000: we see that 0xFF00 repeats, as do 0xFF and 0x00, and 0xF and 0x0. So we can store a data stream %00 with a book containing 0xFF00, or a data stream %0101 with a book containing 0xFF and 0x00, or a data stream %00110011 with a book containing 0xF and 0x0. Additionally, the number of bits used per index into the book can be variable: the more often a pattern repeats, the shorter the bit sequence used to store it in the stream. Such methods work especially well for text files, where the most frequently used words are represented by the smallest number of bits in the stream.

Edited June 26, 2019 by Sensei
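A toy version of that book idea in Python (just a sketch; the book here is chosen by hand rather than discovered from the data, and the stream uses one bit per index because the book has only two entries):

book = [0xFF, 0x00]                    # book of sub-strings (here single bytes, chosen by hand)
data = [0xFF, 0x00, 0xFF, 0x00]        # the bytes of 0xFF00FF00

indexes = [book.index(b) for b in data]        # -> [0, 1, 0, 1]
stream = "".join(str(i) for i in indexes)      # one bit per index
print(stream)                                  # -> "0101", as in the example above

# Decoding: read the stream bit by bit and look each index up in the book.
decoded = [book[int(bit)] for bit in stream]
assert decoded == data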
DaviFN (Author) Posted June 26, 2019

Sensei, the goal is to compress a specific file as much as possible. Compression is indeed slow, but once the file is compressed, it's trivial to decompress. I'm aware of how the vast majority of compression works: search for patterns, build a probabilistic model, represent more likely things with fewer units of storage. That's not my approach; I'm taking a more numerical approach that may work on any (specific) data. A pre-calculated array of primes would take too much space in memory, but the primes could be stored on disk (the focus is on decompression, not on compression). Compression's goal is to find a function and parameters for it such that the number that makes up the file is the output. Of course, there are infinitely many functions and arguments that satisfy this; hence, we want one that doesn't take much space on disk. Is there any theory that states whether this approach is possible or impossible?
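To pin down what I mean by "find a function and parameters", a brute-force version might look like this in Python (hopeless for real files, since it only tries pure prime powers over a tiny hand-picked search space; it's just to make the problem statement concrete):

def is_prime(k):
    # simple trial division; fine for the tiny primes this sketch tries
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def search(target_bits, max_prime=100, max_exp=200000):
    # Brute force: look for a prime p and exponent S with bin(p ** S) == target_bits.
    for p in range(2, max_prime + 1):
        if not is_prime(p):
            continue
        value = 1
        for S in range(1, max_exp):
            value *= p
            if value.bit_length() > len(target_bits):
                break                      # p ** S is already longer than the target
            if bin(value)[2:] == target_bits:
                return p, S                # found an exact representation
    return None                            # no pure prime power matches in this range

print(search(bin(17 ** 3)[2:]))            # -> (17, 3)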
Sensei Posted June 26, 2019

11 minutes ago, DaviFN said:
A pre-calculated array of primes would take too much space in memory,

If you want to base a compression algorithm on primes, you must either 1) precalculate them or 2) calculate them at run time. What other options do you have?! Programmers precalculate data to have quick access to it, as calculating it at run time would take significantly more time.

17 minutes ago, DaviFN said:
the goal is to compress a specific file as much as possible.

...but it must be done in seconds or minutes... if it takes hours, days, months, or years, it's useless to anybody. It sounds like you have little experience in programming. Maybe you should start by making a brute-force algorithm that finds primes, to see how many primes it's able to find in, let's say, 60 seconds of execution?
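Something along these lines, as a first exercise (a naive trial-division version in Python; a sieve would be much faster, this just gives a baseline):

import time

def is_prime(n):
    # naive primality test by trial division
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def count_primes_for(seconds=60):
    # count how many primes the brute-force test finds within the time budget
    deadline = time.time() + seconds
    n, found = 2, 0
    while time.time() < deadline:
        if is_prime(n):
            found += 1
        n += 1
    return found, n                # primes found, and how far the search got

print(count_primes_for(60))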
DaviFN (Author) Posted June 27, 2019 (edited)

Sensei, this can be useful even if compression takes a long time. Primes and their powers are cheap to compute: a power needs only a number of multiplications logarithmic in the exponent. Famous big files used worldwide could be compressed with a joint effort, thus lowering the cost of data storage. If this can be done, even if it takes a long time to compress, decompression will be fast. And I have started: my program has been running for three days so far and has managed to find a power of a prime whose binary representation matches a 34-bit substring of the famous Lena image. That's not significant at all, hence I came here for math help.

Edited June 27, 2019 by DaviFN
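By "cheap" I mean exponentiation by repeated squaring, roughly like this (a minimal sketch; Python's built-in ** on integers uses essentially the same idea, and note that each multiplication itself gets more expensive as the numbers grow, so it's the number of multiplications that is logarithmic, not the total time):

def prime_power(p, S):
    # exponentiation by repeated squaring: about log2(S) squarings and multiplications
    result = 1
    base = p
    while S > 0:
        if S & 1:
            result *= base     # this bit of the exponent is set, so multiply it in
        base *= base           # square the base for the next bit of S
        S >>= 1
    return result

assert prime_power(17, 10) == 17 ** 10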
Endy0816 Posted June 27, 2019

You're looking for lossless compression. https://en.m.wikipedia.org/wiki/Lossless_compression If you look at the Limitations section, you'll find a proof of why some files would end up longer than they were originally.
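In short, the pigeonhole argument there goes roughly like this: there are 2^n bit strings of length n, but only 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1 strings strictly shorter than n. Any lossless scheme (including a "function + small inputs" encoding) must map different inputs to different descriptions, so it cannot make every length-n string shorter; some strings must come out at least as long as they went in.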
DaviFN (Author) Posted June 27, 2019

Endy0816, I do understand that a general algorithm to compress everything doesn't exist. But, given a specific file, how can you compress it the most? That's the question.