Zero knowledge equals random NN weights?

Hrvoje1 · May 9, 2021

I was watching this video on YouTube https://youtu.be/ig380wp10aQ?t=111 in which Garry Kasparov says that machines have revealed so many secrets that the magic and mystery of chess are gone: you can see the game through the lens of a computer, and even an amateur can immediately understand what is happening on the chessboard thanks to the machine’s advice. There is another video, which I cannot find anymore, in which he is more specific and says that engines can explain what is going on. And he is right, of course. In the context of chess, every explanation is expressible first and foremost in the language of moves, which engines do speak. Besides that, however, the human mind tends to reason abstractly about the game and to create concepts expressible in natural language. Mastering those concepts is what people primarily refer to when they speak about “understanding chess”. Anyone can understand that he or she is losing consistently; that is obvious directly from the moves. But why exactly this happens requires another kind of explanation, in terms of these abstract concepts, which must at the same time be illustrable concretely in the language of moves in order to be valid and teachable.
And this abstract reasoning comes naturally to people, so that even young children are able to create some of these concepts for themselves without ever being taught, within mere hours of playing. Two such concepts are material and the value of pieces. Children immediately understand that having more material is, generally speaking, better than having less; that is almost an instinct, which they feel as frustration when they lose material, giving something up without gaining anything in return. The concept of sacrifice, by contrast, is advanced and has to be learned, i.e. acquired after some more experience with the game, as it involves further concepts: the things that are obtained in exchange for material.
Anyway, chess engines do not speak natural language and are mainly agnostic about human abstract concepts, at least the modern, self-taught ones (in the sense that these concepts are not built into them). So there is a gap that we need to bridge if we want to translate their knowledge into something comprehensible to us, beyond the moves themselves, which we can clearly see are superior to those a human player is able to produce. This is where a software company like Decodea comes into play: their mission is to produce such translators for various domains of human knowledge, and the one for chess they have named DecodeChess. I investigated it a bit by watching this video: https://youtu.be/-JpQEByxpzY
Obviously, when I speak about chess engines, I speak in terms of the standard chess software architecture, which is not monolithic. It identifies three main parts: an engine, which is responsible for the analysis of positions (Stockfish, LCZero...); a graphical user interface, which is a front-end application that accepts user input and provides output to the user (Arena, Scid vs. PC...); and a protocol (UCI, CECP) by which these two components communicate. The engine is pluggable into the user interface if the interface supports the protocol for which the engine is written. The translator/decoder, which is a separate component, sits in between and interprets the input moves before presenting its results via the user interface. In the process it consults its own repository of human knowledge, those abstract concepts and ideas on what constitutes efficient play, matches them with data received from the engine and from human players, recognizes tactical and strategic patterns, and presents them as natural-language explanations of why the moves suggested by the engine are good and why those played by the human are not that good, when such is the case. So when it detects that a pin is created or threatened, it reports that as good for the side that created or threatens it; when it sees that an open file has been taken under control, it reports that as good for the side that took it; and so on. That is a correct approach, and not quite a trivial task. Still, the objection from the guy who talked about Nimzowitsch rules and Steinitz rules is spot on, regardless of the fact that he did not use the best term to describe what he meant (he said rules, as people often do, but he meant abstract concepts and ideas on how to play efficiently), and regardless of whether the objection still stands or not.
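To make the engine/protocol/consumer split concrete, here is a minimal Python sketch of what a GUI or decoder receives over UCI: plain text lines such as `info depth ... score cp ... pv ...`. The function name `parse_uci_info` and the sample line are made up for illustration; the sketch handles only the `depth`, `score cp`, `score mate` and `pv` fields and ignores everything else a real engine may send.

```python
def parse_uci_info(line):
    """Extract depth, score and principal variation from a UCI 'info' line.

    Returns None for lines that are not 'info' lines (e.g. 'bestmove ...').
    """
    tokens = line.split()
    if not tokens or tokens[0] != "info":
        return None
    result = {"depth": None, "score_cp": None, "mate_in": None, "pv": []}
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok == "depth" and i + 1 < len(tokens):
            result["depth"] = int(tokens[i + 1])
            i += 2
        elif tok == "score" and i + 2 < len(tokens):
            kind, value = tokens[i + 1], int(tokens[i + 2])
            if kind == "cp":        # evaluation in centipawns
                result["score_cp"] = value
            elif kind == "mate":    # forced mate in N moves
                result["mate_in"] = value
            i += 3
        elif tok == "pv":
            # Everything after 'pv' is the principal variation.
            result["pv"] = tokens[i + 1:]
            break
        else:
            i += 1                  # skip fields this sketch does not use
    return result


# Invented sample line in UCI syntax, not output from a real analysis.
sample = "info depth 20 seldepth 28 score cp 35 nodes 123456 pv e2e4 e7e5 g1f3"
info = parse_uci_info(sample)
print(info["depth"], info["score_cp"], info["pv"])
```

Note that everything the engine says here is a number and a list of moves. Turning that into "this creates a pin" is entirely the job of the component that consumes these lines.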
Namely, if the machine learning process during which the translator is trained to recognize patterns is strictly supervised, so that it is unable to distil its own patterns from the data it receives from the engine and update the previously mentioned repository with some new, inhuman knowledge, instead of just using the existing knowledge as a supervisory reference, then the objection still stands, because they have not yet upgraded it to the level of unsupervised learning. I know it is easier said than done. But DeepMind managed to produce MuZero, a program that not only finds out by itself how to play efficiently by the given rules, as AlphaZero does too, but also finds out by itself what the rules of the game are in the first place, given the chance to play. So I don’t see why Decodea would not be able to produce an enhanced decoder that could actually extract new abstract chess knowledge by analyzing an engine’s play and teach even human grandmasters some new abstract concepts and ideas; that seems like a comparable effort to me. I don’t know if I got it right, but from my layman’s point of view, the principal difference between AlphaZero and MuZero is that the former has a built-in legal_move_generator function and a recognize_terminal_game_state function (mate, stalemate, draw by insufficient material, draw by repetition...), which means it knows the rules completely in advance, prior to NN training, and training serves only to enhance the evaluate_position function, while the latter uses NN training to build the first two functions from scratch, as well as to enhance the third. Actually, this is not quite the right distinction, as the starting point for all three functions is zero knowledge, i.e. random NN weights. The important difference between these functions is that the first two can be learned perfectly, while for the third the law of diminishing returns applies with respect to the number of NN training games, and possibly with respect to the growing NN topology.
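As a rough illustration of "zero knowledge equals random NN weights", here is a minimal Python sketch of a MuZero-style model before any training. MuZero is usually described as learning three functions (a representation, a dynamics and a prediction function); mapping these loosely onto the three function names used above is a simplification. The class `ZeroKnowledgeModel`, the layer sizes and the single-matrix "networks" are all hypothetical and are not DeepMind's code.

```python
import random


def random_linear(n_in, n_out, rng):
    """A one-layer 'network': an n_out x n_in matrix of random weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def apply(weights, x):
    """Multiply the weight matrix by the input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class ZeroKnowledgeModel:
    """Hypothetical stand-in for an untrained MuZero-style model."""

    def __init__(self, obs_size=8, hidden_size=4, n_actions=3, seed=0):
        rng = random.Random(seed)
        self.n_actions = n_actions
        # All three parts start from random weights: no rules, no evaluation.
        self.representation = random_linear(obs_size, hidden_size, rng)
        self.dynamics = random_linear(hidden_size + n_actions, hidden_size + 1, rng)
        self.prediction = random_linear(hidden_size, n_actions + 1, rng)

    def encode(self, observation):
        """Observation -> hidden state."""
        return apply(self.representation, observation)

    def step(self, hidden, action):
        """(hidden state, action) -> (next hidden state, predicted reward)."""
        one_hot = [1.0 if a == action else 0.0 for a in range(self.n_actions)]
        out = apply(self.dynamics, hidden + one_hot)
        return out[:-1], out[-1]

    def evaluate(self, hidden):
        """Hidden state -> (policy logits, value estimate)."""
        out = apply(self.prediction, hidden)
        return out[:-1], out[-1]


model = ZeroKnowledgeModel()
hidden = model.encode([0.0, 1.0] * 4)
next_hidden, reward = model.step(hidden, action=1)
logits, value = model.evaluate(next_hidden)
print(len(next_hidden), len(logits))
```

At this point every output is noise. Training would adjust the weights so that `step` imitates what the real game does and `evaluate` predicts who wins, which is where the question of extracting the learned rules back out of the weights comes from.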
Does it mean that the game rules should be somehow extractable from MuZero NNs into a human understandable code? Can the same thing be done with the knowledge of evaluating positions, and “decoded” into natural language?
Of course, there is a possibility that there is no such new abstract concept unknown to humans, and that the only reason computers play better is that they can more consistently apply concrete ideas, which are concretizations of abstract ideas already known to humans. And of course, computers are able to show new concrete ideas even to the best-informed, most knowledgeable human players, and they do that all the time, thanks to their superior capability to explore the vast game tree. But for just that, engines are sufficient; there is no need for decoders.

Unfortunately, the current situation with the Decodea decoder is still somewhat worse: even the initially intended, more modest result of explaining the moves in terms of already known concepts is not yet fully achieved, let alone something more. I can compare the state of the art with translation from English to Croatian by Google Translate: the original English text is much more understandable to me than the produced Croatian text, and I am a native Croatian speaker. That is not helpful at all, except maybe to native Croatian speakers who do not speak a word of English. They might at least get a certain clue what the text is about, but to me it is actually confusing and annoying.
Let me illustrate my comparison with the second example of an analyzed position in the same video. There is a summary that explains why Nb4 is a good move; it is because it:

threatens to play Nc2+
enables Bxf3+
allows playing Bxf3+ and prevents playing Qxf6
lures the white pawn to f3 and steps into a dangerous place

None of this makes any sense if one fails to see that 17...Nb4 is actually an indirect checkmate threat, which does not allow 18.Qxf6 because of the forcing line 18...Nc2+ 19.Ke2 Ba6+ 20.c4 Bxc4#, and that there is no better alternative to 18.cxb4; for example, 18.Be2 Nc2+ 19.Kf1 Qxg5 20.Nxg5 Nxa1 is worse. The commentary presented in the video does not explain that. So I was not lazy: I opened an account at DecodeChess and opened the same example myself, to see whether I could get that analysis by expanding the hidden text (pressing the yellow plus sign button on the right). For some reason I cannot see all the content of that panel properly, but it seems that these lines are there, strangely scattered, and not as concisely as I presented them. And the text that is visible without expanding the hidden panes does not properly explain, in a human way, the tactical idea of that complex combination: for the reasons explained above, black can make a clearing sacrifice of the knight on b4 (clearing the path for his bishop) and a decoying sacrifice of the rook on d1 (luring the opponent’s king to that dangerous square), after which the exchange of the bishop for the knight on f3 comes with check and at the same time removes the only defender of white’s queen, so that black can pick it up on the next move, with a net material gain of a queen for a rook and a knight, which in this position should be a comfortable advantage for black. The combination is actually even longer; I did not mention the exchange of one pair of rooks on d1 in the middle of it.
Yes, it is all there, recognized and somehow mentioned, but not in a sufficiently succinct way, and the sentence “lures the white pawn to f3 and steps into a dangerous place” sounds more silly than explanatory. Credit is due for what has been done; I hope it will get better, and I also hope constructive criticism will be accepted. But the main problem remains: if it can only explain things that I already understand, and things I could grasp by myself from direct communication with an engine, then that translator or decoder does not fully meet its purpose.
The development of chess engines is many years ahead of the development of chess knowledge decoders for several reasons. The primary one is that there is a large and vibrant community of chess engine developers who organize chess engine competitions, with the occasional involvement of the biggest players such as IBM and Google, while Decodea is not accompanied or challenged by a large community of active developers researching the same area. That is a pity, because what they do is just as important and exciting. An initiative by Herik, Herschberg, Marsland, Newborn and Schaeffer to establish a competition whose objective was to produce the best possible chess annotation software regrettably died after a couple of years. That was The Best Annotation Award, an annual contest described here: https://pure.uvt.nl/ws/portalfiles/portal/1239682/BEST____.PDF . If it were still alive, DecodeChess would be one of the main competitors there.
What constitutes a proper chess commentary was theorized in depth by David Levy, Ivan Bratko and Matej Guid, to name just a few among many others. The caveman approach to implementing that functionality in a computer program would be to read the input, a not yet annotated game file, and iterate through its moves, submitting them sequentially to the engine. When the difference between the quality of the move played and the quality of the best move available in the position, expressed in the centipawn score returned by the engine, reaches a certain threshold, the program flags the move as a serious mistake that requires a comment and writes the principal variation, which is also returned by the engine, into the output annotated game file as the annotation for that move. The typical shortcoming of such an unsophisticated approach is that it misses refutations of alternative moves that were not played but might appear appealing to a superficial human analyzer, as well as explanations of why it was important to play the moves that were played, when the reasons are not so obvious to a superficial human analyzer. So, for example, in the analyzed position, after 17...Nb4 was played, such a simple annotator would fail to comment on 18.Qxf6, simply because the move that was actually played, 18.cxb4, was the best available at that moment. The only way in this case to get the machine’s advice on why 18.Qxf6 is bad is to ask the engine directly, which defeats the purpose of the annotator, because it fails to explain automatically all that is relevant, even tactical ideas, let alone strategic ones. In other words, such a program completely fails because it lacks the notion of obviousness, importance, and relevance. Even DecodeChess, which is a much more sophisticated program, misses some of that when it reports that 17...Nb4 “lures the white pawn to f3 and steps into a dangerous place”. In both cases the problem is that the program is too rigid in deciding what to comment on and how.
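The caveman annotator described above fits in a few lines of Python. This is only a sketch under stated assumptions: the engine calls are left out, and each move arrives already evaluated as a dict with the hypothetical keys `move`, `played_cp`, `best_cp` and `best_pv` (scores in centipawns from the mover's point of view). The 100 cp threshold and all numbers in the sample are invented for illustration and do not come from a real analysis of the game discussed here.

```python
def annotate_naively(evaluated_moves, threshold_cp=100):
    """Comment only on moves that lose at least threshold_cp versus the best move."""
    annotations = []
    for entry in evaluated_moves:
        loss = entry["best_cp"] - entry["played_cp"]
        if loss >= threshold_cp:
            line = " ".join(entry["best_pv"])
            annotations.append(
                "{}? loses {} cp; better was: {}".format(entry["move"], loss, line)
            )
    return annotations


# Invented evaluations: one clear mistake, and one move that is the engine's
# first choice (so the naive annotator has nothing to say about it).
sample_game = [
    {"move": "10.g4", "played_cp": -180, "best_cp": 30, "best_pv": ["Nf3", "Nc6", "O-O"]},
    {"move": "18.cxb4", "played_cp": -250, "best_cp": -250, "best_pv": ["cxb4"]},
]

notes = annotate_naively(sample_game)
for note in notes:
    print(note)
```

The second entry shows the weakness in question: because 18.cxb4 is the best move, the loss is zero and nothing is printed, so the tempting alternative 18.Qxf6 and its refutation never appear in the output.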
I know that explaining chess software architecture is not that important or relevant when we talk about DecodeChess, since it is an integrated product, with which one doesn’t have to worry about what to plug into it or what to plug it into. I explained it because I was annoyed that, when I asked people online what chess annotation software they would recommend, some of them started to mention engines. Obviously it doesn’t matter whether a chess annotator is a standalone program with no other purpose or is integrated into a general-purpose chess GUI. What matters is that using a stronger chess engine will not solve the problem I just described, because the functionality in question is implemented in the annotator, not in the engine, so there is no point in mentioning engines. To understand such things, one should always keep in mind the notion of separation of concerns between software components, and have a clear picture of their responsibilities.
Anyway, attempts to extract abstract knowledge have not been connected only with engines as sources or oracles of that knowledge, but with endgame tablebases too, for example https://ailab.si/matej/doc/Deriving_Concepts_and_Strategies_from_Chess_Tablebases.pdf . The subject has a lot of history, but its future is actually more interesting. And the few concepts mentioned here are just the tip of the iceberg of what actually exists in that game, as one can easily imagine, considering that one can practice that immensely rich game one’s whole life and still not be particularly good at it.
But chess is not only a game of logic and of tactical and strategic planning; other factors are important too, such as memory, visualization, and focus or concentration. Every chess player, regardless of his or her strength, has to have a certain visualization capability in order to analyze a few moves ahead without actually moving the pieces on the board (because the rules do not allow that), but that is immensely easier when you can look at the board. At least it is for an average person, if not so much for a top grandmaster; but can they explain how they acquired such an amazing skill as being able to play blindfold? Saying that this is a whole other level of visualization capability, which, although not required by the standard rules, greatly helps in standard circumstances when looking at the board is allowed, doesn’t explain much about how it is actually achieved. The only explanation offered by Alexander Grischuk https://youtu.be/B3SXVN6KSNc?t=1340 was that it came naturally to him during his childhood, as it should to any future grandmaster, i.e. not as a result of some conscious effort and systematic practice. I, meanwhile, tried to follow a couple of recipes offered by others, to no avail. So either these explanations were not good enough, or I did not follow them properly and in time; the result is the same: I cannot memorize the board, just as Grischuk cannot speak Chinese although he tried to learn it. I know that because he said so a few minutes before the moment I chose as the starting point for playing this video when I pasted its link here. Before writing this essay, I did not know how to pass that information (Start at...) along with the link to a YouTube video, which would otherwise start from the beginning, at timestamp zero. Now I not only know that, but I am also fairly sure I can explain it to pretty much anyone interested, in several ways, depending on their prior knowledge.
This is because properly explaining how to learn, during adulthood, a new language that is very different from your native one is a much harder task than properly explaining how to add a timestamp parameter to a YouTube video link. This is connected to the amount of information the explanation contains and the reliability of passing that information on. And if we accept a “task” as a fundamental notion needed to explain nature, then an “explanation” would be “information needed to accomplish a task”. Moreover, an explanation is to a human what a program is to a computer: instructions it can follow and execute. Of course, one can argue that this is just one aspect of explanation, not its full characterization, because one can follow instructions without fully understanding them.
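The timestamp example is small enough to be written down as an actual program, which makes the "explanation as program" comparison literal. The sketch below assumes that the start time is passed as a `t` query parameter in seconds, as in the links quoted in this post; the function name `with_start_time` is made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_start_time(url, seconds):
    """Return the video URL with its start time set to `seconds`."""
    parts = urlsplit(url)
    # Keep every existing parameter except an old 't', then add the new one.
    query = [(key, value) for key, value in parse_qsl(parts.query) if key != "t"]
    query.append(("t", str(seconds)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_start_time("https://youtu.be/B3SXVN6KSNc", 1340))
# prints https://youtu.be/B3SXVN6KSNc?t=1340
```

The human version of the same explanation is one sentence: append `?t=` and the number of seconds to the link, or `&t=` if the link already contains a question mark.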
Nevertheless, following that logic, every living organism that can pass on useful information can produce an explanation; it is only a matter of overcoming the communication barrier between the one who tries to explain and the one who tries to understand. Right? David Deutsch seems to disagree, here in this TED interview https://www.ted.com/talks/the_ted_interview_david_deutsch_on_the_infinite_reach_of_knowledge/transcript?language=en#t-889066 , when asked by Chris Anderson:
“A lot of people would say, look, every species knows something. A dog knows that a bone tastes delicious, but it doesn't know scientific theory. We know a certain amount of scientific theory, but it's ridiculous to imagine that we could know you know, that there must be a whole world of things out there that we are never even in principle capable of understanding. And you dispute that. Why? Why?”
David Deutsch replied:
“I've already explained why the dog is inherently different from us. It's because the dog knows that the bone tastes good because some of its ancestors who didn't know that, died. And the dog doesn't actually know anything, its genes know that. And there are certain types of things that can become known that way. But the vast majority of things in the world, in the universe, cannot become known that way, because the dog cannot try to eat the Sun and be burned and that kind of thing.”
So, a lot of vague instructions present a much higher barrier to understanding than a few precise ones. Actually, a lot of precise instructions, given in a precise order that can be reliably memorized, are much easier to acquire and apply than just one vague instruction, although in the latter case one can at least focus much more easily on removing the vagueness. This is the reason why it is possible to train a dog to sit on your command, or to search for drugs, or to search for a missing person, but impossible to converse with it like Doctor Dolittle. As some of that potentially saves human lives, it is actually odd to read such comments about dogs from a person who is an obvious Nobel Prize candidate.
This is also the reason why it is much easier to remove just one bug from a program than several combined ones, if we consider debugging as a way of communication between human and computer, during which the human tries to remove the vagueness of the instructions given to the computer. If bugs are isolated in their effects, then the effort to eliminate them is proportional to their number, assuming they are equally hard to eliminate.
One such vague instruction is that, in order to learn a foreign language to the level of speaking it fluently and sounding like a native speaker at the same time, one should not only study it the way it is taught in school, but learn it the way small children learn their native language. That sounds logical, but it also requires further explanation: how exactly is that done by an adult? I found that insight shared on the internet by the excellent polyglot Luca Lampariello, who demonstrates the validity of his methods as soon as he begins to speak. He is an Italian who speaks fluent Chinese and Russian, among a dozen other languages, but he failed with Japanese, to a certain extent; what he describes as a failure I would most probably describe as a great achievement, if I were ever able to reach that level of fluency in any foreign language. I was impressed by the length of his explanation of that failure, and by the steps he took to improve his skill, such as sessions with another guy, Matt Bonder, an American who managed to learn Japanese fluently. So there is a lot to know about it, and a lot of room for introspection: how did we manage to learn the languages that we speak, and the things that we know in general?
Finally, there is maybe a crucial aspect of an explanation, captured by Deutsch when he says:
“Well, human-type creativity is different from the creativity of the biosphere, in that human creativity can form models of the world that say not only what will happen, but why. So an explanation, for example, is something that captures an aspect of the world that is unseen. So, explanations explain the seen in terms of the unseen”
This describes a scientist as someone who tries to decipher the conjuring tricks performed by nature, but explanations in science are a subject for another essay. Still, if we assume that moves represent the seen, and abstract concepts the unseen, then may we conclude that there is no reason to make a distinction between chess explanations and scientific explanations?

Edited May 9, 2021 by Hrvoje1

dimreepr · May 9, 2021

That's a lot of word's...

That, explains why you don't understand... 😉
