Found exact solution for TSP

October 13, 201312 yr

Hi,

My name is Ilya Gazman. I found an exact solution for Traveling salesmen problem. Currently the best implementation according to wiki is Held–Karp algorithm that solves the problem in time O(N^2 * 2^N).

I believe that my algorithm can do this in O(N*C * 2^N) when C is a bit smaller than 1. I need your help to officially approve my work and update the wiki page.

So if this is interesting you here is how I did it:

Lets say we want to solve a 6 cities route with the brout algorithm. There are (6-1)! options for that, we will need to test them all and return the shortest route founded. So it will look something like that(Cities names are: A, B, C, D, E, F):

Option 1 : A -> B -> C -> D -> E -> F -> A

Option 2 : A -> B -> C -> D -> F -> E -> A

Option 3 : A -> C -> B -> D -> E -> F -> A

Option 4 : A -> C -> B -> D -> F -> E -> A

.

Option 119

Option 120

Now I am saying that after calculating option 1, you can skip over options 2 and only calculate part of options 3 and 4. How do you do that? It's simple: When calculating option 1 you need to calculate what will be the shortest route starting from City D, finishing in City A, and going thru cities E, F. Now you can skip option 2 because you already calculated it in option 1. And when you start calculating options 3 and 4 you can stop when reaching City D, because you already know what would be the shortest route starting at city D, finishing in City A and going thru cities E, F.

This is the principle that I used in my algorithm. I run a brute algorithm and mapped all the sub results, those results are not sub routes, do not confuse there. They are just part of calculation that need to be done in order to find the shortest route. So each time I recognize I am doing the same calculation I used a solution from a map.

Here is an output of my algorithm running over 19 cities, (in the attached file you can find java code that implements it).




Source(19)  [10.0,65.0, 34.0,52.0, 37.0,98.0, 39.0,44.0, 43.0,37.0, 45.0,89.0, 66.0,79.0, 69.0,74.0, 7.0,76.0, 70.0,15.0, 77.0,27.0, 78.0,11.0, 78.0,13.0, 80.0,5.0, 81.0,38.0, 82.0,64.0, 87.0,7.0, 90.0,61.0, 93.0,31.0]

Finish MapEngine test after 321550 mills

Created: 20801457

Map(3)  Write    2448       Read     34272

Map(4)  Write    12240      Read     159120

Map(5)  Write    42840      Read     514080

Map(6)  Write    111384     Read     1225224

Map(7)  Write    222768     Read     2227680

Map(8)  Write    350064     Read     3150576

Map(9)  Write    437580     Read     3500640

Map(10) Write    437580     Read     3084270

Map(11) Write    352185     Read     2344256

Map(12) Write    245131     Read     1382525

Map(13) Write    135638     Read     570522

Map(14) Write    54320      Read     156758

Map(15) Write    15077      Read     27058

Map(16) Write    2809       Read     2087

Map(17) Write    306        Read     0

Map(18) Write    18         Read     0

Map(19) Write    1          Read     0



0) 295.5947584525372>   [10.0,65.0, 34.0,52.0, 39.0,44.0, 43.0,37.0, 70.0,15.0, 78.0,13.0, 78.0,11.0, 80.0,5.0, 87.0,7.0, 77.0,27.0, 93.0,31.0, 81.0,38.0, 90.0,61.0, 82.0,64.0, 69.0,74.0, 66.0,79.0, 45.0,89.0, 37.0,98.0, 7.0,76.0, 10.0,65.0]

Source(19) is the input cities. It took my PC 321550 mills to calculate, (about 5 minutes). Created: 20801457 represent the number of atomic actions that the algorithm performed(about 20M actions). Map(3) speaks about the number of maps with 3 cities that been created. It created 2448 3 cities maps and used them 34272 times.

To calculate the efficiency of my algorithm all you need to do is to sum all the maps that he produce, then you will get the answer.

So the number of maps that my algorithm will produce with K cities size in N cities route will be: The number of times I can select the first city of my map: N, multiplies the number of times I can choose different selection of my cities from the remaining cities: (n-1)! / ((n - k - 1)! * (k-1)!). Thas come to n! / ((n - k - 1)! * (k-1)!). Assuming that creating a map of size 3 is an atomic action, then my algorithm efficiency will be the sum of all those maps.

So my algorithm have the next efficiency.



N * (N - 1) * (N - 2) / 2! + N * (N - 1) * (N - 2) * (N - 3) / 3! + N * (N - 1) * (N - 2) * (N - 3) (N -4) / 4! + ... N! / (N - 1)! = N * (N - 1) * (N - 2) / 2! + N * (N - 1) * (N - 2) * (N - 3) / 3! + N * (N - 1) * (N - 2) * (N - 3) (N -4) / 4! + ... N

Now lets solve this efficient algorithm with N from 7 to 100, and compare it to the previous results(result of N = 9 with N =8, result of N = 24 with N = 23). I found out that for big numbers of N the comparison result is 2. Then I did the same with the traditional dynamic programing algorithm efficiency. Here is the list of what I got:




7   2.55769     2.72222     2.98397 

8   2.40601     2.61224     2.74973 

9   2.31562     2.53125     2.60507 

10  2.2582      2.46913     2.50912 

11  2.21972     2.42        2.44169 

12  2.19258     2.38016     2.39191 

13  2.17251     2.34722     2.35356 

14  2.15701     2.31952     2.32293 

15  2.14456     2.29591     2.29774 

16  2.13424     2.27555     2.27652 

17  2.12548     2.25781     2.25832 

18  2.1179      2.24221     2.24248 

19  2.11124     2.22839     2.22853 

20  2.10533     2.21606     2.21614 

21  2.10003     2.205       2.20503 

22  2.09525     2.19501     2.19503 

23  2.09091     2.18595     2.18596 

24  2.08696     2.17769     2.17769 

25  2.08333     2.17013     2.17014 

26  2.08        2.1632      2.1632 

27  2.07692     2.1568      2.1568 

28  2.07407     2.15089     2.15089 

29  2.07142     2.1454      2.1454 

30  2.06896     2.1403      2.1403 

31  2.06666     2.13555     2.13555 

32  2.06451     2.13111     2.13111 

33  2.0625      2.12695     2.12695 

34  2.0606      2.12304     2.12304 

35  2.05882     2.11937     2.11937 

36  2.05714     2.11591     2.11591 

37  2.05555     2.11265     2.11265 

38  2.05405     2.10956     2.10956 

39  2.05263     2.10664     2.10664 

40  2.05128     2.10387     2.10387 

41  2.05        2.10125     2.10125 

42  2.04878     2.09875     2.09875 

43  2.04761     2.09637     2.09637 

44  2.04651     2.0941      2.0941 

45  2.04545     2.09194     2.09194 

46  2.04444     2.08987     2.08987 

47  2.04347     2.0879      2.0879 

48  2.04255     2.08601     2.08601 

49  2.04166     2.0842      2.0842 

50  2.04081     2.08246     2.08246 

51  2.04        2.0808      2.0808 

52  2.03921     2.0792      2.0792 

53  2.03846     2.07766     2.07766 

54  2.03773     2.07618     2.07618 

55  2.03703     2.07475     2.07475 

56  2.03636     2.07338     2.07338 

57  2.03571     2.07206     2.07206 

58  2.03508     2.07079     2.07079 

59  2.03448     2.06956     2.06956 

60  2.03389     2.06837     2.06837 

61  2.03333     2.06722     2.06722 

62  2.03278     2.06611     2.06611 

63  2.03225     2.06503     2.06503 

64  2.03174     2.06399     2.06399 

65  2.03125     2.06298     2.06298 

66  2.03076     2.06201     2.06201 

67  2.0303      2.06106     2.06106 

68  2.02985     2.06014     2.06014 

69  2.02941     2.05925     2.05925 

70  2.02898     2.05839     2.05839 

71  2.02857     2.05755     2.05755 

72  2.02816     2.05673     2.05673 

73  2.02777     2.05594     2.05594 

74  2.02739     2.05516     2.05516 

75  2.02702     2.05441     2.05441 

76  2.02666     2.05368     2.05368 

77  2.02631     2.05297     2.05297 

78  2.02597     2.05228     2.05228 

79  2.02564     2.05161     2.05161 

80  2.02531     2.05095     2.05095 

81  2.025       2.05031     2.05031 

82  2.02469     2.04968     2.04968 

83  2.02439     2.04907     2.04907 

84  2.02409     2.04848     2.04848 

85  2.0238      2.0479      2.0479 

86  2.02352     2.04733     2.04733 

87  2.02325     2.04678     2.04678 

88  2.02298     2.04624     2.04624 

89  2.02272     2.04571     2.04571 

90  2.02247     2.04519     2.04519 

91  2.02222     2.04469     2.04469 

92  2.02197     2.04419     2.04419 

93  2.02173     2.04371     2.04371 

94  2.0215      2.04324     2.04324 

95  2.02127     2.04277     2.04277 

96  2.02105     2.04232     2.04232 

97  2.02083     2.04188     2.04188 

98  2.02061     2.04144     2.04144 

99  2.0204      2.04102     2.04102 

100 2.0202      2.0406      2.0406

Column 1 is N, column 2 is my algorithm efficiency compare, column 3 is dynamic programming algorithm compare and column 4 is my algorithm efficiency multiply N compare.

See how column 3 and 4 are almost the same. This is how I found it. Please verify my work, take a look at the code, tell me if you agree or not with me. If not please show me where my algorithm or my math is not working by exact sample.

CityMapSample2.zip

October 14, 201312 yr

If city has location x,y

then distance between two cities is

sqrt((city1.x-city2.x)^2+(city1.y-city2.y)^2)

sqrt() operation is slow to cpu.

It might be possible to get ride of it during the main calculation because

sqrt(x)>sqrt(y)

is returning the same as

x>y

We are interested what is greater after all, not what is exact value, isn't?

sqrt() use in final calculation of path length for user, where speed is not important.

pow(x,y) with any y is also slow.

For such simple thing as ^2 I would never thought about using pow(), simply x*x

private double mathOption(City city1, City city2){
double powerX = Math.pow(city1.x - city2.x, 2);
double powerY = Math.pow(city1.y - city2.y, 2);
return Math.sqrt(powerX + powerY);
}

Change to:

private double mathOption(City city1, City city2){
double dx = city1.x - city2.x;
double dy = city1.y - city2.y;

dx *= dx;

dy *= dy;
return ( dx + dy );
}

And try whether it's running faster.

Did you thought about precalculating data?

If you have x cities, you can allocate x^2 distance table and fill it at beginning

And then reuse instead of calculating over and over again distance from city to city[j].

But this would require using index to city, instead of City object.

distance=distances[ i*count+j ]

i = 0...count-1

j = 0...count-1

Dynamic tables are slow, they require reallocating and copying data.

In the worstest implementation of dynamic table, it's extended every added element to array. Inserting element at beginning must end up with copying and pasting the all elements (the only way to get ride of it, is using double linked lists. I doubt java uses them).

If we know how many elements table will have (city count), better is to allocate static table and then fill, instead of relying on dynamic table/array.

(I noticed this in Path class)

Edited October 14, 201312 yr by Sensei

October 14, 201312 yr

Author

Thank you Quark, for reviewing the code. This indeed will improve the speed. As you probably saw more improvements can be done. This is just a sample and it's purpose to explain the algorithm. This is why I used pow. Another improvement could be working with integers, instead of doubles. And only use double for the full calculation(Of course it will require more code and testing if this is possible, for the current given input of the cities, and I think it will be possible in 99% of the cases).

Another problem of the code, is that it is using Ram memory and not database, even so db is slower, it will allow to perform bigger calculation. On my debugger, I am getting out of memory for N = 20;

October 14, 201312 yr

Another problem of the code, is that it is using Ram memory and not database, even so db is slower, it will allow to perform bigger calculation. On my debugger, I am getting out of memory for N = 20;

The problem is that Path and therefor PathList contain copies of the all Cities.

If you have ArrayList<Object> it's taking count * sizeof( Object ) memory. sizeof( City ) = 20, so count * 20 for each Path at least.

You really don't want whole structure of City copied there. What you need is just 32 bit index, 4 bytes long, to City from table of cities.

It will reduce immediately memory requirement to 20% what you have now.

Using unsigned short, 16 bit, would reduce number of cities to 16384, and memory requirement to 10% of now.

October 14, 201312 yr

Author

Its not working this way. You should read about shallow copy. I do not copy the cities but only the pointers.

October 14, 201312 yr

Just a sub-sub-detail: integers are often slower than doubles or floats. It's often the case for multiplication - just because Cpu designers don't pay much attention to integers - and always for division, because integer division uses an exact algorithm which take about one cycle per bit (or slightly less now) while double division and sqrt are approximate and iterative algorithms that double the accurate bits at each iteration.

Presently, Cpu take one cycle per multiply or add on doubles (or even on double pairs, some Cpu on double quadruplets) while an int multiply can need several cycles. And did I read that since the Core2, integer computations are made by the Sse hardware? Though, if building hardware purposely, integers do save gate counts and may be faster, sure.

Sorry for interrupting, I understand operation speed is not a part of the algorithm's intrinsic speed. Any improvement on such a well-known (...and useless!) problem would be remarquable.

October 14, 201312 yr

Pointer on 64 bit machine is 8 bytes,

while 16 bit index is 2 bytes,

so it's 4x less memory.

October 15, 201312 yr

Author

Organism you are the man! I had no idea about anything of this.

I just posted my second part of the algorithm. You are welcome to read it. Also there is a link to chat with me one of the post comments.

http://cs.stackexchange.com/questions/16076/traveling-salesman-heldkarp-algorithm-big-improvement

October 15, 201312 yr

What said Enthalpy is about SSE extension: it's used when you enable it in C/C++ compiler, and get used usually when you are using vector math, f.e. adding/subtracting/multiplying vector with 4 elements by other vector with 4 elements. Basically execute instruction doing multiple maths in one.

http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

Visual Studio 2008 Express is supporting only no-SSE, SSE1, SSE2.

I am not sure how it's with newer Express and commercial Visual Studio.

But you're using Java, not C/C++.

From experience with very math heavy ray-tracing engine I can say you that speed difference between no-SEE and SSE1 compiled project was 8% and between SSE1 and SSE2 there was also 8% (no-SSE -> SSE2 ~15%).

It was tested by compiling project after switching compiler option and trying it to render the same 3d scene.

SSE are truly useful when you have a lot of data to process, and they are in very simple form, like adding-subtracting-multiplying one image from another image (so there is continuous array of RGB/RGBA pixels), and programmer will call SSE processor functions manually.

Sign In

Found exact solution for TSP

Featured Replies

Archived

Important Information

Account

Navigation

Search

Configure browser push notifications

Chrome (Android)

Chrome (Desktop)

Safari (iOS 16.4+)

Safari (macOS)

Edge (Android)

Edge (Desktop)

Firefox (Android)

Firefox (Desktop)