- Created Monday, May 16th 2016 @ 20:47:05
Since the competition is underway, I was curious to hear some of the keys to people's bots.
My bot performed around the 1580 range and the code can be found here: https://gitlab.com/amsully/block-battle-amsully
I used the following metrics with a single lookahead:
- Average height (average max height of each column)
- Resulting height (subtract completed rows)
+ Completed rows (multiplied by a combo value)
- Hole count (also played with non-linear holes)
- Height of the block (height of the piece being placed)
- Bumpiness (defined in the blog post below)
- Aggregate height (defined in the blog post below)
+ Adjacent blocks (number of sides that the placed block is touching)
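To make the feature list concrete, here is a minimal Python sketch of the kind of linear evaluation these features feed into. This is not code from my repo; the helper names and weights are illustrative placeholders, not my tuned values.

```python
# Board: list of rows (top row first), each a list of 0/1 cells.

def column_heights(board):
    rows, cols = len(board), len(board[0])
    heights = []
    for c in range(cols):
        h = 0
        for r in range(rows):
            if board[r][c]:
                h = rows - r  # height measured from the floor
                break
        heights.append(h)
    return heights

def count_holes(board):
    # A hole is an empty cell with at least one filled cell above it.
    holes = 0
    for c in range(len(board[0])):
        seen_block = False
        for r in range(len(board)):
            if board[r][c]:
                seen_block = True
            elif seen_block:
                holes += 1
    return holes

def evaluate(board, weights):
    heights = column_heights(board)
    aggregate = sum(heights)
    bumpiness = sum(abs(a - b) for a, b in zip(heights, heights[1:]))
    completed = sum(1 for row in board if all(row))
    holes = count_holes(board)
    return (weights["aggregate"] * aggregate
            + weights["bumpiness"] * bumpiness
            + weights["completed"] * completed
            + weights["holes"] * holes)
```

The GA's job is then just to find the values in `weights`.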
I ran a genetic algorithm to optimize these parameters. I ran a few styles of the GA but I settled on following what is described in this blog post: https://codemyroad.wordpress.com/2013/04/14/tetris-ai-the-near-perfect-player
I was starting to work on T-spins, but traveling got in the way of creating anything competitive. I am very curious about the heuristics people implemented to optimize this feature.
I feel my bot was largely lacking long-term planning. If I could restart, I would first focus on optimizing lookahead with a single feature (say 'max height') and then implement more features, rather than first optimizing features with a single lookahead and getting bogged down in parameter optimization.
- Created Monday, May 16th 2016 @ 21:12:40
And how did you use the GA? Some kind of tournament? Personally, I used two kinds of algorithms for optimization. For the initial global optimization, some kind of evolutionary algorithm was used. Then, starting from the best solution obtained so far, I ran CMA-ES: https://en.wikipedia.org/wiki/CMA-ES. The latter is one of the best general-purpose optimization algorithms (as can be seen in black-box optimization competitions).
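To illustrate the two-stage idea, here is a hypothetical sketch (not TaroKong's actual code): a coarse global search followed by local refinement started from the best point found. Full CMA-ES is too long to inline, so a simple (1+1)-ES with success-based step-size adaptation stands in for the refinement stage, and `fitness` is a stand-in objective (higher is better); a real bot would play games here.

```python
import random

def coarse_search(fitness, dim, samples=200, scale=5.0):
    """Stage 1: broad random sampling to find a decent starting point."""
    best = [random.uniform(-scale, scale) for _ in range(dim)]
    best_f = fitness(best)
    for _ in range(samples - 1):
        cand = [random.uniform(-scale, scale) for _ in range(dim)]
        f = fitness(cand)
        if f > best_f:
            best, best_f = cand, f
    return best

def refine(fitness, start, sigma=1.0, iters=500):
    """Stage 2: local (1+1)-ES refinement (CMA-ES would slot in here)."""
    x, fx = list(start), fitness(start)
    for _ in range(iters):
        cand = [xi + random.gauss(0, sigma) for xi in x]
        f = fitness(cand)
        if f >= fx:
            x, fx = cand, f
            sigma *= 1.1   # expand the step size on success
        else:
            sigma *= 0.98  # shrink it on failure
    return x, fx
```

Because the refinement stage is elitist, the final solution can never be worse than the starting point.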
The majority of my optimizations were performed at depth 2 (taking into account both pieces). I must note that I completely neglect what my opponent is doing; in this regard my bot is highly suboptimal. It would be interesting to know if many bots care about their opponent's actions. I had about 19 features in total, and their linear combination is the evaluation for the board. The purpose of optimization is to find those multipliers. My guess would be that the majority of people used a very similar approach, the difference being in the details.
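The depth-2 scheme could be sketched like this (hypothetical, not TaroKong's code; `placements` and `evaluate` are stand-ins for real move generation and the 19-feature linear evaluation):

```python
def best_move(board, piece, next_piece, placements, evaluate):
    """Pick the current-piece placement whose best follow-up scores highest.
    placements(board, piece) yields (move, resulting_board) pairs."""
    best_score, best_placement = float("-inf"), None
    for move, after in placements(board, piece):
        # Depth 2: assume we also play the next piece as well as possible.
        follow_ups = [evaluate(b2) for _, b2 in placements(after, next_piece)]
        score = max(follow_ups) if follow_ups else evaluate(after)
        if score > best_score:
            best_score, best_placement = score, move
    return best_placement
```

The toy stand-ins below treat "boards" as numbers just to show the shape of the search.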
- Created Monday, May 16th 2016 @ 21:40:11
Yes, I used a tournament that initially generated an array of random parameters that were passed to each bot in the population (these were the weights applied to all of my features). I scored strictly based on wins and losses, where each bot would play a set number of matches against randomly selected bots in the population. The bottom 30% were removed, and a weighted average with selection (described in the post) was used to replace that 30%. It definitely worked better than hand tuning.
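One generation of that tournament scheme might look roughly like this (a sketch of the idea, not my actual code; `play_match` is a stand-in for running a real game, and the refill uses a plain average-plus-mutation crossover rather than the exact weighted selection from the post):

```python
import random

def next_generation(population, play_match, matches_per_bot=5, cull=0.3):
    """One GA generation: score by wins, drop the bottom `cull` fraction,
    refill with averaged children of fit parents."""
    wins = {i: 0 for i in range(len(population))}
    for i, bot in enumerate(population):
        for _ in range(matches_per_bot):
            j = random.choice([k for k in range(len(population)) if k != i])
            if play_match(bot, population[j]):
                wins[i] += 1
    ranked = sorted(range(len(population)), key=lambda i: wins[i], reverse=True)
    survivors = round(len(population) * (1 - cull))
    keep = [population[i] for i in ranked[:survivors]]
    # Refill the culled slots: average two fit parents, add a little mutation.
    while len(keep) < len(population):
        a, b = random.sample(keep[: max(2, len(keep) // 2)], 2)
        child = [(x + y) / 2 + random.gauss(0, 0.1) for x, y in zip(a, b)]
        keep.append(child)
    return keep
```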
How long did your optimization algorithms take to run? I found I had to run my GA for around 10 hours to get through 25 iterations with only 30 bots, where each would play 5-10 matches per iteration (because by the 5th iteration my bots were always playing around 140-200 rounds, which can take up to 20 seconds per game... my bot didn't go for combos very often, so it resulted in these long matches).
- Created Monday, May 16th 2016 @ 21:48:14
@amsully - Nice post, I remember finding that blog post you mentioned and using some of his ideas to begin with. An interesting thing was that when I tuned my bot, it found that having a positive coefficient for aggregate height was better, not a negative one. In this version of tetris it pays off to build your board high and stack down for more points, so that makes sense.
@Tarokong - Yea my bot is similar in that it mostly ignores what my opponent is doing. I do however calculate the maximum number of lines my opponent could send at the end of each turn. Then I add this number to my current max height to calculate what my max height for the next turn could be. I am really curious as to how significant of an advantage someone can gain by keeping track of what their opponent does. To me, it felt like it would be difficult to take advantage of the knowledge of your opponent.
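In code, that safety margin amounts to something very small (hypothetical helper name, not amsully's actual code):

```python
def pessimistic_max_height(my_heights, opponent_possible_lines):
    """Assume the opponent sends the maximum garbage they could this turn,
    and evaluate our max height as if those lines were already pushed in."""
    return max(my_heights) + opponent_possible_lines
```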
- Created Monday, May 16th 2016 @ 21:49:52
My optimizations took FOREVER. I actually used a 3-depth search for my optimizations and depending on how poorly I had implemented the new feature I was trying out, I would regularly let it run overnight and during work (so, ~16+ hours) and only end up with 8-12 iterations. I had a lot of problems with figuring out how to correctly choose the number of samples I used (unlike amsully, I did not do it tournament style, I constructed various fitness functions based on a single-player game), but eventually I figured out that I could start with only a few samples for each bot (I think I ended up with 8 to start) and then perform a t-test to see if they were statistically different enough, and if not, add more samples until they were.
I'm actually really interested in the CMA-ES algorithm TaroKong mentioned, though. I'm already using (very little of) Apache Math, which has a CMA-ES implementation. I just had no idea it was there.
- Updated Monday, May 16th 2016 @ 22:03:20
@melsonator Yep, I used the Apache Math library.
@amsully. My optimization took ~24 hours (a one-time optimization). But to be fair, I was using a Spark cluster. A small cluster, 78 cores in total. This gave me some advantage, I guess. BTW, I was not playing tournaments most of the time. I was imitating the opponent by adding garbage lines at the bottom of my playing field, and the objective was to maximize points.
- Created Monday, May 16th 2016 @ 22:13:57
@TaroKong, in my tests there was pretty high variance, so I ended up having to do upwards of 100 runs per evaluation in order to get reasonably consistent results. For your fitness function, how many runs did you average your total score over? I was never able to really decide if I could get away with only doing a few and letting it wash out in the stochastic nature of the optimization.
- Updated Monday, May 16th 2016 @ 22:25:08
Maybe I'm missing something, but does aggregate height really do much? Let's start with a playfield and figure the aggregate height. Next, place the piece anywhere that does not create a hole and does not clear a line. Is there any variation in the resulting scores among the different placements? Seems as though punishing this metric is only an indirect way to reward line clears and punish holes.
I believe bumpiness may have issues, as well. It is definitely true that it punishes bad playfields, but it is not very holistic. For example, row transitions and cumulative well sums (as implemented by Dellacherie) look at the complete playfield. However, bumpiness (and aggregate heights) only look at the surface. It cannot see below. When I played with these measures, I began to see perverse incentives emerge. For example, the bot might place an I-piece flat over a bunch of holes to cover up the bumpiness. In practice, this can be counter-balanced with other measures like holes. However, to me that's treating a symptom rather than fixing the core problem. In short, I believe cumulative well sums and row transitions do a much better job at what bumpiness aims to do.
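For reference, here is one common formulation of those two Dellacherie-style features (details vary between implementations, so treat this as a sketch). Board: list of rows (top first), cells 0/1; the side walls count as filled.

```python
def row_transitions(board):
    """Count filled/empty boundaries along each row, walls counted as filled.
    Covered holes create extra transitions, so this sees below the surface."""
    transitions = 0
    for row in board:
        cells = [1] + list(row) + [1]
        transitions += sum(1 for a, b in zip(cells, cells[1:]) if a != b)
    return transitions

def cumulative_wells(board):
    """A well cell is empty with filled (or wall) cells on both sides.
    A well of depth d contributes 1 + 2 + ... + d."""
    total = 0
    cols = len(board[0])
    for c in range(cols):
        depth = 0
        for r in range(len(board)):
            left = board[r][c - 1] if c > 0 else 1
            right = board[r][c + 1] if c < cols - 1 else 1
            if not board[r][c] and left and right:
                depth += 1
                total += depth
            else:
                depth = 0
    return total
```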
Edit: On optimization, I used noisy cross-entropy, as was done by Thiery and Scherrer in their papers. Instead of constant noise, I set the variance to a minimum of a third of the mean for every iteration. I did this because I found that the range of weights the features needed in this game type was much greater than for the features needed in standard survival play.
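A sketch of noisy cross-entropy with that variance floor (hypothetical, not caffeine's actual code; `fitness` stands in for playing games, higher is better):

```python
import random

def noisy_cem(fitness, dim, pop=30, elite=5, iters=20):
    """Cross-entropy method: sample from a per-parameter Gaussian, refit the
    Gaussian to the elite samples. Instead of constant added noise, the
    variance is floored at a third of |mean| each iteration."""
    mean = [0.0] * dim
    var = [1.0] * dim
    for _ in range(iters):
        samples = [[random.gauss(m, v ** 0.5) for m, v in zip(mean, var)]
                   for _ in range(pop)]
        samples.sort(key=fitness, reverse=True)
        top = samples[:elite]
        mean = [sum(s[d] for s in top) / elite for d in range(dim)]
        var = [max(sum((s[d] - mean[d]) ** 2 for s in top) / elite,
                   abs(mean[d]) / 3.0)  # the noise floor described above
               for d in range(dim)]
    return mean
```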
Probably just as important as a good optimization algorithm was finding a good fitness function. I always started with 3 garbage rows and would reward either points, garbage cleared, or a measure of average height (or a combination of all three). I rewarded different combinations differently, depending on which mode my bot was going to be in. I used several modes depending on the current in-game circumstances. With the next piece factored in, 24 hours of optimization might yield 150 to 200 iterations.
- Created Monday, May 16th 2016 @ 22:22:09
@melsonator I was averaging over 200 games. There is no way to get meaningful results without huuuuuge averaging IMHO.
- Created Monday, May 16th 2016 @ 22:45:59
@caffeine. With aggregate height I get a different scenario. Optimization shows that height must be rewarded. A few reasons may be: 1) you want to keep a T-spin setup; 2) you are waiting for a clear of more than two lines.
- Updated Monday, May 16th 2016 @ 22:59:43
Searching: it starts with a full 3-ply search + 1 ply of all possible line clearing moves. Every 2-deep move combination is assigned a value during this search. Then it sorts these 2-move combinations and starts investigating each one using a full 4 ply search + 1 ply of all possible line clearing moves until it runs out of time. The search speed is around 3 million searched nodes/second on the server, and usually it can evaluate a few tens of 2-move combinations at 4 ply.
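The two-phase search could be sketched like this (my reconstruction of the idea, not BlockParty's code; `placements` and `evaluate` are stand-ins, the line-clearing ply is omitted, and the "deeper" re-search is just one extra ply here):

```python
import time

def search(board, pieces, placements, evaluate, time_limit=0.4):
    """Score every 2-move combination with a shallow search first, then
    re-examine the best combinations more deeply until time runs out."""
    deadline = time.monotonic() + time_limit
    combos = []
    for m1, b1 in placements(board, pieces[0]):
        for m2, b2 in placements(b1, pieces[1]):
            combos.append((evaluate(b2), m1, b2))  # shallow score
    combos.sort(reverse=True, key=lambda c: c[0])
    best_move, best_score = combos[0][1], combos[0][0]
    for shallow, m1, b2 in combos:                 # deepen, best-first
        if time.monotonic() > deadline:
            break
        deep = max((evaluate(b3) for _, b3 in placements(b2, pieces[2])),
                   default=shallow)
        if deep > best_score:
            best_move, best_score = m1, deep
    return best_move
```

The toy stand-ins below treat boards as numbers just to exercise the control flow.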
Evaluation: like most people report, I use a linear combination of features. I used particle swarm optimization to find the optimal coefficients. I have about 10 simple features that are fast to compute (probably too simple).
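For reference, a minimal particle swarm sketch (generic PSO with textbook parameters, not BlockParty's implementation; `fitness` is a stand-in for playing games, higher is better):

```python
import random

def pso(fitness, dim, particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Each particle tracks its position, velocity, and personal best; all
    particles are pulled toward their own best and the global best."""
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [list(p) for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = max(range(particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = list(pos[i]), f
                if f > gbest_f:
                    gbest, gbest_f = list(pos[i]), f
    return gbest
```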
T-spins (and 4-line clearing): these are evaluated based on the notion of "waiting": if we are early in the game and we have a potential T-spin, we can afford to wait for a T-piece for many moves, so it is worth close to the full 10 points. But when the board is almost filled, we cannot afford to wait for the T for many moves and may need to clear lines and destroy the T-spin opportunity. So if BlockParty thinks it can wait only 1 move, the T-spin is worth (1 - 6/7) x 10 = 10/7 points; waiting 2 moves is worth (1 - (6/7)^2) x 10 points, etc.
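The waiting value follows directly from the chance of drawing at least one T-piece in n pieces, assuming pieces are drawn uniformly at random from the 7 types (the function name is just for illustration):

```python
def t_spin_value(moves_we_can_wait, full_value=10.0):
    """Expected value of a pending T-spin: the probability of seeing at
    least one T in n uniform random pieces is 1 - (6/7)**n."""
    p_draw_t = 1.0 - (6.0 / 7.0) ** moves_we_can_wait
    return p_draw_t * full_value
```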
BlockParty neglects its opponent.
The weakest part of BlockParty is its evaluation function. I wanted to get rid of it completely and do monte carlo searches, simulating both its own game and the opponent for perhaps 10 moves and then compare the resulting positions, for example using number of holes. But the last month I had to work like crazy at my "real" job and found no time to work on this. A pity, because I am very curious to know if it would work.
- Created Monday, May 16th 2016 @ 23:01:17
I was also pondering MC, but did not try it. I doubt it would work. This is a very tactically oriented game, and MC search is not that good at those. For example, random playouts can easily miss the importance of a T-spin. Of course, this is pure speculation on my part.
- Created Monday, May 16th 2016 @ 23:11:11
No, random playouts will probably not work. I wanted to use a 1-ply + line-clearing search using my current evaluation function, which plays reasonably well (so I would not have gotten rid of it completely, as I wrote). I estimated that I should be able to simulate a few thousand games per move that way; I don't know if that would be enough. My intuition is that BlockParty would really benefit in the endgame.
- Updated Monday, May 16th 2016 @ 23:58:51
@Tarokong, my point was that it doesn't actually measure height. As you said with your #2 point, it's trying not to clear lines. In other words, rewarding aggregate height is an indirect way of rewarding not clearing lines. A clearer way to do this is, for example, to punish single line clears.
Edit: to clarify why aggregate height doesn't measure height, think about what it's actually measuring: for each column, the height of the highest filled cell. The sum of these heights increases by the same amount no matter where you place a piece, since a piece adds the same number of cells in any orientation. For example, a flat I-piece has an "aggregate" of 4 over 4 columns. The vertical orientation has an aggregate of 4 also, but over one column. The exceptions are when a placement creates a hole or when it clears a line. That is why this measure punishes line clears.
In order to make the measure actually sensitive to height, you could do things like square the column heights. Just keep in mind that this magnifies the line-clear bias, which can cause serious problems.
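A quick numeric check of the invariance (flat vs. vertical I-piece on an empty board; the drop helpers assume a flat surface under the piece, so no holes are created):

```python
def aggregate_height(heights):
    return sum(heights)

def drop_flat_I(heights, col):
    # Horizontal I-piece across columns col..col+3.
    h = max(heights[col:col + 4]) + 1
    return heights[:col] + [h] * 4 + heights[col + 4:]

def drop_vertical_I(heights, col):
    # Vertical I-piece in a single column.
    return heights[:col] + [heights[col] + 4] + heights[col + 1:]

empty = [0] * 10
flat = drop_flat_I(empty, 0)      # heights become [1, 1, 1, 1, 0, ...]
tall = drop_vertical_I(empty, 0)  # heights become [4, 0, 0, ...]
# Aggregate height cannot tell these two placements apart.
assert aggregate_height(flat) == aggregate_height(tall) == 4
```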
- Created Tuesday, May 17th 2016 @ 00:08:32
@caffeine I agree with your line of thinking. Rewarding height helps the bot avoid clearing lines when clearing them would lead to bad positions.
BTW, I have quite a philosophical viewpoint that you should not think in terms of punishing and rewarding. A much better strategy, IMHO, is to find as many factors as possible and assign them weights. This has its cons too: the number of features increases, and the optimization may get stuck in local optima.