BlackBird Suite
Blackbird module

class Blackbird.ExampleState(evaluation, policy, board, player=None)
Bases: object
Class which centralizes the data structure of a game state.
GameState objects have many properties, but only a few of them are relevant to training. ExampleState provides an interface on top of protocol buffers for reading and storing game state data.

`MctsPolicy`
A numpy array which holds the policy generated from applying MCTS.

`MctsEval`
A float between -1 and 1 representing the evaluation that the MCTS computed.

`Board`
A numpy array which holds the input state for a board state. In general, this is the game state, as well as layers for historical positions, current turn, current player, etc.

`Player`
An optional integer, representing the current player.


Blackbird.GenerateTrainingSamples(model, nGames, temp)
Generates self-play games to learn from.
This method generates nGames self-play games and stores the game states in a local sqlite3 database.
Parameters:  model – The Blackbird model to use to generate games
 nGames – An int determining the number of games to generate.
 temp – A float between 0 and 1 determining the exploration temp for MCTS. Usually this should be close to 1 to ensure high move exploration rate.
Raises: ValueError – nGames was not a positive integer.
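
The validation and storage steps described above could look roughly like the following sketch. Only the ValueError on a non-positive nGames and the use of sqlite3 come from the description; the `play_one_game` callback, the table name, and the column layout are hypothetical.

```python
import sqlite3


def generate_training_samples_sketch(play_one_game, nGames, conn):
    """Store the states of nGames self-play games in a sqlite3 database.

    play_one_game is a hypothetical callable that returns a list of
    serialized states (bytes) for one finished game.
    """
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(nGames, int) or isinstance(nGames, bool) or nGames <= 0:
        raise ValueError("nGames must be a positive integer")

    # Hypothetical schema; the real database layout is not documented here.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS states (game INTEGER, ply INTEGER, data BLOB)"
    )
    for game in range(nGames):
        for ply, data in enumerate(play_one_game()):
            conn.execute("INSERT INTO states VALUES (?, ?, ?)", (game, ply, data))
    conn.commit()


# Usage with an in-memory database and a fake three-state game.
conn = sqlite3.connect(":memory:")
generate_training_samples_sketch(lambda: [b"s0", b"s1", b"s2"], 2, conn)
stored = conn.execute("SELECT COUNT(*) FROM states").fetchone()[0]
```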

class Blackbird.Model(game, name, mctsConfig, networkConfig={}, tensorflowConfig={})
Bases: DynamicMCTS.DynamicMCTS, Network.Network
Class which encapsulates MCTS powered by a neural network.
The BlackBird class is designed to learn how to win at a board game by using Monte Carlo Tree Search (MCTS), with the tree search powered by a neural network.
Parameters:  game – A GameState object which holds the rules of the game BlackBird is intended to learn.
 name – The name of the model.
 mctsConfig – JSON config for MCTS runtime evaluation.
 networkConfig – JSON config for creating a new network from NetworkFactory.
 tensorflowConfig – Configuration for tensorflow initialization.

GetPriors
Returns BlackBird’s policy of a supplied position.
BlackBird’s network will evaluate the policy of a supplied position.
Parameters: state – A GameState object which should be evaluated.
Returns: policy, a list of floats of size len(state.LegalActions()) which sums to 1, representing the probabilities of selecting each legal action.
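
One way to obtain a policy of length len(state.LegalActions()) that sums to 1 is to mask a full-size network output and renormalize. This is a sketch under that assumption, not the library's code; `full_policy` and `legal_mask` are hypothetical inputs.

```python
def priors_over_legal_actions(full_policy, legal_mask):
    """Keep the probabilities of legal actions and rescale them to sum to 1."""
    legal = [p for p, ok in zip(full_policy, legal_mask) if ok]
    if not legal:
        return []
    total = sum(legal)
    if total == 0:
        # No probability mass on any legal move: fall back to uniform.
        return [1.0 / len(legal)] * len(legal)
    return [p / total for p in legal]


# Three actions, the middle one illegal.
priors = priors_over_legal_actions([0.5, 0.25, 0.25], [True, False, True])
```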

SampleValue
Returns BlackBird’s evaluation of a supplied position.
BlackBird’s network will evaluate a supplied position from the perspective of player.
Parameters:  state – A GameState object which should be evaluated.
 player – An int representing the current player.
Returns: value, a float between 0 and 1 holding the evaluation of the position. 0 is the worst possible evaluation, 1 is the best.

Blackbird.TestGood(model, temp, numTests)
Plays the current BlackBird instance against a standard MCTS player.
Game statistics are logged in the local data/blackbird.db database.
Parameters:  model – The Blackbird model to test.
 temp – A float between 0 and 1 determining the exploitation temp for MCTS. Usually this should be close to 0.1 to ensure optimal move selection.
 numTests – An int determining the number of games to play.
Returns: A dictionary holding:
 wins: The number of wins model had.
 draws: The number of draws model had.
 losses: The number of losses model had.

Blackbird.TestModels(model1, model2, temp, numTests)
Base function for playing a BlackBird instance against another model.
Parameters:  model1 – The Blackbird model to test.
 model2 – The model to play against.
 temp – A float between 0 and 1 determining the exploitation temp for MCTS. Usually this should be close to 0.1 to ensure optimal move selection.
 numTests – An int determining the number of games to play.
Returns: An integer representing a win (1), draw (0), or loss (-1)
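
The test functions in this module report a dictionary of wins, draws, and losses. Assuming each game result is encoded as 1 for a win, 0 for a draw, and -1 for a loss, the tally could be sketched like this; `tally_results` is a hypothetical helper, not part of the module.

```python
from collections import Counter


def tally_results(results):
    """Summarize per-game results (1 win, 0 draw, -1 loss) as a dictionary."""
    counts = Counter(results)
    return {"wins": counts[1], "draws": counts[0], "losses": counts[-1]}


summary = tally_results([1, 1, 0, -1, 1])
```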

Blackbird.TestPrevious(model, temp, numTests)
Plays the current BlackBird instance against the previous version of BlackBird’s neural network.
Game statistics are logged in the local data/blackbird.db database.
Parameters:  model – The Blackbird model to test.
 temp – A float between 0 and 1 determining the exploitation temp for MCTS. Usually this should be close to 0.1 to ensure optimal move selection.
 numTests – An int determining the number of games to play.
Returns: A dictionary holding:
 wins: The number of wins model had.
 draws: The number of draws model had.
 losses: The number of losses model had.

Blackbird.TestRandom(model, temp, numTests)
Plays the current BlackBird instance against an opponent making random moves.
Game statistics are logged in the local data/blackbird.db database.
Parameters:  model – The Blackbird model to test.
 temp – A float between 0 and 1 determining the exploitation temp for MCTS. Usually this should be close to 0.1 to ensure optimal move selection.
 numTests – An int determining the number of games to play.
Returns: A dictionary holding:
 wins: The number of wins model had.
 draws: The number of draws model had.
 losses: The number of losses model had.

Blackbird.TrainWithExamples(model, batchSize, learningRate, teacher=None)
Trains the neural network on provided example positions.
Provided a list of example positions, this method will train BlackBird’s neural network to play better. If teacher is provided, the neural network will include a cross-entropy term in the loss calculation so that the other network’s policy is incorporated into the learning.
Parameters:  model – The Blackbird model to train
 examples – A list of TrainingExample objects which the neural network will learn from.
 teacher – An optional BlackBird object whose policy the current network will include in its loss calculation.
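
The exact loss is not given here. As a rough illustration of adding a teacher cross-entropy term, the sketch below assumes the base policy loss is itself a cross-entropy against the MCTS policy and that the teacher term is added with a weight; `policy_loss` and `teacher_weight` are hypothetical.

```python
import math


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy H(target, predicted) for two discrete distributions."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


def policy_loss(mcts_policy, student_policy, teacher_policy=None, teacher_weight=1.0):
    """Policy loss with an optional extra term pulling toward the teacher."""
    loss = cross_entropy(mcts_policy, student_policy)
    if teacher_policy is not None:
        # Assumed form of the teacher term; the real weighting is not documented.
        loss += teacher_weight * cross_entropy(teacher_policy, student_policy)
    return loss


base = policy_loss([0.5, 0.5], [0.5, 0.5])
with_teacher = policy_loss([0.5, 0.5], [0.5, 0.5], teacher_policy=[1.0, 0.0])
```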
Connect4 module
DynamicMCTS module
FixedMCTS module
GameState module
MCTS module

class MCTS.MCTS(explorationRate, timeLimit=None, playLimit=None, **kwargs)
Bases: object
Base class for Monte Carlo Tree Search algorithms.
Outlines all the necessary operations for the core MCTS algorithm. _findLeaf() must be overridden in a subclass to avoid a NotImplementedError.

TimeLimit
The default max move time in seconds.

PlayLimit
The default number of positions to evaluate per move.

ExplorationRate
The exploration parameter for MCTS.

Root
The Node object representing the root of the MCTS.

AddChildren(node)
Expands a node and adds children, actions and priors.
Given a node, MCTS will evaluate the node’s children, if they exist. The evaluation and prior policy are supplied in the creation of the child Node object.
Parameters: node – A Node object to expand.

FindMove(state, temp=0.1, moveTime=None, playLimit=None)
Finds the optimal move in a position.
Given a game state, this will use a Monte Carlo Tree Search algorithm to pick the best next move.
Parameters:  state – A GameState object which the function will evaluate.
 temp – A float determining the temperature to apply in move selection.
 moveTime – An optional float determining the allowed search time.
 playLimit – An optional float determining the allowed number of positions to evaluate.
Returns:  A tuple providing, in order:
 The board state after applying the selected move
 The decided value of input state
 The probabilities of choosing each of the children
Raises:  TypeError – state was not an object of type GameState.
 ValueError – The function was not able to determine a stop time.
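
The ValueError for an undeterminable stop time suggests that the search needs at least one limit. A minimal sketch of that resolution, assuming per-call arguments take priority over the instance defaults (TimeLimit, PlayLimit); `resolve_stop` and its parameter names are hypothetical.

```python
import time


def resolve_stop(moveTime=None, playLimit=None, defaultTime=None, defaultPlays=None):
    """Return (endTime, nPlays); raise ValueError if neither limit is known."""
    seconds = moveTime if moveTime is not None else defaultTime
    plays = playLimit if playLimit is not None else defaultPlays
    if seconds is None and plays is None:
        raise ValueError("no time limit or play limit was provided")
    # Convert a duration in seconds into an absolute deadline.
    endTime = None if seconds is None else time.time() + seconds
    return endTime, plays


endTime, nPlays = resolve_stop(playLimit=100)
```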

GetPriors(state)
Gets the array of prior search probabilities.
This is the default GetPriors for MCTS. The return value is always an array of ones. This should be overridden to get actual utility.
Parameters: state – A GameState object to get the priors of.
Returns: A numpy array of ones of shape [num_legal_actions_of_state].

MoveRoot(state)
This is the public API of MCTS._moveRoot.
Move the root of the tree to the provided state. Use this to update the root so that tree integrity can be maintained between moves if necessary. Does nothing if Root is None, for example after running DropRoot().
Parameters: state – A GameState object which self.Root should be updated to.

ResetRoot()
Set self.Root to the appropriate initial state.
Reset the state of self.Root to an appropriate initial state. If self.Root was already None, then there is nothing to do, and it will remain None. Otherwise, ResetRoot will apply an iterative backup to self.Root until its parent is None.
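
The iterative backup described above amounts to following Parent links until none remain. A small sketch with a hypothetical `TreeNode` stand-in for Node:

```python
class TreeNode:
    """Hypothetical stand-in for Node, holding only a name and a parent link."""

    def __init__(self, name, parent=None):
        self.name = name
        self.Parent = parent


def reset_root(root):
    """Walk up from root until a node without a parent is reached."""
    if root is None:
        return None  # nothing to do, the root stays None
    while root.Parent is not None:
        root = root.Parent
    return root


top = TreeNode("start")
deep = TreeNode("after two moves", TreeNode("after one move", top))
```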

SampleValue(state, player)
Samples the value of a state for a specified player.
This applies a set of Monte Carlo random rollouts to a state until a game terminates, and returns the determined evaluation.
Parameters:  state – A GameState object which the function will obtain the evaluation of.
 player – An integer representing the current player in state.
Returns: A float representing the value of the state. It is 0 if it was determined to be a loss, 1 if it was determined to be a win, and 0.5 if it was determined to be a draw.
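
A single random rollout with this 0 / 0.5 / 1 scoring can be sketched as follows. The game is passed in as three callbacks, and the toy stone-taking game and the `winner` convention (winning player, 0 for a draw, None while running) are assumptions made for the example; averaging several rollouts would be a natural extension.

```python
import random


def rollout_value(state, player, legal_actions, apply_action, winner, rng=random):
    """Play random moves until the game ends, then score the result for player."""
    while winner(state) is None:
        state = apply_action(state, rng.choice(legal_actions(state)))
    result = winner(state)
    if result == 0:
        return 0.5
    return 1.0 if result == player else 0.0


# Toy game: state is (stones, player_to_move); taking the last stone wins.
def legal_actions(state):
    stones, _ = state
    return [n for n in (1, 2) if n <= stones]


def apply_action(state, take):
    stones, mover = state
    return (stones - take, 3 - mover)


def winner(state):
    stones, mover = state
    # With no stones left, the player who just moved (not `mover`) has won.
    return (3 - mover) if stones == 0 else None


# One stone left and player 1 to move: player 1 must take it and win.
value = rollout_value((1, 1), player=1, legal_actions=legal_actions,
                      apply_action=apply_action, winner=winner)
```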

_applyAction(state, action)
Applies an action to a provided state.
Parameters:  state – A GameState object which needs to be updated.
 action – An int which indicates the action to apply to state.

_backProp(leaf, stateValue, playerForValue)
Backs up a value from a leaf through to self.Root.
Given a leaf node and a value, this function will backpropagate the value to its parent node, and propagate that all the way through the tree to its root, self.Root.
Parameters:  leaf – A Node object which is the leaf of the current tree to apply backpropagation to.
 stateValue – The MCTS-created evaluation to backpropagate.
 playerForValue – The player which stateValue applies to.
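
A sketch of this backup, under stated assumptions: each node records the player to move, values lie on a 0 to 1 scale so the opponent sees 1 minus the value, and each node keeps a running mean. `StatNode` and its fields are hypothetical, and the loop stops at the first node without a parent rather than checking against self.Root.

```python
class StatNode:
    """Hypothetical node holding only the statistics needed for backup."""

    def __init__(self, player, parent=None):
        self.Player = player   # player to move at this node (assumed field)
        self.Parent = parent
        self.Plays = 0
        self.Value = 0.0       # running mean of backed-up values


def back_prop(leaf, stateValue, playerForValue):
    """Propagate stateValue from leaf up to the top of the tree."""
    node = leaf
    while node is not None:
        # Flip the value for nodes that belong to the other player.
        v = stateValue if node.Player == playerForValue else 1.0 - stateValue
        node.Plays += 1
        node.Value += (v - node.Value) / node.Plays
        node = node.Parent


root = StatNode(player=1)
child = StatNode(player=2, parent=root)
back_prop(child, 1.0, playerForValue=1)
```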

_findLeaf(node, temp)
Applies MCTS to a supplied node until a leaf is found.
Parameters: node – A Node object to find a leaf of.
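
Because the base class leaves _findLeaf unimplemented, a concrete search has to supply it. The sketch below shows only that override pattern with hypothetical stand-in classes and a deliberately trivial descent rule; it is not the selection rule used by any of the MCTS subclasses.

```python
class SearchBase:
    """Hypothetical stand-in for the MCTS base class."""

    def _findLeaf(self, node, temp):
        raise NotImplementedError("subclasses must implement _findLeaf")


class FirstChildSearch(SearchBase):
    def _findLeaf(self, node, temp):
        # Toy rule: always descend into the first child until none remain.
        while node["children"]:
            node = node["children"][0]
        return node


tree = {"name": "root", "children": [{"name": "leaf", "children": []}]}
leaf = FirstChildSearch()._findLeaf(tree, temp=1.0)
```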

_moveRoot(state)
Updates the root of the tree.
Move the root of the tree to the provided state. Use this to update the root so that tree integrity can be maintained between moves if necessary. Does nothing if Root is None, for example after running DropRoot().
Parameters: state – A GameState object which self.Root should be updated to.

_runMCTS(temp, endTime=None, nPlays=None)
Run the MCTS algorithm on the current Root Node.
Given the current game state, represented by self.Root, a child node is selected using the _findLeaf method. This method will apply temp to all child node move selection proportions, compute the sampled value of the action, and backpropagate the value through the tree.
Parameters:  temp – A float determining the temperature to apply in FindMove.
 endTime – (optional) The maximum time to spend on searching.
 nPlays – (optional) The maximum number of positions to evaluate.

_selectAction(root, temp, exploring=True)
Chooses an action from an explored root.
Selects a child of the root using an upper confidence interval. If you are not exploring, setting the exploring flag to False will instead choose the child with the highest expected payout, ignoring the exploration/regret factor.
Parameters:  root – A Node object which must have children Nodes.
 temp – The temperature to apply to the children Node visit counts. If temp is 0, _selectAction will return the child Node with the greatest visit count.
 exploring – A boolean toggle for overriding the selection type to a simple argmax. If False, _selectAction skips the exploration term and returns the child chosen by a simple argmax.
Returns: choice, an int representing the index of the selected action.
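
To illustrate how a temperature can act on visit counts, the sketch below samples a child with probability proportional to plays^(1/temp) and falls back to an argmax when temp is 0. The exponent form is a common convention and an assumption here, not a confirmed detail of _selectAction; `select_by_visits` is a hypothetical name.

```python
import random


def select_by_visits(plays, temp, rng=random):
    """Pick a child index from visit counts, sharpened or flattened by temp."""
    if temp == 0:
        # Greedy choice: the child with the greatest visit count.
        return max(range(len(plays)), key=plays.__getitem__)
    most = max(plays)
    if most == 0:
        # No child has been visited yet, so choose uniformly.
        return rng.randrange(len(plays))
    # Dividing by the maximum keeps the powers in [0, 1] and avoids overflow.
    weights = [(p / most) ** (1.0 / temp) for p in plays]
    return rng.choices(range(len(plays)), weights=weights, k=1)[0]


greedy = select_by_visits([3, 10, 7], temp=0)
sampled = select_by_visits([0, 5, 0], temp=1.0)
```

A temp near 0.1 concentrates almost all probability on the most visited child, while a temp near 1 samples in proportion to the raw counts.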


class MCTS.Node(state, legalActions, priors, **kwargs)
Bases: object
Base class for storing game state information in tree searches.
This is the abstract tree node class that is used to cache/organize game information during the search.

State
A GameState object holding the Node’s state representation.

Value
A float holding the Node’s state valuation.

Plays
A counter holding the number of times the Node has been used.

LegalActions
An int holding the number of legal actions for the Node.

Children
A list of Nodes holding all legal states for the Node.

Parent
A Node object representing the Node’s parent.

Priors
A numpy array of size [num_legal_actions] that holds the Node’s prior probabilities. At instantiation, the provided prior is filtered on only legal moves.

_childWinRates
A numpy array of size [num_legal_actions] used for storing the win rates of the Node’s children in MCTS.

_childPlays
A numpy array of size [num_legal_actions] used for storing the play counts of the Node’s children in MCTS.

ChildPlays()
Samples the play rate of each child Node object.
Samples the play rates for each of the Node’s children. Not helpful if none of the children have been evaluated in MCTS.
Returns: A numpy array representing the play rate for each of the Node’s children.

ChildProbability()
Samples the probabilities of sampling each child Node.
Samples the play rate for each of the Node’s children Node objects. If no children have been sampled in MCTS, this returns zeros.
Returns: A numpy array representing the play rate for each of the Node’s children. Defaults to an array of zeros if no children have been sampled.
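
The normalization with a zero fallback can be sketched as below. Plain Python lists stand in for the numpy arrays mentioned above, and `child_probability` is a hypothetical free function rather than the Node method.

```python
def child_probability(child_plays):
    """Turn per-child play counts into play rates, or zeros if nothing was sampled."""
    total = sum(child_plays)
    if total == 0:
        return [0.0] * len(child_plays)
    return [plays / total for plays in child_plays]


rates = child_probability([1, 3])
```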

Network module

class Network.Network(name, networkConstructor=None, tensorflowConfig={})
Bases: object

getPolicy(state)
Given a game state, return the network’s policy. When training, random Dirichlet noise is applied to the policy output to ensure exploration.
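
Mixing Dirichlet noise into a policy is often done as a convex combination of the policy and a noise vector. The sketch below assumes that form; the `alpha` and `epsilon` values are illustrative assumptions, not the network's settings. The Dirichlet sample is built from Gamma draws so that only the standard library is needed.

```python
import random


def add_dirichlet_noise(policy, alpha=0.3, epsilon=0.25, rng=random):
    """Blend a policy with a symmetric Dirichlet(alpha) noise sample."""
    # A Dirichlet sample is a vector of Gamma(alpha, 1) draws divided by their sum.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in policy]
    total = sum(gammas)
    noise = [g / total for g in gammas]
    # The result still sums to 1 because both inputs do.
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(policy, noise)]


noisy = add_dirichlet_noise([0.7, 0.2, 0.1], rng=random.Random(0))
```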

RandomMCTS module

class RandomMCTS.RandomMCTS(*args, **kwargs)
Bases: MCTS.MCTS

FindMove(state, *args, **kwargs)
Finds the optimal move in a position.
Given a game state, this will use a Monte Carlo Tree Search algorithm to pick the best next move.
Parameters:  state – A GameState object which the function will evaluate.
 temp – A float determining the temperature to apply in move selection.
 moveTime – An optional float determining the allowed search time.
 playLimit – An optional float determining the allowed number of positions to evaluate.
Returns:  A tuple providing, in order:
 The board state after applying the selected move
 The decided value of input state
 The probabilities of choosing each of the children
Raises:  TypeError – state was not an object of type GameState.
 ValueError – The function was not able to determine a stop time.

MoveRoot(*args, **kwargs)
This is the public API of MCTS._moveRoot.
Move the root of the tree to the provided state. Use this to update the root so that tree integrity can be maintained between moves if necessary. Does nothing if Root is None, for example after running DropRoot().
Parameters: state – A GameState object which self.Root should be updated to.

ResetRoot(*args, **kwargs)
Set self.Root to the appropriate initial state.
Reset the state of self.Root to an appropriate initial state. If self.Root was already None, then there is nothing to do, and it will remain None. Otherwise, ResetRoot will apply an iterative backup to self.Root until its parent is None.
