diff --git a/examples/axelrod/NOTICE b/examples/axelrod/NOTICE new file mode 100644 index 0000000000..a7d10e2aec --- /dev/null +++ b/examples/axelrod/NOTICE @@ -0,0 +1,24 @@ +This folder contains code derived from Axelrod (https://github.com/Axelrod-Python/Axelrod). + +Copyright (c) 2015 The Axelrod-Python project team members listed at +https://github.com/Axelrod-Python/Axelrod/graphs/contributors + +Licensed under the MIT License (MIT): +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + diff --git a/examples/axelrod/README.md b/examples/axelrod/README.md new file mode 100644 index 0000000000..55a6b93b77 --- /dev/null +++ b/examples/axelrod/README.md @@ -0,0 +1,176 @@ +# Axelrod Tournament Strategy Evolution + +> **Built on the [Axelrod Python library](https://github.com/Axelrod-Python/Axelrod)** +> — an open-source framework for research into the iterated Prisoner's Dilemma. 
+> The tournament infrastructure, strategy implementations, and game mechanics used +> in this example are all provided by that project. The seed strategies in this +> example were adapted from strategies in the Axelrod library. See the `NOTICE` +> file in this directory for license details. + +This example uses OpenEvolve to evolve a strategy for the iterated Prisoner's Dilemma, inspired by [Robert Axelrod's famous tournaments](https://en.wikipedia.org/wiki/The_Evolution_of_Cooperation) from the 1980s. The goal is to discover a strategy that maximizes its average score when competing against the full suite of Axelrod library strategies. + +## Problem Overview + +In the [Prisoner's Dilemma](https://en.wikipedia.org/wiki/Prisoner%27s_dilemma), two players simultaneously choose to either **Cooperate (C)** or **Defect (D)**. The payoff matrix rewards mutual cooperation, but each player is individually tempted to defect. + +In the *iterated* version, players play many rounds against each other. Strategies that build cooperation tend to do well over time, but must also protect against exploitation by purely selfish strategies. + +The evolved strategy plays a one-vs-all tournament against every strategy in the `axelrod.short_run_time_strategies` collection (over 100 strategies). It competes against each one in a separate match and is scored on its **median score** across all matchups. + +## Evaluation + +The evaluator (`evaluator.py`) works as follows: + +1. Loads the evolved strategy as a class named `Evolvo` +2. Constructs a tournament where `Evolvo` plays every strategy in `axl.short_run_time_strategies` +3. Plays all matches (1 repetition per pair, no round-robin among opponents) +4. 
Returns the `Median_score` as the primary metric (`combined_score`) + +Additional artifacts are tracked for each evaluation: +- **Cooperation_rating**: How often the strategy cooperates overall +- **Wins**: Number of outright wins +- **CC_rate / CD_rate / DC_rate / DD_rate**: Frequencies of each outcome pair +- **CC_to_C / CD_to_C / DC_to_C / DD_to_C**: Conditional cooperation rates after each prior outcome + +These artifacts are fed back into the LLM prompt (via `include_artifacts: true`) so the LLM can see exactly how the strategy is behaving and make informed improvements. + +## Seed Strategies + +Rather than starting from a single naive strategy, this example seeds 4 diverse starting strategies—one per island—covering a range of well-known approaches. Each was adapted from the Axelrod library with the evolve block wrapping the class body so OpenEvolve can mutate the strategy. + +| File | Original Strategy | Key Behavior | +|------|------------------|--------------| +| `seed1.py` | Tit-for-Tat | Cooperate first; mirror the opponent's last move | +| `seed2.py` | ForgetfulFoolMeOnce(0.05) | Forgive one defection; retaliate forever on a second; occasionally forget with 5% probability | +| `seed3.py` | OriginalGradual | Escalate punishment with each defection; enter a calming state to de-escalate | +| `seed4.py` | SecondByBorufsen | Track opponent patterns; switch to defect mode against random or defective opponents; detect mutual-defect cycles and alternating echoes | + +These seeds were chosen to be near-top-level performers that are straightforward to implement, while remaining diverse enough to explore different regions of strategy space. + +## Island-Based Seeding + +The example uses a custom runner (`run_evolve.py`) rather than the standard `openevolve-run.py`. 
This allows each seed to be evaluated and placed directly onto a specific island before evolution begins: + +```python +for i, seed_path in enumerate(initial_seeds): + with open(seed_path, 'r') as f: + code = f.read() + metrics = await openevolve.evaluator.evaluate_program(code, initial_program_id) + program = Program(id=..., code=code, metrics=metrics, ...) + openevolve.database.add(program, target_island=i) +``` + +This ensures each island starts with a strong, distinct strategy—promoting diversity throughout the evolutionary process and preventing premature convergence to a single approach. + +## Running the Example + +Install the dependency and run from the repository root: + +```bash +pip install axelrod +python examples/axelrod/run_evolve.py +``` + +## Configuration + +The evolution is configured in `config.yaml`: + +```yaml +max_iterations: 1000 +checkpoint_interval: 10 + +llm: + primary_model: "gemini-2.5-flash-lite" # Fast, cheap model for most mutations + primary_model_weight: 0.8 + secondary_model: "gemini-2.5-flash" # Stronger model for refinement + secondary_model_weight: 0.2 + temperature: 0.7 + max_tokens: 16000 + +prompt: + num_top_programs: 3 + num_diverse_programs: 2 + include_artifacts: true # Feed cooperation stats back to LLM + system_message: "You're building a program that runs iterated prisoner's dilemma + in a tournament with many other strategies, similar to Axelrod's famous tournaments. + Your task is to improve the play strategy for maximum profit." 
+ +database: + population_size: 50 + num_islands: 4 + elite_selection_ratio: 0.2 + exploitation_ratio: 0.7 + similarity_threshold: 0.99 + +evaluator: + enable_artifacts: true + timeout: 60 + parallel_evaluations: 3 +``` + +Key configuration decisions: +- **4 islands** to match the 4 seed strategies, maintaining diversity throughout evolution +- **Artifacts enabled** so the LLM can reason about cooperation patterns and win/loss rates +- **Similarity threshold of 0.99** to discourage near-duplicate strategies from accumulating in the population + +## Evolved Strategy: AdaptiveCycleBreakerTFT + +The best evolved strategy (`best_program.py`) is named `AdaptiveCycleBreakerTFT`. It is a sophisticated refinement of the SecondByBorufsen seed, with several key changes: + +**Mode Switching (every 15 turns instead of 25):** +- Evaluates opponent behavior more frequently, allowing faster adaptation +- Switches to permanent **Defect mode** if the opponent cooperated fewer than 3 times, or if less than 83% of their cooperations were in response to the player's own cooperation (stricter than the original 70% threshold) + +**Cycle-Breaking in Normal Mode (triggers at streak ≥ 2 instead of 3):** +- If both players have mutually defected for 2+ consecutive turns, cooperate once to break the cycle +- If an alternating D/C echo pattern persists for 2+ turns, flag the next defect to become a cooperation + +**Tit-for-Tat as Default:** +- When none of the special conditions apply, simply mirror the opponent's last move + +The strategy is fully deterministic (`stochastic: False`) and has a declared `memory_depth` of 15, matching its re-evaluation period. 
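The 15-turn mode decision can be checked in isolation. The sketch below is a hypothetical standalone helper (not part of `best_program.py`); it restates the rule using the thresholds quoted above:

```python
def choose_mode(opp_coops: int, cc_counts: int) -> str:
    """Mode rule described above: enter Defect mode if the opponent
    cooperated fewer than 3 times in the window, or if less than 83% of
    those cooperations answered our own cooperation."""
    if opp_coops < 3 or cc_counts / opp_coops < 0.83:
        return "Defect"
    return "Normal"

# A reliably reciprocating opponent stays in Normal mode (11/12 ~ 0.92)...
print(choose_mode(opp_coops=12, cc_counts=11))   # Normal
# ...while a random-looking one trips the 83% threshold (7/10 = 0.70).
print(choose_mode(opp_coops=10, cc_counts=7))    # Defect
```

Because the real strategy clears both counters after every evaluation window, each decision only reflects the most recent 15 turns.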
+
+```python
+class AdaptiveCycleBreakerTFT(Player):
+    # Condensed sketch; see best_program.py for counter updates and edge cases
+    def strategy(self, opponent: Player) -> Action:
+        turn = len(self.history) + 1
+
+        # Re-evaluate mode every 15 turns
+        if turn > 1 and turn % 15 == 0:
+            if self.opp_coops < 3 or self.cc_counts / self.opp_coops < 0.83:
+                self.mode = "Defect"
+            else:
+                self.mode = "Normal"
+
+        if self.mode == "Defect":
+            return D
+
+        # Break mutual defection cycles
+        if self.mutual_defect_streak >= 2:
+            return C
+
+        # Flag alternating echo patterns
+        if self.echo_streak >= 2:
+            self.flip_next_defect = True
+
+        # Tit-for-tat (with optional flip)
+        return C if opponent.history[-1] == C else (C if self.flip_next_defect else D)
+```
+
+## Validating the Best Program
+
+After evolution completes, you can run a full round-robin validation tournament:
+
+```bash
+python examples/axelrod/validation.py
+```
+
+This runs `AdaptiveCycleBreakerTFT` against all `short_run_time_strategies` in a proper round-robin tournament (all vs. all) with 5 repetitions, giving a more robust estimate of its ranking.
+
+## Key Observations
+
+1. **Diversity through seeding**: Starting with 4 distinct strategies on separate islands prevented premature convergence and gave the LLM a broader set of patterns to recombine and refine.
+
+2. **Artifact-driven learning**: Feeding cooperation statistics (CC_rate, DC_rate, etc.) back to the LLM allowed it to reason about *why* a strategy was failing—e.g., too cooperative against defectors, or failing to re-establish cooperation after mutual defection.
+
+3. **Convergence toward known principles**: The evolved strategy independently rediscovered key insights from decades of game theory research—the value of proportional retaliation, the need to detect and escape mutual defection traps, and the importance of distinguishing random from deterministically defective opponents.
+
+4. 
**Parameter refinement**: The evolved strategy tightened the SecondByBorufsen parameters (25→15 turn evaluation period, 70%→83% cooperation threshold, streak threshold 3→2), suggesting the LLM learned to be more responsive and less forgiving than the original. diff --git a/examples/axelrod/best_program.py b/examples/axelrod/best_program.py new file mode 100644 index 0000000000..231d372027 --- /dev/null +++ b/examples/axelrod/best_program.py @@ -0,0 +1,143 @@ +from axelrod.action import Action +from axelrod.player import Player + +C, D = Action.C, Action.D + + +class AdaptiveCycleBreakerTFT(Player): + """ + An adaptive strategy that switches between cooperative and defective modes + based on opponent behavior analysis. + + This player tracks the opponent's cooperation patterns: + + - `opp_coops` counts total opponent cooperations since last evaluation + - `cc_counts` counts opponent cooperations in response to player cooperating + - `cd_counts` counts opponent cooperations in response to player defecting + + The player operates in two modes: + + **Normal Mode:** + Uses conditional cooperation with the following ranked rules: + + 1. If mutual defection occurred for 2+ consecutive turns, cooperate once + to attempt breaking the cycle. + 2. If an alternating pattern is detected for 2+ turns (player and opponent + taking turns defecting), flag the next defection to be converted to + cooperation. + 3. Otherwise, play tit-for-tat. + + **Defect Mode:** + Always defects. + + Starting in normal mode, the player re-evaluates its mode every 15 turns. + It switches to defect mode if either condition holds: + + - Opponent cooperated fewer than 3 times in the evaluation period + - Less than 83% of opponent's cooperations were in response to player's + cooperation, i.e. cc_counts / opp_coops < 0.83 + + When transitioning from defect mode back to normal mode, the player defects + on the first turn of normal mode. 
The special rules for mutual defection + and alternating patterns only apply during normal mode operation. + """ + + name = "AdaptiveCycleBreakerTFT" + classifier = { + "memory_depth": 15, + "stochastic": False, + "long_run_time": False, + "inspects_source": False, + "manipulates_source": False, + "manipulates_state": False, + } + + def __init__(self): + super().__init__() + # Counters used for deciding mode + self.opp_coops = 0 + self.cd_counts, self.cc_counts = 0, 0 + + # Streak counters + self.mutual_defect_streak = 0 + self.echo_streak = 0 + + self.flip_next_defect = False + self.mode = "Normal" + + def strategy(self, opponent: Player) -> Action: + turn = len(self.history) + 1 + if turn == 1: + return C + + # Update counters. + if turn >= 3: + if opponent.history[-1] == C: + self.opp_coops += 1 + if self.history[-2] == C: + self.cc_counts += 1 + else: + self.cd_counts += 1 + + # Check if it's time for a mode change. + if turn > 1 and turn % 15 == 0: + coming_from_defect = (self.mode == "Defect") + + self.mode = "Normal" + if self.opp_coops < 3 or self.cc_counts / self.opp_coops < 0.83: + self.mode = "Defect" + + # Clear counters + self.opp_coops = 0 + self.cd_counts, self.cc_counts = 0, 0 + if self.mode == "Defect": + self.mutual_defect_streak = 0 + self.echo_streak = 0 + self.flip_next_defect = False + + # Check this special case: if coming from defect mode, defect on first move + if self.mode == "Normal" and coming_from_defect: + return D + + # In Defect mode, just defect + if self.mode == "Defect": + return D + assert self.mode == "Normal" + + # Update streak counters + if self.history[-1] == D and opponent.history[-1] == D: + self.mutual_defect_streak += 1 + else: + self.mutual_defect_streak = 0 + + my_two_back, opp_two_back = C, C + if turn >= 3: + my_two_back = self.history[-2] + opp_two_back = opponent.history[-2] + if ( + self.history[-1] != opponent.history[-1] + and self.history[-1] == opp_two_back + and opponent.history[-1] == my_two_back + ): + 
self.echo_streak += 1
+        else:
+            self.echo_streak = 0
+
+        # Special behavior for streaks
+        if self.mutual_defect_streak >= 2:
+            self.mutual_defect_streak = 0
+            return C
+
+        if self.echo_streak >= 2:
+            self.echo_streak = 0
+            self.flip_next_defect = True
+
+        # Just do tit-for-tat
+        if opponent.history[-1] == C:
+            return C
+
+        if self.flip_next_defect:
+            self.flip_next_defect = False
+            return C
+
+        return D
diff --git a/examples/axelrod/config.yaml b/examples/axelrod/config.yaml
new file mode 100644
index 0000000000..7e4cc537df
--- /dev/null
+++ b/examples/axelrod/config.yaml
@@ -0,0 +1,42 @@
+# Configuration for the Axelrod tournament example
+max_iterations: 1000
+checkpoint_interval: 10
+
+# LLM configuration
+llm:
+  primary_model: "gemini-2.5-flash-lite"
+  primary_model_weight: 0.8
+  secondary_model: "gemini-2.5-flash"
+  secondary_model_weight: 0.2
+  api_base: "https://generativelanguage.googleapis.com/v1beta/openai/"
+  temperature: 0.7
+  max_tokens: 16000
+  timeout: 120
+
+# Prompt configuration
+prompt:
+  num_top_programs: 3
+  num_diverse_programs: 2
+  include_artifacts: true
+  system_message: "You're building a program that runs iterated prisoner's dilemma in a tournament with many other strategies, similar to Axelrod's famous tournaments. Your task is to improve the play strategy for maximum profit." 
+
+# Database configuration
+database:
+  population_size: 50
+  archive_size: 20
+  num_islands: 4
+  elite_selection_ratio: 0.2
+  exploitation_ratio: 0.7
+
+  # embedding_model: "text-embedding-3-small"
+  similarity_threshold: 0.99
+
+# Evaluator configuration
+evaluator:
+  enable_artifacts: true
+  cascade_evaluation: false
+  timeout: 60
+  parallel_evaluations: 3
+
+# Evolution settings
+max_code_length: 20000
diff --git a/examples/axelrod/evaluator.py b/examples/axelrod/evaluator.py
new file mode 100644
index 0000000000..2d804e263f
--- /dev/null
+++ b/examples/axelrod/evaluator.py
@@ -0,0 +1,77 @@
+import importlib.util
+import traceback
+
+import axelrod as axl
+
+from openevolve.evaluation_result import EvaluationResult
+
+
+def evaluate(program_path):
+    """
+    Evaluate the evolved strategy by playing it against every strategy in
+    axl.short_run_time_strategies and scoring its median match score.
+
+    Args:
+        program_path: Path to the program file
+
+    Returns:
+        EvaluationResult with a combined_score metric and behavioral artifacts
+    """
+    try:
+        # Load the program
+        spec = importlib.util.spec_from_file_location("program", program_path)
+        program = importlib.util.module_from_spec(spec)
+        spec.loader.exec_module(program)
+
+        existing_players = [s() for s in axl.short_run_time_strategies]
+        new_player = program.Evolvo()
+        players = [new_player] + existing_players[:]
+        # edges should be new_player against everyone else
+        edges = [(0, i) for i in range(1, len(players))]
+        tournament = axl.Tournament(players, edges=edges, repetitions=1)
+        results = tournament.play(progress_bar=False)
+
+        evolvo_score = None
+        evolvo_artifacts = dict()
+        for player in results.summarise():
+            if player.Name == program.Evolvo.name:
+                evolvo_score = player.Median_score
+                evolvo_artifacts["Cooperation_rating"] = player.Cooperation_rating
+                evolvo_artifacts["Wins"] = player.Wins
+                evolvo_artifacts["CC_rate"] = player.CC_rate
+                evolvo_artifacts["CD_rate"] = player.CD_rate
+                evolvo_artifacts["DC_rate"] = player.DC_rate
+                evolvo_artifacts["DD_rate"] = player.DD_rate
+                evolvo_artifacts["CC_to_C_rate"] = player.CC_to_C_rate
+                evolvo_artifacts["CD_to_C_rate"] = player.CD_to_C_rate
+                evolvo_artifacts["DC_to_C_rate"] = player.DC_to_C_rate
+                evolvo_artifacts["DD_to_C_rate"] = player.DD_to_C_rate
+        assert evolvo_score is not None
+
+        return EvaluationResult(
+            metrics={
+                "combined_score": evolvo_score,
+                "error": 0.0,
+            },
+            artifacts=evolvo_artifacts,
+        )
+
+    except Exception as e:
+        print(f"Evaluation failed completely: {str(e)}")
+        print(traceback.format_exc())
+
+        # Create error artifacts
+        error_artifacts = {
+            "error_type": type(e).__name__,
+            "error_message": str(e),
+            "full_traceback": traceback.format_exc(),
+            "suggestion": "Check for syntax errors or missing imports in the generated code"
+        }
+
+        return EvaluationResult(
+            metrics={
+                "combined_score": 0.0,
+                "error": str(e),
+            },
+            artifacts=error_artifacts,
+        )
diff --git a/examples/axelrod/requirements.txt b/examples/axelrod/requirements.txt
new file mode 100644
index 0000000000..9cf39c6402
--- /dev/null
+++ b/examples/axelrod/requirements.txt
@@ -0,0 +1 @@
+axelrod
diff --git a/examples/axelrod/run_evolve.py b/examples/axelrod/run_evolve.py
new file mode 100644
index 0000000000..d84b559aad
--- /dev/null
+++ b/examples/axelrod/run_evolve.py
@@ -0,0 +1,62 @@
+import asyncio
+import logging
+import os
+import uuid
+
+from openevolve import OpenEvolve
+from openevolve.config import load_config
+from openevolve.database import Program
+
+AXELROD_DIR = os.path.join("examples", "axelrod")
+
+logger = logging.getLogger(__name__)
+
+
+async def main():
+    initial_seeds_files = ["seed1.py", "seed2.py", "seed3.py", "seed4.py"]
+    initial_seeds = [os.path.join(AXELROD_DIR, f) for f in initial_seeds_files]
+
+    # Load base config from file or defaults
+    config = load_config(os.path.join(AXELROD_DIR, "config.yaml"))
+
+    openevolve = OpenEvolve(
+        initial_program_path=initial_seeds[0],  # Default for index 0
+        evaluation_file=os.path.join(AXELROD_DIR, "evaluator.py"),
+        config=config,
+    ) 
+ + # Manually insert other seeds into the database/islands before running + for i, seed_path in enumerate(initial_seeds): + with open(seed_path, 'r') as f: + code = f.read() + logger.info("Adding initial program to database") + initial_program_id = str(uuid.uuid4()) + + metrics = await openevolve.evaluator.evaluate_program( + code, initial_program_id + ) + + program = Program( + id=initial_program_id, + code=code, + language=openevolve.config.language, + metrics=metrics, + iteration_found=0, + ) + openevolve.database.add(program, target_island=i) + + # Run evolution + best_program = await openevolve.run() # checkpoint_path="examples/axelrod/openevolve_output/checkpoints/checkpoint_25") + + print(f"\nEvolution complete!") + print(f"Best program metrics:") + for name, value in best_program.metrics.items(): + # Handle mixed types: format numbers as floats, others as strings + if isinstance(value, (int, float)): + print(f" {name}: {value:.4f}") + else: + print(f" {name}: {value}") + + +if __name__ == "__main__": + asyncio.run(main()) diff --git a/examples/axelrod/seed1.py b/examples/axelrod/seed1.py new file mode 100644 index 0000000000..592d53c675 --- /dev/null +++ b/examples/axelrod/seed1.py @@ -0,0 +1,33 @@ +"""Originally Tit-for-Tat in Axelrod library. + +Portions of this code were taken from https://github.com/Axelrod-Python/Axelrod/blob/dev/axelrod/strategies/titfortat.py +Modified by T.J. Gaffney for use in openevolve. +""" + +from axelrod.action import Action +from axelrod.player import Player + +C, D = Action.C, Action.D + + +class Evolvo(Player): + + name = "Evolvo" + +# EVOLVE-BLOCK-START + """ + Cooperates on first move, and defects if the opponent defects. 
+    """
+
+    def __init__(self):
+        super().__init__()
+
+    def strategy(self, opponent: Player) -> Action:
+        # First move
+        if not self.history:
+            return C
+        # React to the opponent's last move
+        if opponent.history[-1] == D:
+            return D
+        return C
+# EVOLVE-BLOCK-END
diff --git a/examples/axelrod/seed2.py b/examples/axelrod/seed2.py
new file mode 100644
index 0000000000..7d99827ead
--- /dev/null
+++ b/examples/axelrod/seed2.py
@@ -0,0 +1,48 @@
+"""Originally ForgetfulFoolMeOnce(0.05) in Axelrod library.
+
+Portions of this code were taken from https://github.com/Axelrod-Python/Axelrod/blob/dev/axelrod/strategies/oncebitten.py
+Modified by T.J. Gaffney for use in openevolve.
+"""
+
+import random
+
+from axelrod.action import Action
+from axelrod.player import Player
+
+C, D = Action.C, Action.D
+
+
+class Evolvo(Player):
+    name = "Evolvo"
+
+# EVOLVE-BLOCK-START
+    """
+    Forgives one D then retaliates forever on a second D. Sometimes randomly
+    forgets the defection count, and so keeps a secondary count separate from
+    the standard count in Player.
+    """
+
+    def __init__(self) -> None:
+        """
+        Parameters
+        ----------
+        forget_probability, float
+            The probability of forgetting the count of opponent defections.
+        """
+        super().__init__()
+        self.D_count = 0
+        self._initial = C
+        self.forget_probability = 0.05
+
+    def strategy(self, opponent: Player) -> Action:
+        r = random.random()
+        if not opponent.history:
+            return self._initial
+        if opponent.history[-1] == D:
+            self.D_count += 1
+        if r < self.forget_probability:
+            self.D_count = 0
+        if self.D_count > 1:
+            return D
+        return C
+# EVOLVE-BLOCK-END
diff --git a/examples/axelrod/seed3.py b/examples/axelrod/seed3.py
new file mode 100644
index 0000000000..56b5b67a6a
--- /dev/null
+++ b/examples/axelrod/seed3.py
@@ -0,0 +1,57 @@
+"""Originally OriginalGradual in Axelrod library.
+
+Portions of this code were taken from https://github.com/Axelrod-Python/Axelrod/blob/dev/axelrod/strategies/titfortat.py
+Modified by T.J. Gaffney for use in openevolve.
+"""
+
+from axelrod.action import Action
+from axelrod.player import Player
+
+C, D = Action.C, Action.D
+
+
+class Evolvo(Player):
+
+    name = "Evolvo"
+
+# EVOLVE-BLOCK-START
+    """
+    A player that punishes defections with a growing number of defections
+    but after punishing for `punishment_limit` number of times enters a calming
+    state and cooperates no matter what the opponent does for two rounds.
+
+    The `punishment_limit` is incremented whenever the opponent defects and the
+    strategy is not in either calming or punishing state.
+    """
+
+    def __init__(self) -> None:
+
+        super().__init__()
+        self.calming = False
+        self.punishing = False
+        self.punishment_count = 0
+        self.punishment_limit = 0
+
+    def strategy(self, opponent: Player) -> Action:
+        if self.calming:
+            self.calming = False
+            return C
+
+        if self.punishing:
+            if self.punishment_count < self.punishment_limit:
+                self.punishment_count += 1
+                return D
+            else:
+                self.calming = True
+                self.punishing = False
+                self.punishment_count = 0
+                return C
+
+        if D in opponent.history[-1:]:
+            self.punishing = True
+            self.punishment_count += 1
+            self.punishment_limit += 1
+            return D
+
+        return C
+# EVOLVE-BLOCK-END
diff --git a/examples/axelrod/seed4.py b/examples/axelrod/seed4.py
new file mode 100644
index 0000000000..db7e6bfb72
--- /dev/null
+++ b/examples/axelrod/seed4.py
@@ -0,0 +1,153 @@
+"""Originally SecondByBorufsen in Axelrod library.
+
+Portions of this code were taken from https://github.com/Axelrod-Python/Axelrod/blob/dev/axelrod/strategies/axelrod_second.py
+Modified by T.J. Gaffney for use in openevolve.
+"""
+
+from axelrod.action import Action
+from axelrod.player import Player
+
+C, D = Action.C, Action.D
+
+
+class Evolvo(Player):
+
+    name = "Evolvo"
+
+# EVOLVE-BLOCK-START
+    """
+    This player keeps track of the opponent's responses to its own behavior:
+
+    - `cd_count` counts: Opponent cooperates as response to player defecting. 
+ - `cc_count` counts: Opponent cooperates as response to player cooperating. + + The player has a defect mode and a normal mode. In defect mode, the + player will always defect. In normal mode, the player obeys the following + ranked rules: + + 1. If in the last three turns, both the player/opponent defected, then + cooperate for a single turn. + 2. If in the last three turns, the player/opponent acted differently from + each other and they're alternating, then change next defect to + cooperate. (Doesn't block third rule.) + 3. Otherwise, do tit-for-tat. + + Start in normal mode, but every 25 turns starting with the 27th turn, + re-evaluate the mode. Enter defect mode if any of the following + conditions hold: + + - Detected random: Opponent cooperated 7-18 times since last mode + evaluation (or start) AND less than 70% of opponent cooperation was in + response to player's cooperation, i.e. + cc_count / (cc_count+cd_count) < 0.7 + - Detect defective: Opponent cooperated fewer than 3 times since last mode + evaluation. + + When switching to defect mode, defect immediately. The first two rules for + normal mode require that last three turns were in normal mode. When starting + normal mode from defect mode, defect on first move. + """ + + def __init__(self): + super().__init__() + self.cd_counts, self.cc_counts = 0, 0 + self.mutual_defect_streak = 0 + self.echo_streak = 0 + self.flip_next_defect = False + self.mode = "Normal" + + def try_return(self, to_return): + """ + We put the logic here to check for the `flip_next_defect` bit here, + and proceed like normal otherwise. + """ + + if to_return == C: + return C + # Otherwise look for flip bit. + if self.flip_next_defect: + self.flip_next_defect = False + return C + return D + + def strategy(self, opponent: Player) -> Action: + turn = len(self.history) + 1 + + if turn == 1: + return C + + # Update the response history. 
+ if turn >= 3: + if opponent.history[-1] == C: + if self.history[-2] == C: + self.cc_counts += 1 + else: + self.cd_counts += 1 + + # Check if it's time for a mode change. + if turn > 2 and turn % 25 == 2: + coming_from_defect = False + if self.mode == "Defect": + coming_from_defect = True + + self.mode = "Normal" + coops = self.cd_counts + self.cc_counts + + # Check for a defective strategy + if coops < 3: + self.mode = "Defect" + + # Check for a random strategy + if (8 <= coops <= 17) and self.cc_counts / coops < 0.7: + self.mode = "Defect" + + self.cd_counts, self.cc_counts = 0, 0 + + # If defect mode, clear flags + if self.mode == "Defect": + self.mutual_defect_streak = 0 + self.echo_streak = 0 + self.flip_next_defect = False + + # Check this special case + if self.mode == "Normal" and coming_from_defect: + return D + + # Proceed + if self.mode == "Defect": + return D + else: + assert self.mode == "Normal" + + # Look for mutual defects + if self.history[-1] == D and opponent.history[-1] == D: + self.mutual_defect_streak += 1 + else: + self.mutual_defect_streak = 0 + if self.mutual_defect_streak >= 3: + self.mutual_defect_streak = 0 + self.echo_streak = 0 # Reset both streaks. + return self.try_return(C) + + # Look for echoes + # Fortran code defaults two turns back to C if only second turn + my_two_back, opp_two_back = C, C + if turn >= 3: + my_two_back = self.history[-2] + opp_two_back = opponent.history[-2] + if ( + self.history[-1] != opponent.history[-1] + and self.history[-1] == opp_two_back + and opponent.history[-1] == my_two_back + ): + self.echo_streak += 1 + else: + self.echo_streak = 0 + if self.echo_streak >= 3: + self.mutual_defect_streak = 0 # Reset both streaks. 
+                self.echo_streak = 0
+                self.flip_next_defect = True
+
+            # Tit-for-tat
+            return self.try_return(opponent.history[-1])
+# EVOLVE-BLOCK-END
diff --git a/examples/axelrod/validation.py b/examples/axelrod/validation.py
new file mode 100644
index 0000000000..f358da7da9
--- /dev/null
+++ b/examples/axelrod/validation.py
@@ -0,0 +1,21 @@
+"""Runs a final analysis with round robin and multiple rounds."""
+
+import importlib.util
+import os
+import pprint
+
+import axelrod as axl
+
+
+if __name__ == "__main__":
+    # Load the program
+    spec = importlib.util.spec_from_file_location("program", os.path.join("examples", "axelrod", "best_program.py"))
+    program = importlib.util.module_from_spec(spec)
+    spec.loader.exec_module(program)
+
+    players = [program.AdaptiveCycleBreakerTFT()] + [s() for s in axl.short_run_time_strategies]
+    tournament = axl.Tournament(players, repetitions=5)
+    results = tournament.play(progress_bar=True)
+
+    summary = results.summarise()
+    pprint.pprint(summary)