docs for maze-dataset v1.1.0

`maze-dataset`

This package provides utilities for generating, filtering, solving, visualizing, and processing mazes for training ML systems. It was built primarily for the maze-transformer interpretability project. You can find our paper on it here: http://arxiv.org/abs/2309.10498

This package includes a variety of maze generation algorithms, including randomized depth first search, Wilson’s algorithm for uniform spanning trees, and percolation. Datasets can be filtered to select mazes of a certain length or complexity, to remove duplicates, and to keep only mazes that satisfy custom properties. A variety of output formats for visualization and training ML models are provided.

Maze generated via constrained randomized depth first search
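
To give a sense of what randomized depth first search does on a lattice, here is a minimal, self-contained sketch in plain Python. It is not the library's implementation (which is exposed as LatticeMazeGenerators.gen_dfs); the `dfs_maze` function and the set-of-passages representation are assumptions made purely for illustration.

```python
import random

Coord = tuple[int, int]


def dfs_maze(grid_n: int, seed: int = 0) -> set[frozenset[Coord]]:
    """Return the open passages of a grid_n x grid_n maze (hypothetical helper).

    Each passage is a frozenset holding two adjacent cells.
    """
    rng = random.Random(seed)
    start: Coord = (0, 0)
    visited: set[Coord] = {start}
    passages: set[frozenset[Coord]] = set()
    stack: list[Coord] = [start]
    while stack:
        r, c = stack[-1]
        # in-bounds neighbors that have not been visited yet
        neighbors = [
            (r + dr, c + dc)
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < grid_n
            and 0 <= c + dc < grid_n
            and (r + dr, c + dc) not in visited
        ]
        if not neighbors:
            stack.pop()  # dead end: backtrack
            continue
        nxt = rng.choice(neighbors)
        passages.add(frozenset(((r, c), nxt)))  # carve a passage
        visited.add(nxt)
        stack.append(nxt)
    return passages


maze = dfs_maze(5)
print(len(maze))  # a spanning tree on 25 cells has 24 passages
```

Because every cell is reached exactly once, the result is a spanning tree of the lattice, so any two cells are joined by exactly one path.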

Installation

This package is available on PyPI, and can be installed via

pip install maze-dataset

Docs

The full hosted documentation is available at https://understanding-search.github.io/maze-dataset/.

Additionally:

- our notebooks serve as a good starting point for understanding the package:
  - the notebooks page in the docs has links to the rendered notebooks
  - the notebooks folder has the source notebooks
- combined, single page docs are available as:
- test coverage reports are available on the coverage page or the coverage/ folder
- generation benchmark results are available on the benchmarks page or the benchmarks/ folder

Usage

Creating a dataset

To create a MazeDataset, which inherits from torch.utils.data.Dataset, you first create a MazeDatasetConfig:

from maze_dataset import MazeDataset, MazeDatasetConfig
from maze_dataset.generation import LatticeMazeGenerators
cfg: MazeDatasetConfig = MazeDatasetConfig(
    name="test", # name is only for you to keep track of things
    grid_n=5, # number of rows/columns in the lattice
    n_mazes=4, # number of mazes to generate
    maze_ctor=LatticeMazeGenerators.gen_dfs, # algorithm to generate the maze
    maze_ctor_kwargs=dict(do_forks=False), # additional parameters to pass to the maze generation algorithm
)

and then pass this config to the MazeDataset.from_config method:

dataset: MazeDataset = MazeDataset.from_config(cfg)

This method can check whether a dataset with a matching config hash already exists on your filesystem in the expected location and, if so, load it. It can also generate a dataset on the fly if needed.
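
The load-or-generate behaviour described above follows a common caching pattern: derive a stable hash from the config, map it to a file name, and generate only when that file is missing. The sketch below illustrates the pattern with the standard library alone. The `config_hash` and `load_or_generate` helpers, the file naming, and the placeholder "generation" step are all hypothetical, and this is not a description of how MazeDataset.from_config is implemented.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def config_hash(cfg: dict) -> str:
    """Stable hash: serialize with sorted keys so key order does not matter."""
    blob = json.dumps(cfg, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


def load_or_generate(cfg: dict, cache_dir: Path) -> tuple[list, bool]:
    """Return (data, loaded_from_disk) for the given config."""
    path = cache_dir / f"{cfg['name']}-{config_hash(cfg)}.json"
    if path.exists():
        return json.loads(path.read_text()), True
    # placeholder for real maze generation: one record per requested maze
    data = [{"maze_index": i, "grid_n": cfg["grid_n"]} for i in range(cfg["n_mazes"])]
    path.write_text(json.dumps(data))
    return data, False


with tempfile.TemporaryDirectory() as tmp:
    cfg = {"name": "test", "grid_n": 5, "n_mazes": 4}
    first, hit_1 = load_or_generate(cfg, Path(tmp))
    second, hit_2 = load_or_generate(cfg, Path(tmp))
    print(hit_1, hit_2, first == second)  # False True True
```

Hashing the full config rather than just the name means that changing any parameter, such as grid_n, leads to a different file and therefore a fresh dataset.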

Conversions to useful formats

The elements of the dataset are SolvedMaze objects:

>>> m = dataset[0]
>>> type(m)
maze_dataset.maze.lattice_maze.SolvedMaze

These can be converted to a variety of formats:

# visual representation as ascii art
m.as_ascii() 
# RGB image, optionally without solution or endpoints, suitable for CNNs
m.as_pixels() 
# text format for autoregressive transformers
from maze_dataset.tokenization import MazeTokenizerModular, TokenizationMode
m.as_tokens(maze_tokenizer=MazeTokenizerModular(
    tokenization_mode=TokenizationMode.AOTP_UT_rasterized, max_grid_size=100,
))
# advanced visualization with many features
from maze_dataset.plotting import MazePlot
MazePlot(m).plot()
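
As a rough picture of what an ASCII rendering of a lattice maze involves, the sketch below draws a maze on a character canvas in which every cell and every wall takes one character. It is a standalone illustration: the `render_ascii` helper and the passage representation are assumptions, and the layout is not claimed to match the output of as_ascii().

```python
Coord = tuple[int, int]


def render_ascii(grid_n: int, passages: set[frozenset[Coord]]) -> str:
    """Draw cells at odd (row, col) positions of a (2n+1) x (2n+1) canvas."""
    size = 2 * grid_n + 1
    canvas = [["#"] * size for _ in range(size)]
    for r in range(grid_n):
        for c in range(grid_n):
            canvas[2 * r + 1][2 * c + 1] = " "  # open the cell itself
    for passage in passages:
        (r1, c1), (r2, c2) = tuple(passage)
        # the wall between two adjacent cells sits at their midpoint
        canvas[r1 + r2 + 1][c1 + c2 + 1] = " "
    return "\n".join("".join(row) for row in canvas)


# a hand-written 2x2 maze: (0,0)-(0,1), (0,1)-(1,1), (1,1)-(1,0)
passages = {
    frozenset({(0, 0), (0, 1)}),
    frozenset({(0, 1), (1, 1)}),
    frozenset({(1, 1), (1, 0)}),
}
print(render_ascii(2, passages))
```

Placing cells on odd coordinates leaves the even rows and columns for walls, so removing a wall is a single write at the midpoint of the two cells it separates.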

Development

This project uses Poetry for development. To install with dev requirements, run

poetry install --with dev

A makefile is included to simplify common development tasks:

- make help will print all available commands
- all tests via make test
  - unit tests via make unit
  - notebook tests via make test_notebooks
- formatter (black, pycln, and isort) via make format
  - formatter in check-only mode via make check-format

Citing

If you use this code in your research, please cite our paper:

@misc{maze-dataset,
    title={A Configurable Library for Generating and Manipulating Maze Datasets}, 
    author={Michael Igorevich Ivanitskiy and Rusheb Shah and Alex F. Spies and Tilman Räuker and Dan Valentine and Can Rager and Lucia Quirke and Chris Mathwin and Guillaume Corlouer and Cecilia Diniz Behn and Samy Wu Fung},
    year={2023},
    eprint={2309.10498},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={http://arxiv.org/abs/2309.10498}
}

API Documentation

SolvedMaze
MazeDatasetConfig
MazeDataset
MazeDatasetCollection
MazeDatasetCollectionConfig
TargetedLatticeMaze
LatticeMaze
set_serialize_minimal_threshold
LatticeMazeGenerators
Coord
CoordTup
CoordList
CoordArray
Connection
ConnectionList
ConnectionArray
SPECIAL_TOKENS
VOCAB
VOCAB_LIST
VOCAB_TOKEN_TO_INDEX

Contents

maze-dataset

Installation

Docs

Usage

Creating a dataset

Conversions to useful formats

Development

Citing

Submodules

API Documentation

maze_dataset

class SolvedMaze(maze_dataset.maze.lattice_maze.TargetedLatticeMaze):

SolvedMaze

def get_solution_tokens

def from_lattice_maze

def from_targeted_lattice_maze

def get_solution_forking_points

def get_solution_path_following_points

def serialize

def load

def validate_fields_types

Inherited Members

class MazeDatasetConfig(maze_dataset.dataset.dataset.GPTDatasetConfig):

MazeDatasetConfig

def maze_ctor

Arguments

algorithm

def stable_hash_cfg

def to_fname

def summary

def serialize

def load

def validate_fields_types

Inherited Members

class MazeDataset(typing.Generic[+T_co]):

MazeDataset

def data_hash

def as_tokens

def generate

def download

def load

def serialize

def update_self_config

def custom_maze_filter

Inherited Members

class MazeDatasetCollection(typing.Generic[+T_co]):

MazeDatasetCollection

def generate

def download

def serialize

def load

def as_tokens

def update_self_config

Inherited Members

class MazeDatasetCollectionConfig(maze_dataset.dataset.dataset.GPTDatasetConfig):

MazeDatasetCollectionConfig

def summary

def stable_hash_cfg

def to_fname

def serialize

def load

def validate_fields_types

Inherited Members

class TargetedLatticeMaze(maze_dataset.maze.lattice_maze.LatticeMaze):

TargetedLatticeMaze

def get_start_pos_tokens

def get_end_pos_tokens

def from_lattice_maze

def serialize

def load

def validate_fields_types

`class LatticeMaze(muutils.json_serialize.serializable_dataclass.SerializableDataclass):`

`LatticeMaze`

`def heuristic`

`def nodes_connected`

`def is_valid_path`

`def coord_degrees`

`def get_coord_neighbors`

`def gen_connected_component_from`

`def find_shortest_path`

`def get_nodes`

`def get_connected_component`

`def generate_random_path`

`def as_adj_list`

`def from_adj_list`

`def as_adj_list_tokens`

`def as_tokens`

`def from_tokens`

`def as_pixels`

`def from_pixels`

`def as_ascii`

`def from_ascii`

`def serialize`

`def load`

`def validate_fields_types`