Introduction
Chess clocks have been used in competitive chess since 1883. There are a few variations, but the main idea of how they work has not changed in over a century. Apart from configuring the time controls at the start of the game, the only interaction with a chess clock is a player pressing a button after each move to stop their own clock and start the opponent’s. Novice players often forget to do that. Given the advances in computer vision, it is time to create a hands-free chess clock, and not in the traditional sense of a hands-free device, meaning one that is voice controlled. The idea is to use a camera to recognise when a move is finished and automatically stop the current player’s clock and resume the opponent’s. This frees players from any mandatory interaction with the clock except the initial setup. An intelligent clock could also understand chess positions, annotate games, detect illegal moves, stream online, give advice during training and much more.
Here is a quick demonstration of what the clock can already do:
Mechanical design
The frame is made mostly from eitech construction set parts, the rubber feet are from old printer rollers, and the camera mounting plate is 3D printed.
Voice commands
The clock is controlled by voice commands. Audio is captured using the Advanced Linux Sound Architecture (ALSA) library at a 16 kHz sampling rate. The Silero Voice Activity Detector (VAD) is used to find the start and end of speech; it runs on the Open Neural Network Exchange (ONNX) Runtime. Detected speech is recognised using the whisper.cpp library, a plain C/C++ implementation of the OpenAI Whisper model without dependencies. whisper.cpp is compiled with cuBLAS for CUDA support. The small English-only Whisper model is used. Inference time varies from under a second to several seconds depending on the input. This is close enough to real time for a clock application, given that it receives commands quite infrequently. The transcribed text is pattern matched (using regular expressions) against the expected commands. If a match is found, the relevant command is executed. Supported commands:
- start x minute(s) game [with y second increment] - starts a new game. All chess pieces must be on their initial squares. After the game starts, the video camera observes the board and automatically switches the relevant clock after each move.
- stop the game - stops the clock.
- continue game - resumes the clock.
- shutdown - halts the computer running the clock. Handy when you don't have a keyboard or remote terminal for a safe shutdown. The /etc/sudoers file must be modified accordingly.
- please tell best move - uses an external chess engine to evaluate the current position and says what it thinks the best move is.
- what is worst move - uses an external chess engine to evaluate the current position and says what it thinks the worst move is.
- who is winning - uses an external chess engine to evaluate the current position and says who is winning.
Slight variations in command wording will work too.
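To give a feel for the pattern matching step, here is a minimal sketch in Python. It is not the clock's actual code: the function name `parse_command`, the returned tuples and the exact patterns are made up for illustration, and it assumes the speech recogniser writes numbers as digits.

```python
import re

# Hypothetical patterns; the real clock's expressions are not published here.
START = re.compile(
    r"start (?:a )?(\d+)[ -]minutes? game"
    r"(?: with (?:a )?(\d+)[ -]seconds? increment)?"
)
SIMPLE = [
    ("stop", re.compile(r"\bstop (?:the )?game\b")),
    ("continue", re.compile(r"\bcontinue (?:the )?game\b")),
    ("shutdown", re.compile(r"\bshut ?down\b")),
    ("best_move", re.compile(r"\bbest move\b")),
    ("worst_move", re.compile(r"\bworst move\b")),
    ("who_is_winning", re.compile(r"\bwho(?: i)?s winning\b")),
]


def parse_command(text):
    """Return a tuple describing the recognised command, or None."""
    # Lower-case and drop punctuation the recogniser may add.
    cleaned = re.sub(r"[^a-z0-9 -]", "", text.lower()).strip()
    match = START.search(cleaned)
    if match:
        minutes = int(match.group(1))
        increment = int(match.group(2) or 0)  # increment is optional
        return ("start", minutes, increment)
    for name, pattern in SIMPLE:
        if pattern.search(cleaned):
            return (name,)
    return None


print(parse_command("Start 5 minute game with 3 second increment."))
print(parse_command("Please tell best move"))
```

Searching for a short key phrase instead of matching the whole sentence is what lets slight variations in wording through.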
Voice output
The Hands Free Chess Clock provides voice feedback after commands, after chess moves and when the game ends. It can also tell the opening name from the Lichess opening database. Voice audio is generated on the fly using the Piper text-to-speech (TTS) engine.
Chess board detection
The Open Source Computer Vision Library (OpenCV) is used to detect the chess board. The software doesn't attempt to detect a board in an arbitrary orientation or with an arbitrary initial position. It assumes that when the start new game command is issued, the chess pieces are at their initial positions and the board is in view of the camera and aligned with the chess clock.
The wooden chess board shown in the video above is the first chess board I ever played on. It is over 30 years old and quite worn. This makes the detection problem more challenging because of extra visual artefacts on the board. On the other hand, it is good for tuning the vision algorithms not only for clean tournament-grade boards but also for a regular board used in an amateur setting. Tests so far have been made on 2 different boards, see images below.
Video capture runs all the time, regardless of whether a game is being played. This helps keep the camera focused and adjusted to the lighting conditions when the user issues the start game command. When a frame is captured, its blurriness is determined by calculating the Laplacian, and frames scoring below a certain threshold are ignored. Camera refocusing might severely decrease the chance of correctly detecting the board or a move.
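A common way to turn the Laplacian into a single sharpness number is to take its variance over the frame, and the sketch below assumes that this is the measure meant here. It uses plain Python lists so it runs without OpenCV; the function names and the threshold value are hypothetical.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    gray is a 2D list of grayscale intensities. Sharp frames have strong
    edges and therefore a high variance; blurred frames score low.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_sharp_enough(gray, threshold=100.0):
    # The threshold is an assumed value that would need tuning per camera.
    return laplacian_variance(gray) >= threshold


# A pixel-level checkerboard (very sharp) versus a smooth ramp (no edges).
sharp = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
smooth = [[10 * x for x in range(8)] for y in range(8)]
print(is_sharp_enough(sharp), is_sharp_enough(smooth))
```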
The board detection algorithm is executed at the start of every game; after that the board is assumed to be static with minimal disturbances.
Visual processing starts with the classical steps of converting the RGB image to grayscale and applying some blurring:
Next, Otsu thresholding is applied to help extract contours:
As you can see, some of the squares are joined and would result in contours spanning multiple squares, but we want to detect individual squares rather than groups. To fix that, an erosion operation is applied:
Now the white squares are separated but the black ones are joined. To rectify this, an inverted threshold is calculated and erosion is applied to it:
Both erosion results are used to find contours:
Detected contours are approximated as polygons, and all polygons that are non-convex or don't have exactly 4 vertices are filtered out. Additional filters remove polygons that don't have 2 horizontal sides and 2 almost vertical sides, as well as polygons that are too big, for example the polygon for the whole board:
The expectation is that the remaining polygons will match some or all squares of the chess board. For standard chess boards, all non-central squares are obstructed by chess pieces, which are tall and so span multiple squares in the image, preventing these squares from being recognised as 4-sided polygons. For some less common boards, non-central squares will be detected too:
As you can see, non-central squares are detected too, but not on all ranks. Therefore some more heuristics are needed to identify which 4 ranks are central. It is assumed that at least one square is detected per file. The k-means algorithm is then applied to the polygon centers of mass to group them into 8 files. Grouping into ranks is much harder: because of perspective distortion, squares of the first and the last file overlap. Therefore a different approach is taken to find the ranks.
Starting with the h file, the leftmost square is assigned row 0, column 0 (as seen in the image, not from a player's perspective). Other polygons are then found and numbered by moving in steps of the length of its top line. In the image, 0, 2 is detected next, at a distance of 2 square sizes from the initial square. Moving to the next file down, we first need to find a square matching any column of the previous row. Moving from the left, that would be 1, 0. The other squares in the same row are then detected by moving left and right, and the process continues until the last row.
Row and column indices are relative and don't need to correspond to chess board coordinates. The contiguous group of columns with the largest number of rows is assumed to be the central squares. These are labelled with green coordinates in the picture. Yellow ones participated in detection, so they have indices, but were not assumed to be in the center. Red ones were rejected for not being square enough.
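The k-means grouping step can be sketched as follows. This is an illustration only: it assumes the files show up as horizontal bands in the camera image, so it clusters just the vertical coordinate of each polygon center, and the function name and sample coordinates are invented. The initial centroids are spread with a farthest-point rule to keep the example deterministic.

```python
def group_into_files(ys, k=8, max_iter=100):
    """Cluster center-of-mass y coordinates into k bands with 1D k-means.

    Returns one label per input value, numbered 0..k-1 from top to bottom.
    Needs at least k distinct values (one detected square per file).
    """
    # Farthest-point initialisation: start at the smallest value, then keep
    # adding the value that is farthest from all centroids chosen so far.
    centroids = [min(ys)]
    while len(centroids) < k:
        centroids.append(
            max(ys, key=lambda y: min(abs(y - c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every value.
        new_labels = [
            min(range(k), key=lambda i: abs(y - centroids[i])) for y in ys
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [y for y, label in zip(ys, labels) if label == i]
            if members:
                centroids[i] = sum(members) / len(members)
    # Renumber the clusters so that label 0 is the topmost band.
    order = sorted(range(k), key=lambda i: centroids[i])
    new_index = {old: new for new, old in enumerate(order)}
    return [new_index[label] for label in labels]


# Made-up y coordinates of detected square centers, in pixels.
centers_y = [101, 99, 142, 139, 141, 186, 184, 236,
             291, 289, 350, 352, 415, 486, 484]
print(group_into_files(centers_y))
```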
From the detected central squares we can extract the side lines (blue) bounding the unoccupied center of the board (red dots). Scaling the center rectangle (it is exactly half of the board) gives a rectangle spanning the whole board (purple dots).
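One simple way to do this scaling is to stretch the top and bottom edges of the center quadrilateral about their midpoints. The sketch below assumes the 4 central ranks run left to right in the image, so the center region already covers the full board height and only needs doubling in width. The function name, the corner order and the coordinates are invented for the example, and perspective foreshortening along the edges is ignored.

```python
def widen_quad(top_left, top_right, bottom_right, bottom_left, factor=2.0):
    """Stretch the top and bottom edges of a quadrilateral about their
    midpoints. Returns the new corners in the same order as the arguments."""

    def stretch(a, b):
        mid_x, mid_y = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2
        new_a = (mid_x + (a[0] - mid_x) * factor,
                 mid_y + (a[1] - mid_y) * factor)
        new_b = (mid_x + (b[0] - mid_x) * factor,
                 mid_y + (b[1] - mid_y) * factor)
        return new_a, new_b

    new_tl, new_tr = stretch(top_left, top_right)
    new_bl, new_br = stretch(bottom_left, bottom_right)
    return new_tl, new_tr, new_br, new_bl


# Made-up corners of the center region (the red dots), in pixels.
board = widen_quad((30, 10), (70, 10), (80, 90), (20, 90))
print(board)
```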
Using the purple point locations we can calculate and apply the perspective correction needed to get a square board:
Given how move detection (see below) works, the perspective correction doesn't need to be very accurate.
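A perspective correction from four corner points boils down to a 3x3 homography. OpenCV provides `cv2.getPerspectiveTransform` and `cv2.warpPerspective` for this; the dependency-free sketch below only illustrates the underlying calculation and is not the clock's code. The corner coordinates are made up, and the target is an arbitrary 8 by 8 unit board.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Homography (9 values, row-major, last one fixed to 1) that maps the
    four src points onto the four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b) + [1.0]


def warp_point(h, x, y):
    """Apply the homography to a single image point."""
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Made-up board corners in the camera image (the purple dots), clockwise
# from top left, mapped to a board measured in squares.
corners = [(120, 80), (520, 90), (600, 400), (40, 390)]
target = [(0, 0), (8, 0), (8, 8), (0, 8)]
h = homography(corners, target)
print(warp_point(h, 600, 400))
```

Once every pixel can be mapped to board coordinates, deciding which square a change belongs to is just a matter of taking the integer part of each coordinate.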
Move detection
Once the game starts, the MOG2 background subtraction algorithm is used to detect the start and end of a move. This algorithm is adaptive, so when movement finishes, the changed scene will eventually be recognised as the new background. The speed of background updating can be configured. Any movement causes big changes relative to the original background, especially because a human hand is relatively big compared to chess pieces, and shadows plus lighting changes cause even more disturbance. When the disturbance exceeds a certain threshold we can assume the start of a move. Once it falls below another threshold we can assume the end of the move. Choosing a falling threshold significantly smaller than the rising threshold avoids or minimises oscillations. Once the move finish event is detected we can calculate the difference between the background before and after the move. Applying blurring and an adaptive threshold helps to get a more meaningful difference:
There is a lot of noise in the difference image because of the table moving, shadows changing and so on. The 6 squares with the most changes are shown in red. These squares are scored using various rules and matched against the legal moves in the position. To get the legal moves and end-of-game conditions, a simple chess engine is implemented. The scoring rules, for example, favour a square that has another disturbed square above it, because every chess piece viewed from this angle also obscures at least one square above it. Obviously this rule can't be applied to the topmost h file because it has no squares above it. If no match is found, it is assumed that either the user started a move and changed their mind, or it was just a disturbance caused by table motion or changing lighting conditions.
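Stripped of the extra scoring rules, the matching step could look like the sketch below. It is a heavy simplification rather than the real scoring: it keeps the 6 most changed squares and picks the legal move whose source and destination squares are both among them and have the highest combined change. The function name, the data layout and the numbers are all invented.

```python
def match_move(changes, legal_moves, top_n=6):
    """Pick the legal move best explained by the most changed squares.

    changes maps a square name to the amount of change detected in it.
    legal_moves is a list of (from_square, to_square) pairs.
    Returns the best matching pair, or None if nothing fits.
    """
    top = sorted(changes, key=changes.get, reverse=True)[:top_n]
    best, best_score = None, 0.0
    for src, dst in legal_moves:
        if src in top and dst in top:
            score = changes[src] + changes[dst]
            if score > best_score:
                best, best_score = (src, dst), score
    return best


# Made-up change levels after White plays e2-e4.
changes = {"e2": 0.9, "e4": 0.8, "e3": 0.4, "d2": 0.2,
           "f2": 0.15, "e5": 0.1, "a7": 0.05, "h1": 0.02}
legal = [("e2", "e3"), ("e2", "e4"), ("g1", "f3"), ("a2", "a3")]
print(match_move(changes, legal))
```

Returning `None` corresponds to the case where the disturbance is treated as an abandoned move or as noise.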
If no legal move matches, previous moves are matched as well. If the change matches a previous move played backwards, and moving back is not a legal move, it is assumed to be a "take back" move. The chess position and clocks are updated accordingly. Therefore it is currently only possible to take back moves whose reversal is not a legal forward move, for example pawn moves. In the future an additional voice command might be added to allow taking back any move.
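Going back to the start and end of move detection described at the beginning of this section, the two-threshold idea can be sketched in a few lines. The disturbance values here stand for something like the fraction of foreground pixels reported by the background subtractor; the function name, the sample values and both thresholds are assumptions chosen for illustration.

```python
def detect_move_events(levels, rising=0.2, falling=0.05):
    """Turn a sequence of disturbance levels into start/end events.

    A move starts when the level rises above `rising` and ends only when
    it drops below the much smaller `falling` threshold.
    """
    events = []
    moving = False
    for index, level in enumerate(levels):
        if not moving and level > rising:
            moving = True
            events.append(("start", index))
        elif moving and level < falling:
            moving = False
            events.append(("end", index))
    return events


# Made-up per-frame disturbance levels for one move.
levels = [0.01, 0.02, 0.30, 0.12, 0.25, 0.08, 0.03, 0.01]
print(detect_move_events(levels))
```

With a single threshold of 0.2 the same sequence would report a start at frame 2, an end at frame 3 and another start at frame 4, which is exactly the oscillation the second threshold is there to avoid.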