Time-Travel Debugging: Replaying Production Bugs Locally
36 points by tie-in 4 days ago | 7 comments

canucker2016 13 hours ago
What they're describing isn't time travel debugging - https://en.wikipedia.org/wiki/Time_travel_debugging.

I'd call this logging calls to your business logic layer, then running the logged calls against that layer in a development environment to debug the problem.
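
A rough sketch of what I mean, in Python. Everything here is made up for illustration (`apply_discount`, the `OPERATIONS` table, the JSON record layout), and it assumes the business logic is deterministic and takes JSON-serializable arguments:

```python
import json

# Hypothetical business-logic layer: plain functions, no UI code.
def apply_discount(price_cents, percent):
    return price_cents - (price_cents * percent) // 100

# Hypothetical registry mapping logged operation names to functions.
OPERATIONS = {"apply_discount": apply_discount}

def call_logged(log, name, **kwargs):
    """Run a business-logic call and append a JSON record of it to `log`."""
    result = OPERATIONS[name](**kwargs)
    log.append(json.dumps({"op": name, "args": kwargs, "result": result}))
    return result

def replay(log):
    """Re-run every logged call; return the records whose result differs."""
    mismatches = []
    for line in log:
        record = json.loads(line)
        actual = OPERATIONS[record["op"]](**record["args"])
        if actual != record["result"]:
            mismatches.append((record, actual))
    return mismatches

# "Production" side: the call is recorded as it happens.
production_log = []
call_logged(production_log, "apply_discount", price_cents=1999, percent=15)
```

In a dev environment you would load the log lines and call `replay(...)` under a debugger; a non-empty result means the local code no longer matches what production computed.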

Your business logic layer should be separate from your UI/presentation layer.

Makes it easy to test them separately if they're not tightly coupled.

Also, if you want to reuse your business logic layer in a different UI environment, it's easier to switch to another UI if they're not tightly coupled.

syntacticbs 17 hours ago
Interesting approach. I often get to the same goal by using the replicated state machine pattern, where all inputs to a system are recorded. Both methods seem to rely on designing your application in a very specific way so that you can replay inputs and deterministically get the same outputs.
Veserv 16 hours ago
In some senses, that is true of every scheme; you need to ensure you capture all non-determinism and that can be done with either more capturing or less non-determinism.

However, the restrictions for generic replay-based time-travel debugging are mostly just not using shared memory and, as a corollary, not using multiple threads in a process (multiple processes are okay). Deliberately architecting your system in the way described in the article is largely unnecessary: these generic schemes have low overhead, are much less work, apply to most codebases that could even attempt deliberate re-architecture, and integrate well with existing tooling and visualizers.

You can even relax these restrictions further to include explicit shared memory if you record those accesses. And you can do everything if you just record all accesses. The overhead of each of these schemes increases as the amount of recording needed to capture these forms of non-determinism increases.
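
To make "capture all non-determinism" concrete, here is a toy application-level sketch in Python. It is not how generic record/replay tools are implemented (those work below the application); the `Recorder`/`Replayer` classes and `make_order_id` are hypothetical names, and it assumes a single thread where the clock and the RNG are the only non-deterministic sources:

```python
import random
import time

class Recorder:
    """Passes calls through to the real source and records each result."""
    def __init__(self):
        self.trace = []

    def call(self, name, fn):
        value = fn()
        self.trace.append((name, value))
        return value

class Replayer:
    """Returns recorded results in order instead of calling the real source."""
    def __init__(self, trace):
        self._trace = iter(trace)

    def call(self, name, fn):
        recorded_name, value = next(self._trace)
        if recorded_name != name:
            raise RuntimeError(f"replay diverged: expected {recorded_name}, got {name}")
        return value

def make_order_id(env):
    # Every non-deterministic read goes through env.call, so it can be captured.
    now = env.call("time", time.time)
    nonce = env.call("random", lambda: random.randint(0, 9999))
    return f"{int(now)}-{nonce:04d}"

recorder = Recorder()
original = make_order_id(recorder)                   # record run
replayed = make_order_id(Replayer(recorder.trace))   # replay run, same result
```

The trade-off is visible even at this scale: every extra source of non-determinism is one more thing that has to go through the recorder, which is where the overhead comes from.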

igorw 14 hours ago
Another approach is to record at a lower level and then reconstruct the series of events, e.g. https://engineering.fb.com/2021/04/27/developer-tools/revers...
F-W-M 14 hours ago
How do you structure your program to do this?

I had huge success writing a trading system where everything went through the same `on_event(Inputs) -> Outputs` function of the core and a thin shell was translating everything to inputs and the outputs to actions. I actually had a handful of these components communicating via message passing.

This worked rather well as most of the input is async messages anyway, but building anything else this way feels very tiresome.
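
For anyone wondering what the `on_event(Inputs) -> Outputs` shape looks like, here is a stripped-down Python sketch. It is not the actual trading system: `PriceTick`, `PlaceOrder`, the `buy_below` rule and `run_shell` are all invented for the example, and it assumes the core does no I/O of its own:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PriceTick:      # hypothetical input message
    symbol: str
    price: float

@dataclass(frozen=True)
class PlaceOrder:     # hypothetical output action
    symbol: str
    side: str
    price: float

@dataclass
class Core:
    """All decisions happen here; no I/O, no clock, no network."""
    buy_below: float
    holding: bool = False

    def on_event(self, event):
        outputs = []
        if isinstance(event, PriceTick) and not self.holding and event.price < self.buy_below:
            self.holding = True
            outputs.append(PlaceOrder(event.symbol, "buy", event.price))
        return outputs

def run_shell(core, incoming, send):
    """Thin shell: feed inputs to the core and turn outputs into actions."""
    journal = []
    for event in incoming:
        journal.append(event)   # recording the inputs is enough to replay later
        for action in core.on_event(event):
            send(action)
    return journal

sent = []
ticks = [PriceTick("ABC", 101.0), PriceTick("ABC", 99.5), PriceTick("ABC", 98.0)]
journal = run_shell(Core(buy_below=100.0), ticks, sent.append)
```

Feeding `journal` into a fresh `Core` with a `send` that just collects actions reproduces the same outputs, which is the whole point of routing everything through one function.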

vimda 13 hours ago
OP is basically describing functional programming
PeterWhittaker 10 hours ago
Interesting. I built a sort-of-similar system for executing a series of linked, serially-dependent system commands from the TUI used to manage some of our secure appliances. It made writing and debugging such sequences much easier: each command was represented by a struct containing optional pre, post-success, and post-fail log and status line messages, where status could simply default to log.

It meant that the user received meaningful short updates as things progressed, with detailed information in system logs.

This made it much easier for testers and users to report bugs and for developers to understand what to look for in logs.
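
Roughly the shape of it, sketched in Python rather than the original code. The names (`Message`, `Step`, `run_steps`) are made up for this post, and it assumes a step counts as failed when the command exits non-zero:

```python
import subprocess
from dataclasses import dataclass
from typing import Optional

@dataclass
class Message:
    log: str
    status: Optional[str] = None   # status line text; falls back to the log text

    def status_text(self):
        return self.status if self.status is not None else self.log

@dataclass
class Step:
    argv: list
    pre: Optional[Message] = None
    post_success: Optional[Message] = None
    post_fail: Optional[Message] = None

def run_steps(steps, log, show_status, run=subprocess.run):
    """Run steps in order; stop at the first failure. Returns True if all succeeded."""
    def emit(message):
        if message is not None:
            log(message.log)
            show_status(message.status_text())

    for step in steps:
        emit(step.pre)
        ok = run(step.argv).returncode == 0
        emit(step.post_success if ok else step.post_fail)
        if not ok:
            return False   # later steps depend on this one, so do not continue
    return True
```

The `run` parameter is only there so the sequence can be exercised in tests with a fake runner instead of real system commands.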
