LoGeR – 3D reconstruction from extremely long videos (DeepMind, UC Berkeley)
124 points by helloplanets 13 hours ago | 26 comments

tmilard 11 hours ago
Very interesting paper. I can see Street View using it to perfect the 3D analysis of the photo-video they capture with their Google cars. What a wonderful time we are living in! Specifically in video-to-3D reconstruction. Every month, a new brick is put in place. Super
reply
wumms 10 hours ago
Street View cars added Velodyne LiDAR around 2017 [0][1], but it's optional. I found no data on the 'LiDAR vs. image-only' percentage.

[0] https://arstechnica.com/gadgets/2017/09/googles-street-view-...

[1] https://en.wikipedia.org/wiki/Google_Street_View

reply
priowise 3 hours ago
Very interesting direction. One thing I'm curious about with extremely long videos is how you handle drift as it accumulates. Do you periodically re-anchor the reconstruction, or rely purely on accumulated frame consistency?
reply
quadrature 2 hours ago
In a traditional SLAM pipeline you do periodically fix drift by detecting when you've revisited an area you've already mapped; this lets you align your submaps so they are globally consistent.

In an area you have visited previously you have two estimates of your position: one from your frame-to-frame estimates, and another from the map you built of the area the first time. You can then solve an optimization problem to bring those two estimates closer together.

To find out whether you've already visited an area, you store descriptions of the locations in a DB and search through them. The paper says they use a compressed representation of the "maps" and use test-time training to optimize the global consistency between their submaps.
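The drift-correction idea above can be sketched as a tiny pose-graph least-squares problem. This is a deliberately minimal illustration of the classical SLAM technique (positions only, no rotations, plain NumPy), not the paper's actual method; `optimize_poses` and its parameters are hypothetical names:

```python
import numpy as np

def optimize_poses(odom, loop_closures, w_loop=100.0):
    """Least-squares pose-graph sketch (positions only, no rotations).

    odom: list of (dx, dy) frame-to-frame motion estimates (drifty).
    loop_closures: list of (i, j, (dx, dy)) saying pose j should sit at
    pose i plus the given relative offset (from place recognition).
    Returns optimized (N, 2) positions with pose 0 anchored at the origin.
    """
    n = len(odom) + 1  # number of poses
    rows, rhs, weights = [], [], []
    # Odometry constraints: p[k+1] - p[k] = odom[k], weight 1.
    for k, d in enumerate(odom):
        r = np.zeros(n); r[k] = -1.0; r[k + 1] = 1.0
        rows.append(r); rhs.append(d); weights.append(1.0)
    # Loop-closure constraints: p[j] - p[i] = offset, trusted more.
    for i, j, d in loop_closures:
        r = np.zeros(n); r[i] = -1.0; r[j] = 1.0
        rows.append(r); rhs.append(d); weights.append(w_loop)
    # Anchor pose 0 at the origin so the problem is well-posed.
    r = np.zeros(n); r[0] = 1.0
    rows.append(r); rhs.append((0.0, 0.0)); weights.append(w_loop)
    A = np.array(rows) * np.array(weights)[:, None]
    b = np.array(rhs) * np.array(weights)[:, None]
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    return sol  # shape (n, 2)
```

With four odometry steps around a unit square where the last leg drifts, a single loop-closure constraint saying "pose 4 is back at pose 0" pulls the end of the trajectory onto the start and spreads the accumulated error over all the edges, which is exactly the "bring the two estimates closer together" step described above.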

reply
IshKebab 11 hours ago
Very cool. Doesn't seem like they've actually released the code:

> This is a reimplementation of LoGeR; complete code and models will be released upon approval.

I don't understand why it's a reimplementation either?

I would guess it's "research" code anyway so not really usable unless you are an expert.

reply
_fw 10 hours ago
This is like something straight out of Cyberpunk 2077 - the braindances investigation scenes.
reply
Karliss 9 hours ago
More like the opposite. Point cloud data captured by various means has existed for a long time, with the raw data visualized more or less just like this. And sci-fi movies/games use the raw-visualization effect as something futuristic/computer-tech looking, just like wireframes on a black background, although that one is getting partially downgraded to retro sci-fi status since drawing a 3D wireframe isn't hard anymore. It started when any 3D computer graphics, even basic wireframe, was futuristic and not every movie could afford it, with some of them faking it by analog means.

Any good sci-fi author takes inspiration from real-world technology and extrapolates from it, often before the technology is widely recognized by the general population. Once something reaches the state of a consumer product, beyond just researchers and trained professionals, the visuals tend to get more polished and you lose some of the raw, purely functional, engineering style.
reply
realberkeaslan 9 hours ago
It reminds me of that as well.
reply
raphaelmolly8 2 hours ago
[dead]
reply
msuniverse2026 12 hours ago
Truly don't understand what is happening in the heads of these researchers. Can't they see how the main use of this is going to be mass surveillance?
reply
KeplerBoy 11 hours ago
This seems to be much more robotics / autonomous vehicle focused? I don't quite see what mass surveillance angle you get from this that you don't already get from cheap ubiquitous cameras, basic computer vision, and networking (aka Flock).
reply
haritha-j 11 hours ago
I think you've made the erroneous assumption that the researchers care. I work in 3D reconstruction and I've not really seen too many people care about the actual use case, and indeed have had some friends join defence.
reply
KaiserPro 5 hours ago
This bit isn't that surveillance-y.

Relocalisation is the bit that's surveillance-y. But it's also crucial for accurate visual-only navigation.

reply
endymion-light 9 hours ago
I mean, I think if you want to perform mass surveillance, you can do it far cheaper and more efficiently via facial recognition, mobile phone surveillance, and a variety of other methods.

If you want reconstruction and training of robotic movement, this is far more appropriate. I believe we're going to see robots being able to "dream", in the sense of analysing historical video of spaces to improve movement and navigation.

So not mass surveillance, but probably there's a future of mass subjugation using robot enforcement.

reply
imtringued 10 hours ago
I'm not sure what you mean. The input video feed already constitutes "surveillance". You'd need cameras everywhere and if you have a camera, you can also just use regular models like China already does.
reply
Dead_Lemon 11 hours ago
What is the actual objective of this? Is it solving an issue, or creating a solution to a problem that is still to be determined? It seems like a lot of energy to replicate a LiDAR mapping system. It's not like you can expect accurate dimensions from this approximate guesswork, and the expected hallucinations only add to the inaccuracy.
reply
alpine01 9 hours ago
3D reconstruction of old spaces which no longer exist seems like a clear use case to me. There's loads of old videos of driving down a street in the 80s, or neighborhoods in cities which got replaced.

I can imagine future iterations of this which bring together other stills of the same space at that time to augment the dataset. Then perhaps another pass to fill in gaps with likely missing content based on probability or data from say the same street 10 years later.

It won't be 100% real, but I think it'd be very cool to be able to have a google-street view style experience of areas before google street view existed.

reply
phrotoma 8 hours ago
> it'd be very cool to be able to have a google-street view style experience of areas before google street view existed.

Now do Kowloon Walled City.

reply
voidUpdate 10 hours ago
Video cameras are much cheaper and easier to use than LIDAR, like anyone can just pull out their phone, take a video and send it to this algorithm to get a reasonable point cloud of the environment. Sure, if you want an exact model of an environment and you have the time and money, LIDAR would give better results, but this is about doing more with less
reply
KaiserPro 5 hours ago
One of the key issues of "machine perception" is the inability of machines using standard image sensors to re-create the world accurately.

Lidars are great, and getting smaller, but they still eat a lot of power. (The Quest 3 had a lidar on the front [well, structured light] and it was mostly not used.)

For machines to understand the 3D world, first they need to extract geometry, then isolate those geometries into objects. This method is _a_ way to do the first step, extracting 3D points.

The problem with this model is that the points are not actually that well aligned frame to frame. This is why it looks a bit blurry. I assume this is to avoid running out of memory, since you're not quite sure which points are relevant and need to be kept in memory.

Once you have those points, you need to replace them with simplified geometry, so that you can work out intersections and junk.
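A common first move from raw, redundant points toward simplified geometry is a voxel-grid downsample: merge every point that falls in the same grid cell into one centroid. This is a generic sketch of that standard technique in plain NumPy (nothing from the paper; `voxel_downsample` is a hypothetical name):

```python
import numpy as np

def voxel_downsample(points, voxel=0.1):
    """Collapse a point cloud onto a voxel grid.

    Every point falling in the same (voxel x voxel x voxel) cell is
    replaced by the centroid of that cell, shrinking memory use and
    averaging out frame-to-frame misalignment noise.
    """
    # Integer grid coordinates of each point.
    keys = np.floor(points / voxel).astype(np.int64)
    # Group points by their voxel key.
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    n_voxels = inverse.max() + 1
    sums = np.zeros((n_voxels, points.shape[1]))
    counts = np.zeros(n_voxels)
    np.add.at(sums, inverse, points)   # sum points per voxel
    np.add.at(counts, inverse, 1.0)    # count points per voxel
    return sums / counts[:, None]      # one centroid per occupied voxel
```

Real pipelines go further (normal estimation, plane/mesh fitting on the reduced cloud), but the cell size here is the same accuracy-versus-memory knob the comment alludes to.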

reply
washadjeffmad 6 hours ago
We use drones with RGB cameras for photogrammetry to reconstruct 3D environments with gaussian splatting, which is a manual process and often requires making multiple trips for additional capture to fill in gaps. Because it's for perceptual use and doesn't require high accuracy, automating with a single-take video would be useful.
reply
ekjhgkejhgk 7 hours ago
The actual objective is learning about these systems. It's called research.
reply
_diyar 7 hours ago
You can reconstruct accurate dimensions if you have IMU data.
reply
flipbrad 10 hours ago
N00b question from me, perhaps, but how easy is it to mount and run Lidar on aerial drones?
reply
petargyurov 10 hours ago
It's easy but it's not cheap. Well, price is relative but capturing video is certainly cheaper.

Also, I am not sure how heavy LiDAR units are, but remember that the heavier the payload, the more the flight time is reduced. Some drones can only carry a single payload, so if you also want to capture (high-res) video/images you need to fly again.

It all depends on the use-case.

reply
Daub 10 hours ago
The most available lidar is the one on your iPhone, but the results are orders of magnitude less detailed than those derived from photogrammetry. However, one advantage is that lidar is not confused by reflections.
reply
taneq 9 hours ago
Huh? LIDAR absolutely is confused by reflections. Not always the reflections you can see (because often it’s using IR wavelengths) but nonetheless, reflections.
reply