Anti-scraping people are tiring.

Federation duplicates data, and the way Mastodon is built, that duplication effectively backs the data up for perpetual availability.

Scraping and archiving are just another side of the same coin.

You opt into having a permanent record of your digital activity when you start posting online.

@jonn Wait, I saw something the other day that said that Masto data is ephemeral. Is it permanent or not?

(I *hope* it's permanent)

@nafnlaus one query is better than a thousand words.

Stuff we interact with gets permanently cached on our instances.

Maybe big instances work differently, but it's a matter of coping with scale rather than a design choice.

@jonn Hey, while I have you here, is it possible for you to check - how much of the stored data is text vs. images vs. video?

(I'm also curious about the distribution of images served, e.g. whether the vast majority of requests at any given time hit a small subset that could cache well, but that's more complicated to investigate, so I won't bug you with it.)

This relates to this issue: github.com/mastodon/mastodon/i

@nafnlaus oh fuck, I have to increase the size of the instance or purge media files somehow. 🤔

@nafnlaus the instance has been up since autumn 2021, so for roughly a year.

@jonn It's not clear to me - what's the ratio between text, images and video?

The issue in question covers various things that can be done to increase image quality (which we're getting complaints about) while also decreasing image storage size (which is always an issue). But some of the possibilities have complications, such as dealing with legacy clients.


@nafnlaus ok, assuming all the videos are converted to mp4, the orders of magnitude are:

Text: hundreds of megabytes (500 MB), Videos: thousands of megabytes (5 GB), Images: tens of thousands of megabytes (25 GB).
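
For anyone who wants the ratio spelled out, here is a minimal sketch that turns the rough figures above into shares of total storage. The numbers are the approximate ones quoted in this thread, not measurements; the name `storage_shares` and the choice of 1 GB = 1000 MB are assumptions made for illustration.

```python
# Approximate sizes quoted in the thread, in megabytes.
# Assumption: 1 GB is treated as 1000 MB; these are order-of-magnitude figures.
sizes_mb = {"text": 500, "video": 5_000, "images": 25_000}


def storage_shares(sizes):
    """Return each category's fraction of the total stored size."""
    total = sum(sizes.values())
    return {kind: size / total for kind, size in sizes.items()}


shares = storage_shares(sizes_mb)

# Print the categories from largest to smallest share.
for kind, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{kind:>6}: {share:.1%}")
```

With these inputs the text : video : images ratio is about 1 : 10 : 50, so images come to roughly four fifths of the total.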


@jonn Okay, very interesting! So images ARE the real culprit that needs to be fixed!

Doma Social

Mastodon server of https://doma.dev.