Posted

Twitter, that brilliant marriage of self-absorption and short attention spans, is now valued at $1 billion, quadrupled since February, despite having a revenue of exactly $0 and no (public) plan to raise that number. There's no possible way this isn't a sound investment. (Also: Is it 1999?)

Posted

The idea that politicians are furiously twittering away trying to reach out to their constituents and appear 'connected' when, I think, most people find Twitter to be a ridiculous, faddish waste of time ...strikes me as funny.

Posted
Twitter, that brilliant marriage of self-absorption and short attention spans, is now valued at $1 billion, quadrupled since February, despite having a revenue of exactly $0 and no (public) plan to raise that number. There's no possible way this isn't a sound investment. (Also: Is it 1999?)

 

No profit via adverts?

Posted
I wonder how profitable Twitter spam would be :P

 

At least it wouldn't be over 140 characters.

 

Seriously, the only thing I can think of use-wise is some sort of data mining, but I doubt it would be of too much use. It may have potential - which seems to be enough for virtual companies again. There is some sort of value in having very low overhead while actively engaging millions of people (6 million users, last I checked). Not sure if that's worth a billion dollars, but perhaps no one who does these assessments understands either. With all the stories of companies turning down opportunities to purchase companies at 1/10th their later value (especially if we forget the crash :D), I guess "unknown potential" is still a commodity.

Posted

Slightly longer answer: their architecture is appalling.

 

I am curious about this because their concept is so simple, it sounds like a real special kind of fail to pull that off - can you elaborate?

Posted
I am curious about this because their concept is so simple, it sounds like a real special kind of fail to pull that off - can you elaborate?

 

Yes, the concept is simple: Twitter provides what's called a pub/sub system (i.e. publish/subscribe). This model is prevalent throughout the Internet: mailing lists, IRC, RSS, etc. all implement pub/sub systems.

 

Twitter is effectively backed by a content management system: subscribers get incoming tweets added to their timeline, so when a message goes out, it's delivered to all subscribers and written out to their respective timelines (e.g. in a database somewhere).
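
To make the fan-out idea concrete, here is a minimal sketch of "deliver to all subscribers and write to their timelines". It is an in-memory toy, not Twitter's actual code: the class name TimelineStore and the methods follow and publish are made up for illustration, and the dictionaries stand in for "a database somewhere".

```python
from collections import defaultdict


class TimelineStore:
    """Hypothetical in-memory stand-in for the timeline database."""

    def __init__(self):
        self.followers = defaultdict(set)   # author -> set of subscribers
        self.timelines = defaultdict(list)  # user -> list of (author, text)

    def follow(self, subscriber, author):
        """Subscribe `subscriber` to everything `author` publishes."""
        self.followers[author].add(subscriber)

    def publish(self, author, text):
        """Write one message to every subscriber's timeline.

        Returns the number of timelines written, which is the cost of a
        single publish: it grows with the author's follower count.
        """
        subscribers = self.followers[author]
        for subscriber in subscribers:
            self.timelines[subscriber].append((author, text))
        return len(subscribers)
```

One publish turns into one write per follower, which is why an author with a very large following is expensive in this model.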

 

The way Twitter chose to implement all of this is, well, fairly poorly thought out: Twitter stores the state of processing a tweet incrementally throughout the system. That means if any part of the processing chain breaks down, they lose tweets! It also means that if any of these systems breaks, they have to get that particular system back online and have it replay its disk log to recover those tweets. And worse: everything is written to disk every step of the way, further adding to the latency of the system.

 

A better approach is to create a system which is stateless, meaning that messages "in flight" are transient and if they are lost the system can recover, and a system which is idempotent, meaning that if a message is accidentally processed more than once, the repeat has no additional effect on the system.
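
As a rough illustration of those two properties, here is a small sketch. It assumes every tweet carries a unique ID and that the original tweets are kept in one durable place; the names Timeline, deliver and replay are hypothetical and not taken from any real system.

```python
class Timeline:
    """Hypothetical timeline keyed by tweet ID, so duplicates are harmless."""

    def __init__(self):
        self.entries = {}  # tweet_id -> text

    def deliver(self, tweet_id, text):
        """Idempotent write: True if the tweet was new, False if a repeat."""
        if tweet_id in self.entries:
            return False  # already delivered; processing it again changes nothing
        self.entries[tweet_id] = text
        return True


def replay(source_tweets, timeline):
    """Re-deliver everything from the durable source after a failure.

    Nothing "in flight" needs to be recovered from intermediate queues:
    lost messages are simply sent again, and tweets that did arrive are
    skipped because deliver() is idempotent. Returns the number of
    tweets that were actually missing.
    """
    return sum(timeline.deliver(tweet_id, text) for tweet_id, text in source_tweets)
```

With this shape, the queues in the middle can lose messages without losing tweets, because the only state that matters lives at the source and at the destination.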

 

Compounding all of this is that Twitter decided to write pretty much all of the pieces of their system from scratch. While there are many robust message queues and message queuing protocols, some designed specifically around short messages (e.g. STOMP), Twitter decided to write their own message queue, in Ruby, and use the memcache protocol as their queueing protocol, something it was never designed for.

 

Twitter leveraged none of the existing technology out there for doing what they're doing and they've been paying for it ever since.

Posted
Yes, the concept is simple: Twitter provides what's called a pub/sub system (i.e. publish/subscribe). This model is prevalent throughout the Internet: mailing lists, IRC, RSS, etc. all implement pub/sub systems.

....

 

Wow, I was curious and that does not disappoint. Are you saying that they actually track the state of a single "tweet" as it propagates through the entire list of subscribers sequentially, and only complete the transaction after it has been verified as closed by every end point destination?

 

Does this mean that a faulty client could cause server side lags in delivery to other clients?

 

I imagine the "entire list of subscribers" to a single tweet at least leverages various "push ahead" servers so that subscribers don't all reside in a single database (hub/spoke/hub/spoke/etc), so that you just have secondary servers each handling the subscribers it's responsible for iterating (yuck) through. Considering that there are a million and one flavors of reliable UDP protocols out there, it sounds odd to go that way. It's at least not TCP based, is it?

 

Honestly it sounds like a Reliable UDP would be perfect, though I've only written one RUDP server and it didn't require scaling so I am no expert.

Posted
Wow, I was curious and that does not disappoint. Are you saying that they actually track the state of a single "tweet" as it propagates through the entire list of subscribers

 

Yes

 

sequentially

 

I assume they have homebrewed their own pub/sub mechanism. I don't know how they handle delivery to the entire subscriber list. My guess would be that users are "sharded", and that the pub/sub mechanism looks up what users are on particular shards, delivers a message to a particular shard, and allows that shard to handle writing out the particular database records, but that's just a guess. It's not a sequential process I hope, but it is still one that is stateful end-to-end.
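
Purely to illustrate that guess (it says nothing about how Twitter really does it), sharded delivery could look something like the sketch below. The shard count, the CRC32-based placement and the function names are all assumptions picked for the example.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen only for this example


def shard_for(user):
    """Map a user name to a shard with a stable hash (CRC32 here)."""
    return zlib.crc32(user.encode("utf-8")) % NUM_SHARDS


def group_by_shard(followers):
    """Split a follower list into one batch per shard.

    Each batch could then be sent to its shard as a single message, and
    the shard would write the timeline records for its own users.
    """
    batches = defaultdict(list)
    for user in followers:
        batches[shard_for(user)].append(user)
    return dict(batches)
```

The publisher then sends at most NUM_SHARDS messages per tweet instead of one per follower, and the shards can do their writes in parallel rather than sequentially.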

 

A better architecture would allow all message queues to die and lose all their data from the point a tweet is initiated, and the system could safely recover.

 

and only complete the transaction after it has been verified as closed by every end point destination?

 

Verification happens every step of the way. Every process hands the tweet off to the next process, through their "dumb" Kestrel (previously Starling) asynchronous message queues. Many, many queues exist which implement the pub/sub mechanism natively, and more to the point people have built distributed architectures which implement the Twitter-style pub/sub model without being stateful or requiring complex interfacing between disparate systems over a mechanism like the memcache protocol. Twitter could be done much, much better, given a proper architecture and a proper set of tools. To get to the point: Twitter could seriously use Erlang, a language designed for building these sorts of systems.

 

Does this mean that a faulty client could cause server side lags in delivery to other clients?

 

Not even a faulty or malicious client. Anyone with a large number of followers strains Twitter's system whenever they send a tweet.

 

I imagine the "entire list of subscribers" to a single tweet at least leverages various "push ahead" servers so that subscribers don't all reside in a single database (hub/spoke/hub/spoke/etc), so that you just have secondary servers each handling the subscribers it's responsible for iterating (yuck) through. Considering that there are a million and one flavors of reliable UDP protocols out there, it sounds odd to go that way. It's at least not TCP based, is it?

 

Yes, it's based on TCP, which isn't necessarily a bad idea, especially on a closed internal network. Twitter's present architecture is bound by disk write speeds, since they log everything to disk every step of the way. It's truly a stupid architecture.
