So I played with it a bit to see if I could find any holes. I first found a few bugs that are not exploitable on Google Sites and reported those directly to the Google Caja team. These bugs are not yet fixed, so I won't write about them at this time. However, while trying to exploit one of those bugs on Google Sites, I discovered another issue there related to the parsing of user-supplied HTML. This issue could be used to cause a stored XSS on sites.google.com.
In order to understand the issue, let's first look at how Google Sites handled some of the user-supplied HTML input.
Let's say that we entered something like this:
<noembed><![CDATA[ <script>alert(document.cookie)</script> ]]></noembed>
<noembed><![CDATA[ </noembed><script>alert(document.cookie)</script> ]]></noembed>
The parsing would fail. This is again the correct behavior, because browsers would interpret the first occurrence of </noembed> as the closing tag even though it sits inside a CDATA section. Thus, if something like that passed through unchanged, the script would get executed. The actual problem stems from having multiple CDATA sections in a single noembed tag (or in other tags whose content is interpreted literally, HTML special characters included). So for example
Considering everything written so far, it shouldn't be hard to combine it into a working exploit:
<noembed><![CDATA[ <]]><![CDATA[/noembed><script>alert(document.cookie)</script> ]]></noembed>
When parsing the HTML code above, the two CDATA blocks would get merged and, in doing so, a new closing </noembed> tag would be formed: the first block ends with "<" and the second begins with "/noembed>". Thus, the noembed tag would get closed earlier than expected, and the content of the script tag would get executed. This is shown in the image below.
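To make the merge step concrete, here is a minimal Python sketch. It is only a model of the behavior described above, not the actual Google Sites or Caja code: the names `merge_cdata` and `cdata_contains_close_tag` are hypothetical, and the per-block check is my assumption about what a filter that rejects the earlier single-CDATA input might look like.

```python
import re

# Matches one CDATA section and captures its raw content.
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def cdata_contains_close_tag(markup: str, tag: str = "noembed") -> bool:
    """Hypothetical filter: inspect each CDATA section on its own."""
    needle = f"</{tag}>"
    return any(needle in body.lower() for body in CDATA_RE.findall(markup))


def merge_cdata(markup: str) -> str:
    """Hypothetical merge step: drop the CDATA wrappers, keep the content.

    Adjacent sections end up concatenated, which is what lets a tag be
    split across two of them.
    """
    return CDATA_RE.sub(lambda m: m.group(1), markup)


payload = (
    "<noembed><![CDATA[ <]]><![CDATA[/noembed>"
    "<script>alert(document.cookie)</script> ]]></noembed>"
)

# Neither CDATA section contains "</noembed>" by itself...
print(cdata_contains_close_tag(payload))

# ...but the merged output does, and it appears before the <script> tag.
merged = merge_cdata(payload)
print(merged)
```

The point of the sketch is that a check performed per CDATA section sees only " <" and "/noembed>..." separately, while the browser sees the concatenated result.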
This issue was quickly resolved by the Google security team, and HTML special characters are now escaped even in noembed and similar tags. Thanks!
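As a rough illustration of why escaping closes the hole, the sketch below applies Python's standard `html.escape` to the kind of content that results from merging the two CDATA blocks. This is an assumption-laden example of the general idea, not Google's actual fix; `escape_raw_text` is a hypothetical helper name.

```python
import html


def escape_raw_text(content: str) -> str:
    """Escape &, < and > so the content can no longer form a tag."""
    return html.escape(content, quote=False)


# Content as it would look after the two CDATA blocks are merged.
merged_content = " </noembed><script>alert(document.cookie)</script> "

safe = escape_raw_text(merged_content)
print(safe)
```

Once "<" and ">" are replaced with entities, no closing </noembed> tag can be assembled from the pieces, regardless of how the CDATA blocks are split.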
PS: If you thought that my previous post about PRNG predictability in browsers was related to Google, I'll have to disappoint you - you'll have to wait a bit longer to find out just how I used that :-)