Another follow-up to Google and apologies to my regular visitors

So I’ve jam-packed a great deal of work into the past few days in my quest to find the problem with my website’s Google indexing.

I have made progress thanks to Matt Cutts, Brent, Chris, Logician, Dylan, g1smd, and many others who stood by and cheered me on, filling my SEO-challenged, fabulous-fifty brain with helpful tidbits.

I don’t know if I have a complete solution yet, but I feel much closer to the end of the problem than before. So this is good news!

What I discovered on my end:

  1. My thread titles needed to be tidied up.
  2. My .htaccess needs to be fixed to make sure my domain resolves at one canonical address and not at alternate versions or other assorted similar URLs. Indexing both constitutes duplicate content. It wasn’t something I did intentionally; in fact, I thought I had resolved the issue a year or more ago when our system admin created a 301 rewrite in .htaccess for the domain. But apparently it also needs to be applied to the forum directory.
  3. My meta descriptions were pulling the same description from the built-in templates. I fixed them so each one is unique to its thread.
  4. I double checked my robots.txt and found something more to add.
  5. I resubmitted my sitemap, removing more content from the config level.
  6. More memory has been allotted to PHP so we don’t serve blank white pages when Googlebot arrives on top of our already busy traffic.
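
Item 2 is a canonical-host redirect: every variant of the domain should answer with a 301 pointing at one chosen address, for forum URLs as well as the home page. The real rule belongs in .htaccess; this Python sketch only illustrates the mapping that rule has to perform. The hostname www.example.com is a placeholder for whichever form of the domain is chosen as canonical, /forums/ is a hypothetical forum path, and canonical_redirect is a made-up helper name.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Placeholder: replace with whichever hostname was chosen as canonical.
CANONICAL_HOST = "www.example.com"


def canonical_redirect(url: str) -> Optional[str]:
    """Return the 301 target for a non-canonical URL, or None if no redirect is needed."""
    parts = urlsplit(url)
    if parts.hostname == CANONICAL_HOST:  # .hostname is already lowercased
        return None
    # Swap only the host; path and query survive, so a thread URL under the
    # (hypothetical) /forums/ directory lands on the same thread on the canonical host.
    return urlunsplit(
        (parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )


print(canonical_redirect("http://example.com/forums/showthread.php?t=123"))
print(canonical_redirect("http://www.example.com/"))
```

Because the host is swapped for every path rather than for the home page only, URLs inside the forum directory are redirected too, which is the gap item 2 describes.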
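
Item 3 comes down to two steps: build each thread’s description from something unique to that thread (later in the comments, the thread title is what gets inserted), then check that no two pages share a description. A minimal Python sketch; SITE_SUFFIX, the sample thread titles, and both function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set

# Hypothetical site-wide text that the forum templates used to put on every page.
SITE_SUFFIX = "Hysterectomy support discussions at HysterSisters."


def thread_description(thread_title: str) -> str:
    """Prefix the shared template text with the thread title so each page differs."""
    return f"{thread_title.strip()}: {SITE_SUFFIX}"


def duplicate_descriptions(descriptions: Iterable[str]) -> Set[str]:
    """Return every description that appears on more than one page."""
    counts = Counter(descriptions)
    return {text for text, n in counts.items() if n > 1}


old_pages = [SITE_SUFFIX, SITE_SUFFIX, SITE_SUFFIX]
new_pages = [
    thread_description(t)
    for t in ("Day 3 post-op", "Questions for my surgeon", "Back to work")
]
print(len(duplicate_descriptions(old_pages)))  # 1: one description shared by all pages
print(len(duplicate_descriptions(new_pages)))  # 0: every page is now distinct
```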
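
For item 4, the comments below recommend disallowing showpost.php and calendar.php. Python’s standard urllib.robotparser module can sanity-check that a set of rules blocks exactly the URLs intended before the file goes live. The /forums/ path and example.com host are placeholders, and robotparser does simple prefix matching, so treat this as a rough check rather than a replica of how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "/forums/" stands in for the real forum directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forums/showpost.php
Disallow: /forums/calendar.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "http://www.example.com/forums/showthread.php?t=123",  # main thread page: keep crawlable
    "http://www.example.com/forums/showpost.php?p=456",  # single-post duplicate
    "http://www.example.com/forums/calendar.php",  # low-value calendar page
):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", verdict)
```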

My main question, which sent me on this tizzified quest: Why isn’t Googlebot visiting my site? Was I penalized? Where have all the indexed pages gone? Long time passing. Where have all the indexed pages gone? Long time ago. (Apologies to Joan Baez.)

We created a page to log Googlebot: the threads it visits and the time of each visit. I figured it would be a place to hang out and watch for activity, since I wasn’t seeing Googlebot on “who’s online”.
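
A page like this can be approximated from an ordinary access log by keeping only Googlebot requests for thread pages after a start date. A minimal Python sketch, assuming a hypothetical tab-separated log format (timestamp, user agent, request path) and a hypothetical /forums/ path; showthread.php is the thread script named in the comments. Matching on the user-agent string alone can be fooled by impostors, which is why Google recommends confirming genuine Googlebot traffic with a reverse DNS lookup.

```python
from datetime import datetime
from typing import Iterable, Set

# Hypothetical log format: "YYYY-MM-DD HH:MM:SS<TAB>user agent<TAB>request path"
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def googlebot_threads(lines: Iterable[str], since: datetime) -> Set[str]:
    """Collect the distinct thread URLs Googlebot requested on or after `since`."""
    threads = set()
    for line in lines:
        stamp, agent, path = line.rstrip("\n").split("\t")
        visited = datetime.strptime(stamp, TIME_FORMAT)
        if visited >= since and "Googlebot" in agent and "showthread.php" in path:
            threads.add(path)
    return threads


# Made-up sample entries for illustration only.
sample_log = [
    "2007-11-30 23:59:00\tMozilla/5.0 (compatible; Googlebot/2.1)\t/forums/showthread.php?t=1",
    "2007-12-01 08:15:00\tMozilla/5.0 (compatible; Googlebot/2.1)\t/forums/showthread.php?t=2",
    "2007-12-01 09:00:00\tMozilla/5.0 (Windows; U)\t/forums/showthread.php?t=3",
    "2007-12-02 10:30:00\tMozilla/5.0 (compatible; Googlebot/2.1)\t/forums/showthread.php?t=2",
    "2007-12-03 14:45:00\tMozilla/5.0 (compatible; Googlebot/2.1)\t/forums/showthread.php?t=4",
]
found = googlebot_threads(sample_log, since=datetime(2007, 12, 1))
print(f"Since Dec. 1, 2007, Googlebot visited {len(found)} threads")
```

The set counts each thread once, so the repeat visit to t=2 is not double-counted; whether the original log page counted distinct threads or total visits isn’t stated.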

The page was created. A miracle occurred! The bot of Google was in the house!

We timestamped the log starting Dec. 1. Looking at my log page, I see a list of threads and this message at the top:

Since Dec.1.2007 Google bot visited 25,417 threads!

Ah ha! Relief! The answer to my biggest mystery: I was asking the wrong question. The question isn’t “Why hasn’t the bot been visiting?” but rather “Why isn’t there evidence of Googlebot’s work in the site: search results?”

Are my threads in Google’s index or not?

A simple site: search returns 13,000 threads.

But. BUT. BUT!!!!

If I do an advanced search, looking for keyword: “hystersisters com” in domain “” I find 415,000 threads indexed!

Is it possible I’ve been pulling out my hair over nothing more than a broken function on Google? Or not. Based on my traffic stats, it could be that some pages are missing from the SERPs, because that is where it all started: searching for specific discussions I knew should be in the search index.

Dear Google-sweetie,
When you send my AdSense gift for the year, will you send something wonderful? My favorite gift was the mood/light/radio from 2004. The USB kit wasn’t used much because my Mac OS X wasn’t compatible with some of it, but it was so nifty I took it for “show and tell” at the family Christmas in 2005. I was deemed “cool” by the teen geeks in the crowd. Last year’s 2006 digital frame is still in the box because, sadly, my Mac OS X couldn’t find a way to connect to it.

So this year I have a request. Please send a fuzzy hat. Put a Google logo on it. I’ll use it to cover my bald head from all the hair pulling I did this week making sure you hadn’t broken up with me.

Sincerely yours,



  1. I think you need to ask Matt how the canonical search for indexed pages should be formatted.

    The kind of search you do makes a huge difference. First, note that 940,000+ pages show as indexed when searching Yahoo! or AltaVista. At Google I tried some common tricks (626k is the result from the best search I know):

    642,000 com
    626,000 -zz
    500,000 -zzzz
    407,000 “”
    404,000 -vcjmghweuz1
    329,000 inurl:com
    98,700 site: com
    98,800 site: “”
    94,300 site: inurl:com
    97,000 site:

    “Indexed pages in your site – uses the site: operator to return a SAMPLE list of your indexed pages.”

    Also, read reason #1 in “25 Things I Hate About Google”

  2. Some references feel that
    is the canonical approach, which is very similar to your search, and produces similar numbers

  3. kathy

    Thank you Dave. That is very helpful to me. I was looking at site:search as “the” numbers indexed. I discovered with the com site:search that the numbers varied greatly. Still confusing…but relieved the pages are indeed indexed. Thank you.

  4. It looks like your income is down because Google has thrown many of your pages out of its main index and into its supplemental index.

    Since Matt thought there was a problem based on the information you gave him, and followed up with others at Google (without further enlightenment), I presume he was thinking about your pages in the main Google index vs. the pages in the supplemental index.

    In any case, if Matt thinks it’s an issue, perhaps best to follow up with him. “[your situation has] generated some good discussion [at Google], but no definitive cause for having a small number of pages indexed right now.”

    Note that having your pages indexed by Google, but ending up as supplemental results can be nearly the same as having them not indexed. As Matt noted, some extra page rank helps (I assume he means to get your pages out of “supplemental hell” and into the main index), and he promised you a followed link in a future “these sites are cool” post to help give you more page rank.

    You mentioned at one point that your income was way down. While it’s good to know Google IS indexing your pages (at least as supplemental results), it would be good to find out how to get more of them back in the main index, and why they were banished from there.

    Supplemental results often have to be requested specifically by the searcher (by clicking a link at the end of the results), and most searchers don’t do this/aren’t aware of what to do. Even when they show automatically, they will only be shown after the main results.

    Best wishes!!

  5. kathy

    Thanks Dave. So the supplemental results are what we get when we use inurl: along with site:, and the site: search alone just shows the indexed pages that are available?

    329,000 site inurl <-- this really is an accurate reflection of the pages of content. And the site: search <-- is 12,800 tonight. So the difference in numbers is the pages in supplemental limbo that aren’t available in the SERPs? Yes, I need to find out from Matt how to get those from supplemental limbo into the SERPs. It would be good to get more page rank. I’ve never run around the internet asking for links or paying for them. LOL. I’ve been a 5 since the system was first introduced, and it bugs me that I can’t seem to move beyond that. I figured this was just because no one in our niche outranked me. (And this is true…except the wiki page, which links to us.) Thanks again for your kindness, Dave!

  6. Kathy, I’m not sure if you read my reply in your last post on this, but blocking showpost.php and calendar.php in your robots.txt will cull about 2 to 3 million needless pages.

    This means a lot more link equity and crawl budget gets distributed to your main showthread.php pages, and it will definitely help bring these pages back into the main index and ranking better.

  7. Sorry to double post, but I can’t edit my last one.

    I just had a look at your robots.txt again, and you have done it, including the .php extension. Excellent!

    It will take a while for Google to drop these pages (sorry Matt, but Google is slow in this department); however, distributing your domain’s authority among fewer pages will mean more authority passing to your important pages.

    So the sheer number of URLs indexed may fall, but the ratio of indexed pages to supplemental pages will improve.

    The primary goal is to present the single most important version of a page to Google and focus your authority/pagerank to these pages, while removing duplicate pages (like print versions, the single post version showpost.php etc) and removing “fluff” pages that are not important like calendar.php.

    You don’t want the “most” pages in Google, you want the “best” pages in Google.

    [I just got an email saying you replied to my post above]

    You are welcome Kathy, with this small modification alone things should greatly improve over the coming few weeks.

  8. kathy

    I appreciate it! I’m looking forward to the coming weeks when more pages leap from supplemental purgatory and are indexed properly. Maybe that will be my Christmas present from Santa!

  9. Kathy, you’re doing fantastic work on the technical side! So nice to find such a worthy site managed so well.

    I’ll try to shed some light on the supplemental ratio question (I should have noted what a challenge it is), but first, the core question you might consider asking could be something like:

    “Matt, you said your folks found ‘no definitive cause for having a small number of pages indexed,’ and I was wondering how I can accurately check for how many pages have been indexed in the main index (vs. the supplemental index) to test to see if the situation improves from the things we are doing to try to improve our site structure.”

    Regarding supplemental results: Determining what they are is controversial/challenging. Google has changed their search engine query operators more than once relative to finding main vs. supplemental results.

    However, the fact that folks at Google are discussing why you have a small number of pages indexed, and trying to figure it out, is FACT #1, IMHO. Any specific tests would take a back seat to internal Google conclusions, and they seemed to conclude that you had fewer pages indexed than you should. So MY conclusion is that it seems fair to ask them how you can test the supplemental/main index ratio, since they have suggested there is a problem that needs solving.

    Note that most info you’ll find on the web about calculating supplemental ratio is old/inaccurate. Supplemental queries have had a limited lifetime. It’s a confusing area with a moving target! To read some opinions of how Google began obfuscating this, read

    The most popular approach to finding how many pages are in the main index seems to be the query .. but I can’t vouch for that result (5,140) as being meaningful! (Isn’t it interesting that you have 329K, and half the searches find more pages and half find fewer…) A ratio tool comes up with the same number (5,140), which, again, may not be meaningful.

    A few misc points:
    · A lot has to do with how much page rank is flowing in your vertical. If there is a lot of page rank flowing, hystersisters definitely “feels” like a 6. Take a look at this popular tool for a sense of your site strength:
    · Carly: I couldn’t agree more with your explanation of the goal. The ratio of indexed pages to supplemental pages should definitely improve.

  10. .. of course, oops, I should have noted the obvious: Matt left for PubCon the day before yesterday, and won’t be back just yet, and won’t be blogging until he is back.

  11. kathy

    Dave, it’s a good thing to find folks who are willing to provide insight into an area I don’t quite “get,” and SEO is one of those topics.

    I took a look at the page and don’t understand some of the criteria outlined in that report. For instance:

    Position at Google for first four words of title tag on target URL: Not in top ten

    Actually, the title of the site is: Hysterectomy Support Discussions, Before Hysterectomy, After Hysterectomy, Recovery: Hystersisters

    For the keyword “hysterectomy,” I rank very well, usually in the top 3 positions. Wiki and I flip-flop back and forth for #2 or #3, and sometimes Hystersisters ends up in the #1 position before an .edu or .gov site takes it over again.

    And the Alexa ranking is baffling. No one on my website is a geek with an Alexa toolbar.

    You also said:

    · A lot has to do with how much page rank is flowing in your vertical. If there is a lot page rank flowing, hystersisters definitely “feels” like a 6.

    Does vertical mean internal pages, linking to one another?

    Today I have some “good news,” or perhaps it is good only to me because I don’t understand much…. Yesterday the site: search showed 12,700 pages. Today we finally broke the 13,000 barrier: 13,100 pages. So I was thinking this is a good thing because 1. I’m “up” in numbers, even if only a bit, and 2. I broke the 13,000 barrier, which hasn’t been possible lately.

    As for the ratio of supplemental vs index, with your number of 5140, how could this be meaningful? Does this presume that for every page in index there are 5140 in supplement?

    I appreciate you helping me with the exact question to ask Matt upon his return to the blogging world after pubcon. If I had known about pubcon, I really might have bought a ticket just to ask him personally. :)

  12. Sorry for muddying things up with obscure lingo:

    · “Vertical” means category (example of top level categories would be things like sports, entertainment, business etc.)
    · The “not in top ten” is an error. I’ve notified them. Google sometimes makes changes to limit automatic access to its index; you can try the link on that page and see it give an error.
    · Alexa rank is just a popular measure; they do categorize your site correctly, and they are tracking a fair amount of traffic for it:

    If you were to attempt a ratio calculation based on some of the info in the SEO community, you might end up with something like:
    = 1.43% to 3.64% of pages in main index
    .. but again, this is nothing but “interpreted conjecture!” However, knowing something about your site history and prominence and interpreting results from that perspective, it seems likely they have indexed most of the pages one way or another. And your effort to direct page rank properly WILL pay off, whether measurable by supplemental ratio directly or not.

    Your internal link structure controls “link flow,” which affects page rank, which affects which pages are supplemental, and which are not.

    Here are a couple of places to get you started:

  13. Arrgh! Of course, the strongest subpages tool seems to be broken at the moment. Apologies.

    Also, just a thought: You mentioned a willingness to budget for SEO education ;) .. by visiting PubCon to see Matt. You might want to pay a top SEO for a site review and some summary suggestions to pursue. There are a number of things hystersisters is not doing that would be easy and helpful, such as consistent semantic tagging of subheads.

    Here are a couple of places to start without paying anyone:

  14. kathy

    There are a number of things hystersisters is not doing that would be easy and helpful, such as consistent semantic tagging of subheads

    :) If I knew what this was, couldn’t I do that myself? Please define?

    Glancing at the article, I see much on that list that I have already done. Truly, there are few other sites in our niche, and none with more authority. Gaining incoming links is difficult.

    Yahoo directory: I’m there (I was added for free, back in the day when Yahoo picked “hot sites”).

    Press releases: We’ve done this

    Other publications: we’ve been linked to and mentioned when I was interviewed in U.S. News & World Report, USA Today, etc.

    Yes, if I knew that an SEO expert could do things that I can’t do myself, I could budget a tiny bit for the project if they did the work for me. :) However, I’m not a woman rolling in dough. The reason I do so much myself is funding.

    Frequent flyer miles is another thing. I have those. Have miles, can travel.

  15. · I figured you were up on a lot of what Debra lists, and it’s oriented to new sites, but it’s still a good checklist to see if anything is missing from your efforts.
    · A common example of semantic tagging is to use the or tag on subheads, and ensure subheads use keywords/keyword phrases relevant to the page topic.
    · Mostly I was thinking get targeted advice at low cost, and do it yourself.

  16. I see the comments system edited out my tags, so let’s try again:

    “A common example of semantic tagging is to use the or tag on subheads …”

  17. kathy

    Ahh, yes, the h2 and h3 tags. It’s something we can do by tweaking the CSS. My designer did not use standard tags for the design, and we went back and added the h1 tag. I can tweak the CSS to include h2 and h3.

  18. You got it! This helps the most on pages caught in the supplemental index, or where there are not a lot of other pages competing for the same terms (long tail results).

    There are also a variety of ways of getting more clicks from the same results positioning: one of the most targeted is adjusting your meta name=”description” tags (which sometimes show in the SERPs) by PPC testing, although that would appear to be something outside of your budget.

    The idea is to not only make your results show up as high as possible in the SERPs (Search Engine Results Pages), but to make them something your most desirable visitors will click on vs. the other results.

  19. Continuing on with some less obvious optimizations, keyword research (whether through PPC, keyword tools, studying referrer logs, etc.) can give you the opportunity to:
    · Contact your top referrers through backlinks and suggest/request (optimal) keyword changes in the backlink.
    · You can optimize the keywords/phrases in some of your more global (seen on many pages) H2 and H3 tags.

  20. Also, since I brought up the meta name=”description” tag, note that they need to be different on different pages. For example, Matt Cutts has noted:

    “When all [meta description tags] look the same, [Google] often collapses them together and shows the “click here to see all the duplicates” message … make them all different, [and Google will] immediately show plenty of results [for that site] again.”

    (By the way Kathy, great security word choices in your comments captcha!)

  21. Oops. Continuing my crazy high comment count here, Matt has just announced at PubCon that there will be only 2 total urls from a domain in any set of search results. So eliminating duplication of meta name=”description” tags won’t get you more than two results!

  22. kathy

    So if someone searched for a keyword string that would return 10 pages from my website, only 2 would show up?

    Why do they keep changing the rules for those of us that like to follow them?

  23. kathy

    I did tweak my meta descriptions anyway. The forums were set up so that the meta description was the same on all pages. I changed that last week: I inserted the thread title into the description, which should make each one unique.

    Thanks for the comments on my captcha words. I’ll have to go look at what I inserted. :D

  24. kathy

    Hey Dave, are you in PubCon at the moment? If so, if you get Matt’s attention, mention Hystersisters and the mystery of the supplemental page/ratio/indexing. :)

  25. Yup, that’s the result. Although there were already some tweaks in their algorithm that had a similar result. Google is trying to make changes to keep the spammers from pushing good sites like yours down where no one can find them.

    In this case, the consensus is very supportive of Google—I think some SEOs can hardly believe Google is being as responsive as they are—because spammers have unbelievably abused multiple results from a single domain through subdomains. This change is to block that blackhat strategy.

    The time frame on this change is “… Google will roll out in a few weeks a new filter … ”

  26. Hey Kathy,

    Not actually there, or I would have already tried to track down Matt about this. It’s just my day to catch up on the week’s SEO news, so I’m more on top of it in real time than I would normally be.

  27. Going back to less-obvious optimizations, I see your AdWords (and nearby links) at hystersisters sometimes follow common advice and sometimes not. Are they formatted differently in different places for multivariate testing purposes? Usually, if the ads are similar to nearby text, they should look EXACTLY the same (within Google guidelines). I’ve noticed they are never EXACTLY the same, but vary in similarity to the nearby text.

    Optimizing the appearance of AdWords is definitely one way to increase your click-through income. There are a few best practices and lots of tips and tricks for that, while still keeping within Google’s guidelines.

  28. kathy

    Are you talking about AdSense ad blocks? I don’t have AdWords going at the moment. (Unless the pause was lifted without my knowledge?)

  29. kathy

    Actually, each one is uniquely coded (by AdSense standards) to match the background, color of links, color of text, etc. My AdSense income is actually quite good. :) No complaints there. Income tends to be down because overall traffic is down compared to pre-penalty days, before April 2006. Before that I was seeing 9K-12K visitors per day. Since then, 7K per day has been my average, with some bigger days (depending on media exposure) and some lower ones (because women are generally offline during the holidays). So this is the lowest point of our year anyway.

  30. “… to match the background, color of links, color of text, etc.”

    That’s what I’m trying to point out—they actually don’t quite match. It’s close but no cigar. Most sites need to change the appearance of nearby links as well as set up the ads to get an EXACT match—and it’s the EXACT match where the best click-through rate is (in some cases, and yours looks like one).

    If you are happy with your AdSense income, then you REALLY should optimize further, because even a small percentage increase would mean real money for you. For example, you have a two-column sidebar in some areas utilizing AdSense. This is a prime, prime area for mixing ads and editorial content (appropriately) for increased click-through. But where you use the two-column format, you only have other ads, and no editorial content.

  31. Here’s an example of what one expert says:

    “People have a tendency to adapt to advertisements and become numb to them. Placing both ad columns right next to each other forces their eyes to decide which of the two columns is the content … [they] invest added interest in the two double ad columns [if it’s not clear that they are both ads]. Hence the click through rate is unusually higher.”

  32. kathy

    Can you point out where an AdSense block is close to matching but doesn’t?

    If you are talking about the size of the border, that isn’t controlled by me. Adsense alters the size of the border and size of font.

    The background colors are the exact same color numbers. The links identical. So I’m curious, of course, to find out where you are seeing it doesn’t quite match?

    Editorial content in the side bar? Hadn’t thought of that. I don’t want to add one more javascript to pull unique content on each page. I’ll give it some thought. :D

  33. When I say editorial content, that means such as your “Hysterectomy News” which appears first in the sidebar. Editorial means “not ads.”

    “Adsense alters the size of the border and size of font.” You can set it to not alter it, unless I’m out of touch on this and policies have changed. And your content doesn’t exactly match any of the formats (1-line, 2-line, 4-line).

    Matching your editorial content to look like the ads is as important as doing what you can to make the ads look like your content. The important thing is that the result is that they look IDENTICAL. On hystersisters:

    · Editorial links use no underline; Ads have underlines.
    · Editorial content indented; Ads not indented.
    · Editorial uses header line, Ads have no header.
    · Editorial content font uses slightly different style.

    Just getting the look identical isn’t the only kind of optimization you can do, nor will it make as much difference as some other changes, but it’s the easiest one to tweak without changing much else.

    (BTW, your captcha security words are things such as loving, grand, etc. Nice!)

  34. kathy

    The border randomly changes from my 1 pt setting to whatever it chooses. Same with the font size or style. I have no control over that, unless you can show me AdSense rules that have changed?….. BTW, the number of ads in the ad block is determined by whether the advertiser has targeted my site or not. Sometimes it is 1 in a block and huge, underlined and all. Sometimes there are 4 and no underlines. Really, this is something I can’t control in my account. I know there are some premium AdSense publishers that are allowed to do that. I’m not one of them.

    As for the ads not being indented, I would have to add another 20 pixels to the width of that side column to do that. Indenting the ad means pushing things over and I can’t do that at the present size of the forums.

  35. I understand. You’re doing everything very well already! Some thoughts:

    · I wouldn’t indent the ads, I would de-indent the content. It’s not about what changes, it’s about matching the look.
    · You could optimize for one of the ad sizes, probably the four-line, which is the closest to your existing content, and a very common size.
    · I’ve never seen a multi-line non-image Google ad without the headline underlined or in bold style. Hence I would suggest using bold and underline in nearby editorial links.
    · Testing against a two-column format matching ads and editorial would probably be worth trying without too much fuss.

  36. kathy

    Goodness! We have been chatty here, haven’t we?

    I appreciate your help and input and I’ll give it some thought to work on optimizing some of the surrounding styles for editorial content to match the google ads.

    However, I can’t control the number of links in a box. For instance, when one ad is in a box with HUGE, GIGANTIC text (ugh, the ultimate of ugly), these are advertisers that are paying specifically to be on my site. Often these are pharmaceutical companies whose marketing guys need a crash course in targeting keywords. Often we see the name of the product, then the company name. Nothing about symptoms or words that visitors would recognize. Example:

    AMBIEN CR™ Is Different

    (Zolpidem Tartrate Extended Release /CIV) Get Info At The Official Site

    What if the visitor doesn’t recognize the word Ambien? There is nothing in the next few lines of text that explains or educates the visitor to engage them to click. What is Zolpidem Tartrate anyway??

    Anyway, I hadn’t thought to change the style of my site to match AdSense. That means Arial, font size 10 (in most cases). I’ll have to create a new Google CSS class. :)

  37. I understand about the targeting. Those big ads are as often as not primarily branding, and the PPC is solid. I was thinking about at least matching one of the formats, probably the 4-line one.

    And .. not sure I’ve been that much help, so I’m going to invoke the “if it were me …” rule: If it were me, I would create a forum page explaining the benefits of what you are doing, and asking for someone with expertise to donate some of their time to review your situation and recommend some low-financial risk things to test to improve your situation, however you define it (I defined it narrowly as measurable contribution to profit).

    Then you could visit SEO/SEM forums and sites and ask for help–ask them to refer you to other sites/forums too, (and of course mention your special relationship with Matt Cutts :) ) and let them know you will promote them as much as you are able should they be helpful. Link to your page explaining your mission asking for expert assistance for your worthy cause.

    One result is that you will get more people like me bringing their expertise to your forum, but you will have asked for specific help, so it should be of a higher quality. (I’ve just focussed on a few interesting things, since you didn’t ask for anything specific or note any things you might be willing to try beyond learning about the initial indexing info.)

    Your mission would be something like (you would need to say whatever makes sense to you, this is just a rough idea):

    “After reviewing my site and practices, what would be the best low-financial risk things to test to increase site profits? I assume you will have a list of questions for me to answer so you are up to speed on whatever you need before reviewing my site. If these changes can be proved to ‘pay for themselves,’ I can pay for additional advice/help.”

    Myself, while I do donate money and some in-person, on-site time to worthy causes, I generally need my computer time to earn me the funds to donate elsewhere! But if you set up a place for people to provide assistance with your mission, be sure to let me know!

  38. Appears I may have mischaracterized what Matt said at PubCon about subdomains. Matt noted over at Sphinn:

    “This isn’t a correct characterization of what Google is looking at doing. What I was trying to say is that in some circumstances, Google may move closer to treating subdomains as we do with subdirectories. I’ll talk about this more at some point after I get back from PubCon.”

