External Files Causing a WordPress 404 Error?

I’m trying to find a satisfactory way of adding WordPress tags and theme elements (such as the sidebar) to pages that exist outside of WordPress. A non-WordPress page could then appear to be seamlessly incorporated into the site: it would use the same header, sidebar, and footer as a normal WordPress page, and its layout would update automatically with changes to the theme template files.

The first few solutions that I found involved adding a <?php require('./wp-blog-header.php'); ?> line to each non-WordPress page. This does indeed allow the page to incorporate WordPress tags, theme elements and styles, but the method has a serious drawback because of the way WP identifies pages. When you click a link to a WP page, or enter its address into the address bar, you aren’t actually going to a file that resides at that address. Instead, WP uses the address as an instruction to pull various database entries and build the page through the index.php file that resides in the WP installation directory. For example, while the address for this page appears to be http://www.ardamis.com/2006/07/10/wordpress-googlebot-404-error/ , the request is actually handled by http://www.ardamis.com/index.php.

WordPress assumes that it is responsible for every page on the site, and that there are no pages on the site that it doesn’t know about. Therefore, if you try to browse to a page that WP doesn’t know about, it sends you a 404 error page instead. This is fine so long as you don’t create any pages outside of WordPress.
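To make the idea concrete, here is a toy sketch in PHP of how a date-based address could be turned into query variables. It is not WordPress’s actual rewrite code: the function name permalink_to_query_vars and the single hard-coded pattern are assumptions made for illustration, with array keys loosely modeled on WordPress query variables.

```php
<?php
// Toy illustration only: map a date-based permalink path to query variables.
// permalink_to_query_vars() is a hypothetical helper, not a WordPress function.
function permalink_to_query_vars(string $path): ?array
{
    // Matches paths such as /2006/07/10/some-post-name/
    $pattern = '#^/(\d{4})/(\d{2})/(\d{2})/([^/]+)/?$#';
    if (!preg_match($pattern, $path, $m)) {
        // No rule matches: in this sketch, the address is unknown.
        return null;
    }
    return [
        'year'     => (int) $m[1],
        'monthnum' => (int) $m[2],
        'day'      => (int) $m[3],
        'name'     => $m[4],
    ];
}
```

Under these assumptions, the path /2006/07/10/wordpress-googlebot-404-error/ yields a year, month, day, and post name, while a path such as /testing/ matches nothing and returns null, which is roughly the situation in which WordPress answers with a 404.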

A problem arises when you create a page that WP doesn’t know about. As far as WP is concerned, the page doesn’t exist, so it puts a 404 status in the HTTP header. But the page does exist, so the server sends the page along with the 404 status. This behavior seemed to cause problems with some versions of IE but none with Firefox. It did, however, result in a 404 header being given to Googlebot, so that non-WordPress pages would incorrectly show up in Google Sitemaps as Not Found.

To get around this problem, I’ve found that requiring a different file, wp-config.php, and selecting specific functions for use in the page results in a page that can use all of the desired tags and theme elements and also sends the correct header code: HTTP Status Code: HTTP/1.1 200 OK

This method is accomplished with the following code, adjusting the path to wp-config.php:

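<?php
// Load WordPress through wp-config.php instead of wp-blog-header.php.
require('./wp-config.php');
// Run the request steps by hand; sending headers and 404 handling are skipped.
$wp->init();
$wp->parse_request();
$wp->query_posts();
$wp->register_globals();
?>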

<?php get_header(); ?>

<div id="content">
<div class="post">
<h2>*** Heading Goes Here ***</h2>
<div class="entry">
*** Content in Paragraph Tags Goes Here ***
</div>
</div>
</div>

<?php get_sidebar(); ?>

<?php get_footer(); ?>

Testing the method

Using wp-blog-header.php as the include, I created a GoogleBot/WordPress 404 Testing Page as the index.php file in the /testing/ folder. I added the URL http://www.ardamis.com/testing/ to my Google XML sitemap, and waited for the sitemap to be downloaded. Sure enough, a few days later Google Sitemaps was listing the /testing/ URL among the Not Found errors.

The next step was to remove what I suspected was the culprit, the include of the WordPress header, wp-blog-header.php, and see if Googlebot could again access the page. A few days after removing the include, and after the sitemap was downloaded, the Not Found error disappeared. I’m interpreting this as Googlebot once again successfully crawling the page.

The third step was to use the above code, which includes wp-config.php instead, and then to test the HTTP request and response headers. The response header looks OK, and Googlebot likes it. It looks like this does the trick.
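For anyone who wants to repeat the header check from a script rather than a browser tool, the following is a minimal sketch using PHP’s built-in get_headers(). The helper names parse_status_code and fetch_status_code are made up for this example, and the sketch assumes allow_url_fopen is enabled on the machine running it.

```php
<?php
// Extract the numeric status code from a status line such as
// "HTTP/1.1 200 OK". Returns null if the line is not a status line.
// parse_status_code() and fetch_status_code() are hypothetical helpers.
function parse_status_code(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Request a URL and return the status code of the first response.
// get_headers() puts the status line at index 0 and returns false on failure.
function fetch_status_code(string $url): ?int
{
    $headers = @get_headers($url);
    if ($headers === false || !isset($headers[0])) {
        return null;
    }
    return parse_status_code($headers[0]);
}

// Example call (performs a network request, so it is left commented out):
// var_dump(fetch_status_code('http://www.ardamis.com/testing/'));
```

A page that still includes wp-blog-header.php would be expected to report 404 here, and the wp-config.php version 200. If the URL redirects, get_headers() follows the redirect and the later status lines appear further down the returned array, which this sketch does not inspect.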


6 Responses to “External Files Causing a WordPress 404 Error?”

  1. Scrivs says:

    You might have found a solution for this already, but we were having the same problem at 9rules and I found that sticking this line of code worked for us.

    <?php if (!have_posts()) { header('HTTP/1.1 200 OK'); } ?>

    It forces WP to throw a 200 code to all robots instead of the 404 so they can spider and index all of your pages. Hope that helped some if you still needed it.

  2. xurizaemon says:

    I made a similar workaround, but not using have_posts() because I wasn’t using posts.

    Instead I wanted to insert WP functions in a js.php file – which obviously doesn’t want WP throwing in its templating.

    Here’s my fix, via a link to the Trac ticket on trac.wordpress.org:

    http://trac.wordpress.org/ticket/2984

    
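    <?php
      // Load WordPress without its theme templates.
      define('WP_USE_THEMES', false);
      require('../../../wp-blog-header.php');
      // Override the 404 status that WordPress set for this unknown file.
      header("HTTP/1.1 200 OK");
      header("Status: 200 All rosy");
    ?>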
    

    Works a treat over at http://paperfish.co.nz/paperfish/ – see http://paperfish.co.nz/paperfish/wp-content/themes/paperfish/paperfish-js.phps

  3. xurizaemon says:

    If you don’t add the Status: header, you still get a “Status: 404 File Not Found” – which is NOT a 404 header per se, but still not really meant to be there. There may be a more correct 200 Status header, but that works :)

    http://paperfish.co.nz/paperfish/wp-content/themes/paperfish/paperfish-js.php is the evaluated JS file

  4. [...] thanks to these dudes for telling me about [...]

  5. [...] by applying the solution already found at Ardamis, which consists of calling not the blog-header file but wp-config, and then doing the [...]

  6. Uk shore says:

    Thanks xurizaemon and ardamin,
    I’ve spent over two hours fiddling with the .htaccess file to make the 404s go away (google webmasters kept saying some of my non-blog pages didn’t exist), and the .htaccess wasn’t even the issue – it was the wp-blog-header file…
    Thanks again!
