Use a robots.txt File to Guide Spiders
Robots.txt files (often erroneously called robot.txt, in the singular) mark, or disallow, files and directories of a web site that cooperating search engine spiders should not access, even though those areas are otherwise publicly viewable. Search engines use these robots (also called spiders or crawlers) to categorize and archive web sites.
Your robots.txt file should be placed in the root directory of your domain. For websites with multiple sub-domains, each sub-domain must have its own robots.txt file: if example.com had a robots.txt file but a.example.com did not, the rules for example.com would not apply to a.example.com.
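To make the per-host rule concrete, here is a small Python sketch (standard library only) that derives the robots.txt location a crawler would look up for any page URL. The function name robots_url is hypothetical, chosen for illustration; the point is that only the scheme and host are kept, so every sub-domain maps to its own file.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    robots.txt is looked up per scheme + host (+ port), always at the
    root path, so each sub-domain is governed by its own file.
    """
    parts = urlsplit(page_url)
    # Keep scheme and host; replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/members/index.html"))  # https://example.com/robots.txt
print(robots_url("https://a.example.com/tour/page2.html"))   # https://a.example.com/robots.txt
```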
The protocol, however, is purely advisory. It relies on the cooperation of the web robot, so marking an area of a site off limits with robots.txt does not guarantee privacy. Some web site administrators have tried to use the robots file to hide private parts of a website from the rest of the world, but the file is necessarily publicly available, and anyone with a web browser can read its contents.
It works like this: a search spider wants to visit a page on a website, say www.example.com. Before it does, it first checks https://www.example.com/robots.txt and finds:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
The "User-agent: *" line means this section applies to all robots. Each "Disallow:" line tells the robot not to visit any pages under that directory, so /cgi-bin/, /images/ and /tmp/ are all off limits here.
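You can check how a well-behaved crawler reads the rules above with Python's standard urllib.robotparser module. This is a minimal sketch: the rules are parsed from a string rather than fetched over the network, and the user-agent name "ExampleBot" and the page paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above, parsed locally instead of fetched.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

for path in ("/index.html", "/images/banner.jpg", "/tmp/cache.html"):
    allowed = parser.can_fetch("ExampleBot", "https://www.example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'disallowed'}")
```

Anything not matched by a Disallow line stays crawlable, which is why /index.html comes back as allowed.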
You can learn more about how to set up your robots.txt file at www.robotstxt.org.
Use Branded Images for the Spiders
Make the most of Google's image and universal search by having a few correctly named and branded pictures ready for Google to serve up in results for queries about your type of content.
Ideally you already have your robots.txt file set up to block the search spiders from your images folder. Depending on how frequently the spiders come through to index your content, this can save you a large amount of bandwidth and, of course, keeps them from serving up all your pictures for free. It should look like this:
User-agent: Googlebot-Image
Disallow: /images/
Set up a folder just for Google's image crawler (Googlebot-Image), named something like "/public-images/", and open this folder for indexing in your robots.txt file. It should look like this:
User-agent: Googlebot-Image
Allow: /public-images/
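Since both rules target the same crawler, they can sit together in one group. The Python sketch below combines them and checks the result with the standard urllib.robotparser module; the image file names are hypothetical. Note that Python's parser applies the first matching rule, whereas Google documents a most-specific-match rule; the two paths here do not overlap, so both interpretations give the same answer.

```python
from urllib.robotparser import RobotFileParser

# One group for Google's image crawler: block the private image folder,
# explicitly open the public teaser folder.
RULES = """\
User-agent: Googlebot-Image
Disallow: /images/
Allow: /public-images/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

checks = [
    ("Googlebot-Image", "/images/members-only.jpg"),
    ("Googlebot-Image", "/public-images/example-teaser.jpg"),
    ("Googlebot", "/images/members-only.jpg"),  # no group applies to this agent
]
for agent, path in checks:
    ok = parser.can_fetch(agent, "https://www.example.com" + path)
    print(f"{agent:16} {path:36} {'allowed' if ok else 'blocked'}")
```

The last check shows that a group only affects the crawler it names: with no "User-agent: *" group, other robots are unaffected.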
In this new public image folder place some of your best teaser pictures for Google to index. Make sure the picture files are named appropriately for the content, e.g. bare-bottom-spanking.jpg or big-natural-boobs.jpg.
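If you have many pictures to rename, a short script can turn a plain-English description into a lowercase, hyphenated file name like the ones above. This is a minimal Python sketch; the function name descriptive_filename is hypothetical, and it assumes plain ASCII file names are what you want (accented letters are folded, other symbols dropped).

```python
import re
import unicodedata


def descriptive_filename(description: str, extension: str = "jpg") -> str:
    """Turn a human description into a lowercase, hyphen-separated file name."""
    # Fold accented characters to plain ASCII (e.g. "é" -> "e"), drop the rest.
    text = unicodedata.normalize("NFKD", description)
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    # Replace every run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", text).strip("-")
    return f"{slug}.{extension}"


print(descriptive_filename("Sunset Beach Shoot, Part 2"))  # sunset-beach-shoot-part-2.jpg
print(descriptive_filename("Café Photoshoot", "png"))      # cafe-photoshoot.png
```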
Also be sure to brand those pictures by either watermarking your domain name across the picture or adding a branding panel. The idea is to make Google Images work for you rather than against you. Most folks are completely missing this branding opportunity; make sure you're staying a step ahead of them.
Use an SPF Record to Thwart Email Spammers
Many folks have complained about the spam generated using their domains and how there's little anyone can do about it. The spammer puts some-addy@your-domain.com in the "From" field and you get the headaches — as well as possibly getting your domain flagged as a spammer, when all those bounced emails come back to you.
There is a solution to this problem, however: the Sender Policy Framework (SPF), which boils down to publishing a single short line of text in your domain's DNS. It fights return-path address forgery and makes spoofs easier to identify.
Domain owners list the mail servers allowed to send mail for their domain in a TXT record in the domain's DNS zone. Receiving mail servers can check the envelope sender address (the return path) of an incoming message against that record, and so distinguish authentic messages from forgeries before anything lands in the recipient's inbox.
In other words, SPF lets the receiving mail server determine whether a message that claims to come from your domain was really sent by one of your servers. If it was not, the receiving server can reject the message or flag it as spam, depending on its own policy. Here is an example record:
Example SPF record for xbiz.com:
v=spf1 ip4:203.0.113.10 -all
Here ip4: names the IP address of a server allowed to send mail for the domain, and -all tells receivers that mail from any other server should fail the check.
Of course, you will substitute the IP address of your own outgoing mail server for the placeholder address; that is often, but not always, the same machine your MX record points to. If you do not have access to your DNS zone settings, copy and paste your SPF text into an email to your server or hosting support staff and ask them to add the TXT record for you. If you do have access to your DNS zone settings, enter the following into your DNS zone panel:
NAME: leave blank (some panels use @ to mean the domain itself)
TYPE: TXT
VALUE: paste in your SPF text
COMMENT: is optional
Note that SPF records are published in the DNS as TXT records. A dedicated SPF record type also existed, but RFC 7208 deprecated it in 2014, so TXT is the type to use. Give the DNS time to refresh and propagate the new record, then test it using the Kitterman SPF Record Testing Tools: run the first test on the page to verify your SPF record is recognized and set up properly.
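To show what a receiving server does with a record such as v=spf1 ip4:203.0.113.10 -all, here is a deliberately simplified Python sketch. It only understands ip4: mechanisms and a final all; real SPF (RFC 7208) also handles a, mx, include:, redirect= and more, so treat this as an illustration rather than a validator. The function name spf_check is hypothetical, and the addresses come from the reserved documentation ranges, so they are placeholders.

```python
import ipaddress


def spf_check(record: str, sender_ip: str) -> str:
    """Simplified SPF evaluation supporting only ip4: mechanisms and 'all'.

    Returns "pass", "fail", "softfail", "neutral", or "none" (not an SPF record).
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"
    qualifiers = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        # Each mechanism may start with a qualifier; "+" (pass) is the default.
        qualifier = "+"
        if term[0] in qualifiers:
            qualifier, term = term[0], term[1:]
        if term.lower() == "all":
            return qualifiers[qualifier]
        if term.lower().startswith("ip4:"):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip.version == 4 and ip in network:
                return qualifiers[qualifier]
        # Any other mechanism is ignored in this sketch.
    # Per RFC 7208, if nothing matches the result is "neutral".
    return "neutral"


RECORD = "v=spf1 ip4:203.0.113.10 -all"
print(spf_check(RECORD, "203.0.113.10"))  # pass: your own mail server
print(spf_check(RECORD, "198.51.100.7"))  # fail: a spoofer's server
```

A "fail" only tells the receiver what you, the domain owner, asked for; whether the message is rejected or sent to a spam folder remains the receiving server's decision.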
Use HTML, XML, ROR and URL.TXT Sitemaps
According to www.sitemaps.org, "Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL, such as when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site, so that search engines can more intelligently crawl the site. Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of indexing your site."
Google's Sitemap protocol (since adopted by the other major search engines through sitemaps.org) was developed in response to the increasing size and complexity of websites. Business websites often contain hundreds of products in their catalogs, the popularity of blogging has led webmasters to update their material at least once a day, and community-building tools like forums and message boards keep adding pages. As websites grew bigger and bigger, it became difficult for search engines to keep track of all this material, and they sometimes "skipped" information as they crawled through these rapidly changing pages.
With an XML sitemap, search engines can track a site's URLs more efficiently, because all of them are listed in one file. For each URL, the sitemap can also record when the page was last modified and roughly how often it changes.
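For a feel of what the format looks like, here is a minimal Python sketch that builds a two-URL sitemap with the standard library's xml.etree.ElementTree. The element names (urlset, url, loc, lastmod, changefreq, priority) and the namespace come from the sitemaps.org protocol; the URLs, dates and values are placeholders, and the function name build_sitemap is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (loc, lastmod, changefreq, priority) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq, priority in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc                # required
        ET.SubElement(url, "lastmod").text = lastmod        # optional, W3C date
        ET.SubElement(url, "changefreq").text = changefreq  # optional hint
        ET.SubElement(url, "priority").text = priority      # optional, 0.0-1.0
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


pages = [
    ("https://www.example.com/", "2024-01-15", "daily", "1.0"),
    ("https://www.example.com/tour/", "2024-01-10", "weekly", "0.8"),
]
# Save this output as sitemap.xml in your domain's root directory.
print(build_sitemap(pages))
```

Only loc is required by the protocol; lastmod, changefreq and priority are hints that search engines may use or ignore.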
The best online tool I've found to create sitemaps is www.xml-sitemaps.com. The free version will generate all four sitemap file types (XML, ROR, text and HTML) for sites of up to 500 pages. If your website is larger than 500 pages, you can purchase the unlimited sitemap software for only $19.99.
Once your sitemaps are generated, upload them to the root of your domain. Note: Yahoo prefers the url.txt format, though it also honors the XML versions.
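The plain-text format is even simpler: per sitemaps.org, it is a UTF-8 file with one fully specified URL per line, at most 50,000 URLs, and nothing else. The Python sketch below produces one; it uses the url.txt name from this article (check which file name each engine expects), the URLs are placeholders, and the function name text_sitemap is hypothetical.

```python
MAX_URLS = 50_000  # per-file limit in the sitemaps.org protocol


def text_sitemap(urls):
    """Return a plain-text sitemap: one absolute URL per line, nothing else."""
    urls = list(urls)
    if len(urls) > MAX_URLS:
        raise ValueError("split the list: at most 50,000 URLs per file")
    for url in urls:
        # Each URL must be fully specified (scheme included) and on one line.
        if not url.startswith(("http://", "https://")) or "\n" in url:
            raise ValueError(f"not a full, single-line URL: {url!r}")
    return "".join(url + "\n" for url in urls)


urls = ["https://www.example.com/", "https://www.example.com/tour/"]
# Save this output as url.txt in your domain's root directory.
print(text_sitemap(urls), end="")
```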
Use the Tools from Google, Yahoo & MSN
Since we count on search engines to bring us traffic it stands to reason that you want to utilize the webmaster tools provided by the leading search engines to maximize those opportunities.
All three of the top search engines provide basic webmaster tools that give you detailed reports about your website's visibility. These tools show you how each engine sees and indexes your website, and they help you diagnose any problems.
Each of these tools is self-explanatory, but help is available if you need guidance through the process. You can find the tools at Google, MSN and Yahoo!.
Monitor Your Website's Traffic Stats
Knowing where your website traffic comes from, how folks find your website through the search engines and how they interact with it once they arrive is critical to your bottom line. Armed with this information, you can significantly improve your search engine rankings and get better results from any ad campaigns you run.
Hands down, one of the best tools available is Google Analytics. It is free, easy to use and provides sophisticated features to track your website traffic. Best of all, it scales to a website of any size.
Sign up today to discover how to strengthen your marketing initiatives and create higher-converting pages at www.google.com/analytics.
That's it for the website structure series of articles. In our next article we'll begin on best SEO practices, including optimizing your pages for search engines. That's a vital task if you want people to find your pages: with millions and millions of web pages on the Internet, the chances that your customers will find yours are very slim unless you optimize your site.