Written by Joe Rinehart
Sunday, 27 June 2010 13:38
The other day I received one of the best search engine optimization (SEO) questions ever! It came as no surprise that the question came from a client who publishes traditional print magazines, so I asked him if he'd mind if I used the question and answer for an article on my website.
Kevin Ireland, publisher of http://www.InsiteGainesville.com and http://www.GainesvilleBizReport.com (both sites use the Joomla content management system and represent his Gainesville, Florida print media), asked:
Hey Joe, I'm trying to figure a way that I can make the online PDFs of my
magazines searchable by Google. By that I mean if someone plugs "inventor
John Smith" into Google, the PDF of my magazine that includes an article
with inventor John Smith will come up high in the returned results. We've
already saved all PDFs in searchable form, so someone who opens the
magazine can search for specific words but we can't figure a way to get
Google to drill down into the pages to identify specific key words. Do you
know of a method?
My emailed response, which I reserve the right to edit for the benefit of everyone down the road:
PDFs are searchable by default these days and Google has gotten really good at it. However, if the PDF is created in Photoshop as opposed to Adobe PageMaker or MS Word, it'll be one big image, and Google can't index the text within images. So Adobe, MS Word, or any text editor or word processor that'll convert to PDF is the only way to go. FYI, all the articles within Joomla have the ability to be converted to PDF, assuming your development company didn't turn the feature off.
The example you gave regarding a John Smith is, well, not the best example, because the last name Smith is one of the most common ;) However, I'd suggest that if someone typed "John Smith in Gainesville", you'd have a shot.
The fastest method of getting Google to index your PDFs is to first have a sitemap, and in Joomla I'd recommend Xmap. Then you'd go to Google Webmaster Tools, using your Google account or one you have established for all the Google goodies and your site(s), and make sure the site is registered there.
Another thing to be aware of is that Google doesn't actually index every single word! There are what's known as stop words, and here's a URL listing the most common:
http://www.link-assistant.com/seo-stop-words.html
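To get a feel for what stop words do to a search phrase, here is a minimal Python sketch. The `STOP_WORDS` set and the `significant_terms` function are made up for illustration; the set is a tiny sample and is not Google's actual list.

```python
# Illustrative stop-word sample only; an assumption, not Google's actual list.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to", "with"}


def significant_terms(query: str) -> list[str]:
    """Return the lowercased words of `query` that are not stop words."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


print(significant_terms("the inventor John Smith in Gainesville"))
# prints ['inventor', 'john', 'smith', 'gainesville']
```

Whatever survives the filter is roughly what you'd want to see in the file name, title, and first paragraph of the PDF.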
Also, the same principles apply to PDFs as to HTML pages, i.e.:
1. The name of the file in the actual URL or address bar. If you are trying to be at the top for a name like Smith, you'd better have it everywhere ;)
2. The title and/or header, i.e. what's usually at the top, is actually more important than the majority of the content.
3. The 1st paragraph following the title or header.
4. Of course, the remaining content on the page.
5. If you are using photos, consider adding a text caption underneath.
6. Take advantage of the document metadata (title, author, subject, keywords) that PDF-creation software or Adobe Acrobat itself offers.
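As a small illustration of item 1 in the list above, the following Python sketch turns an article title into a keyword-rich PDF file name. The function name `pdf_filename` and the sample title are hypothetical; how you actually name files will depend on your own publishing workflow.

```python
import re


def pdf_filename(title: str) -> str:
    """Build a lowercase, hyphen-separated PDF file name from a title."""
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{slug}.pdf"


print(pdf_filename("Inventor John Smith, Gainesville"))
# prints inventor-john-smith-gainesville.pdf
```

A name like this puts the words people search for into the URL itself, instead of something like a scan number or an issue code.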
The big thing you need to do is to make sure the PDFs are in a sitemap. If they are not picked up via the Joomla Xmap XML sitemap, then you'll need to spend a few bucks for a sitemap generator that will index all the non-Joomla files on your web site. The one I recommend and have set up several times is:
http://www.xml-sitemaps.com/ They also have a free version that indexes up to 500 pages, and the result can be submitted to Google or copied to your web site :)
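If you'd rather see what such a sitemap contains, here is a rough Python sketch that lists every PDF under a folder in the standard sitemaps.org XML format. It is not how Xmap or xml-sitemaps.com work internally; the function name `build_pdf_sitemap`, the folder layout, and the example URL are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import quote

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_pdf_sitemap(root_dir: str, base_url: str) -> str:
    """Return sitemap XML with one <url> entry per PDF found under root_dir."""
    root = Path(root_dir)
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for pdf in sorted(root.rglob("*.pdf")):
        # Turn the file's path relative to root_dir into a URL path.
        relative = pdf.relative_to(root).as_posix()
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + quote(relative)
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical usage: scan the local copy of the site and print the XML.
# print(build_pdf_sitemap("/path/to/site", "http://www.example.com/"))
```

The resulting XML would be saved as a file on the web site (for example sitemap.xml) and then submitted through your Google account.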
This was a great question! I might even have to edit this response for an article for my website FAQs ;)
If you have any further questions, please contact me.
Best 'net regards,
Joe
Last Updated on Monday, 17 May 2010 17:04
Written by Joe Rinehart
Monday, 03 October 1994 19:00
Joe Rinehart at a Canton, Ohio computer club in 1993
Welcome to the Joe page! I'm Joe Rinehart from Ravenna, Ohio. Carl Knorr, who first thought of the name, and I co-founded the company config.sys, which later evolved into the name config.com to avoid confusion with a few other config somethings throughout the world.
This photo of me was taken by "The MusicMan" of the Buckeye Computer Club in Canton, Ohio while I was discussing "Information Technology" in 1993. I've since cut my hair to Marine Corps standards.
In 1999, I helped develop a company called Config.Com, Inc. These days I keep very busy managing and consulting on information technology and electronic publishing projects. Although I am the President of Config.Com, Inc. for legal reasons, I really prefer to use the title Network Administrator, since several others enjoy helping within our corporate environment.
If you are interested in more personal information about me, you will find it on my bio page.
The collection of CD-ROM reviews that first got worldwide attention, which I started before the web, is no longer available on-line. With help from Zbigniew Tyrlik of Apk.Net, we created the alt.cd-rom.reviews newsgroup. Then, with Ian Vershuren, formerly of Apk.Net, we created a homepage for the project in the early days of web development, which was the first web page in Ohio. This is a hard thing to prove these days, as even archive.org didn't exist back then, but that's OK, because there were plenty of other firsts. It was also during this era that I authored the first URL ever advertised in a computer magazine (Computer Shopper, Oct 1993 - Oct 1994).
Over the years the entire early staff of Apk.Net has made contributions toward the success of the alt.cd-rom.reviews project! The project has received as many as 731,000 total page requests weekly. The project went off-line in Oct 1996 because of the high traffic and because we no longer sold CD-ROM titles to fund it or had sponsorship. I might point out that several larger companies started their own review programs rather than offer sponsorship funding for ours, which was discouraging. Although I no longer collect CD-ROM reviews, I still monitor the alt.cd-rom.reviews newsgroup when time permits.
My first non-profit web project
I had a lot of fun working with Dr. Brandon Ward in 1995, consulting for Child Search, which led to placing their worthy charitable organization on the web. Dr. Ward has since found T-3 hosting sponsorship, learned HTML, and has been editing the web site with the same enthusiasm with which he founded the organization. These days, not much is left of the original site except the scanned images ;^)
The Akron, Ohio Area Internet Directory, the first of several community Internet directories which I created, is now also off-line, since these days yellow pages of every kind are database driven and keep current information. The Akron community directory project was recognized by Compton's on-line encyclopedia, a book "It's raining cat's and sea dog's" (about minor league baseball), and in several newspaper articles.
Our Kent, Ohio Area Community directory has since become our model community directory, thanks to the access to information that comes from a high degree of support from the Kent, Ohio Area Chamber of Commerce, the City of Kent, Ohio, Kent State University, the local school system, and, of course, the citizens. Ken McGregor of the ArtArmory.com has since taken over the active management of the project, which has now been online for over a decade.
Our company maintains a single guestbook for all of our combined projects, which is always pleasant reading. Please consider contributing your public comments as well.
On June 4th, 1996, config.sys Productions, Ltd. started providing Internet services in Portage County, Ohio through a sales agreement with Apk.Net. We used the Internet services reseller opportunity to further develop our cybercafe (public access) market research project and became the 1st "Public Access" high speed T-1 site in Ohio. In 1997, config.sys Productions, LLC was closed, and on December 24th, 1998, we restructured to expand our well-known commercial services to include consumer dial-in services and are no longer associated with APK.net!
On July 1st, 1997, one of the former principals left the company to develop his own Internet services alliance in Portage County, so we restructured the company to become Config.Com, Inc., and Config.Sys Productions, LTD. was legally dissolved. Ironically, the new competition (Nacs.Net) bought their bandwidth connection from Apk.Net ;^) We chose NOT to purchase our bandwidth from another ISP, but rather to connect wholesale, direct to the Internet OC-3 backbone via full T-1. This fact, in addition to the state-of-the-art network hardware listed on our consumer dial-in access server http://home.config.com/, allows us to offer the highest quality of access possible for our Portage County, Ohio communities.
Since around July of 1997, I've been keeping busy developing the config.sys/config.com commercial Internet clients and helping whomever I can to get connected. Prior to July of '97, we focused on publishing community content for Northeast Ohio.