Meatgrinder HTML Inconsistency

Mark Coker blames Microsoft Word for formatting problems in files processed through his “Meatgrinder” (MG), so I decided to run an experiment. The file I originally uploaded to Smashwords has since changed, because I finally made the revisions suggested by my editors and readers. The version that MG ground up no longer exists, so I couldn’t run the experiment on the file that was on Smashwords.

Instead, I decided to upload the revised file.

Before uploading it, I checked that the file’s formatting was consistent. For those familiar with formatting, one of MSWord’s features is the ability to “Select All X Instances” of a particular style. In my case X=1691, meaning that 1691 paragraphs shared the same indented style. Because selecting all instances highlights every paragraph with that style, I could scan the entire document visually and confirm that every paragraph of story text had the same style, as recommended in the “Style Guide.”
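The same kind of count can also be approximated outside Word. The following is a minimal Python sketch, not a Word or Smashwords tool, and it assumes the manuscript is saved as .docx (a zip archive whose body lives in word/document.xml). It counts how many paragraphs reference each paragraph style ID; the function name count_paragraph_styles is hypothetical. It only reports which style each paragraph references, so any direct formatting applied on top of a style is not reflected in the counts.

```python
import xml.etree.ElementTree as ET
import zipfile
from collections import Counter

# WordprocessingML namespace used by the XML parts inside a .docx file
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def count_paragraph_styles(docx):
    """Count paragraphs per paragraph style ID in a .docx (hypothetical helper).

    `docx` may be a file path or a binary file-like object. Paragraphs that
    reference no style explicitly are counted under "(default)".
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    counts = Counter()
    # Every <w:p> element is one paragraph (including paragraphs inside tables).
    for para in root.iter(f"{{{W_NS}}}p"):
        style = para.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W_NS}}}val") if style is not None else "(default)"
        counts[style_id] += 1
    return counts
```

Calling count_paragraph_styles("manuscript.docx").most_common() (the file name is only an example) lists style IDs from most to least used. Note that style IDs stored in the file can differ slightly from the display names Word shows in its Styles pane.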

A screenshot showing a portion of the 1691 selected paragraphs is below. These are the example paragraphs from which the MG and MSWord HTML shown later in this article are taken.

After uploading the new file (it passed AutoVetter with no problems), I downloaded the Smashwords Epub version and unzipped it to see what MG had done to it. I noticed inconsistencies in the paragraph formatting MG had produced. That seemed strange, since I had made sure that every MSWord paragraph had exactly the same style. Worse, viewing the Epub file in Calibre showed the same mix of indented and block paragraphs that Mark Coker had noted yesterday with the old upload.

Next, I saved the MSWord file I had uploaded as a filtered HTML file (the same format I use as the starting point when I build an Epub file at the XHTML level). Because every paragraph had exactly the same style, MSWord assigned exactly the same HTML paragraph class to every one.

MG, by contrast, had assigned different styles, seemingly at random, to different paragraphs.

Every paragraph in MSWord’s output had <p class=StyleTimesNewRomanJustifiedFirstline03After0pt> as its style (not very elegant, but we are talking MSWord).

However, MG assigned two different sets of styles to paragraphs, seemingly at random. The tags <p class="c13"><span class="c14"> marked indented paragraphs, while the tags <p class="c15">“<span class="c16"> marked block paragraphs with a blank line above and below.
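The comparison above comes down to counting which class each <p> tag carries. A minimal Python sketch using only the standard library (html.parser and zipfile) can do that count for a whole book instead of spot-checking. paragraph_classes tallies the class attribute of every <p> in one HTML string (for example, Word’s filtered HTML once it has been read in), and epub_paragraph_classes sums those tallies over every HTML/XHTML file inside an EPUB, which is a zip archive. Both function names are hypothetical, and a tag with several classes (class="a b") is counted as one combined string.

```python
import zipfile
from collections import Counter
from html.parser import HTMLParser


class ParagraphClassCounter(HTMLParser):
    """Tallies the class attribute of every <p> start tag fed to it."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # attrs is a list of (name, value) pairs; no class counts as "(none)".
            self.counts[dict(attrs).get("class") or "(none)"] += 1


def paragraph_classes(html_text):
    """Return a Counter of <p> classes in one HTML/XHTML document string."""
    parser = ParagraphClassCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


def epub_paragraph_classes(epub):
    """Sum paragraph-class counts over all (X)HTML files inside an EPUB zip.

    `epub` may be a file path or a binary file-like object.
    """
    total = Counter()
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                total.update(paragraph_classes(zf.read(name).decode("utf-8")))
    return total
```

A consistently styled Word export should show one dominant class for body text, whereas output like the MG HTML described above would be split between c13 and c15.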

The next image compares the MG and MSWord HTML renderings of the same MSWord file. The HTML on the left, generated by MSWord, is consistent for all paragraphs. The HTML on the right is the random HTML generated by MG.

It is apparent that even if you follow the “Style Guide” and make sure that all your paragraphs have exactly the same style, MG can still assign different styles seemingly at random. That makes your book look unprofessional, but it is the Smashwords Meatgrinder making it that way, not the author or MSWord.

For better visibility of the code, you can view a larger image here.



Filed under Meatgrinder, Meatgrinder HTML Conversion Problems, Smashwords

7 responses to “Meatgrinder HTML Inconsistency”

  1. In my experience, saving as filtered HTML gets rid of just the sort of cruft I imagine the Meatgrinder is choking on. You wouldn’t believe the sort of crap MS Word leaves lying around that you can’t see (there is usually a sign of just about every formatting change you’ve ever made, even though it’s long gone). It’s a pity there’s no way to clean it up without nuking the file. Which is precisely the issue I’m having with an e-book I’m doing for a friend, since her formatter flaked on her.

    I’m curious if you could post some of the CSS so I can see just what those referenced classes do. It might shed some light (particularly to compare the paragraph style to the span style) on what sort of cruft is floating in your file.

    Also, something interesting I noticed in your screen captures: the open double quotes appear before the c16 spans. It’s strange and makes me doubly wonder what those span styles are doing.

    • Ryan, I think you may be on to something. What I thought was random assignment of indented vs. block paragraphs now appears to have a pattern. Meatgrinder assigns Block style to those paragraphs that start with a quotation and leaves paragraphs indented that don’t start with a quote.

      I wonder how many other people have been frustrated by the Smashwords Meatgrinder messing up paragraphs like this and being told that they had to use the “Nuclear” option because of bad formatting. I think we’ve seen that Coker is quick to blame everyone else (Microsoft, authors, etc.) when the real problem is in his Meatgrinder.

      It’s been a Meatgrinder conversion problem all along. I checked quite a few of the other chapters in the book and they are all the same. Paragraphs beginning with a quote are assigned a block style by Meatgrinder (the c15 and c16 CSS codes you noted) and the other paragraphs are kept indented which is the paragraph style assigned in MSWord to all 1691 paragraphs in the body of my story.

      It appears that even if you follow Coker’s style guide, Meatgrinder can still find a way to mess up your files.

      • I still wouldn’t completely discount Word having something to do with it (mainly because I’m quick to blame Microsoft for just about every fault in their software that wouldn’t exist if only the coder had a brain), but there’s clearly something going on here. I couldn’t say for sure without seeing the direct HTML output from the file rather than the filtered one. As I said, saving as filtered HTML clears out a lot of the junk that can cause these errors, while the .doc file still contains that junk when it is submitted to the Meatgrinder. Unfortunately, there’s no way to press a button and clean up Word files. If it were that simple, I would have finished formatting the e-book for a friend of mine by now (the files confound even KDP; I don’t even want to know what the Meatgrinder would do to them).

        Of course, I’m not saying it can’t be the Meatgrinder’s fault, but since I have been submitting documents to the Meatgrinder for almost 3 years now without any of the issues others have (beyond the first try, back in 2010 for a client of mine, when it choked on paragraph spacing AND indentation), I can’t help but wonder if there’s something else at work here.

        For certain, this is very, very strange. Unfortunately, converting from one format to another has never been a clean process, as evidenced by all of the attempts to convert MS Word files from one format to another in competing office software. Without detailed and consistent application of a format, one can never achieve a perfect conversion, especially when converting to formats that are too different.

        They used to accept HTML files, which in my opinion is the better bet for conversion to e-books as they most closely resemble e-books (being HTML+CSS). Unfortunately, because most people made their HTML files in WYSIWYG editors and things like MS Word, that produce broken, garbage, invalid HTML files, Smashwords stopped accepting them due to the hassles involved in making them valid. I can’t say I blame them for that one.

    • Ryan,

      I added the unfiltered HTML from the Word document to the page on my website that allows more room for viewing than this blog (http://cm-lance.com/MG-MSW%20HTML%20Comp.htm).
      You will see that the paragraph style code is exactly the same in the filtered and unfiltered HTML versions, with one exception that is noted on that page.

      The code that is filtered out is the style cruft in the beginning. However, there isn’t any reason that paragraphs with exactly the same paragraph styles – one after the other, with no intervening style definitions – should be treated differently by Meatgrinder.

      • Ah, I see. Well in that case, I can’t help but concur. There must be something wrong with the Meatgrinder, whether in the styles or the HTML. Either way, it seems to be choking on lines that start with open double quotes. Why it does that is beyond me. It seems to use a combination of OpenOffice and Calibre internally, presumably heavily modified, or else it would accept any other format we send it. I don’t recall seeing that kind of behaviour with Calibre, but since I generate my epub and mobi files via scripts (which generate an epub, then run Kindlegen for the mobi), I have not run into it myself.

        And of course, at the same time, I haven’t had any issues with my own books having open double quotes and experiencing this kind of bizarre behaviour, so it remains a mystery as to what sort of combination of factors results in confounding Meatgrinder so.

        What I find particularly interesting is the sort of output you seem to be getting from the Meatgrinder. Whereas you have those paragraph tags coupled with span tags with separate classes in both success and failure cases, I have just one paragraph tag in all cases:

        The class is “c” when I start a chapter, since the first paragraph is unindented.

        The kind of inconsistency being seen here is certainly interesting. If not for the fact that I know it would give me a headache and I’d want to gouge out my eyes long before I finished reading it all, I would love to see the source of the Meatgrinder to see just what it’s trying to do.

  2. Here’s the Meatgrinder-generated CSS for the HTML shown.
    .c13 {
    color: #000;
    direction: ltr;
    display: block;
    font-family: Times New Roman, serif;
    line-height: 110%;
    margin-bottom: 0;
    margin-left: 0;
    margin-right: 0;
    margin-top: 0;
    orphans: 2;
    text-align: justify;
    text-indent: 0.3in;
    widows: 2
    }
    .c14 {
    font-size: 0.88889em
    }
    .c15 {
    color: #000;
    direction: ltr;
    display: block;
    line-height: 115%;
    margin-bottom: 0.08in;
    margin-left: 0;
    margin-right: 0.5in;
    margin-top: 0.17in;
    orphans: 2;
    text-align: justify;
    widows: 2
    }
    .c16 {
    font-family: Times New Roman, serif;
    font-size: 0.88889em
    }

  3. Pingback: A new look at self-publishing | The "Professional" Blog of J. M. Brink
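The pattern identified in the comments above (paragraphs that open with a quotation mark get c15/c16, all others keep c13/c14) can be checked across a whole chapter rather than by eye. The following is a minimal Python sketch, again standard library only. quote_tally pairs each paragraph’s class with whether its text begins with an opening double quote, straight or curly, after character references such as &#8220; have been decoded by the parser. It assumes every <p> is explicitly closed, as it is in EPUB XHTML; quote_tally, ParagraphQuoteTally and OPENING_QUOTES are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser

# Characters treated as an opening double quote: straight " and curly U+201C.
OPENING_QUOTES = ('"', "\u201c")


class ParagraphQuoteTally(HTMLParser):
    """Counts (paragraph class, starts-with-opening-quote) pairs."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes &#8220; etc. into text
        self.tally = Counter()
        self._current_class = None  # class of the <p> we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._current_class = dict(attrs).get("class") or "(none)"
            self._text = []

    def handle_data(self, data):
        # Collect text from the paragraph and any spans nested inside it.
        if self._current_class is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._current_class is not None:
            text = "".join(self._text).lstrip()
            self.tally[(self._current_class, text.startswith(OPENING_QUOTES))] += 1
            self._current_class = None


def quote_tally(html_text):
    """Return a Counter keyed by (class, starts_with_quote) for one XHTML string."""
    parser = ParagraphQuoteTally()
    parser.feed(html_text)
    parser.close()
    return parser.tally
```

To cover a whole book, read each chapter file out of the unzipped EPUB and add the resulting Counters together. If the pattern holds, c13 should appear only with False and c15 only with True; any (c13, True) or (c15, False) entries would point to something else at work.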
