# nasg/pandoc.py

__author__ = "Peter Molnar"
__copyright__ = "Copyright 2017-2019, Peter Molnar"
__license__ = "apache-2.0"
__maintainer__ = "Peter Molnar"
__email__ = "mail@petermolnar.net"

import hashlib
import logging
import os
import subprocess

import settings  # project-local module; provides settings.tmpdir

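class Pandoc(str):
    """Convert text with the pandoc CLI and cache the result on disk.

    Subclasses override the class attributes below to pick the formats.
    """
    in_format = 'html'
    in_options = []    # pandoc extensions appended to the input format
    out_format = 'plain'
    out_options = []   # pandoc extensions appended to the output format
    columns = None     # optional extra argument, e.g. '--columns=80'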

    @property
    def hash(self):
        # SHA-1 of the source text, used as the cache key
        return hashlib.sha1(self.source.encode()).hexdigest()

    @property
    def cachefile(self):
        return os.path.join(
            settings.tmpdir,
            "%s_%s.pandoc" % (
                self.__class__.__name__,
                self.hash
            )
        )

    @property
    def cache(self):
        # Load a previously cached conversion; True if one was found
        if not os.path.exists(self.cachefile):
            return False
        with open(self.cachefile, 'rt', encoding='utf-8') as f:
            self.result = f.read()
        return True

    def __init__(self, text):
        self.source = text
        if self.cache:
            return

        # Extensions are appended to the format name, e.g. markdown+footnotes
        conv_to = '--to=%s' % self.out_format
        if self.out_options:
            conv_to = '%s+%s' % (conv_to, '+'.join(self.out_options))

        conv_from = '--from=%s' % self.in_format
        if self.in_options:
            conv_from = '%s+%s' % (conv_from, '+'.join(self.in_options))

        cmd = [
            'pandoc',
            '-o-',
            conv_to,
            conv_from,
            '--quiet',
            '--no-highlight'
        ]
        if self.columns:
            cmd.append(self.columns)

        p = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )

        # Feed the source on stdin and collect the converted text from stdout
        stdout, stderr = p.communicate(input=text.encode('utf-8'))
        if stderr:
            logging.warning(
                "Error during pandoc convert:\n\t%s\n\t%s",
                cmd,
                stderr.decode('utf-8', errors='replace')
            )
        self.result = stdout.decode('utf-8').strip()
        # Cache file is read and written as UTF-8 to match the decoded output
        with open(self.cachefile, 'wt', encoding='utf-8') as f:
            f.write(self.result)

    def __str__(self):
        return str(self.result)

    def __repr__(self):
        return str(self.result)


class PandocMD2HTML(Pandoc):
    in_format = 'markdown'
    in_options = [
        'footnotes',
        'pipe_tables',
        'strikeout',
        # 'superscript',
        # 'subscript',
        'raw_html',
        'definition_lists',
        'backtick_code_blocks',
        'fenced_code_attributes',
        'shortcut_reference_links',
        'lists_without_preceding_blankline',
        'autolink_bare_uris',
    ]
    out_format = 'html5'
    out_options = []


class PandocHTML2MD(Pandoc):
    in_format = 'html'
    in_options = []
    out_format = 'markdown'
    out_options = [
        'footnotes',
        'pipe_tables',
        'strikeout',
        'raw_html',
        'definition_lists',
        'backtick_code_blocks',
        'fenced_code_attributes',
        'shortcut_reference_links',
        'lists_without_preceding_blankline',
        'autolink_bare_uris',
    ]


class PandocMD2TXT(Pandoc):
    in_format = 'markdown'
    in_options = [
        'footnotes',
        'pipe_tables',
        'strikeout',
        'raw_html',
        'definition_lists',
        'backtick_code_blocks',
        'fenced_code_attributes',
        'shortcut_reference_links',
        'lists_without_preceding_blankline',
        'autolink_bare_uris',
    ]
    out_format = 'plain'
    out_options = []
    columns = '--columns=80'


class PandocHTML2TXT(Pandoc):
    in_format = 'html'
    in_options = []
    out_format = 'plain'
    out_options = []
    columns = '--columns=80'
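

# Illustrative sketch, not part of the original module: this helper mirrors
# how Pandoc.__init__ assembles the pandoc command line, so the arguments of
# a given converter can be inspected without running pandoc. The name
# `sketch_pandoc_args` and its parameters are hypothetical.
def sketch_pandoc_args(in_format, in_options, out_format, out_options,
                       columns=None):
    """Return the argument list a Pandoc subclass would hand to subprocess."""
    # Extensions are joined onto the format name with '+'
    conv_to = '+'.join(['--to=%s' % out_format] + list(out_options))
    conv_from = '+'.join(['--from=%s' % in_format] + list(in_options))
    cmd = ['pandoc', '-o-', conv_to, conv_from, '--quiet', '--no-highlight']
    if columns:
        cmd.append(columns)
    return cmd


# Possible usage, with the attribute values of PandocHTML2TXT:
# sketch_pandoc_args('html', [], 'plain', [], '--columns=80')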