extract_msg.msg_classes.message_base Module

Module contents

class extract_msg.msg_classes.message_base.MessageBase(path, **kwargs)

Bases: MSGFile

Base class for Message-like MSG files.

Supports all of the options from MSGFile.__init__() with some additional ones.

Parameters:
  • recipientSeparator – Optional, separator string to use between recipients.

  • deencapsulationFunc – Optional, if specified must be a callable that will override the way that HTML/text is deencapsulated from the RTF body. This function must take exactly 2 arguments, the first being the RTF body from the message and the second being an instance of the enum DeencapType that will tell the function what type of body is desired. The function should return a string for plain text and bytes for HTML. If any problems occur, the function must either return None or raise one of the appropriate exceptions from extract_msg.exceptions. All other exceptions must be handled internally or they will not be caught. The original deencapsulation method will not run if this is set.
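The contract for deencapsulationFunc can be sketched as a small wrapper. This is an illustration only, not the library's code: the DeencapType enum below is a local stand-in for the one in extract_msg (its member names are assumed), and make_deencap_func and the convert callable are hypothetical names.

```python
import enum
from typing import Callable, Union


class DeencapType(enum.Enum):
    """Local stand-in for the library's DeencapType enum (member names assumed)."""
    PLAIN = 0
    HTML = 1


def make_deencap_func(convert: Callable[[bytes, bool], str]):
    """Wrap a hypothetical converter so it follows the documented contract.

    convert(rtf_body, want_html) is assumed to return the extracted text.
    """
    def deencap(rtf_body: bytes, body_type: DeencapType) -> Union[bytes, str, None]:
        try:
            text = convert(rtf_body, body_type is DeencapType.HTML)
        except Exception:
            # Unexpected errors are handled here and reported as "nothing found".
            return None
        if not text:
            return None
        # bytes for HTML, str for plain text.
        return text.encode('utf-8') if body_type is DeencapType.HTML else text

    return deencap
```

A function built this way takes exactly two arguments and could then be passed as deencapsulationFunc when opening the file.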

asEmailMessage() → EmailMessage

Returns an instance of EmailMessage used to represent the contents of this message.

Raises:

ConversionError – The function failed to convert one of the attachments into a form that it could attach, and the attachment data type was not None.
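Assuming the returned object is the standard library's email.message.EmailMessage, it can be serialized like any other message. The helper name eml_bytes is hypothetical, and the sample message is built by hand rather than taken from an MSG file.

```python
from email import policy
from email.message import EmailMessage


def eml_bytes(message: EmailMessage) -> bytes:
    """Serialize a message to .eml-style bytes with CRLF line endings."""
    return message.as_bytes(policy=policy.SMTP)


# Hand-built stand-in for the object asEmailMessage() would return.
sample = EmailMessage()
sample['Subject'] = 'Hello'
sample['From'] = 'alice@example.com'
sample['To'] = 'bob@example.com'
sample.set_content('Body text')

data = eml_bytes(sample)
```

The resulting bytes can be written to a file opened in binary mode.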

deencapsulateBody(rtfBody: bytes, bodyType: DeencapType) → bytes | str | None

A method to deencapsulate the specified body from the RTF body.

Returns a string for plain text and bytes for HTML. If specified, uses the deencapsulation override function. Returns None if nothing could be deencapsulated.

If you want to change the deencapsulation behaviour in a subclass, simply override this method.

dump() → None

Prints out a summary of the message.

getInjectableHeader(prefix: str, joinStr: str, suffix: str, formatter: Callable[[str, str], str]) → str

Using the specified prefix, suffix, formatter, and join string, generates the injectable header.

Prefix is placed at the beginning, followed by a series of format strings joined together with the join string, with the suffix placed afterwards. Effectively makes this structure: {prefix}{formatter()}{joinStr}{formatter()}{joinStr}…{formatter()}{suffix}

Formatter must be a function that takes a name and then a value and formats them into a line.

If self.headerFormatProperties is None, immediately returns an empty string.
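The structure described above can be sketched as a standalone function. This is a simplified model, not the library's implementation: build_header is a hypothetical name, it takes a flat dictionary of strings, and it ignores grouped entries and tuple values.

```python
from typing import Callable, Dict, Optional


def build_header(
    props: Optional[Dict[str, Optional[str]]],
    prefix: str,
    join_str: str,
    suffix: str,
    formatter: Callable[[str, str], str],
) -> str:
    """Produce {prefix}{line}{join_str}{line}...{suffix} from name/value pairs."""
    if props is None:
        # Mirrors the documented early return when there is nothing to format.
        return ''
    # Entries with no value (None or an empty string) are left out.
    lines = [formatter(name, value) for name, value in props.items() if value]
    return prefix + join_str.join(lines) + suffix


header = build_header(
    {'From': 'a@x', 'Cc': None, 'Subject': 'Hi'},
    prefix='<div>',
    join_str='<br/>',
    suffix='</div>',
    formatter=lambda name, value: f'<b>{name}:</b> {value}',
)
```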

getJson() → str

Returns the JSON representation of the Message.

getSaveBody(**_) → bytes

Returns the plain text body that will be used in saving based on the arguments.

Parameters:

_ – Used to allow kwargs expansion in the save function. Arguments absorbed by this are simply ignored.

getSaveHtmlBody(preparedHtml: bool = False, charset: str = 'utf-8', **_) → bytes

Returns the HTML body that will be used in saving based on the arguments.

Parameters:
  • preparedHtml – Whether or not the HTML should be prepared for standalone use (add tags, inject images, etc.).

  • charset – If the HTML is being prepared, the charset to use for the Content-Type meta tag to insert. This exists to ensure that something parsing the HTML can properly determine the encoding (as not having this tag can cause errors in some programs). Set this to None or an empty string to not insert the tag. (Default: 'utf-8')

  • _ – Used to allow kwargs expansion in the save function. Arguments absorbed by this are simply ignored.

getSavePdfBody(wkPath=None, wkOptions=None, **kwargs) → bytes

Returns the PDF body that will be used in saving based on the arguments.

Parameters:
  • wkPath – Used to manually specify the path of the wkhtmltopdf executable. If not specified, the function will try to find it. Useful if wkhtmltopdf is not on the path. If :param pdf: is False, this argument is ignored.

  • wkOptions – Used to specify additional options to wkhtmltopdf. This must be a list or list-like object composed of strings and bytes.

  • kwargs – Used to allow kwargs expansion in the save function. Arguments absorbed by this are simply ignored, except for keyword arguments used by getSaveHtmlBody().

Raises:
  • ExecutableNotFound – The wkhtmltopdf executable could not be found.

  • WKError – Something went wrong in creating the PDF body.
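One way to prepare the wkPath and wkOptions arguments ahead of time is sketched below. The helper name pdf_body_kwargs is hypothetical, and the lookup with shutil.which is only an assumption about how one might locate the executable; the library's own search may differ.

```python
import shutil
from typing import Any, Dict, Optional


def pdf_body_kwargs(explicit_path: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for a PDF export.

    wkPath is only included when a path is known, so the callee can still
    fall back to its own lookup.
    """
    kwargs: Dict[str, Any] = {'wkOptions': ['--quiet']}  # list of strings
    path = explicit_path or shutil.which('wkhtmltopdf')
    if path:
        kwargs['wkPath'] = path
    return kwargs


options = pdf_body_kwargs('/opt/wk/bin/wkhtmltopdf')
```

The dictionary could then be expanded into the call, for example getSavePdfBody(**options), with ExecutableNotFound and WKError handled by the caller.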

getSaveRtfBody(**_) → bytes

Returns the RTF body that will be used in saving based on the arguments.

Parameters:

_ – Used to allow kwargs expansion in the save function. Arguments absorbed by this are simply ignored.

injectHtmlHeader(prepared: bool = False) → bytes

Returns the HTML body from the MSG file (will check that it has one) with the HTML header injected into it.

Parameters:

prepared – Determines whether to be using the standard HTML (False) or the prepared HTML (True) body. (Default: False)

Raises:

AttributeError – The correct HTML body cannot be acquired.

injectRtfHeader() → bytes

Returns the RTF body from this MSG file (will check that it has one) with the RTF header injected into it.

Raises:
  • AttributeError – The RTF body cannot be acquired.

  • RuntimeError – All injection attempts failed.

save(**kwargs) → Tuple[SaveType, List[str] | str | None]

Saves the message body and attachments found in the message.

The body and attachments are stored in a folder in the current running directory unless :param customPath: has been specified. The name of the folder will be determined by 3 factors.

  • If :param customFilename: has been set, the value provided for that will be used.

  • If :param useMsgFilename: has been set, the name of the file used to create the Message instance will be used.

  • If the file name has not been provided or :param useMsgFilename: has not been set, the name of the folder will be created using :property defaultFolderName:.

  • Setting :param maxNameLength: will force all file names to be shortened to fit in the space (with the extension included in the length). If a number is added to the directory name, it will not be included in the length, so it is recommended to plan for up to 5 extra characters in the name. Default is 256.

It should be noted that regardless of the value for :param maxNameLength:, the name of the file containing the body will always have the name ‘message’ followed by the full extension.

There are several parameters used to determine how the message will be saved. By default, the message will be saved as plain text. Setting one of the following parameters to True will change that:

  • :param html: will output the message in HTML format.

  • :param json: will output the message in JSON format.

  • :param raw: will output the message in a raw format.

  • :param rtf: will output the message in RTF format.

Usage of more than one formatting parameter will raise an exception.

Using HTML or RTF will raise an exception if the body could not be retrieved, unless you have :param allowFallback: set to True. Fallback will go in this order, starting at the topmost format that is set:

  • HTML

  • RTF

  • Plain text
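The fallback order in the list above can be modelled in a few lines. This sketch only illustrates the ordering; pick_body_format and the format labels are hypothetical, and the library raises its own exception types rather than LookupError.

```python
from typing import Iterable

# Fallback order from the list above, most preferred first.
FALLBACK_ORDER = ('html', 'rtf', 'plain')


def pick_body_format(requested: str, available: Iterable[str],
                     allow_fallback: bool = False) -> str:
    """Return the format that would be saved, or raise if none is usable."""
    available = set(available)
    start = FALLBACK_ORDER.index(requested)
    # With fallback, try the requested format and everything after it.
    candidates = FALLBACK_ORDER[start:] if allow_fallback else (requested,)
    for fmt in candidates:
        if fmt in available:
            return fmt
    raise LookupError(f'no usable body for {requested!r}')


chosen = pick_body_format('html', {'rtf', 'plain'}, allow_fallback=True)
```

Requesting RTF with fallback never moves up to HTML; it can only drop to plain text.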

If you want to save the contents into a ZipFile or similar object, either pass a path to where you want to create one or pass an instance to :param zip:. If :param zip: is set, :param customPath: will refer to a location inside the zip file.

Parameters:
  • attachmentsOnly – Turns off saving the body and only saves the attachments when set.

  • saveHeader – Turns on saving the header as a separate file when set.

  • skipAttachments – Turns off saving attachments.

  • skipHidden – If True, skips attachments marked as hidden. (Default: False)

  • skipBodyNotFound – Suppresses errors if no valid body could be found, simply skipping the step of saving the body.

  • charset – If the HTML is being prepared, the charset to use for the Content-Type meta tag to insert. This exists to ensure that something parsing the HTML can properly determine the encoding (as not having this tag can cause errors in some programs). Set this to None or an empty string to not insert the tag (Default: 'utf-8').

  • kwargs – Used to allow kwargs expansion in the save function.

  • preparedHtml – When set, prepares the HTML body for standalone usage, doing things like adding tags, injecting attachments, etc. This is useful for things like trying to convert the HTML body directly to PDF.

  • pdf – Used to enable saving the body as a PDF file.

  • wkPath – Used to manually specify the path of the wkhtmltopdf executable. If not specified, the function will try to find it. Useful if wkhtmltopdf is not on the path. If :param pdf: is False, this argument is ignored.

  • wkOptions – Used to specify additional options to wkhtmltopdf. This must be a list or list-like object composed of strings and bytes.
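A typical call might combine one formatting parameter with a few of the options above. The wrapper below is a sketch: save_as_html is a hypothetical helper, and msg is assumed to be any object exposing the save() method documented here.

```python
from typing import Any


def save_as_html(msg: Any, out_dir: str) -> Any:
    """Save the body as HTML, falling back to RTF or plain text if needed."""
    return msg.save(
        customPath=out_dir,   # folder to save into instead of the current directory
        html=True,            # exactly one formatting parameter is set
        allowFallback=True,   # do not raise if the HTML body is missing
        skipHidden=True,      # leave out attachments marked as hidden
    )
```

Only one of html, json, raw, or rtf is passed, since combining formatting parameters raises an exception.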

property bcc: str | None

The “Bcc” field, if it exists.

property body: str | None

The message body, if it exists.

property cc: str | None

The “Cc” field, if it exists.

property compressedRtf: bytes | None

The compressed RTF stream, if it exists.

property crlf: str

The value of self.__crlf, should you need it for whatever reason.

property date: datetime | None

The send date, if it exists.

property deencapsulatedRtf: DeEncapsulator | None

The instance of the deencapsulated RTF body.

If there is no RTF body or the body is not encapsulated, returns None.

property defaultFolderName: str

Generates the default name of the save folder.

property detectedBodies: BodyTypes

The types of bodies stored in the .msg file.

property header: Message

The message header, if it exists.

If one does not exist as a stream, it will be generated from the other properties.

property headerDict: Dict[str, Any]

A dictionary of the entries in the header.

property headerFormatProperties: Dict[str, str | Tuple[str | None, bool] | None | Dict[str, str | Tuple[str | None, bool] | None]] | None

Returns a dictionary of properties, in order, to be formatted into the header.

Keys are the names to use in the header while the values are one of the following:

  • None: Signifies no data was found for the property and it should be omitted from the header.

  • str: A string to be formatted into the header using the string encoding.

  • Tuple[Union[str, None], bool]: A string should be formatted into the header. If the bool is True, then place an empty string if the first value is None, otherwise follow the same behavior as regular None.

Additional note: If the value is an empty string, it will be dropped as well by default.

Additionally you can group members of a header together by placing them in an embedded dictionary. Groups will be spaced out using a second instance of the join string. If any member of a group is being printed, it will be spaced apart from the next group/item.

If your class should not do any header injection, return None from this property.
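The value rules above can be illustrated with a small resolver. This is a reading of the description, not the library's code: resolve_entry, printable, and the group key '-recipients-' are hypothetical, and the spacing between groups is ignored.

```python
from typing import Any, Dict, Optional


def resolve_entry(value: Any) -> Optional[str]:
    """Return the text to print for one entry, or None to omit it."""
    if isinstance(value, tuple):
        text, keep_if_missing = value
        if text is None:
            # True keeps the line with an empty value; False acts like plain None.
            return '' if keep_if_missing else None
        return text
    # Plain None and empty strings are both dropped.
    return value or None


def printable(props: Dict[str, Any]) -> Dict[str, str]:
    """Flatten groups and keep only the entries that would be printed."""
    out: Dict[str, str] = {}
    for name, value in props.items():
        if isinstance(value, dict):
            out.update(printable(value))  # embedded dictionary = group
        else:
            text = resolve_entry(value)
            if text is not None:
                out[name] = text
    return out


example_properties = {
    'From': 'Alice <alice@example.com>',
    'Sent': None,                    # omitted
    'Importance': (None, True),      # kept as an empty string
    '-recipients-': {                # hypothetical group key
        'To': 'Bob <bob@example.com>',
        'Cc': '',                    # dropped: empty string
    },
}

result = printable(example_properties)
```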

property headerInit: bool

Checks whether the header has been initialized.

property headerText: str | None

The raw text of the header stream, if it exists.

property htmlBody: bytes | None

The HTML body, if it exists.

property htmlBodyPrepared: bytes | None

The HTML body that has (where possible) the embedded attachments inserted into the body.

property htmlInjectableHeader: str

The header that can be formatted and injected into the HTML body.

property inReplyTo: str | None

The message id that this message is in reply to.

property isRead: bool

Whether this email has been marked as read.

property isSent: bool

Whether this email has been marked as sent.

Assumes True if no flags are found.

property messageId: str | None

property parsedDate: Tuple[int, int, int, int, int, int, int, int, int] | None

A 9-tuple of the parsed date from the header.
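Assuming the 9-tuple uses the same layout as the one returned by email.utils.parsedate (year, month, day, hour, minute, second, then three fields that are not useful here), it can be turned into a datetime as sketched below. The helper name is hypothetical.

```python
from datetime import datetime
from email.utils import parsedate
from typing import Optional, Tuple


def parsed_date_to_datetime(parsed: Optional[Tuple[int, ...]]) -> Optional[datetime]:
    """Convert a 9-tuple to a naive datetime, or pass None through."""
    if parsed is None:
        return None
    # Only the first six fields carry date and time information.
    return datetime(*parsed[:6])


# Stand-in for a value read from the parsedDate property.
sample = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')
converted = parsed_date_to_datetime(sample)
```

The result carries no timezone, because a tuple in this layout does not include the UTC offset.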

property receivedTime: datetime | None

The date and time the message was received by the server.

property recipientSeparator: str

property recipients: List[Recipient]

A list of all recipients.

property recipientTypeClass: Type[IntEnum]

The class to use for a recipient’s recipientType property.

The default is extract_msg.enums.RecipientType. If a subclass’s attributes have different meanings to the values, you can override this property to return a valid enum.

property reportTag: ReportTag | None

Data that is used to correlate the report and the original message.

property responseRequested: bool

Whether to send Meeting Response objects to the organizer.

property rtfBody: bytes | None

The decompressed RTF body from the message.

property rtfEncapInjectableHeader: bytes

The header that can be formatted and injected into the encapsulated RTF body.

property rtfPlainInjectableHeader: bytes

The header that can be formatted and injected into the plain RTF body.

property sender: str | None

The message sender, if it exists.

property subject: str | None

The message subject, if it exists.

property to: str | None

The “To” field, if it exists.

filename: str | None