extract_msg Package

Subpackages/Submodules

Package contents

extract_msg:

Extracts emails and attachments saved in Microsoft Outlook’s .msg files.

https://github.com/TeamMsgExtractor/msg-extractor

class extract_msg.Attachment(msg: MSGFile, dir_: str, propStore: PropertiesStore)

Bases: AttachmentBase

A standard data attachment of an MSG file.

Parameters:
  • msg – The MSGFile instance that the attachment belongs to.

  • dir_ – The directory inside the MSG file where the attachment is located.

  • propStore – The PropertiesStore instance for the attachment.

getFilename(**kwargs) → str

Returns the filename to use for the attachment.

Parameters:
  • contentId – Use the contentId, if available.

  • customFilename – A custom name to use for the file.

If the filename starts with “UnknownFilename”, no usable name was found and a randomly generated one is being used, so there is no guarantee that the file will be given exactly the same name every time.

regenerateRandomName() → str

Regenerates the random filename that is used if the attachment cannot find a usable filename.

save(**kwargs) → Tuple[SaveType, List[str] | str | None]

Saves the attachment data.

The name of the file is determined by several factors. First, if you have provided customFilename to this function, that name is used. If no custom name has been provided and contentId is True, the file is saved using the content ID of the attachment. If the content ID is not found, or contentId is False, the long filename is used. If the long filename is not found, the short filename is used. If no usable filename has been found after all of this, a random one is used (accessible from the randomFilename property). Once the name has been determined, it is shortened, if necessary, so that it is no longer than maxNameLength.
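The selection order described above can be restated as a short, self-contained sketch. The function choose_filename and its parameters are hypothetical stand-ins for the attachment attributes named in the text. Only the “UnknownFilename” prefix of the random fallback comes from these docs; the rest of that format, the default of 256 for maxNameLength, and the use of plain truncation (the real method may, for example, preserve the extension) are assumptions.

```python
import uuid
from typing import Optional


def choose_filename(
    customFilename: Optional[str] = None,
    contentId: bool = False,
    cid: Optional[str] = None,
    longFilename: Optional[str] = None,
    shortFilename: Optional[str] = None,
    maxNameLength: int = 256,  # assumed default, for illustration only
) -> str:
    """Hypothetical restatement of the documented name-selection order."""
    if customFilename:
        # 1. An explicit custom name always wins.
        name = customFilename
    elif contentId and cid:
        # 2. The content ID, but only when contentId=True and one exists.
        name = cid
    elif longFilename:
        # 3. Otherwise the long filename...
        name = longFilename
    elif shortFilename:
        # 4. ...then the short one.
        name = shortFilename
    else:
        # 5. Random fallback; only the "UnknownFilename" prefix comes from
        #    the docs, the rest of this format is an assumption.
        name = f"UnknownFilename {uuid.uuid4().hex[:8]}"
    # Assumption: simple truncation to at most maxNameLength characters.
    return name[:maxNameLength]
```

The real save() additionally resolves directories and zip output, which this sketch leaves out.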

To change the directory that the attachment is saved to, set customPath when calling this function. The default save directory is the working directory.

If you want to save the contents into a ZipFile or similar object, pass zip either a path at which to create one or an existing instance. If zip is an instance, customPath will refer to a location inside the zip file.

Parameters:
  • extractEmbedded – If True, causes the attachment, should it be an embedded MSG file, to save as a .msg file instead of calling its save function.

  • skipEmbedded – If True, skips saving this attachment if it is an embedded MSG file.

property data: bytes

The bytes making up the attachment data.

property randomFilename: str

The random filename to be used by this attachment.

property type: AttachmentType

An enum value that identifies the type of attachment.

class extract_msg.AttachmentBase(msg: MSGFile, dir_: str, propStore: PropertiesStore)

Bases: ABC

The base class for all standard Attachments used by the module.

Parameters:
  • msg – The MSGFile instance that the attachment belongs to.

  • dir_ – The directory inside the MSG file where the attachment is located.

  • propStore – The PropertiesStore instance for the attachment.

exists(filename: str | List[str] | Tuple[str]) → bool

Checks if the stream exists inside the attachment folder.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

sExists(filename: str | List[str] | Tuple[str]) → bool

Checks if the string stream exists inside the attachment folder.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

existsTypedProperty(id, _type=None) → Tuple[bool, int]

Determines if the stream with the provided id exists.

Returns a tuple of two values: a bool indicating whether anything was found, and the number of matches found.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.
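To illustrate the (found, count) return value, here is a minimal sketch that applies the same idea to a plain list of stream names. The helper exists_typed_property is hypothetical, not the library's implementation: it only inspects “__substg1.0_” stream names, whereas the real method may also consult property streams for fixed-length properties.

```python
from typing import Iterable, Optional, Tuple

STREAM_PREFIX = "__substg1.0_"


def exists_typed_property(
    streamNames: Iterable[str], propId: str, propType: Optional[str] = None
) -> Tuple[bool, int]:
    """Count streams whose 4-hex-digit property ID matches propId,
    optionally restricted to one 4-hex-digit type code."""
    propId = propId.upper()
    wanted = None if propType is None else propType.upper()
    count = 0
    for name in streamNames:
        if not name.startswith(STREAM_PREFIX):
            continue
        suffix = name[len(STREAM_PREFIX):]
        # Expect exactly ID + type (8 hex digits); skip anything else.
        if len(suffix) != 8:
            continue
        if suffix[:4].upper() == propId and (wanted is None or suffix[4:].upper() == wanted):
            count += 1
    # Same shape as the documented return: (anything found?, how many).
    return count > 0, count
```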

abstract getFilename(**kwargs) → str

Returns the filename to use for the attachment.

Parameters:
  • contentId – Use the contentId, if available.

  • customFilename – A custom name to use for the file.

If the filename starts with “UnknownFilename”, no usable name was found and a randomly generated one is being used, so there is no guarantee that the file will be given exactly the same name every time.

getMultipleBinary(filename: str | List[str] | Tuple[str]) → List[bytes] | None

Gets a multiple binary property as a list of bytes objects.

Like getStringStream(), the 4-character type suffix should be omitted. So if you want the stream “__substg1.0_00011102”, the filename would simply be “__substg1.0_0001”.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

getMultipleString(filename: str | List[str] | Tuple[str]) → List[str] | None

Gets a multiple string property as a list of str objects.

Like getStringStream(), the 4-character type suffix should be omitted. So if you want the stream “__substg1.0_0001101F”, the filename would simply be “__substg1.0_0001”.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

getNamedAs(propertyName: str, guid: str, overrideClass: Type[_T] | Callable[[Any], _T]) → _T | None

Returns the named property, setting the class if specified.

Parameters:

overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.
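The overrideClass rule shared by getNamedAs() and the other *As methods can be sketched as follows. The helper morph is hypothetical; it only mirrors the documented behavior that the class or function receives the read data as its first argument and is skipped entirely when the data is None.

```python
from typing import Any, Callable, Optional, Type, TypeVar, Union

_T = TypeVar("_T")


def morph(value: Any, overrideClass: Union[Type[_T], Callable[[Any], _T]]) -> Optional[_T]:
    """Hypothetical sketch of the documented overrideClass rule."""
    if value is None:
        # Nothing was read, so the class/function is never called.
        return None
    # A class receives the data as the first argument of __init__;
    # a plain function receives it as its first argument.
    return overrideClass(value)
```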

getNamedProp(propertyName: str, guid: str, default: _T | None = None) → Any | _T

Equivalent to instance.namedProperties.get((propertyName, guid), default).

Can be overridden to create new behavior.

getPropertyAs(propertyName: int | str, overrideClass: Type[_T] | Callable[[Any], _T]) → _T | None

Returns the property, setting the class if found.

Parameters:

overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.

getPropertyVal(name: int | str, default: _T | None = None) → Any | _T

Equivalent to instance.props.getValue(name, default).

Can be overridden to create new behavior.

getSingleOrMultipleBinary(filename: str | List[str] | Tuple[str]) → List[bytes] | bytes | None

A combination of getStream() and getMultipleBinary().

Checks to see if a single binary stream exists to return, otherwise tries to return the multiple binary stream of the same ID.

Like getStringStream(), the 4-character type suffix should be omitted. So if you want the stream “__substg1.0_00010102”, the filename would simply be “__substg1.0_0001”.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

getSingleOrMultipleString(filename: str | List[str] | Tuple[str]) → str | List[str] | None

A combination of getStringStream() and getMultipleString().

Checks to see if a single string stream exists to return, otherwise tries to return the multiple string stream of the same ID.

Like getStringStream(), the 4-character type suffix should be omitted. So if you want the stream “__substg1.0_0001001F”, the filename would simply be “__substg1.0_0001”.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.
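A minimal sketch of the single-then-multiple fallback, assuming a simplified model in which a dict maps full stream names to already-decoded values. The helper single_or_multiple_string is hypothetical and only checks the Unicode type codes 001F and 101F; the real method also has to handle non-Unicode strings and the actual on-disk layout of multiple-valued properties, which is more involved.

```python
from typing import Dict, List, Optional, Union

# Simplified model: full stream name -> already-decoded value.
Streams = Dict[str, Union[str, List[str]]]


def single_or_multiple_string(streams: Streams, base: str) -> Optional[Union[str, List[str]]]:
    """Prefer the single-value string stream, then fall back to the
    multiple-value stream with the same property ID."""
    single = streams.get(base + "001F")  # PtypString (single value)
    if single is not None:
        return single
    return streams.get(base + "101F")  # PtypMultipleString, or None
```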

getStream(filename: str | List[str] | Tuple[str]) → bytes | None

Gets a binary representation of the requested stream.

This should ALWAYS return a bytes object if it was found, otherwise returns None.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

getStreamAs(streamID: str | List[str] | Tuple[str], overrideClass: Type[_T] | Callable[[Any], _T]) → _T | None

Returns the specified stream, modifying it to the specified class if it is found.

Parameters:

overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.

getStringStream(filename: str | List[str] | Tuple[str]) → str | None

Gets a string representation of the requested stream.

Rather than the full filename, you should only feed this function the filename sans the type. So if the full name is “__substg1.0_001A001F”, the filename this function should receive should be “__substg1.0_001A”.

This should ALWAYS return a string if it was found, otherwise returns None.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.
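The naming convention can be made concrete with a small sketch that splits a full stream name into the part these methods expect and the 4-character type suffix. The split_stream_name helper and the TYPE_NAMES table are illustrative, not part of extract_msg; the type codes are the [MS-OXCDATA] property types behind the examples in this section.

```python
from typing import Tuple

# [MS-OXCDATA] property type codes that appear in the examples above.
TYPE_NAMES = {
    "001E": "PtypString8",
    "001F": "PtypString",
    "0102": "PtypBinary",
    "101F": "PtypMultipleString",
    "1102": "PtypMultipleBinary",
}


def split_stream_name(fullName: str) -> Tuple[str, str]:
    """Split e.g. "__substg1.0_001A001F" into the name to pass to
    getStringStream() and the 4-character type suffix."""
    if len(fullName) < 4:
        raise ValueError("stream name too short")
    return fullName[:-4], fullName[-4:].upper()
```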

getStringStreamAs(streamID: str | List[str] | Tuple[str], overrideClass: Type[_T] | Callable[[Any], _T]) → _T | None

Returns the specified string stream, modifying it to the specified class if it is found.

Parameters:

overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.

listDir(streams: bool = True, storages: bool = False) → List[List[str]]

Lists the streams and/or storages that exist in the attachment directory.

Returns the paths excluding the attachment directory, allowing the paths to be directly used for accessing a stream.

slistDir(streams: bool = True, storages: bool = False) → List[str]

Like listDir, except it returns the paths as strings.

abstract save(**kwargs) → Tuple[SaveType, List[str] | str | None]

Saves the attachment data.

The name of the file is determined by the logic of getFilename(). Subclasses that implement this method should follow the same behavior.

To change the directory that the attachment is saved to, set customPath when calling this function. The default save directory is the working directory.

If you want to save the contents into a ZipFile or similar object, pass zip either a path at which to create one or an existing instance. If zip is an instance, customPath will refer to a location inside the zip file.

Parameters:
  • extractEmbedded – If True, causes the attachment, should it be an embedded MSG file, to save as a .msg file instead of calling its save function.

  • skipEmbedded – If True, skips saving this attachment if it is an embedded MSG file.

Returns:

A tuple that specifies how the data was saved. The value of the first item specifies what the second value will be.

property attachmentEncoding: bytes | None

The encoding information about the attachment object.

Will return b'\x2A\x86\x48\x86\xf7\x14\x03\x0b\x01' if encoded in MacBinary format, otherwise None.

property additionalInformation: str | None

The additional information about the attachment.

This property MUST be an empty string if attachmentEncoding is not set. Otherwise it MUST be set to a string of the format “:CREA:TYPE” where “:CREA” is the four-letter Macintosh file creator code and “:TYPE” is a four-letter Macintosh type code.
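As an illustration of how these two properties fit together, the sketch below extracts the creator and type codes from additionalInformation when attachmentEncoding carries the MacBinary marker. parse_mac_info is a hypothetical helper, not part of extract_msg, and it treats any value that does not match the “:CREA:TYPE” layout as an error.

```python
from typing import Optional, Tuple

# attachmentEncoding value for MacBinary-encoded attachments (from above).
MACBINARY_ENCODING = b"\x2A\x86\x48\x86\xf7\x14\x03\x0b\x01"


def parse_mac_info(
    attachmentEncoding: Optional[bytes], additionalInformation: Optional[str]
) -> Optional[Tuple[str, str]]:
    """Return (creator, type) for MacBinary attachments, else None."""
    if attachmentEncoding != MACBINARY_ENCODING:
        # Not MacBinary: additionalInformation should be empty anyway.
        return None
    info = additionalInformation or ""
    # Expected layout ":CREA:TYPE" -> 1 + 4 + 1 + 4 = 10 characters.
    if len(info) != 10 or info[0] != ":" or info[5] != ":":
        raise ValueError(f"unexpected additionalInformation: {info!r}")
    return info[1:5], info[6:10]
```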

property cid: str | None

Returns the Content ID of the attachment, if it exists.

property clsid: str

Returns the CLSID for the data stream/storage of the attachment.

property contentId: str | None

Alias of cid.

property createdAt: datetime | None

Alias of creationTime.

property creationTime: datetime | None

The time the attachment was created.

abstract property data: object | None

The attachment data, if any.

Returns None if there is no data to save.

property dataType: Type[object] | None

The class that the data type will use, if it can be retrieved.

This is a safe way to do type checking on data before knowing if it will raise an exception. Returns None if no data will be returned or if an exception will be raised.

property dir: str

Returns the directory inside the MSG file where the attachment is located.

property displayName: str | None

Returns the display name of the attachment.

property exceptionReplaceTime: datetime | None

The original date and time at which the instance in the recurrence pattern would have occurred if it were not an exception.

Only applicable if the attachment is an Exception object.

property extension: str | None

The reported extension for the file.

property hidden: bool

Indicates whether an Attachment object is hidden from the end user.

property isAttachmentContactPhoto: bool

Whether the attachment is a contact photo for a Contact object.

property lastModificationTime: datetime | None

The last time the attachment was modified.

property longFilename: str | None

Returns the long file name of the attachment, if it exists.

property longPathname: str | None

The fully qualified path and file name with extension.

property mimetype: str | None

The content-type mime header of the attachment, if specified.

property modifiedAt: datetime | None

Alias of lastModificationTime.

property msg: MSGFile

Returns the MSGFile instance the attachment belongs to.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

property name: str | None

The best name available for the file.

Uses long filename before short.

property namedProperties: NamedProperties

The NamedProperties instance for this attachment.

property payloadClass: str | None

The class name of an object that can display the contents of the message.

property props: PropertiesStore

Returns the PropertiesStore instance of the attachment.

property renderingPosition: int | None

The offset, in rendered characters, to use when rendering the attachment within the main message text.

A value of 0xFFFFFFFF indicates a hidden attachment that is not to be rendered.

property shortFilename: str | None

The short file name of the attachment, if it exists.

property treePath: List[weakref.ReferenceType[Any]]

A path, as a list of weak references to the instances needed to get to this instance through the MSGFile-Attachment tree.

abstract property type: AttachmentType

An enum value that identifies the type of attachment.

class extract_msg.Message(path, **kwargs)

Bases: MessageBase

Parser for Microsoft Outlook message files.

Supports all of the options from MSGFile.__init__() with some additional ones.

Parameters:
  • recipientSeparator – Optional, separator string to use between recipients.

  • deencapsulationFunc – Optional, if specified must be a callable that will override the way that HTML/text is deencapsulated from the RTF body. This function must take exactly 2 arguments, the first being the RTF body from the message and the second being an instance of the enum DeencapType that will tell the function what type of body is desired. The function should return a string for plain text and bytes for HTML. If any problems occur, the function must either return None or raise one of the appropriate exceptions from extract_msg.exceptions. All other exceptions must be handled internally or they will not be caught. The original deencapsulation method will not run if this is set.

filename: str | None

class extract_msg.MSGFile(path, **kwargs)

Bases: object

Base handler for all .msg files.

Parameters:
  • path – Path to the MSG file in the system or the bytes of the MSG file.

  • prefix – Used for extracting embedded MSG files inside the main one. Do not set manually unless you know what you are doing.

  • parentMsg – Used for synchronizing instances of shared objects. Do not set this unless you know what you are doing.

  • initAttachment – Optional, the method used when creating an attachment for an MSG file. MUST be a function that takes 2 arguments (the MSGFile instance and the directory in the MSG file where the attachment is) and returns an instance of AttachmentBase.

  • delayAttachments – Optional, delays the initialization of attachments until the user attempts to retrieve them. Allows MSG files with bad attachments to be initialized so the other data can be retrieved.

  • filename – Optional, the filename to be used by default when saving.

  • errorBehavior – Optional, the behavior to use in the event of certain types of errors. Uses the ErrorBehavior enum.

  • overrideEncoding – Optional, an encoding to use instead of the one specified by the MSG file. If the value is "chardet" and you have the chardet module installed, an attempt will be made to auto-detect the encoding based on some of the string properties. Do not report encoding errors caused by this.

  • treePath – Internal variable used for giving representation of the path, as a tuple of objects, of the MSGFile. When passing, this is the path to the parent object of this instance.

  • insecureFeatures – Optional, an enum value that specifies if certain insecure features should be enabled. These features should only be used on data that you trust. Uses the InsecureFeatures enum.

  • dateFormat – Optional, the format string to use for dates.

  • datetimeFormat – Optional, the format string to use for dates that include a time component.

Raises:
  • InvalidFileFormatError – The file is not an OLE file or could not be parsed as an MSG file.

  • StandardViolationError – Some part of the file badly violates the standard.

  • IOError – There is an issue opening the MSG file.

  • NameError – The encoding provided is not supported.

  • PrefixError – The prefix is not a supported type.

  • TypeError – The parent is not an instance of MSGFile or a subclass.

  • ValueError – The path is invalid.

It’s recommended to check the error message to ensure you know why a specific exception was raised.

filename: str | None
close() → None
debug() → None
exists(filename: str | List[str] | Tuple[str], prefix: bool = True) → bool

Checks if the stream exists in the MSG file.

Parameters:

prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)

sExists(filename: str | List[str] | Tuple[str], prefix: bool = True) → bool

Checks if the string stream exists in the MSG file.

Parameters:

prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)

existsTypedProperty(_id: str, location=None, _type=None, prefix: bool = True, propertiesInstance: PropertiesStore | None = None) → Tuple[bool, int]

Determines if the stream with the provided id exists in the location specified.

If no location is specified, the root directory is searched. Returns a tuple of two values: a bool indicating whether anything was found, and the number of matches found.

Because of how this method works, any folder that contains its own “__properties_version1.0” file should have this method called from its class.

Parameters:

prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)

export(path) → None

Exports the contents of this MSG file to a new MSG file, written to the path given.

If this is an embedded MSG file, the embedded streams and directories will be added to it as if they were at the root, allowing you to save it as its own MSG file.

This function pulls directly from the source MSG file, so modifications to the properties of an MSGFile object (or one of its subclasses) will not be reflected in the saved file.

Parameters:

path – A path-like object (including strings and pathlib.Path objects) or an IO device with a write method which accepts bytes.

exportBytes() → bytes

Saves a new copy of the MSG file, returning the bytes.

fixPath(inp: str | List[str] | Tuple[str], prefix: bool = True) → str

Changes paths so that they have the proper prefix (if prefix is True) and are strings rather than lists or tuples.
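A minimal sketch of the transformation fixPath() describes, assuming that a prefix is a “/”-separated string ending in “/” and that list or tuple paths are joined with “/”. fix_path and its usePrefix flag are hypothetical names (usePrefix plays the role of the boolean prefix parameter, renamed here to avoid clashing with the prefix string).

```python
from typing import List, Tuple, Union


def fix_path(inp: Union[str, List[str], Tuple[str, ...]], prefix: str, usePrefix: bool = True) -> str:
    """Join list/tuple paths with "/" and, if usePrefix is True, prepend
    the prefix of the (possibly embedded) MSG file."""
    if isinstance(inp, (list, tuple)):
        inp = "/".join(inp)
    if usePrefix:
        # Assumption: the prefix string already ends with "/".
        inp = prefix + inp
    return inp
```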

getMultipleBinary(filename: str | List[str] | Tuple[str], prefix: bool = True) → List[bytes] | None

Gets a multiple binary property as a list of bytes objects.

Like getStringStream(), the 4-character type suffix should be omitted. So if you want the stream “__substg1.0_00011102”, the filename would simply be “__substg1.0_0001”.

Parameters:

prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)

getMultipleString(filename: str | List[str] | Tuple[str], prefix: bool = True) → List[str] | None

Gets a multiple string property as a list of str objects.

Like getStringStream(), the 4-character type suffix should be omitted. So if you want the stream “__substg1.0_0001101F”, the filename would simply be “__substg1.0_0001”.

Parameters:

prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)

getNamedAs(propertyName: str, guid: str, overrideClass: Type[_T] | Callable[[Any], _T]) → _T | None

Returns the named property, setting the class if specified.

Parameters:

overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.

getNamedProp(propertyName: str, guid: str, default: _T | None = None) → Any | _T

Equivalent to instance.namedProperties.get((propertyName, guid), default).

Can be overridden to create new behavior.

getPropertyAs(propertyName: int | str, overrideClass: Type[_T] | Callable[[Any], _T]) → _T | None

Returns the property, setting the class if found.

Parameters:

overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.

getPropertyVal(name: int | str, default: _T | None = None) → Any | _T

Equivalent to instance.props.getValue(name, default).

Can be overridden to create new behavior.

getSingleOrMultipleBinary(filename: str | List[str] | Tuple[str], prefix: bool = True) → List[bytes] | bytes | None

Combination of getStream() and getMultipleBinary().

Checks to see if a single binary stream exists to return, otherwise tries to return the multiple binary stream of the same ID.

Like getStringStream(), the 4-character type suffix should be omitted. So if you want the stream “__substg1.0_00010102”, the filename would simply be “__substg1.0_0001”.

Parameters:

prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)

getSingleOrMultipleString(filename: str | List[str] | Tuple[str], prefix: bool = True) → str | List[str] | None

Combination of getStringStream() and getMultipleString().

Checks to see if a single string stream exists to return, otherwise tries to return the multiple string stream of the same ID.

Like getStringStream(), the 4-character type suffix should be omitted. So if you want the stream “__substg1.0_0001001F”, the filename would simply be “__substg1.0_0001”.

Parameters:

prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)

getStream(filename: str | List[str] | Tuple[str], prefix: bool = True) → bytes | None

Gets a binary representation of the requested stream.

This should ALWAYS return a bytes object if it was found, otherwise returns None.

Parameters:

prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)

getStreamAs(streamID: str | List[str] | Tuple[str], overrideClass: Type[_T] | Callable[[Any], _T]) → _T | None

Returns the specified stream, modifying it to the specified class if it is found.

Parameters:

overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.

getStringStream(filename: str | List[str] | Tuple[str], prefix: bool = True) → str | None

Gets a string representation of the requested stream.

Rather than the full filename, you should only feed this function the filename sans the type. So if the full name is “__substg1.0_001A001F”, the filename this function should receive should be “__substg1.0_001A”.

This should ALWAYS return a string if it was found, otherwise returns None.

Parameters:

prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)

getStringStreamAs(streamID: str | List[str] | Tuple[str], overrideClass: Type[_T] | Callable[[Any], _T]) → _T | None

Returns the specified string stream, modifying it to the specified class if it is found.

Parameters:

overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.

listDir(streams: bool = True, storages: bool = False, includePrefix: bool = True) → List[List[str]]

Replacement for OleFileIO.listdir that runs at the current prefix directory.

Parameters:

includePrefix – If False, removes the part of the path that is the prefix.
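The effect of includePrefix=False can be sketched on paths in the list-of-lists form that listDir() returns. The helpers below are hypothetical; they assume the prefix is available as a list of path components (as prefixList provides) and, for the slistDir()-style variant, that components are joined with “/”.

```python
from typing import List


def list_dir_relative(paths: List[List[str]], prefixList: List[str]) -> List[List[str]]:
    """Keep only entries below prefixList and drop those leading components."""
    n = len(prefixList)
    # len(p) > n excludes the prefix storage itself.
    return [p[n:] for p in paths if len(p) > n and p[:n] == prefixList]


def slist_dir_relative(paths: List[List[str]], prefixList: List[str]) -> List[str]:
    """String form of list_dir_relative (assumes "/" as the separator)."""
    return ["/".join(p) for p in list_dir_relative(paths, prefixList)]
```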

slistDir(streams: bool = True, storages: bool = False, includePrefix: bool = True) → List[str]

Replacement for OleFileIO.listdir that runs at the current prefix directory. Returns a list of strings instead of lists.

save(**kwargs) → Tuple[SaveType, List[str] | str | None]
saveAttachments(skipHidden: bool = False, **kwargs) → None

Saves only the attachments (not the message itself), placing them all in the same folder.

Parameters:

skipHidden – If True, skips attachments marked as hidden. (Default: False)

saveRaw(path) → None

property areStringsUnicode: bool

Whether the strings are Unicode encoded or not.

property attachments: List[AttachmentBase] | List[SignedAttachment]

A list of all attachments.

property attachmentsDelayed: bool

Returns True if the attachment initialization was delayed.

property attachmentsReady: bool

Returns True if the attachments are ready to be used.

property classified: bool

Indicates whether the contents of this message are regarded as classified information.

property classType: str | None

The class type of the MSG file.

property commonEnd: datetime | None

The end time for the object.

property commonStart: datetime | None

The start time for the object.

property contactLinkEntry: ContactLinkEntry | None

A class that contains the list of Address Book EntryIDs linked to this Message object.

property contacts: List[str] | None

Contains the display name property of each Address Book EntryID referenced in the value of the contactLinkEntry property.

property currentVersion: int | None

Specifies the build number of the client application that sent the message.

property currentVersionName: str | None

Specifies the name of the client application that sent the message.

property dateFormat: str

The format string to use when converting dates to strings.

This is used for dates with no time component.

property datetimeFormat: str

The format string to use when converting datetimes to strings.

This is used for dates that have time components.

property errorBehavior: ErrorBehavior

The behavior to follow when certain errors occur.

Will be an instance of the ErrorBehavior enum.

property importance: Importance | None

The specified importance of the MSG file.

property importanceString: str | None

The importance string to use for saving.

Returns None if the importance is medium.

property initAttachmentFunc: Callable[[MSGFile, str], AttachmentBase]

The method being used to initialize attachments, in case you need to use it externally.

property insecureFeatures: InsecureFeatures

An enum specifying what insecure features have been enabled for this file.

property kwargs: Dict[str, Any]

The kwargs used to initialize this message, excluding the prefix.

This is used for initializing embedded MSG files.

property named: Named

The main named properties storage.

This is not usable to access the data of the properties directly.

Raises:

ReferenceError – The parent MSGFile instance has been garbage collected.

property namedProperties: NamedProperties

The NamedProperties instance usable to access the data for named properties.

property overrideEncoding: str | None

None if the encoding has not been overridden, otherwise the encoding used for string streams.

property path

The message path if generated from a file, otherwise the data used to generate the MSGFile instance.

property prefix: str

The prefix of the MSGFile instance.

Intended for developer use.

property prefixLen: int

The number of elements in the prefix.

Dividing by 2 will typically tell you how deeply nested the MSG file is.
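The rule of thumb above follows from the [MS-OXMSG] layout, in which each embedded message lives in an attachment storage plus that attachment's “__substg1.0_3701000D” storage. The sketch below builds such a prefix list and derives the depth from it; embedded_prefix and nesting_depth are hypothetical helpers that assume each nesting level contributes exactly these two components.

```python
from typing import List


def embedded_prefix(attachmentIndexes: List[int]) -> List[str]:
    """Build a prefix list for a message nested through the given
    attachment indexes (outermost first)."""
    parts: List[str] = []
    for index in attachmentIndexes:
        # Attachment storage, named with an 8-digit uppercase hex index.
        parts.append(f"__attach_version1.0_#{index:08X}")
        # Storage holding the embedded message object.
        parts.append("__substg1.0_3701000D")
    return parts


def nesting_depth(prefixList: List[str]) -> int:
    # Two components per level of embedding.
    return len(prefixList) // 2
```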

property prefixList: List[str]

The prefix list of the MSGFile instance.

Intended for developer use.

property priority: Priority | None

The specified priority of the MSG file.

property props: PropertiesStore

The PropertiesStore instance used by the MSGFile instance.

property retentionDate: datetime | None

The date, in UTC, after which a Message Object is expired by the server.

If None, the Message object never expires.

property retentionFlags: RetentionFlags | None

Flags that specify the status or nature of an item’s retention tag or archive tag.

property sensitivity: Sensitivity | None

The specified sensitivity of the MSG file.

property sideEffects: SideEffect | None

Controls how a Message object is handled by the client in relation to certain user interface actions by the user, such as deleting a message.

property stringEncoding: str

property treePath: List[weakref.ReferenceType[Any]]

A path, as a list of weak references to the instances needed to get to this instance through the MSGFile-Attachment tree.

These are weak references to ensure the garbage collector doesn’t see the references back to higher objects.
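The weak-reference pattern, and why accessing a parent can raise ReferenceError, can be sketched with two stand-in classes. Parent and Child are hypothetical placeholders for an MSGFile and one of its attachments; the sketch only assumes that the child stores weakref.ref objects rather than strong references.

```python
import weakref


class Parent:
    """Stand-in for an MSGFile."""


class Child:
    """Stand-in for an attachment that refers back to its parent weakly."""

    def __init__(self, parent: Parent) -> None:
        # A weak reference does not keep the parent alive.
        self._parent = weakref.ref(parent)

    @property
    def parent(self) -> Parent:
        parent = self._parent()  # None once the parent has been collected
        if parent is None:
            raise ReferenceError("The associated parent has been garbage collected.")
        return parent
```

Because the child holds no strong reference, dropping the last reference to the parent lets CPython free it immediately, after which the property raises ReferenceError, mirroring the behavior documented for the msg properties in this reference.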

class extract_msg.Named(msg: MSGFile)

Bases: object

Class for handling access to the named properties themselves.

exists(filename: str | List[str] | Tuple[str]) → bool

Checks if the stream exists inside the named properties folder.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

get(propertyName: Tuple[str, str], default: _T | None = None) → NamedPropertyBase | _T

Tries to get a named property based on its key.

Returns default if not found. The key is a tuple of the name and the property set GUID.

getPropNameByStreamID(streamID: int | str) → Tuple[str, str] | None

Gets the name of a property (as a key for the internal dict) that is stored in the specified stream.

Useful for determining if a stream/property stream entry is a named property.

Parameters:

streamID – A 4 hex character identifier that will be checked. May also be an integer that can convert to 4 hex characters.

Returns:

The name, if the stream is a named property, otherwise None.

Raises:

getStream(filename: str | List[str] | Tuple[str]) → bytes | None

Gets a binary representation of the requested stream.

This should ALWAYS return a bytes object if it was found, otherwise returns None.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

items() Iterable[Tuple[Tuple[str, str], NamedPropertyBase]][source]
keys() Iterable[Tuple[str, str]][source]
pprintKeys() None[source]

Uses the pprint function on a sorted list of keys.

values() Iterable[NamedPropertyBase][source]
property dir

Returns the directory inside the MSG file where the named properties are located.

property msg: MSGFile

Returns the MSGFile instance that these named properties belong to.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

property namedProperties: Dict[Tuple[str, str], NamedPropertyBase]

Returns a copy of the dictionary containing all the named properties.

class extract_msg.NamedProperties(named: Named, streamSource: MSGFile | AttachmentBase)[source]

Bases: object

An instance that uses a Named instance and an extract-msg class to read the data of named properties.

Parameters:
  • named – The Named instance to refer to for named properties entries.

  • streamSource – The source to use for acquiring the data of a named property.

get(item: Tuple[str, str] | NamedPropertyBase, default: _T | None = None) Any | _T[source]

Get a named property, returning the value of :param default: if not found. Item must be either a NamedPropertyBase instance or a tuple with 2 items: the name and the GUID string.

Raises:

ReferenceError – The associated instance for getting actual property data has been garbage collected.

class extract_msg.OleWriter(rootClsid: bytes = b'\x0b\r\x02\x00\x00\x00\x00\x00\xc0\x00\x00\x00\x00\x00\x00F')[source]

Bases: object

Takes data to write to a compound binary format file, as specified in [MS-CFB].

addEntry(path: str | List[str] | Tuple[str], data: bytes | SupportsBytes | None = None, storage: bool = False, **kwargs) None[source]

Adds an entry to the OleWriter instance at the path specified, adding storages with default settings where necessary. If the entry is not a storage, :param data: must be set.

Parameters:
  • path – The path to add the entry at. Must not contain a path part that is an already added stream.

  • data – The bytes for a stream or an object with a __bytes__ method.

  • storage – If True, the entry to add is a storage. Otherwise, the entry is a stream.

  • clsid – The CLSID for the stream/storage. Must be a bytes instance that is 16 bytes long.

  • creationTime – An 8 byte filetime int. Sets the creation time of the entry. Not applicable to streams.

  • modifiedTime – An 8 byte filetime int. Sets the modification time of the entry. Not applicable to streams.

  • stateBits – A 4 byte int. Sets the state bits, user-defined flags, of the entry. For a stream, this SHOULD be unset.

Raises:
  • OSError – A stream was found on the path before the end or an entry with the same name already exists.

  • ValueError – An attempt was made to access an internal item.

  • ValueError – The data provided is too large.
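The creationTime and modifiedTime arguments take a FILETIME value: an integer count of 100-nanosecond intervals since midnight, January 1, 1601 UTC, which fits in 8 bytes. The to_filetime helper below is a hypothetical convenience, not part of extract_msg, showing one way to produce that integer from a datetime.

```python
from datetime import datetime, timezone

# FILETIME epoch: midnight, January 1, 1601, UTC.
_FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def to_filetime(dt: datetime) -> int:
    """Convert an aware datetime into a FILETIME integer (100 ns ticks since 1601)."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    delta = dt - _FILETIME_EPOCH
    # Integer arithmetic on the timedelta parts avoids float rounding.
    ticks = (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
    if not 0 <= ticks < 2 ** 64:
        raise ValueError("datetime is outside the range of an 8 byte FILETIME")
    return ticks
```

Since these timestamps are not applicable to streams, they would be passed when adding a storage, for example writer.addEntry('__attach_version1.0_#00000000', storage=True, creationTime=to_filetime(datetime.now(timezone.utc))), where writer is an OleWriter instance.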

addOleEntry(path: str | List[str] | Tuple[str], entry: OleDirectoryEntry, data: bytes | SupportsBytes | None = None) None[source]

Uses the entry provided to add the data to the writer.

Raises:
  • OSError – Tried to add an entry to a path that has not yet been added, tried to add as a child of a stream, or tried to add an entry where one already exists under the same name.

  • ValueError – The data provided is too large.

deleteEntry(path) None[source]

Deletes the entry specified by :param path:, including all children.

Raises:
  • OSError – If the entry does not exist or a part of the path that is not the last was a stream.

  • ValueError – Attempted to delete an internal data stream.

editEntry(path: str | List[str] | Tuple[str], **kwargs) None[source]

Used to edit values of an entry by setting the specific kwargs. Set a value to something other than None to set it.

Parameters:
  • data – The data of a stream. Will error if used for something other than a stream. Must be bytes or convertible to bytes.

  • clsid – The CLSID for the stream/storage. Must be a bytes instance that is 16 bytes long.

  • creationTime – An 8 byte filetime int. Sets the creation time of the entry. Not applicable to streams.

  • modifiedTime – An 8 byte filetime int. Sets the modification time of the entry. Not applicable to streams.

  • stateBits – A 4 byte int. Sets the state bits, user-defined flags, of the entry. For a stream, this SHOULD be unset.

To convert a 32 character hexadecimal CLSID into the bytes for this function, the _unClsid function in the ole_writer submodule can be used.

Raises:
  • OSError – The entry does not exist in the file.

  • TypeError – Attempted to modify the bytes of a storage.

  • ValueError – The type of a parameter was wrong, or the data of a parameter was invalid.
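Compound files store a CLSID as a GUID in its little-endian binary layout, with the first three fields byte-swapped. As a standard-library alternative to the private _unClsid helper, uuid.UUID(...).bytes_le should produce the same 16 byte layout; the check below uses the default rootClsid from the OleWriter signature, which decodes to {00020D0B-0000-0000-C000-000000000046}. The clsid_to_bytes name is hypothetical.

```python
import uuid


def clsid_to_bytes(clsid: str) -> bytes:
    """Convert a 32 hex character CLSID (dashes and braces optional) to 16 bytes.

    uuid.UUID accepts the hex with or without dashes/braces; bytes_le swaps the
    byte order of the first three fields, matching the on-disk GUID layout.
    """
    return uuid.UUID(clsid).bytes_le


# Default rootClsid of OleWriter, copied from its signature.
DEFAULT_ROOT_CLSID = b'\x0b\r\x02\x00\x00\x00\x00\x00\xc0\x00\x00\x00\x00\x00\x00F'
```

The result can then be passed as writer.editEntry(path, clsid=clsid_to_bytes('00020D0B-0000-0000-C000-000000000046')).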

fromMsg(msg: MSGFile) None[source]

Copies the streams and stream information necessary from the MSG file.

fromOleFile(ole: OleFileIO, rootPath: str | List[str] | Tuple[str] = []) None[source]

Copies all the streams from the provided OLE file into this writer.

NOTE: This method does not handle any special rules that may be required by a format that uses the compound binary file format as a base when extracting an embedded directory. For example, MSG files require modification of an embedded properties stream when extracting an embedded MSG file.

Parameters:

rootPath – A path (accepted by olefile.OleFileIO) to the directory to use as the root of the file. If not provided, the file root will be used.

Raises:

OSError – If :param rootPath: does not exist in the file.

getEntry(path: str | List[str] | Tuple[str]) DirectoryEntry[source]

Finds and returns a copy of an existing DirectoryEntry instance in the writer. Use this method to check the internal status of an entry.

Raises:
  • OSError – If the entry does not exist.

  • ValueError – If access to an internal item is attempted.

listItems(streams: bool = True, storages: bool = False) List[List[str]][source]

Returns a list of the specified items currently in the writer.

Parameters:
  • streams – If True, includes the path for each stream in the list.

  • storages – If True, includes the path for each storage in the list.

renameEntry(path: str | List[str] | Tuple[str], newName: str) None[source]

Changes the name of an entry, leaving it in its current position.

Raises:
  • OSError – If the entry does not exist or an entry with the new name already exists.

  • ValueError – If access to an internal item is attempted or the new name provided is invalid.

walk() Iterator[Tuple[List[str], List[str], List[str]]][source]

Functional equivalent to os.walk, but for going over the file structure of the OLE file to be written. Unlike os.walk, it takes no arguments.

Returns:

A tuple of three lists. The first is the path, as a list of strings, for the directory (or an empty list for the root), the second is a list of the storages in the current directory, and the last is a list of the streams. Streams and storages are sorted caselessly.
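The walk() contract (a path list, then storages, then streams, each sorted caselessly) can be illustrated over a plain nested dict. The tree model below, with dicts for storages and bytes for streams, is hypothetical and for illustration only; OleWriter keeps its own internal DirectoryEntry objects.

```python
from typing import Dict, Iterator, List, Optional, Tuple, Union

# Hypothetical model of the writer's contents: a dict is a storage and a
# bytes value is a stream.
Tree = Dict[str, Union["Tree", bytes]]


def walk(tree: Tree, path: Optional[List[str]] = None) -> Iterator[Tuple[List[str], List[str], List[str]]]:
    """Yield (path, storages, streams) top-down, like os.walk."""
    path = path if path is not None else []
    # Sort caselessly, as the walk() description states.
    storages = sorted((k for k, v in tree.items() if isinstance(v, dict)), key=str.casefold)
    streams = sorted((k for k, v in tree.items() if not isinstance(v, dict)), key=str.casefold)
    # The current directory is yielded before its children.
    yield path, storages, streams
    for name in storages:
        yield from walk(tree[name], path + [name])
```

Recursion keeps the sketch short; a very deep storage tree would call for an explicit stack instead.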

write(path) None[source]

Writes the data to the path specified.

If :param path: has a write method, the object will be used directly.

If a failure occurs, the file or IO device may have been modified.

Raises:

TooManySectorsError – The number of sectors required for a part of writing is too large.
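The rule that an object with a write method is used directly is ordinary duck typing. The write_output function below is a hypothetical sketch of that dispatch, not OleWriter.write itself; it writes already-serialized bytes either to a file-like object or to a path opened in binary mode.

```python
import os
from typing import BinaryIO, Union


def write_output(data: bytes, path: Union[str, os.PathLike, BinaryIO]) -> None:
    if hasattr(path, "write"):
        # File-like object: write to it directly and leave it open for the caller.
        path.write(data)
    else:
        # Anything else is treated as a filesystem path.
        with open(path, "wb") as f:
            f.write(data)
```

Because a failed write may leave the target modified, a caller that needs all-or-nothing output could write into an io.BytesIO first and only copy the buffer to disk once the write succeeds.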

class extract_msg.PropertiesStore(data: bytes | None, type_: PropertiesType, writable: bool = False)[source]

Bases: object

Parser for msg properties files.

Reads a properties stream or creates a brand new PropertiesStore object.

Parameters:
  • data – The bytes of the properties instance. Setting to None or empty bytes will cause the properties instance to not be valid unless writable is set to True. If that is the case, the instance will be set up for creating a new properties stream.

  • type – The type of properties stream this instance represents.

  • writable – Whether this properties stream should accept modification.

addProperty(prop: PropBase, force: bool = False) None[source]

Adds the property if it does not exist.

Parameters:
  • prop – The property to add.

  • force – If True, the writable property will be ignored. This will not be reflected when converting to bytes if the instance is not writable.

Raises:
  • KeyError – A property already exists with the chosen name.

  • NotWritableError – The method was used on an unwritable instance.

get(name: str | int, default: _T | None = None) PropBase | _T[source]

Retrieve the property of :param name:.

Returns:

The property, or the value of :param default: if the property could not be found.

getProperties(id_: str | int) List[PropBase][source]

Gets all properties with the specified ID.

Parameters:

ID – A 4 digit hexadecimal string or an int that is less than 0x10000.
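Assuming properties are keyed by an 8 character hex tag (the 4 character property ID followed by the 4 character property type, for example 0037001F), "all properties with the specified ID" amounts to a prefix match, with an int ID formatted as 4 uppercase hex digits. One ID can match several types, which is the kind of ambiguity removeProperty avoids. The dict and helpers below are hypothetical, not PropertiesStore internals.

```python
from typing import Dict, List, Union


def normalize_prop_id(id_: Union[str, int]) -> str:
    """Return the property ID as 4 uppercase hex characters."""
    if isinstance(id_, int):
        if not 0 <= id_ < 0x10000:
            raise ValueError("property ID must be less than 0x10000")
        return f"{id_:04X}"
    if len(id_) != 4:
        raise ValueError("property ID strings must be 4 hex characters")
    int(id_, 16)  # Raises ValueError if the string is not hexadecimal.
    return id_.upper()


def properties_with_id(props: Dict[str, object], id_: Union[str, int]) -> List[object]:
    """Collect every property whose 8 character tag starts with the ID."""
    prefix = normalize_prop_id(id_)
    return [value for name, value in props.items() if name.upper().startswith(prefix)]
```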

getValue(name: str | int, default: _T | None = None) Any | _T[source]

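Attempts to get the value of the first property matching :param name:, returning the value of :param default: if it could not be found.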

items() a set-like object providing a view on D's items[source]
keys() a set-like object providing a view on D's keys[source]
makeWritable() PropertiesStore[source]

Returns a copy of this PropertiesStore object that allows modification.

If the instance is already writable, this will return the instance itself.

pprintKeys() None[source]

Uses the pprint function on a sorted list of the keys.

removeProperty(nameOrProp: str | PropBase) None[source]

Removes the property by name or by instance.

Due to possible ambiguities, this function does not accept an int argument nor will it be able to find a property based on the 4 character hex ID.

Raises:
  • KeyError – The property was not found.

  • NotWritableError – The instance is not writable.

  • TypeError – The type for :param nameOrProp: was wrong.

toBytes() bytes[source]
values() an object providing a view on D's values[source]
property attachmentCount: int

The number of Attachment objects for the MSGFile object.

Raises:
  • NotWritableError – The setter was used on an unwritable instance.

  • TypeError – The Properties instance is not for an MSGFile object.

property date: datetime | None

Returns the send date contained in the Properties file.

property isError: bool

Whether the instance is in an invalid state.

If the instance is not writable and was given no data, this will be True.

property nextAttachmentId: int

The ID to use for naming the next Attachment object storage if one is created inside the .msg file.

Raises:
  • NotWritableError – The setter was used on an unwritable instance.

  • TypeError – The Properties instance is not for an MSGFile object.

property nextRecipientId: int

The ID to use for naming the next Recipient object storage if one is created inside the .msg file.

Raises:
  • NotWritableError – The setter was used on an unwritable instance.

  • TypeError – The Properties instance is not for an MSGFile object.
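Per [MS-OXMSG], recipient and attachment storages are named with a fixed prefix followed by the object's ID as 8 hexadecimal digits, which is how nextAttachmentId and nextRecipientId translate into storage names. The helpers below are hypothetical, and the use of uppercase hex digits is an assumption.

```python
def _storage_name(prefix: str, object_id: int) -> str:
    if not 0 <= object_id <= 0xFFFFFFFF:
        raise ValueError("storage IDs must fit in 8 hexadecimal digits")
    return f"{prefix}{object_id:08X}"


def attachment_storage_name(attachment_id: int) -> str:
    """Storage name for the attachment with the given ID."""
    return _storage_name("__attach_version1.0_#", attachment_id)


def recipient_storage_name(recipient_id: int) -> str:
    """Storage name for the recipient with the given ID."""
    return _storage_name("__recip_version1.0_#", recipient_id)
```

For example, attachment_storage_name(props.nextAttachmentId) would give the name of the next attachment storage, where props is the MSGFile's PropertiesStore.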

property props: Dict[str, PropBase]

Returns a copy of the internal properties dict.

property recipientCount: int

The number of Recipient objects for the MSGFile object.

Raises:
  • NotWritableError – The setter was used on an unwritable instance.

  • TypeError – The Properties instance is not for an MSGFile object.

property writable: bool

Whether the instance accepts modification.

class extract_msg.Recipient(_dir: str, msg: MSGFile, recipientTypeClass: Type[_RT])[source]

Bases: Generic[_RT]

Contains the data of one of the recipients in an MSG file.

exists(filename: str | List[str] | Tuple[str]) bool[source]

Checks if a stream exists inside the recipient folder.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

sExists(filename: str | List[str] | Tuple[str]) bool[source]

Checks if a string stream exists inside the recipient folder.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

existsTypedProperty(id, _type=None) Tuple[bool, int][source]

Determines if a stream with the provided ID exists.

Returns a tuple of 2 values: the first is a boolean indicating whether anything was found, and the second is how many were found.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.
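The (found, count) return value can be sketched as a count of property stream names that share the ID, optionally narrowed to one type. The function and its list-of-names input are hypothetical; this sketch only inspects stream names, while the real method works against the recipient's directory.

```python
from typing import Iterable, Optional, Tuple

# Names are compared in uppercase so hex digits match regardless of case.
_PREFIX = "__SUBSTG1.0_"


def exists_typed_property(names: Iterable[str], id_: str, type_: Optional[str] = None) -> Tuple[bool, int]:
    id_ = id_.upper()
    type_ = type_.upper() if type_ is not None else None
    count = 0
    for name in names:
        upper = name.upper()
        if not upper.startswith(_PREFIX):
            continue
        # The 8 characters after the prefix are the property ID and type.
        tag = upper[len(_PREFIX):len(_PREFIX) + 8]
        if len(tag) == 8 and tag[:4] == id_ and (type_ is None or tag[4:] == type_):
            count += 1
    return count > 0, count
```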

getMultipleBinary(filename: str | List[str] | Tuple[str]) List[bytes] | None[source]

Gets a multiple binary property as a list of bytes objects.

Like getStringStream(), the 4 character type suffix should be omitted. So if you want the stream “__substg1.0_00011102” then the filename would simply be “__substg1.0_0001”.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.
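The naming rule these getters rely on (a full stream name is __substg1.0_ plus a 4 character property ID plus a 4 character type, and only the part without the type is passed in) can be expressed as a small helper. The split_stream_name helper is hypothetical; the type codes in the table are the standard MAPI property type values.

```python
from typing import Tuple

# Standard MAPI property type codes for the stream kinds discussed here.
PROPERTY_TYPES = {
    "001E": "PT_STRING8",
    "001F": "PT_UNICODE",
    "0102": "PT_BINARY",
    "101E": "PT_MV_STRING8",
    "101F": "PT_MV_UNICODE",
    "1102": "PT_MV_BINARY",
}


def split_stream_name(full_name: str) -> Tuple[str, str]:
    """Split '__substg1.0_XXXXYYYY' into ('__substg1.0_XXXX', 'YYYY')."""
    prefix = "__substg1.0_"
    if not full_name.startswith(prefix) or len(full_name) != len(prefix) + 8:
        raise ValueError(f"not a property stream name: {full_name!r}")
    return full_name[:-4], full_name[-4:].upper()
```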

getMultipleString(filename: str | List[str] | Tuple[str]) List[str] | None[source]

Gets a multiple string property as a list of str objects.

Like getStringStream(), the 4 character type suffix should be omitted. So if you want the stream “__substg1.0_0001101F” then the filename would simply be “__substg1.0_0001”.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

getPropertyAs(propertyName: int | str, overrideClass: Type[_T] | Callable[[Any], _T]) _T | None[source]

Returns the property, setting the class if found.

Parameters:

overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.
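The overrideClass behavior (call the class or function with the data as its only argument, and skip the call entirely when nothing was read) reduces to a few lines. apply_override is a hypothetical name used only for illustration; the same pattern applies to getStreamAs() and getStringStreamAs().

```python
from typing import Any, Callable, Optional, TypeVar

_T = TypeVar("_T")


def apply_override(value: Any, overrideClass: Callable[[Any], _T]) -> Optional[_T]:
    # A missing value is passed through as None without calling overrideClass.
    if value is None:
        return None
    # Classes and plain functions are both callables taking the data as the first argument.
    return overrideClass(value)
```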

getPropertyVal(name: int | str, default: _T | None = None) Any | _T[source]

Equivalent to instance.props.getValue(name, default).

Can be overridden to create new behavior.

getSingleOrMultipleBinary(filename: str | List[str] | Tuple[str]) List[bytes] | bytes | None[source]

A combination of getStream() and getMultipleBinary().

Checks to see if a single binary stream exists to return, otherwise tries to return the multiple binary stream of the same ID.

Like getStringStream(), the 4 character type suffix should be omitted. So if you want the stream “__substg1.0_00010102” then the filename would simply be “__substg1.0_0001”.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

getSingleOrMultipleString(filename: str | List[str] | Tuple[str]) str | List[str] | None[source]

A combination of getStringStream() and getMultipleString().

Checks to see if a single string stream exists to return, otherwise tries to return the multiple string stream of the same ID.

Like getStringStream(), the 4 character type suffix should be omitted. So if you want the stream “__substg1.0_0001001F” then the filename would simply be “__substg1.0_0001”.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.
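The single-then-multiple fallback can be sketched against a hypothetical dict of already decoded streams. Which type suffixes are tried (001F, then 101F) is an assumption for illustration; the real method reads through the recipient's own stream accessors.

```python
from typing import Dict, List, Optional, Union


def get_single_or_multiple_string(
    streams: Dict[str, Union[str, List[str]]], filename: str
) -> Optional[Union[str, List[str]]]:
    # Prefer the single-valued Unicode string stream...
    single = streams.get(filename + "001F")
    if single is not None:
        return single
    # ...otherwise fall back to the multiple-valued stream with the same ID.
    return streams.get(filename + "101F")
```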

getStream(filename: str | List[str] | Tuple[str]) bytes | None[source]

Gets a binary representation of the requested stream.

This should ALWAYS return a bytes object if it was found, otherwise returns None.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

getStreamAs(streamID: str | List[str] | Tuple[str], overrideClass: Type[_T] | Callable[[Any], _T]) _T | None[source]

Returns the specified stream, modifying it to the specified class if it is found.

Parameters:

overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.

getStringStream(filename: str | List[str] | Tuple[str]) str | None[source]

Gets a string representation of the requested stream.

Rather than the full filename, you should only feed this function the filename sans the type. So if the full name is “__substg1.0_001A001F”, the filename this function should receive should be “__substg1.0_001A”.

This should ALWAYS return a string if it was found, otherwise returns None.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

getStringStreamAs(streamID: str | List[str] | Tuple[str], overrideClass: Type[_T] | Callable[[Any], _T]) _T | None[source]

Returns the specified string stream, modifying it to the specified class if it is found.

Parameters:

overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.

listDir(streams: bool = True, storages: bool = False) List[List[str]][source]

Lists the streams and/or storages that exist in the recipient directory.

Returns:

The paths excluding the recipient directory, allowing the paths to be directly used for accessing a file.

slistDir(streams: bool = True, storages: bool = False) List[str][source]

Like listDir(), except it returns the paths as strings.

property account: str | None

The account of this recipient.

property email: str | None

The recipient’s email.

property entryID: PermanentEntryID | None

The recipient’s Entry ID.

property formatted: str

The formatted recipient string.

property instanceKey: bytes | None

The instance key of this recipient.

property name: str | None

The recipient’s name.

property props: PropertiesStore

The Properties instance of the recipient.

property recordKey: bytes | None

The record key of this recipient.

property searchKey: bytes | None

The search key of this recipient.

property smtpAddress: str | None

The SMTP address of this recipient.

property transmittableDisplayName: str | None

The transmittable display name of this recipient.

property type: _RT

The recipient type.

property typeFlags: int

The raw recipient type value and all the flags it includes.

class extract_msg.SignedAttachment(msg, data: bytes, name: str, mimetype: str, node: Message)[source]

Bases: object

Parameters:
  • msg – The MSGFile instance this attachment is associated with.

  • data – The bytes that compose this attachment.

  • name – The reported name of the attachment.

  • mimetype – The reported mimetype of the attachment.

  • node – The email Message instance for this node.

save(**kwargs) Tuple[SaveType, List[str] | str | None][source]

Saves the attachment data.

The name of the file is determined by several factors. The first thing that is checked is if you have provided :param customFilename: to this function. If you have, that is the name that will be used. Otherwise, the value of the name property will be used. After the name to use has been determined, it will then be shortened to make sure that it is not more than the value of :param maxNameLength:.

To change the directory that the attachment is saved to, set the value of :param customPath: when calling this function. The default save directory is the working directory.

If you want to save the contents into a ZipFile or similar object, either pass a path to where you want to create one or pass an instance to :param zip:. If :param zip: is an instance, :param customPath: will refer to a location inside the zip file.
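The filename selection described above (customFilename if given, otherwise name, then shortened to maxNameLength) might look like the sketch below. choose_filename is hypothetical, and keeping the extension while truncating the stem is an assumption about how shortening is done, not documented behavior.

```python
import os
from typing import Optional


def choose_filename(name: str, maxNameLength: int, customFilename: Optional[str] = None) -> str:
    filename = customFilename if customFilename else name
    if len(filename) <= maxNameLength:
        return filename
    stem, ext = os.path.splitext(filename)
    # Assumption: trim the stem and keep the extension when the extension fits.
    if len(ext) < maxNameLength:
        return stem[: maxNameLength - len(ext)] + ext
    return filename[:maxNameLength]
```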

saveEmbededMessage(**kwargs) Tuple[SaveType, List[str] | str | None][source]

Separate function from save() to allow it to be easily overridden by a subclass.

property asBytes: bytes
property data: bytes | MSGFile

The bytes that compose this attachment.

property dataType: Type[type] | None

The class that the data type will use, if it can be retrieved.

This is a safe way to do type checking on data before knowing if it will raise an exception. Returns None if no data will be returned or if an exception would be raised.

property emailMessage: Message

The email Message instance that is the source for this attachment.

property mimetype: str

The reported mimetype of the attachment.

property msg: MSGFile

The MSGFile instance this attachment belongs to.

Raises:

ReferenceError – The associated MSGFile instance has been garbage collected.

property name: str

The reported name of this attachment.

property longFilename: str

The reported name of this attachment.

property shortFilename: str

The reported name of this attachment.

property treePath: List[weakref]

A path, as a list of weak references to the instances needed to get to this instance through the MSGFile-Attachment tree.

property type: AttachmentType
extract_msg.openMsg(path, **kwargs) MSGFile[source]

Function to automatically open an MSG file and detect what type it is.

Accepts all of the same arguments as the __init__ method for the class it creates. Extra options will be ignored if the class doesn’t know what to do with them, but child instances may end up using them if they understand them. See MSGFile.__init__ for a list of all globally recognized options.

Parameters:

strict – If set to True, this function will raise an exception when it cannot identify which MSGFile derivative to use. Otherwise, it will log the error and return a basic MSGFile instance. Default is True.

Raises:
extract_msg.openMsgBulk(path, **kwargs) List[MSGFile] | Tuple[Exception, str | bytes][source]

Takes the same arguments as openMsg, but opens a collection of MSG files based on a wildcard. Returns a list if successful, otherwise returns a tuple.

Parameters:

ignoreFailures – If this is True, returns a list of all files that opened successfully, ignoring any failures. Otherwise, closes all files that opened successfully and returns a tuple of the exception and the path of the file that failed.
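The all-or-nothing behavior of openMsgBulk can be sketched with glob. open_bulk and its opener argument are hypothetical: opener stands in for openMsg, and whatever it returns only needs a close() method. Sorting the matched paths is a choice made here for deterministic output, not documented behavior.

```python
import glob
from typing import Callable, List, Tuple, TypeVar, Union

_F = TypeVar("_F")


def open_bulk(
    pattern: str, opener: Callable[[str], _F], ignoreFailures: bool = False
) -> Union[List[_F], Tuple[Exception, str]]:
    opened: List[_F] = []
    for path in sorted(glob.glob(pattern)):
        try:
            opened.append(opener(path))
        except Exception as e:
            if ignoreFailures:
                # Skip the failure and keep the files that did open.
                continue
            # Undo partial work: close everything opened so far.
            for item in opened:
                item.close()
            return (e, path)
    return opened
```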