extract_msg Package
Subpackages/Submodules
- extract_msg.attachments.attachment Module
- extract_msg.attachments.attachment_base Module
- extract_msg.attachments.broken_att Module
- extract_msg.attachments.custom_att Module
- extract_msg.attachments.custom_att_handler.custom_handler Module
- extract_msg.attachments.custom_att_handler Package
- extract_msg.attachments.custom_att_handler.lnk_obj_att Module
- extract_msg.attachments.custom_att_handler.outlook_image_dib Module
- extract_msg.attachments.custom_att_handler.outlook_image_meta Module
- extract_msg.attachments.emb_msg_att Module
- extract_msg.attachments Package
- extract_msg.attachments.signed_att Module
- extract_msg.attachments.unsupported_att Module
- extract_msg.attachments.web_att Module
- extract_msg.constants Package
- extract_msg.constants.ps Module
- extract_msg.constants.re Module
- extract_msg.constants.st Module
- extract_msg.encoding Package
- extract_msg.encoding.utils Module
- extract_msg.msg_classes.appointment Module
- extract_msg.msg_classes.calendar Module
- extract_msg.msg_classes.calendar_base Module
- extract_msg.msg_classes.contact Module
- extract_msg.msg_classes Package
- extract_msg.msg_classes.journal Module
- extract_msg.msg_classes.meeting_cancellation Module
- extract_msg.msg_classes.meeting_exception Module
- extract_msg.msg_classes.meeting_forward Module
- extract_msg.msg_classes.meeting_related Module
- extract_msg.msg_classes.meeting_request Module
- extract_msg.msg_classes.meeting_response Module
- extract_msg.msg_classes.message Module
- extract_msg.msg_classes.message_base Module
- extract_msg.msg_classes.message_signed Module
- extract_msg.msg_classes.message_signed_base Module
- extract_msg.msg_classes.msg Module
- extract_msg.msg_classes.post Module
- extract_msg.msg_classes.sticky_note Module
- extract_msg.msg_classes.task Module
- extract_msg.msg_classes.task_request Module
- extract_msg.properties Package
- extract_msg.properties.named Module
- extract_msg.properties.prop Module
- extract_msg.properties.properties_store Module
- extract_msg.structures.business_card Module
- extract_msg.structures.cfoas Module
- extract_msg.structures.contact_link_entry Module
- extract_msg.structures.dev_mode_a Module
- extract_msg.structures.dv_target_device Module
- extract_msg.structures.entry_id Module
- extract_msg.structures Package
- extract_msg.structures.misc_id Module
- extract_msg.structures.mon_stream Module
- extract_msg.structures.odt Module
- extract_msg.structures.ole_pres Module
- extract_msg.structures.ole_stream_struct Module
- extract_msg.structures.recurrence_pattern Module
- extract_msg.structures.report_tag Module
- extract_msg.structures.system_time Module
- extract_msg.structures.time_zone_definition Module
- extract_msg.structures.time_zone_struct Module
- extract_msg.structures.toc_entry Module
- extract_msg.structures.tz_rule Module
- extract_msg.enums Module
- extract_msg.exceptions Module
- extract_msg.null_date Module
- extract_msg.ole_writer Module
- extract_msg.open_msg Module
- extract_msg.recipient Module
- extract_msg.utils Module
Package contents
- extract_msg:
Extracts emails and attachments saved in Microsoft Outlook’s .msg files.
https://github.com/TeamMsgExtractor/msg-extractor
- class extract_msg.Attachment(msg: MSGFile, dir_: str, propStore: PropertiesStore)[source]
Bases:
AttachmentBase
A standard data attachment of an MSG file.
- Parameters:
msg – The MSGFile instance that the attachment belongs to.
dir_ – The directory inside the MSG file where the attachment is located.
propStore – The PropertiesStore instance for the attachment.
- getFilename(**kwargs) str [source]
Returns the filename to use for the attachment.
- Parameters:
contentId – Use the contentId, if available.
customFilename – A custom name to use for the file.
If the filename starts with “UnknownFilename” then there is no guarantee that the files will have exactly the same filename.
- regenerateRandomName() str [source]
Used to regenerate the random filename used if the attachment cannot find a usable filename.
- save(**kwargs) Tuple[SaveType, List[str] | str | None] [source]
Saves the attachment data.
The name of the file is determined as follows. If you have passed customFilename to this function, that name is used. Otherwise, if contentId is True, the file is saved using the content ID of the attachment. If no content ID is found, or contentId is False, the long filename is used; if the long filename is not found, the short one is used. If no usable filename has been found after all of this, a random one is used (accessible from the randomFilename property). Once the name has been determined, it is shortened so that it is no longer than maxNameLength.
To change the directory that the attachment is saved to, set customPath when calling this function. The default save directory is the current working directory.
If you want to save the contents into a ZipFile or similar object, either pass a path to where you want one created or pass an instance to zip. If zip is an instance, customPath refers to a location inside the zip file.
- Parameters:
extractEmbedded – If True, causes the attachment, should it be an embedded MSG file, to be saved as a .msg file instead of calling its save function.
skipEmbedded – If True, skips saving this attachment if it is an embedded MSG file.
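The naming order described above can be sketched as a small pure function. This is a hypothetical helper (resolve_filename is not part of extract_msg); the random-name format, the default length limit, and the plain truncation are assumptions, and the library may shorten names differently (for example, by preserving the extension).

```python
import uuid
from typing import Optional


def resolve_filename(
    custom_filename: Optional[str] = None,
    use_content_id: bool = False,
    content_id: Optional[str] = None,
    long_filename: Optional[str] = None,
    short_filename: Optional[str] = None,
    max_name_length: int = 256,  # arbitrary default chosen for illustration
) -> str:
    """Hypothetical helper mirroring the documented priority order."""
    if custom_filename:
        name = custom_filename
    elif use_content_id and content_id:
        name = content_id
    elif long_filename:
        name = long_filename
    elif short_filename:
        name = short_filename
    else:
        # Assumption: the real random name format is not documented here.
        name = "UnknownFilename " + uuid.uuid4().hex
    # Assumption: plain truncation; the library may shorten names differently.
    return name[:max_name_length]
```

A custom name always wins, and the content ID is only consulted when explicitly requested.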
- property data: bytes
The bytes making up the attachment data.
- property randomFilename: str
The random filename to be used by this attachment.
- property type: AttachmentType
An enum value that identifies the type of attachment.
- class extract_msg.AttachmentBase(msg: MSGFile, dir_: str, propStore: PropertiesStore)[source]
Bases:
ABC
The base class for all standard Attachments used by the module.
- Parameters:
msg – The MSGFile instance that the attachment belongs to.
dir_ – The directory inside the MSG file where the attachment is located.
propStore – The PropertiesStore instance for the attachment.
- exists(filename: str | List[str] | Tuple[str]) bool [source]
Checks if the stream exists inside the attachment folder.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- sExists(filename: str | List[str] | Tuple[str]) bool [source]
Checks if the string stream exists inside the attachment folder.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- existsTypedProperty(id, _type=None) Tuple[bool, int] [source]
Determines if the stream with the provided id exists.
This function returns 2 values: a bool indicating whether anything was found, and the number of matches that were found.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
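The (found, count) return shape can be illustrated over a plain list of stream names. This is a hypothetical sketch (exists_typed_property is not the library's code): it assumes stream names of the form "__substg1.0_" plus a 4-hex-digit ID and a 4-hex-digit type, and it ignores the "__properties_version1.0" stream that holds fixed-size properties.

```python
from typing import Iterable, Optional, Tuple

STREAM_PREFIX = "__substg1.0_"


def exists_typed_property(
    stream_names: Iterable[str], prop_id: str, prop_type: Optional[str] = None
) -> Tuple[bool, int]:
    """Count streams for a property ID, optionally restricted to one type."""
    wanted_id = prop_id.upper()
    count = 0
    for name in stream_names:
        if not name.startswith(STREAM_PREFIX):
            continue
        suffix = name[len(STREAM_PREFIX):]
        # Expect exactly 8 hex digits: 4 for the ID, then 4 for the type.
        if len(suffix) != 8 or suffix[:4].upper() != wanted_id:
            continue
        if prop_type is None or suffix[4:].upper() == prop_type.upper():
            count += 1
    return count > 0, count
```

With no type given, every typed variant of the ID is counted, which is why the count can exceed 1.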
- abstract getFilename(**kwargs) str [source]
Returns the filename to use for the attachment.
- Parameters:
contentId – Use the contentId, if available.
customFilename – A custom name to use for the file.
If the filename starts with “UnknownFilename” then there is no guarantee that the files will have exactly the same filename.
- getMultipleBinary(filename: str | List[str] | Tuple[str]) List[bytes] | None [source]
Gets a multiple binary property as a list of bytes objects.
Like getStringStream(), the 4-character type suffix should be omitted. So if you want the stream “__substg1.0_00011102” then the filename would simply be “__substg1.0_0001”.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
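The "omit the type suffix" convention can be made concrete by splitting a full stream name. The helper below is hypothetical; the type codes in the table are standard MAPI property types from [MS-OXCDATA], listed only for the handful used in this section.

```python
from typing import Tuple

# A few property type codes from [MS-OXCDATA], keyed by the 4-hex-digit suffix.
TYPE_NAMES = {
    "001E": "PtypString8",
    "001F": "PtypString",
    "0102": "PtypBinary",
    "101F": "PtypMultipleString",
    "1102": "PtypMultipleBinary",
}


def split_stream_name(full_name: str) -> Tuple[str, str]:
    """Split '__substg1.0_XXXXYYYY' into ('__substg1.0_XXXX', 'YYYY')."""
    # "__substg1.0_" is 12 characters, followed by 8 hex digits.
    if not full_name.startswith("__substg1.0_") or len(full_name) != 20:
        raise ValueError(f"unexpected stream name: {full_name!r}")
    return full_name[:-4], full_name[-4:].upper()
```

The first element of the result is what these getters expect as filename.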
- getMultipleString(filename: str | List[str] | Tuple[str]) List[str] | None [source]
Gets a multiple string property as a list of str objects.
Like getStringStream(), the 4-character type suffix should be omitted. So if you want the stream “__substg1.0_0001101F” then the filename would simply be “__substg1.0_0001”.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- getNamedAs(propertyName: str, guid: str, overrideClass: Type[_T] | Callable[[Any], _T]) _T | None [source]
Returns the named property, setting the class if specified.
- Parameters:
overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.
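The overrideClass rule shared by getNamedAs(), getPropertyAs(), getStreamAs() and getStringStreamAs() amounts to "call it only if something was found". A minimal sketch of that rule, as a hypothetical helper rather than the library's implementation:

```python
from typing import Any, Callable, Optional, Type, TypeVar, Union

_T = TypeVar("_T")


def apply_override(
    value: Any, overrideClass: Union[Type[_T], Callable[[Any], _T]]
) -> Optional[_T]:
    """Hypothetical helper: morph found data, pass missing data through."""
    if value is None:
        # Nothing was read, so the class/function is never called.
        return None
    # Works for classes (value goes to __init__) and plain functions alike.
    return overrideClass(value)
```

Because None short-circuits, overrideClass never has to handle a missing value itself.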
- getNamedProp(propertyName: str, guid: str, default: _T | None = None) Any | _T [source]
Equivalent to instance.namedProperties.get((propertyName, guid), default).
Can be overridden to create new behavior.
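The key point of getNamedProp() is that the lookup key is a (name, property set GUID) tuple. The sketch below uses a plain dict with hypothetical contents as a stand-in for namedProperties; the name and GUID are placeholders, not real named properties.

```python
from typing import Any, Dict, Tuple

# Hypothetical contents; real keys come from the MSG file's named property map.
EXAMPLE_GUID = "{00000000-0000-0000-0000-000000000000}"
named_properties: Dict[Tuple[str, str], Any] = {("ExampleName", EXAMPLE_GUID): 2}


def get_named_prop(propertyName: str, guid: str, default: Any = None) -> Any:
    # The lookup key is a (name, property set GUID) tuple, not a bare name.
    return named_properties.get((propertyName, guid), default)
```

The same name under a different GUID is a different property, so both halves of the key matter.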
- getPropertyAs(propertyName: int | str, overrideClass: Type[_T] | Callable[[Any], _T]) _T | None [source]
Returns the property, setting the class if found.
- Parameters:
overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.
- getPropertyVal(name: int | str, default: _T | None = None) Any | _T [source]
Equivalent to instance.props.getValue(name, default).
Can be overridden to create new behavior.
- getSingleOrMultipleBinary(filename: str | List[str] | Tuple[str]) List[bytes] | bytes | None [source]
A combination of getStream() and getMultipleBinary().
Checks to see if a single binary stream exists to return, otherwise tries to return the multiple binary stream of the same ID.
Like getStringStream(), the 4-character type suffix should be omitted. So if you want the stream “__substg1.0_00010102” then the filename would simply be “__substg1.0_0001”.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
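The single-then-multiple fallback can be sketched with two dicts standing in for the stream storage. This is a hypothetical illustration of the documented order, not the library's code; it assumes the single value lives under the 0102 (PtypBinary) suffix and the multi-valued one under 1102 (PtypMultipleBinary).

```python
from typing import Dict, List, Optional, Union


def get_single_or_multiple_binary(
    single: Dict[str, bytes], multiple: Dict[str, List[bytes]], filename: str
) -> Optional[Union[bytes, List[bytes]]]:
    """Hypothetical sketch of the documented fallback order.

    single maps full names ending in 0102 (PtypBinary) to their data;
    multiple maps full names ending in 1102 (PtypMultipleBinary) to value lists.
    """
    value = single.get(filename + "0102")
    if value is not None:
        return value
    return multiple.get(filename + "1102")
```

Callers must be ready for either a bytes object or a list, which is what the return annotation above expresses.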
- getSingleOrMultipleString(filename: str | List[str] | Tuple[str]) str | List[str] | None [source]
A combination of getStringStream() and getMultipleString().
Checks to see if a single string stream exists to return, otherwise tries to return the multiple string stream of the same ID.
Like getStringStream(), the 4-character type suffix should be omitted. So if you want the stream “__substg1.0_0001001F” then the filename would simply be “__substg1.0_0001”.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- getStream(filename: str | List[str] | Tuple[str]) bytes | None [source]
Gets a binary representation of the requested stream.
This should ALWAYS return a bytes object if it was found, otherwise returns None.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- getStreamAs(streamID: str | List[str] | Tuple[str], overrideClass: Type[_T] | Callable[[Any], _T]) _T | None [source]
Returns the specified stream, modifying it to the specified class if it is found.
- Parameters:
overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.
- getStringStream(filename: str | List[str] | Tuple[str]) str | None [source]
Gets a string representation of the requested stream.
Rather than the full filename, you should only feed this function the filename sans the type. So if the full name is “__substg1.0_001A001F”, the filename this function should receive should be “__substg1.0_001A”.
This should ALWAYS return a string if it was found, otherwise returns None.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
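A string stream name lacks its type because the same property may be stored as either of two string types. The helper below is hypothetical and assumes the choice is driven by areStringsUnicode: 001F is PtypString (UTF-16LE) and 001E is PtypString8 (8-bit, code page dependent), per [MS-OXCDATA].

```python
def string_stream_name(filename: str, areStringsUnicode: bool) -> str:
    """Hypothetical helper: append the string type suffix to a base name.

    Assumption: Unicode files use 001F (PtypString), others 001E (PtypString8).
    """
    return filename + ("001F" if areStringsUnicode else "001E")
```

This is why callers pass “__substg1.0_001A” rather than a fully typed name.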
- getStringStreamAs(streamID: str | List[str] | Tuple[str], overrideClass: Type[_T] | Callable[[Any], _T]) _T | None [source]
Returns the specified string stream, modifying it to the specified class if it is found.
- Parameters:
overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.
- listDir(streams: bool = True, storages: bool = False) List[List[str]] [source]
Lists the streams and/or storages that exist in the attachment directory.
Returns the paths excluding the attachment directory, allowing the paths to be directly used for accessing a stream.
- slistDir(streams: bool = True, storages: bool = False) List[str] [source]
Like listDir, except it returns the paths as strings.
- abstract save(**kwargs) Tuple[SaveType, List[str] | str | None] [source]
Saves the attachment data.
The name of the file is determined by the logic of getFilename(). If you are a developer, ensure that you use this behavior.
To change the directory that the attachment is saved to, set customPath when calling this function. The default save directory is the current working directory.
If you want to save the contents into a ZipFile or similar object, either pass a path to where you want one created or pass an instance to zip. If zip is an instance, customPath refers to a location inside the zip file.
- Parameters:
extractEmbedded – If True, causes the attachment, should it be an embedded MSG file, to be saved as a .msg file instead of calling its save function.
skipEmbedded – If True, skips saving this attachment if it is an embedded MSG file.
- Returns:
A tuple that specifies how the data was saved. The value of the first item specifies what the second value will be.
- property attachmentEncoding: bytes | None
The encoding information about the attachment object.
Will return b'\x2A\x86\x48\x86\xf7\x14\x03\x0b\x01' if encoded in MacBinary format, otherwise it is unset (None).
- property additionalInformation: str | None
The additional information about the attachment.
This property MUST be an empty string if attachmentEncoding is not set. Otherwise it MUST be set to a string of the format “:CREA:TYPE” where “:CREA” is the four-letter Macintosh file creator code and “:TYPE” is a four-letter Macintosh type code.
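Together, attachmentEncoding and additionalInformation identify a MacBinary attachment and its Macintosh codes. A hypothetical parser for the documented “:CREA:TYPE” format (parse_mac_info is not part of extract_msg):

```python
from typing import Optional, Tuple

# Value documented for attachmentEncoding when the data is MacBinary encoded.
MACBINARY_ENCODING = b"\x2A\x86\x48\x86\xf7\x14\x03\x0b\x01"


def parse_mac_info(
    attachmentEncoding: Optional[bytes], additionalInformation: Optional[str]
) -> Optional[Tuple[str, str]]:
    """Return (creator, type) codes for MacBinary attachments, else None."""
    if attachmentEncoding != MACBINARY_ENCODING or not additionalInformation:
        return None
    # Expected format ":CREA:TYPE" splits into ['', 'CREA', 'TYPE'].
    parts = additionalInformation.split(":")
    if len(parts) != 3 or parts[0] or len(parts[1]) != 4 or len(parts[2]) != 4:
        raise ValueError(f"malformed additionalInformation: {additionalInformation!r}")
    return parts[1], parts[2]
```

Non-MacBinary attachments short-circuit to None, matching the rule that the string is empty when attachmentEncoding is not set.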
- property cid: str | None
Returns the Content ID of the attachment, if it exists.
- property clsid: str
Returns the CLSID for the data stream/storage of the attachment.
- property createdAt: datetime | None
Alias of creationTime.
- property creationTime: datetime | None
The time the attachment was created.
- abstract property data: object | None
The attachment data, if any.
Returns None if there is no data to save.
- property dataType: Type[object] | None
The class that the data type will use, if it can be retrieved.
This is a safe way to do type checking on data before knowing if it will raise an exception. Returns None if no data will be returned or if an exception will be raised.
- property dir: str
Returns the directory inside the MSG file where the attachment is located.
- property displayName: str | None
Returns the display name of the attachment.
- property exceptionReplaceTime: datetime | None
The original date and time at which the instance in the recurrence pattern would have occurred if it were not an exception.
Only applicable if the attachment is an Exception object.
- property extension: str | None
The reported extension for the file.
- property hidden: bool
Indicates whether an Attachment object is hidden from the end user.
- property isAttachmentContactPhoto: bool
Whether the attachment is a contact photo for a Contact object.
- property lastModificationTime: datetime | None
The last time the attachment was modified.
- property longFilename: str | None
Returns the long file name of the attachment, if it exists.
- property longPathname: str | None
The fully qualified path and file name with extension.
- property mimetype: str | None
The content-type mime header of the attachment, if specified.
- property modifiedAt: datetime | None
Alias of lastModificationTime.
- property msg: MSGFile
Returns the MSGFile instance the attachment belongs to.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- property name: str | None
The best name available for the file.
Uses long filename before short.
- property namedProperties: NamedProperties
The NamedAttachmentProperties instance for this attachment.
- property payloadClass: str | None
The class name of an object that can display the contents of the message.
- property props: PropertiesStore
Returns the Properties instance of the attachment.
- property renderingPosition: int | None
The offset, in rendered characters, to use when rendering the attachment within the main message text.
A value of 0xFFFFFFFF indicates a hidden attachment that is not to be rendered.
- property shortFilename: str | None
The short file name of the attachment, if it exists.
- property treePath: List[weakref.ReferenceType[Any]]
A path, as a list of weak references to the instances needed to get to this instance through the MSGFile-Attachment tree.
- abstract property type: AttachmentType
An enum value that identifies the type of attachment.
- class extract_msg.Message(path, **kwargs)[source]
Bases:
MessageBase
Parser for Microsoft Outlook message files.
Supports all of the options from MSGFile.__init__() with some additional ones.
- Parameters:
recipientSeparator – Optional, separator string to use between recipients.
deencapsulationFunc – Optional, if specified must be a callable that will override the way that HTML/text is deencapsulated from the RTF body. This function must take exactly 2 arguments, the first being the RTF body from the message and the second being an instance of the enum DeencapType that will tell the function what type of body is desired. The function should return a string for plain text and bytes for HTML. If any problems occur, the function must either return None or raise one of the appropriate exceptions from extract_msg.exceptions. All other exceptions must be handled internally or they will not be caught. The original deencapsulation method will not run if this is set.
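The deencapsulationFunc contract (2 arguments, str for plain text, bytes for HTML, None on failure) can be enforced by a small wrapper. This is a sketch under assumptions: the DeencapType below is a local stand-in whose member names are guessed (use the real enum from extract_msg), the RTF body is assumed to be bytes, and the two converters are hypothetical functions you would supply. For simplicity the wrapper turns every failure into None rather than raising library exceptions.

```python
import enum
from typing import Callable, Optional, Union


class DeencapType(enum.Enum):
    """Local stand-in; use extract_msg's own DeencapType enum in real code.

    The member names here are assumptions made for this sketch.
    """

    PLAIN = 0
    HTML = 1


DeencapFunc = Callable[[bytes, DeencapType], Optional[Union[str, bytes]]]


def make_deencap_func(
    to_plain: Callable[[bytes], str], to_html: Callable[[bytes], bytes]
) -> DeencapFunc:
    """Wrap two hypothetical converters so they follow the documented contract."""

    def deencapsulate(rtfBody: bytes, deencapType: DeencapType) -> Optional[Union[str, bytes]]:
        try:
            if deencapType is DeencapType.PLAIN:
                return to_plain(rtfBody)  # plain text must be a str
            if deencapType is DeencapType.HTML:
                return to_html(rtfBody)  # HTML must be bytes
        except Exception:
            # Never let unexpected exceptions escape; report failure as None.
            return None
        return None

    return deencapsulate
```

The resulting callable would then be passed as the deencapsulationFunc keyword when constructing a Message.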
- filename: str | None
- class extract_msg.MSGFile(path, **kwargs)[source]
Bases:
object
Base handler for all .msg files.
- Parameters:
path – Path to the MSG file in the system or the bytes of the MSG file.
prefix – Used for extracting embedded MSG files inside the main one. Do not set manually unless you know what you are doing.
parentMsg – Used for synchronizing instances of shared objects. Do not set this unless you know what you are doing.
initAttachment – Optional, the method used when creating an attachment for an MSG file. MUST be a function that takes 2 arguments (the MSGFile instance and the directory in the MSG file where the attachment is) and returns an instance of AttachmentBase.
delayAttachments – Optional, delays the initialization of attachments until the user attempts to retrieve them. Allows MSG files with bad attachments to be initialized so the other data can be retrieved.
filename – Optional, the filename to be used by default when saving.
errorBehavior – Optional, the behavior to use in the event of certain types of errors. Uses the ErrorBehavior enum.
overrideEncoding – Optional, an encoding to use instead of the one specified by the MSG file. If the value is "chardet" and you have the chardet module installed, an attempt will be made to auto-detect the encoding based on some of the string properties. Do not report encoding errors caused by this.
treePath – Internal variable used for giving representation of the path, as a tuple of objects, of the MSGFile. When passing, this is the path to the parent object of this instance.
insecureFeatures – Optional, an enum value that specifies if certain insecure features should be enabled. These features should only be used on data that you trust. Uses the InsecureFeatures enum.
dateFormat – Optional, the format string to use for dates.
datetimeFormat – Optional, the format string to use for dates that include a time component.
- Raises:
InvalidFileFormatError – The file is not an OLE file or could not be parsed as an MSG file.
StandardViolationError – Some part of the file badly violates the standard.
IOError – There is an issue opening the MSG file.
NameError – The encoding provided is not supported.
PrefixError – The prefix is not a supported type.
TypeError – The parent is not an instance of MSGFile or a subclass.
ValueError – The path is invalid.
It’s recommended to check the error message to ensure you know why a specific exception was raised.
- filename: str | None
- exists(filename: str | List[str] | Tuple[str], prefix: bool = True) bool [source]
Checks if the stream exists in the MSG file.
- Parameters:
prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)
- sExists(filename: str | List[str] | Tuple[str], prefix: bool = True) bool [source]
Checks if string stream exists in the MSG file.
- Parameters:
prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)
- existsTypedProperty(_id: str, location=None, _type=None, prefix: bool = True, propertiesInstance: PropertiesStore | None = None) Tuple[bool, int] [source]
Determines if the stream with the provided id exists in the location specified.
If no location is specified, the root directory is searched. The return of this function is 2 values, the first being a boolean for if anything was found, and the second being how many were found.
Because of how this method works, any folder that contains its own “__properties_version1.0” file should have this method called from its class.
- Parameters:
prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)
- export(path) None [source]
Exports the contents of this MSG file to a new MSG file specified by the path given.
If this is an embedded MSG file, the embedded streams and directories will be added to it as if they were at the root, allowing you to save it as its own MSG file.
This function pulls directly from the source MSG file, so modifications to the properties of an MSGFile object (or one of its subclasses) will not be reflected in the saved file.
- Parameters:
path – A path-like object (including strings and pathlib.Path objects) or an IO device with a write method which accepts bytes.
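Accepting "a path-like object or an IO device with a write method" is a common Python pattern. The hypothetical write_output helper below shows one way to dispatch on it. It is not the library's export code, and it assumes the presence of a write attribute is enough to tell the two cases apart.

```python
import os
from typing import BinaryIO, Union


def write_output(target: Union[str, "os.PathLike[str]", BinaryIO], data: bytes) -> None:
    """Hypothetical helper: accept either a path-like object or a writable IO."""
    if hasattr(target, "write"):
        # An IO device with a write method that accepts bytes.
        target.write(data)
    else:
        # A str or pathlib.Path (any os.PathLike) is opened in binary mode.
        with open(target, "wb") as f:
            f.write(data)
```

Passing an io.BytesIO keeps the exported bytes in memory instead of writing a file.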
- fixPath(inp: str | List[str] | Tuple[str], prefix: bool = True) str [source]
Changes paths so that they have the proper prefix (should prefix be True) and are strings rather than lists or tuples.
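The normalization fixPath() describes can be sketched as "join list parts, then optionally prepend the prefix". This hypothetical fix_path takes the prefix explicitly. It assumes components are joined with "/" (the olefile path convention) and that the prefix is empty at the root or ends with "/" otherwise.

```python
from typing import List, Tuple, Union


def fix_path(
    inp: Union[str, List[str], Tuple[str, ...]], current_prefix: str, prefix: bool = True
) -> str:
    """Hypothetical sketch: normalize a stream path to a string, optionally prefixed.

    Assumes current_prefix is empty at the root or ends with '/' otherwise.
    """
    path = inp if isinstance(inp, str) else "/".join(inp)
    return current_prefix + path if prefix else path
```

For an embedded message, the prefix points at the embedded storage, so the same stream name resolves inside the child rather than at the root.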
- getMultipleBinary(filename: str | List[str] | Tuple[str], prefix: bool = True) List[bytes] | None [source]
Gets a multiple binary property as a list of bytes objects.
Like getStringStream(), the 4-character type suffix should be omitted. So if you want the stream “__substg1.0_00011102” then the filename would simply be “__substg1.0_0001”.
- Parameters:
prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)
- getMultipleString(filename: str | List[str] | Tuple[str], prefix: bool = True) List[str] | None [source]
Gets a multiple string property as a list of str objects.
Like getStringStream(), the 4-character type suffix should be omitted. So if you want the stream “__substg1.0_0001101F” then the filename would simply be “__substg1.0_0001”.
- Parameters:
prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)
- getNamedAs(propertyName: str, guid: str, overrideClass: Type[_T] | Callable[[Any], _T]) _T | None [source]
Returns the named property, setting the class if specified.
- Parameters:
overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.
- getNamedProp(propertyName: str, guid: str, default: _T | None = None) Any | _T [source]
Equivalent to instance.namedProperties.get((propertyName, guid), default).
Can be overridden to create new behavior.
- getPropertyAs(propertyName: int | str, overrideClass: Type[_T] | Callable[[Any], _T]) _T | None [source]
Returns the property, setting the class if found.
- Parameters:
overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.
- getPropertyVal(name: int | str, default: _T | None = None) Any | _T [source]
Equivalent to instance.props.getValue(name, default).
Can be overridden to create new behavior.
- getSingleOrMultipleBinary(filename: str | List[str] | Tuple[str], prefix: bool = True) List[bytes] | bytes | None [source]
Combination of getStream() and getMultipleBinary().
Checks to see if a single binary stream exists to return, otherwise tries to return the multiple binary stream of the same ID.
Like getStringStream(), the 4-character type suffix should be omitted. So if you want the stream “__substg1.0_00010102” then the filename would simply be “__substg1.0_0001”.
- Parameters:
prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)
- getSingleOrMultipleString(filename: str | List[str] | Tuple[str], prefix: bool = True) str | List[str] | None [source]
Combination of getStringStream() and getMultipleString().
Checks to see if a single string stream exists to return, otherwise tries to return the multiple string stream of the same ID.
Like getStringStream(), the 4-character type suffix should be omitted. So if you want the stream “__substg1.0_0001001F” then the filename would simply be “__substg1.0_0001”.
- Parameters:
prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)
- getStream(filename: str | List[str] | Tuple[str], prefix: bool = True) bytes | None [source]
Gets a binary representation of the requested stream.
This should ALWAYS return a bytes object if it was found, otherwise returns None.
- Parameters:
prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)
- getStreamAs(streamID: str | List[str] | Tuple[str], overrideClass: Type[_T] | Callable[[Any], _T]) _T | None [source]
Returns the specified stream, modifying it to the specified class if it is found.
- Parameters:
overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.
- getStringStream(filename: str | List[str] | Tuple[str], prefix: bool = True) str | None [source]
Gets a string representation of the requested stream.
Rather than the full filename, you should only feed this function the filename sans the type. So if the full name is “__substg1.0_001A001F”, the filename this function should receive should be “__substg1.0_001A”.
This should ALWAYS return a string if it was found, otherwise returns None.
- Parameters:
prefix – Bool, whether to search for the entry at the root of the MSG file (False) or look in the current child MSG file (True). (Default: True)
- getStringStreamAs(streamID: str | List[str] | Tuple[str], overrideClass: Type[_T] | Callable[[Any], _T]) _T | None [source]
Returns the specified string stream, modifying it to the specified class if it is found.
- Parameters:
overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.
- listDir(streams: bool = True, storages: bool = False, includePrefix: bool = True) List[List[str]] [source]
Replacement for OleFileIO.listdir that runs at the current prefix directory.
- Parameters:
includePrefix – If False, removes the part of the path that is the prefix.
- slistDir(streams: bool = True, storages: bool = False, includePrefix: bool = True) List[str] [source]
Replacement for OleFileIO.listdir that runs at the current prefix directory. Returns a list of strings instead of lists.
- saveAttachments(skipHidden: bool = False, **kwargs) None [source]
Saves only the attachments, all into the same folder.
- Parameters:
skipHidden – If True, skips attachments marked as hidden. (Default: False)
- property areStringsUnicode: bool
Whether the strings are Unicode encoded or not.
- property attachments: List[AttachmentBase] | List[SignedAttachment]
A list of all attachments.
- property attachmentsDelayed: bool
Returns True if the attachment initialization was delayed.
- property attachmentsReady: bool
Returns True if the attachments are ready to be used.
- property classified: bool
Indicates whether the contents of this message are regarded as classified information.
- property classType: str | None
The class type of the MSG file.
- property commonEnd: datetime | None
The end time for the object.
- property commonStart: datetime | None
The start time for the object.
- property contactLinkEntry: ContactLinkEntry | None
A class that contains the list of Address Book EntryIDs linked to this Message object.
- property contacts: List[str] | None
Contains the display name property of each Address Book EntryID referenced in the value of the contactLinkEntry property.
- property currentVersion: int | None
Specifies the build number of the client application that sent the message.
- property currentVersionName: str | None
Specifies the name of the client application that sent the message.
- property dateFormat: str
The format string to use when converting dates to strings.
This is used for dates with no time component.
- property datetimeFormat: str
The format string to use when converting datetimes to strings.
This is used for dates that have time components.
- property errorBehavior: ErrorBehavior
The behavior to follow when certain errors occur.
Will be an instance of the ErrorBehavior enum.
- property importance: Importance | None
The specified importance of the MSG file.
- property importanceString: str | None
The importance string to use for saving.
If the importance is medium then it returns None. Mainly used for saving.
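The "medium means None" rule can be sketched with a local enum. The Importance class below is a stand-in whose values follow PidTagImportance (0 low, 1 normal, 2 high); the output wording of importance_string is an assumption, and the library's actual strings may differ.

```python
import enum
from typing import Optional


class Importance(enum.IntEnum):
    """Local stand-in; values follow PidTagImportance (0 low, 1 normal, 2 high)."""

    LOW = 0
    MEDIUM = 1
    HIGH = 2


def importance_string(importance: Optional[Importance]) -> Optional[str]:
    """Hypothetical helper: only non-default importance is worth writing out."""
    if importance is None or importance is Importance.MEDIUM:
        return None
    # Assumption: the wording the library actually emits may differ.
    return f"{importance.name.title()} Importance"
```

Returning None for the default level lets a saver skip the line entirely instead of writing a redundant "normal" marker.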
- property initAttachmentFunc: Callable[[MSGFile, str], AttachmentBase]
The method for initializing attachments being used, should you need to use it externally for whatever reason.
- property insecureFeatures: InsecureFeatures
An enum specifying what insecure features have been enabled for this file.
- property kwargs: Dict[str, Any]
The kwargs used to initialize this message, excluding the prefix.
This is used for initializing embedded MSG files.
- property named: Named
The main named properties storage.
This is not usable to access the data of the properties directly.
- Raises:
ReferenceError – The parent MSGFile instance has been garbage collected.
- property namedProperties: NamedProperties
The NamedProperties instances usable to access the data for named properties.
- property overrideEncoding: str | None
None if the encoding has not been overridden, otherwise the encoding used for string streams.
- property path
The message path if generated from a file, otherwise the data used to generate the MSGFile instance.
- property prefix: str
The prefix of the MSGFile instance.
Intended for developer use.
- property prefixLen: int
The number of elements in the prefix.
Dividing by 2 will typically tell you how deeply nested the MSG file is.
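A rough illustration of why halving the prefix length approximates the nesting depth: each embedded MSG file typically sits under two storages, an attachment storage and the storage holding the embedded message, so each level adds two entries. The storage names below follow the MSG naming pattern, but the prefix list itself and the nesting_depth helper are hypothetical.

```python
from typing import List


def nesting_depth(prefix_list: List[str]) -> int:
    # Each embedded level typically adds an attachment storage plus the
    # storage holding the embedded message, so two entries per level.
    return len(prefix_list) // 2


# Hypothetical prefix list for an MSG file embedded one level deep.
prefix = ["__attach_version1.0_#00000000", "__substg1.0_3701000D"]
print(nesting_depth(prefix))  # 1
print(nesting_depth([]))      # 0 (top-level file)
```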
- property prefixList: List[str]
The prefix list of the Message instance.
Intended for developer use.
- property props: PropertiesStore
The PropertiesStore instance used by the MSGFile instance.
- property retentionDate: datetime | None
The date, in UTC, after which a Message object is expired by the server.
If None, the Message object never expires.
- property retentionFlags: RetentionFlags | None
Flags that specify the status or nature of an item’s retention tag or archive tag.
- property sensitivity: Sensitivity | None
The specified sensitivity of the MSG file.
- property sideEffects: SideEffect | None
Controls how a Message object is handled by the client in relation to certain user interface actions by the user, such as deleting a message.
- property stringEncoding: str
- property treePath: List[weakref.ReferenceType[Any]]
A path, as a list of weak references to the instances needed to get to this instance through the MSGFile-Attachment tree.
Weak references are used so that the references back to higher objects do not keep those objects from being garbage collected.
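The general weak-reference pattern behind treePath can be sketched with hypothetical stand-in classes (Node and resolve are not extract_msg API): each instance stores weak references to its ancestors and itself, and dereferencing a reference after its target has been collected yields None, which the sketch turns into the ReferenceError that the properties in this reference document.

```python
import gc
import weakref
from typing import Any, List, Optional


class Node:
    """Hypothetical stand-in for an MSGFile or attachment instance."""

    def __init__(self, parent: Optional["Node"] = None):
        # Copy the parent's path and append a weak reference to this instance,
        # so no node holds a strong reference to any of its ancestors.
        base = parent.treePath if parent is not None else []
        self.treePath: List[weakref.ref] = base + [weakref.ref(self)]


def resolve(path: List[weakref.ref]) -> List[Any]:
    """Dereference every entry, failing if an ancestor no longer exists."""
    objs = [ref() for ref in path]
    if any(obj is None for obj in objs):
        raise ReferenceError("an ancestor has been garbage collected")
    return objs


root = Node()
child = Node(root)
print(len(resolve(child.treePath)))  # 2
del root
gc.collect()
try:
    resolve(child.treePath)
except ReferenceError as exc:
    print("ReferenceError:", exc)
```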
- class extract_msg.Named(msg: MSGFile)[source]
Bases: object
Class for handling access to the named properties themselves.
- exists(filename: str | List[str] | Tuple[str]) bool [source]
Checks if a stream exists inside the named properties folder.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- get(propertyName: Tuple[str, str], default: _T | None = None) NamedPropertyBase | _T [source]
Tries to get a named property based on its key.
Returns the value of default if not found. The key is a tuple of the name and the property set GUID.
- getPropNameByStreamID(streamID: int | str) Tuple[str, str] | None [source]
Gets the name of a property (as a key for the internal dict) that is stored in the specified stream.
Useful for determining if a stream/property stream entry is a named property.
- Parameters:
streamID – A 4 hex character identifier that will be checked. May also be an integer that can convert to 4 hex characters.
- Returns:
The name, if the stream is a named property, otherwise None.
- Raises:
InvalidPropertyIdError – The Stream ID is invalid.
TypeError – The Stream ID is not a valid type.
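Because streamID may be either a 4 hex character string or an integer, callers may find it useful to normalize both forms to one spelling. The following is a minimal sketch; normalize_stream_id is a hypothetical helper, it raises ValueError where the documented method raises InvalidPropertyIdError, and the exact validation extract_msg performs may differ.

```python
import string
from typing import Union


def normalize_stream_id(stream_id: Union[int, str]) -> str:
    """Return the 4 character uppercase hex form of an int or hex string ID."""
    if isinstance(stream_id, int):
        if not 0 <= stream_id <= 0xFFFF:
            raise ValueError(f"stream ID out of range: {stream_id!r}")
        return f"{stream_id:04X}"
    if isinstance(stream_id, str):
        # Require exactly 4 hex digits; int(..., 16) alone would also accept
        # forms such as "0x1F" or "+1AB".
        if len(stream_id) != 4 or any(c not in string.hexdigits for c in stream_id):
            raise ValueError(f"stream ID must be 4 hex characters: {stream_id!r}")
        return stream_id.upper()
    raise TypeError(f"unsupported stream ID type: {type(stream_id).__name__}")


print(normalize_stream_id(0x8001))  # 8001
print(normalize_stream_id("80a1"))  # 80A1
```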
- getStream(filename: str | List[str] | Tuple[str]) bytes | None [source]
Gets a binary representation of the requested stream.
This should ALWAYS return a bytes object if it was found, otherwise returns None.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- items() Iterable[Tuple[Tuple[str, str], NamedPropertyBase]] [source]
- values() Iterable[NamedPropertyBase] [source]
- property dir
Returns the directory inside the MSG file where the named properties are located.
- property msg: MSGFile
Returns the MSGFile instance these named properties belong to.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- property namedProperties: Dict[Tuple[str, str], NamedPropertyBase]
Returns a copy of the dictionary containing all the named properties.
- class extract_msg.NamedProperties(named: Named, streamSource: MSGFile | AttachmentBase)[source]
Bases: object
An instance that uses a Named instance and an extract-msg class to read the data of named properties.
- Parameters:
named – The Named instance to refer to for named properties entries.
streamSource – The source to use for acquiring the data of a named property.
- get(item: Tuple[str, str] | NamedPropertyBase, default: _T | None = None) Any | _T [source]
Gets a named property, returning the value of default if not found. Item must be either a NamedPropertyBase instance or a tuple with 2 items: the name and the GUID string.
- Raises:
ReferenceError – The associated instance for getting actual property data has been garbage collected.
- class extract_msg.OleWriter(rootClsid: bytes = b'\x0b\r\x02\x00\x00\x00\x00\x00\xc0\x00\x00\x00\x00\x00\x00F')[source]
Bases: object
Takes data to write to a compound binary format file, as specified in [MS-CFB].
- addEntry(path: str | List[str] | Tuple[str], data: bytes | SupportsBytes | None = None, storage: bool = False, **kwargs) None [source]
Adds an entry to the OleWriter instance at the path specified, adding storages with default settings where necessary. If the entry is not a storage, :param data: must be set.
- Parameters:
path – The path to add the entry at. Must not contain a path part that is an already added stream.
data – The bytes for a stream or an object with a __bytes__ method.
storage – If True, the entry to add is a storage. Otherwise, the entry is a stream.
clsid – The CLSID for the stream/storage. Must be a bytes instance that is 16 bytes long.
creationTime – An 8 byte filetime int. Sets the creation time of the entry. Not applicable to streams.
modifiedTime – An 8 byte filetime int. Sets the modification time of the entry. Not applicable to streams.
stateBits – A 4 byte int. Sets the state bits, user-defined flags, of the entry. For a stream, this SHOULD be unset.
- Raises:
OSError – A stream was found on the path before the end or an entry with the same name already exists.
ValueError – Attempts to access an internal item.
ValueError – The data provided is too large.
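The creationTime and modifiedTime parameters take a FILETIME integer: the number of 100-nanosecond intervals since 1601-01-01 UTC. The sketch below converts a timezone-aware datetime into that form; to_filetime is a hypothetical helper, not part of extract_msg.

```python
from datetime import datetime, timezone

_FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def to_filetime(dt: datetime) -> int:
    """Convert an aware datetime into a FILETIME (100 ns ticks since 1601-01-01 UTC)."""
    if dt.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    delta = dt - _FILETIME_EPOCH
    # Integer arithmetic avoids the rounding error of total_seconds().
    ticks = (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
    if not 0 <= ticks < 2 ** 64:
        raise ValueError("datetime is outside the 8 byte FILETIME range")
    return ticks


print(to_filetime(datetime(1970, 1, 1, tzinfo=timezone.utc)))  # 116444736000000000
```

The resulting int could then be passed as creationTime or modifiedTime when adding a storage entry.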
- addOleEntry(path: str | List[str] | Tuple[str], entry: OleDirectoryEntry, data: bytes | SupportsBytes | None = None) None [source]
Uses the entry provided to add the data to the writer.
- Raises:
OSError – Tried to add an entry to a path that has not yet been added, tried to add as a child of a stream, or tried to add an entry where one already exists under the same name.
ValueError – The data provided is too large.
- deleteEntry(path) None [source]
Deletes the entry specified by path, including all children.
- Raises:
OSError – If the entry does not exist or a part of the path that is not the last was a stream.
ValueError – Attempted to delete an internal data stream.
- editEntry(path: str | List[str] | Tuple[str], **kwargs) None [source]
Edits the values of an entry by setting the corresponding kwargs. Only kwargs set to something other than None are applied.
- Parameters:
data – The data of a stream. Will error if used for something other than a stream. Must be bytes or convertible to bytes.
clsid – The CLSID for the stream/storage. Must be a bytes instance that is 16 bytes long.
creationTime – An 8 byte filetime int. Sets the creation time of the entry. Not applicable to streams.
modifiedTime – An 8 byte filetime int. Sets the modification time of the entry. Not applicable to streams.
stateBits – A 4 byte int. Sets the state bits, user-defined flags, of the entry. For a stream, this SHOULD be unset.
To convert a 32 character hexadecimal CLSID into the bytes for this function, the _unClsid function in the ole_writer submodule can be used.
- Raises:
OSError – The entry does not exist in the file.
TypeError – Attempted to modify the bytes of a storage.
ValueError – The type of a parameter was wrong, or the data of a parameter was invalid.
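The private _unClsid helper is mentioned above but its exact behavior is not documented here. As an alternative using only the standard library: CLSIDs in compound file directory entries use the mixed-endian GUID layout, which Python's uuid module produces via UUID.bytes_le. As a check, the MSG CLSID {00020D0B-0000-0000-C000-000000000046} converts to exactly the default rootClsid shown in the OleWriter signature. The clsid_to_bytes name is hypothetical.

```python
import uuid

# Default rootClsid from the OleWriter signature.
DEFAULT_ROOT_CLSID = b"\x0b\r\x02\x00\x00\x00\x00\x00\xc0\x00\x00\x00\x00\x00\x00F"


def clsid_to_bytes(clsid: str) -> bytes:
    """Convert a CLSID string (braces and dashes optional) to its 16 byte form."""
    # bytes_le stores the first three GUID fields little-endian and the
    # last 8 bytes as-is, which is the on-disk GUID layout.
    return uuid.UUID(clsid).bytes_le


print(clsid_to_bytes("00020D0B-0000-0000-C000-000000000046") == DEFAULT_ROOT_CLSID)  # True
```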
- fromMsg(msg: MSGFile) None [source]
Copies the streams and stream information necessary from the MSG file.
- fromOleFile(ole: OleFileIO, rootPath: str | List[str] | Tuple[str] = []) None [source]
Copies all the streams from the provided OLE file into this writer.
NOTE: This method does not handle any special rules that may be required by a format that uses the compound binary file format as a base when extracting an embedded directory. For example, MSG files require modification of an embedded properties stream when extracting an embedded MSG file.
- Parameters:
rootPath – A path (accepted by olefile.OleFileIO) to the directory to use as the root of the file. If not provided, the file root will be used.
- Raises:
OSError – If rootPath does not exist in the file.
- getEntry(path: str | List[str] | Tuple[str]) DirectoryEntry [source]
Finds and returns a copy of an existing DirectoryEntry instance in the writer. Use this method to check the internal status of an entry.
- Raises:
OSError – If the entry does not exist.
ValueError – If access to an internal item is attempted.
- listItems(streams: bool = True, storages: bool = False) List[List[str]] [source]
Returns a list of the specified items currently in the writer.
- Parameters:
streams – If True, includes the path for each stream in the list.
storages – If True, includes the path for each storage in the list.
- renameEntry(path: str | List[str] | Tuple[str], newName: str) None [source]
Changes the name of an entry, leaving it in its current position.
- Raises:
OSError – If the entry does not exist or an entry with the new name already exists.
ValueError – If access to an internal item is attempted or the new name provided is invalid.
- walk() Iterator[Tuple[List[str], List[str], List[str]]] [source]
Functional equivalent to os.walk, but for going over the file structure of the OLE file to be written. Unlike os.walk, it takes no arguments.
- Returns:
An iterator of tuples of three lists. The first is the path, as a list of strings, for the directory (or an empty list for the root), the second is a list of the storages in the current directory, and the last is a list of the streams. Streams and storages are sorted caselessly.
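The walk() contract can be sketched over a plain nested dict in which a dict value is a storage and a bytes value is a stream. This is a hypothetical model for illustration, not the writer's internal representation, and the order in which this sketch visits storages (root first, then each storage depth-first) is an assumption. As documented, both lists are sorted caselessly.

```python
from typing import Dict, Iterator, List, Optional, Tuple, Union

# Hypothetical model: a dict is a storage, bytes is a stream.
Tree = Dict[str, Union["Tree", bytes]]


def walk(tree: Tree, path: Optional[List[str]] = None) -> Iterator[Tuple[List[str], List[str], List[str]]]:
    """Yield (path, storages, streams) for each storage, starting at the root."""
    path = path or []
    storages = sorted((k for k, v in tree.items() if isinstance(v, dict)), key=str.lower)
    streams = sorted((k for k, v in tree.items() if not isinstance(v, dict)), key=str.lower)
    yield path, storages, streams
    for name in storages:
        yield from walk(tree[name], path + [name])


tree = {
    "__substg1.0_0037001F": b"",
    "__nameid_version1.0": {"__substg1.0_00020102": b""},
    "__properties_version1.0": b"",
}
for entry in walk(tree):
    print(entry)
```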
- class extract_msg.PropertiesStore(data: bytes | None, type_: PropertiesType, writable: bool = False)[source]
Bases: object
Parser for msg properties files.
Reads a properties stream or creates a brand new PropertiesStore object.
- Parameters:
data – The bytes of the properties instance. Setting to None or empty bytes will cause the properties instance to not be valid unless writable is set to True. If that is the case, the instance will be set up for creating a new properties stream.
type_ – The type of properties stream this instance represents.
writable – Whether this properties stream should accept modification.
- addProperty(prop: PropBase, force: bool = False) None [source]
Adds the property if it does not exist.
- Parameters:
prop – The property to add.
force – If True, the writable property will be ignored. This will not be reflected when converting to bytes if the instance is not readable.
- Raises:
KeyError – A property already exists with the chosen name.
NotWritableError – The method was used on an unwritable instance.
- get(name: str | int, default: _T | None = None) PropBase | _T [source]
Retrieves the property identified by name.
- Returns:
The property, or the value of :param default: if the property could not be found.
- getProperties(id_: str | int) List[PropBase] [source]
Gets all properties with the specified ID.
- Parameters:
id_ – A 4 digit hexadecimal string or an int that is less than 0x10000.
- getValue(name: str | int, default: _T | None = None) Any | _T [source]
Attempts to get the value of the first property matching name, returning the value of default if no property is found.
- makeWritable() PropertiesStore [source]
Returns a copy of this PropertiesStore object that allows modification.
If the instance is already writable, this returns the same object.
- removeProperty(nameOrProp: str | PropBase) None [source]
Removes the property by name or by instance.
Due to possible ambiguities, this function does not accept an int argument nor will it be able to find a property based on the 4 character hex ID.
- Raises:
KeyError – The property was not found.
NotWritableError – The instance is not writable.
TypeError – The type of nameOrProp was wrong.
- property attachmentCount: int
The number of Attachment objects for the MSGFile object.
- Raises:
NotWritableError – The setter was used on an unwritable instance.
TypeError – The Properties instance is not for an MSGFile object.
- property date: datetime | None
Returns the send date contained in the Properties file.
- property isError: bool
Whether the instance is in an invalid state.
If the instance is not writable and was given no data, this will be True.
- property nextAttachmentId: int
The ID to use for naming the next Attachment object storage if one is created inside the .msg file.
- Raises:
NotWritableError – The setter was used on an unwritable instance.
TypeError – The Properties instance is not for an MSGFile object.
- property nextRecipientId: int
The ID to use for naming the next Recipient object storage if one is created inside the .msg file.
- Raises:
NotWritableError – The setter was used on an unwritable instance.
TypeError – The Properties instance is not for an MSGFile object.
- property recipientCount: int
The number of Recipient objects for the MSGFile object.
- Raises:
NotWritableError – The setter was used on an unwritable instance.
TypeError – The Properties instance is not for an MSGFile object.
- property writable: bool
Whether the instance accepts modification.
- class extract_msg.Recipient(_dir: str, msg: MSGFile, recipientTypeClass: Type[_RT])[source]
Bases: Generic[_RT]
Contains the data of one of the recipients in an MSG file.
- exists(filename: str | List[str] | Tuple[str]) bool [source]
Checks if a stream exists inside the recipient folder.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- sExists(filename: str | List[str] | Tuple[str]) bool [source]
Checks if the string stream exists inside the recipient folder.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- existsTypedProperty(id, _type=None) Tuple[bool, int] [source]
Determines if the stream with the provided id exists.
This function returns 2 values: a boolean indicating whether anything was found, and the number of matches found.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- getMultipleBinary(filename: str | List[str] | Tuple[str]) List[bytes] | None [source]
Gets a multiple binary property as a list of bytes objects.
Like getStringStream(), the 4 character type suffix should be omitted. So if you want the stream “__substg1.0_00011102” then the filename would simply be “__substg1.0_0001”.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- getMultipleString(filename: str | List[str] | Tuple[str]) List[str] | None [source]
Gets a multiple string property as a list of str objects.
Like getStringStream(), the 4 character type suffix should be omitted. So if you want the stream “__substg1.0_0001101F” then the filename would simply be “__substg1.0_0001”.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- getPropertyAs(propertyName: int | str, overrideClass: Type[_T] | Callable[[Any], _T]) _T | None [source]
Returns the property, setting the class if found.
- Parameters:
overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.
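The overrideClass contract (the value read becomes the single argument, and nothing is called when the value is None) can be illustrated with a stand-alone helper. get_as and the property dict below are hypothetical stand-ins, not extract_msg API; the property name is only an example.

```python
from typing import Any, Callable, Dict, Optional, TypeVar

T = TypeVar("T")


def get_as(props: Dict[str, Any], name: str, override: Callable[[Any], T]) -> Optional[T]:
    value = props.get(name)
    # Mirror the documented behavior: skip the conversion entirely for None.
    return None if value is None else override(value)


props = {"0E080003": 2048}  # hypothetical property name and value
print(get_as(props, "0E080003", hex))  # 0x800
print(get_as(props, "missing", hex))   # None
```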
- getPropertyVal(name: int | str, default: _T | None = None) Any | _T [source]
Shortcut for instance.props.getValue(name, default).
Can be overridden to create new behavior.
- getSingleOrMultipleBinary(filename: str | List[str] | Tuple[str]) List[bytes] | bytes | None [source]
A combination of getStream() and getMultipleBinary().
Checks to see if a single binary stream exists to return, otherwise tries to return the multiple binary stream of the same ID.
Like getStringStream(), the 4 character type suffix should be omitted. So if you want the stream “__substg1.0_00010102” then the filename would simply be “__substg1.0_0001”.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- getSingleOrMultipleString(filename: str | List[str] | Tuple[str]) str | List[str] | None [source]
A combination of getStringStream() and getMultipleString().
Checks to see if a single string stream exists to return, otherwise tries to return the multiple string stream of the same ID.
Like getStringStream(), the 4 character type suffix should be omitted. So if you want the stream “__substg1.0_0001001F” then the filename would simply be “__substg1.0_0001”.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- getStream(filename: str | List[str] | Tuple[str]) bytes | None [source]
Gets a binary representation of the requested stream.
This should ALWAYS return a bytes object if it was found, otherwise returns None.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- getStreamAs(streamID: str | List[str] | Tuple[str], overrideClass: Type[_T] | Callable[[Any], _T]) _T | None [source]
Returns the specified stream, modifying it to the specified class if it is found.
- Parameters:
overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.
- getStringStream(filename: str | List[str] | Tuple[str]) str | None [source]
Gets a string representation of the requested stream.
Rather than the full filename, you should only feed this function the filename sans the type. So if the full name is “__substg1.0_001A001F”, the filename this function should receive should be “__substg1.0_001A”.
This should ALWAYS return a string if it was found, otherwise returns None.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
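Property stream names such as “__substg1.0_001A001F” end in 8 hex characters: a 4 character property ID followed by a 4 character property type. The hypothetical helper below splits a bare stream name (no storage path) into the type-less form these getters expect plus the type suffix.

```python
from typing import Tuple

_PREFIX = "__substg1.0_"


def split_stream_name(full_name: str) -> Tuple[str, str]:
    """Split '__substg1.0_XXXXYYYY' into ('__substg1.0_XXXX', 'YYYY')."""
    if not full_name.startswith(_PREFIX) or len(full_name) != len(_PREFIX) + 8:
        raise ValueError(f"not a property stream name: {full_name!r}")
    return full_name[:-4], full_name[-4:].upper()


print(split_stream_name("__substg1.0_001A001F"))  # ('__substg1.0_001A', '001F')
```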
- getStringStreamAs(streamID: str | List[str] | Tuple[str], overrideClass: Type[_T] | Callable[[Any], _T]) _T | None [source]
Returns the specified string stream, modifying it to the specified class if it is found.
- Parameters:
overrideClass – Class/function to use to morph the data that was read. The data will be the first argument to the class’s __init__ method or the function itself, if that is what is provided. If the value is None, this function is not called. If you want it to be called regardless, you should handle the data directly.
- listDir(streams: bool = True, storages: bool = False) List[List[str]] [source]
Lists the streams and/or storages that exist in the recipient directory.
- Returns:
The paths excluding the recipient directory, allowing the paths to be directly used for accessing a file.
- slistDir(streams: bool = True, storages: bool = False) List[str] [source]
Like listDir(), except it returns the paths as strings.
- property account: str | None
The account of this recipient.
- property email: str | None
The recipient’s email.
- property entryID: PermanentEntryID | None
The recipient’s Entry ID.
- property formatted: str
The formatted recipient string.
- property instanceKey: bytes | None
The instance key of this recipient.
- property name: str | None
The recipient’s name.
- property props: PropertiesStore
The Properties instance of the recipient.
- property recordKey: bytes | None
The record key of this recipient.
- property searchKey: bytes | None
The search key of this recipient.
- property smtpAddress: str | None
The SMTP address of this recipient.
- property transmittableDisplayName: str | None
The transmittable display name of this recipient.
- property type: _RT
The recipient type.
- property typeFlags: int
The raw recipient type value and all the flags it includes.
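typeFlags keeps the raw value, while type holds only the recipient type. A sketch of separating the two, assuming (this is not stated above) that the type occupies the low four bits, as with the MAPI values 1 (To), 2 (CC) and 3 (BCC), and that the remaining bits are flags. RecipientType and split_type_flags are hypothetical stand-ins for the library's _RT enum and internals.

```python
from enum import IntEnum
from typing import Tuple


class RecipientType(IntEnum):
    # Hypothetical stand-in for the _RT enum; values follow MAPI's To/CC/BCC.
    TO = 1
    CC = 2
    BCC = 3


def split_type_flags(type_flags: int) -> Tuple[RecipientType, int]:
    # Assumption: the type lives in the low 4 bits; everything else is a flag.
    return RecipientType(type_flags & 0xF), type_flags & ~0xF


rtype, flags = split_type_flags(0x10000002)
print(rtype.name, hex(flags))  # CC 0x10000000
```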
- class extract_msg.SignedAttachment(msg, data: bytes, name: str, mimetype: str, node: Message)[source]
Bases: object
- Parameters:
msg – The MSGFile instance this attachment is associated with.
data – The bytes that compose this attachment.
name – The reported name of the attachment.
mimetype – The reported mimetype of the attachment.
node – The email Message instance for this node.
- save(**kwargs) Tuple[SaveType, List[str] | str | None] [source]
Saves the attachment data.
The name of the file is determined by several factors. If you have provided customFilename to this function, that name will be used. Otherwise, the value of name will be used. After the name has been determined, it will be shortened so that it is not longer than the value of maxNameLength.
To change the directory that the attachment is saved to, set customPath when calling this function. The default save directory is the working directory.
If you want to save the contents into a ZipFile or similar object, either pass a path to where you want to create one or pass an instance to zip. If zip is an instance, customPath will refer to a location inside the zip file.
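The documented naming order (customFilename if given, otherwise name, then shorten to maxNameLength) can be sketched as below. How extract_msg actually shortens names is not described here; this sketch assumes the extension should be kept and the stem trimmed, and choose_filename is a hypothetical helper.

```python
import os
from typing import Optional


def choose_filename(name: str, max_name_length: int, custom_filename: Optional[str] = None) -> str:
    filename = custom_filename or name
    if len(filename) <= max_name_length:
        return filename
    stem, ext = os.path.splitext(filename)
    # Assumption: keep the extension and trim the stem, if the extension fits.
    if len(ext) < max_name_length:
        return stem[: max_name_length - len(ext)] + ext
    return filename[:max_name_length]


print(choose_filename("quarterly-report-final.pdf", 12))  # quarterl.pdf
print(choose_filename("a.txt", 12, "custom.txt"))         # custom.txt
```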
- saveEmbededMessage(**kwargs) Tuple[SaveType, List[str] | str | None] [source]
Separate function from save() so that it can easily be overridden by a subclass.
- property asBytes: bytes
- property dataType: Type[type] | None
The class that the data type will use, if it can be retrieved.
This is a safe way to do type checking on data before knowing if it will raise an exception. Returns None if no data will be returned or if an exception will be raised.
- property emailMessage: Message
The email Message instance that is the source for this attachment.
- property mimetype: str
The reported mimetype of the attachment.
- property msg: MSGFile
The MSGFile instance this attachment belongs to.
- Raises:
ReferenceError – The associated MSGFile instance has been garbage collected.
- property name: str
The reported name of this attachment.
- property longFilename: str
The reported name of this attachment.
- property shortFilename: str
The reported name of this attachment.
- property treePath: List[weakref]
A path, as a list of weak references to the instances needed to get to this instance through the MSGFile-Attachment tree.
- property type: AttachmentType
- extract_msg.openMsg(path, **kwargs) MSGFile [source]
Function to automatically open an MSG file and detect what type it is.
Accepts all of the same arguments as the __init__ method for the class it creates. Extra options will be ignored if the class doesn’t know what to do with them, but child instances may end up using them if they understand them. See MSGFile.__init__ for a list of all globally recognized options.
- Parameters:
strict – If set to True, this function will raise an exception when it cannot identify what MSGFile derivative to use. Otherwise, it will log the error and return a basic MSGFile instance. Default is True.
- Raises:
UnsupportedMSGTypeError – The type is recognized but not supported.
UnrecognizedMSGTypeError – The type is not recognized.
- extract_msg.openMsgBulk(path, **kwargs) List[MSGFile] | Tuple[Exception, str | bytes] [source]
Takes the same arguments as openMsg, but opens a collection of MSG files based on a wildcard. Returns a list if successful, otherwise returns a tuple.
- Parameters:
ignoreFailures – If this is True, will return a list of all successful files, ignoring any failures. Otherwise, will close all files that opened successfully and return a tuple of the exception and the path of the file that failed.
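The documented return contract (a list on success; otherwise close everything already opened and return a tuple of the exception and the failing path, unless ignoreFailures is set) can be sketched generically with glob. open_bulk, its opener argument and FakeMsg are hypothetical; in practice the opener would be something like openMsg, and the opened objects are assumed to have a close() method.

```python
import glob
import os
import tempfile
from typing import Any, Callable, List, Tuple, Union


def open_bulk(pattern: str, opener: Callable[[str], Any],
              ignore_failures: bool = False) -> Union[List[Any], Tuple[Exception, str]]:
    """Open every file matching a wildcard pattern, following the contract above."""
    opened: List[Any] = []
    for path in sorted(glob.glob(pattern)):  # sorted for a deterministic order
        try:
            opened.append(opener(path))
        except Exception as exc:
            if ignore_failures:
                continue  # skip the failure and keep the successes
            for obj in opened:
                obj.close()  # release everything opened so far
            return exc, path
    return opened


class FakeMsg:
    """Hypothetical opener used only for this demonstration."""

    def __init__(self, path: str):
        if os.path.basename(path) == "bad.msg":
            raise ValueError("corrupt file")
        self.path, self.closed = path, False

    def close(self) -> None:
        self.closed = True


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.msg", "bad.msg", "c.msg"):
        open(os.path.join(tmp, name), "wb").close()
    pattern = os.path.join(tmp, "*.msg")
    print(len(open_bulk(pattern, FakeMsg, ignore_failures=True)))  # 2
    exc, failed = open_bulk(pattern, FakeMsg)
    print(type(exc).__name__, os.path.basename(failed))  # ValueError bad.msg
```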