I want to automate desktop activities in a Windows environment using Python. How can it be done? Some examples would also be helpful.
By desktop activities, I mean acti
There are different ways of automating user interfaces in Windows that can be accessed via Python (using ctypes or some of the Python Windows bindings):
Raw Windows APIs -- Get/SetCursorPos for the mouse, HWND APIs like GetFocus and GetForegroundWindow
AutoIt -- an automation scripting language: Calling AutoIt Functions in Python
Microsoft Active Accessibility (MSAA) / WinEvent -- an API for interrogating a UI through the accessibility APIs in Win95.
UI/Automation (UIA) -- a replacement for MSAA introduced in Vista (available for XP SP3 IIRC).
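As a rough illustration of the first option (raw Windows APIs through ctypes), here is a minimal sketch. The helper names get_cursor_pos and nudge_cursor are made up for this example; GetCursorPos, SetCursorPos and GetForegroundWindow are the real user32 functions. The user32 object is passed in as a parameter so the helpers can be exercised without a real desktop.

```python
import ctypes
import sys


class POINT(ctypes.Structure):
    """Mirror of the Win32 POINT struct that GetCursorPos fills in."""
    _fields_ = [("x", ctypes.c_long), ("y", ctypes.c_long)]


def get_cursor_pos(user32):
    """Return the current cursor position as an (x, y) tuple."""
    pt = POINT()
    # A pointer to the struct is passed so the API can write into it.
    if not user32.GetCursorPos(ctypes.pointer(pt)):
        raise OSError("GetCursorPos failed")
    return pt.x, pt.y


def nudge_cursor(user32, dx, dy):
    """Move the cursor by (dx, dy) pixels and return the target position."""
    x, y = get_cursor_pos(user32)
    target = (x + dx, y + dy)
    if not user32.SetCursorPos(*target):
        raise OSError("SetCursorPos failed")
    return target


if __name__ == "__main__" and sys.platform == "win32":
    user32 = ctypes.windll.user32
    # HWNDs are pointer-sized, so ask ctypes for a pointer-sized result.
    user32.GetForegroundWindow.restype = ctypes.c_void_p
    print("cursor at", get_cursor_pos(user32))
    print("foreground window handle:", user32.GetForegroundWindow())
    nudge_cursor(user32, 10, 0)
```

The part under the platform check only runs on Windows; everywhere else the module just defines the helpers.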
Automating a user interface to test it is a non-trivial task. There are a lot of gotchas that can trip you up.
I would suggest testing your automation framework in an automated way so you can verify that it works on the platforms you are testing (to identify failures in the automation API vs failures in the application).
Another consideration is how to deal with localization. Note also that the names for Minimize/Maximize/... are localized as well, and can be in a different language to the application (system vs. user locale)!
In pseudo-code, an MSAA program to minimize an application would look something like:
window = AccessibleObjectFromWindow(FindWindow("My Window"))
titlebar = [x for x in window.AccessibleChildren if x.accRole == TitleBar]
minimize = [x for x in titlebar[0].AccessibleChildren if x.Name == "Minimize"]
if len(minimize) != 0: # may already be minimized
    minimize[0].accDoDefaultAction()
MSAA accessible items are stored as (object: IAccessible, childId: int) pairs. Care is needed here to get the calls correct (e.g. get_accChildCount only uses the IAccessible, so when childId is not 0 you must return 0 instead of calling get_accChildCount)!
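One way to keep the (object, childId) pairing straight is to wrap it in a small class. The sketch below is only an illustration: AccessibleItem is a made-up name, and it assumes the wrapped object exposes the child count as an accChildCount attribute, which depends on the COM binding you use.

```python
from dataclasses import dataclass
from typing import Any

CHILDID_SELF = 0  # MSAA child ID that refers to the object itself


@dataclass(frozen=True)
class AccessibleItem:
    """An MSAA element: an IAccessible-like object plus a child ID."""
    obj: Any                      # assumed to expose accChildCount
    child_id: int = CHILDID_SELF

    def child_count(self) -> int:
        # A non-zero child ID names a simple element inside obj. It has no
        # IAccessible of its own, so it cannot have children: report 0
        # rather than asking the parent object for its count.
        if self.child_id != CHILDID_SELF:
            return 0
        return self.obj.accChildCount
```

With this in place, code that walks the tree can call child_count() on every item without first checking whether it is a full object or a simple element.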
IAccessible calls can return different error codes to indicate "this object does not support this property" -- e.g. DISP_E_MEMBERNOTFOUND or E_NOTIMPL.
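A minimal sketch of treating those codes uniformly is shown below. The helper names are hypothetical, and it assumes your COM binding raises an exception carrying the HRESULT in an hresult attribute (check how your binding actually reports it). The two numeric values are the standard HRESULTs for these codes.

```python
DISP_E_MEMBERNOTFOUND = 0x80020003
E_NOTIMPL = 0x80004001
NOT_SUPPORTED = {DISP_E_MEMBERNOTFOUND, E_NOTIMPL}


def is_not_supported(hresult: int) -> bool:
    """True if the HRESULT means 'property not supported'."""
    # Bindings often report HRESULTs as signed 32-bit integers, so
    # normalise to the unsigned form before comparing.
    return (hresult & 0xFFFFFFFF) in NOT_SUPPORTED


def get_property(getter, default=None):
    """Call getter(); map 'not supported' errors to default."""
    try:
        return getter()
    except Exception as exc:
        hresult = getattr(exc, "hresult", None)  # assumed attribute name
        if hresult is not None and is_not_supported(hresult):
            return default
        raise  # anything else is a real failure
```

A call would then look like get_property(lambda: item.accName, default=""), where item.accName stands for whatever property accessor your binding provides.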
Be aware of the state of the window. If the window is maximized then minimized, restore will restore the window to its maximized state, so you need to restore it again to get it back to the normal/windowed state.
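The double restore can be sketched with the Win32 calls IsIconic, IsZoomed and ShowWindow. The function name restore_to_normal and the max_steps parameter are made up for this example, and the user32 object is passed in so the logic can be tried against a stand-in.

```python
import ctypes
import sys

SW_RESTORE = 9  # ShowWindow command: restore a minimized or maximized window


def restore_to_normal(user32, hwnd, max_steps=2):
    """Restore hwnd until it is neither minimized nor maximized.

    Returns True if the window ended up in the normal state.
    """
    for _ in range(max_steps):
        if not (user32.IsIconic(hwnd) or user32.IsZoomed(hwnd)):
            return True
        # Minimized-from-maximized goes back to maximized first, so a
        # second pass through the loop may be needed.
        user32.ShowWindow(hwnd, SW_RESTORE)
    return not (user32.IsIconic(hwnd) or user32.IsZoomed(hwnd))


if __name__ == "__main__" and sys.platform == "win32":
    user32 = ctypes.windll.user32
    user32.GetForegroundWindow.restype = ctypes.c_void_p
    restore_to_normal(user32, user32.GetForegroundWindow())
```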
The MSAA and UIA APIs don't support right mouse button clicks, so you need to use a Win32 API to trigger it.
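One possible Win32 route is SetCursorPos followed by mouse_event with the right-button flags. This is a sketch only: right_click is a made-up helper name, and mouse_event is the older API (SendInput is the newer alternative), used here because the call is short.

```python
import ctypes
import sys

MOUSEEVENTF_RIGHTDOWN = 0x0008
MOUSEEVENTF_RIGHTUP = 0x0010


def right_click(user32, x, y):
    """Move the cursor to screen position (x, y) and right-click there."""
    user32.SetCursorPos(x, y)
    # mouse_event(dwFlags, dx, dy, dwData, dwExtraInfo); the coordinates
    # are left at 0 because the cursor has already been positioned.
    user32.mouse_event(MOUSEEVENTF_RIGHTDOWN, 0, 0, 0, 0)
    user32.mouse_event(MOUSEEVENTF_RIGHTUP, 0, 0, 0, 0)


if __name__ == "__main__" and sys.platform == "win32":
    right_click(ctypes.windll.user32, 200, 300)
```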
The MSAA model does not support treeview hierarchy information -- it displays it as a flat list. On the other hand, UIA will only enumerate elements that are visible, so you will not be able to access elements in the UIA tree that are collapsed.
You can use PyAutoGUI, which provides a cross-platform Python way to perform GUI automation.
Here is a simple example that moves the mouse to the middle of the screen:
import pyautogui
screenWidth, screenHeight = pyautogui.size()
pyautogui.moveTo(screenWidth / 2, screenHeight / 2)
Related question: Controlling mouse with Python.
Example:
pyautogui.typewrite('Hello world!') # types "Hello world!" instantly
pyautogui.typewrite('Hello world!', interval=0.25) # types "Hello world!" with a quarter second delay after each character
It provides JavaScript-style message boxes.
And other features.
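As a small illustration of the message boxes, the sketch below asks the user before an automation run continues. ask_to_continue is a made-up helper; pyautogui.confirm is the real call and returns the text of the button that was clicked. The confirm parameter exists only so the helper can be tried without opening a dialog.

```python
def ask_to_continue(message, confirm=None):
    """Show an OK/Cancel box and return True if the user clicked OK."""
    if confirm is None:
        # Imported lazily so the module loads even without a display.
        import pyautogui
        confirm = pyautogui.confirm
    answer = confirm(text=message, title="Automation", buttons=["OK", "Cancel"])
    return answer == "OK"


if __name__ == "__main__":
    if ask_to_continue("Move the mouse to the middle of the screen?"):
        import pyautogui
        width, height = pyautogui.size()
        pyautogui.moveTo(width / 2, height / 2)
```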
For other suggestions, check: Python GUI automation library for simulating user interaction in apps.
You can lock your PC (the equivalent of Win + L):
import ctypes
ctypes.windll.user32.LockWorkStation()
You can empty your recycle bin:
import winshell
winshell.recycle_bin().empty(confirm=False, show_progress=False, sound=True)
You can try Automa.
It's a Windows GUI automation tool written in Python which is very simple to use. For example, you can do the following:
# to double click on an icon on the desktop
doubleclick("Recycle Bin")
# to maximize
click("Maximize")
# to input some text and press ENTER
write("Some text", into="Label of the text field")
press(ENTER)
The full list of available commands can be found here.
Disclaimer: I'm one of Automa's developers.
Have a look at SIKULI.
Sikuli is a visual technology to automate and test graphical user interfaces (GUI) using images (screenshots).
SIKULI uses a very clever combination of taking screenshots and embedding them into your Python (it's Jython, actually) script.
Take screenshots:
and use them in your code: