The invention relates to detecting objects or visual features in a series of frames, e.g. video frames. A sensor component (e.g. a camera) obtains a plurality of sensor frames captured over time. A detection component 502 is configured to detect objects or features within a sensor frame using a neural network (NN) (200). The neural network comprises a recurrent connection that feeds forward an indication of an object detected in a first sensor frame into one or more layers of the neural network for a second, later sensor frame 504. The output from the first frame could be fed into the input (202) or hidden (204-208) layers of the NN for the second frame. The output could provide an indication of the type of object (e.g. pedestrian, vehicle) and/or its location (302). Using information derived from a previous video frame in the analysis of the next frame can improve object detection over systems that simply treat each frame as an isolated image. The invention could be used in a control system 100 of an autonomous vehicle.
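
The recurrent connection described above can be illustrated with a minimal sketch. It is not the claimed implementation: the class name `RecurrentDetector`, the layer sizes, the random untrained weights, and the choice to feed the previous output into a single hidden layer are all assumptions made for illustration. The output vector stands in for an indication of object type and location; a real detector would use trained convolutional layers.

```python
import math
import random


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


class RecurrentDetector:
    """Hypothetical toy network: one hidden layer plus a recurrent connection
    that carries the previous frame's output into the current frame's hidden layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Random, untrained weights (assumption: for illustration only).
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = init(n_hidden, n_in)    # input layer -> hidden layer
        self.w_rec = init(n_hidden, n_out)  # previous output -> hidden layer (recurrent)
        self.w_out = init(n_out, n_hidden)  # hidden layer -> output layer
        self.n_out = n_out

    def step(self, frame, prev_output):
        """Process one frame, combining it with the previous frame's output."""
        from_frame = matvec(self.w_in, frame)
        from_prev = matvec(self.w_rec, prev_output)
        hidden = [math.tanh(a + b) for a, b in zip(from_frame, from_prev)]
        logits = matvec(self.w_out, hidden)
        # Sigmoid keeps each output score strictly between 0 and 1.
        return [1.0 / (1.0 + math.exp(-z)) for z in logits]

    def run(self, frames):
        """Process frames in temporal order, feeding each output forward."""
        prev = [0.0] * self.n_out  # no prior detection before the first frame
        outputs = []
        for frame in frames:
            prev = self.step(frame, prev)
            outputs.append(prev)
        return outputs


# Two identical frames: the second result differs from the first only
# because the first frame's output is fed forward.
detector = RecurrentDetector(n_in=4, n_hidden=6, n_out=3)
frame = [0.2, 0.8, 0.1, 0.5]
first, second = detector.run([frame, frame])
```

Processing the same frame twice makes the effect of the recurrent connection visible: with no such connection the two outputs would be identical, whereas here the second output also depends on what was reported for the first frame.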